Cleaning Up Imported XML Content
You should run the validation on any XML that you import into InDesign for which
you have a DTD. With a large XML file, this step can be problematic, because you will
have a large number of suggestions in the validation window to work through. If possible,
validate the XML before importing it into InDesign.
Fast and Light Credo: Develop Now, Validate Later
If there is no DTD available for you, you can create one from sample XML yourself. Start
by making a list of the different information bits that will make up your content structure.
For example, suppose that I am trying to get some personnel information into my course
catalog. When looking at the text in the current unstructured document, I see that there
are people who have administrative positions, people who are on various boards, people
who have won awards, people who are on staff, and various types of teachers on the
faculty. All of these people are employees, except some of the board members. All of the
faculty have their degrees, degree-granting institutions, and any special awards listed
after their names. In another part of the catalog, these people are listed with phone
I can choose <personnel> as my root element (despite knowing that a few people are
technically not employees). Then I can have a <person> with <name> elements patterned
on a standard model such as <lastname> <firstname> <mi> and <honorific>. Each
person can also have contact information (<phone>, <email>, etc.) and <degrees> containing
<degree> and <institution> elements.
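The structure described above can be sketched as a small sample file. The following is a minimal illustration in Python, not part of the InDesign workflow itself; the helper name build_person and all of the sample values (name, phone number, email address, degree) are invented for the example.

```python
import xml.etree.ElementTree as ET

def build_person(last, first, mi, honorific, phone, email, degrees):
    """Build one <person> element (build_person is a hypothetical helper)."""
    person = ET.Element("person")
    name = ET.SubElement(person, "name")
    for tag, value in (("lastname", last), ("firstname", first),
                       ("mi", mi), ("honorific", honorific)):
        ET.SubElement(name, tag).text = value
    # Contact information
    ET.SubElement(person, "phone").text = phone
    ET.SubElement(person, "email").text = email
    # <degrees> holds <degree>/<institution> pairs
    degrees_el = ET.SubElement(person, "degrees")
    for degree, institution in degrees:
        ET.SubElement(degrees_el, "degree").text = degree
        ET.SubElement(degrees_el, "institution").text = institution
    return person

# <personnel> is the root element; the values below are made up.
personnel = ET.Element("personnel")
personnel.append(build_person("Doe", "Jane", "Q", "Dr.", "555-0100",
                              "jdoe@example.edu",
                              [("PhD", "Example University")]))
sample_xml = ET.tostring(personnel, encoding="unicode")
print(sample_xml)
```

A file like this is only a starting point for modeling; the real element set comes from the list of information bits you collected from the unstructured document.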
With this as a starting point, I can model the XML directly in InDesign. I work in much
the same way that I did when I was using a DTD, except that I don’t validate as I go.
Once I have a good example set of XML elements, I export the XML. Then with another
XML tool, such as XML Spy or Oxygen, I generate a DTD based on the sample.
Once I have an initial DTD, I load it into my InDesign document that contains the
sample XML content and validate the XML with it. Assuming that it validates, I continue
to make XML to match the structure model of my DTD.
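To make the idea of generating a DTD from a sample more concrete, here is a deliberately naive sketch in Python. It is not how XML Spy or Oxygen work internally (those tools are far more thorough); the function infer_dtd is hypothetical, it ignores attributes, and it declares every content model as a loose "any of these children, any number of times" group, which you would tighten by hand afterward.

```python
import xml.etree.ElementTree as ET

def infer_dtd(root):
    """Return naive <!ELEMENT> declarations inferred from one sample tree."""
    children = {}  # element name -> distinct child names, in first-seen order
    has_text = {}  # element name -> True if any instance holds non-blank text
    for el in root.iter():
        seen = children.setdefault(el.tag, [])
        text_found = bool((el.text or "").strip())
        for child in el:
            if child.tag not in seen:
                seen.append(child.tag)
            # Text after a child still belongs to the parent (mixed content).
            if (child.tail or "").strip():
                text_found = True
        has_text[el.tag] = has_text.get(el.tag, False) or text_found
    lines = []
    for name, kids in children.items():
        if kids and has_text[name]:
            model = "(#PCDATA | " + " | ".join(kids) + ")*"
        elif kids:
            model = "(" + " | ".join(kids) + ")*"
        else:
            model = "(#PCDATA)"
        lines.append(f"<!ELEMENT {name} {model}>")
    return "\n".join(lines)

# Made-up sample that follows the <personnel> model from this chapter.
sample = ("<personnel><person><name><lastname>Doe</lastname>"
          "<firstname>Jane</firstname></name>"
          "<phone>555-0100</phone></person></personnel>")
dtd_text = infer_dtd(ET.fromstring(sample))
print(dtd_text)
```

Because the sketch only sees what is in the sample, an element that never appears in the sample never appears in the DTD, which is the same reason the sample XML needs to be a good example set before you generate from it.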
Iterating the Information Structure and DTD
At some point in my development, someone may ask for a new element in the XML
structure. Perhaps he or she wants to add the teacher’s department to each person who
is a teacher and an office location to each staff and administrative person. I can add this
new element structure based on some database fields used for the college’s online directory
or by looking at a printed directory to see what headings and information are
in the directory that describe the locations of people on campus (such as the building
name and/or number, the floor, room or suite, etc.).
70 | Chapter 8: Validating XML in InDesign
I now delete the current DTD from the Structure pane in InDesign because I don’t want
a lot of validation errors with my new structure. I add the <location> elements to the
contact information for each person in a three-phase process:
1. I build a placeholder for the <location> structure with all the new elements I want
in the first <person> element of my XML content. I export the XML again so that
the new <location> structure is included in the sample XML that I use for DTD
2. I regenerate a DTD with my other XML tool, then load it into my InDesign document
and validate the XML that contains my new structure. If I have made the new
<location> structure required, every <person> element that is missing the <location>
structure will now generate an error in the validation window.
3. I repair the XML so that it will validate with the revised DTD. I can duplicate the
first <location> structure by selecting it in the Structure pane and doing a copy/
paste operation to add it to the other <person> element structures. Then I can
validate again and continue fixing the structure until I see the “no known errors”
message in the validation window.
If you know how to work with XSLT, using it would be faster than manually
adding elements with a copy/paste operation. You can write a template
that adds new XML structures to your existing XML file. Consult books
such as XSLT Cookbook by Sal Mangano (O’Reilly, 2002) or online references
on XSLT for more information on adding structures or modifying
existing ones.
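As an alternative to both copy/paste and XSLT, the same repair can be scripted on the exported XML outside InDesign. The sketch below uses Python rather than XSLT; the function add_missing_locations and the child element names <building> and <room> are assumptions made for the example, and it appends <location> directly under <person>, which may not match where your DTD expects it.

```python
import copy
import xml.etree.ElementTree as ET

def add_missing_locations(root):
    """Copy the first <location> structure into every <person> lacking one.

    Returns the number of <person> elements that were changed.
    """
    template = root.find("person/location")
    if template is None:
        raise ValueError("no <location> found in a <person> to use as a template")
    added = 0
    for person in root.iter("person"):
        if person.find("location") is None:
            clone = copy.deepcopy(template)
            for el in clone.iter():
                el.text = None  # keep the structure, drop the first person's values
            person.append(clone)
            added += 1
    return added

# Made-up sample: only the first <person> has the placeholder structure.
doc = ET.fromstring(
    "<personnel>"
    "<person><name>A</name>"
    "<location><building>Main</building><room>101</room></location></person>"
    "<person><name>B</name></person>"
    "</personnel>"
)
changed = add_missing_locations(doc)
print(changed, ET.tostring(doc, encoding="unicode"))
```

After a pass like this, the XML still has to be validated against the revised DTD, because the script only adds empty placeholders and knows nothing about required content or element order.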
The process of creating the new structure, generating a revised DTD, and validating can
be repeated as I make changes to the structure. At each iteration, I can demonstrate the
XML to stakeholders to see if it meets their needs. Once everyone has agreed to a structure,
I can annotate the final DTD with comments so that everyone knows what each
element is and how many times it can occur and in what sequence. This documentation
then gets distributed to anyone else who needs to create XML or use the XML that is
Despite all your best efforts to document how the DTD should be used, and to train
authors, someone will always create invalid XML content in InDesign. InDesign does
not provide enough XML authoring support in its current incarnation. If the content
authors are struggling, you might investigate other XML authoring tools that help
authors use a DTD. Adobe makes another product, FrameMaker, which provides guided
authoring for XML based on a DTD. FrameMaker costs about the same as InDesign but
requires more expert assistance to set up for XML import and export than does InDesign.
A number of other XML editors and authoring tools (freeware, shareware, or commercial)
are also available that may meet your authoring needs and budget.
If you use an XML authoring tool other than FrameMaker or InDesign itself, you can
still import the XML you create into InDesign to make a visually pleasing document. In
this regard, InDesign provides sufficient functionality to be very useful for XML publishing.
Validate the XML before you import it. Then use the techniques described in
this book to map to paragraph and character styles.
It’s fairly natural to expect that you could use one
piece of XML data in multiple places in an InDesign
layout—but that’s not at all the way that InDesign
works. Once you’ve imported XML, there is a one-
to-one correspondence between the elements in the
Structure view and their expression in the layout.
If you want an element to appear multiple times,
you’ve got to duplicate the element for each
appearance on a document page. (Obviously, you
can get around this in some cases by placing the
XML element on a master page.)
—Olav Martin Kvern and David Blatner,
Real World Adobe InDesign CS4
What InDesign Cannot Do
(or Do Well) with XML
The 1:1 Import Conundrum
As the epigraph to this chapter states (and this is still true for InDesign versions up to
CS5), the expectation is that you import one XML file to fill one content area (text flow)
in your InDesign document. This requirement is contradictory to the spirit of XML,
which is all about reuse of content in multiple documents and in multiple ways. For
example, you might want a standard warning or copyright or other block of content to
appear in many places in a single document or a set of documents collected as a book.
However, from the Structure pane, you cannot drag the same piece of structure into
multiple locations in an InDesign document. If you drag an element into the layout a
second time, InDesign will remove it from its first location in the layout.
Bad Characters
Some typographic controls may generate characters—even in later versions of
InDesign—that are not XML-compatible. Adobe mentions this in the InDesign Help
section about exporting XML: “Not all characters are supported in XML (such as the
Automatic Page Number character). InDesign warns you if it cannot include a character
in the XML file.”
InDesign CS2 XML export controls are more limited than those in InDesign CS3
and later. CS2 does not have the Remap Break, Whitespace, and Special Characters
option. As a consequence, the XML that you generate from InDesign CS2 may
contain characters used in publishing applications that are problematic in XML
processing. Chief of these are the characters that make paragraph and manual line
breaks in the text layout. XML doesn’t use these types of characters, and depending
on the processes you run after exporting XML from InDesign, you may have to
clean up the XML to remove these types of characters. See Figure 9-1.
Figure 9-1. Unwanted characters (square) in XML exported from InDesign CS2
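A cleanup pass for this kind of export can be scripted. The sketch below is written in Python and rests on an assumption that you should verify against your own files: that the unwanted characters are the Unicode line separator (U+2028) and paragraph separator (U+2029). The function name strip_layout_breaks is hypothetical.

```python
import re

# Assumption: the "square" characters are U+2028 (line separator) and
# U+2029 (paragraph separator). Inspect your own export to confirm.
_LAYOUT_BREAKS = re.compile("[\u2028\u2029]")

def strip_layout_breaks(xml_text, replacement=""):
    """Remove (or replace) layout break characters in exported XML text."""
    return _LAYOUT_BREAKS.sub(replacement, xml_text)

# Made-up sample of exported text containing both characters.
exported = "<p>one\u2028two</p>\u2029<p>three</p>"
cleaned = strip_layout_breaks(exported)
spaced = strip_layout_breaks(exported, replacement=" ")
print(cleaned)
print(spaced)
```

Passing a space as the replacement is the safer choice when a manual line break sat between two words, since removing the character outright would run the words together.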
All versions of InDesign CS
Related to the “bad characters” export problem is the issue of imported XML that
might contain tabs, spaces, and line breaks. Often this problem is seen in applications
that “pretty print” XML files with indents and coloring to make them easier
to read. For example, I use SynchroSoft Oxygen Editor. When the pretty-printed
XML created in Oxygen is used for import, it creates unwanted effects in the layout.
To get a clean import, it is sometimes necessary to edit the text in a text editor to
remove the tabs and spaces, play with the import dialog whitespace controls (do
not import contents of whitespace-only elements in CS3 and later), or run an XSL
transformation to remove line endings and tabs from the XML before importing it.
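The same cleanup can be done with a short script instead of an XSL transformation. The following Python sketch removes whitespace-only text between elements, which is what pretty-printing adds; the function name strip_pretty_printing and the sample content are invented for the example.

```python
import xml.etree.ElementTree as ET

def strip_pretty_printing(root):
    """Drop whitespace-only text and tails left behind by pretty-printing."""
    for el in root.iter():
        if el.text is not None and not el.text.strip():
            el.text = None
        if el.tail is not None and not el.tail.strip():
            el.tail = None
    return root

# Made-up pretty-printed input with indents and line breaks.
pretty = """<personnel>
    <person>
        <name>Jane Doe</name>
    </person>
</personnel>"""
compact = ET.tostring(strip_pretty_printing(ET.fromstring(pretty)),
                      encoding="unicode")
print(compact)
```

One caution: in mixed content, a single space between two inline elements is also whitespace-only text, so a pass like this can glue adjacent words together. Check a sample of the output before importing it.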
Inscrutable Errors, Messages, and Crashes
InDesign often provides helpful error notifications, messages, and crash information—
and sometimes not-so-helpful versions of these, such as the following:
74 | Chapter 9: What InDesign Cannot Do (or Do Well) with XML
Devilish validation suggestions
Are you missing a required attribute? Have you forgotten to put a required element
in your structure? The validation window at the bottom of the structure pane will
tell you the sad story of your incompetence with the DTD, but the suggestions it
offers won’t always tell you enough about fixing the problem—see Chapter 8 for
Exporting from the element with the included DTD will not be valid
Several times when I had a DTD included in the XML that I was exporting and
checked the box to include the DTD declaration on export, I saw a message that
the XML I was exporting was not going to be valid using the DTD. It seemed to me
at the time that the message was bogus, as I had validated the content with the DTD
before export. I opened the exported content in XML Spy to check it, and found
that there was some kind of invisible (line break) character in the XML between
elements. When I switched to EditPlus and looked at the same file, I saw square box
characters in these places in the XML file. I had to do a search and replace on that
character to get an XML file that would validate. This problem is related to the issue
described in “Bad Characters” (page 74).
Making InDesign “think” too hard on import or export with XSL
It seemed to me that I was most likely to cause InDesign to crash if I tried to get too
fancy with my XSLT. I am accustomed to being able to sort, filter, wrap, and unwrap
elements; make substring operations on text in elements; and other tricks of the
XSL trade. If I used these types of functions in XSLT that I was using when importing
or exporting XML with InDesign, sometimes it didn’t work, and sometimes it froze
the application. See Chapter 10. My recommendation, if you need to do a lot of
fancy manipulation of your XML, is to use XSLT as a pre- or postprocessing step
external to InDesign.
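As one illustration of preprocessing outside InDesign, the sketch below sorts <person> elements by name before the file is imported. It uses Python rather than XSLT; the function sort_people is hypothetical, and the <personnel> structure and sample names are borrowed from the Chapter 8 example or made up.

```python
import xml.etree.ElementTree as ET

def sort_people(root):
    """Reorder <person> children of root by last name, then first name."""
    people = root.findall("person")
    people.sort(key=lambda p: (p.findtext("name/lastname", ""),
                               p.findtext("name/firstname", "")))
    # Remove and re-append in sorted order. Any non-<person> children of
    # the root end up ahead of the sorted block.
    for person in people:
        root.remove(person)
    root.extend(people)
    return root

# Made-up, unsorted sample.
unsorted_doc = ET.fromstring(
    "<personnel>"
    "<person><name><lastname>Smith</lastname><firstname>Al</firstname></name></person>"
    "<person><name><lastname>Doe</lastname><firstname>Jane</firstname></name></person>"
    "<person><name><lastname>Doe</lastname><firstname>Ann</firstname></name></person>"
    "</personnel>"
)
sort_people(unsorted_doc)
sorted_names = [(p.findtext("name/lastname"), p.findtext("name/firstname"))
                for p in unsorted_doc.findall("person")]
print(sorted_names)
```

Doing the sort here means the XSLT applied during InDesign import (if any) can stay simple, which is the point of the recommendation above.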
InDesign Is Not an XML Authoring Tool
The premise of Adobe’s XML tools for InDesign is that people with databases often have
XML content that can be mapped to InDesign paragraph and character styles or to tables
and images. There is no mapping to “container” elements that are used in making deeply
nested XML structures. This behavior has implications for XML developers, but even
more so for InDesigners who have to work with XML inside InDesign, as I wrote in my
LinkedIn group, “XML Content/InDesign Publishing”—see the sidebar “Deep Structures
and Flat Apps: The Contradictions of XML and InDesign for Designers” (page 76).
Deep Structures and Flat Apps: The Contradictions of
XML and InDesign for Designers
By its nature, XML has hierarchy which leads to “deep” structures of elements within
elements within elements . . . to the limits of the content model (DTD or schema). By
nature, the design applications are only managing the styling or visual presentation of
the content, and there is minimal “structure.” In InDesign, the “structure” is: file, layout
spread, page template, text frame, graphic frame, styles (object, table, paragraph, character),
link (image or other linked resource), fonts, color swatches, etc. Did you notice
that there is no subdivision of the text other than paragraph or character styles? Unlike
a web page, which at least lets you arbitrarily group content into “div” container elements,
there is no grouping or containing mechanism for text in InDesign besides the text frame.
This results in technical difficulties not just in placing deeply structured content into
InDesign, which is mapping complexity to simplicity. It makes it even more troublesome
to reverse the operation, by trying to restore a deep structure from a simple structure.
Reconstructing XML from styles and the text of paragraphs or characters has to rely on
a degree of consistency in the InDesign file that is hard for the layout person to maintain.
A more subtle problem is the cognitive shift that is required to get people who work with
layout and styles to understand what hierarchical content is (and why hierarchy is useful).
A person who has worked with visual design professionally is not necessarily going
to immediately understand what the purpose of XML is. Instead, she is going to get
caught up in the technical difficulty—and XML is likely to make her have to deal with
obscure problems and unfamiliar menus in InDesign that have nothing to do with her
primary job function of making beautiful pages. This results in frustration and resistance
to adopting XML. And that attitude is not the designer’s fault. It is arising directly from
the tension between the worlds of structured content and flat presentations.
If, despite the difficulties, you must work with XML and keep it valid, the topics in
Chapter 8 explain some of the challenges you will face.