Most modern website design programs, including DreamWeaver, still don’t
produce anything like well-formed HTML, largely because they are intended
for making pages look pretty, rather than getting the markup right. Using a
website design program and its HTML pages as the sole repository of your
information can be a dangerous and expensive mistake. If you’re working
the other way round, however, getting the information design right in XML
first, and then exporting it to a page design produced using a website design
program, it’s probably less important that the HTML is a mess, because
browsers are very forgiving.
CONVERTING VALID HTML TO XHTML
If your HTML files are valid (full formal validation with an SGML parser
against one of the published DTDs, not just a simple syntax check), then
try validating them as XHTML with an XML parser. If you have been
creating clean HTML without embedded formatting then this process
should throw up only mismatches in upper/lowercase element and
attribute names, and EMPTY elements like img, plus any non-standard
element type names if you use them. Simple hand-editing or a short script
should be enough to fix these mismatches.
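As a rough illustration of the kind of short script meant here, the
following Python sketch re-serialises HTML with lowercase element and
attribute names, quoted attribute values, and EMPTY elements closed in the
XML way. It uses only the standard library's html.parser module; the names
XHTMLSerialiser and to_xhtml are invented for this example. It assumes the
input is already valid, does not supply omitted end-tags, and does not
rewrite the DOCTYPE or add the XHTML namespace, so treat it as a starting
point rather than a finished converter.

```python
from html import escape
from html.parser import HTMLParser

# Element types declared EMPTY in the HTML 4.01 DTDs.
EMPTY_ELEMENTS = {
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
}


class XHTMLSerialiser(HTMLParser):
    """Hypothetical helper: echo the parsed HTML back out in XML syntax."""

    def __init__(self):
        # Keep character and entity references as written in text content.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _start(self, tag, attrs, empty):
        # html.parser has already lowercased the tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            if value is None:          # minimised attribute, e.g. CHECKED
                value = name
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + (" />" if empty else ">"))

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, tag in EMPTY_ELEMENTS)

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, True)

    def handle_endtag(self, tag):
        if tag not in EMPTY_ELEMENTS:  # EMPTY elements were closed at the start-tag
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    # Declarations such as the HTML DOCTYPE are deliberately dropped:
    # the XHTML DOCTYPE has to be added separately.


def to_xhtml(html_text):
    serialiser = XHTMLSerialiser()
    serialiser.feed(html_text)
    serialiser.close()
    return "".join(serialiser.out)


print(to_xhtml('<P CLASS=intro>Hello<BR><IMG SRC="a.png" ALT=logo></P>'))
```

The output still needs checking with an XML parser afterwards, since a
script like this only changes the syntax and cannot tell whether the
result is valid XHTML.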
If your HTML validly uses end-tag omission and unquoted attribute
values, this can be fixed automatically by a normalisation program like
sgmlnorm (from the OpenSP package, which is part of OpenJade), or by the
sgml-normalize function in an editor like Emacs/psgml (don’t be put off by
the names, they both do XML).
If you have a lot of valid HTML files, you could write a script to do this
in a programming language which understands SGML markup (such as
Omnimark or SGMLC, or one of the popular scripting languages such as Perl,
Python, or Tcl, using their SGML/XML libraries); or you could even use
editor macros if you know what you’re doing.
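With a large number of files it can help to find out first which ones an
XML parser actually rejects, so that you only convert those. The sketch
below is one way to do that in Python with the standard library; the
function name report_ill_formed and the default "*.html" pattern are
assumptions made for the example. Note that it tests well-formedness only,
not validity against a DTD, and that this parser does not read external
DTDs, so files using named entities such as &nbsp; may be reported as
errors even when they are otherwise fine.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def report_ill_formed(directory, pattern="*.html"):
    """Return {file name: parser message} for files that are not well-formed XML."""
    problems = {}
    for path in sorted(Path(directory).glob(pattern)):
        try:
            ET.parse(str(path))
        except ET.ParseError as err:
            # The message includes the line and column where parsing failed.
            problems[path.name] = str(err)
    return problems


if __name__ == "__main__":
    for name, message in report_ill_formed(".").items():
        print(f"{name}: {message}")
```

Files that come back clean can go straight on to validation; the rest are
candidates for normalisation or hand-editing.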
If your HTML is invalid or badly-formed, try the HTML Tidy program
mentioned above. If that doesn’t fix the files, I’m afraid you’ll need to write
something special using the procedure below, or do it all by hand-editing,
or copy-and-paste from a browser.
CONVERTING TO A NEW DOCUMENT TYPE
If you want to move your files out of HTML into some other DTD entirely,
there are many native XML industrial DTDs, and modular XML versions of
popular DTDs like TEI (literary, historical, and linguistic documents) and
DocBook (computer documentation) or DITA (technical documentation) to