ARCHIVAL PRESERVATION OF WEB RESOURCES: HTML to XHTML Migration Test Technical Considerations, Evaluation, and Recommendations
As noted above, Tidy Utility software is available in two different modes. The first is Tidy.exe, which is
command-line (DOS) software initially developed by David Raggett for the World Wide Web Consortium.
Tidy.exe supports 43 different options or parameters that allow users to customize clean up and migration.
Selecting these options is cumbersome for people unfamiliar with DOS, so in general Tidy.exe is not user
friendly. One of
the Tidy.exe options is to display a message log of warnings that identifies each instance where Tidy.exe
corrected or cleaned up HTML code to comply with XHTML requirements. This message log allows users to
review each instance of corrected HTML code and accept or reject the correction, which is analogous to the
"find and replace" functionality of MS-Word. This is a time-consuming process that is likely to be useful only
for the authors of HTML pages who want to post "valid," interoperable HTML pages on a Web site.
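The message log described above can also be read by a script instead of being reviewed entry by entry. The following is a minimal Python sketch, not part of the original test. It assumes a Tidy executable named "tidy" on the search path, the -asxhtml, -q, -o, and -f switches found in current HTML Tidy releases, and Tidy's default "line N column M - Warning: ..." message format. The names build_tidy_command and parse_tidy_log are hypothetical.

```python
import re

# Assumed default shape of a Tidy message line, for example:
#   line 1 column 1 - Warning: missing <!DOCTYPE> declaration
LOG_LINE = re.compile(r"^line (\d+) column (\d+) - (Warning|Error): (.*)$")


def build_tidy_command(src, dst, log):
    """Return an argument list asking Tidy to write XHTML to dst and messages to log.

    The executable name "tidy" is an assumption; adjust it for the local install.
    """
    return ["tidy", "-asxhtml", "-q", "-o", dst, "-f", log, src]


def parse_tidy_log(text):
    """Extract (line, column, level, message) records from Tidy's message log."""
    entries = []
    for raw in text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match:
            entries.append({
                "line": int(match.group(1)),
                "column": int(match.group(2)),
                "level": match.group(3),
                "message": match.group(4),
            })
    return entries
```

The argument list can be passed to subprocess.run. Tidy reports warnings through a non-zero exit status, so a non-zero status alone should not be treated as a failed conversion.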
One very useful feature of Tidy.exe is that it can clean up and convert single HTML pages or multiple pages.
The latter requires that all of the "related" HTML pages be cleaned up and converted to a separate directory.
Although batch processing of HTML pages containing both text and images could in principle cause text to
be overwritten on an image or produce some other form of misalignment, no such instance occurred in the
batch migration of the test bed. In addition, Tidy.exe migration of HTML pages to
XHTML may not consistently produce 100 per cent valid and well-formed XHTML pages in every instance, so
some form of visual inspection may be prudent. Interestingly, the DOS tool in Windows 98 runs in a Windows
environment where drag and drop functionalities are supported.
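The requirement that batch output go to a separate directory can be illustrated with a short planning step. This is a minimal Python sketch under stated assumptions, not the procedure used in the test: the function name plan_batch and the file patterns are hypothetical, and the sketch only pairs each source page with a destination path, leaving the conversion itself to Tidy.

```python
from pathlib import Path


def plan_batch(src_dir, dst_dir, patterns=("*.html", "*.htm")):
    """Pair every HTML page under src_dir with a path under dst_dir.

    The relative layout is mirrored so that links between "related" pages
    keep working, and the originals are never overwritten.
    """
    src_root = Path(src_dir)
    dst_root = Path(dst_dir)
    if src_root.resolve() == dst_root.resolve():
        raise ValueError("destination must be a separate directory")
    pairs = []
    for pattern in patterns:
        for src in sorted(src_root.rglob(pattern)):
            pairs.append((src, dst_root / src.relative_to(src_root)))
    return pairs
```

Each destination's parent directory would need to be created before a converted page is written there.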
The second mode of the Tidy Utility is Tidy GUI, which is an adaptation of David Raggett's HTML Tidy.exe.
Tidy GUI has familiar Windows features that make it relatively user friendly. Tidy GUI supports all of the
Tidy.exe options, which can be selected by clicking on pull-down menus. Although Tidy GUI is a significant
improvement over Tidy.exe, it processes only one HTML page at a time, which can become quite tedious when
thousands of HTML pages are to be converted to XHTML. Like Tidy.exe, Tidy GUI migration of HTML pages
to XHTML may not consistently produce 100 per cent valid and well-formed XHTML pages. The W3C provides
an on-line validation service that identifies errors so that they can be corrected. The W3C Validation Service is not integrated into
HTML-Kit, which includes a full-featured text editor, was designed to help authors of HTML and XML script
create, edit, format, validate, preview, and publish Web pages. HTML-Kit is a native 32-bit Windows program
that currently runs on Windows 95, 98, ME, NT, 2000, and XP, or any other platform that emulates 32-bit
Windows functionality. HTML-Kit executes the following migration and validation functions within the same
Opens an original HTML page,
Starts Tidy GUI, selects options, and executes "clean up,"
Converts the cleaned up page to XHTML,
Saves the newly created XHTML page,
Validates the newly created XHTML page, and
Obtains on-line certification that a converted XHTML page is compliant with the W3C standard.
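The validation step in the list above can be preceded by a local well-formedness check before a page is submitted for on-line validation. The following is a minimal Python sketch and an assumption of this edit, not a feature of HTML-Kit. The function name is_well_formed is hypothetical. The check confirms only that the markup parses as XML. It does not establish validity against an XHTML DTD, and the standard-library parser rejects named entities such as &nbsp; because it does not load the DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xhtml_text):
    """Return (True, "") if the text parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, ""
```

A page that fails this check cannot pass validation, so the parser message is a quick pointer to the first structural problem.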
HTML-Kit supports all of the Tidy Utility function menus and, as a Windows application, allows the opening
of multiple pages or documents at the same time, but the migration process deals with one document or page at a