An Introduction to XML and Web Technologies
HTML Validity
ɷ HTML has a formal syntax specification
ɷ 800 lines of DTD notation
ɷ A validator gives syntax errors for invalid documents
ɷ Most HTML documents on the Web are invalid:
• www.sun.com: 19 errors
• www.google.com: 27 errors
• www.ibm.com: 30 errors
• www.cnn.com: 58 errors
• www.microsoft.com: 123 errors
ɷ Valid documents may contain this logo: [image]
Validation Errors
Line 3, column 7: document type does not allow element "BODY" here.
<body>
^
Line 4, column 13: document type does not allow element "B" here; assuming missing "CAPTION" start-tag
<table><b>123</i></table>
^
Line 4, column 20: end tag for element "I" which is not open.
<table><b>123</i></table>
^
Line 4, column 28: end tag for "B" omitted, but its declaration does not permit this.
<table><b>123</i></table>
^
Line 4, column 11: start tag was here.
<table><b>123</i></table>
^
Line 4, column 28: end tag for "CAPTION" omitted, but its declaration does not permit this.
<table><b>123</i></table>
^
Line 4, column 11: start tag was here.
<table><b>123</i></table>
^
Line 4, column 28: end tag for "TABLE" which is not finished.
<table><b>123</i></table>
^
Line 6, column 6: end tag for "HTML" which is not finished.
</html>
<html>
<body>
<table><b>123</i></table>
</body>
</html>
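
The kind of nesting errors reported above can be approximated with a small stack-based check. The following is only a minimal sketch built on Python's standard `html.parser`; it is not a DTD validator, so it knows nothing about which elements the document type allows where (for example, it will not report the misplaced "BODY" or infer a missing "CAPTION"). The names `NestingChecker`, `check`, and `VOID` are hypothetical, and the column numbers it prints are 0-based offsets, so they will not match the validator output shown above.

```python
from html.parser import HTMLParser

# Elements that never take an end tag (a partial, illustrative list).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class NestingChecker(HTMLParser):
    """Hypothetical helper: reports end tags that do not match open elements."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, line, column) for open elements
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            line, col = self.getpos()
            self.open_tags.append((tag, line, col))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        line, col = self.getpos()
        if tag not in [name for name, _, _ in self.open_tags]:
            self.errors.append(
                f'Line {line}, column {col}: end tag for "{tag}" which is not open'
            )
            return
        # Close every element opened after the matching start tag.
        while self.open_tags:
            name, oline, ocol = self.open_tags.pop()
            if name == tag:
                break
            self.errors.append(
                f'Line {line}, column {col}: end tag for "{name}" omitted '
                f"(start tag was at line {oline}, column {ocol})"
            )


def check(source):
    """Return a list of nesting error messages for the given HTML text."""
    checker = NestingChecker()
    checker.feed(source)
    checker.close()
    for name, line, col in checker.open_tags:
        checker.errors.append(
            f'Line {line}, column {col}: element "{name}" is never closed'
        )
    return checker.errors
```

Calling `check` on the five-line sample document yields two messages: one for the stray `</i>` and one for the `<b>` that is still open when `</table>` arrives. A real validator reports more, because it also checks each element against the content model in the DTD.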
Reasons for Invalidity
ɷ Ignorance of the HTML standard
ɷ Lack of testing
• “This page is optimized for the XYZ browser”
• “This page is best viewed in 1024x768”
ɷ Automatic tools generate invalid HTML output
ɷ Forgiving browsers try to interpret invalid input
<h2>Lousy HTML</h1>
<li><a>This is not very</b> good.
<li><i>In fact, it is quite bad</em>
</ul>
But the browser does <a naem="goof">something.
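
Forgiving behaviour is easy to observe with any tolerant parser. As a rough illustration (an assumption of this sketch, not how any particular browser works), Python's standard `html.parser` tokenizes the fragment above without raising an error; it simply reports whatever tags it sees, including the mismatched end tags and the misspelled `naem` attribute. The names `EventLogger`, `tokenize`, and `LOUSY` are hypothetical.

```python
from html.parser import HTMLParser

# The invalid fragment from the slide, copied verbatim.
LOUSY = """<h2>Lousy HTML</h1>
<li><a>This is not very</b> good.
<li><i>In fact, it is quite bad</em>
</ul>
But the browser does <a naem="goof">something.
"""


class EventLogger(HTMLParser):
    """Hypothetical helper: records tag events instead of rejecting bad input."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, {}))


def tokenize(source):
    """Return the list of (kind, tag, attributes) events seen in the source."""
    logger = EventLogger()
    logger.feed(source)
    logger.close()
    return logger.events
```

`tokenize(LOUSY)` returns ten events and no exception. Deciding what those events mean, such as whether `</h1>` should close the `<h2>`, is left to whoever consumes them, which is exactly the kind of choice each browser has to make for itself.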
Problems with Invalidity
ɷ There are several different browsers
ɷ Each browser has many different implementations
ɷ Each implementation must interpret invalid HTML
ɷ There are many arbitrary choices to make
ɷ The HTML standard has been undermined
ɷ HTML renders differently for most clients