12.1 Introduction
Recently, XML has gained popularity as a data-exchange and message-passing format. As web
services become more widespread, XML plays an even more important role in a developer's
life. With the help of a few extensions, PHP lets you read and write XML for every occasion.
XML provides developers with a structured way to mark up data with tags arranged in a tree-like hierarchy. One perspective on XML is to treat it as CSV on steroids. You can use XML to store records broken into a series of fields. But instead of merely separating each field with a comma, you can include a field name, type, and attributes alongside the data.
Another view of XML is as a document representation language. For instance, the PHP
Cookbook was written using XML. The book is divided into chapters; each chapter into recipes;
and each recipe into Problem, Solution, and Discussion sections. Within any individual section,
we further subdivide the text into paragraphs, tables, figures, and examples. An article on a
web page can similarly be divided into the page title and headline, the authors of the piece,
the story itself, and any sidebars, related links, and additional content.
XML text looks similar to HTML. Both use tags bracketed by < and > for marking up text. But XML is both stricter and looser than HTML. It's stricter because all container tags must be properly closed. No opening elements are allowed without a corresponding closing tag. It's looser because you're not forced to use a fixed list of tags, such as <a>, <img>, and <h1>. Instead, you have the freedom to choose a series of tag names that best describe your data.
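To see both halves of that trade-off in PHP, here is a minimal sketch using the SimpleXML extension, which is bundled and enabled by default in most PHP builds. The helper is_well_formed() is hypothetical, written only for this example. Invented tag names are accepted. A tag left open, which an HTML parser would tolerate, makes simplexml_load_string() return false.

```php
<?php
// Collect libxml errors internally instead of emitting warnings,
// so malformed input simply makes the parser return false.
libxml_use_internal_errors(true);

// Hypothetical helper for this example: true if $xml is well-formed.
function is_well_formed(string $xml): bool
{
    $doc = simplexml_load_string($xml);
    libxml_clear_errors(); // discard any parse errors collected above
    return $doc !== false;
}

// Tag names are ours to choose: <recipe>, <problem>, and <solution> are not HTML tags.
$custom = '<recipe><problem>Parse XML</problem><solution>Use SimpleXML</solution></recipe>';

// HTML would tolerate the unclosed <p>; XML rejects it.
$unclosed = '<recipe><p>Step one</recipe>';

var_dump(is_well_formed($custom));   // bool(true)
var_dump(is_well_formed($unclosed)); // bool(false)
```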
Other key differences between XML and HTML are case sensitivity, attribute quoting, and whitespace. In HTML, <B> and <b> are the same bold tag; in XML, they're two different tags.
In HTML, you can often omit quotation marks around attributes; XML, however, requires
them. So, you must always write:
<element attribute="value">
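The following sketch, again assuming SimpleXML and a hypothetical parses_as_xml() helper defined just for illustration, checks both rules. A closing tag whose case differs from its opening tag and an unquoted attribute value each make the document malformed. It also shows that <B> and <b> are read as two separate elements.

```php
<?php
libxml_use_internal_errors(true); // keep parse errors quiet

// Hypothetical helper: true when the string parses as well-formed XML.
function parses_as_xml(string $xml): bool
{
    $ok = simplexml_load_string($xml) !== false;
    libxml_clear_errors();
    return $ok;
}

$samples = [
    '<b>bold</b>'                  => 'matching case',
    '<b>bold</B>'                  => 'mismatched case',
    '<element attribute="value"/>' => 'quoted attribute',
    '<element attribute=value/>'   => 'unquoted attribute',
];

foreach ($samples as $xml => $label) {
    printf("%-20s %s\n", $label, parses_as_xml($xml) ? 'ok' : 'error');
}

// <B> and <b> are two distinct element names.
$doc = simplexml_load_string('<doc><B>one</B><b>two</b></doc>');
echo $doc->B, ' ', $doc->b, "\n"; // one two
```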
Additionally, HTML generally collapses whitespace, so a run of 20 consecutive spaces is treated the same as one space. XML parsers preserve whitespace, unless explicitly instructed otherwise. Because all elements must be closed, an empty element must either have an explicit closing tag or end with />. For instance, in HTML, the line break is <br>, while in XML, it's written as <br />.[1]
[1] This is why nl2br( ) outputs <br />; its output is XML-compatible.
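A short sketch of these points in PHP, assuming SimpleXML is available. The nl2br( ) function inserts <br /> before each newline and keeps the newline itself. Because that tag is self-closed, the result can be wrapped in a root element and parsed as XML. The last lines show the parser keeping a run of spaces inside text content intact.

```php
<?php
// nl2br() inserts an XHTML-style <br /> before each newline; the newline stays.
$html = nl2br("first line\nsecond line");
echo $html, "\n"; // prints "first line<br />", then "second line" on the next line

// The self-closed <br /> keeps the markup well-formed once wrapped in a root element.
$doc = simplexml_load_string('<p>' . $html . '</p>');
var_dump($doc !== false); // bool(true)

// Whitespace inside text content is preserved: all five spaces survive.
$spaced = simplexml_load_string('<line>a     b</line>');
var_dump((string) $spaced); // string(7) "a     b"
```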
There is another restriction on XML documents. Since XML documents can be parsed into a
tree of elements, the outermost element is known as the root element. Just as a tree has only one trunk, an XML document must have exactly one root element. In the previous book example, this means chapters must be bundled inside a book tag. If you want to place multiple books inside a document, you need to package them inside a bookcase or another container. This limitation applies only to the document root. Again, just as a tree can have multiple branches off its trunk, it's legal to store multiple books inside a bookcase.
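The single-root rule is easy to check with SimpleXML. In this minimal sketch, two sibling <book> elements at the top level are rejected. The same books wrapped in one <bookcase> parse fine and can be iterated. The second title is a placeholder invented for the example.

```php
<?php
libxml_use_internal_errors(true); // suppress libxml warnings

// Two <book> elements at the top level means two roots: not well-formed.
$twoRoots = '<book><title>PHP Cookbook</title></book><book><title>Another Book</title></book>';
var_dump(simplexml_load_string($twoRoots)); // bool(false)
libxml_clear_errors();

// Wrapping them in a single <bookcase> root makes the document legal.
$bookcase = simplexml_load_string(
    '<bookcase>'
    . '<book><title>PHP Cookbook</title></book>'
    . '<book><title>Another Book</title></book>'
    . '</bookcase>'
);

// Each book is a branch off the one root.
foreach ($bookcase->book as $book) {
    echo $book->title, "\n";
}
```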