XML Parser Architectures and APIs:
The Python standard library provides a set of interfaces for working with XML. The two most
basic and widely used APIs for XML data are the SAX and DOM interfaces.
• Simple API for XML (SAX): You register callbacks for the events of interest and
then let the parser proceed through the document. This is useful when your documents are
large or memory is limited: the parser processes the file as it reads it from disk, so the
entire file is never stored in memory.
• Document Object Model (DOM) API: This is a World Wide Web Consortium (W3C)
recommendation in which the entire file is read into memory and stored in a hierarchical
(tree-based) form that represents all the features of an XML document.
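To make the difference between the two styles concrete, here is a minimal sketch using the standard-library modules xml.sax and xml.dom.minidom. It parses a small in-memory string rather than movies.xml; the sample data, the handler class MovieTitleHandler, and the helper functions are hypothetical names chosen for this illustration. The SAX handler only reacts to start-tag events and keeps just the movie titles, while the DOM version builds the whole tree, which can then be queried or modified.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample data for this sketch (not the full movies.xml file).
XML_DATA = """<collection shelf="New Arrivals">
<movie title="Enemy Behind"><type>Anime, Science Fiction</type></movie>
<movie title="Example Movie"/>
</collection>"""


class MovieTitleHandler(xml.sax.ContentHandler):
    """SAX style: collect each <movie> title as the parser streams through."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, name, attrs):
        # Callback invoked once per opening tag; attrs is a read-only mapping.
        if name == "movie":
            self.titles.append(attrs.get("title"))


def titles_with_sax(data: bytes) -> list:
    # The parser drives the handler; no document tree is kept in memory.
    handler = MovieTitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles


def titles_with_dom(data: str) -> list:
    # DOM style: parse everything into a tree first, then query it.
    dom = xml.dom.minidom.parseString(data)
    return [m.getAttribute("title") for m in dom.getElementsByTagName("movie")]


def rename_shelf_with_dom(data: str, new_shelf: str) -> str:
    # DOM allows changes: edit an attribute in the tree and serialize it again.
    dom = xml.dom.minidom.parseString(data)
    dom.documentElement.setAttribute("shelf", new_shelf)
    return dom.documentElement.toxml()


if __name__ == "__main__":
    print(titles_with_sax(XML_DATA.encode("utf-8")))
    print(titles_with_dom(XML_DATA))
    print(rename_shelf_with_dom(XML_DATA, "Classics"))
```

Both approaches return the same titles here; the difference is that the SAX version never holds the document as a tree, whereas the DOM version keeps it around, which is what makes the in-place edit in rename_shelf_with_dom possible.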
The trade-off is that SAX cannot always process information as fast as DOM when working with
large files. On the other hand, relying on DOM alone can exhaust your resources, especially
when it is used on many small files. SAX is read-only, while DOM lets you modify the document
tree and write it back out. Because these two APIs complement each other, there is no reason
you can't use both in large projects. Let's look at a simple example, the XML file movies.xml:
<collection shelf="New Arrivals">
<movie title="Enemy Behind">
<description>Talk about a US-Japan war</description>
<type>Anime, Science Fiction</type>
<description>A scientific fiction</description>