

XML Parsers: SAX & DOM

Simple API for XML
Document Object Model

(SAX) Simple API for XML


Introduction

SAX
Simple API for XML
A method for accessing the contents of XML documents
SAX provides event-based parsing for XML documents
Uses an event-based model
Notifications (events) are raised as the document is parsed

The current version is SAX 2.0.1.
The official website for SAX is :

SAX Parsers

SAX-based parsers
Available for a variety of programming languages
e.g., Java, Python, etc.


Events

SAX parser
Invokes certain methods when events occur
Programmers override these methods to process data

Method Name      Description
startDocument    Invoked when the parser encounters the start of an XML document.
endDocument      Invoked when the parser encounters the end of an XML document.
startElement     Invoked when the start tag of an element is encountered.
endElement       Invoked when the end tag of an element is encountered.
characters       Invoked when text characters are encountered.

Methods invoked by the SAX parser
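
The callbacks listed above can be sketched with Python's standard xml.sax module, whose ContentHandler class uses the same method names. This is a minimal illustration, not a reference implementation: the EventLogger class, the events list, and the sample document are assumptions made for the example.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX callback so the order of events is visible."""

    def __init__(self):
        super().__init__()
        self.events = []  # hypothetical log, filled in by the callbacks below

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append(f"startElement {name}")

    def endElement(self, name):
        self.events.append(f"endElement {name}")

    def characters(self, content):
        # A parser may deliver text in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append(f"characters {content!r}")


# Sample document chosen for the example (an assumption, not part of SAX).
SAMPLE = b'<message from="Paul" to="Tem"><body>Hi, Tim!</body></message>'

handler = EventLogger()
xml.sax.parseString(SAMPLE, handler)

for event in handler.events:
    print(event)
```

The handler never sees the whole document at once. It only reacts to each event as the parser reports it, which is why the programmer overrides the methods rather than calling them.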

(DOM) Document Object Model


DOM - Introduction

XML Document Object Model (DOM)
W3C standard recommendation
Builds a tree structure in memory for XML documents
DOM-based parsers parse these structures
Exist in several languages (Java, C, C++, Python, Perl, etc.)

DOM - Introduction

DOM tree
Each node represents an element, attribute, etc.

<?xml version = "1.0"?>
<message from = "Paul" to = "Tem">
   <body>Hi, Tim!</body>
</message>

Node created for element message
Element message has child node for body element
Element body has child node for text "Hi, Tim!"
Attributes from and to also have nodes in tree
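
The tree described above can be inspected with Python's standard xml.dom.minidom module. This is a sketch under two assumptions: the variable names are illustrative, and the document is written without whitespace between tags so that no whitespace-only text nodes appear in the tree.

```python
from xml.dom import minidom

# Same message as above, compacted so the only text node is "Hi, Tim!".
XML_TEXT = (
    '<?xml version="1.0"?>'
    '<message from="Paul" to="Tem"><body>Hi, Tim!</body></message>'
)

doc = minidom.parseString(XML_TEXT)

message = doc.documentElement   # node for element message
body = message.firstChild       # child node for element body
text = body.firstChild          # child node for the text

print(message.tagName)
print(body.tagName)
print(text.data)

# Attributes from and to have their own nodes as well.
for name in sorted(message.attributes.keys()):
    print(name, "=", message.attributes[name].value)
```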


DOM classes and interfaces.

Class/Interface   Description
Document          Represents the XML document's top-level node, which provides access to all the document's nodes, including the root element.
Node              Represents an XML document node.
NodeList          Represents a read-only list of Node objects.
Element           Represents an element node. Derives from Node.

Some Document methods.

Method Name          Description
createElement        Creates an element node.
createAttribute      Creates an attribute node.
createTextNode       Creates a text node.
getDocumentElement   Returns the document's root element.
appendChild          Appends a child node.
getChildNodes        Returns the child nodes.
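
As a rough illustration of these Document methods, the sketch below builds a small document from scratch with Python's xml.dom.minidom. Note that minidom exposes the accessors as properties (documentElement, childNodes) rather than as getDocumentElement and getChildNodes methods. The element and attribute names are assumptions chosen for the example.

```python
from xml.dom import minidom

doc = minidom.Document()

# createElement + appendChild: make the root element and attach it.
root = doc.createElement("message")
doc.appendChild(root)

# createAttribute: build an attribute node, then attach it to the element.
sender = doc.createAttribute("from")
sender.value = "Paul"
root.setAttributeNode(sender)

# createTextNode: text lives in its own node under the body element.
body = doc.createElement("body")
body.appendChild(doc.createTextNode("Hi, Tim!"))
root.appendChild(body)

# documentElement is minidom's counterpart of getDocumentElement.
print(doc.documentElement.toxml())
```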


Node methods.

Method Name     Description
appendChild     Appends a child node.
cloneNode       Duplicates the node.
getAttributes   Returns the node's attributes.
getChildNodes   Returns the node's child nodes.
getNodeName     Returns the node's name.
getNodeType     Returns the node's type (e.g., element, attribute, text, etc.). Node types are described in greater detail in Fig. 8.9.
getNodeValue    Returns the node's value.
getParentNode   Returns the node's parent.
hasChildNodes   Returns true if the node has child nodes.
removeChild     Removes a child node from the node.
replaceChild    Replaces a child node with another node.
setNodeValue    Sets the node's value.
insertBefore    Inserts a child node in front of an existing child node.
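
The tree-editing methods above can be tried out with Python's xml.dom.minidom, which provides insertBefore, cloneNode, replaceChild, removeChild and hasChildNodes under the same names (the get... accessors appear as properties such as nodeName and parentNode). The element names a, b, c and d are placeholders invented for this sketch.

```python
from xml.dom import minidom

doc = minidom.parseString("<list><a/><c/></list>")
root = doc.documentElement
a, c = root.childNodes                 # the two existing children

# insertBefore(newChild, refChild): place b in front of c.
b = doc.createElement("b")
root.insertBefore(b, c)
names_after_insert = [n.nodeName for n in root.childNodes]

# cloneNode(True) makes a deep copy that is detached from the tree.
snapshot = root.cloneNode(True)

# replaceChild(newChild, oldChild), then removeChild(oldChild).
d = doc.createElement("d")
root.replaceChild(d, a)
root.removeChild(c)
names_final = [n.nodeName for n in root.childNodes]

print(names_after_insert)
print(names_final)
print([n.nodeName for n in snapshot.childNodes])
```

The deep copy keeps the children it had when it was cloned, so later edits to the original tree do not affect it.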

Some node types.

Node Type              Description
Node.ELEMENT_NODE      Represents an element node.
Node.ATTRIBUTE_NODE    Represents an attribute node.
Node.TEXT_NODE         Represents a text node.
Node.COMMENT_NODE      Represents a comment node.
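
A short sketch of how a program might tell these node types apart, using the nodeType property and the standard DOM constants from Python's xml.dom package. The sample document and the TYPE_NAMES lookup table are assumptions made for the example.

```python
from xml.dom import Node, minidom

doc = minidom.parseString('<note id="1"><!-- greeting -->Hello</note>')
note = doc.documentElement
comment, text = note.childNodes        # a comment node, then a text node
attr = note.getAttributeNode("id")

# Hypothetical lookup table from DOM type constants to readable labels.
TYPE_NAMES = {
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
}

kinds = [TYPE_NAMES[n.nodeType] for n in (note, attr, text, comment)]
print(kinds)
```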


Element methods.

Method Name       Description
getAttribute      Returns an attribute's value.
getTagName        Returns an element's name.
removeAttribute   Removes an element's attribute.
setAttribute      Sets an attribute's value.
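
These Element methods map closely onto Python's xml.dom.minidom, where getAttribute, setAttribute and removeAttribute keep their names and the tag name is read through the tagName property. The sketch below is illustrative only; the new attribute value is an assumption chosen for the example.

```python
from xml.dom import minidom

doc = minidom.parseString('<message from="Paul" to="Tem"/>')
message = doc.documentElement

tag = message.tagName                   # counterpart of getTagName
sender = message.getAttribute("from")   # read an attribute's value

message.setAttribute("to", "Tim")       # overwrite an existing attribute
message.removeAttribute("from")         # drop an attribute entirely

print(tag, sender)
print(message.toxml())
```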


DOM
Tree-based model
Stores document data in a node hierarchy
Data is accessed quickly
Provides facilities for adding and removing nodes

SAX
Invokes methods when markup (a specific tag) is encountered
Greater performance than DOM
Less memory overhead than DOM
Typically used for reading documents (not modifying them)
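
To make the contrast concrete, the sketch below performs the same task both ways with Python's standard library: a SAX handler counts body elements as they stream past, while the DOM version loads the whole tree, queries it, and then modifies a node. The BodyCounter class and the sample document are assumptions for illustration, and the code does not measure performance or memory use.

```python
import xml.sax
from xml.dom import minidom

XML_BYTES = (
    b"<inbox>"
    b"<message><body>one</body></message>"
    b"<message><body>two</body></message>"
    b"</inbox>"
)


class BodyCounter(xml.sax.ContentHandler):
    """SAX style: react to each start tag, keep only a running count."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "body":
            self.count += 1


counter = BodyCounter()
xml.sax.parseString(XML_BYTES, counter)
sax_count = counter.count

# DOM style: the whole document is held in memory as a tree of nodes.
doc = minidom.parseString(XML_BYTES)
bodies = doc.getElementsByTagName("body")
dom_count = len(bodies)

# Because the tree stays in memory, a node can be edited after parsing.
bodies[0].firstChild.data = "edited"

print(sax_count, dom_count)
print(doc.documentElement.toxml())
```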