
2/28/2014

XML Parsers SAX & DOM


Simple API for XML (SAX) and Document Object Model (DOM)

(SAX) Simple API for XML


SAX - Introduction
Simple API for XML (SAX)
  A method for accessing the contents of XML documents
  Provides event-based parsing for XML documents
  Notifications (events) are raised as the document is parsed

The current version is SAX 2.0.1.
The official website for SAX is http://www.saxproject.org

SAX Parsers
SAX-based parsers
  Available for a variety of programming languages (e.g., Java and Python)


Events
SAX parser
  Invokes certain methods when events occur
  Programmers override these methods to process data

Method Name     Description
startDocument   Invoked when the parser encounters the start of an XML document.
endDocument     Invoked when the parser encounters the end of an XML document.
startElement    Invoked when the start tag of an element is encountered.
endElement      Invoked when the end tag of an element is encountered.
characters      Invoked when text characters are encountered.

Methods invoked by the SAX parser
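
The callbacks in the table can be seen firing in a minimal sketch using Python's standard `xml.sax` module, whose `ContentHandler` uses the same method names. The `EventLogger` class and the small sample document are assumptions made for illustration, not part of the slides.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records each SAX callback as it fires."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute names to values.
        self.events.append(f"startElement {name} {dict(attrs.items())}")

    def endElement(self, name):
        self.events.append(f"endElement {name}")

    def characters(self, content):
        # The parser may deliver text in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append(f"characters {content!r}")


# Sample input assumed for this sketch.
XML = b'<message from="Paul" to="Tem"><body>Hi, Tim!</body></message>'

handler = EventLogger()
xml.sax.parseString(XML, handler)
for event in handler.events:
    print(event)
```

The handler never sees the document as a whole: it only reacts to each event in document order, which is why any state it needs (here, the `events` list) has to be kept by the programmer.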

(DOM) Document Object Model


DOM - Introduction
XML Document Object Model (DOM)
  W3C standard recommendation
  Builds a tree structure in memory for XML documents
  DOM-based parsers parse XML documents into these tree structures
  DOM-based parsers exist for several languages (Java, C, C++, Python, Perl, etc.)

DOM - Introduction
DOM tree
  Each node represents an element, attribute, etc.

<?xml version = "1.0"?>
<message from = "Paul" to = "Tem">
   <body>Hi, Tim!</body>
</message>

  Node created for element message
  Element message has child node for body element
  Element body has child node for text "Hi, Tim!"
  Attributes from and to also have nodes in tree
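
The tree described above can be inspected with a short sketch using Python's standard `xml.dom.minidom`. The `outline` helper is hypothetical, and it assumes the document contains only elements, attributes, and text. The XML is written on one line here so that no whitespace-only text nodes appear between the elements.

```python
from xml.dom import Node, minidom

XML = '<?xml version="1.0"?><message from="Paul" to="Tem"><body>Hi, Tim!</body></message>'


def outline(node, depth=0):
    """Return one indented line per node at and below `node`.

    Sketch only: assumes every node is either an element or a text node.
    """
    indent = "  " * depth
    if node.nodeType == Node.TEXT_NODE:
        return [f"{indent}text {node.nodeValue!r}"]
    lines = [f"{indent}element {node.nodeName}"]
    # Attributes live in a separate map, not among the child nodes.
    for name, value in sorted(node.attributes.items()):
        lines.append(f"{indent}  attribute {name}={value!r}")
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines


doc = minidom.parseString(XML)
for line in outline(doc.documentElement):
    print(line)
```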


DOM classes and interfaces.

Class/Interface   Description
Document          Represents the XML document's top-level node, which provides access to all the document's nodes, including the root element.
Node              Represents an XML document node.
NodeList          Represents a read-only list of Node objects.
Element           Represents an element node. Derives from Node.

Some Document methods.

Method Name          Description
createElement        Creates an element node.
createAttribute      Creates an attribute node.
createTextNode       Creates a text node.
getDocumentElement   Returns the document's root element.
appendChild          Appends a child node.
getChildNodes        Returns the child nodes.
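
As a minimal sketch of these factory methods, the following builds a small document from scratch with Python's `xml.dom.minidom`. Two points are assumptions of this sketch rather than content from the slides: minidom exposes the `getDocumentElement` and `getChildNodes` accessors as the attributes `documentElement` and `childNodes`, and `setAttributeNode` (not listed in the table) is used to attach the created attribute.

```python
from xml.dom import minidom

doc = minidom.Document()

# The root element must be appended to the Document node itself.
root = doc.createElement("message")
doc.appendChild(root)

# createAttribute returns a detached attribute node; attach it explicitly.
attr = doc.createAttribute("from")
attr.value = "Paul"
root.setAttributeNode(attr)

# Text is a node of its own, appended beneath the element that contains it.
body = doc.createElement("body")
body.appendChild(doc.createTextNode("Hi, Tim!"))
root.appendChild(body)

print(doc.documentElement.toxml())
# prints: <message from="Paul"><body>Hi, Tim!</body></message>
```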


Node methods.

Method Name     Description
appendChild     Appends a child node.
cloneNode       Duplicates the node.
getAttributes   Returns the node's attributes.
getChildNodes   Returns the node's child nodes.
getNodeName     Returns the node's name.
getNodeType     Returns the node's type (e.g., element, attribute, text, etc.). Node types are described in greater detail in Fig. 8.9.
getNodeValue    Returns the node's value.
getParentNode   Returns the node's parent.
hasChildNodes   Returns true if the node has child nodes.
removeChild     Removes a child node from the node.
replaceChild    Replaces a child node with another node.
setNodeValue    Sets the node's value.
insertBefore    Inserts a child node in front of an existing child node.
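
The tree-editing methods in this table can be tried on a small list document. This is a sketch using Python's `xml.dom.minidom`, where the `get...`/`set...` accessors appear as plain attributes (`nodeValue`, `parentNode`, and so on); the `<list>` document is made up for the example.

```python
from xml.dom import minidom

doc = minidom.parseString("<list><item>a</item><item>c</item></list>")
root = doc.documentElement
first, last = root.childNodes

# cloneNode(True) makes a deep copy, including the text child.
middle = first.cloneNode(True)
middle.firstChild.nodeValue = "b"      # equivalent of setNodeValue
root.insertBefore(middle, last)        # children are now a, b, c

# replaceChild takes the new node first, then the node being replaced.
replacement = doc.createElement("item")
replacement.appendChild(doc.createTextNode("z"))
root.replaceChild(replacement, last)   # children are now a, b, z

root.removeChild(first)                # children are now b, z

print(root.toxml())
# prints: <list><item>b</item><item>z</item></list>
```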

Some node types.

Node Type             Description
Node.ELEMENT_NODE     Represents an element node.
Node.ATTRIBUTE_NODE   Represents an attribute node.
Node.TEXT_NODE        Represents a text node.
Node.COMMENT_NODE     Represents a comment node.
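
A short sketch of how these constants are typically used: code that walks a tree compares each node's type against them to decide how to treat the node. The example uses Python's `xml.dom` package, which defines the same constant names on its `Node` class; the `<note>` document and the `NAMES` mapping are assumptions for illustration.

```python
from xml.dom import Node, minidom

doc = minidom.parseString('<note id="1">text<!-- remark --></note>')
note = doc.documentElement

# Hypothetical lookup table from node-type constant to a readable label.
NAMES = {
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
}

# The element itself, its attribute node, then its two children.
kinds = [NAMES[note.nodeType], NAMES[note.getAttributeNode("id").nodeType]]
kinds += [NAMES[child.nodeType] for child in note.childNodes]
print(kinds)
# prints: ['element', 'attribute', 'text', 'comment']
```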


Element methods.

Method Name       Description
getAttribute      Returns an attribute's value.
getTagName        Returns an element's name.
removeAttribute   Removes an element's attribute.
setAttribute      Sets an attribute's value.
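
These attribute helpers can be sketched with Python's `xml.dom.minidom`, where `getTagName` appears as the `tagName` attribute. The new value passed to `setAttribute` is made up for the example.

```python
from xml.dom import minidom

doc = minidom.parseString('<message from="Paul" to="Tem"/>')
message = doc.documentElement

tag = message.tagName                    # "message"
sender = message.getAttribute("from")    # "Paul"

message.setAttribute("to", "Tim")        # overwrite an existing attribute
message.removeAttribute("from")          # drop an attribute entirely

print(tag, sender)
print(message.toxml())
# prints: <message to="Tim"/>
```

In minidom, `getAttribute` returns an empty string for an attribute that is absent, so `hasAttribute` is the safer way to test for presence.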

DOM vs. SAX
DOM
  Tree-based model
  Stores document data in a node hierarchy
  Data is accessed quickly once the tree is built
  Provides facilities for adding and removing nodes

SAX
  Invokes methods when markup (a specific tag) is encountered
  Generally greater performance than DOM, since no tree is built
  Less memory overhead than DOM, since the document is not held in memory
  Typically used for reading documents (not modifying them)
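
To make the contrast concrete, here is a sketch that performs the same task, counting `<item>` elements, once with each model using Python's standard library. The sample document and the `ItemCounter` class are assumptions for illustration.

```python
import xml.sax
from xml.dom import minidom

XML = "<list><item>a</item><item>b</item><item>c</item></list>"

# DOM: build the whole tree in memory, then query it.
dom_count = len(minidom.parseString(XML).getElementsByTagName("item"))


# SAX: keep only a counter while the parser streams past the markup.
class ItemCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


counter = ItemCounter()
xml.sax.parseString(XML.encode("utf-8"), counter)

print(dom_count, counter.count)
# prints: 3 3
```

The DOM version is shorter and leaves a tree that could be modified afterwards; the SAX version holds only an integer regardless of document size, which is the trade-off the comparison above describes.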


DOM vs. SAX