
Document Object Model (DOM). DOM is an in-memory tree representation of the structure of an XML document.

Simple API for XML (SAX). SAX is a standard for event-based XML parsing.

Java API for XML Processing (JAXP). JAXP is a standard interface for processing XML with Java applications. It supports the DOM and SAX standards.

Document Type Definition (DTD). An XML DTD defines the legal structure of an XML document.

XML Schema. Like a DTD, an XML schema defines the legal structure of an XML document.

XML Namespaces. Namespaces are a mechanism for differentiating element and attribute names.

Binary XML. Both scalable and nonscalable DOMs can save XML documents in this format.

XML Parsing in Java

XMLParser is the abstract base class for the XML parser for Java. An instantiated parser invokes the parse() method to read an XML document.

XMLDOMImplementation factory methods provide another method to parse Binary XML to create scalable DOM. Figure 4-1 illustrates the basic parsing process, using XMLParser. The diagram does not apply to XMLDOMImplementation().

Figure 4-1 The XML Parser Process

The following APIs provide a Java application with access to a parsed XML document:

● DOM API, which parses XML documents and builds a tree representation of the documents in memory. Use either a DOMParser object to parse with DOM or the XMLDOMImplementation interface factory methods to create a pluggable, scalable DOM.

● SAX API, which processes an XML document as a stream of events, which means that a program cannot access random locations in a document. Use a SAXParser object to parse with SAX.

● JAXP, which is a Java-specific API that supports DOM, SAX, and XSL. Use a DocumentBuilder or SAXParser object to parse with JAXP.

The sample XML document in Example 4-1 helps illustrate the differences among DOM, SAX, and JAXP.

Example 4-1 Sample XML Document

<?xml version="1.0"?>
<EMPLIST>
  <EMP>
    <ENAME>MARY</ENAME>
  </EMP>
  <EMP>
    <ENAME>SCOTT</ENAME>
  </EMP>
</EMPLIST>
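As a rough illustration of the JAXP route listed above, the following sketch parses Example 4-1 with a DocumentBuilder and collects the ENAME values. The class name EmpListDomSketch and the helper employeeNames are made up for this example, and the document is inlined as a string so that no external file is assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EmpListDomSketch {

    // Example 4-1, inlined so the sketch needs no external file.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<EMPLIST><EMP><ENAME>MARY</ENAME></EMP>"
        + "<EMP><ENAME>SCOTT</ENAME></EMP></EMPLIST>";

    // Hypothetical helper: returns the text of every ENAME element in document order.
    static List<String> employeeNames(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The whole document is read into an in-memory tree here.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList names = doc.getElementsByTagName("ENAME");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < names.getLength(); i++) {
            result.add(names.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(employeeNames(XML)); // prints [MARY, SCOTT]
    }
}
```

Because the tree stays in memory after parsing, the same Document could be queried again without re-reading the input.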

DOM in XML Parsing

DOM builds an in-memory tree representation of the XML document. For example, the DOM API receives the document described in Example 4-1 and creates an in-memory tree as shown in Figure 4-2. DOM provides classes and methods to navigate and process the tree. In general, the DOM API provides the following advantages:

● DOM API is easier to use than SAX because it provides a familiar tree structure of objects.

● Structural manipulations of the XML tree, such as re-ordering elements, adding to and deleting elements and attributes, and renaming elements, can be performed.

● Interactive applications can store the object model in memory, enabling users to access and manipulate it.

● DOM as a standard does not support XPath. However, most XPath implementations use DOM. The Oracle XDK includes DOM API extensions to support XPath.

● A pluggable, scalable DOM can be created that considerably improves scalability and efficiency.
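The structural manipulations mentioned in the list above can be sketched with the standard org.w3c.dom API that ships with the JDK (not the Oracle XDK classes). The class name DomEditSketch, the helper editAndDescribe, and the new names ADAMS and STAFF are invented for illustration; the input is Example 4-1 written without whitespace between elements so that first and last children are elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomEditSketch {

    // Example 4-1 without inter-element whitespace (an assumption of this sketch).
    static final String XML =
        "<EMPLIST><EMP><ENAME>MARY</ENAME></EMP>"
        + "<EMP><ENAME>SCOTT</ENAME></EMP></EMPLIST>";

    // Hypothetical helper: edits the tree and returns "<root name> <list of ENAME values>".
    static String editAndDescribe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // Add: build <EMP><ENAME>ADAMS</ENAME></EMP> and append it to the root.
        Element emp = doc.createElement("EMP");
        Element name = doc.createElement("ENAME");
        name.setTextContent("ADAMS");
        emp.appendChild(name);
        root.appendChild(emp);

        // Re-order: insertBefore moves the existing node to the front.
        root.insertBefore(emp, root.getFirstChild());

        // Delete: drop the last <EMP> (SCOTT).
        root.removeChild(root.getLastChild());

        // Rename: DOM Level 3 renameNode, no namespace.
        doc.renameNode(root, null, "STAFF");

        List<String> names = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("ENAME");
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent());
        }
        return doc.getDocumentElement().getNodeName() + " " + names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(editAndDescribe(XML)); // prints STAFF [ADAMS, MARY]
    }
}
```

None of these edits is possible with a purely event-based API, since there is no tree to modify.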

DOM Creation

In Java XDK, there are three ways to create a DOM:

● Parse a document using DOMParser. This has been the traditional XDK approach.

● Create a scalable DOM using XMLDOMImplementation factory methods.

● Use an XMLDocument constructor. This is not a common solution in XDK.

Scalable DOM

With Oracle 11g Release 1 (11.1), XDK provides scalable, pluggable support for DOM. This relieves problems of memory inefficiency, limited scalability, and lack of control over the DOM configuration. For the scalable DOM, the configuration and creation are mainly supported using the XMLDOMImplementation class.

These are important aspects of scalable DOM:

● Plug-in Data allows external XML representation to be directly used by Scalable DOM without replicating XML in internal representation.

● Scalable DOM is created on top of plug-in XML data through the Reader and InfosetWriter abstract interfaces. XML data can be in different forms, such as Binary XML, XMLType, third-party DOM, and so on.

● Transient nodes. DOM nodes are created lazily and may be freed if not in use.

● Binary XML. The scalable DOM can use binary XML as both input and output format. Scalable DOM can interact with the data in two ways:

○ Through the abstract InfosetReader and InfosetWriter interfaces. Users can (1) use the BinXML implementation of InfosetReader and InfosetWriter to read and write BinXML data, and (2) use other implementations supplied by the user to read and write in other forms of XML infoset.

○ Through an implementation of the InfosetReader and InfosetWriter adaptor for BinXMLStream.

SAX in the XML Parser

Unlike DOM, SAX is event-based, so it does not build in-memory tree representations of input documents. SAX processes the input document element by element and can report events and significant data to callback methods in the application. The XML document in Example 4-1 is parsed as a series of linear events as shown in Figure 4-2. In general, the SAX API provides the following advantages:

● It is useful for search operations and other programs that do not need to manipulate an XML tree.

● It does not consume significant memory resources.

● It is faster than DOM when retrieving XML documents from a database.
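To make the "series of linear events" concrete, here is a minimal sketch that feeds Example 4-1 to the JDK's SAX parser and records each callback in order. The class name SaxTraceSketch and the helper trace are hypothetical; whitespace-only text is skipped to keep the trace short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTraceSketch {

    // Example 4-1, inlined so the sketch needs no external file.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<EMPLIST><EMP><ENAME>MARY</ENAME></EMP>"
        + "<EMP><ENAME>SCOTT</ENAME></EMP></EMPLIST>";

    // Hypothetical helper: returns the events in the order the parser reports them.
    static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes attributes) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text " + text);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        // Prints one event per line, from "start EMPLIST" to "end EMPLIST".
        for (String event : trace(XML)) {
            System.out.println(event);
        }
    }
}
```

The handler only ever sees the current event, which is why no tree is kept in memory and why random access to earlier parts of the document is not available.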

Figure 4-2 Comparing DOM (Tree-Based) and SAX (Event-Based) APIs

JAXP in the XML Parser

The JAXP API enables you to plug in an implementation of the SAX or DOM parser. The SAX and DOM APIs provided in the Oracle XDK are examples of vendor-specific implementations supported by JAXP. In general, the advantage of JAXP is that you can use it to write interoperable applications. If an application uses features available through JAXP, then it can very easily switch the implementation.

The main disadvantage of JAXP is that it runs more slowly than vendor-specific APIs. In addition, several features are available through Oracle-specific APIs that are not available through JAXP APIs. Only some of the Oracle-specific features are available through the extension mechanism provided in JAXP. If an application uses these extensions, however, then the flexibility of switching implementation is lost.
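The pluggability described above can be observed with a small sketch: application code asks the JAXP factories for a parser and never names a vendor class. The class name JaxpProviderSketch and its two helpers are hypothetical. Which implementation classes are printed depends on the JDK and on what is on the classpath, so no particular output is claimed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

public class JaxpProviderSketch {

    // Name of the DOM factory implementation that JAXP resolved at run time.
    static String domProvider() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Name of the SAX factory implementation that JAXP resolved at run time.
    static String saxProvider() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // A different provider can be selected without code changes, for example
        // by setting the system property javax.xml.parsers.DocumentBuilderFactory
        // to the factory class of another implementation on the classpath.
        System.out.println("DOM factory: " + domProvider());
        System.out.println("SAX factory: " + saxProvider());
    }
}
```

Code that casts the returned objects to vendor-specific types, or relies on vendor-only features, gives up exactly this ability to switch.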

The sample XML considered in the examples is:

<employees>
  <employee id="111">
    <firstName>Rakesh</firstName>
    <lastName>Mishra</lastName>
    <location>Bangalore</location>
  </employee>
  <employee id="112">
    <firstName>John</firstName>
    <lastName>Davis</lastName>
    <location>Chennai</location>
  </employee>
  <employee id="113">
    <firstName>Rajesh</firstName>
    <lastName>Sharma</lastName>
    <location>Pune</location>
  </employee>
</employees>

And the object into which the XML content is to be extracted is defined as below:

class Employee {
    String id;
    String firstName;
    String lastName;
    String location;

    @Override
    public String toString() {
        return firstName + " " + lastName + "(" + id + ")" + location;
    }
}

There are 3 main parsers for which I have given sample code:

Using DOM Parser

I am using the DOM parser implementation that comes with the JDK; my example uses JDK 7. The DOM parser loads the complete XML content into a tree structure, and we iterate through the Node and NodeList objects to get the content of the XML. The code for XML parsing using the DOM parser is given below.

import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DOMParserDemo {

    public static void main(String[] args) throws Exception {
        // Get the DOM builder factory
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Get the DOM builder
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Load and parse the XML document;
        // document contains the complete XML as a tree.
        Document document = builder.parse(
            ClassLoader.getSystemResourceAsStream("xml/employee.xml"));

        List<Employee> empList = new ArrayList<>();

        // Iterate through the nodes and extract the data.
        NodeList nodeList = document.getDocumentElement().getChildNodes();
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node node = nodeList.item(i);
            // Whitespace between tags shows up as text nodes;
            // an Element here means we have encountered an <employee> tag.
            if (node instanceof Element) {
                Employee emp = new Employee();
                emp.id = node.getAttributes().getNamedItem("id").getNodeValue();

                NodeList childNodes = node.getChildNodes();
                for (int j = 0; j < childNodes.getLength(); j++) {
                    Node cNode = childNodes.item(j);
                    // Identify the child tag of employee encountered.
                    if (cNode instanceof Element) {
                        // Reading the text from the element itself avoids a
                        // NullPointerException when the element is empty.
                        String content = cNode.getTextContent().trim();
                        switch (cNode.getNodeName()) {
                            case "firstName":
                                emp.firstName = content;
                                break;
                            case "lastName":
                                emp.lastName = content;
                                break;
                            case "location":
                                emp.location = content;
                                break;
                        }
                    }
                }
                empList.add(emp);
            }
        }

        // Print the populated Employee list.
        for (Employee emp : empList) {
            System.out.println(emp);
        }
    }
}

class Employee {
    String id;
    String firstName;
    String lastName;
    String location;

    @Override
    public String toString() {
        return firstName + " " + lastName + "(" + id + ")" + location;
    }
}

The output for the above will be:

Rakesh Mishra(111)Bangalore
John Davis(112)Chennai
Rajesh Sharma(113)Pune
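The earlier remark that most XPath implementations work on top of DOM can be tried out with the javax.xml.xpath package from the JDK. The following is only a sketch: the class name XPathOverDomSketch and the helper evaluate are made up, and a shortened copy of the employee document is inlined instead of being read from xml/employee.xml.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathOverDomSketch {

    // Two of the sample employees, inlined for the sketch.
    static final String XML =
        "<employees>"
        + "<employee id=\"111\"><firstName>Rakesh</firstName>"
        + "<lastName>Mishra</lastName><location>Bangalore</location></employee>"
        + "<employee id=\"112\"><firstName>John</firstName>"
        + "<lastName>Davis</lastName><location>Chennai</location></employee>"
        + "</employees>";

    // Hypothetical helper: builds a DOM tree, then evaluates the expression
    // against it and returns the string value of the result.
    static String evaluate(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // prints John
        System.out.println(evaluate(XML, "/employees/employee[@id='112']/firstName"));
        // prints Chennai
        System.out.println(evaluate(XML, "/employees/employee[@id='112']/location"));
    }
}
```

Compared with the nested loops in DOMParserDemo, the expression states which node is wanted and leaves the tree walk to the XPath engine.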

Using SAX Parser

The SAX parser differs from the DOM parser in that it doesn't load the complete XML into memory. Instead, it reads the XML sequentially, triggering events as it encounters different constructs such as an opening tag, a closing tag, character data, comments and so on. This is why the SAX parser is called an event-based parser. Along with the XML source file, we also register a handler which extends the DefaultHandler class. The DefaultHandler class provides different callbacks, out of which we would be interested in:

startElement() – triggered when the start of a tag is encountered.

endElement() – triggered when the end of a tag is encountered.

characters() – triggered when text data is encountered.

The code for parsing the XML using SAX Parser is given below:

import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParserDemo {

    public static void main(String[] args) throws Exception {
        SAXParserFactory parserFactor = SAXParserFactory.newInstance();
        SAXParser parser = parserFactor.newSAXParser();
        SAXHandler handler = new SAXHandler();
        parser.parse(ClassLoader.getSystemResourceAsStream("xml/employee.xml"),
            handler);

        // Printing the list of employees obtained from XML
        for (Employee emp : handler.empList) {
            System.out.println(emp);
        }
    }
}

/**
 * The Handler for SAX Events.
 */
class SAXHandler extends DefaultHandler {

    List<Employee> empList = new ArrayList<>();
    Employee emp = null;
    // Text of the current element. The parser may deliver one text node
    // in several characters() calls, so the chunks are accumulated.
    StringBuilder content = new StringBuilder();

    // Triggered when the start of a tag is found.
    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes attributes) throws SAXException {
        content.setLength(0);
        switch (qName) {
            // Create a new Employee object when the start tag is found
            case "employee":
                emp = new Employee();
                emp.id = attributes.getValue("id");
                break;
        }
    }

    @Override
    public void endElement(String uri, String localName,
            String qName) throws SAXException {
        String text = content.toString().trim();
        switch (qName) {
            // Add the employee to list once end tag is found
            case "employee":
                empList.add(emp);
                break;
            // For all other end tags the employee has to be updated.
            case "firstName":
                emp.firstName = text;
                break;
            case "lastName":
                emp.lastName = text;
                break;
            case "location":
                emp.location = text;
                break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        content.append(ch, start, length);
    }
}

class Employee {
    String id;
    String firstName;
    String lastName;
    String location;

    @Override
    public String toString() {
        return firstName + " " + lastName + "(" + id + ")" + location;
    }
}

The output for the above would be:

Rakesh Mishra(111)Bangalore
John Davis(112)Chennai
Rajesh Sharma(113)Pune

Using StAX Parser

StAX stands for Streaming API for XML. Like the SAX parser, the StAX parser differs from DOM in that it does not load the whole document into memory. The StAX parser also differs from the SAX parser in a subtle way:

● The SAX parser pushes the data, but the StAX parser pulls the required data from the XML.

● The StAX parser maintains a cursor at the current position in the document and lets the application extract the content available at the cursor, whereas the SAX parser issues events as and when certain data is encountered.
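The pull model can be illustrated with a short sketch in which the application advances the cursor itself and stops as soon as it has what it needs. The class name StaxPullSketch and the helper firstLocation are hypothetical, and a shortened copy of the employee document is inlined rather than loaded from a file.

```java
import java.io.StringReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullSketch {

    // Two of the sample employees, inlined for the sketch.
    static final String XML =
        "<employees>"
        + "<employee id=\"111\"><firstName>Rakesh</firstName>"
        + "<lastName>Mishra</lastName><location>Bangalore</location></employee>"
        + "<employee id=\"112\"><firstName>John</firstName>"
        + "<lastName>Davis</lastName><location>Chennai</location></employee>"
        + "</employees>";

    // Hypothetical helper: returns the text of the first <location> element,
    // or null if there is none. The rest of the document is never read.
    static String firstLocation(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // The application asks for the next event; nothing is pushed to it.
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "location".equals(reader.getLocalName())) {
                    // Reads the text content at the cursor.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(firstLocation(XML)); // prints Bangalore
    }
}
```

With SAX the parser keeps calling the handler until the document ends unless the handler aborts by throwing an exception; here the loop simply returns.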

XMLInputFactory and XMLStreamReader are the two types that can be used to load an XML file. As we read through the XML file using XMLStreamReader, events are generated in the form of integer values, and these are then compared with the constants in XMLStreamConstants. The code below shows how to parse XML using the StAX parser:

import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxParserDemo {

    public static void main(String[] args) throws XMLStreamException {
        List<Employee> empList = null;
        Employee currEmp = null;
        String tagContent = null;
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(
            ClassLoader.getSystemResourceAsStream("xml/employee.xml"));

        while (reader.hasNext()) {
            int event = reader.next();

            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    if ("employee".equals(reader.getLocalName())) {
                        currEmp = new Employee();
                        currEmp.id = reader.getAttributeValue(0);
                    }
                    if ("employees".equals(reader.getLocalName())) {
                        empList = new ArrayList<>();
                    }
                    break;

                case XMLStreamConstants.CHARACTERS:
                    tagContent = reader.getText().trim();
                    break;

                case XMLStreamConstants.END_ELEMENT:
                    switch (reader.getLocalName()) {
                        case "employee":
                            empList.add(currEmp);
                            break;
                        case "firstName":
                            currEmp.firstName = tagContent;
                            break;
                        case "lastName":
                            currEmp.lastName = tagContent;
                            break;
                        case "location":
                            currEmp.location = tagContent;
                            break;
                    }
                    break;

                case XMLStreamConstants.START_DOCUMENT:
                    empList = new ArrayList<>();
                    break;
            }
        }

        // Print the employee list populated from XML
        for (Employee emp : empList) {
            System.out.println(emp);
        }
    }
}

class Employee {
    String id;
    String firstName;
    String lastName;
    String location;

    @Override
    public String toString() {
        return firstName + " " + lastName + "(" + id + ") " + location;
    }
}

The output for the above is:

Rakesh Mishra(111) Bangalore
John Davis(112) Chennai
Rajesh Sharma(113) Pune