Version: XML/Handout/0307/1.0
Date: 11-03-08
Cognizant
500 Glen Pointe Center West
Teaneck, NJ 07666
Ph: 201-801-0233
www.cognizant.com
XML - Handout
TABLE OF CONTENTS
Introduction
    About this Module
    Target Audience
    Module Objectives
    Pre-requisite
Session 02: DTD
    Learning Objectives
    Introduction
    Syntax
    Elements and Attributes
    Entity References
    Well-formed and valid XML documents
    Well-formed documents: XML syntax
    Valid documents: XML semantics
    Try It Out
    Summary
    Test Your Understanding
    Exercises
Session 04: Schema
    Learning Objectives
    Introduction
    Simple Types
    Defining a Simple Element
    What is a Complex Element?
    Examples of Complex Elements
    How to Define a Complex Element
    Data Types
    Name Conflicts
    Summary
    Test Your Understanding
    Exercises
Session 06: SAX
Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
    Learning Objectives
    Introduction
    Handling Events
    Summary
    Test Your Understanding
    Exercises
Session 08: DOM
    Learning Objectives
    DOM API
    DOM Tree Navigation
    Getting DOM Tree
    Summary
    Test Your Understanding
    Exercises
Session 10: JAXP
    Learning Objectives
    Introduction
    Summary
    Test Your Understanding
Session 12: XPath
    Learning Objectives
    XPath
    What is XPath?
    XPath Nodes
    Relationship of Nodes
    Selecting Nodes
    Predicates
    Selecting Several Paths
    XPath Operators
    Summary
    Test Your Understanding
    Exercises
Session 14: XQuery
    Learning Objectives
    XQuery
    What is XQuery?
    Relationship of Nodes
    XQuery Comparisons
    Selecting and Filtering Elements
    Summary
    Test Your Understanding
    Exercises
Session 16: XSLT
    Learning Objectives
    XSL
    What is XSLT?
    The <xsl:template> element
    The <xsl:value-of> element
    The <xsl:for-each> element
    The <xsl:choose> element
    Summary
    Test Your Understanding
    Exercises
Glossary
References
    Websites
    Books
STUDENT NOTES:
Introduction
About this Module
This module provides a handout on the following topics:
An introduction to XML
Target Audience
This module is designed for entry-level trainees.
Module Objectives
After completing this module, you will be able to:
Pre-requisite
Participants taking this course should be familiar with HTML and JavaScript.
Learning Objectives
After completing this session, you will be able to:
Describe XML.
Introduction
What is XML?
XML does not predefine any tags. You must define your own tags.
Syntax
XML elements are defined using XML tags. XML tags are case sensitive. With XML, the tag
<Letter> is different from the tag <letter>. Opening and closing tags must be written with the
same case:
<Message>This is incorrect</message>
<message>This is correct</message>
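XML elements must also be properly nested. "Properly nested" simply means that, because the <i>
element is opened inside the <b> element, it must also be closed inside the <b> element. For
example, <b><i>text</i></b> is properly nested, whereas <b><i>text</b></i> is not.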
XML documents must contain one element that is the parent of all other elements. This element is
called the root element.
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
In this example, Email is called an element. The Email element has three attributes: to, from,
and subject.
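The Email example referred to above is not reproduced in this handout, so the following is only a minimal sketch under assumptions: the sample document, its attribute values, and the class name EmailAttributes are hypothetical. It shows how a program could read the to, from, and subject attributes of such an element with the standard Java DOM API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class EmailAttributes {
    // Hypothetical sample document; the values are made up for illustration.
    static final String XML =
        "<Email to='alice@example.com' from='bob@example.com' subject='Hello'>"
        + "See you at noon.</Email>";

    /** Returns the value of the named attribute on the root element. */
    static String rootAttribute(String xml, String attribute) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return root.getAttribute(attribute);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"to", "from", "subject"}) {
            System.out.println(name + " = " + rootAttribute(XML, name));
        }
    }
}
```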
Follow these rules and guidelines when naming XML elements:
Names must not start with the letters xml (or XML, Xml, and so on).
Any name can be used with no words being reserved, but the idea is to make names descriptive.
Names with an underscore separator are nice.
Examples: <author_name> , <published_date> .
Avoid "-" and "." in names. It could be a mess if your software tried to subtract name from author
(author-name) or treated "name" as a property of the object "author" (author.name).
Element names can be as long as you like, but do not exaggerate. Names should be short and
simple, like <author_name> and not like <name_of_the_author> .
XML documents often have a parallel database, where field names are parallel with element
names. A good rule is to use the naming rules of your databases for easy explanation and
correlation.
Non-English letters (for example, é or ñ) are perfectly legal in XML element names, but watch out
for problems if your software vendor does not support them.
The ":" should not be used in element names because it is reserved for namespaces.
Empty Tags: If an element has no content and no sub tags, you can open and close it in a single
tag by placing a "/" before the closing ">". For example, declaring:
<Text></Text>
is the same as declaring:
<Text />
Entity References
Some characters have a special meaning in XML.
If you place a character like "<" inside an XML element, then it will generate an error because the
parser interprets it as the start of a new element.
This will generate an XML error:
<message>if salary < 1000 then</message>
To avoid this error, replace the "<" character with an entity reference:
<message>if salary &lt; 1000 then</message>
There are five predefined entity references in XML:
&lt;     <     less than
&gt;     >     greater than
&amp;    &     ampersand
&apos;   '     apostrophe
&quot;   "     quotation mark
Note: Only the characters "<" and "&" are strictly illegal in XML. The greater than character is
legal, but it is a good habit to replace it.
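As a rough illustration of the point above, the following sketch feeds both versions of the salary message to the standard Java DOM parser. The class and method names are hypothetical; the parser calls are from the JDK's javax.xml.parsers package.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EntityReferenceDemo {
    /** Returns the root element's text, or null if the XML is not well-formed. */
    static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException notWellFormed) {
            return null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The raw "<" is read as the start of a tag, so parsing fails: prints null
        System.out.println(parseText("<message>if salary < 1000 then</message>"));
        // The entity reference is replaced by "<" in the parsed text
        System.out.println(parseText("<message>if salary &lt; 1000 then</message>"));
    }
}
```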
Well-formed: A well-formed document conforms to all the syntax rules of XML. For
example, if a start-tag appears without a corresponding end-tag, then the document is not
well-formed. A document that is not well-formed is not considered to be an XML document,
and a conforming parser is not allowed to process it.
Valid: A valid document additionally conforms to some semantic rules. These rules
are either defined by the user or included as an XML schema or DTD. For example, if a
document contains an undefined element, then it is not valid, and a validating parser
must report the violation.
<step>Cover with a cloth, and leave for one hour in warm
room.</step>
<step>Knead again.</step>
<step>Place in a bread baking tin.</step>
<step>Cover with a cloth, and leave for one hour in warm
room.</step>
<step>Bake in the oven at 350F for 30 minutes.</step>
</instructions>
</recipe>
Attribute values must always be quoted, using single or double quotes, and each attribute name
may appear only once in any element.
XML requires that elements be properly nested; that is, elements may never overlap.
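The rules above can be checked mechanically. The following is a minimal sketch, with a hypothetical class name and made-up sample strings, that uses the JDK's SAX parser to test whether a string is well-formed XML.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns true if the parser accepts the string as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException notWellFormed) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<b><i>text</i></b>",      // properly nested
            "<b><i>text</b></i>",      // overlapping elements
            "<a x=1/>",                // unquoted attribute value
            "<a x='1' x='2'/>",        // attribute name repeated
            "<a x='1' y=\"2\"/>"       // single and double quotes are both fine
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```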
The regular structure and strict parsing rules of XML allow software designers to leave parsing to
standard tools. Because XML provides a general, data model-oriented framework for the
development of application-specific languages, software designers need only concentrate on the
development of rules for their data, at relatively high levels of abstraction.
Well-tested tools exist to validate an XML document against a schema. Such a tool
automatically verifies whether the document conforms to the constraints expressed in the schema.
Some of these validation tools are included in XML parsers, and some are packaged separately.
Schemas have other uses as well. XML editors, for instance, can use schemas to support the
editing process (by suggesting valid element and attribute names, and so on).
DTD: The oldest schema format for XML is the Document Type Definition (DTD), inherited from
SGML. While DTD support is ubiquitous due to its inclusion in the XML 1.0 standard, it is seen as
limited for the following reasons:
It has no support for newer features of XML, most importantly namespaces.
It describes the schema in a custom, non-XML syntax inherited from SGML.
DTD is still used in many applications because it is considered the easiest to read and write.
Try It Out
Problem Statement:
Write a DTD for an anthology consisting of poems, their titles, and the stanzas and lines of which
they are composed.
The XML file is as follows:
XML Code:
<anthology>
<poem><title>The SICK ROSE</title>
<stanza>
<line>O Rose thou art sick.</line>
<line>The invisible worm,</line>
<line>That flies in the night</line>
<line>In the howling storm:</line>
</stanza>
<stanza>
<line>Has found out thy bed</line>
<line>Of crimson joy:</line>
<line>And his dark secret love</line>
<line>Does thy life destroy.</line>
</stanza>
</poem>
<!-- more poems go here
-->
</anthology>
DTD Code:
<!ELEMENT anthology (poem+)>
<!ELEMENT poem      (title?, stanza+)>
<!ELEMENT title     (#PCDATA)>
<!ELEMENT stanza    (line+)>
<!ELEMENT line      (#PCDATA)>
How It Works:
The DTD specifies (poem+) for the anthology element, where the + indicates that the
anthology element must contain one or more poem elements.
The DTD specifies (title?, stanza+) for the poem element, where the ? indicates that the
title is optional and the + indicates that the poem element must contain one or more
stanza elements.
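One way to try the DTD out is to place it in an internal DOCTYPE subset and let a validating parser check a document against it. The sketch below assumes that approach; the class name, the shortened sample poem, and the choice to collect error messages in a list are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class AnthologyDtdValidation {
    // The anthology DTD, embedded as an internal subset.
    static final String DOCTYPE =
        "<!DOCTYPE anthology [\n"
        + "<!ELEMENT anthology (poem+)>\n"
        + "<!ELEMENT poem (title?, stanza+)>\n"
        + "<!ELEMENT title (#PCDATA)>\n"
        + "<!ELEMENT stanza (line+)>\n"
        + "<!ELEMENT line (#PCDATA)>\n"
        + "]>\n";

    // Shortened sample: one poem, one stanza, one line.
    static final String VALID =
        "<anthology><poem><title>The SICK ROSE</title>"
        + "<stanza><line>O Rose thou art sick.</line></stanza>"
        + "</poem></anthology>";

    /** Parses DOCTYPE + body with validation on and returns the validity errors. */
    static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity errors are recoverable
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("valid sample errors: " + validate(VALID));
        // An anthology without any poem violates (poem+).
        System.out.println("empty anthology errors: " + validate("<anthology></anthology>"));
    }
}
```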
Summary
This session demonstrates how to write a DTD for a given XML document. It also demonstrates
how elements and attributes are used in an XML document.
Exercises
Write a DTD for an XML file with library as the root element and books, author, date of publishing,
and edition as elements.
Define schema
Introduction
XML Schema is an XML-based alternative to DTD. It describes the structure of an XML
document. The XML Schema language is also referred to as XML Schema Definition (XSD).
The purpose of an XML schema is to define the legal building blocks of an XML document, just like
a DTD. An XML schema:
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The note element is a complex type because it contains other elements. The other elements (to,
from, heading, and body) are simple types because they do not contain other elements. You will
learn more about simple and complex types in the following sections.
The following fragment:
xmlns:xs="http://www.w3.org/2001/XMLSchema"
indicates that the elements and data types used in the schema come from the
"http://www.w3.org/2001/XMLSchema" namespace. It also specifies that the elements and data
types that come from the "http://www.w3.org/2001/XMLSchema" namespace should be prefixed
with xs:.
This fragment:
targetNamespace="http://www.w3schools.com"
indicates that the elements defined by this schema (note, to, from, heading, and body) come from
the "http://www.w3schools.com" namespace.
This fragment:
xmlns="http://www.w3schools.com"
indicates that the default namespace is "http://www.w3schools.com".
This fragment:
elementFormDefault="qualified"
indicates that any elements used by the XML instance document that were declared in this
schema must be namespace qualified.
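To see the effect of targetNamespace and elementFormDefault, a document can be validated against the note schema with the JDK's javax.xml.validation API. This is only a sketch: the opening lines of the schema are not reproduced in this handout, so the schema string below is assembled from the four attributes just described and should be treated as an assumption, as should the class name and the sample note.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class NoteSchemaValidation {
    // Assumed reconstruction of the note schema from the fragments discussed above.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.w3schools.com'"
        + " xmlns='http://www.w3schools.com'"
        + " elementFormDefault='qualified'>"
        + "<xs:element name='note'><xs:complexType><xs:sequence>"
        + "<xs:element name='to' type='xs:string'/>"
        + "<xs:element name='from' type='xs:string'/>"
        + "<xs:element name='heading' type='xs:string'/>"
        + "<xs:element name='body' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static final String CHILDREN =
        "<to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Call me.</body>";

    /** Returns true if the document is valid against the note schema. */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException invalid) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Elements in the target namespace are accepted.
        System.out.println(isValid(
            "<note xmlns='http://www.w3schools.com'>" + CHILDREN + "</note>"));
        // The same elements without a namespace are not declared by this schema.
        System.out.println(isValid("<note>" + CHILDREN + "</note>"));
    }
}
```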
Simple Types:
A simple element is an XML element that can contain only text. It cannot contain any other
elements or attributes. The text can be of many different types. It can be one of the types included
in the XML Schema definition (boolean, string, date, and so on), or it can be a custom type
that you can define yourself. You can also add restrictions (facets) to a data type in order to limit its
content, or you can require the data to match a specific pattern.
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
Example:
Here are some XML elements:
<lastname>Refsnes</lastname>
<age>36</age>
<dateborn>1970-03-27</dateborn>
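The sketch below shows how built-in simple types constrain the text of the three elements above. The schema is an assumption written for this illustration: it declares lastname as xs:string, age as xs:integer, and dateborn as xs:date, and wraps them in a hypothetical person element so that one document can be validated.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SimpleTypeValidation {
    // Hypothetical schema: "person" is only a wrapper for the three simple elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='person'><xs:complexType><xs:sequence>"
        + "<xs:element name='lastname' type='xs:string'/>"
        + "<xs:element name='age' type='xs:integer'/>"
        + "<xs:element name='dateborn' type='xs:date'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Returns the first validation error message, or null if the document is valid. */
    static String firstError(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return null;
            } catch (SAXException invalid) {
                return invalid.getMessage();
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<person><lastname>Refsnes</lastname><age>36</age>"
                + "<dateborn>1970-03-27</dateborn></person>";
        String bad = good.replace("<age>36</age>", "<age>thirty-six</age>");
        System.out.println("good: " + firstError(good));
        System.out.println("bad:  " + firstError(bad)); // age is not an xs:integer
    }
}
```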
Empty elements
A complex XML element, "employee", which contains only other elements:
<employee>
<firstname>John</firstname>
<lastname>Smith</lastname>
</employee>
A complex XML element, "description", which contains both elements and text:
<description>
It happened on <date lang="norwegian">03.03.99</date> ....
</description>
If you use the preceding method, then several elements can refer to the same complex type, like
this:
<xs:element name="employee" type="personinfo"/>
<xs:element name="student" type="personinfo"/>
<xs:element name="member" type="personinfo"/>
<xs:complexType name="personinfo">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
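The reuse of one named complex type by several elements can be exercised as follows. This is a sketch only: the declarations above are wrapped in an xs:schema element without a target namespace, which is an assumption, and the class name and sample documents are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SharedComplexType {
    // The three element declarations share the single "personinfo" type.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='employee' type='personinfo'/>"
        + "<xs:element name='student' type='personinfo'/>"
        + "<xs:element name='member' type='personinfo'/>"
        + "<xs:complexType name='personinfo'><xs:sequence>"
        + "<xs:element name='firstname' type='xs:string'/>"
        + "<xs:element name='lastname' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:schema>";

    /** Returns true if the document is valid against the schema above. */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException invalid) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String names = "<firstname>John</firstname><lastname>Smith</lastname>";
        // Any of the three root elements accepts the same content.
        System.out.println(isValid("<employee>" + names + "</employee>"));
        System.out.println(isValid("<student>" + names + "</student>"));
        // A member without a firstname breaks the shared sequence.
        System.out.println(isValid("<member><lastname>Smith</lastname></member>"));
    }
}
```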
Data Types
String data types are used for values that contain character strings. The string data type can
contain characters, line feeds, carriage returns, and tab characters.
<xs:element name="customer" type="xs:string"/>
Name Conflicts
In XML, element names are defined by the developer. This often results in a conflict when trying to
mix XML documents from different XML applications. Name conflicts can be avoided by using
namespaces with name prefixes. A namespace is defined by the xmlns attribute in the start tag
of an element.
<root>
<h:table xmlns:h="http://www.w3.org/TR/html4/">
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table xmlns:f="http://www.w3schools.com/furniture">
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
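A namespace-aware parser keeps the two table elements apart even though they share a local name. The following sketch, with a hypothetical class name, counts the table elements per namespace in the document above using the standard DOM method getElementsByTagNameNS.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceLookup {
    static final String XML =
        "<root>"
        + "<h:table xmlns:h='http://www.w3.org/TR/html4/'>"
        + "<h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>"
        + "</h:table>"
        + "<f:table xmlns:f='http://www.w3schools.com/furniture'>"
        + "<f:name>African Coffee Table</f:name>"
        + "<f:width>80</f:width><f:length>120</f:length>"
        + "</f:table>"
        + "</root>";

    /** Counts elements with local name "table" in the given namespace ("*" means any). */
    static int countTables(String xml, String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(namespaceUri, "table").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countTables(XML, "http://www.w3.org/TR/html4/"));        // 1
        System.out.println(countTables(XML, "http://www.w3schools.com/furniture")); // 1
        System.out.println(countTables(XML, "*"));                                  // 2
    }
}
```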
Summary
The preceding examples illustrate the different schema types (simple and complex) and the
syntax used to define each of them.
Exercises
Write XML schema for the following XML file.
<?xml version="1.0" encoding="ISO-8859-1"?>
<shiporder orderid="889923"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="shiporder.xsd">
<orderperson>John Smith</orderperson>
<shipto>
<name>Ola Nordmann</name>
<address>Langgt 23</address>
<city>4000 Stavanger</city>
<country>Norway</country>
</shipto>
<item>
<title>Empire Burlesque</title>
<note>Special Edition</note>
<quantity>1</quantity>
<price>10.90</price>
</item>
<item>
<title>Hide your heart</title>
<quantity>1</quantity>
<price>9.90</price>
</item>
</shiporder>
Introduction
SAX stands for Simple API for XML. Whenever the SAX parser encounters a construct such as an
element in an XML document, it generates an event and passes it to the handler methods that the
application has registered with the parser, so that the application can respond to the event
appropriately. SAX does not read the entire XML document into memory. It reads only a small
chunk of the document at a time, parses it, generates events, and then reads the next chunk.
Therefore, it does not require a large amount of memory, and a SAX parser is suitable for parsing
huge XML documents. A DOM parser, by contrast, builds the entire XML document in memory, which
makes it a poor fit for very large documents. A SAX parser can only read an XML document and
retrieve its contents. Unlike a DOM parser, it cannot modify the document.
A SAX parser is chosen over a DOM parser when memory is a constraint or when the content of an
XML document only needs to be read. SAX is available in several programming languages, such as
Java, C++, Perl, and Python.
There are two major types of XML (or SGML) APIs:
Tree-based APIs: These map an XML document into an internal tree structure, and
then allow an application to navigate that tree. The Document Object Model (DOM)
working group at the World-Wide Web Consortium (W3C) maintains a
recommended tree-based API for XML and HTML documents, and there are many
such APIs from other sources.
Event-based APIs: An event-based API, on the other hand, reports parsing events
(such as the start and end of elements) directly to the application through callbacks,
and does not usually build an internal tree. The application implements handlers to
deal with the different events, much like handling events in a Graphical User
Interface. SAX is the best known example of such an API.
DocumentHandler Interface: The DocumentHandler interface defines events that occur in the
standard course of parsing a document or message. For the purposes of your implementation
within ASN.1, you are only interested in three of them:
Start Element: This event occurs when the parser moves into a new element. An
element in XML is defined to start when a <name> tag is encountered. The name of
the element is passed to the event handling callback function.
End Element: This event occurs when the parser leaves a given element space.
The end of an element in XML is signaled by a </name> tag. The name of the
element is once again passed to the event handling callback function.
Characters: This event occurs when character data (that is a value in name/value
pairs) is encountered. This event does not necessarily have to provide the entire
data value as a single event. The string can be broken up at the discretion of the
parser. It is up to the user to check for consecutive characters events and
concatenate the results to get the complete value. A pointer (or reference in Java) to
the character data along with a character count and offset is passed to the event
handler callback function.
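The advice to concatenate consecutive characters events can be sketched as follows. The example uses the SAX2 DefaultHandler class from the JDK rather than the older DocumentHandler interface named above; the class name, method name, and sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CharacterBuffering {
    /** Returns the complete text of every element named elementName. */
    static List<String> textOf(String xml, String elementName) {
        List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside the target element.
            private StringBuilder buffer;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals(elementName)) {
                    buffer = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one value, so append each piece.
                if (buffer != null) {
                    buffer.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(elementName) && buffer != null) {
                    values.add(buffer.toString());
                    buffer = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Text around an entity reference is often delivered in separate pieces.
        System.out.println(textOf("<book><title>Tom &amp; Jerry</title></book>", "title"));
    }
}
```

Collecting the pieces in a StringBuilder and reading it only in endElement means the result does not depend on how the parser chooses to split the data.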
Handling Events
The XML Handlers Module supports the following elements and attributes:
Element: action
    Attributes: event (QName), targetid (IDREF), declare ("declare"), xml:id ([XMLID])
    Content model: ( action | script | dispatchEvent | addEventListener |
    removeEventListener | stopPropagation | preventDefault )+
Element: script
    Attributes: encoding (Charset), src (URI), type (ContentTypes), xml:id ([XMLID])
    Content model: PCDATA
Element: dispatchEvent
    Attributes: raise (QName), destid (IDREF), bubbles ("bubbles"),
    cancelable ("cancelable"), xml:id ([XMLID])
    Content model: EMPTY
Element: addEventListener
    Attributes: event* (QName), handler* (IDREF), phase ("capture" | "default"*),
    xml:id ([XMLID])
    Content model: EMPTY
Element: removeEventListener
    Attributes: event* (QName), handler* (IDREF), phase ("capture" | "default"*),
    xml:id ([XMLID])
    Content model: EMPTY
Action Element: The action element is used to group event handler elements (including other
action elements) that will act in sequence as handlers for an event.
Script Element: The script element contains or references scripts that may register one or
more event handlers for a document through a scripting language that is supported by the
implementation.
Dispatch Event Element: The dispatchEvent element triggers the event identified by the
raise attribute. If the destid attribute is specified, then it names a specific element to which to
dispatch the event. Otherwise the event is just dispatched to the "document" to be handled by any
registered listener.
AddEventListener Element: This element allows the registration of a listener on a specific event.
The most important events are the start and end of the document, the start and end of elements,
and character data.
To find out about the start and end of the document, the client application implements the
startDocument() and endDocument() methods:
// Called once, before any other event.
public void startDocument ()
{
    System.out.println("Start document");
}
// Called once, after all other events.
public void endDocument ()
{
    System.out.println("End document");
}
The startDocument() and endDocument() event handlers take no arguments. When the SAX driver
finds the beginning of the document, it invokes the startDocument() method once, and when it
finds the end, it invokes the endDocument() method once.
The SAX driver will signal the start and end of elements in much the same way, except that it will
also pass some parameters to the startElement() and endElement() methods:
public void startElement (String uri, String name,
                          String qName, Attributes atts)
{
    // No namespace URI: fall back to the raw XML 1.0 name.
    if ("".equals (uri))
        System.out.println("Start element: " + qName);
    else
        System.out.println("Start element: {" + uri + "}" + name);
}
public void endElement (String uri, String name, String qName)
{
    if ("".equals (uri))
        System.out.println("End element: " + qName);
    else
        System.out.println("End element: {" + uri + "}" + name);
}
These methods print a message every time an element starts or ends, with any Namespace URI
(Uniform Resource Identifier) in braces before the element's local name. The qName contains the
raw XML 1.0 name, which you must use for all elements that do not have a namespace URI.
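The handler methods above can be combined into one runnable program. The following is a minimal sketch, assuming the JAXP SAXParserFactory bundled with the JDK; the class name SaxEventDemo, the helper eventsFor(), and the urn:example:books namespace are hypothetical names introduced only for illustration. The events are collected in a list rather than printed, so that the order of the callbacks can be inspected.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records SAX events instead of printing them.
class SaxEventDemo extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("Start document");
    }

    @Override
    public void endDocument() {
        events.add("End document");
    }

    @Override
    public void startElement(String uri, String name, String qName, Attributes atts) {
        // Elements without a namespace URI are reported by their raw name.
        events.add("".equals(uri) ? "Start element: " + qName
                                  : "Start element: {" + uri + "}" + name);
    }

    @Override
    public void endElement(String uri, String name, String qName) {
        events.add("".equals(uri) ? "End element: " + qName
                                  : "End element: {" + uri + "}" + name);
    }

    // Parses an XML string and returns the recorded events in callback order.
    static List<String> eventsFor(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so that uri and name are filled in
            SaxEventDemo handler = new SaxEventDemo();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: one element without a namespace, one with a namespace.
        String xml = "<bookstore><book xmlns='urn:example:books'/></bookstore>";
        eventsFor(xml).forEach(System.out::println);
    }
}
```

The factory is made namespace-aware here because, without that setting, the uri and name arguments are typically empty and only qName is usable.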
Summary
This session gives you an idea about SAX and event handlers.
Test Your Understanding
1. What is SAX?
2. What are the different event handlers?
Exercises
The XML file to be processed is as follows:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
DOM API
The XML DOM (Document Object Model) defines a standard way for accessing and manipulating
XML documents. It views an XML document as a tree data structure, similar to the DOM used in JavaScript.
XML File:
<?xml version="1.0" encoding="iso-8859-1"?>
<xbel>
<?processing instruction?>
<desc>No description</desc>
<folder>
<title>XML bookmarks</title>
<bookmark href="http://www.python.org/sigs/xml-sig/" >
<title>SIG for XML Processing in Python</title>
</bookmark>
</folder>
</xbel>
Fragment of the corresponding DOM tree dump (node type, node name, node value):
Element xbel None
Text #text ' \012 '
ProcessingInstruction processing 'instruction'
A DOM tree can be converted back to XML by using the Print(doc, stream) or
PrettyPrint(doc, stream) functions in the xml.dom.ext module of the Python PyXML package.
If stream is not provided, the resulting XML is printed to standard output. Print() simply
renders the DOM tree without any changes, while PrettyPrint() adds or removes whitespace in
order to nicely indent the resulting XML.
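The functions named above belong to a Python library. In Java, a comparable round trip can be sketched with the JAXP identity Transformer, which copies a DOM tree to an output stream. This is only an illustration under that assumption; the class name DomRoundTrip and its two helper methods are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parses XML into a DOM tree and writes it back out as text.
class DomRoundTrip {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // indent = false roughly corresponds to Print(), indent = true to PrettyPrint().
    static String serialize(Document doc, boolean indent) {
        try {
            // A Transformer created without a stylesheet performs an identity copy.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            identity.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize DOM", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<xbel><desc>No description</desc></xbel>");
        System.out.println(serialize(doc, false));
        System.out.println(serialize(doc, true));
    }
}
```

The exact whitespace produced with indentation enabled depends on the JDK version, so the indented form is best treated as presentation only.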
Summary
This session provides an idea about DOM Parser.
Exercises
Using a DOM parser, modify the author of the book in the CHILDREN category of the given XML file.
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
Define DocumentBuilder
Introduction:
The Java API for XML Processing, or JAXP (pronounced jaks-p), is one of the Java XML
programming APIs. It provides the capability of validating and parsing XML documents. The two
basic parsing interfaces are the DOM interface and the SAX interface.
JAXP is an API, but it is more accurately called an abstraction layer. It does not provide a new
means of parsing XML, nor does it add anything new to the SAX or DOM interfaces. Instead,
JAXP makes it easier to use DOM and SAX to deal with some difficult tasks. JAXP is a standard
component in the Java platform.
DOM Interface: The DOM interface is perhaps the easiest to describe. It parses an entire XML
document and constructs a complete in-memory representation of the document using the classes
modeling the concepts found in the Document Object Model (DOM) Level 2 Core Specification.
The DOM parser is called a DocumentBuilder, as it builds an in-memory document
representation. The javax.xml.parsers.DocumentBuilder is created by the
javax.xml.parsers.DocumentBuilderFactory. The DocumentBuilder creates an
org.w3c.dom.Document instance, which is a tree structure containing nodes in the XML document.
Each tree node in the structure implements the org.w3c.dom.Node interface. There are many
different types of tree nodes, representing the type of data found in an XML document. The most
important node types are:
Text nodes representing the text found between the start and end tags of a
document element
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
// JAXP
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
// DOM
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
}
}
private static void printNode(Node node, String indent)
{
// print the DOM tree
}
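The body of printNode() is not shown in the listing. One possible way to fill it in is sketched below, assuming a recursive walk over getChildNodes(). The class name DomTreePrinter, the extra StringBuilder parameter, and the dump() helper are hypothetical additions made so that the example is self-contained; they are not part of the original listing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: builds a DOM tree with a DocumentBuilder and prints it.
class DomTreePrinter {
    // Appends one line per node, indenting children by two more spaces.
    static void printNode(Node node, String indent, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) { // skip whitespace-only text nodes
                out.append(indent).append("#text: ").append(text).append('\n');
            }
            return;
        }
        out.append(indent).append(node.getNodeName());
        NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                out.append(" @").append(attr.getNodeName())
                   .append('=').append(attr.getNodeValue());
            }
        }
        out.append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            printNode(children.item(i), indent + "  ", out);
        }
    }

    // Parses the XML text and returns the printed tree, starting at the root element.
    static String dump(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            printNode(doc.getDocumentElement(), "", out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(dump(
            "<folder><bookmark href='u'><title>XML bookmarks</title></bookmark></folder>"));
    }
}
```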
SAX Interface: The SAX parser reports the contents of the document through callback methods on a handler:
startDocument() and endDocument() methods are called at the start and end
of an XML document.
startElement() and endElement() methods are called at the start and end of a
document element.
The characters() method is called with the text data contained between
the start and end tags of an XML document element.
Clients provide a subclass of the DefaultHandler that overrides these methods and processes
the data. This may involve storing the data into a database or writing it out to a stream.
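A DefaultHandler subclass of the kind described above can be sketched as follows. The example collects the text of every author element into a list; the class name AuthorCollector and the helper authorsIn() are hypothetical. Because a parser may deliver the text of one element in several characters() calls, the text is accumulated in a buffer until the end tag is reached.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers the text content of all <author> elements.
class AuthorCollector extends DefaultHandler {
    private final List<String> authors = new ArrayList<>();
    private StringBuilder text; // non-null only while inside an <author> element

    @Override
    public void startElement(String uri, String name, String qName, Attributes atts) {
        if ("author".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // may be called more than once per element
        }
    }

    @Override
    public void endElement(String uri, String name, String qName) {
        if ("author".equals(qName)) {
            authors.add(text.toString().trim());
            text = null;
        }
    }

    // Parses the XML text and returns the author names in document order.
    static List<String> authorsIn(String xml) {
        try {
            AuthorCollector handler = new AuthorCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.authors;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<bookstore><book><author>Erik T. Ray</author></book></bookstore>";
        System.out.println(authorsIn(xml));
    }
}
```

Instead of adding the names to a list, the endElement() method could equally write them to a database or an output stream.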
Summary
This session gives you an idea about JAXP, DocumentBuilder, and the SAX parser.
XPath
XPath is a language for finding information in an XML document. XPath is used to navigate
through elements and attributes in an XML document.
What is XPath?
XPath Nodes
In XPath, there are seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document (root) nodes. XML documents are treated as trees of nodes.
The root of the tree is called the document node.
Atomic values: Atomic values are nodes with no children or parent.
Items: Items are atomic values or nodes.
Relationship of Nodes
Selecting Nodes
XPath uses path expressions to select nodes in an XML document. The node is selected by
following a path or steps. The most useful path expressions are as follows:
Expression   Description
nodename     Selects all child elements of the current node that have the given name
//           Selects nodes in the document from the current node that match the selection, no matter where they are
..           Selects the parent of the current node
@            Selects attributes
Predicates
Predicates are used to find a specific node or a node that contains a specific value. Predicates are
always embedded in square brackets.
Examples
The following table lists some path expressions with predicates and the results of the
expressions:
Path Expression           Result
/bookstore/book[1]        Selects the first book element that is the child of the bookstore element
/bookstore/book[last()]   Selects the last book element that is the child of the bookstore element
//title[@lang]            Selects all the title elements that have an attribute named lang
//title[@lang='eng']      Selects all the title elements that have an attribute named lang with a value of 'eng'
Path Expression                   Result
//book/title | //book/price       Selects all the title and price elements of all book elements
//title | //price                 Selects all the title and price elements in the document
/bookstore/book/title | //price   Selects all the title elements of the book element in the bookstore element and all the price elements in the document
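Path expressions such as those listed above can be tried out from Java. The following is a minimal sketch, assuming the javax.xml.xpath package of the JDK; the class name XPathSelectDemo, the select() helper, and the shortened two-book sample in BOOKS are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates an XPath expression and returns the text of each selected node.
class XPathSelectDemo {
    // Shortened sample data, for illustration only.
    static final String BOOKS =
        "<bookstore>"
        + "<book><title lang='en'>Everyday Italian</title><price>30.00</price></book>"
        + "<book><title lang='en'>Learning XML</title><price>39.95</price></book>"
        + "</bookstore>";

    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(BOOKS, "/bookstore/book[1]/title"));
        System.out.println(select(BOOKS, "/bookstore/book[last()]/title"));
        System.out.println(select(BOOKS, "//title[@lang='eng']")); // empty: the sample uses 'en'
        System.out.println(select(BOOKS, "//book/title | //book/price"));
    }
}
```

Note that the predicate [@lang='eng'] matches nothing in this sample, because the lang attributes carry the value 'en'.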
XPath Axes:
An axis defines a node-set relative to the current node.
AxisName             Result
ancestor             Selects all ancestors (parent, grandparent, and so on) of the current node
ancestor-or-self     Selects all ancestors (parent, grandparent, and so on) of the current node and the current node itself
attribute            Selects all attributes of the current node
child                Selects all children of the current node
descendant           Selects all descendants (children, grandchildren, and so on) of the current node
descendant-or-self   Selects all descendants (children, grandchildren, and so on) of the current node and the current node itself
following            Selects everything in the document after the closing tag of the current node
following-sibling    Selects all siblings after the current node
namespace            Selects all namespace nodes of the current node
parent               Selects the parent of the current node
preceding            Selects everything in the document that is before the start tag of the current node
preceding-sibling    Selects all siblings before the current node
self                 Selects the current node
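The effect of an axis is easiest to see on a small document. The following sketch, assuming the javax.xml.xpath package of the JDK, returns the names of the nodes that an axis step selects. The class name XPathAxisDemo, the names() helper, and the one-book sample in BOOK are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists the names of the nodes selected by an XPath expression.
class XPathAxisDemo {
    // One-book sample, for illustration only.
    static final String BOOK =
        "<bookstore><book category='WEB'>"
        + "<title lang='en'>Learning XML</title><year>2003</year><price>39.95</price>"
        + "</book></bookstore>";

    static List<String> names(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(BOOK)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result; // the node-set is returned in document order
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names("//price/ancestor::*"));          // elements above price
        System.out.println(names("//title/following-sibling::*")); // siblings after title
        System.out.println(names("//price/preceding-sibling::*")); // siblings before price
        System.out.println(names("//book/attribute::*"));          // attributes of book
        System.out.println(names("//year/parent::*"));             // parent of year
    }
}
```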
XPath Operators:
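An XPath expression returns either a node-set, a string, a Boolean, or a number. The
operators that can be used in XPath expressions are:
Operator   Description                    Example                      Return value
|          Union of two node-sets         //book | //cd                A node-set with all book and cd elements
+          Addition                       6 + 4                        10
-          Subtraction                    6 - 4                        2
*          Multiplication                 6 * 4                        24
div        Division                       8 div 4                      2
=          Equal                          price=9.80                   true if price is 9.80, false otherwise
!=         Not equal                      price!=9.80                  true if price is not 9.80, false otherwise
<          Less than                      price<9.80                   true if price is less than 9.80, false otherwise
<=         Less than or equal to          price<=9.80                  true if price is at most 9.80, false otherwise
>          Greater than                   price>9.80                   true if price is greater than 9.80, false otherwise
>=         Greater than or equal to       price>=9.80                  true if price is at least 9.80, false otherwise
or         or                             price=9.80 or price=9.70     true if price is 9.80 or 9.70, false otherwise
and        and                            price>9.00 and price<9.90    true if price is greater than 9.00 and less than 9.90, false otherwise
mod        Modulus (division remainder)   5 mod 2                      1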
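The operator examples can be checked by evaluating them as numbers or Booleans. The sketch below assumes the javax.xml.xpath package of the JDK and a hypothetical one-element document in which price is 9.80; the class name XPathOperatorDemo and the helpers number() and bool() are introduced only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates XPath arithmetic and comparison expressions.
class XPathOperatorDemo {
    // Sample context document, for illustration only.
    static final String XML = "<book><price>9.80</price></book>";

    private static Object evaluate(String expression, javax.xml.namespace.QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Evaluates the expression and converts the result to an XPath number.
    static double number(String expression) {
        return (Double) evaluate(expression, XPathConstants.NUMBER);
    }

    // Evaluates the expression and converts the result to an XPath Boolean.
    static boolean bool(String expression) {
        return (Boolean) evaluate(expression, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) {
        System.out.println(number("6 + 4"));
        System.out.println(number("8 div 4"));
        System.out.println(number("5 mod 2"));
        System.out.println(bool("/book/price = 9.80"));
        System.out.println(bool("/book/price > 9.00 and /book/price < 9.90"));
    }
}
```

XPath numbers are floating-point values, which is why the helper returns a double even for whole-number results.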
Summary
This session provides an idea about XPath.
Exercises
From the given XML file retrieve all the child elements using path expressions.
XML File:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
XQuery
XQuery was designed to query XML data. XQuery is also known as XML Query.
What is XQuery?
XQuery is supported by major database vendors such as IBM, Oracle, and Microsoft.
XQuery is a language for finding and extracting elements and attributes from XML
documents.
XQuery Terms: In XQuery, there are seven kinds of nodes, which are element, attribute, text,
namespace, processing-instruction, comment, and document (root) nodes. XML documents are
treated as trees of nodes. The root of the tree is called the document node.
Atomic values: Atomic values are nodes with no children or parent.
Items: Items are atomic values or nodes.
Relationship of Nodes
Parent: Each element and attribute has one parent
Children: Element nodes may have zero, one or more children
Siblings: Nodes that have the same parent
Ancestors: A node's parent, the parent's parent, and so on
Descendants: A node's children, the children's children, and so on
XQuery syntax:
XQuery is case-sensitive
XQuery comments are delimited by (: and :), for example (: XQuery Comment :)
XQuery Comparisons
In XQuery there are two ways of comparing values: general comparisons (=, !=, <, <=, >, >=) and value comparisons (eq, ne, lt, le, gt, ge).
let: (optional)
The for clause: The for clause binds a variable to each item returned by the in expression. The
for clause results in iteration. There can be multiple for clauses in the same FLWOR
expression.
To loop a specific number of times in a for clause, you may use the to keyword:
for $x in (1 to 5)
return <test>{$x}</test>
The let clause: The let clause allows variable assignments and it avoids repeating the same
expression many times. The let clause does not result in iteration.
let $x := (1 to 5)
return <test>{$x}</test>
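The difference between the two clauses shows in the result: the for version yields one test element per item, while the let version yields a single test element that contains the whole sequence. The standard JDK does not include an XQuery engine, so the following is only a plain-Java analogue of the two queries above, not XQuery itself; the class name ForVersusLet and both methods are hypothetical.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical analogue of the two XQuery examples, written in plain Java.
class ForVersusLet {
    // Mirrors: for $x in (from to to) return <test>{$x}</test>
    // The variable is bound to each item in turn, so one element is produced per item.
    static String forClause(int from, int to) {
        return IntStream.rangeClosed(from, to)
                .mapToObj(x -> "<test>" + x + "</test>")
                .collect(Collectors.joining("\n"));
    }

    // Mirrors: let $x := (from to to) return <test>{$x}</test>
    // The variable is bound once to the whole sequence, so one element is produced.
    static String letClause(int from, int to) {
        String sequence = IntStream.rangeClosed(from, to)
                .mapToObj(x -> Integer.toString(x))
                .collect(Collectors.joining(" "));
        return "<test>" + sequence + "</test>";
    }

    public static void main(String[] args) {
        System.out.println(forClause(1, 5));
        System.out.println(letClause(1, 5));
    }
}
```\n<test>2</test>\n<test>3</test>"))
    throw new AssertionError(ForVersusLet.forClause(1, 3));
if (!ForVersusLet.letClause(1, 5).equals("<test>1 2 3 4 5</test>"))
    throw new AssertionError(ForVersusLet.letClause(1, 5));
</test>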
The where clause: The where clause is used to specify one or more criteria for the result.
where $x/price>30 and $x/price<100
The order by clause: The order by clause is used to specify the sort order of the result. The
following example orders the result by category and then by title.
for $x in doc("books.xml")/bookstore/book
order by $x/@category, $x/title
return $x/title
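To make the role of each clause concrete, the steps of a FLWOR expression can be traced in plain Java over a DOM tree. This is a sketch of the semantics only, assuming no XQuery engine is available; it combines the where condition and the order by keys shown above into one query. The class name FlworSketch and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical analogue of:
//   for $x in doc("books.xml")/bookstore/book
//   where $x/price>30 and $x/price<100
//   order by $x/@category, $x/title
//   return $x/title
class FlworSketch {
    static List<String> titles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // for: iterate over every book element
            NodeList books = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/bookstore/book", doc, XPathConstants.NODESET);
            List<Element> kept = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                double price = Double.parseDouble(text(book, "price"));
                if (price > 30 && price < 100) { // where: keep only matching books
                    kept.add(book);
                }
            }
            // order by: category first, then title
            kept.sort(Comparator.comparing((Element b) -> b.getAttribute("category"))
                    .thenComparing((Element b) -> text(b, "title")));
            // return: the title of each remaining book
            List<String> result = new ArrayList<>();
            for (Element book : kept) {
                result.add(text(book, "title"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("query failed", e);
        }
    }

    // Text of the first child element with the given name.
    static String text(Element parent, String child) {
        return parent.getElementsByTagName(child).item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<bookstore>"
            + "<book category='WEB'><title>XQuery Kick Start</title><price>49.99</price></book>"
            + "<book category='WEB'><title>Learning XML</title><price>39.95</price></book>"
            + "</bookstore>";
        System.out.println(titles(xml));
    }
}
```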
Summary
This session described how XQuery was designed to query anything that can appear as XML,
including databases. It also described how to query XML data with FLWOR
expressions, and how to construct XHTML (eXtensible Hypertext Markup Language) output from
the collected data.
Exercises
Using XQuery retrieve the child elements from the following XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
Define XSL
XSL
XSL stands for eXtensible Stylesheet Language.
What is XSLT?
The correct way to declare an XSL stylesheet according to the W3C XSLT Recommendation is:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<td>.</td>
</tr>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
As an XSL stylesheet is an XML document itself, it always begins with the XML declaration:
<?xml version="1.0" encoding="ISO-8859-1"?>.
The next element, <xsl:stylesheet>, defines that this document is an XSLT stylesheet
document (along with the version number and XSLT namespace attributes).
The <xsl:template> element defines a template. The match="/" attribute associates the
template with the root of the XML source document.
The content inside the <xsl:template> element defines some HTML to write to the output.
The last two lines define the end of the template and the end of the style sheet.
The value of the select attribute is an XPath expression. An XPath expression works like
navigating a file system, where a forward slash (/) selects subdirectories.
To sort the output, simply add an <xsl:sort> element inside the <xsl:for-each> element in
the XSL file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<xsl:sort select="artist"/>
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
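A stylesheet like the one above can be applied from Java through the javax.xml.transform package of the JDK. The following is a minimal sketch under that assumption. To keep the output easy to read, it uses a shortened stylesheet that writes plain text instead of an HTML table but keeps the same for-each and sort elements; the class name XsltSortDemo, the transform() helper, and the two-entry catalog are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: applies an XSLT stylesheet with xsl:sort to a small catalog.
class XsltSortDemo {
    // Shortened stylesheet: text output, one "artist: title;" entry per cd, sorted by artist.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='catalog/cd'>"
        + "<xsl:sort select='artist'/>"
        + "<xsl:value-of select='artist'/><xsl:text>: </xsl:text>"
        + "<xsl:value-of select='title'/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Sample catalog, for illustration only.
    static final String XML =
        "<catalog>"
        + "<cd><title>Hide your heart</title><artist>Bonnie Tyler</artist></cd>"
        + "<cd><title>Empire Burlesque</title><artist>Bob Dylan</artist></cd>"
        + "</catalog>";

    // Applies the stylesheet text to the XML text and returns the transformation result.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```

Because the sort key is artist, the entries appear in artist order regardless of their order in the source document.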
Summary
This session explained how to use XSLT to transform XML documents into other formats, such as
XHTML. It explained how to add elements and attributes to, or remove them from, the output
file. It also explained how to rearrange and sort elements, perform tests, and make
decisions about which elements to hide and display.
Exercises
Write an XSL stylesheet that transforms the given XML document.
XML File:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
Glossary
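API: Application Programming Interface
DOM: Document Object Model
DTD: Document Type Definition
JAXP: Java API for XML Processing
SAX: Simple API for XML
URI: Uniform Resource Identifier
XML: eXtensible Markup Language
XSD: XML Schema Definition
XSL: eXtensible Stylesheet Language
XSLT: XSL Transformations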