
XQuery: An XML query language
by D. Chamberlin

The World Wide Web Consortium has convened a working group to design a query language for Extensible Markup Language (XML) data sources. This new query language, called XQuery, is still evolving and has been described in a series of drafts published by the working group. XQuery is a functional language comprised of several kinds of expressions that can be nested and composed with full generality. It is based on the type system of XML Schema and is designed to be compatible with other XML-related standards. This paper explains the need for an XML query language, provides a tutorial overview of XQuery, and includes several examples of its use.

©Copyright 2002 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

IBM SYSTEMS JOURNAL, VOL 41, NO 4, 2002 0018-8670/02/$5.00 © 2002 IBM

Increasingly, Extensible Markup Language (XML)¹ is considered the format of choice for the exchange of information among various applications on the Internet. The popularity of XML is due in large part to its flexibility for representing many kinds of information. The use of tags makes XML data self-describing, and the extensible nature of XML makes it possible to define new kinds of documents for specialized purposes. As the importance of XML has increased, a series of standards has grown up around it, many of which were defined by the World Wide Web Consortium (W3C).² For example, XML Schema³ provides a notation for defining new types of elements and documents; XML Path Language (XPath)⁴ provides a notation for selecting elements within an XML document; and Extensible Stylesheet Language Transformations (XSLT)⁵ provides a notation for transforming XML documents from one representation to another.

XML makes it possible for applications to exchange data in a standard format that is independent of storage. For example, one application may use a native XML storage format, whereas another may store data in a relational database. Since XML is emerging as a standard for data exchange, it is natural that queries among applications should be expressed as queries against data in XML format. This use gives rise to a requirement for a query language designed expressly for XML data sources. In October 1999, W3C convened the XML Query Working Group⁶ for the purpose of designing such a query language, to be called XQuery.

XML data are different from relational data in several important respects that influence the design of a query language. Relational data tend to have a regular structure, which allows the descriptive meta-data for these data to be stored in a separate catalog. XML data, in contrast, are often quite heterogeneous, and distribute their meta-data throughout the document. XML documents often contain many levels of nested elements, whereas relational data are “flat.” XML documents have an intrinsic order, whereas relational data are unordered except where an ordering can be derived from data values. Relational data are usually “dense” (nearly every column has a value), and
relational systems often represent missing information by a special null value. XML data, in contrast, are often “sparse” and can represent missing information simply by the absence of an element. For these and other reasons, existing relational query languages are not directly suitable for querying XML data.

The design of XQuery is still in progress. The XML Query Working Group has published working drafts of several documents that describe the current state of the design. Of these, perhaps the most important is XQuery 1.0: An XML Query Language,⁷ which contains a syntax and informal description of the language. The working group has also published a list of requirements,⁸ a description of the data model that underlies the language,⁹ a formal semantic description,¹⁰ a list of functions and operators,¹¹ and a collection of use cases that illustrate applications of the language.¹² Each of these documents is updated from time to time as the design of XQuery evolves. This paper is based on the most recent XQuery design at the time of its publication, but since this design is still changing, the documents referenced in this paragraph should be consulted for the latest developments.

The design of XQuery has been subject to a number of influences. Perhaps the most important of these is compatibility with existing W3C standards, including Schema, XSLT, XPath, and XML itself. XPath, in particular, is so important and so closely related that XQuery is defined as a superset of XPath. The overall design of XQuery is based on a language proposal called Quilt.¹³ Quilt, in turn, was influenced by the functional approach of Object Query Language (OQL),¹⁴ by the keyword-based syntax of Structured Query Language (SQL),¹⁵ and by previous XML query language proposals including XQL,¹⁶ XML-QL,¹⁷ and Lorel.¹⁸

It is an objective of the XML Query Working Group to define two syntaxes for XQuery: one that is expressed in XML, and one that is optimized for human writing and understanding. This paper describes only the human-oriented version of XQuery.

The initial design of XQuery is focused only on information retrieval and does not provide facilities for updating existing XML documents. The XML Query Working Group may consider the addition of an update facility after completing the design of the first version of XQuery.

This paper describes the data model on which XQuery is based, and then presents an overview of the XQuery language in the form of a series of examples. This paper is not intended to provide a rigorous or exhaustive definition of the language. The reader is referred to Reference 7 for an XQuery syntax and a more complete language description.

Data model

Formally, the input and output of XQuery are defined in terms of a data model, described in Reference 9. The query data model provides an abstract representation of one or more XML documents or document fragments. The data model is based on the notion of a sequence. A sequence is an ordered collection of zero or more items. An item may be a node or an atomic value. An atomic value is an instance of one of the built-in data types defined by XML Schema, such as strings, integers, decimals, and dates. A node conforms to one of seven node kinds, which include element, attribute, text, document, comment, processing instruction, and namespace nodes. A node may have other nodes as children, thus forming one or more node hierarchies. Some kinds of nodes, such as element and attribute nodes, have names or typed values, or both. A typed value is a sequence of zero or more atomic values. Nodes have identity (that is, two nodes may be distinguishable even though their names and values are the same), but atomic values do not have identity. Among all the nodes in a hierarchy there is a total ordering called document order, in which each node appears before its children. Document order corresponds to the order in which the nodes would appear if the node hierarchy were represented in XML format. Document order between nodes in different hierarchies is implementation-defined but must be consistent; that is, all the nodes in one hierarchy must be ordered either before or after all the nodes in another hierarchy.

Sequences may be heterogeneous; that is, they may contain mixtures of various types of nodes and atomic



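Figure 1 Data model representation of items.xml

[Tree diagram: a document node (D) items.xml contains an element node (E) items, which contains item element nodes (E). Each item has a status attribute node (A) and five child element nodes (E), itemno, seller, description, reserve-price, and end-date, each with a text node (T). All item elements have similar structure.]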
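
The tree in Figure 1 can be approximated in a few lines of Python (used here only for illustration; it is not XQuery). The sketch below parses a one-item items.xml and lists its nodes in document order, labeled D, E, A, and T for document, element, attribute, and text nodes. Only the element and attribute names come from the figure; the sample values are invented, and because ElementTree does not expose document, attribute, or text nodes as separate objects, the walk function synthesizes them. Skipping whitespace-only text is an assumption of this sketch, not a rule stated in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: only the element and attribute names come from
# Figure 1; every value below is invented for illustration.
ITEMS_XML = """
<items>
  <item status="open">
    <itemno>1001</itemno>
    <seller>Smith</seller>
    <description>Red bicycle</description>
    <reserve-price>40</reserve-price>
    <end-date>2002-06-30</end-date>
  </item>
</items>
"""


def walk(element, depth=1):
    """Yield (depth, kind, label) for an element, its attributes, its text
    node, and its child elements, with each parent before its children."""
    yield depth, "E", element.tag
    for name, value in element.attrib.items():
        yield depth + 1, "A", f"{name}={value}"
    text = (element.text or "").strip()
    if text:  # whitespace-only text is skipped (an assumption of this sketch)
        yield depth + 1, "T", text
    for child in element:
        yield from walk(child, depth + 1)


def data_model(xml_text, doc_name):
    """Return the node list for one document, starting with its document node."""
    root = ET.fromstring(xml_text)
    return [(0, "D", doc_name)] + list(walk(root))


nodes = data_model(ITEMS_XML, "items.xml")
for depth, kind, label in nodes:
    print("  " * depth + f"{kind} {label}")
```

Running the script prints one indented line per node, with each parent listed before its children.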

values. However, a sequence never appears as an item in another sequence. All operations that create sequences are defined to “flatten” their operands so that the result of the operation is a single-level sequence. There is no distinction between an item and a sequence of length one; in other words, a node or atomic value is considered to be identical to a sequence of length one containing that node or atomic value.

Sequences of length zero are valid and are sometimes used to represent missing or unknown information, in much the same way that null values are used in relational systems.

In addition to sequences, the query data model defines a special value called the error value, which is the result of evaluating an expression that contains an error. An error value may not be combined in a sequence with any other value.

Input XML documents can be transformed into the query data model by a process called schema validation, which parses the document, validates it against a particular schema, and represents it as a hierarchy of nodes and atomic values, labeled with type information derived from the schema. If an input document does not have a schema, it is validated against a permissive default schema that assigns generic types: nodes are labeled anyType and atomic values are labeled anySimpleType. The process of schema validation is described in more detail in Reference 3.

The result of a query may be transformed from the query data model into an XML representation by a process called serialization. The details of serialization are beyond the scope of this paper. It is worth noting that the result of a query is not always a well-formed XML document. For example, a query might return an atomic value such as the number 47, or a sequence of elements with no common parent.

Example data

To illustrate the query data model and provide a basis for later examples, we consider a small XML database that contains data from an on-line auction,



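Figure 2 Data model representation of bids.xml

[Tree diagram: a document node (D) bids.xml contains an element node (E) bids, which contains bid element nodes (E). Each bid has four child element nodes (E), itemno, bidder, bid-amount, and bid-date, each with a text node (T). All bid elements have similar structure.]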
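
Figure 2 shows only a text node under each subelement of a bid. As a rough illustration of how such text could be turned into atomic values like integers, decimals, and dates, the Python sketch below (again illustrative, not XQuery) parses a small bids.xml and converts each subelement. The element names follow the figure; the sample values and the choice of a type for each element are assumptions, since no schema for bids.xml is given here.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal

# Hypothetical sample data: element names follow Figure 2, values are invented.
BIDS_XML = """
<bids>
  <bid>
    <itemno>1001</itemno>
    <bidder>Jones</bidder>
    <bid-amount>45.00</bid-amount>
    <bid-date>2002-06-12</bid-date>
  </bid>
  <bid>
    <itemno>1001</itemno>
    <bidder>Lee</bidder>
    <bid-amount>52.50</bid-amount>
    <bid-date>2002-06-14</bid-date>
  </bid>
</bids>
"""

# Assumed mapping from element name to a converter; these type choices are
# illustrative only and do not come from a schema.
CONVERTERS = {
    "itemno": int,
    "bidder": str,
    "bid-amount": Decimal,
    "bid-date": date.fromisoformat,
}


def typed_bid(bid_element):
    """Convert the text of each subelement of one bid into a typed Python value."""
    return {child.tag: CONVERTERS[child.tag](child.text) for child in bid_element}


bids = [typed_bid(b) for b in ET.fromstring(BIDS_XML).findall("bid")]
highest = max(bid["bid-amount"] for bid in bids)
print(bids[0])
print("highest bid:", highest)
```

Once the values are typed, comparisons such as finding the highest bid operate on decimals rather than on strings.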

based loosely on Use Case R in Reference 12. The database consists of two XML documents named items.xml and bids.xml.

The items.xml document contains a root element named items, which in turn contains an item element for each item currently for sale at the auction. Each item element has a status attribute and subelements named itemno, seller, description, reserve-price, and end-date. The reserve-price element names a minimum selling price set by the owner, and the end-date element indicates the ending date of the auction.

The bids.xml document contains a root element named bids, which in turn contains a bid element for each bid that has been placed for an item. Each bid element has subelements named itemno, bidder, bid-amount, and bid-date.

Figures 1 and 2 show the data model representations of the items.xml and bids.xml documents, respectively (including only a representative item and a representative bid). In the figures, the circles labeled D, E, A, and T represent document, element, attribute, and text nodes, respectively.

Expressions

We now describe expressions in XQuery.

Basics. Like XML and XPath, XQuery is a case-sensitive language, and all its keywords are made up of lowercase characters. Detailed rules for lexing and parsing XQuery are described in Reference 7. Characters enclosed between “{--” and “--}” are considered to be comments and are ignored during query processing (except, of course, inside a quoted string, where they are considered to be part of the string).

XQuery is a functional language, which means that it is made up of expressions that return values and do not have side effects. XQuery has several kinds of expressions, most of which are composed from lower-level expressions, combined by operators or keywords. XQuery expressions are fully composable, that is, where an expression is expected, any kind of expression may be used. As noted earlier, the value



of an expression, in general, is a heterogeneous sequence of nodes and atomic values.

The simplest kind of XQuery expression is a literal, which represents an atomic value. The following are several examples of literals:

    47      is a literal of type integer
    4.7     is a literal of type decimal because it contains a decimal point
    4.7E3   is a literal of type double because it contains an exponent
    "47"    is a literal of type string (single quotes are allowed inside double-quoted strings)
    '47'    is a literal of type string (double quotes are allowed inside single-quoted strings)

Atomic values of other types may be created by calling constructors. A constructor is a function that creates a value of a particular type from a string containing a lexical representation of the desired type. In general, a constructor has the same name as the type it constructs. The following example uses a constructor to create a value of type date:

    date("2002-5-31")

Any XQuery expression may be enclosed in parentheses. Parentheses are useful for making explicit the order in which an expression should be evaluated. The following examples of arithmetic expressions show how parentheses can be used to control the precedence of operators. Arithmetic expressions are discussed in more detail in a later subsection.

    (2 + 4) * 5   has the value 30 because the subexpression (2 + 4) is evaluated first
    2 + 4 * 5     has the value 22 because * has a higher precedence than +

The comma operator concatenates two values to form a sequence. Sequences are often enclosed in parentheses as explicit delimiters, although this is not required. An empty pair of parentheses denotes an empty sequence. Since sequences cannot be nested, the comma operator constructs a sequence consisting of all the items in its left operand, followed by all the items in its right operand. A sequence can also be constructed by the to operator, which returns a sequence consisting of all the integers between its left operand and its right operand, inclusive. The following examples illustrate construction of sequences:

    1, 2, 3           is a sequence of three values
    (1, 2, 3)         is identical to 1, 2, 3
    ((1, 2), (), 3)   is identical to 1, 2, 3
    1 to 3            is identical to 1, 2, 3

A variable in XQuery is a name that begins with a dollar sign. A variable may be bound to a value and used in an expression to represent that value. One way to bind a variable is by means of a LET expression, which binds one or more variables and then evaluates an inner expression. The value of the LET expression is the result of evaluating the inner expression with the variables bound. The following example illustrates a LET expression that returns the sequence 1, 2, 3:

    let $start := 1, $stop := 3
    return $start to $stop

A LET expression is a special case of a FLWR (for, let, where, return) expression, which provides additional ways to bind variables. FLWR expressions are described in more detail later.

Another simple form of XQuery expression is a function call. XQuery provides a core function library, described in Reference 11, and a mechanism whereby users can define additional functions, described in the next section. Function calls in XQuery employ the usual notation in which the arguments of the function are enclosed in parentheses. The following example calls the core library function substring to extract the first six characters from a string:

    substring("Martha Washington", 1, 6)

Path expressions. Path expressions in XQuery are based on the syntax of XPath.⁴ A path expression consists of a series of steps, separated by the slash character (“/”). The result of each step is a sequence of nodes. The value of the path expression is the node sequence that results from the last step in the path.

Each step is evaluated in the context of a particular node, called the context node. In general, a step can be any expression that returns a sequence of nodes. One important kind of step, called an axis step, can be thought of as beginning at the context node and moving through the node hierarchy in a particular direction, called an axis. As the axis step moves along the designated axis, it selects nodes that satisfy a selection criterion. The selection criterion can select nodes based on their names, their positions with respect to the context node, or a predicate based on the value of a node. XPath defines 13 axes, and some or all of them will be supported by XQuery as well. Current plans are for XQuery to support the six axes named child, descendant, parent, attribute, self, and descendant-or-self.

As a path expression is evaluated, the nodes selected by each step serve in turn as context nodes for the following step. If a step has several context nodes, it is evaluated for each of the context nodes in turn, and the resulting node sequences are combined by the union operator to form the result of the step. The result of a step is always a sequence of distinct nodes (without duplicates based on node identity), in document order.

Path expressions may be written in either unabbreviated syntax or abbreviated syntax. The unabbreviated syntax for an axis step consists of an axis and a selection criterion, separated by two colons. Q1 illustrates a four-step path expression using unabbreviated syntax. The first step invokes the built-in document function, which returns the document node for the document named items.xml. The second step is an axis step that finds all children of the document node (“*” selects all nodes on the given axis, which in this case is only a single element node named items). The third step follows the child axis again to find all the child elements at the next level that are named item and that in turn have a child named seller with the value “Smith.” The result of the third step is a sequence of item element nodes. Each of these item nodes is used in turn as the context node for the fourth step, which follows the child axis again to find the description elements that are children of the given item. The final result of the path expression is the result of the fourth step: a sequence of description element nodes, in document order.

(Q1) List the descriptions of all items offered for sale by Smith.

    document("items.xml")/child::*
        /child::item[child::seller = "Smith"]
        /child::description

In practice, path expressions are usually written using abbreviated syntax. Several kinds of abbreviations are provided. Perhaps the most important of these is that the axis specifier may be omitted when the child axis is used. Since child is the most commonly used axis, this abbreviation is helpful in reducing the length of many path expressions. For example, Q1 may be abbreviated as follows:

    document("items.xml")
        /*/item[seller = "Smith"]/description

When two steps are separated by a double slash rather than by a single slash, it means that the second step may traverse multiple levels of the hierarchy, using the descendants axis rather than the single-level child axis. For example, Q2 searches for description elements that are descendants of the root node of a given document. The result of Q2 is a sequence of element nodes that could, in principle, have been found at various levels of the node hierarchy (though, in our sample document, all description nodes are found at the same level).

(Q2) List all description elements found in the document items.xml.

    document("items.xml")//description

Within a path expression, a single dot (“.”) refers to the context node, and two consecutive dots (“..”) refer to the parent of the context node. These notations are abbreviated invocations of the self and parent axes, respectively. Names found in path expressions are usually interpreted as names of element nodes; however, a name prefixed by the “@” character is interpreted as the name of an attribute node. This is an abbreviation for a step that traverses the attribute axis. These abbreviations are illustrated by Q3, which begins at the node that is bound to the variable $description, traverses the parent axis to the parent item node, and then traverses the attribute axis to find an attribute named status. The result of Q3 is a single attribute node.

(Q3) Find the status attribute of the item that is the parent of a given description.

    $description/../@status

Predicates. In XQuery, a predicate is an expression, enclosed in square brackets, that is used to filter a sequence of values. Predicates are often used in the steps of a path expression. For example, in the step item[seller = "Smith"], the phrase seller = "Smith" is a predicate that is used to select certain item nodes and discard others. We will refer to the items in the sequence being filtered by a predicate as candidate items. The predicate is evaluated for each candidate item, using the candidate item as the



context item for evaluating the predicate expression. The term “context item” is a generalization of the term “context node” and may indicate either a node or an atomic value. Within a predicate expression, a single dot (“.”) indicates the context item. Each candidate item is selected or discarded according to the following rules.

If the predicate expression evaluates to a Boolean value, the candidate item is selected if the value of the predicate expression is true. This type of predicate is illustrated by the following example, which selects item nodes that have a reserve-price child node whose value is greater than 1000:

    item[reserve-price > 1000]

If the predicate expression evaluates to a number, the candidate item is selected if its ordinal position in the list of candidate items is equal to the number. This type of predicate is illustrated by the following example, which selects the fifth item node on the child axis:

    item[5]

If the predicate expression evaluates to an empty sequence, the candidate item is discarded, but if the predicate expression evaluates to a sequence containing at least one node, the candidate item is selected. This form of predicate can be used to test for the existence of a child node that satisfies some condition. This is illustrated by the following example, which selects item nodes that have a reserve-price child node, regardless of its value:

    item[reserve-price]

Several different kinds of operators and functions are often used inside predicates. In the following six paragraphs, some of the commonest and most useful of these operators and functions are described.

Value comparison operators: eq, ne, lt, le, gt, ge. These operators can compare two scalar values, but they raise an error if either operand is a sequence of length greater than one. If either operand is a node, the value comparison operator extracts its value before performing the comparison. For example, item[reserve-price gt 1000] selects an item node if it has exactly one reserve-price child node whose value is greater than 1000.

General comparison operators: =, !=, >, >=, <, <=. These operators can deal with operands that are sequences, providing implicit “existential” semantics for both operands. Like the value comparison operators, the general comparison operators automatically extract values from nodes. For example, item[reserve-price > 1000] selects an item node if it has at least one reserve-price child node whose value is greater than 1000.

Node comparison operators: is and isnot. These operators compare the identities of two nodes. For example, $node1 is $node2 is true if the variables $node1 and $node2 are bound to the same node (that is, the node identity is the same for both variables).

Order comparison operators: These operators compare the positions of two nodes. For example, $node1 << $node2 is true if the node bound to $node1 occurs earlier in document order than the node bound to $node2.

Logical operators: and and or operators can be used to combine logical conditions inside a predicate. For example, the following predicate selects item nodes that have exactly one seller child element with the value “Smith,” and also have at least one reserve-price child element with any value: item[seller eq "Smith" and reserve-price].

Negation: not is a function rather than an operator. It serves to invert a Boolean value, turning true into false and false into true. The following step uses the not function with an existence test to find item nodes that have no reserve-price child element: item[not(reserve-price)].

In all of the above examples, element and attribute names have been simple identifiers. However, the XML Namespace recommendation¹⁹ allows elements and attributes to have two-part names in which the first part is a namespace prefix, followed by a colon. A name qualified by a namespace prefix is called a QName. Each namespace prefix must be bound to a URI (uniform resource identifier) that uniquely



identifies a namespace. This convention allows each application to define names in its own namespace without danger of colliding with names defined by other applications, and it allows a query to unambiguously refer to names defined by various applications. If the prefix auction were bound to the namespace URI of our on-line auction application, the step item[reserve-price > 1000] might be written using QNames as follows:

    auction:item[auction:reserve-price > 1000]

The process of binding a prefix to a namespace URI is described in the next-to-last section. In most of our examples, we use one-part names rather than QNames. This use is realistic because XQuery provides a way to specify a default namespace for a query. Such use makes it unnecessary for a query to use QNames unless it needs to refer to names from multiple namespaces.

This paper provides only a brief introduction to the path expressions available in XPath and XQuery. Like XQuery, XPath is an evolving language. A working draft of a new version of XPath, called XPath 2.0,²⁰ has recently been published jointly by the XML Query and XSLT Working Groups. It is expected that XPath 2.0 and XQuery will share not only a common syntax for path expressions and predicates but several other kinds of expressions as well.

Element constructors. Path expressions are powerful, but they have an important limitation: they can only select existing nodes. A full query language needs a facility to construct new elements and attributes and to specify their contents and relationships. This facility is provided in XQuery by a kind of expression called an element constructor.

The simplest kind of element constructor looks exactly like the XML syntax for the element to be created. For example, the following expression constructs an element named highbid containing one attribute named status and two child elements named itemno and bid-amount:

    <highbid status = "pending">
        <itemno>4871</itemno>
        <bid-amount>250.00</bid-amount>
    </highbid>

In the example above, the values of the elements and attributes are constants. However, in many cases it is necessary to create an element or an attribute whose value is computed by some expression. In this case, the expression is enclosed in curly braces to indicate that it is to be evaluated rather than treated as literal text. The expression is evaluated and replaced by its value in the element constructor. In the following example, the values of the elements and attributes are computed by expressions. The variables $s, $i, and $bids used in these expressions must be bound by some enclosing expression.

    <highbid status = "{$s}">
        <itemno> {$i} </itemno>
        <bid-amount>
            {max($bids[itemno = $i]/bid-amount)}
        </bid-amount>
    </highbid>

The content of an element constructor may be any expression. In general, the expression used in an element constructor may generate a sequence of items, including atomic values, elements, and attributes. Attributes that are generated inside an element constructor become attached to the constructed element. Elements and atomic values that are generated inside an element constructor become the content of the constructed element. In the following example, an element constructor contains an expression, enclosed in curly braces, that generates one attribute and two subelements. The variable $b must be bound by some enclosing expression.

    <highbid>
        {
        $b/@status,
        $b/itemno,
        $b/bid-amount
        }
    </highbid>

The element node produced by an element constructor is a new node with its own node identity. If the newly constructed element has child nodes and attributes that are derived from existing nodes, as in the above example, the new child nodes and attributes are copies of the nodes from which they were derived, with new node identities.

In the above examples of element constructors, even though the content of the element may be computed, the name of the constructed element is a known constant. However, it is sometimes necessary to construct an element whose name as well as its content is computed. For this purpose, XQuery provides a special kind of constructor called a computed element constructor. A computed element constructor consists of the keyword element, followed by two expressions in curly braces: the first expression computes the name of the element, and the second expression computes the content of the element.

For an example of the use of a computed constructor, suppose that the variable $e is bound to an element with a numeric value. We need to construct a new element that has the same name as $e and the same attributes as $e, but we want its value to be twice the value of $e. This construction can be accomplished by the following expression, which uses the data function to extract the numeric value of the original node:

    element
        {name($e)}
        {$e/@*, data($e)*2}

Similar to a computed element constructor, XQuery provides a computed attribute constructor, which consists of the keyword attribute, followed by two expressions in curly braces: the first expression computes the name of the attribute and the second expression computes its value. An attribute constructor can be used anywhere an attribute is valid, for example, inside an element constructor. The following attribute constructor, based on the bound variable $p, might generate an attribute that looks like father="Frank" or mother="Mary". This example uses a conditional (if-then-else) expression, described later.

    attribute
        {if $p/sex = "M" then "father" else "mother"}
        {$p/name}

Iteration and sorting. Iteration is an important part of a query language. XQuery provides a way to iterate over a sequence of values, binding a variable to each of the values in turn and evaluating an expression for each binding of the variable.

The simplest form of iteration in XQuery consists of a for clause that names a variable and provides a sequence of values over which the variable is to iterate, followed by a return clause that contains the expression to be evaluated for each variable binding. The following example illustrates this simple form of iteration:

    for $n in (2, 3) return $n + 1

The result of this simple iterative expression is the sequence (3, 4).

A for clause may specify more than one variable, with an iteration sequence for each variable. Such a for clause produces tuples of variable bindings that form the Cartesian product of the iteration sequences. Unless otherwise specified, the binding tuples are generated in an order that preserves the order of the iteration sequences, using the leftmost variable as the “outer loop” and the rightmost variable as the “inner loop.” The following example illustrates a for clause that contains two variables and two iteration sequences:

    for $m in (2, 3), $n in (5, 10)
    return <fact>{$m} times {$n} is {$m * $n}</fact>

The result of this expression is the following sequence of four elements:

    <fact>2 times 5 is 10</fact>
    <fact>2 times 10 is 20</fact>
    <fact>3 times 5 is 15</fact>
    <fact>3 times 10 is 30</fact>

The for clauses illustrated above and the let clause illustrated earlier are both special cases of a more general expression called a FLWR (pronounced “flower”) expression. In its most general form, a FLWR expression may have multiple for clauses, multiple let clauses, an optional where clause, and a return clause.

As we have already seen, the function of the for clause and let clause is to bind variables. Each of these clauses contains one or more variables and an expression associated with each variable. The expressions evaluate to sequences and may contain references to variables bound in previous clauses. The difference between a for clause and a let clause is that a for clause iterates each variable over the associated sequence, binding the variable in turn to each

IBM SYSTEMS JOURNAL, VOL 41, NO 4, 2002 CHAMBERLIN 605


item in the sequence, whereas a let clause binds each variable to the associated sequence as a whole. This difference is illustrated by the following pair of clauses:

    for $i in (1 to 3)
    let $j := (1 to $i)

This pair of clauses is not a full FLWR expression because it does not have a return clause. The for clause and let clause simply produce a sequence of binding tuples. The clauses in the above example produce the following sequence of three binding pairs:

    $i = 1, $j = 1
    $i = 2, $j = (1, 2)
    $i = 3, $j = (1, 2, 3)

In general, the number of binding tuples produced by a series of for clauses and let clauses is equal to the product of the cardinalities of the iteration expressions in the for clauses. A let clause without any for clause, of course, produces only a single binding tuple.

The binding tuples produced by the for clauses and let clauses in a FLWR expression are filtered by the optional where clause. The where clause contains an expression that is evaluated for each binding tuple. If the value of the where expression is the Boolean value true or a sequence containing at least one node (an “existence test”), the binding tuple is retained; otherwise the binding tuple is discarded.

The return clause of the FLWR expression is then executed once for each binding tuple retained by the where clause, in order. The results of these executions are concatenated into a sequence that serves as the result of the FLWR expression.

The power of FLWR is illustrated by Q4, a query over our auction database.

(Q4) For each item that has more than ten bids, generate a popular-item element containing the item number, description, and bid count.

    for $i in document("items.xml")/*/item
    let $b := document("bids.xml")/*/bid[itemno = $i/itemno]
    where count($b) > 10
    return
      <popular-item>
        {
          $i/itemno,
          $i/description,
          <bid-count> {count($b)} </bid-count>
        }
      </popular-item>

The for clause and let clause produce a binding pair for each item in items.xml. In each binding pair, $i is bound to the item and $b is bound to a sequence containing all the bids for that item. The where clause retains only those binding tuples in which $b contains more than ten bids. The return clause then generates an output element for each of these bindings, containing the item number, description, and bid count.

By default, the order of the output sequence of a FLWR expression preserves the order of the iteration sequences. The prefix operator unordered can be used before any expression to indicate that the order of the result is not significant. This gives the implementation greater flexibility to optimize the execution of the expression (for example, by iterating in a different order).

Any sequence can be reordered by a sortby clause that contains one or more ordering expressions. For each item in the original sequence, the ordering expressions are evaluated using the given item as the context item. The items in the original expression are then reordered into ascending or descending order based on the values of their ordering expressions. Of course, each ordering expression must return a single result, and these results must be comparable by the gt operator. For the purpose of a sortby clause, an empty sequence can be treated either as greater than any other value or as less than any other value, under user control.

A sortby clause is often useful in reordering the results of a FLWR expression. For example, if it is desired for the popular-item elements generated by Q4 to be sorted into descending order by bid-count, the following clause could be added at the end of Q4:

    sortby bid-count descending

It is important to realize that sortby is not a part of a FLWR expression but a separate kind of XQuery expression that can be used to reorder any sequence, whether generated by a FLWR expression or not.
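The evaluation steps described for FLWR expressions (binding tuples produced by for and let, filtering by where, one evaluation of return per retained tuple, and a separate sortby step) can be paraphrased informally in Python. This is only a sketch of the semantics, not XQuery itself: the function names, the dictionary representation of elements, and the sample data are all assumptions made for illustration, and the bid threshold is passed as a parameter so that a tiny data set can produce output.

```python
def binding_tuples(n):
    """Model `for $i in (1 to n) let $j := (1 to $i)`.

    The for clause iterates, so there is one tuple per value of $i;
    the let clause binds $j to a whole sequence without iterating.
    """
    return [(i, list(range(1, i + 1))) for i in range(1, n + 1)]


def popular_items(items, bids, min_bids):
    """Model of Q4: for, let, where, return, evaluated in that order."""
    results = []
    for item in items:                                   # for $i in ...
        item_bids = [b for b in bids
                     if b["itemno"] == item["itemno"]]   # let $b := ...
        if len(item_bids) > min_bids:                    # where count($b) > ...
            results.append({                             # return <popular-item>
                "itemno": item["itemno"],
                "description": item["description"],
                "bid_count": len(item_bids),
            })
    return results


def sortby_bid_count_descending(sequence):
    """Model of a separate sortby step; it accepts any sequence."""
    return sorted(sequence, key=lambda e: e["bid_count"], reverse=True)


# Invented sample data standing in for items.xml and bids.xml.
ITEMS = [
    {"itemno": 1001, "description": "Red bicycle"},
    {"itemno": 1002, "description": "Tennis racket"},
    {"itemno": 1003, "description": "Helicopter"},
]
BIDS = [
    {"itemno": 1001, "bid_amount": 35},
    {"itemno": 1003, "bid_amount": 50000},
    {"itemno": 1003, "bid_amount": 51000},
]

if __name__ == "__main__":
    print(binding_tuples(3))
    print(sortby_bid_count_descending(popular_items(ITEMS, BIDS, 0)))
```

Keeping the sort in its own function mirrors the point made in the text: the reordering step is independent of how the sequence was produced.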


However, when a FLWR expression is followed by sortby, a smart optimizer will realize that the reordering of the output items relaxes the usual constraints on the ordering of the binding tuples.

Q4 illustrates how a FLWR expression can have some of the same characteristics as a join query in a relational database system and also some of the same characteristics as a grouping query. Q4 is like a join query because it correlates elements found in two different XML files, named items.xml and bids.xml. It is also like a grouping query because it groups bids together by item number and computes the number of bids in each group (in SQL, this might be expressed as GROUP BY itemno).

Arithmetic. We have already seen several examples of the use of arithmetic operators. XQuery provides the usual arithmetic operators: +, -, *, div, and mod, as well as the aggregating functions sum, avg, count, max, and min, which operate on a sequence of numbers and return a numeric result. The division operator in XQuery is called div to distinguish it from the slash that is used in path expressions. When the subtraction operator follows a name, it must have a preceding blank to distinguish it from a hyphen, since a hyphen is a valid name character in XML.

Arithmetic operators are defined on numeric values (or, in the case of the aggregating functions, sequences of numeric values). Numeric values include values of type integer, decimal, float, double, or types derived from these types. When the operands of an arithmetic operator are mixed, they are promoted to the nearest common type using the promotion hierarchy integer → decimal → float → double. If an operand of an arithmetic operator is a node, its typed value is automatically extracted.

The behavior of arithmetic operators on empty sequences is an important special case. In XQuery, an empty sequence is sometimes used to represent missing or unknown information, in much the same way that a null value is used in relational systems. For this reason, the +, -, *, div, and mod operators are defined to return an empty sequence if either of their operands is an empty sequence. To illustrate the application of this rule, suppose that the variable $emps is bound to a sequence of emp elements, each of which represents an employee and contains a name element, a salary element, an optional commission element, and an optional bonus element. The expression in Q5 transforms this sequence into a new sequence of emp elements, each of which contains a name element and a pay element whose value is the employee’s total pay. For those employees whose commission or bonus is missing ($e/commission or $e/bonus evaluates to an empty sequence), the generated pay element will be empty.

(Q5) Given a sequence of emp elements, replace their salary, commission, and bonus subelements with a new pay element containing the sum of the values of the original elements, and order the resulting sequence in ascending order by the value of the pay element.

    for $e in $emps
    return
      <emp>
        {
          $e/name,
          <pay> {$e/salary + $e/commission + $e/bonus} </pay>
        }
      </emp>
    sortby (pay)

In some cases, it may be desirable to provide a default value that can be substituted for missing operands in an arithmetic expression. The next section of this paper illustrates how a user-defined function can be written for this purpose.

Operations on sequences. In a sense, all XQuery operations are operations on sequences, since every value in XQuery is either a sequence or an error. However, XQuery provides three operators that are specifically designed for combining sequences of nodes: union, intersect, and except. A union of two node sequences is a sequence containing all the nodes that occur in either of the operands. The intersect operator produces a sequence containing all the nodes that occur in both of its operands. The except operator produces a sequence containing all the nodes that occur in its first operand but not in its second operand.

The union, intersect, and except operators return node sequences in document order and eliminate duplicates from their result sequences, based on node identity. Query Q6 provides an example of the use of the intersect operator.

(Q6) Construct a new element named recent-large-bids, containing copies of all the bid elements in the document bids.xml that have a bid-amount of more than 1000 and a bid-date after January 1, 2002.


    <recent-large-bids>
      document("bids.xml")/*/bid[bid-amount > 1000.00]
        intersect
      document("bids.xml")/*/bid[bid-date > date("2002-01-01")]
    </recent-large-bids>

Expressions that apply the union, intersect, and except operators can often be expressed in another way. For example, the following query is equivalent to Q6:

    <recent-large-bids>
      document("bids.xml")/*/bid
        [bid-amount > 1000.00 and bid-date > date("2002-01-01")]
    </recent-large-bids>

It is important to remember that intersect and except are not useful in combining sequences of nodes from different documents, since there is no possibility that two nodes in different documents could have the same node identity. For example, consider the following query:

    document("items.xml")//itemno
      except
    document("bids.xml")//itemno

This query applies the except operator to two sequences of itemno nodes. Since the node sequences are selected from different documents, there is no possibility that any node in the second sequence could be identical to a node in the first sequence. Therefore, this query returns all the itemno nodes in items.xml. If the intent of the query had been to make a list of itemno elements for items that have no bids, this could have been accomplished as follows, using the library function empty, which returns true if its operand is an empty sequence:

    for $i in document("items.xml")//item
    where empty(document("bids.xml")//bid[itemno eq $i/itemno])
    return $i/itemno

In the above example, the predicate itemno eq $i/itemno compares two itemno nodes by extracting and comparing their content rather than by their identity.

The | operator, retained for compatibility with XPath 1.0, is equivalent to the union operator. These operators are sometimes used in a step of a path expression. For example, the following path expression finds the union of all b children and c children of nodes in the sequence bound to $a; the nodes in this union then serve as context nodes for the next step in the path.

    $a/(b | c)/d

Conditional expressions. A conditional expression provides a way of executing one of two expressions, depending on the value of a third expression. It is written in the familiar if ... then ... else format provided by many languages. In XQuery, all three clauses (if, then, and else) are required, and the expression in the if clause must be enclosed in parentheses.

The result of a conditional expression depends on the value of the expression in the if clause, called the test expression. The rules are as follows:

If the value of the test expression is the Boolean value true, or a sequence containing at least one node (serving as an “existence test”), the then clause is executed.

If the value of the test expression is the Boolean value false or an empty sequence, the else clause is executed.

Otherwise, the conditional expression returns the error value.

The following simple conditional expression might be used to return the price of a part, depending on the existence of an attribute named discounted (independently of the value of the attribute):

    if ($part/@discounted) then $part/wholesale
    else $part/retail

Q7 in Figure 3 is an example of a more complex query that contains a conditional expression. The query also illustrates several levels of nesting of FLWR expressions and element constructors.

Quantified expressions. Quantified expressions allow testing of some condition to see whether it is true for some value in a sequence (called an existential quantifier), or for every value in a sequence (called a universal quantifier). The result of a quantified expression is always true or false.
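In procedural terms, the existential and universal quantifiers just introduced behave like "any" and "all" tests over the items of a sequence. The following Python lines are a minimal sketch of that idea; the helper names some and every, the lambda-based test expressions, and the sample values are assumptions chosen for illustration and are not XQuery syntax.

```python
def some(sequence, satisfies):
    """Existential quantifier: true if the test holds for at least one item."""
    return any(satisfies(item) for item in sequence)


def every(sequence, satisfies):
    """Universal quantifier: true if the test holds for every item."""
    return all(satisfies(item) for item in sequence)


# Invented sample sequence: three even values and one odd value.
VALUES = [2, 4, 6, 9]

if __name__ == "__main__":
    # One odd value is enough to satisfy the existential test.
    print(some(VALUES, lambda n: n % 2 == 1))
    # The single odd value defeats the universal test.
    print(every(VALUES, lambda n: n % 2 == 0))
```

Either way the result is a single Boolean, never a sequence of per-item results.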


Figure 3 Example of more complex query

(Q7) Generate a report containing the status of the bids for various items. Label each bid with
a status “OK,” “too small,” or “too late.” Enclose the report in an element called bid-status-report.

<bid-status-report>
  for $i in document("items.xml")/*/item
  return
    <item>
      {
        $i/itemno,
        for $b in document("bids.xml")/*/bid[itemno = $i/itemno]
        return
          <bid>
            {
              $b/bidder,
              $b/bid-amount,
              <status>
                {
                  if ($b/bid-date > $i/end-date) then "too late"
                  else if ($b/bid-amount < $i/reserve-price)
                    then "too small"
                  else "OK"
                }
              </status>
            }
          </bid>
      }
    </item>
</bid-status-report>
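The decision logic of Q7 can be paraphrased in Python as a rough guide to how the nested conditional and the two nested loops fit together. This is a sketch under stated assumptions rather than a translation: dictionaries stand in for item and bid elements, the field names mirror the element names in the figure, dates are written as ISO strings so that plain string comparison orders them correctly, and all sample values are invented.

```python
def bid_status(bid, item):
    """Mirror the nested if/then/else of Q7: the first matching test wins."""
    if bid["bid_date"] > item["end_date"]:
        return "too late"
    elif bid["bid_amount"] < item["reserve_price"]:
        return "too small"
    else:
        return "OK"


def bid_status_report(items, bids):
    """Outer loop over items, inner loop over the bids for each item."""
    report = []
    for item in items:
        item_bids = [b for b in bids if b["itemno"] == item["itemno"]]
        report.append({
            "itemno": item["itemno"],
            "bids": [
                {
                    "bidder": b["bidder"],
                    "bid_amount": b["bid_amount"],
                    "status": bid_status(b, item),
                }
                for b in item_bids
            ],
        })
    return report


# Invented sample data: one item and four bids, one of them for another item.
ITEMS = [{"itemno": 1001, "end_date": "2002-03-01", "reserve_price": 40}]
BIDS = [
    {"itemno": 1001, "bidder": "U01", "bid_amount": 35, "bid_date": "2002-02-10"},
    {"itemno": 1001, "bidder": "U02", "bid_amount": 45, "bid_date": "2002-02-20"},
    {"itemno": 1001, "bidder": "U03", "bid_amount": 30, "bid_date": "2002-03-05"},
    {"itemno": 1002, "bidder": "U04", "bid_amount": 99, "bid_date": "2002-02-01"},
]

if __name__ == "__main__":
    for entry in bid_status_report(ITEMS, BIDS):
        print(entry)
```

Because the date test comes first, a bid that is both late and below the reserve price (bidder U03 in the sample data) is labeled "too late", just as the order of the tests in Q7 implies.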

Like a FLWR expression, a quantified expression allows a variable to iterate over the items in a sequence, being bound in turn to each item in the sequence. For each variable binding, a test expression is evaluated. A quantified expression that begins with some returns the value true if the test expression is true for some variable binding, as in the following example:

    some $n in (5, 7, 9, 11) satisfies $n > 10

A quantified expression that begins with every, in contrast, returns the value true if the test expression is true for every variable binding. For example, the following quantified expression returns the value false because the test expression is true for some but not all bindings:

    every $n in (5, 7, 9, 11) satisfies $n > 10

The use of a quantified expression in a query is illustrated by Q8.

(Q8) Find the items in items.xml for which all the bids received were more than twice the reserve price. Return copies of all these item elements, enclosed in a new element called underpriced-items.

    <underpriced-items>
      for $i in document("items.xml")
      where every $b in document("bids.xml")/*/bid[itemno = $i/itemno]
        satisfies $b/bid-amount > 2 * $i/reserve-price
      return $i
    </underpriced-items>

Functions

We have already seen several examples of functions, including the document function and aggregating functions such as avg. XQuery provides a library of predefined functions, listed in Reference 11, and also
allows users to define functions of their own. A function may take zero or more parameters. A function definition must specify the name of the function and the names of its parameters. It may optionally specify types for the parameters and the result of the function. It must also provide the body of the function, which is an expression enclosed in curly braces. When the function is called, the arguments of the function call are bound to the parameters of the function and the body is executed, producing the result of the function call. If no type is specified for a function parameter, that parameter accepts values of any type. If no type is specified for the result of the function, the function may return a value of any type.

The following example defines a function named highbid that takes an element node as its parameter and returns a decimal value. The function interprets its parameter as an item element and extracts its item number; it then finds and returns the largest bid-amount that has ever been recorded for that item number. The example also illustrates a function call that invokes the highbid function on the item with item number 1234.

    define function highbid(element $item)
      returns decimal
    {
      max(document("bids.xml")
        //bid[itemno = $item/itemno]/bid-amount)
    }

    highbid(document("items.xml")
      //item[itemno = "1234"])

The types used as the argument-types and result-type of a function definition may be simple types such as decimal, or more complex types such as elements and attributes. The rules for declaring types in function definitions are described in more detail in the next section.

XQuery does not support overloading of user-defined functions; that is, it does not permit two user-defined functions to have the same qualified name. Nevertheless, some of the XQuery built-in functions are overloaded. For example, the string function can convert an argument of almost any type into a string.

The arguments of a function call must match the declared types of the function parameters. For this purpose, a function argument of a numeric type may be promoted to the declared parameter type, using the promotion hierarchy integer → decimal → float → double. An argument is also considered to be a match if the type of the argument is derived from (i.e., a subtype of) the declared parameter type. If a function that expects an atomic value is called with an argument that is an element, the typed value of the element is extracted and checked for compatibility with the expected parameter type before it is passed to the function. The value produced by the body of a function must also match the return type declared in the function definition, using the same rules that are used for parameter matching.

The following example illustrates how a user might write a function to provide a default value for missing data. The function named defaulted takes two parameters: a (possibly missing) element node, and a default value. If the element is present and has a nonempty value, the function returns that value; but if the element is absent or empty, the function returns the default value.

    define function defaulted
      (element? $e, anySimpleType $d)
      returns anySimpleType
    {
      if (empty($e)) then $d
      else if (empty($e/*)) then $d
      else data($e)
    }

Using this function, query Q5 could be rewritten as follows. In this formulation, missing or empty commission or bonus elements are treated as though they have the value zero:

    for $e in $emps
    return
      <emp>
        {
          $e/name,
          <pay> { $e/salary
            + defaulted($e/commission, 0)
            + defaulted($e/bonus, 0) }
          </pay>
        }
      </emp>
    sortby (pay)

A function that invokes itself in its own body is called a recursive function, and two functions whose bodies invoke each other are called mutually recursive functions. Recursion is a powerful feature in function definitions, particularly in functions that are defined
over a hierarchical data model such as XML. As an illustration of a recursive function, the depth function in the following example can be invoked on an element and returns the depth of the element hierarchy beginning with its argument. If the argument element has no descendants, the depth of the hierarchy is one. Otherwise, the depth of the hierarchy is one more than the maximum depth of any hierarchy rooted in a child of the argument element; this value is computed by a recursive call to the depth function. The example also illustrates a function call that invokes the depth function to find the depth of the document named bids.xml.

    define function depth(element $e)
      returns integer
    {
      if (empty($e/*)) then 1
      else 1 + max(for $c in $e/* return depth($c))
    }

    depth(document("bids.xml"))

Types

In writing a query, it is sometimes necessary to refer to a particular type. For example, function definitions need to describe the types of the function parameters and result, as noted in the previous section. Other types of XQuery expressions, described later in this section, also need to refer to specific types.

One way to refer to a type is by its qualified name, or QName. A QName may refer to a built-in type such as xs:integer or to a type that is defined in some schema, such as abc:address. If the QName has a namespace prefix (the part to the left of the colon), that prefix must be bound to a specific namespace URI. This binding is accomplished by a namespace declaration in the query prolog, described in the next section.

Another way to refer to a type is by a generic keyword such as element or attribute. This keyword may optionally be followed by a QName that further restricts the name or type of the node. For example, element denotes any element; element shipto denotes an element whose name is shipto; and element of type abc:address denotes an element whose type is address as declared in the namespace abc. The keyword attribute denotes any attribute, node denotes any node, and item denotes any item (node or atomic value).

XQuery also provides additional syntax that makes it possible to refer to other kinds of nodes and to element types that are defined in a local part of a schema. For example, element city in customer/address refers to the element named city, as defined in the schema context customer/address.

A reference to a type may optionally be followed by one of three occurrence indicators: “*” means “zero or more”; “+” means “one or more,” and “?” means “zero or one.” The absence of an occurrence indicator denotes exactly one occurrence of the indicated type. The use of occurrence indicators is illustrated by the following examples:

    element memo? denotes an optional occurrence of an element with the name memo
    element of type order+ denotes one or more elements with the type order
    element* denotes zero or more unrestricted elements
    attribute? denotes an optional attribute of any name or type

Type references occur not only in function definitions but also in several other places in XQuery. One of these places is the second operand of instance of, a binary operator that returns true if its first operand is an instance of the type named in its second operand. The following examples illustrate usage of the instance of operator, presuming that the prefix xs is bound to the schema namespace, http://www.w3.org/2001/XMLSchema:

    49 instance of xs:integer returns true
    "Hello" instance of xs:integer returns false
    <partno>369</partno> instance of element* returns true
    $a instance of element shipto returns true if $a is bound to an element whose name is shipto

Occasionally it may be necessary for a query to process an expression in a way that depends on the dynamic (run-time) type of the expression. For example, a query might be preparing mailing labels and might need to extract geographical information from various types of addresses. For such applications, XQuery provides an expression called typeswitch, which is loosely modeled on the switch statement of the C or Java** languages. The first part of a typeswitch consists of the expression whose type is being tested, called the operand expression, and optionally a variable that is bound to the value of the operand expression. This is followed by one or more case clauses, each of which contains a type and an expression. The operand expression is tested against the type named in each of the case clauses in turn. The first case clause for which the operand expression is an instance of the named type is called the effective case; the expression in this case clause is evaluated and serves as the result of the typeswitch. If the operand expression does not conform to any of the types named by the case clauses, the result of the typeswitch is taken from a final default clause.

The use of a typeswitch expression is illustrated by the following example. This expression might occur inside a loop in which the variable $customer iterates over a set of customer elements, each of which has a subelement named billing-address. The billing-address subelements are of several different types, each of which needs to be handled in its own way. In the example, $a is bound to a billing-address and then one of several expressions is evaluated, depending on the dynamic type of $a. Within each case clause, $a has a specific type; for example, in the first case clause, the type of $a is known to be element of type USAddress. If a billing-address element is encountered that does not conform to one of the expected types, the result of this example expression is “unknown.”

    typeswitch($customer/billing-address) as $a
      case element of type USAddress
        return $a/state
      case element of type CanadaAddress
        return $a/province
      case element of type JapanAddress
        return $a/prefecture
      default return "unknown"

Type names are also used in three similar-looking XQuery expressions called cast, treat, and assert. Each of these expressions consists of a keyword, a reference to a type, and an expression enclosed in parentheses.

A cast expression is used to convert the result of an expression into one of the built-in types of XML Schema. A predefined set of casts is supported. For example, the result of the expression $x div 5 could be cast to the xs:double type by the expression cast as xs:double($x div 5). A cast may return an error value if it is unsuccessful. For example, cast as xs:integer($mystring) will be successful if $mystring is a string representation of an integer, but it will return an error if $mystring has the value "Hello". The cast expression cannot be used to cast a value into a user-defined type; however, a user can write a function for this purpose.

A treat expression is used to ensure that the dynamic (run-time) type of an expression conforms to an expected type. For example, suppose that the static (compile-time) type of the expression $customer/billing-address is Address. A certain subexpression may be meaningful only for values that conform to a subtype of Address, such as USAddress. The writer of the subexpression may use a treat expression to declare the expected type of the subexpression, as in the following example:

    treat as USAddress($customer/billing-address)

Unlike a cast expression, a treat expression does not actually change the type of its operand. Instead, its effect is twofold: (1) it assigns a specific static type to its operand, which can be used for type-checking when the query is compiled; and (2) at execution time, if the actual value of the expression does not conform to the named type, it returns an error value.

To see how a query processor might make use of the information provided by a treat expression, consider the following example:

    $customer/billing-address/zipcode

A type-checking XQuery compiler might consider the above example to be a type error, since the static type of $customer/billing-address is Address, and the Address type does not in general have a zipcode subelement. However, in the following reformulation of the example, the static type of the expression is changed to USAddress, which has a zipcode subelement, and the type error is removed:

    (treat as USAddress
      ($customer/billing-address))/zipcode

Like treat, assert is used to provide a query processor with information that may be useful for type-checking. An assert expression serves as an assertion to the query processor that its operand expression has a particular static type. If the processor is checking a query for static type-safety, it may raise an error if it cannot verify that the given expression conforms to the asserted type. An assert expression is more strict than a treat expression, because it pertains to the static type of the expression, and
therefore it is independent of input data and can be checked before execution of the query. A treat expression, in contrast, pertains to the dynamic type of the expression, and therefore depends on input data and can only be checked during query processing.

The following example, unlike the similar treat expression discussed above, will generate a compile-time type error if the static type of $customer/billing-address is Address:

    (assert as USAddress
      ($customer/billing-address))/zipcode

XQuery does not require an implementation to provide static type-checking. For a query processor that does not provide static type-checking, an assert expression is equivalent to a treat expression.

Validation

The process of schema validation is defined in Reference 3. Schema validation may be applied to an XML document or to a part of a document such as an individual element. The material being validated is compared with the definitions in a given schema, which describes a particular kind of document. The validation process may label an element as valid or invalid; it may also assign a specific type to an element and provide additional information such as default values for certain attributes. For example, validation of an element named shipto might assign it the specific type USAddress and might provide a default value for its carrier attribute.

Schema validation is applied to input documents as part of the process of representing them in the query data model. In addition, schema validation can be invoked explicitly on a query result or on some intermediate expression within a query.

The query data model associates a type annotation with each element node. A type annotation indicates that an element has been validated as conforming to a specific named type. Elements that have not been validated or that do not conform to a named type have the generic type annotation anyType. For example, an element that is created by an element constructor has the type annotation anyType until it is given a more specific type by a validate expression. The following example constructs an element and validates it against the schema(s) that are named in the query prolog:

    validate { <shipto>
        <street>123 Elm St.</street>
        <city>Elko, NV</city>
        <zipcode>85039</zipcode>
      </shipto> }

Type annotations are used by expressions such as instance of and typeswitch that test the type of an element, and by expressions such as function calls that require an element of a particular type. For example, validation of the shipto element above might assign it a type annotation of USAddress, which might enable it to be used as an argument to a function whose parameter type is element of type USAddress.

Structure of a query

In XQuery, a query consists of two parts called the query prolog and the query body. The query prolog contains a series of declarations that define the environment for processing the query body. The query body is simply an expression whose value defines the result of the query.

The query prolog is needed only if the query body depends on one or more namespaces, schemas, or functions. If such a dependency exists, the object(s) that the query body depends on must be declared in the query prolog. We will discuss declarations for namespaces, schemas, and functions separately.

A namespace declaration defines a namespace prefix and associates it with a namespace URI. The prefix can be any identifier. For example, the following namespace declaration defines the prefix xyz and associates it with a specific URI:

    namespace xyz
      = "http://www.xyz.com/example/names"

This declaration enables the prefix xyz to be used in QNames in the query body. It associates the prefix with the URI of a specific namespace and serves as a unique qualifier for names of elements, attributes, and types. For example, xyz:billing-address might uniquely identify the billing-address element defined in the namespace http://www.xyz.com/example/names. If desired, multiple namespace prefixes can be associated with the same namespace.

The query prolog can declare a default namespace that applies to all unqualified element and type
names and another default namespace that applies to all unqualified function names. The syntax for declaring default namespaces is illustrated in the following example:

    default element namespace
        = "http://www.xyz.com/example/names"
    default function namespace
        = "http://www.xyz.com/example/functions"

If no default namespaces are provided, unqualified names of elements, types, and functions are considered to be in no namespace. Unqualified attribute names are always considered to be in no namespace.

In addition to namespace declarations, a query prolog can contain one or more schema imports. A schema import identifies a schema by its URI and optionally provides a second URI that specifies the location where the schema file can be found. The purpose of the schema import is to make available to the query processor the definitions of elements, attributes, and types that are declared in the named schema. The query processor can use these definitions for validating newly constructed elements, for optimization, and for doing static type analysis on a query.

A schema usually defines a set of elements, attributes, and types in a particular namespace, called its target namespace, but it does not define a namespace prefix. Therefore, a schema import may specify a namespace prefix to be bound to the target namespace of the given schema. For example, the following schema import binds the namespace prefix xhtml to the target namespace of a particular schema and also provides the system with a nonbinding "hint" for where this schema can be found:

    schema namespace xhtml
        = "http://www.w3.org/1999/xhtml"
        at "http://www.w3.org/1999/xhtml/xhtml.xsd"

In addition to namespace declarations and schema imports, a query prolog may contain one or more function definitions. We have seen examples of function definitions in an earlier section. The functions defined in the query prolog may be used in the query body or in the bodies of other functions. It is expected that the query prolog will also provide a means of importing function definitions from an external function library.

Conclusion

XQuery is a functional language consisting of several types of expressions that can be composed with full generality. XQuery expression-types include path expressions, element constructors, function calls, arithmetic and logical expressions, conditional expressions, quantified expressions, expressions on sequences, and expressions on types.

XQuery is defined in terms of a data model based on heterogeneous sequences of nodes and atomic values. An instance of this data model may contain one or more XML documents or fragments of documents. A query provides a mapping from one instance of the data model to another instance of the data model. A query consists of a prolog that establishes the processing environment, and an expression that generates the result of the query.

Currently, XQuery is defined only by a series of working drafts, and design of the language is an ongoing activity of the W3C XML Query Working Group. The working group is actively discussing the XQuery type system and how it is mapped to and from the type system of XML Schema. It is also discussing full-text search functions, serialization of query results, error handling, and a number of other issues. It is likely that the final XQuery specification will include multiple conformance levels; for example, it may define how static type-checking is done but not require that it be done by every conforming implementation. It is also expected that a subset of XQuery will be designated as XPath Version 2.0 and will be made available for embedding in other languages such as XSLT (Reference 5).

This paper has presented an overview of XQuery, illustrated with some example queries. For a more complete description of XQuery and a BNF (Backus-Naur Form) grammar for the language, the reader is referred to Reference 7. The document at this URI will be updated periodically to contain the latest XQuery specification as the language continues to evolve.

Just as XML is emerging as an application-independent format for exchange of information on the Internet, XQuery is designed to serve as an application-independent format for exchange of queries. If XQuery is successful in providing a standard way to retrieve information from XML data sources, it will help XML to realize its potential as a universal information representation.

**Trademark or registered trademark of Sun Microsystems, Inc.
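As an illustration of the namespace rules for the query prolog described earlier, the following Python sketch resolves a lexical QName to a pair of namespace URI and local name. It is only a reading aid written under stated assumptions, not part of XQuery and not the working group's algorithm: the function name expand_qname, the kind labels, and the function name total are hypothetical, while the prefix xyz and the two URIs are taken from the prolog examples in the text.

```python
from typing import Dict, Optional, Tuple

# Bindings that mirror the prolog examples in the text.
PREFIXES: Dict[str, str] = {"xyz": "http://www.xyz.com/example/names"}
DEFAULT_ELEMENT_NS: Optional[str] = "http://www.xyz.com/example/names"
DEFAULT_FUNCTION_NS: Optional[str] = "http://www.xyz.com/example/functions"

KINDS = {"element", "type", "function", "attribute"}


def expand_qname(
    qname: str,
    kind: str,
    prefixes: Dict[str, str] = PREFIXES,
    default_element_ns: Optional[str] = DEFAULT_ELEMENT_NS,
    default_function_ns: Optional[str] = DEFAULT_FUNCTION_NS,
) -> Tuple[Optional[str], str]:
    """Return (namespace URI or None, local name) for a lexical QName.

    None stands for "no namespace". The kind argument (hypothetical
    labels) says whether the name is used as an element, type,
    function, or attribute name.
    """
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")
    if ":" in qname:
        # Qualified name: the prefix must have been declared in the prolog.
        prefix, local = qname.split(":", 1)
        if prefix not in prefixes:
            raise KeyError(f"undeclared namespace prefix: {prefix}")
        return prefixes[prefix], local
    if kind in ("element", "type"):
        # Unqualified element and type names use the default element namespace.
        return default_element_ns, qname
    if kind == "function":
        # Unqualified function names use the default function namespace.
        return default_function_ns, qname
    # Unqualified attribute names are always in no namespace.
    return None, qname


if __name__ == "__main__":
    print(expand_qname("xyz:billing-address", "element"))
    print(expand_qname("carrier", "attribute"))
    print(expand_qname("shipto", "element", default_element_ns=None))
```

Because two prefixes bound to the same URI expand to the same pair, the sketch also reflects the remark that multiple prefixes can be associated with one namespace.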



Cited references

1. Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation (6 October 2000), see http://www.w3.org/TR/REC-xml.
2. World Wide Web Consortium (W3C), see http://www.w3.org.
3. XML Schema, Parts 0, 1, and 2, W3C Recommendation (2 May 2001), see http://www.w3.org/TR/xmlschema-0, http://www.w3.org/TR/xmlschema-1 and http://www.w3.org/TR/xmlschema-2.
4. XML Path Language (XPath) Version 1.0, W3C Recommendation (16 November 1999), see http://www.w3.org/TR/xpath.
5. XSL Transformation (XSLT) Version 1.0, W3C Recommendation (16 November 1999), see http://www.w3.org/TR/xslt.
6. W3C XML Query Working Group, see http://www.w3.org/XML/Query.
7. XQuery 1.0: An XML Query Language, W3C Working Draft (16 August 2002), see http://www.w3.org/TR/xquery.
8. XML Query Requirements, W3C Working Draft (15 February 2001), see http://www.w3.org/TR/xmlquery-req.
9. XQuery 1.0 and XPath 2.0 Data Model, W3C Working Draft (16 August 2002), see http://www.w3.org/TR/query-datamodel.
10. XQuery 1.0 Formal Semantics, W3C Working Draft (16 August 2002), see http://www.w3.org/TR/query-semantics.
11. XQuery 1.0 and XPath 2.0 Functions and Operators, W3C Working Draft (16 August 2002), see http://www.w3.org/TR/xquery-operators.
12. XML Query Use Cases, W3C Working Draft (16 August 2002), see http://www.w3.org/TR/xmlquery-use-cases.
13. D. Chamberlin, J. Robie, and D. Florescu, "Quilt: An XML Query Language for Heterogeneous Data Sources," Lecture Notes in Computer Science, Springer-Verlag (December 2000), see also http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html.
14. T. Atwood, D. Barry, J. Duhl, J. Eastman, G. Ferran, D. Jordan, M. Loomis, and D. Wade, The Object Database Standard: ODMG-93, Release 1.2, R. G. C. Cattell, Editor, Morgan Kaufmann Publishers, San Francisco, CA (1996).
15. Information Technology-Database Language SQL, Standard No. ISO/IEC 9075, International Organization for Standardization (ISO) (1999); available from American National Standards Institute, New York, NY 10036, (212) 642-4900.
16. J. Robie, J. Lapp, and D. Schach, XML Query Language (XQL), see http://www.w3.org/TandS/QL/QL98/pp/xql.html.
17. A. Deutsch, M. Fernandez, D. Florescu, A. Levy, and D. Suciu, A Query Language for XML, see http://www.research.att.com/~mff/files/final.html.
18. S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener, "The Lorel Query Language for Semistructured Data," International Journal on Digital Libraries 1, No. 1, 68-88 (April 1997), see http://www-db.stanford.edu/~widom/pubs.html.
19. Namespaces in XML, W3C Recommendation (14 January 1999), see http://www.w3.org/TR/REC-xml-names.
20. XML Path Language (XPath) 2.0, W3C Working Draft (20 December 2001), see http://www.w3.org/TR/xpath20.

Accepted for publication June 17, 2002.

Don Chamberlin IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, California 95120 (electronic mail: chamberlin@almaden.ibm.com). Dr. Chamberlin is best known as co-inventor of the SQL database language and as author of two books on IBM's relational database products. He holds a B.S. degree in engineering from Harvey Mudd College and a Ph.D. degree in electrical engineering from Stanford University. He is an ACM Fellow and a member of the National Academy of Engineering and the IBM Academy of Technology. Dr. Chamberlin is currently a research staff member at the Almaden Research Center, where his work is focused on relational database technology, document processing, and XML. He serves as one of IBM's representatives to the W3C XML Query Working Group and as an editor of the XQuery language specification.

IBM SYSTEMS JOURNAL, VOL 41, NO 4, 2002 CHAMBERLIN 615
