
Chapter 10: XML

Database System Concepts


©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
XML

• Structure of XML Data
• XML Document Schema
• Querying and Transformation
• Application Program Interfaces to XML
• Storage of XML Data
• XML Applications

Database System Concepts - 5th Edition, Aug 22, 2005. 10.2 ©Silberschatz, Korth and Sudarshan
Introduction

• XML: Extensible Markup Language
• Defined by the WWW Consortium (W3C)
• Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
• Documents have tags giving extra information about sections of the document
  ◦ E.g. <title> XML </title> <slide> Introduction …</slide>
• Extensible, unlike HTML
  ◦ Users can add new tags, and separately specify how each tag should be handled for display

XML Introduction (Cont.)

• The ability to specify new tags, and to create nested tag structures, makes XML a great way to exchange data, not just documents.
  ◦ Much of the use of XML has been in data exchange applications, not as a replacement for HTML
• Tags make data (relatively) self-documenting
  ◦ E.g.
      <bank>
        <account>
          <account_number> A-101 </account_number>
          <branch_name> Downtown </branch_name>
          <balance> 500 </balance>
        </account>
        <depositor>
          <account_number> A-101 </account_number>
          <customer_name> Johnson </customer_name>
        </depositor>
      </bank>

XML: Motivation

• Data interchange is critical in today’s networked world
  ◦ Examples:
    ▪ Banking: funds transfer
    ▪ Order processing (especially inter-company orders)
    ▪ Scientific data
      – Chemistry: ChemML, …
      – Genetics: BSML (Bio-Sequence Markup Language), …
  ◦ Paper flow of information between organizations is being replaced by electronic flow of information
• Each application area has its own set of standards for representing information
• XML has become the basis for all new-generation data interchange formats

XML Motivation (Cont.)

• Earlier-generation formats were based on plain text with line headers indicating the meaning of fields
  ◦ Similar in concept to email headers
  ◦ Do not allow for nested structures; no standard “type” language
  ◦ Tied too closely to low-level document structure (lines, spaces, etc.)
• Each XML-based standard defines what the valid elements are, using
  ◦ XML type specification languages to specify the syntax
    ▪ DTD (Document Type Definition)
    ▪ XML Schema
  ◦ Plus textual descriptions of the semantics
• XML allows new tags to be defined as required
  ◦ However, this may be constrained by DTDs
• A wide variety of tools is available for parsing, browsing, and querying XML documents/data

Comparison with Relational Data

• Inefficient: tags, which in effect represent schema information, are repeated
• Better than relational tuples as a data-exchange format
  ◦ Unlike relational tuples, XML data is self-documenting due to the presence of tags
  ◦ Non-rigid format: tags can be added
  ◦ Allows nested structures
  ◦ Wide acceptance, not only in database systems, but also in browsers, tools, and applications

Structure of XML Data

• Tag: label for a section of data
• Element: section of data beginning with <tagname> and ending with the matching </tagname>
• Elements must be properly nested
  ◦ Proper nesting
    ▪ <account> … <balance> … </balance> </account>
  ◦ Improper nesting
    ▪ <account> … <balance> … </account> </balance>
  ◦ Formally: every start tag must have a unique matching end tag that is in the context of the same parent element.
• Every document must have a single top-level element
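A quick way to see these rules in action is to hand a few fragments to a parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which is our choice for illustration and is not prescribed by the slides. `is_well_formed` is a hypothetical helper name. The parser rejects both crossed end tags and a second top-level element as not well-formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a single, properly nested XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


proper = "<account><balance>400</balance></account>"
improper = "<account><balance>400</account></balance>"  # end tags cross
two_roots = "<account/><account/>"  # no single top-level element

for label, doc in [("proper", proper), ("improper", improper), ("two_roots", two_roots)]:
    print(label, is_well_formed(doc))
```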

Example of Nested Elements
    <bank-1>
      <customer>
        <customer_name> Hayes </customer_name>
        <customer_street> Main </customer_street>
        <customer_city> Harrison </customer_city>
        <account>
          <account_number> A-102 </account_number>
          <branch_name> Perryridge </branch_name>
          <balance> 400 </balance>
        </account>
        <account> … </account>
      </customer>
      …
    </bank-1>

Motivation for Nesting

• Nesting of data is useful in data transfer
  ◦ Example: elements representing customer_id, customer_name, and address nested within an order element
• Nesting is not supported, or is discouraged, in relational databases
  ◦ With multiple orders, customer name and address are stored redundantly
  ◦ Normalization replaces the nested structures in each order by a foreign key into a table storing customer name and address information
  ◦ Nesting is supported in object-relational databases
• But nesting is appropriate when transferring data
  ◦ An external application does not have direct access to data referenced by a foreign key

Structure of XML Data (Cont.)
• A mixture of text with subelements is legal in XML.
  ◦ Example:
      <account>
        This account is seldom used any more.
        <account_number> A-102</account_number>
        <branch_name> Perryridge</branch_name>
        <balance>400 </balance>
      </account>
  ◦ Useful for document markup, but discouraged for data representation

Attributes

• Elements can have attributes
      <account acct-type="checking">
        <account_number> A-102 </account_number>
        <branch_name> Perryridge </branch_name>
        <balance> 400 </balance>
      </account>
• Attributes are specified by name=value pairs inside the starting tag of an element
• An element may have several attributes, but each attribute name can only occur once
      <account acct-type="checking" monthly-fee="5">
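The sketch below again uses Python's `xml.etree.ElementTree` as an illustrative parser. It reads attribute values from a start tag and shows that repeating an attribute name is rejected. `parses` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

account = ET.fromstring(
    '<account acct-type="checking" monthly-fee="5">'
    "<account_number>A-102</account_number>"
    "</account>"
)

# Attributes appear as a name -> value mapping on the element; values are strings.
print(account.get("acct-type"))    # checking
print(account.get("monthly-fee"))  # 5
print(account.get("overdraft"))    # None: attribute not present


def parses(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The same attribute name twice in one start tag is rejected by the parser.
print(parses('<account acct-type="checking" acct-type="savings"/>'))  # False
```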

Attributes vs. Subelements

• Distinction between subelement and attribute
  ◦ In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents
  ◦ In the context of data representation, the difference is unclear and may be confusing
    ▪ The same information can be represented in two ways
      – <account account_number="A-101"> … </account>
      – <account>
          <account_number>A-101</account_number> …
        </account>
  ◦ Suggestion: use attributes for identifiers of elements, and use subelements for contents

Namespaces

• XML data has to be exchanged between organizations
• The same tag name may have different meanings in different organizations, causing confusion in exchanged documents
• Specifying a unique string as an element name avoids confusion
• Better solution: use unique-name:element-name
• Avoid using long unique names all over the document by using XML namespaces
      <bank xmlns:FB="http://www.FirstBank.com">
        <FB:branch>
          <FB:branchname>Downtown</FB:branchname>
          <FB:branchcity> Brooklyn </FB:branchcity>
        </FB:branch>
      </bank>
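In Python's `xml.etree.ElementTree`, which is used here only as an illustration, a namespace prefix is just a local abbreviation. After parsing, element names carry the full URI, and a query may bind a different prefix to the same URI. The prefix name `first` below is an arbitrary assumption.

```python
import xml.etree.ElementTree as ET

doc = """<bank xmlns:FB="http://www.FirstBank.com">
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
    <FB:branchcity> Brooklyn </FB:branchcity>
  </FB:branch>
</bank>"""

root = ET.fromstring(doc)

# The parser replaces the prefix FB with the namespace URI it is bound to,
# so the element name becomes "{URI}localname".
branch = root[0]
print(branch.tag)  # {http://www.FirstBank.com}branch

# Queries may bind any prefix they like (here the hypothetical "first") to the same URI.
ns = {"first": "http://www.FirstBank.com"}
print(root.find("first:branch/first:branchname", ns).text)  # Downtown
```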

More on XML Syntax

• Elements without subelements or text content can be abbreviated by ending the start tag with /> and deleting the end tag
  ◦ <account number="A-101" branch="Perryridge" balance="200"/>
• To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below
  ◦ <![CDATA[<account> … </account>]]>
    Here, <account> and </account> are treated as just strings.
    CDATA stands for “character data”.
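Here is a minimal check with Python's `xml.etree.ElementTree`, using a hypothetical wrapper element `note`. The CDATA content comes back as the plain text of the element, and no `account` subelement is created.

```python
import xml.etree.ElementTree as ET

note = ET.fromstring("<note><![CDATA[<account> A-101 </account>]]></note>")

print(len(note))  # 0: the CDATA section created no subelements
print(note.text)  # <account> A-101 </account>
```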

XML Document Schema

• Database schemas constrain what information can be stored, and the data types of stored values
• XML documents are not required to have an associated schema
• However, schemas are very important for XML data exchange
  ◦ Otherwise, a site cannot automatically interpret data received from another site
• Two mechanisms for specifying XML schema
  ◦ Document Type Definition (DTD)
    ▪ Widely used
  ◦ XML Schema
    ▪ Newer, increasing use

Document Type Definition (DTD)

• The type of an XML document can be specified using a DTD
• A DTD constrains the structure of XML data
  ◦ What elements can occur
  ◦ What attributes an element can/must have
  ◦ What subelements can/must occur inside each element, and how many times
• A DTD does not constrain data types
  ◦ All values are represented as strings in XML
• DTD syntax
  ◦ <!ELEMENT element (subelements-specification) >
  ◦ <!ATTLIST element (attributes) >

Element Specification in DTD
• Subelements can be specified as
  ◦ names of elements, or
  ◦ #PCDATA (parsed character data), i.e., character strings
  ◦ EMPTY (no subelements) or ANY (anything can be a subelement)
• Example
      <!ELEMENT depositor (customer_name, account_number)>
      <!ELEMENT customer_name (#PCDATA)>
      <!ELEMENT account_number (#PCDATA)>
• Subelement specification may have regular expressions
      <!ELEMENT bank ( ( account | customer | depositor)+)>
    ▪ Notation:
      – “,” - sequence (in the given order)
      – “|” - alternatives
      – “+” - 1 or more occurrences
      – “*” - 0 or more occurrences
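Python's standard library does not validate documents against a DTD. The sketch below therefore only illustrates the idea behind the notation: a content model is a regular expression over the sequence of child element names. The regex encoding and the helper `children_match` are assumptions made for illustration. A real validating parser derives these checks, and many more, from the DTD itself.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical encoding of DTD content models as Python regexes over the
# child tag names, each written out followed by one space.
# <!ELEMENT bank ((account | customer | depositor)+)>
BANK_MODEL = re.compile(r"(?:(?:account|customer|depositor) )+")
# <!ELEMENT depositor (customer_name, account_number)>
DEPOSITOR_MODEL = re.compile(r"customer_name account_number ")


def children_match(elem, model):
    """True if the sequence of child tags of `elem` matches `model` exactly."""
    child_tags = "".join(child.tag + " " for child in elem)
    return model.fullmatch(child_tags) is not None


bank = ET.fromstring(
    "<bank><account/><depositor><customer_name/><account_number/></depositor></bank>"
)
swapped = ET.fromstring("<depositor><account_number/><customer_name/></depositor>")

print(children_match(bank, BANK_MODEL))                         # True
print(children_match(bank.find("depositor"), DEPOSITOR_MODEL))  # True
print(children_match(swapped, DEPOSITOR_MODEL))                 # False: wrong order
print(children_match(ET.fromstring("<bank/>"), BANK_MODEL))     # False: "+" needs one
```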

Bank DTD

    <!DOCTYPE bank [
      <!ELEMENT bank ( ( account | customer | depositor)+)>
      <!ELEMENT account (account_number, branch_name, balance)>
      <!ELEMENT customer (customer_name, customer_street, customer_city)>
      <!ELEMENT depositor (customer_name, account_number)>
      <!ELEMENT account_number (#PCDATA)>
      <!ELEMENT branch_name (#PCDATA)>
      <!ELEMENT balance (#PCDATA)>
      <!ELEMENT customer_name (#PCDATA)>
      <!ELEMENT customer_street (#PCDATA)>
      <!ELEMENT customer_city (#PCDATA)>
    ]>

Attribute Specification in DTD

• Attribute specification: for each attribute
  ◦ Name
  ◦ Type of attribute
    ▪ CDATA
    ▪ ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
      – more on this later
  ◦ Whether
    ▪ mandatory (#REQUIRED)
    ▪ has a default value (value),
    ▪ or neither (#IMPLIED)
• Examples
  ◦ <!ATTLIST account acct-type CDATA "checking">
  ◦ <!ATTLIST customer
        customer_id ID #REQUIRED
        accounts IDREFS #REQUIRED >

IDs and IDREFs

• An element can have at most one attribute of type ID
• The ID attribute value of each element in an XML document must be distinct
  ◦ Thus the ID attribute value is an object identifier
• An attribute of type IDREF must contain the ID value of an element in the same document
• An attribute of type IDREFS contains a set of (one or more) blank-separated ID values. Each of these values must be the ID value of an element in the same document

Bank DTD with Attributes

• Bank DTD with ID and IDREF attribute types.
      <!DOCTYPE bank-2 [
        <!ELEMENT account (branch_name, balance)>
        <!ATTLIST account
            account_number ID #REQUIRED
            owners IDREFS #REQUIRED>
        <!ELEMENT customer (customer_name, customer_street, customer_city)>
        <!ATTLIST customer
            customer_id ID #REQUIRED
            accounts IDREFS #REQUIRED>
        … declarations for branch_name, balance, customer_name,
          customer_street and customer_city
      ]>

XML data with ID and IDREF attributes

    <bank-2>
      <account account_number="A-401" owners="C100 C102">
        <branch_name> Downtown </branch_name>
        <balance> 500 </balance>
      </account>
      <customer customer_id="C100" accounts="A-401">
        <customer_name>Joe </customer_name>
        <customer_street> Monroe </customer_street>
        <customer_city> Madison</customer_city>
      </customer>
      <customer customer_id="C102" accounts="A-401 A-402">
        <customer_name> Mary </customer_name>
        <customer_street> Erin </customer_street>
        <customer_city> Newark </customer_city>
      </customer>
    </bank-2>
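Python's `xml.etree.ElementTree` does not read DTDs, so it does not check ID and IDREF constraints. The sketch below checks the two rules by hand on an abridged copy of the bank-2 data, with the street and city elements dropped. The tables `ID_ATTR` and `IDREFS_ATTR` are hypothetical stand-ins for what the DTD declares. Customer C102 refers to A-402, which is not defined in this excerpt, so the check reports it. A validating parser would reject the document for the same reason.

```python
import xml.etree.ElementTree as ET

# Abridged bank-2 data (street and city subelements omitted).
BANK2 = """<bank-2>
  <account account_number="A-401" owners="C100 C102">
    <branch_name> Downtown </branch_name> <balance> 500 </balance>
  </account>
  <customer customer_id="C100" accounts="A-401">
    <customer_name>Joe </customer_name>
  </customer>
  <customer customer_id="C102" accounts="A-401 A-402">
    <customer_name> Mary </customer_name>
  </customer>
</bank-2>"""

# Hypothetical stand-ins for the DTD's ATTLIST declarations: which attribute
# is the ID, and which holds IDREFS, for each element type.
ID_ATTR = {"account": "account_number", "customer": "customer_id"}
IDREFS_ATTR = {"account": "owners", "customer": "accounts"}


def check_ids(root):
    """Report duplicate IDs and IDREFS values that match no ID."""
    ids, problems = set(), []
    for elem in root.iter():
        attr = ID_ATTR.get(elem.tag)
        if attr is not None:
            value = elem.get(attr)
            if value in ids:
                problems.append(f"duplicate ID {value}")
            ids.add(value)
    for elem in root.iter():
        attr = IDREFS_ATTR.get(elem.tag)
        if attr is not None:
            for ref in elem.get(attr, "").split():  # IDREFS: blank-separated
                if ref not in ids:
                    problems.append(f"dangling IDREF {ref}")
    return problems


print(check_ids(ET.fromstring(BANK2)))  # ['dangling IDREF A-402']
```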

Limitations of DTDs

• No typing of text elements and attributes
  ◦ All values are strings; there are no integers, reals, etc.
• Difficult to specify unordered sets of subelements
  ◦ Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved)
  ◦ (A | B)* allows specification of an unordered set, but
    ▪ cannot ensure that each of A and B occurs only once
• IDs and IDREFs are untyped
  ◦ The owners attribute of an account may contain a reference to another account, which is meaningless
    ▪ The owners attribute should ideally be constrained to refer to customer elements

XML Schema

• XML Schema is a more sophisticated schema language that addresses the drawbacks of DTDs. It supports
  ◦ Typing of values
    ▪ E.g. integer, string, etc.
    ▪ Also, constraints on min/max values
  ◦ User-defined, complex types
  ◦ Many more features, including
    ▪ uniqueness and foreign key constraints, inheritance
• XML Schema is itself specified in XML syntax, unlike DTDs
  ◦ More standard representation, but verbose
• XML Schema is integrated with namespaces
• BUT: XML Schema is significantly more complicated than DTDs.

XML Schema Version of Bank DTD
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="bank" type="BankType"/>
      <xs:element name="account">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="account_number" type="xs:string"/>
            <xs:element name="branch_name" type="xs:string"/>
            <xs:element name="balance" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      ….. definitions of customer and depositor ….
      <xs:complexType name="BankType">
        <xs:sequence>
          <xs:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:schema>

XML Schema Version of Bank DTD

• The choice of “xs:” was ours; any other namespace prefix could be chosen
• Element “bank” has type “BankType”, which is defined separately
  ◦ xs:complexType is used later to create the named complex type “BankType”
• Element “account” has its type defined in-line

More features of XML Schema

• Attributes are specified by the xs:attribute tag:
  ◦ <xs:attribute name="account_number"/>
  ◦ Adding the attribute use="required" means a value must be specified
• Key constraint: “account numbers form a key for account elements under the root bank element”. The constraint is declared inside the declaration of the bank element, so its selector path is relative to bank:
      <xs:key name="accountKey">
        <xs:selector xpath="account"/>
        <xs:field xpath="account_number"/>
      </xs:key>
• Foreign key constraint from depositor to account (also declared inside the bank element):
      <xs:keyref name="depositorAccountKey" refer="accountKey">
        <xs:selector xpath="depositor"/>
        <xs:field xpath="account_number"/>
      </xs:keyref>

Querying and Transforming XML Data

• Translation of information from one XML schema to another
• Querying on XML data
• The above two are closely related, and are handled by the same tools
• Standard XML querying/translation languages
  ◦ XPath
    ▪ Simple language consisting of path expressions
  ◦ XSLT
    ▪ Simple language designed for translation from XML to XML and from XML to HTML
  ◦ XQuery
    ▪ An XML query language with a rich set of features

Tree Model of XML Data

• Query and transformation languages are based on a tree model of XML data
• An XML document is modeled as a tree, with nodes corresponding to elements and attributes
  ◦ Element nodes have child nodes, which can be attributes or subelements
  ◦ Text in an element is modeled as a text node child of the element
  ◦ Children of a node are ordered according to their order in the XML document
  ◦ Element and attribute nodes (except for the root node) have a single parent, which is an element node
  ◦ The root node has a single child, which is the root element of the document

XPath

• XPath is used to address (select) parts of documents using path expressions
• A path expression is a sequence of steps separated by “/”
  ◦ Think of file names in a directory hierarchy
• Result of a path expression: the set of nodes (together with their contents) that match the specified path
• E.g. /bank-2/customer/customer_name evaluated on the bank-2 data we saw earlier returns
      <customer_name>Joe</customer_name>
      <customer_name>Mary</customer_name>
• E.g. /bank-2/customer/customer_name/text( )
  returns the same names, but without the enclosing tags

XPath (Cont.)

• The initial “/” denotes the root of the document (above the top-level tag)
• Path expressions are evaluated left to right
  ◦ Each step operates on the set of instances produced by the previous step
• Selection predicates may follow any step in a path, in [ ]
  ◦ E.g. /bank-2/account[balance > 400]
    ▪ returns account elements with a balance value greater than 400
    ▪ /bank-2/account[balance] returns account elements containing a balance subelement
• Attributes are accessed using “@”
  ◦ E.g. /bank-2/account[balance > 400]/@account_number
    ▪ returns the account numbers of accounts with balance > 400
  ◦ IDREF attributes are not dereferenced automatically (more on this later)
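Python's `xml.etree.ElementTree` implements only a limited subset of XPath. That makes it a handy way to see what a path expression selects, and where plain code has to take over. The data is an abridged bank-2. A hypothetical second account A-402 (balance 300) is added so that the predicate has something to filter out.

```python
import xml.etree.ElementTree as ET

# Abridged bank-2 data; account A-402 is hypothetical, added for the predicate.
root = ET.fromstring("""<bank-2>
  <account account_number="A-401" owners="C100 C102">
    <branch_name>Downtown</branch_name><balance>500</balance>
  </account>
  <account account_number="A-402">
    <branch_name>Perryridge</branch_name><balance>300</balance>
  </account>
  <customer customer_id="C100"><customer_name>Joe</customer_name></customer>
  <customer customer_id="C102"><customer_name>Mary</customer_name></customer>
</bank-2>""")

# root is the bank-2 element, so /bank-2/customer/customer_name becomes a
# path relative to it; .text plays the role of text().
names = [e.text for e in root.findall("customer/customer_name")]
print(names)  # ['Joe', 'Mary']

# Existence predicates such as account[balance] are supported ...
with_balance = root.findall("account[balance]")
print(len(with_balance))  # 2

# ... but numeric comparisons like [balance > 400] are not, so filter in Python.
rich = [a for a in root.findall("account") if float(a.findtext("balance")) > 400]
# /@account_number corresponds to Element.get("account_number").
print([a.get("account_number") for a in rich])  # ['A-401']
```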

Functions in XPath
• XPath provides several functions
  ◦ The function count() applied to a path counts the number of elements in the set generated by the path
    ▪ E.g. /bank-2/account[count(./customer) > 2]
      – Returns accounts with more than 2 customer subelements
  ◦ There is also a function, position(), for testing the position (1, 2, …) of a node relative to its siblings
• The Boolean connectives and and or, and the function not(), can be used in predicates
• IDREFs can be dereferenced using the function id()
  ◦ id() can also be applied to sets of references such as IDREFS, and even to strings containing multiple references separated by blanks
  ◦ E.g. /bank-2/account/id(@owners)
    ▪ returns all customers referred to from the owners attribute of account elements.
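Here is a rough Python emulation of `id()`, assuming abridged bank-2 data and a hypothetical `ID_ATTR` table. It indexes every element by its ID attribute once, then splits an IDREFS value on blanks and looks each part up. Unlike XPath, the resulting list keeps duplicates and is not sorted into document order.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<bank-2>
  <account account_number="A-401" owners="C100 C102"><balance>500</balance></account>
  <customer customer_id="C100"><customer_name>Joe</customer_name></customer>
  <customer customer_id="C102"><customer_name>Mary</customer_name></customer>
</bank-2>""")

# Hypothetical table of ID attributes (what the DTD declares), used to index
# every element by its ID value once.
ID_ATTR = {"account": "account_number", "customer": "customer_id"}
by_id = {e.get(ID_ATTR[e.tag]): e for e in root.iter() if e.tag in ID_ATTR}


def deref(idrefs):
    """Emulate id(): split a blank-separated value and look up each ID."""
    return [by_id[ref] for ref in idrefs.split() if ref in by_id]


# Roughly /bank-2/account/id(@owners)
owners = [c for a in root.findall("account") for c in deref(a.get("owners"))]
print([c.findtext("customer_name") for c in owners])  # ['Joe', 'Mary']
```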

More XPath Features
• The operator “|” is used to implement union
  ◦ E.g. /bank-2/account/id(@owners) | /bank-2/loan/id(@borrower)
    ▪ Gives customers with either accounts or loans
    ▪ However, “|” cannot be nested inside other operators.
• “//” can be used to skip multiple levels of nodes
  ◦ E.g. /bank-2//customer_name
    ▪ finds any customer_name element anywhere under the /bank-2 element, regardless of the element in which it is contained.
• A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children
  ◦ “//”, described above, is a short form for specifying “all descendants”
  ◦ “..” specifies the parent.
• doc(name) returns the root of a named document
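The sketch below tries `//`, `..` and union in Python's `xml.etree.ElementTree`. It uses an abridged bank-1 document whose second customer, Johnson, is borrowed from the earlier bank example. ElementTree has no `|` operator, so the union is assembled in Python, keeping each node once as XPath does.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<bank-1>
  <customer>
    <customer_name>Hayes</customer_name>
    <account><account_number>A-102</account_number></account>
  </customer>
  <customer>
    <customer_name>Johnson</customer_name>
  </customer>
</bank-1>""")

# /bank-1//customer_name : customer_name elements at any depth below bank-1
names = root.findall(".//customer_name")
print(len(names))  # 2

# ".." selects the parent; ElementTree only supports it below the start element.
parents = root.findall(".//account_number/..")
print([p.tag for p in parents])  # ['account']

# Union: concatenate both results, then drop nodes already seen (keeping order).
both = root.findall(".//customer_name") + root.findall("customer/customer_name")
union = list(dict.fromkeys(both))
print(len(both), len(union))  # 4 2
```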

XQuery
• XQuery is a general-purpose query language for XML data
• Currently being standardized by the World Wide Web Consortium (W3C)
  ◦ The textbook description is based on a January 2005 draft of the standard. The final version may differ, but major features are likely to stay unchanged.
• XQuery is derived from the Quilt query language, which itself borrows from SQL, XQL and XML-QL
• XQuery uses a
      for … let … where … order by … return …
  syntax
      for ⇔ SQL from
      where ⇔ SQL where
      order by ⇔ SQL order by
      return ⇔ SQL select
      let allows temporary variables, and has no equivalent in SQL

FLWOR Syntax in XQuery
• The for clause uses XPath expressions, and the variable in the for clause ranges over the values in the set returned by XPath
• Simple FLWOR expression in XQuery
  ◦ Find all accounts with balance > 400, with each result enclosed in an <account_number> .. </account_number> tag
        for $x in /bank-2/account
        let $acctno := $x/@account_number
        where $x/balance > 400
        return <account_number> { data($acctno) } </account_number>
  ◦ Items in the return clause are XML text unless enclosed in {}, in which case they are evaluated
  ◦ data() extracts the attribute's value; an attribute node placed directly inside { } would instead become an attribute of the constructed element
• The let clause is not really needed in this query, and the selection can be done in XPath. The query can be written as:
        for $x in /bank-2/account[balance > 400]
        return <account_number> { data($x/@account_number) } </account_number>
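To make the clause-by-clause reading concrete, here is the same query written as a Python loop over an `xml.etree.ElementTree` document, one line per FLWOR clause. It illustrates the semantics only; it is not how an XQuery engine evaluates queries. Account A-402 with balance 300 is hypothetical data, added so that the `where` clause rejects something.

```python
import xml.etree.ElementTree as ET

# Abridged bank-2 data; account A-402 (balance 300) is hypothetical.
bank2 = ET.fromstring("""<bank-2>
  <account account_number="A-401"><balance>500</balance></account>
  <account account_number="A-402"><balance>300</balance></account>
</bank-2>""")

results = []
for x in bank2.findall("account"):           # for $x in /bank-2/account
    acctno = x.get("account_number")         # let $acctno := $x/@account_number
    if float(x.findtext("balance")) > 400:   # where $x/balance > 400
        out = ET.Element("account_number")   # return a new <account_number>
        out.text = acctno                    # whose content is the account number
        results.append(out)

print([ET.tostring(e, encoding="unicode") for e in results])
# ['<account_number>A-401</account_number>']
```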

Joins
• Joins are specified in a manner very similar to SQL
      for $a in /bank/account,
          $c in /bank/customer,
          $d in /bank/depositor
      where $a/account_number = $d/account_number
        and $c/customer_name = $d/customer_name
      return <cust_acct> { $c, $a } </cust_acct>
• The same query can be expressed with the selections specified as XPath selections:
      for $a in /bank/account,
          $c in /bank/customer,
          $d in /bank/depositor[
                account_number = $a/account_number and
                customer_name = $c/customer_name]
      return <cust_acct> { $c, $a } </cust_acct>
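Read operationally, the first form is a nested-loop join. The Python sketch below spells this out over a small flat bank document built from the earlier bank example. `copy.deepcopy` mirrors the fact that XQuery element constructors copy the nodes they enclose.

```python
import copy
import xml.etree.ElementTree as ET

bank = ET.fromstring("""<bank>
  <account><account_number>A-101</account_number>
    <branch_name>Downtown</branch_name><balance>500</balance></account>
  <customer><customer_name>Johnson</customer_name></customer>
  <depositor><account_number>A-101</account_number>
    <customer_name>Johnson</customer_name></depositor>
</bank>""")

joined = []
for a in bank.findall("account"):                  # for $a in /bank/account,
    for c in bank.findall("customer"):             #     $c in /bank/customer,
        for d in bank.findall("depositor"):        #     $d in /bank/depositor
            if (a.findtext("account_number") == d.findtext("account_number")
                    and c.findtext("customer_name") == d.findtext("customer_name")):
                cust_acct = ET.Element("cust_acct")  # return <cust_acct>...
                cust_acct.extend([copy.deepcopy(c), copy.deepcopy(a)])
                joined.append(cust_acct)

print(len(joined))  # 1
print(ET.tostring(joined[0], encoding="unicode"))
```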

Nested Queries
• The following query converts data from the flat structure for bank information into the nested structure used in bank-1
      <bank-1> {
        for $c in /bank/customer
        return
          <customer>
            { $c/* }
            { for $d in /bank/depositor[customer_name = $c/customer_name],
                  $a in /bank/account[account_number = $d/account_number]
              return $a }
          </customer>
      } </bank-1>
• $c/* denotes all the children of the node to which $c is bound, without the enclosing top-level tag
• $c/text() gives the text content of an element without any subelements/tags
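Here is the same restructuring as a Python sketch over a small flat bank document. Johnson and A-101 come from the earlier bank example, and Hayes and A-102 come from bank-1. `to_bank1` is a hypothetical helper whose loops mirror the query's nested `for` clauses.

```python
import copy
import xml.etree.ElementTree as ET

flat = ET.fromstring("""<bank>
  <account><account_number>A-101</account_number><balance>500</balance></account>
  <account><account_number>A-102</account_number><balance>400</balance></account>
  <customer><customer_name>Johnson</customer_name></customer>
  <customer><customer_name>Hayes</customer_name></customer>
  <depositor><account_number>A-101</account_number><customer_name>Johnson</customer_name></depositor>
  <depositor><account_number>A-102</account_number><customer_name>Hayes</customer_name></depositor>
</bank>""")


def to_bank1(bank):
    """Hypothetical helper: nest each customer's accounts inside the customer."""
    bank1 = ET.Element("bank-1")
    for c in bank.findall("customer"):                # for $c in /bank/customer
        cust = ET.SubElement(bank1, "customer")
        cust.extend([copy.deepcopy(ch) for ch in c])  # { $c/* }
        for d in bank.findall("depositor"):           # depositor[customer_name = ...]
            if d.findtext("customer_name") != c.findtext("customer_name"):
                continue
            for a in bank.findall("account"):         # account[account_number = ...]
                if a.findtext("account_number") == d.findtext("account_number"):
                    cust.append(copy.deepcopy(a))     # return $a
    return bank1


nested = to_bank1(flat)
print([c.findtext("customer_name") for c in nested])            # ['Johnson', 'Hayes']
print([c.find("account/account_number").text for c in nested])  # ['A-101', 'A-102']
```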

Sorting in XQuery
• An order by clause can be added to any FLWOR expression, just before its return clause. E.g. to return customers sorted by name
      for $c in /bank/customer
      order by $c/customer_name
      return <customer> { $c/* } </customer>
• Use order by $c/customer_name descending to sort in descending order
• Can sort at multiple levels of nesting (sort by customer_name, and by account_number within each customer)
      <bank-1> {
        for $c in /bank/customer
        order by $c/customer_name
        return
          <customer>
            { $c/* }
            { for $d in /bank/depositor[customer_name = $c/customer_name],
                  $a in /bank/account[account_number = $d/account_number]
              order by $a/account_number
              return <account> { $a/* } </account> }
          </customer>
      } </bank-1>
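In a Python sketch, the `order by` clause corresponds to `sorted` with a key function, and `descending` corresponds to `reverse=True`. The customer list below is small illustrative data using names from the earlier examples.

```python
import xml.etree.ElementTree as ET

bank = ET.fromstring("""<bank>
  <customer><customer_name>Johnson</customer_name></customer>
  <customer><customer_name>Hayes</customer_name></customer>
  <customer><customer_name>Mary</customer_name></customer>
</bank>""")


def name_of(c):
    """Sort key: the customer_name text, as in order by $c/customer_name."""
    return c.findtext("customer_name")


ascending = sorted(bank.findall("customer"), key=name_of)
descending = sorted(bank.findall("customer"), key=name_of, reverse=True)
print([name_of(c) for c in ascending])   # ['Hayes', 'Johnson', 'Mary']
print([name_of(c) for c in descending])  # ['Mary', 'Johnson', 'Hayes']
```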

Functions and Other XQuery Features

• User-defined functions with the type system of XML Schema
      declare function local:balances($c as xs:string) as xs:decimal* {
        for $d in doc("bank.xml")/bank/depositor[customer_name = $c],
            $a in doc("bank.xml")/bank/account[account_number = $d/account_number]
        return $a/balance
      };
  ◦ A function body has no context item, so the input document is named explicitly; “bank.xml” is a placeholder name
• Types are optional for function parameters and return values
• The * (as in decimal*) indicates a sequence of values of that type
• Universal and existential quantification in where clause predicates
  ◦ some $e in path satisfies P
  ◦ every $e in path satisfies P
• XQuery also supports if-then-else clauses
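Below is a Python analog of `balances` and of the two quantifiers. `any` plays the role of `some … satisfies`, and `all` plays the role of `every … satisfies`. The data is hypothetical: Johnson is assumed to hold both A-101 and A-102. `Decimal` stands in for `xs:decimal`.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical data: Johnson is assumed to hold both A-101 and A-102.
bank = ET.fromstring("""<bank>
  <account><account_number>A-101</account_number><balance>500</balance></account>
  <account><account_number>A-102</account_number><balance>400</balance></account>
  <depositor><account_number>A-101</account_number><customer_name>Johnson</customer_name></depositor>
  <depositor><account_number>A-102</account_number><customer_name>Johnson</customer_name></depositor>
</bank>""")


def balances(bank, c):
    """Analog of balances($c): balances of all accounts held by customer c."""
    result = []
    for d in bank.findall("depositor"):
        if d.findtext("customer_name") != c:
            continue
        for a in bank.findall("account"):
            if a.findtext("account_number") == d.findtext("account_number"):
                result.append(Decimal(a.findtext("balance")))
    return result


bs = balances(bank, "Johnson")
print(bs)  # [Decimal('500'), Decimal('400')]

# some $b in ... satisfies $b > 450   and   every $b in ... satisfies $b > 450
print(any(b > 450 for b in bs), all(b > 450 for b in bs))  # True False
```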

XSLT

• A stylesheet stores formatting options for a document, usually separately from the document
  ◦ E.g. an HTML style sheet may specify font colors and sizes for headings, etc.
• The Extensible Stylesheet Language (XSL) was originally designed for generating HTML from XML
• XSLT is a general-purpose transformation language
  ◦ Can translate XML to XML, and XML to HTML
• XSLT transformations are expressed using rules called templates
  ◦ Templates combine selection using XPath with construction of results

XSLT Templates
• Example of an XSLT template with match and select parts
      <xsl:template match="/bank-2/customer">
        <xsl:value-of select="customer_name"/>
      </xsl:template>
      <xsl:template match="*"/>
• The match attribute of xsl:template specifies a pattern in XPath
• Elements in the XML document matching the pattern are processed by the actions within the xsl:template element
  ◦ xsl:value-of selects (outputs) specified values (here, customer_name)
• For elements that do not match any template
  ◦ Attributes and text contents are output as is
  ◦ Templates are recursively applied on subelements
• The <xsl:template match="*"/> template matches all elements that do not match any other template
  ◦ Used to ensure that their contents do not get output.
• If an element matches several templates, only one is used, based on a complex priority scheme/user-defined priorities

Creating XML Output
• Any text or tag in the XSL stylesheet that is not in the xsl namespace is output as is
• E.g. to wrap results in new XML elements:
      <xsl:template match="/bank-2/customer">
        <customer>
          <xsl:value-of select="customer_name"/>
        </customer>
      </xsl:template>
      <xsl:template match="*"/>
  ◦ Example output:
      <customer> Joe </customer>
      <customer> Mary </customer>

Creating XML Output (Cont.)
• Note: an xsl:value-of tag cannot be inserted directly inside another tag
  ◦ E.g. an attribute for <customer> in the previous example cannot be created by directly using xsl:value-of
  ◦ XSLT provides a construct, xsl:attribute, to handle this situation
    ▪ xsl:attribute adds an attribute to the enclosing output element
    ▪ E.g.
        <customer>
          <xsl:attribute name="customer_id">
            <xsl:value-of select="@customer_id"/>
          </xsl:attribute>
        </customer>
      results in output of the form
        <customer customer_id="…"> …
• xsl:element is used to create output elements with computed names

Structural Recursion
• A template action can apply templates recursively to the contents of a matched element
      <xsl:template match="/bank">
        <customers>
          <xsl:apply-templates/>
        </customers>
      </xsl:template>
      <xsl:template match="customer">
        <customer>
          <xsl:value-of select="customer_name"/>
        </customer>
      </xsl:template>
      <xsl:template match="*"/>
• Example output:
      <customers>
        <customer> John </customer>
        <customer> Mary </customer>
      </customers>
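The recursion can be imitated in a few lines of Python. This is a deliberately simplified, hypothetical model of XSLT processing. Templates are looked up by exact tag name rather than by XPath patterns, and text nodes and attributes are ignored, whereas XSLT's built-in rules would output text. The input document is illustrative data matching the example output.

```python
import xml.etree.ElementTree as ET


# Hypothetical mini "stylesheet": each template takes the matched element and
# a callback that applies templates to that element's children.
def bank_template(elem, apply_templates):
    # <xsl:template match="/bank">: wrap the processed children in <customers>
    out = ET.Element("customers")
    out.extend(apply_templates(elem))  # <xsl:apply-templates/>
    return [out]


def customer_template(elem, apply_templates):
    # <xsl:template match="customer">
    out = ET.Element("customer")
    out.text = elem.findtext("customer_name")  # <xsl:value-of select="customer_name"/>
    return [out]


TEMPLATES = {"customer": customer_template}


def apply_templates(elem):
    """Apply the matching template to each child element; children with no
    template produce nothing, as with <xsl:template match="*"/>."""
    results = []
    for child in elem:
        template = TEMPLATES.get(child.tag)
        if template is not None:
            results.extend(template(child, apply_templates))
    return results


doc = ET.fromstring("""<bank>
  <customer><customer_name>John</customer_name></customer>
  <account><account_number>A-101</account_number></account>
  <customer><customer_name>Mary</customer_name></customer>
</bank>""")

# Processing starts at the document's root element, matched by the /bank template.
(result,) = bank_template(doc, apply_templates)
print(ET.tostring(result, encoding="unicode"))
# <customers><customer>John</customer><customer>Mary</customer></customers>
```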

Joins in XSLT

• XSLT keys allow elements to be looked up (indexed) by values of subelements or attributes
  ◦ Keys must be declared (with a name), and the key() function can then be used for lookup. E.g.
        <xsl:key name="acctno" match="account" use="account_number"/>
        <xsl:value-of select="key('acctno', 'A-101')"/>
• Keys permit (some) joins to be expressed in XSLT
      <xsl:key name="acctno" match="account" use="account_number"/>
      <xsl:key name="custno" match="customer" use="customer_name"/>
      <xsl:template match="depositor">
        <cust_acct>
          <xsl:value-of select="key('custno', customer_name)"/>
          <xsl:value-of select="key('acctno', account_number)"/>
        </cust_acct>
      </xsl:template>
      <xsl:template match="*"/>
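An `xsl:key` behaves like an index built once per document, so each lookup replaces a scan. The Python sketch below builds such indices with a dictionary. `build_key` is a hypothetical helper, and the data is a small flat bank document based on the earlier examples.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

bank = ET.fromstring("""<bank>
  <account><account_number>A-101</account_number><balance>500</balance></account>
  <account><account_number>A-102</account_number><balance>400</balance></account>
  <customer><customer_name>Johnson</customer_name></customer>
  <customer><customer_name>Hayes</customer_name></customer>
  <depositor><account_number>A-101</account_number><customer_name>Johnson</customer_name></depositor>
  <depositor><account_number>A-102</account_number><customer_name>Hayes</customer_name></depositor>
</bank>""")


def build_key(root, match, use):
    """Like <xsl:key match=... use=...>: map each `use` value to the list of
    `match` elements that have it."""
    index = defaultdict(list)
    for elem in root.iter(match):
        index[elem.findtext(use)].append(elem)
    return index


acctno = build_key(bank, "account", "account_number")
custno = build_key(bank, "customer", "customer_name")

cust_accts = []
for d in bank.iter("depositor"):  # <xsl:template match="depositor">
    for c in custno.get(d.findtext("customer_name"), []):    # key('custno', ...)
        for a in acctno.get(d.findtext("account_number"), []):  # key('acctno', ...)
            cust_accts.append((c.findtext("customer_name"), a.findtext("account_number")))

print(cust_accts)  # [('Johnson', 'A-101'), ('Hayes', 'A-102')]
```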

Sorting in XSLT
• Using an xsl:sort directive inside xsl:apply-templates causes the selected elements to be sorted
  ◦ Sorting is done before templates are applied to them
      <xsl:template match="/bank">
        <xsl:apply-templates select="customer">
          <xsl:sort select="customer_name"/>
        </xsl:apply-templates>
      </xsl:template>
      <xsl:template match="customer">
        <customer>
          <xsl:value-of select="customer_name"/>
          <xsl:value-of select="customer_street"/>
          <xsl:value-of select="customer_city"/>
        </customer>
      </xsl:template>
      <xsl:template match="*"/>

Application Program Interface

• There are two standard application program interfaces to XML data:
  ◦ SAX (Simple API for XML)
    ▪ Based on a parser model; the user provides event handlers for parsing events
      – E.g. start of element, end of element
      – Not suitable for database applications
  ◦ DOM (Document Object Model)
    ▪ XML data is parsed into a tree representation
    ▪ A variety of functions is provided for traversing the DOM tree
    ▪ E.g.: the Java DOM API provides a Node interface (with subinterfaces such as Element and Text) with methods
        getParentNode( ), getFirstChild( ), getNextSibling( )
        getAttribute( ), getData( ) (for text nodes)
        getElementsByTagName( ), …
    ▪ Also provides functions for updating the DOM tree
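Python's standard library includes both styles of interface, which makes the contrast easy to see. `xml.sax` calls handler methods as it reads, while `xml.dom.minidom` builds a tree that can be navigated and updated. Python's DOM uses attributes such as `parentNode` and `firstChild` where the Java API has `getParentNode()` and `getFirstChild()`. The handler class name is our own.

```python
import xml.dom.minidom
import xml.sax

DOC = b"""<bank><account><account_number>A-101</account_number>
<balance>500</balance></account></bank>"""


# SAX: the parser pushes events to handler methods while it reads the input.
class EventRecorder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


handler = EventRecorder()
xml.sax.parseString(DOC, handler)
print(handler.events[:3])
# [('start', 'bank'), ('start', 'account'), ('start', 'account_number')]

# DOM: the whole document becomes a tree that can be traversed and updated.
dom = xml.dom.minidom.parseString(DOC)
balance = dom.getElementsByTagName("balance")[0]
print(balance.parentNode.tagName)  # account
print(balance.firstChild.data)     # 500
balance.firstChild.data = "550"    # update the tree in place
print(dom.documentElement.toxml())
```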

Storage of XML Data

• XML data can be stored in
  ◦ Non-relational data stores
    ▪ Flat files
      – Natural for storing XML
      – But have all the problems discussed in Chapter 1 (no concurrency, no recovery, …)
    ▪ XML database
      – Database built specifically for storing XML data, supporting the DOM model and declarative querying
      – Currently no commercial-grade systems
  ◦ Relational databases
    ▪ Data must be translated into relational form
    ▪ Advantage: mature database systems
    ▪ Disadvantage: overhead of translating data and queries

Storage of XML in Relational Databases

• Alternatives:
  ◦ String representation
  ◦ Tree representation
  ◦ Map to relations

String Representation
• Store each top-level element as a string field of a tuple in a relational database
  ◦ Use a single relation to store all elements, or
  ◦ Use a separate relation for each top-level element type
    ▪ E.g. account, customer, depositor relations
      – Each with a string-valued attribute to store the element
• Indexing:
  ◦ Store values of subelements/attributes to be indexed as extra fields of the relation, and build indices on these fields
    ▪ E.g. customer_name or account_number
  ◦ Some database systems support function indices, which use the result of a function as the key value.
    ▪ The function should return the value of the required subelement/attribute
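Here is a minimal sketch of this layout using Python's built-in `sqlite3`, assuming one relation per top-level element type. The table `customer`, its `xml` column, and the index name are hypothetical. The customer name is copied out of the element into an indexed column, so a lookup can use the index, and only the matching string has to be parsed again.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one relation per top-level element type, with the
# element stored as a string plus an extra, indexed copy of customer_name.
conn.execute("CREATE TABLE customer (customer_name TEXT, xml TEXT)")
conn.execute("CREATE INDEX customer_name_idx ON customer (customer_name)")


def store_customer(xml_text):
    """Parse once at insert time to extract the field that will be indexed."""
    elem = ET.fromstring(xml_text)
    conn.execute("INSERT INTO customer VALUES (?, ?)",
                 (elem.findtext("customer_name"), xml_text))


store_customer("<customer><customer_name>Hayes</customer_name>"
               "<customer_city>Harrison</customer_city></customer>")
store_customer("<customer><customer_name>Johnson</customer_name></customer>")

# The lookup can use the index; only the string that matched is parsed again.
(xml_text,) = conn.execute(
    "SELECT xml FROM customer WHERE customer_name = ?", ("Hayes",)
).fetchone()
print(ET.fromstring(xml_text).findtext("customer_city"))  # Harrison
```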

String Representation (Cont.)

• Benefits:
  ◦ Can store any XML data, even without a DTD
  ◦ As long as there are many top-level elements in a document, strings are small compared to the full document
    ▪ Allows fast access to individual elements.
• Drawback: need to parse strings to access values inside the elements
  ◦ Parsing is slow.

Tree Representation

• Tree representation: model XML data as a tree and store it using relations
      nodes(id, type, label, value)
      child(child_id, parent_id)
  Example tree (node ids in parentheses):
      bank (id: 1)
        customer (id: 2)
          customer_name (id: 3)
        account (id: 5)
          account_number (id: 7)
• Each element/attribute is given a unique identifier
• Type indicates element/attribute
• Label specifies the tag name of the element/name of the attribute
• Value is the text value of the element/attribute
• The relation child notes the parent-child relationships in the tree
  ◦ Can add an extra attribute to child to record the ordering of children
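The sketch below shreds a small document into `nodes` and `child` tuples in Python and loads them into an in-memory SQLite database. Helper and variable names are hypothetical. Ids are assigned in document order, so they differ from the ids in the example tree. Even asking for the customer names under a customer element already takes two joins through `child` and `nodes`, which illustrates the join overhead of this representation.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count


def shred_tree(root):
    """Return rows for nodes(id, type, label, value) and child(child_id, parent_id)."""
    next_id = count(1)
    nodes, child = [], []

    def visit(elem, parent_id):
        elem_id = next(next_id)
        text = (elem.text or "").strip() or None  # direct text content, if any
        nodes.append((elem_id, "element", elem.tag, text))
        if parent_id is not None:
            child.append((elem_id, parent_id))
        for name, value in elem.attrib.items():  # attributes become nodes too
            attr_id = next(next_id)
            nodes.append((attr_id, "attribute", name, value))
            child.append((attr_id, elem_id))
        for sub in elem:  # children in document order
            visit(sub, elem_id)

    visit(root, None)
    return nodes, child


doc = ET.fromstring(
    "<bank><customer><customer_name>Hayes</customer_name></customer>"
    "<account><account_number>A-102</account_number></account></bank>"
)
nodes, child = shred_tree(doc)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER, type TEXT, label TEXT, value TEXT)")
conn.execute("CREATE TABLE child (child_id INTEGER, parent_id INTEGER)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", nodes)
conn.executemany("INSERT INTO child VALUES (?, ?)", child)

# "customer_name elements whose parent is a customer" already needs two joins.
rows = conn.execute(
    """SELECT n.value
       FROM nodes n
       JOIN child c ON c.child_id = n.id
       JOIN nodes p ON p.id = c.parent_id
       WHERE n.label = 'customer_name' AND p.label = 'customer'"""
).fetchall()
print(rows)  # [('Hayes',)]
```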

Tree Representation (Cont.)

• Benefit: can store any XML data, even without a DTD
• Drawbacks:
  ◦ Data is broken up into too many pieces, increasing space overheads
  ◦ Even simple queries require a large number of joins, which can be slow

Mapping XML Data to Relations

• A relation is created for each element type whose schema is known, with:
  ◦ An id attribute to store a unique id for each element
  ◦ A relation attribute corresponding to each element attribute
  ◦ A parent_id attribute to keep track of the parent element
    ▪ As in the tree representation
    ▪ Position information (ith child) can be stored too
• All subelements that occur only once can become relation attributes
  ◦ For text-valued subelements, store the text as the attribute value
  ◦ For complex subelements, store the id of the subelement
• Subelements that can occur multiple times are represented in a separate table
  ◦ Similar to the handling of multivalued attributes when converting E-R diagrams to tables

Storing XML Data in Relational Systems

• Publishing: the process of converting relational data to an XML format
• Shredding: the process of converting an XML document into a set of tuples to be inserted into one or more relations
• XML-enabled database systems support automated publishing and shredding
• Some systems offer native storage of XML data using the xml data type. Special internal data structures and indices are used for efficiency
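Here is a round trip through Python's `sqlite3`, as a minimal sketch. Shredding maps each `account` element to a tuple whose columns are its single-occurrence, text-valued subelements. Publishing rebuilds `<account>` elements from the rows. The table layout is an assumption. Because the schema here is known and flat, the id and parent_id columns from the mapping scheme are omitted.

```python
import sqlite3
import xml.etree.ElementTree as ET

BANK = """<bank>
  <account><account_number>A-101</account_number>
    <branch_name>Downtown</branch_name><balance>500</balance></account>
  <account><account_number>A-102</account_number>
    <branch_name>Perryridge</branch_name><balance>400</balance></account>
</bank>"""
FIELDS = ("account_number", "branch_name", "balance")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, branch_name TEXT, balance NUMERIC)"
)

# Shredding: one tuple per <account>; each single-occurrence, text-valued
# subelement becomes a column value.
for acc in ET.fromstring(BANK).findall("account"):
    conn.execute("INSERT INTO account VALUES (?, ?, ?)",
                 tuple(acc.findtext(f) for f in FIELDS))

# Publishing: rebuild an XML document from the rows.
out = ET.Element("bank")
for row in conn.execute(
    "SELECT account_number, branch_name, balance FROM account ORDER BY account_number"
):
    acc = ET.SubElement(out, "account")
    for name, value in zip(FIELDS, row):
        ET.SubElement(acc, name).text = str(value)

published = ET.tostring(out, encoding="unicode")
print(published)
```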

SQL/XML

• A new standard SQL extension that allows creation of nested XML output
  ◦ Each output tuple is mapped to an XML element row
      <bank>
        <account>
          <row>
            <account_number> A-101 </account_number>
            <branch_name> Downtown </branch_name>
            <balance> 500 </balance>
          </row>
          …. more rows if there are more output tuples …
        </account>
      </bank>

SQL Extensions

• xmlelement creates XML elements
• xmlattributes creates attributes
      select xmlelement (name "account",
               xmlattributes (account_number as account_number),
               xmlelement (name "branch_name", branch_name),
               xmlelement (name "balance", balance))
      from account

Web Services

• The Simple Object Access Protocol (SOAP) standard:
  ◦ Invocation of procedures across applications with distinct databases
  ◦ XML used to represent procedure input and output
• A Web service is a site providing a collection of SOAP procedures
  ◦ Described using the Web Services Description Language (WSDL)
  ◦ Directories of Web services are described using the Universal Description, Discovery, and Integration (UDDI) standard
