University of Dublin

Trinity College

Fundamentals of XML

Owen.Conlan@scss.tcd.ie
What is Markup
•  <Lecture>
•  Sequence of characters within a text or word
processing file to define
–  Print properties
–  Display properties
–  Document's logical structure
•  Markup indicators are often called "tags"
–  Examples: RTF, EDIFACT, XML
Mark Up: RTF
\li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel2\adjustright\rin0\lin0\itap0
\b\f1\fs26\lang2057\langfe1033\cgrid\langnp2057\langfenp1033
{\lang6153\langfe1033\langnp6153 Entity Relationship Diagram
\par }\pard\plain \s1\ql
\li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\adjustright\rin0\lin0\itap0 \cbpat17
\b\f1\fs24\lang2057\langfe1033\kerning32\cgrid\langnp2057\langfenp1033
{\lang6153\langfe1033\langnp6153 Entity Type
\par }\pard\plain \ql
\li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
\fs24\lang2057\langfe1033\cgrid\langnp2057\langfenp1033
{\b\fs20\ul\lang6153\langfe1033\langnp6153
Def.:}{\b\fs20\lang6153\langfe1033\langnp6153 }{
\fs20\lang6153\langfe1033\langnp6153 An object or concept that is
identified by the enterprise as having an independent existence.
\par }\pard\plain \s1\ql
\li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\adjustright\rin0\lin0\itap0 \cbpat17
\b\f1\fs24\lang2057\langfe1033\kerning32\cgrid\langnp2057\langfenp1033
{\lang6153\langfe1033\langnp6153 Entity
\par }\pard\plain \ql
Mark Up: EDIFACT

'''ED2'''OPENET:1111111:OVT':003705655815:OVT'ABC1234567'0'TYP:ORDERS'N
RQ:1'''
UNA:+.?
'UNB+UNOC:2+003705655815:30+1111111:30+980729:2233+4++ORDERS911+++KKK
KATE+1'UNH+
1+ORDERS:001:911:UN:FI0030'BGM+640+1234567'DTM+4:19981201:102'DTM+2:199
90101:102'DTM+2:9901:616'RFF+BC:123'RFF+VN:123456'NAD+BY+003705655815:1
00'
NAD+SE+11111111::92'NAD+PL+53432::92++KAUPPA:KAUPUNKI+KATU
9+KAUPUNKI++00007'NAD+CN+-::ZZ++TERMINAALI+OVI 42+TOINEN
KAUPUNKI++00069'UNS+D'LIN+1++23442423234
:EN'PIA+5+3244:MF'PIA+5+2341234324:ZBU'PIA+5+234243:ZCG'IMD+F+8+-
::91:KUKKAPUR
KKI:SAVI'QTY+21:8:KPL'FTX+AAA+++T.HARMAA:V[RI'FTX+AAA+++10:KOKO'PRI+NTP
:7.23:+
RP:7.32:PE'TAX+7+VAT+++:::22.00'LIN+2++543434554345:EN'PIA+5+535:MF'PIA
+5+45:
PCE'UNT+38+2'UNZ+2+4'
'''EOF'''9'
Mark Up: XML
<fragment>
<section>
<title>Introduction</title>
<para>Since the emergence of <acronym refid="xml">XML</acronym> in
early 1998 and its subsequent adoption across diverse application
domains, one of the key benefits it enabled was the separation of
content and presentation <bibref refloc="Bos97"/>. <acronym
refid="xml">XML</acronym> borrowed this model (along with other
important concepts) from the <acronym.grp><acronym
refid="sgml">SGML</acronym><expansion id="sgml">Standard
Generalised Markup Language</expansion></acronym.grp>. An
<acronym refid="sgml">SGML</acronym> document consists of
logically structured content and uses a separate file (style
sheet) to specify how the content should be formatted for
[...]
<figure id="img1">
<title>ePublishing Components</title>
<graphic href="02-04-03-fig01.jpg" width="321" height="214"/>
</figure>
</section>
</fragment>
What is SGML?
•  Standard Generalised Markup Language
•  ISO standard since 1986
•  Meta-language for defining document mark-up vocabularies
•  Uses logical mark-up (structure, content) instead of physical mark-up (how the document looks on the printed page)
•  Platform-, system-, vendor- and version-independent documents
•  Very powerful, but contains a number of complex features

<!DOCTYPE anthology [
  <!ELEMENT anthology - - (poem+) >
  <!ELEMENT poem - - (title?, stanza+)>
  <!ELEMENT title - O (#PCDATA) >
  <!ELEMENT stanza - O (line+) >
  <!ELEMENT line O O (#PCDATA) >
]>

<anthology>
  <poem>
    <title> The SICK ROSE </title>
    <stanza>
      <line>O Rose thou art sick.</line>
      <line>The invisible worm,</line>
      [...]
    </stanza>
    <stanza>
      <line>Has found out thy bed</line>
      <line>Of crimson joy:</line>
      [...]
    </stanza>
  </poem>
</anthology>
What is HTML?
•  HTML, the de facto standard for publishing Web content, is an SGML vocabulary
•  Supporting full SGML on the Web was too difficult, so HTML made some simplifications
–  not extensible
–  limited structure
–  not content oriented
–  not validated in practice (browsers accept invalid markup)
•  HTML is a simple language to understand and use
•  Most of the content available on the Web has been created with HTML

<html>
  <head>
    <title>The SICK ROSE</title>
  </head>
  <body>
    <h1>The SICK ROSE</h1>
    <p>
      O Rose thou art sick.<br />
      The invisible worm,<br />
      [...]
    </p>
    <p>
      Has found out thy bed<br />
      Of crimson joy:<br />
      [...]
    </p>
  </body>
</html>
What is XML?
•  eXtensible Markup Language
•  XML is a simplified subset of SGML
•  Can also be used to define document markup
vocabularies (e.g. XHTML)
–  These can have a strictly defined structure (DTD)
•  Retains the powerful features of SGML (extensibility,
structure, validation)
•  Ignores the complex features of SGML and is
therefore easier to use and implement
•  XML documents look similar to HTML documents
•  Separates structure and presentation (like SGML)
Design of XML
• The design goals for XML as set out in the 1.0
specification are as follows:
1. XML shall be straightforwardly usable over the
Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process
XML documents.
5. The number of optional features in XML is to be
kept to the absolute minimum, ideally zero.
6.  XML documents should be human-legible and
reasonably clear.
7.  The XML design should be prepared quickly.
8.  The design of XML shall be formal and concise.
9.  XML documents shall be easy to create.
10. Terseness in XML markup is of minimal
importance.
XML Example
<?xml version='1.0' encoding='ISO-8859-1' standalone='yes' ?>
<doc type="book" isbn="1-56592-796-9" xml:lang="en">
<title>A Guide to XML</title>
<author>Norman Walsh</author>
<chapter>
<title>What Do XML Documents Look Like?</title>
<paragraph>If you are [...]</paragraph>
<ol>
<item>
<paragraph>The document begins [...]</paragraph>
</item>
<item>
<paragraph>Empty elements have [...]</paragraph>
<paragraph>In a very [...]</paragraph>
</item>
</ol>
<section>[...]</section>
[...]
</chapter>
<chapter>[...]</chapter>
</doc>
Meta Language vs. Vocabulary

Meta Languages
•  SGML
•  XML

Vocabularies
•  HTML (an SGML vocabulary)
•  XSL
•  Presentation vocabularies: SMIL, XHTML, SVG
•  Electronic Patient Record vocabularies: HL7 v3, CEN TC251, ASTM 31.25, SynExML
•  XTM


Why is the emergence of XML
an important development?
•  XML is a tool for defining languages
–  XML languages are easy to read
–  XML is self describing
•  Parse tree embedded in document
•  Grammar for language referenced via DTD/Schema
•  XML languages are easy for computers to
process, exchange and display
–  XML tools are ubiquitous, free and conform to
established standards
–  Natural affinity with Object serialization
–  Data source neutral
XML technologies
•  Presentation: CSS (Cascading Style Sheets), XSL (Extensible Stylesheet Language), XPath, XQuery
•  Linking: XLink, XBase, XPointer
•  Semantics: Topic Maps, OWL (Web Ontology Language), RDF (Resource Description Framework)
•  Structure: XML Schema, RELAX NG, RDF Schema, Document Type Definition (DTD)
•  Syntax: XML Namespaces, XML 1.0
XML 1.0
•  The XML 1.0 specification describes the syntax
for XML documents (elements and attributes)
and DTDs
•  An XML document is a hierarchical data
structure using self-definable tags
–  e.g. <doc><author>[..]</author></doc>
•  There are many other technologies related to
XML
•  XML is "a simple common layer for tree structures in a character stream"
Physical Parts of XML documents
•  XML Declaration
•  Elements
•  Attributes
•  Document Type Declaration
•  Entities
•  Processing Instructions
•  Comments
•  Character Data Sections

•  XML Namespaces
XML Declaration
•  Placed at the very start of an XML document (nothing, not even white space, may precede it)
•  Informs XML software of
–  the version of XML the document conforms to
–  the character encoding scheme used in the document
–  whether or not a set of external declarations affects the interpretation of this document

<?xml version="1.0" ?>

<?xml version="1.0" encoding="UTF-8" ?>

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
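To see what the declaration communicates to software, here is a minimal Java sketch. It assumes the JDK's built-in JAXP DOM parser; the slides do not name a parser here. The DOM Level 3 methods `getXmlVersion()`, `getXmlEncoding()` and `getXmlStandalone()` report the three pseudo-attributes back to the application. The class `XmlDeclInfo` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: parses a document so the values of its XML declaration can be read back.
final class XmlDeclInfo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\" ?><doc/>");
        System.out.println(doc.getXmlVersion());    // version pseudo-attribute
        System.out.println(doc.getXmlEncoding());   // encoding as written in the declaration
        System.out.println(doc.getXmlStandalone()); // true, because standalone="yes"
    }
}
```

If the declaration omits `standalone`, `getXmlStandalone()` simply returns `false`.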
Elements
•  Define logical structure and sections of XML documents
•  Four different content types:
–  Data content (e.g. <title>)
–  Element content (e.g. <chapter>)
–  Mixed content (e.g. <thetext>)
–  Empty (e.g. <paragraph/>)
•  Each element must be completely enclosed by another element, except for the root
•  Note
–  Any XML name must start with a letter or underscore; after that it can also include digits, full stops and hyphens
–  Don't start a name with a colon (colons are reserved for namespace prefixes)
–  Don't include spaces

<?xml version="1.0" ?>
<doc>
  <title>Java Gently</title>
  <author>Judy Bishop</author>
  <publisher name='HH' />
  <chapter>
    <thetext> this is <bold> bold </bold> text </thetext>
    <paragraph/>
  </chapter>
</doc>
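The naming rule above can be approximated in a few lines of Java. This is a deliberately simplified, ASCII-only sketch and an assumption on our part: the full XML 1.0 Name production also admits colons and many non-ASCII letters. The class `SimpleXmlName` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical, ASCII-only approximation of the naming rule: the first character is a
// letter or underscore; later characters may also be digits, full stops or hyphens.
// The real XML 1.0 Name production is broader (colons, many non-ASCII letters).
final class SimpleXmlName {
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_.\\-]*");

    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] candidates = {"doc", "_id", "chapter-1.2", "1st", "my name", ":x"};
        for (String s : candidates) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```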
Attributes
•  Provide additional information about an element
•  Attributes are contained within the start-tag
•  Each consists of a name and an associated value, separated by an equals sign
•  The attribute value must always be enclosed in quotes (single or double)
•  The order of attributes is insignificant

<?xml version="1.0" ?>
<doc type="book" isbn="0-201-71050-1">
  <title>Java Gently</title>
  <author>Judy Bishop</author>
  <chapter>
    <paragraph type="abstract">
      In this book ...
    </paragraph>
  </chapter>
</doc>
ELEMENT vs. ATTRIBUTE
•  Lexically little difference,
•  application specific,
•  no hard/fast rules available.

ELEMENT
•  Constituent data
•  Used for content
•  White space can be ignored or preserved
•  Nesting allowed (child elements)
•  Convenient for large values, or binary entities

ATTRIBUTE
•  Inherent data
•  Used for meta-data
•  No further nesting possible (atomic data)
•  Default values
•  Minimal datatypes
Entities
•  Storage units for repeated text
–  Defined in a DTD (e.g. &copyright-notice; below)
•  Character references (e.g. &#169; for ©) are used to insert characters that cannot be typed directly
•  XML contains a number of 'built-in' entities
–  &quot;
–  &apos;
–  &lt;
–  &gt;
–  &amp;

<math>
  5 &lt; 6 and 6 &gt; 5
</math>

<copyright>
  &copyright-notice;
</copyright>

<bullet>
  XML contains a number of &apos;built-in&apos; entities
  <list>
    <item>&amp;quot;</item>
    <item>&amp;apos;</item>
    <item>&amp;lt;</item>
    <item>&amp;gt;</item>
    <item>&amp;amp;</item>
  </list>
</bullet>
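A parser replaces built-in entity references and character references with the characters they stand for before the application sees the text. The sketch below shows this with the JDK's JAXP DOM parser, which is an assumption since any conforming parser behaves the same way. A user-defined entity such as `&copyright-notice;` would additionally need a declaration in the DTD. The class `BuiltInEntities` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: returns the text of the root element after parsing, showing that
// entity and character references have been replaced by the characters they denote.
final class BuiltInEntities {
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf("<math>5 &lt; 6 and 6 &gt; 5</math>"));      // 5 < 6 and 6 > 5
        System.out.println(textOf("<q>&quot;a&quot; &amp; &apos;b&apos;</q>")); // "a" & 'b'
        System.out.println(textOf("<c>&#169; 2003</c>"));                      // character reference
    }
}
```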
Character Data Sections
•  Data which is to be parsed is called PCDATA (parsed character data)
•  An XML parser will not treat the contents of a CDATA section as markup
–  Used to simplify mark-up by escaping a selection of text
•  Entity references are not resolved
•  The only sequence that cannot appear inside a CDATA section is ]]>, which ends it
•  Useful for including source code in XML

<![CDATA[
You don't need to escape special characters in CDATA
sections, such as <, >, &, ' and ".
]]>

<![CDATA[<<< STOP now >>>]]>

<![CDATA[<?xml version='1.0'?>
<person>
  <name>Mike</name>
  <age>24</age>
</person>]]>
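The behaviour described above can be checked with a small Java sketch. It assumes the JDK's JAXP DOM parser, which by default keeps a CDATA section as its own node and returns its content exactly as written, including the unresolved `&amp;`. The class `CdataDemo` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper showing how a DOM parser reports a CDATA section.
final class CdataDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Text of the root element: CDATA content comes back exactly as written.
    static String rootText(String xml) {
        return parse(xml).getDocumentElement().getTextContent();
    }

    // True if the root's first child is kept as a separate CDATA section node.
    static boolean firstChildIsCdata(String xml) {
        Node first = parse(xml).getDocumentElement().getFirstChild();
        return first != null && first.getNodeType() == Node.CDATA_SECTION_NODE;
    }

    public static void main(String[] args) {
        System.out.println(rootText("<p><![CDATA[<<< STOP now >>>]]></p>"));
        System.out.println(rootText("<p><![CDATA[a &amp; b]]></p>")); // entity reference not resolved
    }
}
```

Calling `setCoalescing(true)` on the factory would instead merge CDATA content into ordinary text nodes.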
Processing Instructions
•  Pass additional information to an application (e.g. parser)
•  Application-specific instructions
•  Consist of a PI target and a PI value
•  Processed by applications that recognise the PI target

<?xml-stylesheet type='text/css' href='style.css'?>

<?xml-stylesheet type='text/xsl' href='style.xsl'?>

<?myapp filename='test.txt'?>
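An application sees a processing instruction as a target plus an uninterpreted value, which DOM calls the PI's "data". The sketch below assumes the JDK's JAXP DOM parser and returns the first document-level PI. The XML declaration itself is not reported as a PI. The class `PiDemo` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Hypothetical helper: finds the first processing instruction outside the root element
// and returns {target, value}; returns null if there is none.
final class PiDemo {
    static String[] firstPi(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                ProcessingInstruction pi = (ProcessingInstruction) n;
                return new String[] {pi.getTarget(), pi.getData()};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] pi = firstPi("<?xml version='1.0'?>\n<?xml-stylesheet type='text/css' href='style.css'?>\n<doc/>");
        System.out.println("target = " + pi[0]);
        System.out.println("value  = " + pi[1]);
    }
}
```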
Comments
•  Used to comment XML documents
•  Not part of the document's character data
•  An XML parser is not required to pass comments to higher-level applications
•  The string "--" must not occur inside a comment

<!-- one-line comment -->

<!--
This
is a
multi-line comment
-->
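The point that comments need not reach the application can be seen with JAXP, which is assumed here. With `setIgnoringComments(true)` the DOM parser drops them entirely. The class `CommentDemo` and its sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: counts the comment nodes an application receives,
// with and without asking the parser to drop comments.
final class CommentDemo {
    static final String SAMPLE =
        "<doc><!-- one-line comment --><a/><!--\nThis\nis a\nmulti-line comment\n--></doc>";

    static int countComments(String xml, boolean ignoreComments) {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setIgnoringComments(ignoreComments);
        try {
            return count(f.newDocumentBuilder().parse(new InputSource(new StringReader(xml))));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Depth-first walk over the whole tree, counting COMMENT_NODEs.
    private static int count(Node n) {
        int c = n.getNodeType() == Node.COMMENT_NODE ? 1 : 0;
        for (Node k = n.getFirstChild(); k != null; k = k.getNextSibling()) {
            c += count(k);
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(countComments(SAMPLE, false)); // 2
        System.out.println(countComments(SAMPLE, true));  // 0
    }
}
```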
Well formed XML
•  XML declaration recommended (XML 1.0 documents should, but need not, begin with one)
•  At least one element
–  Exactly one root element
•  Empty elements are written in one of two ways:
–  Closing tag (e.g. "<br></br>")
–  Special start tag (e.g. "<br />")
•  For non-empty elements, closing tags are required
•  Start tag must match closing tag (name & case)
•  Correct nesting of elements
•  Attribute values must always be quoted
•  Attribute minimisation not allowed (e.g. write <option selected="selected">, not <option selected>)
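In practice, a document is well-formed exactly when a non-validating parser accepts it without a fatal error. The following Java sketch uses that idea with JAXP, which is an assumption since any conforming parser works. It installs an error handler so that the JDK's default handler does not print to stderr. The class `WellFormedCheck` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: a document is well-formed if a non-validating parser accepts it.
final class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Silence the default handler (which prints to stderr) and rethrow fatal errors.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;                        // a well-formedness (fatal) error
        } catch (Exception e) {
            throw new IllegalStateException(e);  // configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<doc><b>bold</b></doc>",  // well-formed, no XML declaration needed
            "<doc><b>bold</doc></b>",  // incorrect nesting
            "<Doc></doc>",             // start/end tag case mismatch
            "<doc x=1/>",              // unquoted attribute value
            "<option selected/>",      // attribute minimisation
            "<a/><b/>"                 // two root elements
        };
        for (String s : samples) System.out.println(isWellFormed(s) + "  " + s);
    }
}
```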
Document Type Declaration
•  Internal/embedded DTD

<?xml version='1.0' standalone='yes'?>
<!DOCTYPE person [
  <!ELEMENT person (name, adult, nationality)>
]>

•  External DTD

<?xml version='1.0'?>
<!DOCTYPE person SYSTEM 'person.dtd'>
What are XML Namespaces?
•  W3C recommendation (January 1999)
•  Each XML vocabulary is considered to own a
namespace in which all elements (and attributes) are
unique
•  A single document can use elements and attributes
from multiple namespaces
–  A prefix is declared for each namespace used within a
document.
–  The namespace is identified using a URI (Uniform Resource
Identifier)
•  An element or attribute can be associated with a
namespace by placing the namespace prefix before its
name (i.e. 'prefix:name')
–  Elements in the default namespace (declared with xmlns="URI") do not require a prefix; unprefixed attributes, however, are in no namespace
Example: XML Namespaces
St. James’s Hospital (DTD)

<!ELEMENT Patient (Name, DOB)>
<!ELEMENT Name (First, Last)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>
<!ELEMENT DOB (#PCDATA)>

Airport Pharmacy (DTD)

<!ELEMENT Drug ((Name|Substance), Code)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Substance (#PCDATA)>
<!ELEMENT Code (#PCDATA)>

Accident report combining both vocabularies

<?xml version='1.0'?>
<AccidentReport
    xmlns:sjh="http://hospital/sjh"
    xmlns:dub="http://airport/dub">
  <sjh:Patient>
    <sjh:Name>
      <sjh:First>Mike</sjh:First>
      <sjh:Last>Murphy</sjh:Last>
    </sjh:Name>
    <sjh:DOB>12/12/1950</sjh:DOB>
  </sjh:Patient>
  <dub:Drug>
    <dub:Name>Nurofen</dub:Name>
    <dub:Code>IE-975-2</dub:Code>
  </dub:Drug>
  [...]
</AccidentReport>

Both vocabularies define a Name element; the sjh: and dub: prefixes keep them apart.

© 2003 B. Jung
Why Namespaces?
•  Important for creating XML documents
containing different types of data
•  An XML document can be assembled using
elements (and attributes) from different XML
vocabularies
•  Must be able to
–  avoid conflicts between names
–  identify the vocabulary an element belongs to
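A namespace-aware parser lets an application tell the two `Name` elements apart by namespace URI rather than by prefix. The sketch below assumes JAXP. Namespace support is off by default in `DocumentBuilderFactory`, so it must be switched on. The class `NamespaceDemo` and its inline copy of the accident report are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: the accident report, parsed with namespace support switched on.
final class NamespaceDemo {
    static final String REPORT =
        "<AccidentReport xmlns:sjh='http://hospital/sjh' xmlns:dub='http://airport/dub'>"
      + "<sjh:Patient><sjh:Name><sjh:First>Mike</sjh:First><sjh:Last>Murphy</sjh:Last></sjh:Name>"
      + "<sjh:DOB>12/12/1950</sjh:DOB></sjh:Patient>"
      + "<dub:Drug><dub:Name>Nurofen</dub:Name><dub:Code>IE-975-2</dub:Code></dub:Drug>"
      + "</AccidentReport>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // off by default in JAXP
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Text of the first element with local name "Name" in the given namespace URI.
    static String nameIn(String namespaceUri) {
        return parse(REPORT).getElementsByTagNameNS(namespaceUri, "Name").item(0).getTextContent();
    }

    // Number of "Name" elements in any namespace ("*" is the DOM wildcard).
    static int allNames() {
        return parse(REPORT).getElementsByTagNameNS("*", "Name").getLength();
    }

    public static void main(String[] args) {
        System.out.println(nameIn("http://airport/dub"));  // the drug's Name
        System.out.println(nameIn("http://hospital/sjh")); // the patient's Name (First + Last text)
        System.out.println(allNames());
    }
}
```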
XML Processing: DOM Processing

XML doc → character stream → process into tree → navigation API → DOM application

•  DOM views an XML document as a tree data structure
•  It is quite large and complex...
–  Level 1 Core: W3C Recommendation, October 1998
•  primitive navigation and manipulation of XML trees
•  other Level 1 parts: HTML
–  Level 2 Core: W3C Recommendation, November 2000
•  adds Namespace support and minor new features
•  other Level 2 parts: Events, Views, Style, Traversal and Range
–  Level 3 Core: W3C Working Draft, April 2002 (published as a W3C Recommendation in April 2004)
•  adds minor new features
•  other Level 3 parts: Schemas, XPath, Load/Save
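To make the "tree as a data structure" view concrete, here is a short sketch that loads a document and walks the children of the root with the DOM navigation methods. It assumes the JDK's JAXP `DocumentBuilderFactory` rather than the Xerces `DOMParser` used in the next example. The class `DomWalk` and its method are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch: load a document into a DOM tree, then walk it with the navigation API.
final class DomWalk {
    static List<String> ingredientNames(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        List<String> names = new ArrayList<>();
        // Child nodes include whitespace text nodes, so check the node type first.
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals("ingredient")) {
                names.add(((Element) n).getAttribute("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(ingredientNames(
            "<recipe><title>Zuppa Inglese</title>\n"
          + "  <ingredient name='milk' amount='2.5' unit='cup'/>\n"
          + "  <ingredient name='flour' amount='0.5' unit='cup'/>\n</recipe>"));
    }
}
```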
Example: A Recipe
<recipe xmlns="http://recipes.org">
  <title>Zuppa Inglese</title>
  <ingredient name="egg yolks" amount="4" />
  <ingredient name="milk" amount="2.5" unit="cup" />
  <ingredient name="Savoiardi biscuits" amount="21" />
  <ingredient name="sugar" amount="0.75" unit="cup" />
  <ingredient name="Alchermes liquor" amount="1" unit="cup" />
  <ingredient name="lemon zest" amount="*" />
  <ingredient name="flour" amount="0.5" unit="cup" />
  <ingredient name="fresh whipping cream" amount="*" />
  <preparation>
    <step>Warm up the milk in a nonstick sauce pan.</step>
    <step>In a large bowl beat the egg yolks with the sugar, add the flour and combine the ingredients until well mixed.</step>
    <step>Add the milk, a little at a time, to the egg mixture, mixing well.</step>
    <step>Put the mixture into the sauce pan and cook it on the stove at a medium low heat. Mix the cream continuously with a wooden spoon. When it starts to thicken remove it from the heat and pour it on a large plate to cool off.</step>
    <step>Stir the cream now and then so that the top doesn't harden.</step>
    <step>Quickly dip both sides of the lady fingers in the liquor. Layer them one at a time in a glass bowl large enough to contain 7 biscuits.</step>
    <step>Spread 1/3 of the cream and repeat the layer with lady fingers. Finish with the cream.</step>
  </preparation>
  <comment>Refrigerate for at least 4 hours, better yet overnight. Before serving, decorate the zuppa inglese with whipped cream.</comment>
  <nutrition calories="612" fat="49" carbohydrates="45" protein="4" alcohol="2" />
</recipe>
Example: Getting a Recipe
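import java.io.*;
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;

public class FirstRecipeDOM {
  public static void main(String[] args) {
    try {
      DOMParser p = new DOMParser();              // Xerces DOM parser
      p.parse(args[0]);
      Document doc = p.getDocument();
      // The root is either a single <recipe> or a <collection> of recipes
      Node n = doc.getDocumentElement();
      if (!n.getNodeName().equals("recipe")) {
        n = n.getFirstChild();
        while (n != null && !n.getNodeName().equals("recipe"))
          n = n.getNextSibling();
      }
      PrintStream out = System.out;
      out.println("<?xml version=\"1.0\"?>");
      out.println("<collection>");
      if (n != null) print(n, out);
      out.println("</collection>");
    } catch (Exception e) { e.printStackTrace(); }
  }

  // Recursively print elements, attributes and text; other node types are skipped
  static void print(Node n, PrintStream out) {
    if (n.getNodeType() == Node.TEXT_NODE) out.print(esc(n.getNodeValue()));
    if (n.getNodeType() != Node.ELEMENT_NODE) return;
    out.print("<" + n.getNodeName());
    NamedNodeMap atts = n.getAttributes();
    for (int i = 0; i < atts.getLength(); i++)
      out.print(" " + atts.item(i).getNodeName() + "=\"" + esc(atts.item(i).getNodeValue()) + "\"");
    out.print(">");
    for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) print(c, out);
    out.print("</" + n.getNodeName() + ">");
  }
  // Escape the characters that would otherwise be read as markup
  static String esc(String s) { return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;"); }
}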

© 2003 B. Jung COPYRIGHT © 2000-2003 ANDERS MØLLER & MICHAEL I. SCHWARTZBACH


XML Processing: SAX Processing

XML doc → character stream → events API → SAX application

•  An XML tree is not viewed as a data structure, but as a stream of events generated by the parser.
•  The kinds of events are:
–  the start of the document is encountered
–  the end of the document is encountered
–  the start tag of an element is encountered
–  the end tag of an element is encountered
–  character data is encountered
–  a processing instruction is encountered
•  As the parser scans the XML file from start to end, each event invokes a corresponding callback method that the programmer writes
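The event list above maps directly onto `DefaultHandler` callbacks. The following sketch assumes the JDK's `SAXParserFactory` rather than the Xerces `SAXParser` used in the next example. It simply records each callback in order, which makes the event stream visible. Note that a parser may split one run of text across several `characters()` calls. The class `SaxEventLog` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records each SAX callback as a short string, in order.
final class SaxEventLog extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("chars " + new String(ch, start, length));
    }

    @Override
    public void processingInstruction(String target, String data) {
        events.add("pi " + target);
    }

    static List<String> log(String xml) {
        SaxEventLog handler = new SaxEventLog();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        System.out.println(log("<a>hi<b/></a>"));
    }
}
```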
Example: Getting total amount of Flour
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.parsers.SAXParser;

// Sums the amount of every <ingredient name="flour"> in the recipes namespace
public class Flour extends DefaultHandler {
  float amount = 0;

  public void startElement(String namespaceURI, String localName,
                           String qName, Attributes atts) {
    if (namespaceURI.equals("http://recipes.org") && localName.equals("ingredient")) {
      String n = atts.getValue("", "name");
      if ("flour".equals(n)) {                    // safe even if 'name' is missing
        String a = atts.getValue("", "amount");   // assume 'amount' exists and is numeric
        amount = amount + Float.parseFloat(a);
      }
    }
  }

  public static void main(String[] args) {
    Flour f = new Flour();
    SAXParser p = new SAXParser();                // Xerces SAX parser
    p.setContentHandler(f);
    try { p.parse(args[0]); }
    catch (Exception e) { e.printStackTrace(); }
    System.out.println(f.amount);
  }
}
COPYRIGHT © 2000-2003 ANDERS MØLLER & MICHAEL I. SCHWARTZBACH
Summary
•  XML = eXtensible Markup Language
•  An XML document is a hierarchical data structure
using self-definable tags
•  Physical parts of XML document
–  XML Declaration
–  Elements
–  Attributes
–  Document Type Declaration
–  Entities
–  Processing Instructions
–  Comments
–  Character Data Sections
–  XML Namespaces
•  Two types of APIs popular for XML Processing: DOM &
SAX
•  </Lecture>
University of Dublin
Trinity College

Defining XML Vocabularies


DTDs and XML Schemas

Owen.Conlan@scss.tcd.ie
What is an XML vocabulary?
•  Synonyms
–  ‘Application of XML’
–  XML Language
•  Set of elements and attributes for
representing domain-specific information
•  “Instance” of a Mark Up Language
•  Defined by DTD or XML Schema
•  Some are approved by standard organisations
–  E.g. ebXML, MathML, XSL etc.

Remember: XML is syntax!


What is a DTD?
•  Document Type Definition,
•  Defines structure/model of XML documents
–  Elements and Cardinality
–  Attributes
–  Aggregation
•  Defines default ATTRIBUTE values
•  Defines ENTITIES
•  Stored in a plain text file and referenced by an XML document
(external)
•  Alternatively a DTD can be placed in the XML document itself
(internal)
•  Used to validate an XML document
•  "Is there a need for a DTD?"
Why use a DTD?
•  Applications may require all documents to be
consistent instances of a particular vocabulary
•  Indicates what structures and names can be
used in a document
•  Documents are constructed and named in a
conformant manner
–  Ease constructing (provide structure)
–  Ease parsing
•  Validate documents in order to find
inconsistencies
Valid XML
•  Well-formed plus conforms to DTD
•  All elements and attributes are declared within
a DTD (internal or external)
•  Elements and attributes match the
declarations in the DTD
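Validity can be checked in Java by asking the parser to validate against the DTD named in the DOCTYPE. The following is a minimal sketch assuming JAXP's `setValidating(true)`. Validity errors arrive through `ErrorHandler.error()`, while well-formedness errors remain fatal. The class `DtdValidation` and its small `person` DTD are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: collects validity errors reported against an internal DTD.
final class DtdValidation {
    static final String DTD =
        "<!DOCTYPE person [\n"
      + "  <!ELEMENT person (name, age?)>\n"
      + "  <!ELEMENT name (#PCDATA)>\n"
      + "  <!ELEMENT age (#PCDATA)>\n"
      + "]>\n";

    static List<String> validityErrors(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(true); // check the document against the DTD in its DOCTYPE
            DocumentBuilder b = f.newDocumentBuilder();
            b.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            b.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("not well-formed or not readable", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validityErrors(DTD + "<person><name>Mike</name><age>24</age></person>"));
        System.out.println(validityErrors(DTD + "<person><age>24</age></person>")); // name missing
    }
}
```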
Element Type Declaration
•  Define grouping of elements
–  "(", ")"
•  Define sequence of elements
–  ",": followed-by (Sequence)
–  "|": logical or (Choice)

<!ELEMENT doc (title, author, editor, chapter, appendix)>

<!ELEMENT title (#PCDATA)>

<!ELEMENT author (name | synonym)>

<!ELEMENT image EMPTY>

<!ELEMENT paragraph (#PCDATA | bold | italic)*>
Element Type Declaration
•  Define occurrences of elements
–  ?: zero-or-one
–  +: one-or-more
–  *: zero-or-more

<!ELEMENT doc (title, author+, editor?, chapter+, appendix*)>

<!ELEMENT chapter (title, (section+ | paragraph+))>

<!ELEMENT list (item, item?, item?)>

<!ENTITY % list "ordered | unordered | definition">

<!ELEMENT paragraph (#PCDATA | %list;)*>
Attribute List Declaration
•  Define type of attribute
–  CDATA
–  ID
–  IDREF
–  ENTITY
–  NMTOKEN
–  NOTATION
•  Define default values of attributes
–  #REQUIRED
–  #IMPLIED
–  #FIXED
–  A list of values with default selection

<!ATTLIST person ssn ID #IMPLIED>

<!ATTLIST adult age CDATA #REQUIRED>

<!ATTLIST mml version CDATA #FIXED "1.0">

<!ATTLIST person sex (m | f) #REQUIRED>

<!ATTLIST day temperature (l | m | h) "l">
Entity Declaration
•  Internal entities
–  Built-in
•  External entities
–  References to a file (text, images etc.)
•  Parameter entities
–  Used inside DTDs

<!ENTITY author "Norman Walsh, Sun Corp.">

<!ENTITY copyright SYSTEM "copyright.xml">

<!ENTITY % part "(title?, (paragraph | section)*)">
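An internal general entity declared in the DTD is expanded by the parser wherever it is referenced. The sketch below assumes JAXP, where entity expansion is on by default. It uses an internal subset holding the `author` declaration from this slide. The class `EntityExpansion` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch: a declared internal entity is replaced by its text during parsing.
final class EntityExpansion {
    static final String SAMPLE =
        "<!DOCTYPE doc [\n"
      + "  <!ENTITY author \"Norman Walsh, Sun Corp.\">\n"
      + "]>\n"
      + "<doc>Written by &author;</doc>";

    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText(SAMPLE)); // Written by Norman Walsh, Sun Corp.
    }
}
```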
Simple DTD Example
<!ENTITY % part "(title?, (paragraph | section)*)">

<!ELEMENT doc (title, author+, chapter+, appendix*)>
<!ATTLIST doc type (book | article) "book"
              isbn CDATA #REQUIRED>

<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT chapter %part;>
<!ELEMENT appendix %part;>
<!ELEMENT section %part;>
<!ELEMENT paragraph (#PCDATA | url | ol)*>
<!ATTLIST paragraph type CDATA #IMPLIED>
<!ELEMENT ol (item+)>
<!ELEMENT item (paragraph+)>
<!ELEMENT url (#PCDATA)>
Example XML and related DTD
<!DOCTYPE database [
  <!ELEMENT database (person*)>
  <!ELEMENT person (name, hobby*)>
  <!ATTLIST person age CDATA #IMPLIED>
  <!ELEMENT name (title?, firstname+, surname)>
  <!ELEMENT hobby (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT firstname (#PCDATA)>
  <!ELEMENT surname (#PCDATA)>
]>
<database>
  <person age='34'>
    <name>
      <title> Mr </title>
      <firstname> John </firstname>
      <firstname> Paul </firstname>
      <surname> Murphy </surname>
    </name>
    <hobby> Football </hobby>
    <hobby> Racing </hobby>
  </person>
  <person>
    <name>
      <firstname> Mary </firstname>
      <surname> Donnelly </surname>
    </name>
  </person>
</database>
What are XML Schemas?
•  W3C Recommendation, 2 May 2001
–  Part 0: Primer
–  Part 1: Structures
–  Part 2: Datatypes
•  DTDs use a non-XML syntax and have a
number of limitations
–  no namespace support
–  lack of data-types
•  XML Schemas are an alternative to DTDs
•  Used to formally specify a "class" of XML documents (each conforming document is an "instance document")
•  Supports simple/complex data-types
Why use XML Schemas?
•  Uses an XML syntax
•  Supports simple and complex data-types such
as user-defined types
•  An XML document and its contents can be
validated against a Schema
•  Can validate documents containing multiple
namespaces
•  Schemas are more powerful than DTDs and
will eventually replace DTDs
Named Types – simple

DTD
<!ELEMENT firstname (#PCDATA)>

XML Schema
<xsd:element name="firstname" type="xsd:string"/>

XML doc. Instance
<firstname>Michael</firstname>
Named Types – complex

DTD
<!ELEMENT name (firstname, lastname)>

XML Schema
<xsd:complexType name="namePerson">
  <xsd:sequence>
    <xsd:element name="firstname" type="xsd:string"/>
    <xsd:element name="lastname" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
<xsd:element name="name" type="namePerson"/>

XML doc. Instance
<name>
  <firstname>Michael</firstname>
  <lastname>Porter</lastname>
</name>
Primitive Datatypes
•  string, boolean, decimal, float, double
•  duration, dateTime, time, date
•  gYearMonth, gYear, gMonthDay, gDay, gMonth
•  hexBinary, base64Binary
•  anyURI, QName, NOTATION

http://www.w3.org/TR/xmlschema-2/
Simple Type - Restriction

XML Schema
<xsd:simpleType name='celsiusBodyTemp'>
  <xsd:restriction base='xsd:decimal'>
    <xsd:totalDigits value='4'/>
    <xsd:fractionDigits value='1'/>
    <xsd:minInclusive value='36.4'/>
    <xsd:maxInclusive value='40.5'/>
  </xsd:restriction>
</xsd:simpleType>
<xsd:element name="temp" type="celsiusBodyTemp"/>

XML doc. Instance
<temp>37.2</temp>
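The facets can be exercised directly in Java. The following sketch assumes the JDK's `javax.xml.validation` API and the XML Schema namespace `http://www.w3.org/2001/XMLSchema`. It compiles the `celsiusBodyTemp` type and validates a few `<temp>` documents. A facet violation surfaces as a `SAXException`. The class `BodyTempSchema` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch: the celsiusBodyTemp restriction compiled with the JDK's schema
// validator, then used to check <temp> instance documents.
final class BodyTempSchema {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='celsiusBodyTemp'>"
        + "<xsd:restriction base='xsd:decimal'>"
        + "<xsd:totalDigits value='4'/>"
        + "<xsd:fractionDigits value='1'/>"
        + "<xsd:minInclusive value='36.4'/>"
        + "<xsd:maxInclusive value='40.5'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        + "<xsd:element name='temp' type='celsiusBodyTemp'/>"
        + "</xsd:schema>";

    private static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is invalid", e);
        }
    }

    // True if the instance satisfies every facet; a violation surfaces as a SAXException.
    static boolean isValid(String instance) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"37.2", "40.5", "41.0", "36.3", "37.25"}) {
            System.out.println(v + " -> " + isValid("<temp>" + v + "</temp>"));
        }
    }
}
```

Here `37.25` is rejected by `fractionDigits`, not by the range facets.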
Simple Type - Enumeration

XML Schema
<xsd:simpleType name="weekday">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="Sunday"/>
    <xsd:enumeration value="Monday"/>
    <xsd:enumeration value="Tuesday"/>
    [...]
  </xsd:restriction>
</xsd:simpleType>
<xsd:element name="delivery" type="weekday"/>

XML doc. Instance
<delivery>Tuesday</delivery>
Complex Type - Cardinalities

DTD
<!ENTITY % fullname "title?, firstname*, lastname">
<!ELEMENT name (%fullname;)>

XML Schema
<xsd:complexType name="fullname">
  <xsd:sequence>
    <xsd:element name="title" minOccurs="0"/>
    <xsd:element name="firstname" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="lastname"/>
  </xsd:sequence>
</xsd:complexType>
<xsd:element name="name" type="fullname"/>

XML doc. Instance
<name>
  <firstname>Michael</firstname>
  <firstname>Jason</firstname>
  <lastname>Porter</lastname>
</name>
Complex Type – Derived Type by extension

DTD
<!ENTITY % name "title?, firstname*, lastname">
<!ELEMENT name (%name;, maidenname?)>

XML Schema
<xsd:complexType name="fullnameExt">
  <xsd:complexContent>
    <xsd:extension base="fullname">
      <xsd:sequence>
        <xsd:element name="maidenname" minOccurs="0"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
<xsd:element name="name" type="fullnameExt"/>

XML doc. Instance
<name>
  <firstname>Jane</firstname>
  <lastname>Porter</lastname>
  <maidenname>Hughes</maidenname>
</name>
Complex Type – Derived Type by Restriction

XML Schema
<xsd:complexType name="simpleName">
  <xsd:complexContent>
    <xsd:restriction base="fullname">
      <xsd:sequence>
        <xsd:element name="title" minOccurs="0" maxOccurs="0"/>
        <xsd:element name="firstname" minOccurs="1"/>
        <xsd:element name="lastname"/>
      </xsd:sequence>
    </xsd:restriction>
  </xsd:complexContent>
</xsd:complexType>
<xsd:element name="name" type="simpleName"/>

(minOccurs must not exceed maxOccurs, so prohibiting title needs both minOccurs="0" and maxOccurs="0".)

XML doc. Instance
<name>
  <firstname>Jane</firstname>
  <lastname>Porter</lastname>
</name>
Structure - Sequence

DTD
<!ELEMENT name (title?, firstname*, lastname)>

XML Schema
<xsd:complexType name="fullname">
  <xsd:sequence>
    <xsd:element name="title" minOccurs="0"/>
    <xsd:element name="firstname" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="lastname"/>
  </xsd:sequence>
</xsd:complexType>
<xsd:element name="name" type="fullname"/>

XML doc. Instance
<name>
  <firstname>Michael</firstname>
  <firstname>Jason</firstname>
  <lastname>Porter</lastname>
</name>
Structure - Choice

DTD
<!ELEMENT pay (product, number, (cash | cheque))>

XML Schema
<xsd:complexType name="payment">
  <xsd:sequence>
    <xsd:element ref="product"/>
    <xsd:element ref="number"/>
    <xsd:choice>
      <xsd:element ref="cash"/>
      <xsd:element ref="cheque"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
<xsd:element name="pay" type="payment"/>

XML doc. Instance
<pay>
  <product>Ericsson Telefon MD110</product>
  <number>1544-198-J</number>
  <cash>IR£150</cash>
</pay>
Attributes
DTD
<!ELEMENT greeting (#PCDATA)>
<!ATTLIST greeting language CDATA "English">

XML Schema
<xsd:element name="greeting">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="language" type="xsd:string" default="English"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>

XML doc. Instance
<greeting language="German">Hello!</greeting>
Attribute Groups
DTD
<!ELEMENT img EMPTY>
<!ATTLIST img src    CDATA #REQUIRED
              width  CDATA #IMPLIED
              height CDATA #IMPLIED>

XML Schema
<xsd:attributeGroup name="imgAttributes">
  <xsd:attribute name="src" type="xsd:string" use="required"/>
  <xsd:attribute name="width" type="xsd:integer"/>
  <xsd:attribute name="height" type="xsd:integer"/>
</xsd:attributeGroup>

<xsd:element name="img">
  <xsd:complexType>
    <xsd:attributeGroup ref="imgAttributes"/>
  </xsd:complexType>
</xsd:element>

XML doc. Instance
<img src="XMLmanager.gif" width="60"/>
Mixed Content
DTD
<!ELEMENT p (#PCDATA | b | i)*>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>

XML Schema
<xsd:complexType name="bolditalicText" mixed="true">
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element ref="b"/>
    <xsd:element ref="i"/>
  </xsd:choice>
</xsd:complexType>

<xsd:element name="p" type="bolditalicText"/>

XML doc. Instance
<p>This is <b>bold</b> and <i>italic</i> text</p>
Empty Element
DTD
<!ELEMENT img EMPTY>
<!ATTLIST img src CDATA #REQUIRED>

XML Schema
<xsd:element name="img">
  <xsd:complexType>
    <xsd:attribute name="src" type="xsd:string" use="required"/>
  </xsd:complexType>
</xsd:element>

XML doc. Instance
<img src="XMLmanager.gif"/>
XML Schema Example
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="author" type="xsd:string"/>
        <xsd:element name="character" type="xsd:string"
                     minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="isbn" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
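To check instance documents against a schema like this one, the JDK's `javax.xml.validation` API is enough. This is an assumption, since any schema-aware validator would do. The sketch embeds the book schema, using the `http://www.w3.org/2001/XMLSchema` namespace, and reports whether a given `<book>` document conforms. The class `BookSchema` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch: validating <book> instance documents against the book schema.
final class BookSchema {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='book'><xsd:complexType>"
        + "<xsd:sequence>"
        + "<xsd:element name='title' type='xsd:string'/>"
        + "<xsd:element name='author' type='xsd:string'/>"
        + "<xsd:element name='character' type='xsd:string' minOccurs='0' maxOccurs='unbounded'/>"
        + "</xsd:sequence>"
        + "<xsd:attribute name='isbn' type='xsd:string'/>"
        + "</xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    private static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is invalid", e);
        }
    }

    // True if the document matches the schema (element order, cardinality, attributes).
    static boolean isValid(String instance) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<book isbn='1-56592-796-9'><title>A Guide to XML</title><author>Norman Walsh</author></book>"));
        System.out.println(isValid("<book><author>A</author><title>T</title></book>")); // wrong order
    }
}
```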
Summary
•  XML Vocabularies are defined using
–  DTD
–  XSD
•  DTDs/XSDs used to validate XML documents
•  XSD – more powerful than DTDs
–  Supports simple and complex data-types such as
user-defined types
–  Can validate documents containing multiple
namespaces
