
XML SCHEMA

CHAPTER 3

Introduction
In the previous chapter we learned about DTDs, the traditional way of validating an XML document,
which XML inherited from SGML. Over time, many people have complained to the W3C about the
complexity of DTDs and have asked for something simpler. In response, the W3C assigned a
committee to work on the problem, and it came up with a solution called XML Schemas, which is even
more complex than DTDs. On the other hand, XML Schemas are also far more powerful than DTDs ever
were.

What can't DTDs provide?

DTDs cannot assign specific data types to attributes, whereas schemas do support data types for
attributes.

A Schema is a set of rules for constraining the structure and articulating the information set of
XML documents.

Advantages of Schema over DTDs

XML Schema is based on XML, not on some specialized syntax.
XML Schemas can be parsed and manipulated just like any other XML document.
XML Schemas support a variety of data types (integers, floats, Booleans, dates, strings, ...).
XML Schemas present an open-ended data model, which allows you to extend
vocabularies and establish inheritance relationships between elements without
invalidating documents.
XML Schemas support namespace integration, which allows you to associate
individual nodes of a document with type declarations in a schema.
XML Schemas support attribute groups, which allow you to logically combine
attributes.

One of the original proponents of XML Schemas was Microsoft. Microsoft's documentation on
XML frequently decried DTDs as being too complex and said that schemas would fix
the problem. In fact, Microsoft's implementation of XML Schemas in IE became
outdated not long after it was introduced.

XML Schemas in Internet Explorer


As with many other developers, Microsoft got caught basing its software on a relatively early XML
specification, which promptly changed. As implemented in IE, Microsoft's schemas are based on
the XML-Data note.

Writing XML Schema

The DTD that defines the XML schema vocabulary is very straightforward, primarily because XML
schema is a fairly simple vocabulary by most standards. The root element of all XML schema
documents is Schema, which is declared in the DTD as potentially containing three child elements:
AttributeType, ElementType and description. In addition to these elements, the XML schema
vocabulary declares several other elements that are used to describe document schemas. The
following elements make up the XML schema vocabulary:

Schema         Serves as the root element for XML schema documents
datatype       Describes data types for elements and attributes
ElementType    Describes a type of element
element        Identifies an element that can occur within another element type
group          Organizes elements into groups for ordering purposes
AttributeType  Describes a type of attribute
attribute      Identifies an attribute that can occur within an element type
description    Provides documentation for an element or attribute

The Schema Element


The Schema element serves as the root (document) element for XML schema documents and acts as
a container for all other schema content. The Schema element includes two attributes:

name   The name of the schema
xmlns  The namespace for the schema

The name attribute establishes the name of the schema. The xmlns attribute is very
important in that it establishes the namespace for the schema. This attribute must be set to
urn:schemas-microsoft-com:xml-data in order to use Microsoft's XML schema
implementation.
<Schema name="myschema" xmlns="urn:schemas-microsoft-com:xml-data">
<!--schema content goes here-->
</Schema>

NOTE
Namespaces are used in XML documents to guarantee uniqueness among element and attribute
names associated with a given XML vocabulary. Namespace names take the form of URIs, which are
often the familiar URLs used to identify resources on the Web; the Microsoft schema namespaces
used in this chapter are URNs such as urn:schemas-microsoft-com:xml-data.

In addition to specifying the namespace for the schema, you usually also need to specify the
namespace for the XML schema data types. The data type namespace is typically assigned to the
xmlns:dt attribute and is set to urn:schemas-microsoft-com:datatypes. You must set this
namespace in order to use any of the XML schema data types, such as date, time, int and float.

<Schema name="myschema" xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
<!--schema content goes here-->
</Schema>

The Schema element can contain AttributeType, ElementType and description child elements.
The AttributeType and ElementType elements define attribute types and element types,
respectively.

The ElementType Element


The ElementType element is used to define element types that establish the schema of documents.
The ElementType element can contain datatype, element, group, AttributeType, attribute and
description child elements. The element element identifies an instance of a child element within the
element type; you use the element element to establish the content model for an element type.
Attributes for an element type are established using the AttributeType and attribute elements. The
AttributeType element defines a type of attribute, while the attribute element identifies an actual
attribute of the element type. Any attribute types defined within an ElementType element are
considered local to that element.
The ElementType element includes several attributes for defining the specific parameters of the
element type:

name     The name of the element
model    Whether the content model is open or closed
content  The type of content contained within the element
order    The order of the child elements and groups contained within the element
dt:type  The data type of the element's content

The following are examples of element types defined using the ElementType element:

<ElementType name="name" content="textOnly" dt:type="string"/>
<ElementType name="type" content="textOnly" dt:type="string"/>
<ElementType name="product" content="eltOnly" model="closed" order="seq">
<element type="name"/>
<element type="type"/>
</ElementType>
<ElementType name="products" content="eltOnly" model="closed" order="seq">
<element type="product"/>
</ElementType>

Notice that the name and type elements are first declared using the ElementType element, and
then are identified within the content model of the product element using the element element.

The name and model Attributes


The name attribute is used to specify the name of the ElementType and is a required attribute. This
value must be unique for element types within the scope in which it is defined. The model attribute
specifies whether the element type adheres to an open or closed content model. An open model
allows an element of this type to contain additional elements that aren't declared in the schema,
which makes for a very extensible schema; a closed model allows only the content that the schema
declares. Element types assume an open model by default.
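As an illustration, the following minimal sketch declares one element type with an open model and one with a closed model. It assumes the Microsoft namespaces used throughout this chapter; the element names note, memo and letter are hypothetical.

```xml
<?xml version="1.0"?>
<Schema name="modelDemo" xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Hypothetical text-only child used by both element types below -->
  <ElementType name="note" content="textOnly" dt:type="string"/>

  <!-- Open model (the default): a memo holds a note and may also hold
       additional elements that this schema never declares -->
  <ElementType name="memo" content="eltOnly" model="open">
    <element type="note"/>
  </ElementType>

  <!-- Closed model: a letter holds exactly one note and nothing else -->
  <ElementType name="letter" content="eltOnly" model="closed">
    <element type="note"/>
  </ElementType>
</Schema>
```

Following the rules above, a letter may contain only its single note, while a memo may also carry extra elements that the schema does not describe.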

The content Attribute


The content attribute of ElementType is used to establish the type of content contained within the
element type.
The following are acceptable values for this attribute:

empty     The element type doesn't contain any content
textOnly  The element type can only contain text (if the content model is open, the element
          type may also contain other unspecified elements)
eltOnly   The element type can only contain the specified child elements
mixed     The element type can contain a mixture of text and specified child elements (if the
          content model is open, the element type may also contain other unspecified
          elements)
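The following sketch shows one element type for each value of the content attribute. The names br, emph, para and doc are hypothetical, and the comments show the kind of instance content each declaration is meant to accept.

```xml
<?xml version="1.0"?>
<Schema name="contentDemo" xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- empty: written as <br/>, with no text and no children -->
  <ElementType name="br" content="empty" model="closed"/>

  <!-- textOnly: character data only, e.g. <emph>fast</emph> -->
  <ElementType name="emph" content="textOnly" model="closed"/>

  <!-- mixed: text interleaved with the declared children,
       e.g. <para>Run <emph>fast</emph> today<br/></para> -->
  <ElementType name="para" content="mixed" model="closed" order="many">
    <element type="emph"/>
    <element type="br"/>
  </ElementType>

  <!-- eltOnly: child elements only, no text between them,
       e.g. <doc><para>...</para></doc> -->
  <ElementType name="doc" content="eltOnly" model="closed">
    <element type="para"/>
  </ElementType>
</Schema>
```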

The order Attribute


The order attribute is used to establish the order and frequency of the group of child elements
contained within the element type. The following are acceptable values for this attribute:

one   Only one of a set of elements is allowed
seq   The elements must occur in the specified sequence
many  The elements can occur any number of times in any order
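Here is a sketch of the three order values in use, assuming the chapter's Microsoft namespaces; first, last, nickname, fullName, person and alias are hypothetical element names chosen for the example.

```xml
<?xml version="1.0"?>
<Schema name="orderDemo" xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="first" content="textOnly"/>
  <ElementType name="last" content="textOnly"/>
  <ElementType name="nickname" content="textOnly"/>

  <!-- seq: first must come before last -->
  <ElementType name="fullName" content="eltOnly" model="closed" order="seq">
    <element type="first"/>
    <element type="last"/>
  </ElementType>

  <!-- one: a person has either a fullName or a nickname, not both -->
  <ElementType name="person" content="eltOnly" model="closed" order="one">
    <element type="fullName"/>
    <element type="nickname"/>
  </ElementType>

  <!-- many: the listed children may appear in any order -->
  <ElementType name="alias" content="eltOnly" model="closed" order="many">
    <element type="nickname"/>
    <element type="last"/>
  </ElementType>
</Schema>
```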

The dt:type Attribute
The dt:type attribute is used to establish the data type of the content contained within the element
type. The types allowed in the dt:type attribute match those that are allowed in the datatype element.
XML Schema data types are covered later in this chapter.

The element Element


The element element is used to declare an instance of an element within a group or element type.
The element element includes three attributes for describing additional information about an
element instance:

type       The type of element
minOccurs  The minimum number of times the element must occur
maxOccurs  The maximum number of times the element can occur

The type attribute is used to specify the type of the element. The value assigned to the type
attribute must be the name of an element type already declared in the schema.
The minOccurs and maxOccurs attributes are used to establish the number of times an element
can occur within a group or element type. Both attributes have default values of 1 in the XML-Data
note, which means that an element must occur exactly once by default.
The following table shows the relationship between the minOccurs and maxOccurs attributes and
the number of times an element or group can occur:

minOccurs     maxOccurs     Number of times the element/group can occur
0             1             0 or 1
1             1             1
0             *             Any number of times
1             *             At least once
>0            *             At least minOccurs times
>maxOccurs    >0            0
Any value     <minOccurs    0

Note
The table applies equally to the group element, because groups also have minOccurs and
maxOccurs attributes that serve the same purpose.
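The four most common combinations from the table can be seen side by side in a single content model. This is a sketch under the chapter's Microsoft namespaces, with hypothetical element names (book, title, subtitle, author, keyword).

```xml
<?xml version="1.0"?>
<Schema name="occursDemo" xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="title" content="textOnly"/>
  <ElementType name="subtitle" content="textOnly"/>
  <ElementType name="author" content="textOnly"/>
  <ElementType name="keyword" content="textOnly"/>

  <ElementType name="book" content="eltOnly" model="closed" order="seq">
    <!-- 1/1: exactly one title (same as the defaults) -->
    <element type="title" minOccurs="1" maxOccurs="1"/>
    <!-- 0/1: an optional subtitle -->
    <element type="subtitle" minOccurs="0" maxOccurs="1"/>
    <!-- 1/*: at least one author -->
    <element type="author" minOccurs="1" maxOccurs="*"/>
    <!-- 0/*: any number of keywords, including none -->
    <element type="keyword" minOccurs="0" maxOccurs="*"/>
  </ElementType>
</Schema>
```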

The following is an example of the element element used to declare element instances within an
element type:
<ElementType name="location" content="textOnly"/>
<ElementType name="comments" content="textOnly"/>
<ElementType name="session" model="closed" content="eltOnly" order="seq">
<element type="location" minOccurs="1" maxOccurs="1"/>
<element type="comments" minOccurs="0" maxOccurs="1"/>
</ElementType>

The Group Element


The group element is used to group elements for organizational purposes and for establishing
complex content models. A complex content model consists of more than one group of elements.
The group element includes three attributes for fine-tuning groups:

order      The order of the child elements contained within the group
minOccurs  The minimum number of times the group must occur
maxOccurs  The maximum number of times the group can occur

The order attribute works exactly like its counterpart in the ElementType element. The following
are acceptable values for this attribute:

one   Only one of a set of elements is allowed within the group.
seq   The elements must occur in the specified sequence in the group.
many  The elements can occur any number of times and in any order in the group.

The minOccurs and maxOccurs attributes play exactly the same role in the group element as they
do in the element element, which is constraining the number of times the group can occur.
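A group is easiest to understand inside a larger content model. In the following sketch, which assumes the chapter's Microsoft namespaces, contact, name, phone and email are hypothetical names; the group lets each contact list one or more entries after its name, each entry being either a phone or an email.

```xml
<?xml version="1.0"?>
<Schema name="groupDemo" xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="name" content="textOnly"/>
  <ElementType name="phone" content="textOnly"/>
  <ElementType name="email" content="textOnly"/>

  <ElementType name="contact" content="eltOnly" model="closed" order="seq">
    <!-- The name always comes first -->
    <element type="name"/>
    <!-- Then one or more entries; order="one" makes each entry
         either a phone or an email -->
    <group order="one" minOccurs="1" maxOccurs="*">
      <element type="phone"/>
      <element type="email"/>
    </group>
  </ElementType>
</Schema>
```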

The AttributeType Element


The AttributeType element is used to define attribute types for use in elements. Similar to the
ElementType element, the AttributeType element simply defines an attribute type. To actually
declare an attribute as part of an element, you must use the attribute element, which references an
AttributeType element. Attribute types may be defined at the top level of a schema document or
within individual element types. This allows you to create either global attributes or local attributes
within a given scope. Global attributes are handy because they can be used in multiple elements.
On the other hand, local attributes can be used within a given scope to supersede another
attribute of the same name.
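The following sketch contrasts a global and a local attribute type. It relies on the rule described above, that a local AttributeType supersedes a global one of the same name; the names lang, title and summary are hypothetical.

```xml
<?xml version="1.0"?>
<Schema name="attrScopeDemo" xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Global attribute type: declared at the top level,
       so any element type can use it -->
  <AttributeType name="lang" dt:type="string"/>

  <ElementType name="title" content="textOnly">
    <!-- Uses the global lang (any string) -->
    <attribute type="lang"/>
  </ElementType>

  <ElementType name="summary" content="textOnly">
    <!-- Local attribute type: visible only inside summary, where it
         supersedes the global lang declared above -->
    <AttributeType name="lang" dt:type="enumeration" dt:values="en fr de"/>
    <attribute type="lang"/>
  </ElementType>
</Schema>
```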
The AttributeType element includes the following attributes to allow you to fully describe an
attribute type:

name       The name of the attribute type
dt:type    The data type of the attribute type
dt:values  The list of possible values for an enumerated attribute; only applicable when
           dt:type is set to enumeration
default    The default value for the attribute
required   Flag indicating whether the attribute must be provided in the element

The name attribute specifies the name of the attribute type and is a required attribute. This name
must be unique among attributes within a given scope. The dt:type attribute specifies the data
type of the attribute.

The dt:values attribute is used to specify a list of possible values for enumerated attributes. This
attribute is applicable only when dt:type is set to enumeration. The list of enumerated attribute
values is specified as a single string with spaces between each possible value.
The following is an example of an enumerated attribute definition:

<AttributeType name="type" dt:type="enumeration" dt:values="running cycling swimming"/>

In this example, the available values that can be assigned to the type attribute are running,
cycling and swimming. Any value other than one of these three will be considered an error during
validation.

The default attribute of the AttributeType element is used to establish the default value for the
attribute type. The following is an example of establishing the default value of an attribute:

<AttributeType name="type" dt:type="enumeration" dt:values="running cycling swimming"
default="running"/>

The required attribute is a flag that specifies whether an attribute of this type must be present
in the element in which it is used. Acceptable values for the required attribute are yes and no.
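The following sketch, assuming the chapter's Microsoft namespaces, shows the required flag on two hypothetical attribute types, date and units, attached to a session element.

```xml
<?xml version="1.0"?>
<Schema name="requiredDemo" xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Every session must carry a date attribute -->
  <AttributeType name="date" dt:type="date" required="yes"/>
  <!-- The units attribute may be omitted -->
  <AttributeType name="units" dt:type="enumeration"
                 dt:values="miles kilometers" required="no"/>

  <ElementType name="session" content="textOnly">
    <attribute type="date"/>
    <attribute type="units"/>
  </ElementType>
</Schema>
```

With this schema, `<session date="2000-05-01" units="miles">Easy run</session>` should validate, while a session without a date attribute should be rejected.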
The Attribute Element
The attribute element is used to declare an instance of an attribute for an element type. The
attribute element includes three attributes for describing additional information about an attribute
instance:

type      The type of the attribute
default   The default value for the attribute
required  Flag indicating whether the attribute must be provided in the element

The type attribute is used to specify the type of the attribute. The value assigned to the type
attribute must be the name of an attribute type already declared in the schema. The type attribute
is what ties attribute instances to their associated attribute types. The default and required
attributes serve the same purposes as their equivalents in the AttributeType element, and they
supersede the equivalent attributes if those are also set in the attribute type.

The following is an example of the attribute element used to declare attribute instances within an
element type:

<AttributeType name="type" dt:type="enumeration" dt:values="running cycling swimming"/>
<AttributeType name="date" dt:type="date"/>
<ElementType name="session" content="eltOnly" order="seq">
<element type="duration" minOccurs="1" maxOccurs="1"/>
<element type="distance" minOccurs="1" maxOccurs="1"/>
<element type="location" minOccurs="1" maxOccurs="1"/>
<element type="comments" minOccurs="0" maxOccurs="1"/>
<attribute type="type" default="running"/>
<attribute type="date"/>
</ElementType>

In this example, the type and date attributes are first declared using the AttributeType element
and then associated with an element type using the attribute element.
Notice that the default value of the type attribute is set in the attribute element instead of the
AttributeType element.

Note
There is no constraint on the order of attributes within an element, but there can be no
more than one attribute of a given name per element.
The description Element
The last element used in XML Schema documents is the description element, which simply
provides a means of placing a text description within a schema. The description element is a
text-only element that is designed for documentation purposes. You can use the description
element in any way you choose to provide documentation about an XML Schema construct. The
following is an example of how you might add documentation to an element type:

<ElementType name="trainlog" content="eltOnly">
<description>
This element type represents a training log consisting of one or more training sessions.
</description>
<element type="session" minOccurs="1" maxOccurs="*"/>
</ElementType>

XML Schema Data Types


As you know, XML DTDs offer a limited number of data types, and they are rather primitive. For all
practical purposes, DTDs really only support a string data type, which is extremely limiting if
you're creating structured document schemas. The XML-Data note defines a number of rich data
types that can be used to specify familiar kinds of data, such as integers, floating point numbers,
dates, and times, to name a few. As of Internet Explorer 5.0, XML Schema supports all of these
data types in elements and hopefully will support them for attributes at some point in the future.
XML Schema data types are referenced from the urn:schemas-microsoft-com:datatypes
namespace. To reference the data types easily, you declare this namespace at the
document level of your schema documents. The data type namespace is typically assigned to the
xmlns:dt attribute, which means that you reference XML Schema data types by prefixing them
with dt:.

Example
<Schema name="Myschema" xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
<!--schema content goes here-->
</Schema>

The whole point of declaring the XML Schema data type namespace is so you can use the data
types it supports. The following is a list of these data types, which go far beyond the limited data
types supported in XML 1.0:

char         Character (text string with a length of one)
boolean      Boolean (0 or 1)
int          Whole number (integer)
float        Real (floating point) number with fractional part and optional exponent
number       Real number (same as float)
fixed.14.4   Real number with at most 14 whole digits and 4 fractional digits
i1           One-byte integer
i2           Two-byte integer
i4           Four-byte integer
r4           Four-byte real number
r8           Eight-byte real number (same as float)
ui1          One-byte unsigned integer
ui2          Two-byte unsigned integer
ui4          Four-byte unsigned integer
bin.hex      Binary data encoded as hexadecimal (base 16) digits
bin.base64   Binary data encoded in Base64
date         Date (without time or zone)
dateTime     Date with optional time (without time zone)
dateTime.tz  Date with optional time and time zone
time         Time (without date and time zone)
time.tz      Time with time zone (without date)
uri          Uniform Resource Identifier (URI)
uuid         Universally unique identifier (UUID)
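To see several of these types at work, here is a sketch of a hypothetical running log in which each text-only element is bound to a different data type. It assumes the chapter's Microsoft namespaces; the element names and the sample values in the comments are illustrative only.

```xml
<?xml version="1.0"?>
<Schema name="runLog" xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Each text-only element is bound to one XML-Data type;
       the sample values in the comments are illustrative -->
  <ElementType name="day" content="textOnly" dt:type="date"/>          <!-- 2000-05-01 -->
  <ElementType name="laps" content="textOnly" dt:type="ui1"/>          <!-- 12 -->
  <ElementType name="miles" content="textOnly" dt:type="fixed.14.4"/>  <!-- 5.25 -->
  <ElementType name="best" content="textOnly" dt:type="boolean"/>      <!-- 0 or 1 -->
  <ElementType name="route" content="textOnly" dt:type="uri"/>         <!-- http://www.example.com/park -->

  <ElementType name="run" content="eltOnly" model="closed" order="seq">
    <element type="day"/>
    <element type="laps"/>
    <element type="miles"/>
    <element type="best"/>
    <element type="route" minOccurs="0" maxOccurs="1"/>
  </ElementType>
</Schema>
```

An instance such as `<run><day>2000-05-01</day><laps>12</laps><miles>5.25</miles><best>0</best></run>` should validate, whereas `<laps>twelve</laps>` should fail because its content is not a ui1 value.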

The following are the primitive data types available for use in XML Schema:

string       A string type
enumeration  An enumerated type (attributes only)
notation     The NOTATION type
entity       The ENTITY type
entities     The ENTITIES type
id           The ID type
idref        The IDREF type
idrefs       The IDREFS type
nmtoken      The NMTOKEN type
nmtokens     The NMTOKENS type
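Several of the primitive types are meant for attributes. The sketch below, which assumes the chapter's Microsoft namespaces, uses id, idref and enumeration together; team, member, code, manager and status are hypothetical names.

```xml
<?xml version="1.0"?>
<Schema name="teamDemo" xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- id: each value must be unique within the instance document -->
  <AttributeType name="code" dt:type="id" required="yes"/>
  <!-- idref: must match the id value of some element in the same document -->
  <AttributeType name="manager" dt:type="idref"/>
  <!-- enumeration: restricted to the listed tokens -->
  <AttributeType name="status" dt:type="enumeration" dt:values="active retired"/>

  <ElementType name="member" content="textOnly">
    <attribute type="code"/>
    <attribute type="manager"/>
    <attribute type="status" default="active"/>
  </ElementType>

  <ElementType name="team" content="eltOnly" model="closed">
    <element type="member" minOccurs="1" maxOccurs="*"/>
  </ElementType>
</Schema>
```

An instance element such as `<member code="m2" manager="m1">Rajesh</member>` would only be valid if another member in the same document carries `code="m1"`.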

Employees.xml

<?xml version="1.0"?>
<employees xmlns="x-schema:empSchema.xml">
<employee>
<eid id="A100">A100</eid>
<ename>Surya</ename>
<sal>50000.00</sal>
<desig>CEO</desig>
<phno>3751135</phno>
<email>suryaactive@hotmail.com</email>
</employee>
<employee>
<eid id="A101">A101</eid>
<ename>Rajesh</ename>
<sal>30000.00</sal>
<desig>Director</desig>
<phno>3751238</phno>
<email>insbrajesh@rediffmail.com</email>
</employee>
</employees>
empSchema.xml

Note
Microsoft schema files in these examples use the .xml extension, whereas W3C XML Schema files
conventionally use the .xsd extension.

<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
<!--
Above are the Microsoft namespaces for schema data and data types
xmlns => XML namespace
urn   => Uniform Resource Name
dt    => datatype
-->
</Schema>
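The empSchema.xml file above declares only the namespaces. As a sketch, a complete schema that should validate Employees.xml could look like the following, saved under the file name that Employees.xml references in its x-schema: namespace. The data types chosen for each element (for example fixed.14.4 for sal and string for phno) are our own assumptions, since the original example does not specify them.

```xml
<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- The id attribute on eid: dt:type="id" makes each value unique -->
  <AttributeType name="id" dt:type="id" required="yes"/>

  <!-- Text-only leaf elements of an employee record -->
  <ElementType name="eid" content="textOnly">
    <attribute type="id"/>
  </ElementType>
  <ElementType name="ename" content="textOnly" dt:type="string"/>
  <ElementType name="sal" content="textOnly" dt:type="fixed.14.4"/>
  <ElementType name="desig" content="textOnly" dt:type="string"/>
  <!-- Kept as string: phone numbers are identifiers, not quantities -->
  <ElementType name="phno" content="textOnly" dt:type="string"/>
  <ElementType name="email" content="textOnly" dt:type="string"/>

  <!-- One employee: the six fields in the order used by Employees.xml -->
  <ElementType name="employee" content="eltOnly" model="closed" order="seq">
    <element type="eid"/>
    <element type="ename"/>
    <element type="sal"/>
    <element type="desig"/>
    <element type="phno"/>
    <element type="email"/>
  </ElementType>

  <!-- Root element: one or more employee records -->
  <ElementType name="employees" content="eltOnly" model="closed">
    <element type="employee" minOccurs="1" maxOccurs="*"/>
  </ElementType>
</Schema>
```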

How do you associate a schema with this document as far as Internet Explorer is concerned?
You do so by specifying a default namespace attribute in the root element, and prefacing the
name of the schema file with x-schema: like this:
<?xml version="1.0"?>
<programming_team xmlns="x-schema:schema1.xml">
<programmer>Fred Samson</programmer>
<programmer>Edward</programmer>
</programming_team>

Here, I'm naming the schema file schema1.xml (IE does not insist on any special
extension for schema files).
When creating the schema file, you can name the schema using the name attribute of the Schema
element.

<Schema name="schema1" xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
<ElementType name="programmer" content="textOnly" model="closed"/>
<ElementType name="programming_team" content="eltOnly" model="closed">
<element type="programmer" minOccurs="1" maxOccurs="*"/>
</ElementType>
</Schema>

One of the advantages of using schemas is that they allow you to specify the actual data types
that you want to use, but those data types weren't fully fleshed out at the time Microsoft decided
to implement schemas, so Microsoft implemented its own. To create a schema for Internet
Explorer, you set up a default namespace