XML SchemaBestPracticeIn

Best Practices in a Nutshell

Below is a very brief synopsis of the Best Practice guidelines. Each of the items will be
elaborated upon in great detail in the following guidelines.
1. Make targetNamespace the default namespace.
2. Make two identical copies of all your schemas, where the copies differ only in the value of
elementFormDefault (in one copy set elementFormDefault="qualified", in the other copy
set elementFormDefault="unqualified").
3. Uniquely identify all schema components with the id attribute. Note: this is NOT the same
thing as creating an element with an attribute that has an ID-datatype. Rather, what is being
referred to here is the capability to associate an id attribute with every schema component
(types, elements, attributes, etc). Here are some examples:
<xsd:element name="elevation" type="xsd:integer"
             id="flight:aircraft:elevation"/>
<xsd:complexType name="publication" id="wrox:book:publication"/>
This provides a finer level of granularity for identifying components than namespaces do;
namespaces provide only a coarse level of granularity.
4. Postpone decisions as long as possible.
4.1 Postpone binding schema components to a namespace.
Corollary: Don't give schemas a targetNamespace. Let schemas which <include> your
no-namespace schema supply a targetNamespace, one that makes sense to the <include>ing
schema.
4.2 Postpone binding a type reference to an implementation, i.e., use dangling types.
Corollary: In an <import> element the schemaLocation attribute is optional. Don't use it.
5. Create extensible schemas.
5.1 Recognize your limitations as a schema designer, i.e., be smart enough to know that you're
not smart enough to anticipate all the varieties of data that an instance document author
might need to use in creating an instance document.
Corollary: use the <any> element.
6. Recognize that with XML Schemas you will not be able to express all your business rules.
Express those business rules using either XSLT or Schematron.
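Guideline 4.1 can be illustrated with a minimal sketch. It assumes a hypothetical no-namespace schema file, Address.xsd, which defines a complexType named AddressType. The file name, the type name, and the namespace http://www.example.org/customer are illustrative assumptions, not taken from the guidelines. The <include>ing schema supplies the targetNamespace, and the included components take on that namespace:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/customer"
            xmlns="http://www.example.org/customer"
            elementFormDefault="qualified">
    <!-- Address.xsd (hypothetical) has no targetNamespace attribute -->
    <xsd:include schemaLocation="Address.xsd"/>
    <!-- AddressType has taken on the including schema's namespace, so the
         unprefixed type reference resolves through the default namespace -->
    <xsd:element name="ShippingAddress" type="AddressType"/>
</xsd:schema>
```

A second schema with a different targetNamespace could <include> the same Address.xsd and obtain AddressType in its own namespace, which is the point of postponing the binding.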
Default Namespace - targetNamespace or XMLSchema?
Table of Contents
Issue
Introduction
Approach 1: Default XMLSchema, Qualify targetNamespace
Approach 2: Qualify XMLSchema, Default targetNamespace
Approach 3: No Default Namespace - Qualify both XMLSchema and targetNamespace
Best Practice
Issue
When creating a schema, should XMLSchema (i.e., http://www.w3.org/2001/XMLSchema) be the
default namespace, should the targetNamespace be the default, or should there be no default
namespace?
Introduction
Except for no-namespace schemas, every XML Schema uses at least two namespaces - the
targetNamespace and the XMLSchema (http://www.w3.org/2001/XMLSchema) namespace.
[Figure: Library.xsd is comprised of components from two namespaces: the targetNamespace
http://www.publishing.org (Library, Book, BookCatalogue, CardCatalogueEntry) and
http://www.w3.org/2001/XMLSchema (schema, element, complexType, sequence, annotation,
documentation, string, integer). Which namespace should be the default?]
There are three ways to design your schemas, with regard to dealing with these two namespaces:
1. Make XMLSchema the default namespace, and explicitly qualify all references to components
in the targetNamespace.
2. Vice versa - make the targetNamespace the default namespace, and explicitly qualify all
components from the XMLSchema namespace.
3. Do not use a default namespace - explicitly qualify references to components in the
targetNamespace and explicitly qualify all components from the XMLSchema namespace.
Let's look at each approach in detail. In the following discussions we will consider this scenario:
[Figure: Library.xsd, with targetNamespace="http://www.publishing.org", includes
BookCatalogue.xsd and refs the Book element declared in BookCatalogue. BookCatalogue.xsd
declares the Book element globally so that it can be reused by other schemas, e.g., the Library
schema. The BookCatalogue schema must either have the same namespace as the Library schema
or have no namespace.]
Approach 1: Default XMLSchema, Qualify targetNamespace
Below is a Library schema which demonstrates this design approach. It <include>s a
BookCatalogue schema, which contains a declaration for a Book element. The Library schema
references (refs) the Book element.
<?xml version="1.0"?>
<!-- The default namespace is XMLSchema; "lib" points to the targetNamespace -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.library.org"
        xmlns:lib="http://www.library.org"
        elementFormDefault="qualified">
    <include schemaLocation="BookCatalogue.xsd"/>
    <element name="Library">
        <complexType>
            <sequence>
                <element name="BookCatalogue">
                    <complexType>
                        <sequence>
                            <!-- The reference to Book is qualified -->
                            <element ref="lib:Book"
                                     maxOccurs="unbounded"/>
                        </sequence>
                    </complexType>
                </element>
            </sequence>
        </complexType>
    </element>
</schema>
Note that XMLSchema is the default namespace. Consequently, all the components used to
construct the schema - element, include, complexType, sequence, schema, etc - have no
namespace qualifier on them.
There is a namespace prefix, lib, which is associated with the targetNamespace. Any references
(using the ref attribute) to components in the targetNamespace (Library, BookCatalogue,
Book, etc) are explicitly qualified with lib (in this example there is a ref to lib:Book).
Advantages:
If your schema is referencing components from multiple namespaces then this approach gives a
consistent way of referring to the components (i.e., you always qualify the reference).
Disadvantages:
Schemas which have no-targetNamespace must be designed so that the XMLSchema
components (element, complexType, sequence, etc) are qualified. If you adopt this approach to
designing your schemas then in some of your schemas you will qualify the XMLSchema
components and in other schemas you won't qualify the XMLSchema components. Changing
from one way of designing your schemas to another way can be confusing.
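To see why a no-targetNamespace schema needs its XMLSchema components qualified, consider the following sketch of a no-namespace BookCatalogue.xsd. The content of Book (a single Title element) is an assumption made for illustration. Because there is no default namespace declaration, the unprefixed ref="Book" resolves to a component in no namespace, which is where the Book declaration lives. If XMLSchema were made the default namespace here, the same unprefixed reference would instead be looked up in the XMLSchema namespace and would not be found.

```xml
<?xml version="1.0"?>
<!-- A no-namespace schema: there is no targetNamespace attribute -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Title" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:element name="BookCatalogue">
        <xsd:complexType>
            <xsd:sequence>
                <!-- Unprefixed, so it refers to the no-namespace Book above -->
                <xsd:element ref="Book" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```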
Approach 2: Qualify XMLSchema, Default targetNamespace
This design approach is the mirror image of the first approach:
<?xml version="1.0"?>
<!-- "xsd" points to the XMLSchema namespace; the default namespace is the targetNamespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.library.org"
            xmlns="http://www.library.org"
            elementFormDefault="qualified">
    <!-- The XMLSchema components (schema, include, element, complexType,
         sequence) are qualified -->
    <xsd:include schemaLocation="BookCatalogue.xsd"/>
    <xsd:element name="Library">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="BookCatalogue">
                    <xsd:complexType>
                        <xsd:sequence>
                            <!-- Book is in the default namespace (thus,
                                 no namespace qualifier required) -->
                            <xsd:element ref="Book"
                                         maxOccurs="unbounded"/>
                        </xsd:sequence>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
With this approach all the components used to construct a schema are namespace qualified (with
xsd:).
There is a default namespace declaration that declares the targetNamespace to be the default
namespace. Any references to components in the targetNamespace are not namespace qualified
(note that the ref to Book is not namespace qualified).
Advantages:
Schemas which have no-targetNamespace must be designed so that the XMLSchema
components (element, complexType, sequence, etc) are qualified. This approach will work
whether your schema has a targetNamespace or not. Thus, with this approach you have a
consistent approach to designing your schemas - always namespace-qualify the XMLSchema
components.
Disadvantages:
If your schema is referencing components from multiple namespaces then for some references
you will namespace-qualify the reference, whereas other times you will not (i.e., when you are
referencing components in the targetNamespace). This variable use of namespace qualifiers in
referencing components can be confusing.
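The mixed use of qualifiers can be seen in the following sketch, which extends the Library schema with a reference to a component from a second namespace. The namespace http://www.example.org/publisher, the file Publisher.xsd, and the Publisher element are hypothetical, introduced only to illustrate the point.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.library.org"
            xmlns="http://www.library.org"
            xmlns:pub="http://www.example.org/publisher"
            elementFormDefault="qualified">
    <xsd:include schemaLocation="BookCatalogue.xsd"/>
    <!-- Hypothetical schema for a second namespace -->
    <xsd:import namespace="http://www.example.org/publisher"
                schemaLocation="Publisher.xsd"/>
    <xsd:element name="Library">
        <xsd:complexType>
            <xsd:sequence>
                <!-- In the targetNamespace: no qualifier -->
                <xsd:element ref="Book" maxOccurs="unbounded"/>
                <!-- In another namespace: qualifier required -->
                <xsd:element ref="pub:Publisher" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```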
Approach 3: No Default Namespace - Qualify both XMLSchema and targetNamespace
This design approach does not have a default namespace:
<?xml version="1.0"?>
<!-- "xsd" points to the XMLSchema namespace; "lib" points to the targetNamespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.library.org"
            xmlns:lib="http://www.library.org"
            elementFormDefault="qualified">
    <!-- The XMLSchema components (schema, include, element, complexType,
         sequence) are qualified -->
    <xsd:include schemaLocation="BookCatalogue.xsd"/>
    <xsd:element name="Library">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="BookCatalogue">
                    <xsd:complexType>
                        <xsd:sequence>
                            <!-- The reference to Book is qualified -->
                            <xsd:element ref="lib:Book"
                                         maxOccurs="unbounded"/>
                        </xsd:sequence>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
Note that the XMLSchema components are explicitly qualified, as are the references to
components in the targetNamespace.
Advantages:
[1] Schemas which have no-targetNamespace must be designed so that the XMLSchema
components (element, complexType, sequence, etc) are qualified. With this approach all
your schemas are designed in a consistent fashion.
[2] If your schema is referencing components from multiple namespaces then this approach
gives a consistent way of referencing components (i.e., you always qualify the reference).
Disadvantages:
Very cluttered: being very explicit by namespace qualifying all components and all references
can be annoying when reading the schema.
Best Practice
There is no clear-cut best practice with regard to this issue. In large part it is a matter of
personal preference.
Should it be an Element or a Type?
Table of Contents
Issue
Introduction
Best Practice
Issue
When should an item be declared as an element versus when should it be defined as a type?
Introduction
This issue is best discussed by way of example:
Example
Should Warranty be declared as an element:
<xsd:element name="Warranty">
    ...
</xsd:element>
or as a type:
<xsd:complexType name="Warranty">
    ...
</xsd:complexType>
Best Practice
[1] When in doubt, make it a type. You can always create an element from the type, if needed.
With a type, other elements can reuse that type.
Example. If you can't decide whether to make Warranty an element or a type, then make it a
type:
<xsd:complexType name="Warranty">
    ...
</xsd:complexType>
If you decide later that you need a Warranty element, you can create one using the Warranty
type:
<xsd:element name="Warranty" type="Warranty"/>
Recall that elements and types are in different Symbol Spaces. Hence, you can have an element
and type with the same name.
[2] If the item is not intended to be an element in instance documents then define it as a type.
Example. If you will never see this in an instance document:
<Warranty>
    ...
</Warranty>
then define Warranty as a complexType.
[3] If the item's content is to be reused by other items then define it as a type.
Example. If other items need to use Warranty's content, then define Warranty as a type:
<xsd:complexType name="Warranty">
    ...
</xsd:complexType>
...
<xsd:element name="PromissoryNote" type="Warranty"/>
<xsd:element name="AutoCertificate" type="Warranty"/>
The example shows two elements - PromissoryNote and AutoCertificate - reusing the Warranty
type.
[4] If the item is intended to be used as an element in instance documents, and it's required that
sometimes it be nillable and other times not, then it must be defined as a type.
Example. Let's first see how not to do it. Suppose that we create a Warranty element:
<xsd:element name="Warranty">
    ...
</xsd:element>
The Warranty element can be reused elsewhere by ref'ing it:
<xsd:element ref="Warranty"/>
Suppose that we also need a version of Warranty that supports a nil value. You might be tempted
to do this:
<xsd:element ref="Warranty" nillable="true"/>
This is not legal. This dynamic morphing capability (i.e., reusing a Warranty element declaration
while simultaneously adding nillability) cannot be achieved using elements. The reason for this
is that the ref and nillable attributes are mutually exclusive - you can use ref, or you can use
nillable, but not both. The only way to accomplish the dynamic morphing capability is by
defining Warranty as a type:
<xsd:complexType name="Warranty">
    ...
</xsd:complexType>
and then reusing the type:
<xsd:element name="Warranty" nillable="true" type="Warranty"/>
...
<xsd:element name="Warranty" type="Warranty"/>
In the first case Warranty is nillable. In the second case it's not nillable.
[5] If the item is intended to be used as an element in instance documents and other elements
are to be allowed to substitute for the element, then it must be declared as an element.
Example. Suppose that we would like to enable instance document authors to use
interchangeably the vocabulary (i.e., tag name) Warranty, Guarantee, or Promise, i.e.,
<Warranty>
    ...
</Warranty>
...
<Guarantee>
    ...
</Guarantee>
...
<Promise>
    ...
</Promise>
To enable this substitutable-tag-name capability, Warranty, Guarantee, and Promise must be
declared as elements, and made members of a substitutionGroup:
<xsd:element name="Warranty">
    ...
</xsd:element>
<xsd:element name="Guarantee" substitutionGroup="Warranty"/>
<xsd:element name="Promise" substitutionGroup="Warranty"/>
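The substitutionGroup approach can be sketched as a small, complete schema. This is only an illustration under stated assumptions: the Contract element and the xsd:string content of Warranty are hypothetical, and the schema is a no-namespace schema to keep it short. A content model refs the head element (Warranty), and an instance document may then use any member of the group in its place.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <!-- Head of the substitution group -->
    <xsd:element name="Warranty" type="xsd:string"/>
    <!-- Members: with no type attribute they take the head's type -->
    <xsd:element name="Guarantee" substitutionGroup="Warranty"/>
    <xsd:element name="Promise" substitutionGroup="Warranty"/>
    <!-- Hypothetical container that refs the head element -->
    <xsd:element name="Contract">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Warranty"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

With this schema, a <Contract> containing a single <Guarantee> or <Promise> child is accepted wherever a <Warranty> child would be.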
Extending XML Schemas
Table of Contents
Issue
Tutorial
Introduction
Three Options for Extending XML Schemas
Supplement with Another Schema Language
Write Code to Express Additional Constraints
Express Additional Constraints with an XSLT/XPath Stylesheet
Advantages/Disadvantages of the Three Options
Advantages/Disadvantages of Supplementing with Another Schema Language
Advantages/Disadvantages of Writing Code to Express Additional Constraints
Advantages/Disadvantages of Expressing Additional Constraints with an XSLT/XPath Stylesheet
Issue
What is Best Practice for checking instance documents for constraints that are not expressible
by XML Schemas?
Introduction
XML Schemas is very powerful. However, it is not all-powerful. There are many constraints
which cannot be expressed with XML Schemas. Here are some examples:
- Ensure that the value of the aircraft <Elevation> element is greater than the value of the
  obstacle <Height> element.
- Ensure that:
  - if the value of the attribute, mode, is "air" then the value of the element <Transportation>
    is either "airplane" or "hot-air balloon".
  - if the value of the attribute, mode, is "water" then <Transportation> is either "boat" or
    "hovercraft".
  - if the value of the attribute, mode, is "ground" then <Transportation> is either "car" or
    "bicycle".
- Ensure that the value of <PaymentReceived> is equal to the value of <PaymentDue>, where
  these elements are in separate documents!
To check all these constraints we will need to supplement XML Schemas with another tool.
Example. Consider this simple instance document:
<Demo xmlns="http://www.demo.org"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.demo.org demo.xsd">
    <A>10</A>
    <B>20</B>
</Demo>
With XML Schemas we can check the following constraints:
- the Demo (root) element contains a sequence of elements, A followed by B
- the A element contains an integer
- the B element contains an integer
In fact, here's an XML Schema which implements these constraints:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.demo.org"
            xmlns="http://www.demo.org"
            elementFormDefault="qualified">
    <xsd:element name="Demo">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="A" type="xsd:integer"/>
                <xsd:element name="B" type="xsd:integer"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
XML Schemas does not give us the capability to express the following constraint:
- the value of A must be greater than the value of B
So what do we do to check this constraint? (Interestingly, the XML Schema shown above would
accept the above instance document as valid, whereas in fact it is not, since the value of A
is less than the value of B. We need something else to check this constraint.) There are three
options.
Three Options for Extending XML Schemas
(1) Supplement with Another Schema Language
There are many other schema languages besides XML Schemas:
Schematron
TREX
RELAX
SOX
XDR
HOOK
DSD
Assertion Grammars
xlinkit
Thus, the first option is to use one (or more) of these schema languages to express the additional
constraints. Let's look at one of these languages - Schematron.
Using Schematron you embed the additional constraints (as assertions) within the schema
document (within <appinfo> elements). A Schematron engine will then extract the assertions and
validate the instance document against the assertions.
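[Figure: Schematron extracts the assertions from the <appinfo> elements of the XML Schema and
checks the XML data against them, reporting valid/invalid.]
Thus, here's the architecture for determining if your data meets all constraints:
[Figure: Your XML data is checked by a schema validator against the XML Schema, and by
Schematron against the extracted assertions. When both report valid, you know your XML data
is valid.]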
With Schematron:
- state a constraint using an <assert> element
- a <rule> is comprised of one or more <assert> elements
- a <pattern> is comprised of one or more <rule> elements
<pattern>
    <rule>
        <assert> </assert>
        <assert> </assert>
    </rule>
</pattern>
The <pattern> element is embedded within an XML Schema <appinfo> element.
Here's an example of an assertion:
<assert test="d:A > d:B">A should be greater than B</assert>
In the <assert> element you express a constraint using an XPath expression (the value of the
test attribute). Additionally, you state the constraint in natural language (the content of the
element). The latter helps make the schemas self-documenting.
The <rule> element is used to specify the context for the <assert> elements.
<rule context="d:Demo">
    <assert test="d:A > d:B">A should be greater than B</assert>
</rule>
This is read as: Within the context of the Demo element we assert that the A element should be
greater than the B element.
You can associate an <assert> with a <diagnostic> element. The <diagnostic> element is used for
printing error messages when the XML data fails the assertion. The <diagnostic> element is
embedded within a <diagnostics> element, which immediately follows the <pattern> element.
<pattern name="Check A greater than B">
    <rule context="d:Demo">
        <assert test="d:A > d:B" diagnostics="lessThan">
            A should be greater than B
        </assert>
    </rule>
</pattern>
<diagnostics>
    <diagnostic id="lessThan">
        Error! A is less than B
        A = <value-of select="d:A"/>
        B = <value-of select="d:B"/>
    </diagnostic>
</diagnostics>
To identify the Schematron elements, they must be namespace-qualified with sch:.
Here's what demo.xsd looks like after enhancing the schema document shown earlier with
Schematron directives:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.demo.org"
            xmlns="http://www.demo.org"
            xmlns:sch="http://www.ascc.net/xml/Schematron"
            elementFormDefault="qualified">
    <xsd:annotation>
        <xsd:appinfo>
            <sch:title>Schematron validation</sch:title>
            <sch:ns prefix="d" uri="http://www.demo.org"/>
        </xsd:appinfo>
    </xsd:annotation>
    <xsd:element name="Demo">
        <xsd:annotation>
            <xsd:appinfo>
                <sch:pattern name="Check A greater than B">
                    <sch:rule context="d:Demo">
                        <sch:assert test="d:A > d:B"
                                    diagnostics="lessThan">A should be greater than B</sch:assert>
                    </sch:rule>
                </sch:pattern>
                <sch:diagnostics>
                    <sch:diagnostic id="lessThan">
                        Error! A is less than B. A = <sch:value-of select="d:A"/> B = <sch:value-of select="d:B"/>
                    </sch:diagnostic>
                </sch:diagnostics>
            </xsd:appinfo>
        </xsd:annotation>
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="A" type="xsd:integer" minOccurs="1" maxOccurs="1"/>
                <xsd:element name="B" type="xsd:integer" minOccurs="1" maxOccurs="1"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
Schematron will extract the directives out of the schema document to create a Schematron
schema. Schematron will then validate the instance document against the Schematron schema.
The key points to note about using Schematron are:
The additional constraints are embedded in <appinfo> elements within the XML
Schema document
The constraints are expressed using <assert> elements
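As a further sketch, here is how one of the mode/<Transportation> constraints listed in the Introduction might be written as a Schematron pattern. The Trip element that carries the mode attribute and the prefix t are hypothetical. The prefix is assumed to be bound to the instance document's namespace with an <sch:ns> element elsewhere in the schema, as was done for the prefix d in demo.xsd.

```xml
<sch:pattern name="Check Transportation against mode"
             xmlns:sch="http://www.ascc.net/xml/Schematron">
    <!-- The rule applies only to Trip elements whose mode is "ground" -->
    <sch:rule context="t:Trip[@mode = 'ground']">
        <sch:assert test="t:Transportation = 'car' or t:Transportation = 'bicycle'">
            When mode is ground, Transportation should be car or bicycle
        </sch:assert>
    </sch:rule>
</sch:pattern>
```

The other mode values would each get a <sch:rule> of the same shape inside the same pattern.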
(2) Write Code to Express Additional Constraints
The second option is to write some Java, Perl, C++, etc code to check additional constraints.
(3) Express Additional Constraints with an XSLT/XPath Stylesheet
The third option is to write a stylesheet to check the constraints.
[Figure: Your XML data is checked by a schema validator against the XML Schema, and by an
XSL processor using an XSLT stylesheet containing code to check the additional constraints.
When both report valid, you know your XML data is valid.]
For example, the following stylesheet checks instance documents to see if the contents of the A
element is greater than the contents of the B element:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:d="http://www.demo.org"
                version="1.0">

    <xsl:output method="text"/>
    <xsl:template match="/">
        <xsl:if test="/d:Demo/d:A &lt; /d:Demo/d:B">
            <xsl:text>Error! A is less than B</xsl:text>
            <xsl:text>
</xsl:text>
            <xsl:text>A = </xsl:text><xsl:value-of select="/d:Demo/d:A"/>
            <xsl:text>
</xsl:text>
            <xsl:text>B = </xsl:text><xsl:value-of select="/d:Demo/d:B"/>
        </xsl:if>
        <xsl:if test="/d:Demo/d:A &gt;= /d:Demo/d:B">
            <xsl:text>Instance document is valid</xsl:text>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
Upon running this stylesheet on the above XML data the following output is generated:
Error! A is less than B
A = 10
B = 20
This is exactly what is desired. Thus, the methodology for this third option is:
- check as many constraints as you can using XML Schemas
- for all other constraints write a stylesheet to do the checking
If both the schema validator and the XSL processor generate a positive output then you know that
your instance document is valid. This combination of XML Schemas plus stylesheets provides
for a powerful constraint checking mechanism.
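A stylesheet can also reach into a second document, which is what the <PaymentReceived>/<PaymentDue> constraint from the Introduction requires. The following is a minimal sketch, not a definitive implementation. The file name invoice.xml and the wrapper elements Invoice and Receipt are hypothetical, and both documents are assumed to be in no namespace. The XSLT 1.0 document() function loads the second document.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- invoice.xml, Invoice and Receipt are assumed names -->
        <xsl:variable name="due"
                      select="document('invoice.xml')/Invoice/PaymentDue"/>
        <xsl:choose>
            <xsl:when test="number(/Receipt/PaymentReceived) = number($due)">
                <xsl:text>Instance document is valid</xsl:text>
            </xsl:when>
            <xsl:otherwise>
                <xsl:text>Error! PaymentReceived does not equal PaymentDue</xsl:text>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

The stylesheet is run against the document containing <PaymentReceived>. The relative URI passed to document() is resolved against the location of the stylesheet. If either value is missing or non-numeric the comparison fails, so the error branch is taken.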
Advantages/Disadvantages of the Three Options
(1) Supplement with Another Schema Language
Advantages
Collocated Constraints: Above we saw how Schematron can be used to express
additional constraints. We saw that you embed the Schematron directives within
the XML Schema document. There is something very appealing about having all
the constraints expressed within one document rather than being dispersed over
multiple documents. [This ability to collocate constraints within the schema
document is a feature of Schematron. The other schema languages do not have
this capability.]
Simplicity: Many of the schema languages were created in reaction to the complexity
and limitations of XML Schemas. Consequently, most of them are relatively
simple to learn and use.
Disadvantages
Multiple Schema Languages may be Required: Each schema language has its
own capabilities and limitations. Multiple schema languages may be required to
express all the additional constraints. For example, while Schematron is very
powerful it is not able to express all constraints (for an example, see the ISBN
simpleType definition on http://www.xfront.com). Also, Schematron forces you to
go through many contortions to express your assertion. This is due to the fact that
it does not have loops and variables.
Yet Another Vocabulary (YAV): There are many schema languages, each with its
own vocabulary and semantics. How do you find a schema language with the
capability to express your problem's additional constraints? You have to take the
time to learn each of the schema languages. Hopefully, you will find one that
supports expression of your constraints. Although relatively easy to learn and use,
it still takes time to learn a new vocabulary and semantics.
Questionable Long Term Support: In most cases the schema languages listed
above were created by a single author. These authors are busy, very bright people.
Someday their interests will move to something else. At that time you may be left
with a product which is no longer supported. [Editor's Note: Schematron is
basically a few XSLT/XPath stylesheets. Consequently, Schematron will be
supported as long as there are XSL processors. Also, the author of RELAX has
publicly promised to support RELAX for the next five years.]
(2) Write Code to Express Additional Constraints
Advantages
Full Power of a Programming Language: The advantage of this option is that
with a single programming language you can express all the additional
constraints.
Disadvantages
Not Leveraging other XML Technologies: There are other XML technologies
that could be used to express the additional constraints in a declarative manner,
without going through the compiling, linking, executing effort.
(3) Express Additional Constraints with an XSLT/XPath Stylesheet
Advantages
Application Specific Constraint Checking: Each application can create its own
stylesheet to check constraints that are unique to the application. We can enhance
the schema without touching it!
Core Technology: XSLT/XPath is a core technology which is well supported,
well understood, and with lots of material written on it.
Expressive Power: XSLT/XPath is a very powerful language. Most, if not every,
constraint that you might ever need to express can be expressed using
XSLT/XPath. Thus you don't have to learn multiple schema languages to express your
additional constraints.
Long Term Support: XSLT/XPath is well supported, and will be around for a
long time.
Disadvantages
Separate Documents: With this approach you will write your XML Schema
document, then you will write a separate XSLT/XPath document to express
additional constraints. Keeping the two documents in synch needs to be carefully
managed.
Creating Extensible Content Models
Table of Contents
Issue
Definition
Introduction
Extensibility via Type Substitution
Extensibility via the <any> Element
Non-determinism and the <any> element
Best Practice
Issue
What is Best Practice for creating extensible content models?
Definition
An element has an extensible content model if in instance documents the authors can extend the
contents of that element with additional elements beyond what was specified by the schema.
Introduction
Consider the following schema snippet:
<xsd:element name="Book">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
            <xsd:element name="Date" type="xsd:string"/>
            <xsd:element name="ISBN" type="xsd:string"/>
            <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
This schema snippet dictates that in instance documents the <Book> elements must always be
comprised of exactly 5 elements <Title>, <Author>, <Date>, <ISBN>, and <Publisher>. For
example:
<Book>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
</Book>
The schema specifies a fixed/static content model for the Book element. Book's content must
rigidly conform to just the schema specification. Sometimes this rigidity is a good thing.
Sometimes we want to give our instance documents more flexibility.
How do we design the schema so that Book's content model is extensible? Below are two
methods for implementing extensible content models.
Extensibility via Type Substitution
Consider this version of the above schema, where Book's content model has been defined using a
type definition:
<xsd:complexType name="BookType">
    <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:string"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
    </xsd:sequence>
</xsd:complexType>
<xsd:element name="BookCatalogue">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element name="Book" type="BookType"
                         maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
Recall that via the mechanism of type substitutability, the contents of <Book> can be substituted
by any type that derives from BookType.
<Book>
-- content --
</Book>
For example, if a type is created which derives from BookType:
<xsd:complexType name="BookTypePlusReviewer">
    <xsd:complexContent>
        <xsd:extension base="BookType">
            <xsd:sequence>
                <xsd:element name="Reviewer" type="xsd:string"/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>
then instance documents can create a <Book> element that contains a <Reviewer> element,
along with the other five elements:
<Book xsi:type="BookTypePlusReviewer">
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>1998</Date>
    <ISBN>94303-12021-43892</ISBN>
    <Publisher>McMillin Publishing</Publisher>
    <Reviewer>Roger Costello</Reviewer>
</Book>
Thus, Book's content model has been extended with a new element (Reviewer)!
In this example, BookTypePlusReviewer has been defined within the same schema as BookType.
In general, however, this may not be the case. Other schemas can import/include the
BookCatalogue schema and define types which derive from BookType. Thus, the contents of
Book may be extended, without modifying the BookCatalogue schema:
Extend a Schema, without Touching it!
BookCatalogue.xsd (xmlns="http://www.publishing.org"):
<xsd:complexType name="BookType">
    <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
    </xsd:sequence>
</xsd:complexType>
<xsd:element name="Book" type="BookType"/>
MyTypeDefinitions.xsd (xmlns="http://www.publishing.org"):
<xsd:complexType name="BookTypePlusReviewer">
    <xsd:complexContent>
        <xsd:extension base="BookType">
            <xsd:sequence>
                <xsd:element name="Reviewer" type="xsd:string"/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>
And here's what an instance document would look like:
<Book xsi:type="BookTypePlusReviewer">
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
    <Reviewer>Roger L. Costello</Reviewer>
</Book>
The instance document declares xmlns="http://www.publishing.org" and sets
xsi:schemaLocation="http://www.publishing.org MyTypeDefinitions.xsd". We have
type-substituted Book's content with the type specified in the new schema. Thus, we have
extended BookCatalogue.xsd without touching it!
This type substitutability mechanism is a powerful extensibility mechanism. However, it suffers
from two problems:
Disadvantages:
Location Restricted Extensibility: The extensibility is restricted to appending elements onto the
end of the content model (after the <Publisher> element). What if we wanted to extend <Book>
by adding elements to the beginning (before <Title>), or in the middle, etc.? We can't do it with
this mechanism.
Unexpected Extensibility: If you look at the declaration for Book:
<xsd:element name="Book" type="BookType"/>
and the definition for BookType:
<xsd:complexType name="BookType">
    <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
    </xsd:sequence>
</xsd:complexType>
it is easy to be fooled into thinking that in instance documents the <Book> elements will always
contain just <Title>, <Author>, <Date>, <ISBN>, and <Publisher>. It is easy to forget that
someone could extend the content model using the type substitutability mechanism.
Extensibility is unexpected! Consequently, if you write a program to process BookCatalogue
instance documents, you may forget to take into account the fact that a <Book> element may
contain more than five children.
It would be nice if there were a way to explicitly flag places where extensibility may occur: "hey,
instance documents may extend <Book> at this point, so be sure to write your code taking this
possibility into account." In addition, it would be nice if we could extend Book's content model
at locations other than just the end. The <any> element gives us these capabilities beautifully,
as is discussed in the next section.
Extensibility via the <any> Element
An <any> element may be inserted into a content model to enable instance documents to contain
additional elements. Here's an example showing an <any> element at the end of Book's content
model:
<xsd:element name="Book">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
            <xsd:element name="Date" type="xsd:string"/>
            <xsd:element name="ISBN" type="xsd:string"/>
            <xsd:element name="Publisher" type="xsd:string"/>
            <xsd:any namespace="##any" minOccurs="0"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
The content of Book is Title, Author, Date, ISBN, Publisher and then (optionally) any well-
formed element. The new element may come from any namespace.
Note the <any> element may be inserted at any point, e.g., it could be inserted at the top, in the
middle, etc.
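As a sketch of placing the extension point in the middle, the schema below puts an <any> element between Author and Date. Note one assumption that goes beyond the text above: the wildcard uses namespace="##other" rather than "##any". With "##any", an optional wildcard immediately followed by the Date element can make the content model ambiguous (a <Date> in the instance could match either the wildcard or the Date declaration), and a schema processor may reject it. With "##other", only elements from namespaces other than the targetNamespace match the wildcard.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.publishing.org"
            xmlns="http://www.publishing.org"
            elementFormDefault="qualified">
    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Title" type="xsd:string"/>
                <xsd:element name="Author" type="xsd:string"/>
                <!-- Extension point in the middle of the content model -->
                <xsd:any namespace="##other" minOccurs="0"/>
                <xsd:element name="Date" type="xsd:string"/>
                <xsd:element name="ISBN" type="xsd:string"/>
                <xsd:element name="Publisher" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```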
In this version of the schema it has been explicitly specified that after the <Publisher> element
any well-formed XML element may occur and that XML element may come from any
namespace. For example, suppose that the instance document author discovers a schema,
containing a declaration for a Reviewer element:
<xsd:element name="Reviewer">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element name="Name">
                <xsd:complexType>
                    <xsd:sequence>
                        <xsd:element name="First" type="xsd:string"/>
                        <xsd:element name="Last" type="xsd:string"/>
                    </xsd:sequence>
                </xsd:complexType>
            </xsd:element>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
And suppose that for an instance document author it is important that, in addition to specifying
the Title, Author, Date, ISBN, and Publisher of each Book, he/she specify a Reviewer. Because
the schema has been designed with extensibility in mind, the instance document author can use
the Reviewer element in his/her BookCatalogue:
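<Book>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
    <rev:Reviewer>
        <rev:Name>
            <rev:First>Roger</rev:First>
            <rev:Last>Costello</rev:Last>
        </rev:Name>
    </rev:Reviewer>
</Book>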

The instance document author has enhanced the instance document with an element that the
schema designer may have never even envisioned. We have empowered the instance author with
a great deal of flexibility in creating the instance document. Wow!
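For the rev: prefix to resolve, the instance document has to bind it to the namespace of the schema that declares Reviewer. The sketch below shows one way the complete Book element might look. The namespace URI http://www.example.org/reviewer is an assumption made for illustration, since the text does not name the Reviewer schema's namespace.

```xml
<!-- Hypothetical namespace URI: the Reviewer schema's targetNamespace is not given in the text -->
<Book xmlns:rev="http://www.example.org/reviewer">
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
    <!-- Matched by the <xsd:any namespace="##any" minOccurs="0"/> wildcard -->
    <rev:Reviewer>
        <rev:Name>
            <rev:First>Roger</rev:First>
            <rev:Last>Costello</rev:Last>
        </rev:Name>
    </rev:Reviewer>
</Book>
```

Whether a validator checks the Reviewer element against its own declaration depends on the wildcard's processContents attribute, which defaults to strict.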
An alternate schema design is to create a BookType (as we did above) and embed the <any>
element within the BookType:
<xsd:complexType name="BookType">
    <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
        <xsd:any namespace="##any" minOccurs="0"/>
    </xsd:sequence>
</xsd:complexType>
and then declare Book of type BookType:
<xsd:element name="Book" type="BookType"/>
However, we are then back to the "unexpected extensibility" problem. Namely, after the
<Publisher> element any well-formed XML element may occur, and after that (through a type
derived from BookType) anything could be present.
There is a way to control the extensibility and still use a type. We can add a block attribute to
Book:
<xsd:element name="Book" type="BookType" block="#all"/>
The block attribute prohibits derived types from being used in Book's content model. Thus, by
this method we have created a reusable component (BookType), and yet we still have control
over the extensibility.
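To illustrate what block rules out, here is a sketch of an instance that tries to substitute a derived type through xsi:type. ExtendedBookType is a hypothetical name, assumed to be a type derived from BookType; it does not appear in the schema above.

```xml
<!-- Rejected when Book is declared with block="#all":
     ExtendedBookType (hypothetical, assumed derived from BookType)
     may not stand in for BookType. -->
<Book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:type="ExtendedBookType">
    <!-- Book content would go here -->
</Book>
```

Without the block attribute, a validator would accept this element as long as its content matched ExtendedBookType.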
With the <any> element we have complete control over where, and how much, extensibility we
want to allow. For example, suppose that we want to allow at most two new elements
at the top of Book's content model. Here's how to specify that using the <any> element:
<xsd:complexType name="BookType">
    <xsd:sequence>
        <xsd:any namespace="##other" minOccurs="0" maxOccurs="2"/>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:string"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
    </xsd:sequence>
</xsd:complexType>

Note how the <any> element has been placed at the top of the content model, and it has set
maxOccurs="2". Thus, in instance documents the <Book> content will always end with <Title>,
<Author>, <Date>, <ISBN>, and <Publisher>. Prior to that, up to two well-formed XML elements
from other namespaces may occur.
In summary:
We can put the <any> element specifically where we desire extensibility.
If we desire extensibility at multiple locations, we can insert multiple <any> elements.
With maxOccurs we can specify how much extensibility we will allow.
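The three points above can be sketched in a single type definition. This is an illustration under stated assumptions, not a schema from the text: the targetNamespace URI is hypothetical, and processContents="lax" is an added choice that asks the validator to check wildcard content only when it can find a declaration for it.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/book"
            elementFormDefault="qualified">
    <xsd:complexType name="BookType">
        <xsd:sequence>
            <!-- Extension point 1: at most two foreign elements at the top -->
            <xsd:any namespace="##other" processContents="lax"
                     minOccurs="0" maxOccurs="2"/>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
            <xsd:element name="Date" type="xsd:string"/>
            <xsd:element name="ISBN" type="xsd:string"/>
            <xsd:element name="Publisher" type="xsd:string"/>
            <!-- Extension point 2: any number of foreign elements at the end -->
            <xsd:any namespace="##other" processContents="lax"
                     minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>
```

Both wildcards use ##other, so neither can match an element declared in the schema's own namespace.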
Non-Determinism and the <any> element
In the above BookType definition we used an <any> element at the beginning of the content
model. We specified that the <any> element must come from an "other" namespace (i.e., it must not
be an element from the targetNamespace). If, instead, we had specified namespace="##any" then
we would have gotten a non-deterministic content model error. Let's see why.
A non-deterministic content model is one where, upon encountering an element in an instance
document, it is ambiguous which path was taken in the schema document. For example, suppose
that we were to declare BookType using ##any, as follows:
<xsd:complexType name="BookType">
    <xsd:sequence>
        <xsd:any namespace="##any" minOccurs="0" maxOccurs="2"/>
        <xsd:element name="Title" type="xsd:string"/>
        <!-- Author, Date, ISBN, and Publisher declarations as before -->
    </xsd:sequence>
</xsd:complexType>

And suppose that we have this (snippet of an) instance document:
<Book>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
</Book>
Let's see what happens when a schema validator gets to the <Title> element in this instance
document. The schema validator must determine which declaration in the schema document this
Title element corresponds to. Do you see the ambiguity? There is no way to know,
without doing a look-ahead, whether the Title element comes from the <any> element, or comes
from the <xsd:element name="Title" .../> declaration. This is a non-deterministic content model:
if your schema has a content model which would require a schema validator to look ahead then
your schema is non-deterministic. Non-deterministic schemas are not allowed.
The solution in our example is to declare that the <any> element must come from an "other"
namespace, as was shown earlier. That works fine in this example where all the BookCatalogue
elements come from the targetNamespace, and the <any> element comes from a different
namespace. Suppose, however, that the BookCatalogue schema imported element declarations
from other namespaces. For example:
<xsd:schema ...
            xmlns:bk="http://www.books.com">
    <xsd:import namespace="http://www.books.com"
                schemaLocation="Books.xsd"/>
    <xsd:complexType name="BookType">
        <xsd:sequence>
            <xsd:any namespace="##other" minOccurs="0"/>
            <xsd:element ref="bk:Title"/>
            <xsd:element name="Date" type="xsd:string"/>
            <!-- ISBN and the remaining declarations as before -->
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>

Now consider this instance document:
<Book>
    <bk:Title>The First and Last Freedom</bk:Title>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
</Book>
When a schema validator encounters bk:Title it will try to validate it against the appropriate
element declaration in the schema. But is this the Title referred to by the schema (i.e., in the
http://www.books.com namespace), or does this Title come from using the <any> element? It is
ambiguous, and consequently non-deterministic. Thus, this schema is also illegal.
As you can see, prohibiting non-deterministic content models makes the use of the <any>
element quite restricted. So, what do you do when you want to enable extensibility at arbitrary
locations? Answer: put in an optional <other> element and let its content be <any>. Here's how
to do it:
<xsd:complexType name="BookType">
    <xsd:sequence>
        <xsd:element name="other" minOccurs="0">
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:any namespace="##any" maxOccurs="2"/>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
        <!-- Title, Author, Date, ISBN, and Publisher declarations as before -->
    </xsd:sequence>
</xsd:complexType>

Now, instance document authors have an explicit container element (<other>) in which to put
additional elements. This isn't the ideal solution, but it's the best that we can do given the
rule that schemas may not have non-deterministic content models. Write to the XML Schema
working group and tell them that you want the prohibition of non-deterministic content models
revoked!
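An instance document using the container might look like the sketch below. It assumes that <other> sits at the top of Book's content model, ahead of the five declared elements; the rev namespace URI is hypothetical.

```xml
<!-- Hypothetical namespace URI for the Reviewer schema -->
<Book xmlns:rev="http://www.example.org/reviewer">
    <!-- All extension content is confined to the <other> container -->
    <other>
        <rev:Reviewer>
            <rev:Name>
                <rev:First>Roger</rev:First>
                <rev:Last>Costello</rev:Last>
            </rev:Name>
        </rev:Reviewer>
    </other>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
</Book>
```

Because the validator sees the <other> start tag before any wildcard content, it never has to guess whether an element belongs to the wildcard or to a declared element.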
Best Practice
The <any> element is an enabling technology. It turns instance documents from static/rigid
structures into rich, dynamic, flexible data objects. It shifts focus from the schema designer to the
instance document author in terms of defining what data makes sense. It empowers instance
document authors with the ability to decide what data makes sense to them.
As a schema designer you need to recognize your limitations. You have no way of anticipating all
the varieties of data that an instance document author might need in creating an instance
document. Be smart enough to know that you're not smart enough to anticipate all possible
needs! Design your schemas with flexibility built-in.
Definition: an open content schema is one that allows instance documents to contain additional
elements beyond what is declared in the schema. As we have seen, this may be achieved by using
the <any> (and <anyAttribute>) element in the schema.
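As a sketch of an open content type that admits both extra elements and extra attributes (the targetNamespace URI and the choice of processContents="lax" are assumptions made for illustration):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/book"
            elementFormDefault="qualified">
    <xsd:complexType name="BookType">
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
            <!-- Open content: any number of elements from other namespaces -->
            <xsd:any namespace="##other" processContents="lax"
                     minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
        <!-- Open content: any attributes from other namespaces -->
        <xsd:anyAttribute namespace="##other" processContents="lax"/>
    </xsd:complexType>
</xsd:schema>
```

Note that <anyAttribute> is placed after the content model, where the type's attribute declarations go.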
Sprinkling <any> and <anyAttribute> elements liberally throughout your schema will yield
benefits in terms of how evolvable your schema is:
Enabling Schema Evolution using Open Content Schemas
In today's rapidly changing market static schemas will be less commonplace, as the market
pushes schemas to quickly support new capabilities. For example, consider the cellphone
industry. Clearly, this is a rapidly evolving market. Any schema that the cellphone community
creates will soon become obsolete as hardware/software changes extend the cellphone
capabilities. For the cellphone community rapid evolution of a cellphone schema is not just a
nicety, the market demands it!
Suppose that the cellphone community gets together and creates a schema, cellphone.xsd.
Imagine that every week NOKIA sends out to the various vendors an instance document
(conforming to cellphone.xsd), detailing its current product set. Now suppose that a few months
after cellphone.xsd is agreed upon NOKIA makes some breakthroughs in their cellphones - they
create new memory, call, and display features, none of which are supported by cellphone.xsd. To
gain a market advantage NOKIA will want to get information about these new capabilities to its
vendors ASAP. Further, they will have little motivation to wait for the next meeting of the
cellphone community to consider upgrades to cellphone.xsd. They need results NOW. How does
open content help? That is described next.
Suppose that the cellphone schema is declared "open". Immediately NOKIA can extend its
instance documents to incorporate data about the new features. How does this change impact the
vendor applications that receive the instance documents? The answer is - not at all. In the worst
case, the vendor's application will simply skip over the new elements. More likely, however, the
vendors are showing the cellphone features in a list box and these new features will be
automatically captured with the other features. Let's stop and think about what has just been
described. Without modifying the cellphone schema and without touching the vendors'
applications, information about the new NOKIA features has been instantly disseminated to the
marketplace! Open content in the cellphone schema is the enabler for this rapid dissemination.
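To make the scenario concrete, here is a sketch of what an open content model in cellphone.xsd might look like. Every name in it (PhoneType, Model, the targetNamespace URI) is invented for illustration; the text does not show the contents of cellphone.xsd.

```xml
<!-- Hypothetical fragment of cellphone.xsd; all names are illustrative assumptions -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/cellphone"
            elementFormDefault="qualified">
    <xsd:complexType name="PhoneType">
        <xsd:sequence>
            <xsd:element name="Model" type="xsd:string"/>
            <!-- Open content: features not yet known to the schema go here -->
            <xsd:any namespace="##other" processContents="lax"
                     minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>
```

With a content model of this shape, a manufacturer can append elements from its own namespace after <Model> and the document still validates against the unchanged schema.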
Clearly some types of instance document extensions may require modification to the vendors'
applications. Recognize, however, that the vendors are free to upgrade their applications in their
own time. The applications do not need to be upgraded before changes can be introduced into
instance documents. At the very worst, the vendors' applications will simply skip over the
extensions. And, of course, those vendors do not need to upgrade in lock-step.
To wrap up this example suppose that several months later the cellphone community reconvenes
to discuss enhancements to the schema. The new features that NOKIA first introduced into the
marketplace are then officially added into the schema. Thus completes the cycle. Changes to the
instance documents have driven the evolution of the schema.
Global versus Local
Table of Contents
Issue
Introduction
Russian Doll Design
Salami Slice Design
Russian Doll Design Characteristics
Salami Slice Design Characteristics
Venetian Blind Design
Venetian Blind Design Characteristics
Best Practice
Issue
When should an element or type be declared global versus when should it be declared local?
Introduction
[Recall that a component (element, complexType, or simpleType) is global if it is an
immediate child of <schema>, whereas it is local if it is not an immediate child of <schema>,
i.e., it is nested within another component.]
What advice would you give to someone who was to ask you, "In general, when should an
element (or type) be declared global versus when should it be declared local?" The purpose of
this chapter is to provide answers to that question.
Below is a snippet of an XML instance document. We will explore the different design strategies
using this example.
<Book>
<Title>Illusions</Title>
<Author>Richard Bach</Author>
</Book>
Russian Doll Design
This design approach has the schema structure mirror the instance document structure, e.g.,
declare a Book element and within it declare a Title element followed by an Author element:
<xsd:element name="Book">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
The instance document has all its components bundled together. Likewise, the schema is
designed to bundle together all its element declarations.
This design represents one end of the design spectrum.
Salami Slice Design
The Salami Slice design represents the other end of the design spectrum. With this design we
disassemble the instance document into its individual components. In the schema we define each
component (as an element declaration), and then assemble them together:
<xsd:element name="Title" type="xsd:string"/>
<xsd:element name="Author" type="xsd:string"/>
<xsd:element name="Book">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element ref="Title"/>
            <xsd:element ref="Author"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
Note how the schema declared each component individually (Title, and Author) and then
assembled them together (by ref'ing them) in the creation of the Book component.
These two designs represent opposite ends of the design spectrum.
To understand these designs it may help to think in terms of boxes, where a box represents an
element or type:
The Russian Doll design corresponds to having a single box, and it has nested within it boxes,
which in turn have boxes nested within them, and so on. (boxes within boxes, just like a
Russian Doll!)
The Salami Slice design corresponds to having many separate boxes which are assembled
together (separate boxes combined together, just like Salami slices brought together in a
sandwich!)
Let's examine the characteristics of each of the two designs. (In so doing it will yield insights
into another design.)
Russian Doll Design Characteristics
[1] Opaque content. The content of Book is opaque to other schemas, and to other parts of the
same schema. The impact of this is that none of the types or elements within Book are
reusable.
[2] Localized scope. The region of the schema where the Title and Author element declarations
are applicable is localized to within the Book element. The impact of this is that if the
schema has set elementFormDefault="unqualified" then the namespaces of Title and Author
are hidden (localized) within the schema.
[3] Compact. Everything is bundled together into a tidy, single unit.
[4] Decoupled. With this design approach each component is self-contained (i.e., they don't
interact with other components). Consequently, changes to the components will have limited
impact. For example, if the components within Book change it will have a limited impact
since they are not coupled to components outside of Book.
[5] Cohesive. With this design approach all the related data is grouped together into self-
contained components, i.e., the components are cohesive.
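Characteristic [2] can be seen in a small sketch. The targetNamespace URI http://www.example.org/book is a hypothetical value chosen for illustration; the text does not give one.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/book"
            elementFormDefault="unqualified">
    <!-- Russian Doll: Title and Author are local to Book -->
    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Title" type="xsd:string"/>
                <xsd:element name="Author" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

A conforming instance qualifies only the global Book element; the local Title and Author carry no namespace qualifier:

```xml
<bk:Book xmlns:bk="http://www.example.org/book">
    <Title>Illusions</Title>
    <Author>Richard Bach</Author>
</bk:Book>
```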
Salami Slice Design Characteristics
[1] Transparent content. The components which make up Book are visible to other schemas,
and to other parts of the same schema. The impact of this is that the types and elements
within Book are reusable.
[2] Global scope. All components have global scope. The impact of this is that, irrespective of
the value of elementFormDefault, the namespaces of Title and Author will be exposed in
instance documents.
[3] Verbose. Everything is laid out and clearly visible.
[4] Coupled. In our example we saw that the Book element depends on the Title and Author
elements. If those elements were to change it would impact the Book element. Thus, this
design produces a set of interconnected (coupled) components.
[5] Cohesive. With this design approach all the related data is also grouped together into self-
contained components. Thus, the components are cohesive.
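Characteristic [2] of the Salami Slice design can be sketched the same way. Again the targetNamespace URI is a hypothetical value. Even though elementFormDefault is set to unqualified here, it has no effect on Title and Author, because they are global declarations.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/book"
            xmlns="http://www.example.org/book"
            elementFormDefault="unqualified">
    <!-- Salami Slice: every element is declared globally -->
    <xsd:element name="Title" type="xsd:string"/>
    <xsd:element name="Author" type="xsd:string"/>
    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Title"/>
                <xsd:element ref="Author"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

In a conforming instance every element must be namespace-qualified:

```xml
<bk:Book xmlns:bk="http://www.example.org/book">
    <bk:Title>Illusions</bk:Title>
    <bk:Author>Richard Bach</bk:Author>
</bk:Book>
```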
The two design approaches differ in a couple of important ways:
The Russian Doll design facilitates hiding (localizing) namespace complexities. The Salami
Slice design does not.
The Salami Slice design facilitates component reuse. The Russian Doll design does not.
Is there a design which facilitates hiding (localizing) namespace complexities, and facilitates
component reuse? Yes there is!
Consider the Book example again. An alternative design is to create a global type definition
which nests the Title and Author element declarations within it:
<xsd:complexType name="Publication">
    <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
    </xsd:sequence>
</xsd:complexType>
<xsd:element name="Book" type="Publication"/>
This design has both benefits:
it is capable of hiding (localizing) the namespace complexity of Title and Author, and
it has a reusable Publication type component.
Venetian Blind Design
With this design approach we disassemble the problem into individual components, as the Salami
Slice design does, but instead of creating element declarations, we create type definitions. Here's
what our example looks like with this design approach:
<xsd:simpleType name="Title">
    <xsd:restriction base="xsd:string">
        <xsd:enumeration value="Mr."/>
        <xsd:enumeration value="Mrs."/>
        <xsd:enumeration value="Dr."/>
    </xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="Name">
    <xsd:restriction base="xsd:string">
        <xsd:minLength value="1"/>
    </xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="Publication">
    <xsd:sequence>
        <xsd:element name="Title" type="Title"/>
        <xsd:element name="Author" type="Name"/>
    </xsd:sequence>
</xsd:complexType>
<xsd:element name="Book" type="Publication"/>
This design has:
maximized reuse (there are four reusable components - the Title type, the Name type, the
Publication type, and the Book element)
maximized the potential to hide (localize) namespaces [note how this has been phrased:
"maximized the potential ..." Whether, in fact, the namespaces of Title and Author are hidden
or exposed is determined by the elementFormDefault "switch"].
The Venetian Blind design espouses these guidelines ...
Design your schema to maximize the potential for hiding (localizing) namespace complexities.
Use elementFormDefault to act as a switch for controlling namespace exposure - if you want
element namespaces exposed in instance documents, simply turn the elementFormDefault
switch to "on" (i.e., set elementFormDefault="qualified"); if you don't want element
namespaces exposed in instance documents, simply turn the elementFormDefault switch to
"off" (i.e., set elementFormDefault="unqualified").
Design your schema to maximize reuse.
Use type definitions as the main form of component reuse.
Nest element declarations within type definitions.
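The switch can be sketched with a simplified Venetian Blind schema. The targetNamespace URI is a hypothetical value, and Title and Author are typed as plain strings to keep the sketch short. Here the switch is set to "on":

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/book"
            xmlns="http://www.example.org/book"
            elementFormDefault="qualified">
    <!-- Reusable type; Title and Author are local element declarations -->
    <xsd:complexType name="Publication">
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
    <xsd:element name="Book" type="Publication"/>
</xsd:schema>
```

With this setting Title and Author belong to the target namespace in instance documents:

```xml
<Book xmlns="http://www.example.org/book">
    <Title>Illusions</Title>
    <Author>Richard Bach</Author>
</Book>
```

Changing the one attribute to elementFormDefault="unqualified" would leave only Book in the target namespace; Title and Author would then have to appear without a namespace, and no other part of the schema would need to change.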
Let's compare the Venetian Blind design with the Salami Slice design. Recall our example:
Salami Slice Design:
<xsd:element name="Title" type="xsd:string"/>
<xsd:element name="Author" type="xsd:string"/>
<xsd:element name="Book">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element ref="Title"/>
            <xsd:element ref="Author"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
The Salami Slice design also results in creating reusable (element) components, but it has
absolutely no potential for namespace hiding.
"However", you argue, "Suppose that I want namespaces exposed in instance documents. [We
have seen cases where this is desired.] So the Salami Slice design is a good approach for me.
Right?"
Let's think about this for a moment. What if at a later date you change your mind and wish to
hide namespaces (what if your users hate seeing all those namespace qualifiers in instance
documents)? You will need to redesign your schema (possibly scrapping it and starting over).
Better to adopt the Venetian Blind Design, which allows you to control whether namespaces are
hidden or exposed by simply setting the value of elementFormDefault. No redesign of your
schema is needed as you switch from exposing to hiding, or vice versa.
[That said ... your particular project may need to sacrifice the ability to turn on/off namespace
exposure because you require instance documents to be able to use element substitution. In such
circumstances the Salami Slice design approach is the only viable alternative.]
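Element substitution relies on the substitutionGroup attribute, which may only appear on global element declarations and must name a global element as its head. A minimal sketch follows; the Heading element and the targetNamespace URI are hypothetical, introduced only for illustration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/book"
            xmlns="http://www.example.org/book"
            elementFormDefault="qualified">
    <xsd:element name="Title" type="xsd:string"/>
    <!-- Heading (hypothetical) may appear wherever Title is referenced -->
    <xsd:element name="Heading" type="xsd:string" substitutionGroup="Title"/>
    <xsd:element name="Author" type="xsd:string"/>
    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Title"/>
                <xsd:element ref="Author"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

An instance may then use Heading in place of Title:

```xml
<Book xmlns="http://www.example.org/book">
    <Heading>Illusions</Heading>
    <Author>Richard Bach</Author>
</Book>
```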
Here are the characteristics of the Venetian Blind Design.
Venetian Blind Design Characteristics:
[1] Maximum reuse. The primary components of reuse are type definitions.
[2] Maximum namespace hiding. Element declarations are nested within types, thus maximizing
the potential for namespace hiding.
[3] Easy exposure switching. Whether namespaces are hidden (localized) in the schema or
exposed in instance documents is controlled by the elementFormDefault switch.
[4] Coupled. This design generates a set of components which are interconnected (i.e.,
dependent).
[5] Cohesive. As with the other designs, the components group together related data. Thus, the
components are cohesive.
Best Practice
[1] The Venetian Blind design is the one to choose where your schemas require the flexibility to
turn namespace exposure on or off with a simple switch, and where component reuse is
important.
[2] Where your task requires that you make available to instance document authors the option
to use element substitution, then use the Salami Slice design.
[3] Where minimizing size and coupling of components is of utmost concern then use the
Russian Doll design.
Hide (Localize) Namespaces Versus Expose Namespaces
Table of Contents
Issue
Introduction
Example
Technical Requirements for Hiding (Localizing) Namespaces
Best Practice
Issue
When should a schema be designed to hide (localize) within the schema the namespaces of the
elements and attributes it is using, versus when should it be designed to expose the namespaces
in instance documents?
Introduction
A typical schema will reuse elements and types from multiple schemas, each with different
namespaces.
A.xsd
<xsd:schema targetNamespace="A">
    ...
</xsd:schema>
B.xsd
<xsd:schema targetNamespace="B">
    ...
</xsd:schema>
C.xsd
<xsd:schema targetNamespace="C">
    <xsd:import namespace="A" schemaLocation="A.xsd"/>
    <xsd:import namespace="B" schemaLocation="B.xsd"/>
    ...
</xsd:schema>
A schema, then, may be comprised of components from multiple namespaces. Thus, when a
schema is designed the schema designer must decide whether or not the origin (namespace) of
each element should be exposed in the instance documents.
[Figure: two instance documents for C.xsd, each shown as <myDoc schemaLocation="C C.xsd"> ... </myDoc>. In the first, the namespaces of the components are not visible in the instance document; in the second, the namespaces of the components are visible.]
A binary switch attribute in the schema is used to control the hiding/exposure of namespaces: by
setting elementFormDefault="unqualified" the namespaces will be hidden (localized) within the
schema, and by setting elementFormDefault="qualified" the namespaces will be exposed in
instance documents.
[Figure: elementFormDefault, the Exposure Switch. <xsd:schema elementFormDefault="unqualified"> hides namespaces within the schema; <xsd:schema elementFormDefault="qualified"> exposes them in instance documents.]
Example
[Figure: Camera.xsd, which draws on Nikon.xsd, Olympus.xsd, and Pentax.xsd.]
Below is a schema for describing a camera. The camera schema reuses components from other
schemas - the camera's <body> element reuses a type from the Nikon schema, the camera's
<lens> element reuses a type from the Olympus schema, and the camera's <manual_adapter>
element reuses a type from the Pentax schema.
Camera.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.camera.org"
            xmlns:nikon="http://www.nikon.com"
            xmlns:olympus="http://www.olympus.com"
            xmlns:pentax="http://www.pentax.com"
            elementFormDefault="unqualified">
    <xsd:import namespace="http://www.nikon.com"
                schemaLocation="Nikon.xsd"/>
    <xsd:import namespace="http://www.olympus.com"
                schemaLocation="Olympus.xsd"/>
    <xsd:import namespace="http://www.pentax.com"
                schemaLocation="Pentax.xsd"/>
    <xsd:element name="camera">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="body" type="nikon:body_type"/>
                <xsd:element name="lens" type="olympus:lens_type"/>
                <xsd:element name="manual_adapter"
                             type="pentax:manual_adapter_type"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
This schema is designed to hide namespaces
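The imported schemas are not shown in the text. As a sketch of what Nikon.xsd might contain, assuming only what the text states (a body_type in the http://www.nikon.com namespace whose content is a description element), one possibility is:

```xml
<?xml version="1.0"?>
<!-- Hypothetical reconstruction of Nikon.xsd; only body_type and
     description are named in the text, the rest is assumed -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.nikon.com"
            elementFormDefault="unqualified">
    <xsd:complexType name="body_type">
        <xsd:sequence>
            <!-- Local element, so elementFormDefault decides its qualification -->
            <xsd:element name="description" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>
```

Olympus.xsd and Pentax.xsd would follow the same pattern for lens_type and manual_adapter_type.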

Note the three <import> elements for importing the Nikon, Olympus, and Pentax components.
Also note that the <schema> attribute elementFormDefault has been set to the value
"unqualified". This is a critical attribute. Its value controls whether the namespaces of the elements
being used by the schema will be hidden or exposed in instance documents (thus, it behaves like
a switch turning namespace exposure on/off). Because it has been set to "unqualified" in this
schema, the namespaces will remain hidden (localized) within the schema, and will not be
visible in instance documents, as we see here:
Camera.xml (namespaces hidden)
<?xml version="1.0"?>
<my:camera xmlns:my="http://www.camera.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation=
               "http://www.camera.org
                Camera.xsd">
    <body>
        <description>Ergonomically designed casing for easy handling</description>
    </body>
    <lens>
        <zoom>300mm</zoom>
        <f-stop>1.2</f-stop>
    </lens>
    <manual_adapter>
        <speed>1/10,000 sec to 100 sec</speed>
    </manual_adapter>
</my:camera>
Instance document with namespaces hidden (localized) within the schema. The fact that the
<description> element comes from the Nikon schema, the <zoom> and <f-stop> elements come
from the Olympus schema, and the <speed> element comes from the Pentax schema is totally
transparent to the instance document.
The only namespace qualifier exposed in the instance document is on the <camera> root
element. The rest of the document is completely free of namespace qualifiers. The Nikon,
Olympus, and Pentax namespaces are completely hidden (localized) within the schema!
Looking at the instance document one would never realize that the schema got its components
from three other schemas. Such complexities are localized to the schema. Thus, we say that the
schema has been designed in such a fashion that its component namespace complexities are
hidden from the instance document.
On the other hand, if the above schema had set elementFormDefault="qualified" then the
namespace of each element would be exposed in instance documents. Here's what the instance
document would look like:
Camera.xml (namespaces exposed)
<c:camera xmlns:c="http://www.camera.org"
          xmlns:nikon="http://www.nikon.com"
          xmlns:olympus="http://www.olympus.com"
          xmlns:pentax="http://www.pentax.com"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation=
              "http://www.camera.org
               Camera.xsd">
    <c:body>
        <nikon:description>Ergonomically designed casing for easy handling</nikon:description>
    </c:body>
    <c:lens>
        <olympus:zoom>300mm</olympus:zoom>
        <olympus:f-stop>1.2</olympus:f-stop>
    </c:lens>
    <c:manual_adapter>
        <pentax:speed>1/10,000 sec to 100 sec</pentax:speed>
    </c:manual_adapter>
</c:camera>
Instance document with namespaces exposed
Note that each element is explicitly namespace-qualified. Also, observe the declaration for each
namespace. Due to the way the schema has been designed, the complexities of where the schema
obtained its components have been pushed out to the instance document. Thus, the reader of
this instance document is exposed to the fact that the schema obtained the description element
from the Nikon schema, the zoom and f-stop elements from the Olympus schema, and the speed
element from the Pentax schema.
All Schemas must have a Consistent Value for elementFormDefault!
Be sure to note that elementFormDefault applies just to the schema that it is in. It does not
apply to schemas that it includes or imports. Consequently, if you want to hide namespaces then
all schemas involved must have set elementFormDefault="unqualified". Likewise, if you want to
expose namespaces then all schemas involved must have set elementFormDefault="qualified".
To see what happens when you mix elementFormDefault values, let's suppose that Camera.xsd
and Olympus.xsd have both set in their schema elementFormDefault="unqualified", whereas
Nikon.xsd and Pentax.xsd have both set elementFormDefault="qualified".
Nikon.xsd: elementFormDefault="qualified"
Olympus.xsd: elementFormDefault="unqualified"
Pentax.xsd: elementFormDefault="qualified"
Camera.xsd: elementFormDefault="unqualified"
Here's what an instance document looks like with this mixed design:
Camera.xml (mixed design)
<my:camera xmlns:my="http://www.camera.org"
           xmlns:nikon="http://www.nikon.com"
           xmlns:pentax="http://www.pentax.com"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation=
               "http://www.camera.org
                Camera.xsd">
    <body>
        <nikon:description>Ergonomically designed casing for easy handling</nikon:description>
    </body>
    <lens>
        <zoom>300mm</zoom>
        <f-stop>1.2</f-stop>
    </lens>
    <manual_adapter>
        <pentax:speed>1/10,000 sec to 100 sec</pentax:speed>
    </manual_adapter>
</my:camera>
Hiding/exposure mix: This instance document has the Nikon and Pentax namespaces exposed,
while the Camera and Olympus namespaces are hidden.
Observe that in this instance document some of the elements are namespace-qualified, while
others are not. Namely, those elements from the Camera and Olympus schemas are not qualified,
whereas the elements from the Nikon and Pentax schemas are qualified.
Technical Requirements for Hiding (Localizing) Namespaces
There are two requirements on an element for its namespace to be hidden from instance
documents:
[1] The value of elementFormDefault must be unqualified.
[2] The element must not be globally declared. For example:
<?xml version="1.0"?>
<xsd:schema ...>
    <xsd:element name="foo">
    ...
</xsd:schema>
The element foo can never have its namespace hidden from instance documents, regardless of the
value of elementFormDefault. foo is a global element (i.e., an immediate child of <schema>) and
therefore must always be qualified. To enable namespace hiding the element must be a local
element.
Best Practice
For this issue there is no definitive Best Practice with respect to whether to design your schemas
to hide/localize namespaces, or design them to expose namespaces. Sometimes it's best to hide the
namespaces. Other times it's best to expose the namespaces. Both have their pluses and minuses, as
is discussed below.
However, there are Best Practices with regard to other aspects of this issue. They are:
1. Whenever you create a schema, make two copies of it. The copies should be identical,
except that in one copy set elementFormDefault="qualified", whereas in the other copy set
elementFormDefault="unqualified". If you make two versions of all your schemas then
people who use your schemas will be able to implement either design approach - hide
(localize) namespaces, or expose namespaces.
2. Minimize the use of global elements and attributes so that elementFormDefault can behave
as an exposure switch. The rationale for this was described above, in "Technical
Requirements for Hiding (Localizing) Namespaces".
Advantages of Hiding (Localizing) Component Namespaces within the Schema
The instance document is simple. It's easy to read and understand. There are no namespace
qualifiers cluttering up the document, except for the one on the document element (which is okay
because it shows the domain of the document). The knowledge of where the schema got its
components is irrelevant and localized to the schema.
Design your schema to hide (localize) namespaces within the schema ...
when simplicity, readability, and understandability of instance documents are of utmost
importance and namespaces in the instance document provide no necessary additional
information. In many scenarios the users of the instance documents are not XML experts.
Namespaces would distract and confuse such users, who are concerned only with
structure and content.
when you need the flexibility of being able to change the schema without impact on instance
documents. To see this, imagine that when a schema is originally designed it imports
elements/types from another namespace. Since the schema has been designed to hide (localize) the
namespaces, instance documents do not see the namespaces of the imported elements. Then,
imagine that at a later date the schema is changed such that instead of importing the
elements/types, those elements and types are declared/defined right within the schema (inline). This
change from using elements/types from another namespace to using elements/types in the local
namespace has no impact on instance documents because the schema has been designed to
shield instance documents from where the components come from.
Advantages of Exposing Namespaces in Instance Documents
If your company spends the time and money to create a reusable schema component, and makes
it available to the marketplace, then you will most likely want recognition for that component.
Namespaces provide a means to achieve recognition. For example,
<nikon:description>
    Ergonomically designed casing for easy handling
</nikon:description>
There can be no misunderstanding that this component comes from Nikon. The namespace
qualifier is providing information on the origin/lineage of the description element.
Another case where it is desirable to expose namespaces is when processing instance documents.
Oftentimes when processing instance documents the namespace is required to determine how an
element is to be processed (e.g., "if the element comes from this namespace then we'll process it
in this fashion, if it comes from this other namespace then we'll process it in a different
fashion"). If the namespaces are hidden then your application is forced to do a lookup in the
schema for every element. This can be unacceptably slow.
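As a rough illustration of namespace-driven processing, the following XSLT 1.0 sketch handles a description element differently depending on the namespace it comes from. Both namespace URIs and the olympus prefix are assumptions made for the example; the original text names only the nikon prefix.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: both namespace URIs are assumptions made for illustration. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:nikon="http://www.example.org/nikon"
                xmlns:olympus="http://www.example.org/olympus">
    <xsl:output method="text"/>

    <!-- A description element from the Nikon namespace is processed one way ... -->
    <xsl:template match="nikon:description">
        <xsl:text>Nikon component: </xsl:text>
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

    <!-- ... and a description element from the Olympus namespace another way. -->
    <xsl:template match="olympus:description">
        <xsl:text>Olympus component: </xsl:text>
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

    <!-- Suppress the text of every other element. -->
    <xsl:template match="text()"/>
</xsl:stylesheet>
```

The dispatch works only because the namespaces are visible in the instance document. If both description elements were unqualified, the two match patterns could not tell them apart.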
Design your schema to expose namespaces in instance documents ...
- when lineage/ownership of the elements is important to the instance document users (such as
for copyright purposes).
- when there are multiple elements with the same name but different semantics, so that you may
want to namespace-qualify them to differentiate them (e.g., publisher:body versus
human:body). [In some cases you have multiple elements with the same name and different
semantics but the context of the element is sufficient to determine its semantics. Example: the
title element in <person><title> is easily distinguished from the title element in
<chapter><title>. In such cases there is less justification for designing your schema to expose
the namespaces.]
- when processing (by an application) of the instance document elements is dependent upon
knowledge of the namespaces of the elements.
Note about elementFormDefault and xpath Expressions
We have seen how to design your schema so that elementFormDefault acts as an exposure
switch. Simply change the value of elementFormDefault and it dictates whether or not elements
are qualified in instance documents. In general, no other changes are needed in the schema other
than changing the value of elementFormDefault. However, if your schema is using <key> or
<unique> then you will need to make modifications to the xpath expressions when you change
the value of elementFormDefault.
If elementFormDefault="qualified" then you must qualify all the references in the xpath
expression.
Example:
<xsd:key name="PK">
    <xsd:selector xpath="c:Camera/c:lens"/>
    <xsd:field xpath="c:zoom"/>
</xsd:key>
Note that each element in the xpath expression is namespace-qualified.
If elementFormDefault="unqualified" then you must NOT qualify the references in the xpath
expression.
Example:
<xsd:key name="PK">
    <xsd:selector xpath="Camera/lens"/>
    <xsd:field xpath="zoom"/>
</xsd:key>
Note that none of the elements in the xpath expressions are namespace-qualified.
So, as you switch between exposing and hiding namespaces you will need to take the xpath
changes into account.
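To see the qualified form in context, here is a minimal sketch of a complete schema. The Cameras document element and the namespace URI are assumptions (neither appears in the original). The key sits on the element that contains the Camera elements. In XML Schema 1.0 the xpath expressions in <selector> and <field> do not pick up the default namespace, which is why the target namespace is bound to the prefix c.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: the Cameras element and the namespace URI are assumptions. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/camera"
            xmlns:c="http://www.example.org/camera"
            elementFormDefault="qualified">
    <xsd:element name="Cameras">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Camera" maxOccurs="unbounded">
                    <xsd:complexType>
                        <xsd:sequence>
                            <xsd:element name="lens">
                                <xsd:complexType>
                                    <xsd:sequence>
                                        <xsd:element name="zoom" type="xsd:string"/>
                                    </xsd:sequence>
                                </xsd:complexType>
                            </xsd:element>
                        </xsd:sequence>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
        <!-- Local elements are qualified, so every step in the xpath carries the c prefix. -->
        <xsd:key name="PK">
            <xsd:selector xpath="c:Camera/c:lens"/>
            <xsd:field xpath="c:zoom"/>
        </xsd:key>
    </xsd:element>
</xsd:schema>
```

Switching this schema to elementFormDefault="unqualified" would put Camera, lens, and zoom in no namespace, and the selector and field would then have to drop the c prefix.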
Achieving Maximum Dynamic Capability in your Schemas
Too often schemas are designed in a static, fixed, rigid fashion. Everything is hardcoded when
the schema is designed. There is no variability. This is not reflective of nature. Nature constantly
changes and evolves. Nothing is fixed. As a general rule of thumb: more dynamic capability =
better schema
Definition of Dynamic: the ability of a schema to change at run-time (i.e., schema validation
time). Contrast this with rigid, fixed, static schemas where everything is predetermined and
unchanging.
Limiting Dynamic Capability
1. Hardcoding a collection of components to a namespace.
When you bind a schema to a targetNamespace you are rigidly fixing the components in that
schema to a fixed semantics (in as much as a targetNamespace gives semantics to a schema).
2. Hardcoding a reference to a type to the implementation of that type.
When you specify in <import> a value for schemaLocation you are rigidly fixing the identity of
the schema to implement a type.
Achieving Maximum Dynamic Capability
The key to achieving dynamic schemas is to postpone decisions as long as possible. Here are
some ways to do that.
1. Don't hardcode a schema to a targetNamespace. That is, create no-namespace schemas. Let
the application which uses the schema decide on a targetNamespace that is appropriate for
the application. Thus we postpone binding a schema to a targetNamespace as long as
possible -> until application-use-time. Also, the using-application can <redefine>
components in the schema. This makes the schema dynamic/morphable. It is not fixed to
one namespace (semantics). See the discussion on "Zero, One, or Many Namespaces" for more
info.
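A sketch of this arrangement, assuming a hypothetical file Product.xsd and a hypothetical company namespace (all names here are illustrative). First the no-namespace schema, which has no targetNamespace attribute at all:

```xml
<?xml version="1.0"?>
<!-- Product.xsd (hypothetical): note that no targetNamespace is declared. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:complexType name="ProductType">
        <xsd:sequence>
            <xsd:element name="Name" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>
```

Then a schema that <include>s it and supplies the namespace:

```xml
<?xml version="1.0"?>
<!-- Hypothetical including schema: the namespace URI is an assumption. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/company"
            xmlns="http://www.example.org/company"
            elementFormDefault="qualified">
    <!-- The included components take on this schema's targetNamespace. -->
    <xsd:include schemaLocation="Product.xsd"/>
    <xsd:element name="Product" type="ProductType"/>
</xsd:schema>
```

Because ProductType is absorbed into the including schema's namespace, the unprefixed reference type="ProductType" resolves through the default namespace declaration. A second schema with a different targetNamespace could include the same Product.xsd unchanged.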
2. Don't hardcode the identity of an <import>ed schema. For example, suppose that you declare
an element to have a type from another namespace, e.g.,
<xsd:element name="sensor" type="s:sensor_type"/>
Observe that sensor_type is from another namespace. Thus, this schema will need to do an
<import>. Normally we see <import> elements with two attributes - namespace and
schemaLocation. However, schemaLocation is actually optional. When you do specify
schemaLocation then you are rigidly fixing the identity of a schema which is to provide an
implementation for sensor_type. We can make things a lot more dynamic by not specifying
schemaLocation. Instead, let the instance document author identify a schema that implements
sensor_type. This creates a very dynamic schema. The type of the sensor element is not fixed
or static. Thus we postpone binding the type reference (type="s:sensor_type") to an implementation
of the type as long as possible -> until schema-validation-time. See the discussion on "Dangling
Types" for more info.
XML Namespace Name: URN or URL?
Issue
Is it better to formulate an XML Schema namespace as a URN or a URL?
Example:
urn:publishing:book
versus
http://www.publishing.com/book
What is an XML Schema Namespace Name?
- Namespace names are unique values.
- Namespace names are just labels.
- There is no requirement (or expectation) to resolve the namespace to an online
resource.
- The XML Schema Part 0: Primer (http://www.w3.org/TR/xmlschema-0) states that
target namespaces enable us to distinguish between definitions and declarations
from different vocabularies.
What is a Uniform Resource Identifier (URI)?
- URI Generic Syntax (RFC 2396, http://www.ietf.org/rfc/rfc2396.txt) defines the
following:
- Identifier: An identifier is an object that can act as a reference to something that
has identity.
- A URI can be further classified as a locator, a name, or both.
- The term "Uniform Resource Locator" (URL) refers to the subset of URI that
identify resources via a representation of their primary access mechanism
(e.g., their network "location"), rather than identifying the resource by name
or by some other attribute(s) of that resource.
- The term "Uniform Resource Name" (URN) refers to the subset of URI that
are required to remain globally unique and persistent even when the resource
ceases to exist or becomes unavailable.
The Case for URN
- URNs are easier to conceptualize as a name and not a location. And since
namespaces are intended to uniquely identify something, not locate something, one
could argue this is a better marriage.
- Users do not expect URNs to locate an entity/resource as they do with URLs.
- Many tool vendors automatically convert URLs to hyperlinks (i.e., turn it blue and
make it clickable), which incorrectly implies that a URL formatted namespace name
is a location.
The Case for URL
- URLs are integral to the World Wide Web (WWW). With a URL, there is potentially
a resource as well. That resource could contain documentation (a schema, pointers
to other schemas, etc.). If in the future the W3C decides to have a namespace name
point to a resource, the appropriate syntax will already be in use and namespace
names will not have to change.
- The URL syntax is familiar and memorable to WWW users.
- URL schema names are already managed. (See http://www.w3.org/Addressing/ for
more information.) Therefore, it would be easier to ensure namespace names are
unique. In other words, with URLs it would be difficult to have two organizations
with identical namespace names.
- One could come up with a namespace scheme that would eliminate the current
confusion about the namespace URI being a location. For example, the namespace
name could be prefaced with something like namespace:// or xmlns:// or ns://.
Best Practice
Whether to use a URN or a URL for an XML Schema namespace name is predominately
a personal preference. However, there seems to be a slight preference for using a URL
because it provides the opportunity for pointing to something (e.g., a Resource Directory
Description Language (RDDL) document) in the future.
Creating Variable Content Container Elements
Table of Contents
Issue
Introduction
Example
Method 1: Implementing variable content containers using an abstract element
and element substitution
Method 2: Implementing variable content containers using a <choice> element
Method 3: Implementing variable content containers using an abstract type and
type substitution
Method 4: Implementing variable content containers using a dangling type
Best Practice
Issue
What is the Best Practice for implementing a container element that is to be comprised of
variable content?
Introduction
A typical problem when creating an XML Schema is to design a container element (e.g.,
Catalogue) which is to be comprised of variable content (e.g., Book, or Magazine, or ...)
[Figure: the <Catalogue> element holds a variable content section, which may be filled by <Book>, or <Magazine>, or ...]
Catalogue is called a variable content container.
Some things to consider:
- Do we allow the elements in the variable content container to come from disjoint sources, i.e.,
do we allow the container element to contain dissimilar, independent, loosely coupled
elements?
- How do we design the variable content container so that the kinds of elements which it may
contain can grow over time, i.e., how do we design an extensible variable content container?
Example
Throughout this discussion we will consider variable content containers (e.g., <Catalogue>)
which are comprised of a collection of elements, where each element is variable.
Here's an example of a <Catalogue> container element comprised of two different kinds of
elements:
<Catalogue>
<Book> ... </Book>
<Magazine> ... </Magazine>
<Book> ... </Book>
</Catalogue>
Below are four methods for implementing variable content containers.
Method 1: Implementing variable content containers using an abstract element and
element substitution
Description:
There are five XML Schema concepts that must be understood for implementing this method:
- an element can be declared abstract.
- abstract elements cannot be instantiated in instance documents (they are only placeholders).
- in instance documents the abstract element must be substituted by non-abstract (i.e., concrete)
elements which have been declared to be in a substitutionGroup with the abstract element.
- elements may be declared to be in a substitutionGroup with the abstract element if and only if
their type is the same as, or derives from, the abstract element's type.
- the abstract element and all elements in its substitutionGroup must be declared as global
elements.
[Figure: the variable content section of <Catalogue> is occupied by the abstract element Publication. <Book> and <Magazine> are in its substitutionGroup, i.e., each is "substitutable for" Publication.]
Implementation:
Declare an abstract element (Publication):
<xsd:element name="Publication" abstract="true"
             type="PublicationType"/>
Declare a variable content container element (Catalogue) to have as its content the abstract
element (ref to the abstract element declaration):
<xsd:element name="Catalogue">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element ref="Publication" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
Note that maxOccurs="unbounded", so Catalogue may contain a collection (one or more) of
Publication elements.
Declare the concrete elements (Book and Magazine) that are to be the contents of the variable
content container and declare them to be in a substitutionGroup with the abstract element:
<xsd:element name="Book" substitutionGroup="Publication"
             type="BookType"/>
<xsd:element name="Magazine" substitutionGroup="Publication"
             type="MagazineType"/>
In order for Book and Magazine to substitute for Publication, their types (BookType and
MagazineType) must derive from Publication's type (PublicationType).
[Figure: type hierarchy. BookType and MagazineType both derive from PublicationType.]
Here are the type definitions:
PublicationType - the base type:
<xsd:complexType name="PublicationType">
    <xsd:sequence>
        <xsd:element name="Author" type="xsd:string"
                     minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
</xsd:complexType>
BookType - extends PublicationType by adding two new elements, ISBN and Publisher:
<xsd:complexType name="BookType">
    <xsd:complexContent>
        <xsd:extension base="PublicationType">
            <xsd:sequence>
                <xsd:element name="ISBN" type="xsd:string"/>
                <xsd:element name="Publisher" type="xsd:string"/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>
MagazineType - restricts PublicationType by striking out the Author element:
<xsd:complexType name="MagazineType">
    <xsd:complexContent>
        <xsd:restriction base="PublicationType">
            <xsd:sequence>
                <xsd:element name="Author" type="xsd:string"
                             minOccurs="0" maxOccurs="0"/>
            </xsd:sequence>
        </xsd:restriction>
    </xsd:complexContent>
</xsd:complexType>
Here is what an instance document looks like with this method:
<Catalogue xmlns="http://www.catalogue.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.catalogue.org
                               Catalogue.xsd">
<Book>
<Title>Illusions The Adventures of a Reluctant Messiah</Title>
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>
<Magazine>
<Title>Natural Health</Title>
<Date>1999</Date>
</Magazine>
<Book>
<Date>1954</Date>
<ISBN>0-06-064831-7</ISBN>
</Book>
</Catalogue>
Advantages:
Extensible: This method allows you to extend the set of elements that may be used in the
variable content container element, even if the schema for the variable content container element
is outside your control. For example, suppose that you do not have privilege to modify the above
Catalogue schema. Currently, the Catalogue element can only contain Book and Magazine
elements. But suppose that your application has a hard requirement for CD elements as well:
<Catalogue>
<Book> ... </Book>
<CD> ... </CD>
<Book> ... </Book>
</Catalogue>
How can you extend the set of elements that Catalogue may be comprised of, without modifying
its schema?
Answer: You can create your own separate schema which contains a declaration of CD (with a
type, CDType, that extends the PublicationType in the Catalogue schema), and declares CD to be
in the Publication substitutionGroup:
<xsd:include schemaLocation="Catalogue.xsd"/>
<xsd:complexType name="CDType">
    <xsd:complexContent>
        <xsd:extension base="PublicationType">
            <xsd:sequence>
                <xsd:element name="RecordingCompany"
                             type="xsd:string"/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>
<xsd:element name="CD" substitutionGroup="Publication"
             type="CDType"/>
The CD element meets the requirements for being in the variable content container:
- its type (CDType) derives from the PublicationType, and
- it is a member of the Publication element's substitutionGroup.
Book, Magazine, and CD may now be used within the Catalogue element, e.g.,
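<Catalogue xmlns="http://www.catalogue.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.catalogue.org
                               CD.xsd">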
<Book>
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
</Book>
<CD>
<Title>Timeless Serenity</Title>
<Author>Dyveke Spino</Author>
<Date>1984</Date>
<RecordingCompany>Dyveke Spino Productions</RecordingCompany>
</CD>
...
</Catalogue>
Thus, we see that this method allows us to extend the set of elements that may be used in the
Catalogue element, without modifying its schema. Nice!
Semantic Cohesion: the elements in the variable content container all descend from the same
type hierarchy (PublicationType). This type hierarchy binds them together, giving a structural
(and, by implication, semantic) coherence to all the elements that may be in the variable content
container.
Disadvantages:
No Independent Elements: The types of the elements that are to be used in the variable content
container must all descend from the abstract element's type (PublicationType). Further, the
elements must be in a substitutionGroup with the abstract element. Thus, the variable content
container cannot contain elements whose type does not derive from the abstract element's type,
or is not in the substitutionGroup with the abstract element - as would typically be the case with
independently developed elements. For example, suppose another schema author creates a
Newspaper element, with a type that does not descend from PublicationType. <Catalogue>
would not be able to contain the <Newspaper> element.
Limited Structural Variability: Over time a schema will evolve, and the kinds of elements
which may occur in the variable content container will typically grow. There is no way to know
a priori in what direction it will grow. The new elements may be conceptually related but
structurally vastly different from the original set of elements. The abstract element's type (e.g.,
PublicationType) may have been a good base type for the original set of elements which were all
structurally related, but may not be a good base type for the new elements which have vastly
different structures.
So you are faced with a tradeoff:
- create a simple base type to support lots of different structures (but then you can make fewer
assumptions about the structure of the members), or
- create a rich base type to support strong data type checking (but then you reduce the ability to
add elements with radically different types)
Nonscalable Processing: Processing a collection of differently named elements requires a lot of
special-case code. For example, consider a stylesheet to process each element in <Catalogue>:
<xsl:if test="Book">
    <!-- process Book -->
</xsl:if>
<xsl:if test="Magazine">
    <!-- process Magazine -->
</xsl:if>
This stylesheet snippet suffers from lack of scalability, i.e., it breaks as soon as a new element is
added.
This argument needs some qualification. If the contents of <Catalogue> are just elements that
substitute for the abstract Publication element, then each element can be uniformly processed, as
follows:
<xsl:for-each select="Catalogue/*">
    <!-- process the element -->
</xsl:for-each>
This stylesheet snippet processes each element within Catalogue, regardless of the element name.
Obviously, this is scalable, and does not break when a new element is added.
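For a runnable version of this idea, the following XSLT 1.0 sketch lists the name and title of every child of Catalogue. It assumes the instance document shown earlier, where Catalogue and its descendants are in the http://www.catalogue.org namespace; the cat prefix is an assumption, needed because XPath 1.0 does not apply a default namespace to element names.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: the cat prefix is an assumption bound to the Catalogue namespace. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:cat="http://www.catalogue.org">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- Visit every child of Catalogue, whatever its element name. -->
        <xsl:for-each select="cat:Catalogue/*">
            <xsl:value-of select="local-name()"/>
            <xsl:text>: </xsl:text>
            <xsl:value-of select="cat:Title"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

Nothing in the stylesheet names Book or Magazine, so a CD element added later through the substitutionGroup is picked up without any change.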
Processing becomes non-scalable when Catalogue contains multiple abstract elements:
<xsd:element name="Catalogue">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element ref="Publication" maxOccurs="unbounded"/>
            <xsd:element ref="Retailer" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
Suppose that both Publication and Retailer are abstract elements, and there can be any number of
each kind of element within Catalogue. Here's a sample instance:
<Catalogue>
<Book> ... </Book>
<Book> ... </Book>
<MarketBasket> ... </MarketBasket>
<Macys> ... </Macys>
</Catalogue>
If you wish to process just the Publication elements (e.g., Book, Magazine) then you will need to
write special-case code, as shown above. This is not scalable. Every time a new element is added
into the collection of elements that may substitute for the Publication element then your code will
have to be updated. This is costly.
No Control over Namespace Exposure: This method requires that the elements which may be
used in the variable content container be in a substitutionGroup with the abstract element (e.g.,
Book and Magazine must be in a substitutionGroup with Publication). A requirement of using
substitutionGroup is that all elements must be declared globally. The namespace of global elements
can never be hidden in instance documents. As a consequence, there is no way to hide (localize)
the namespaces of the elements used in the variable content container. This fails the Best
Practice rule which states that you should design your schema to be able to hide or expose
namespaces at your discretion (using elementFormDefault as an exposure switch). (See the
chapter titled "Hide (Localize) Versus Expose Namespaces".)
Method 2: Implementing variable content containers using a <choice> element
Description:
This method is quite straightforward - simply list within a <choice> element all the elements
which can appear in the variable content container, and embed the <choice> element in the
container element.
[Figure: <Catalogue> contains a <choice> of <element name="Book"/> and <element name="Magazine"/>.]
Implementation:
Declare within a <choice> element all the elements (e.g., Book, Magazine) that may be used in
the variable content container. Embed the <choice> element within the container element
(Catalogue):
<element name="Catalogue">
    <complexType>
        <choice maxOccurs="unbounded">
            <element name="Book" type="BookType"/>
            <element name="Magazine" type="MagazineType"/>
        </choice>
    </complexType>
</element>
Advantages:
Independent Elements: The elements in the variable content container do not need a common
type ancestry. They don't have to be related in any way. Thus, the variable content container can
contain dissimilar, independent, loosely coupled elements.
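As a sketch of what "independent" can mean in practice, the <choice> below mixes two locally declared elements with a reference to an element developed elsewhere. The news namespace, the file Newspaper.xsd, and the global Newspaper element it is assumed to declare are all hypothetical, and the Book and Magazine types are simplified to xsd:string to keep the example self-contained.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: the news namespace, Newspaper.xsd, and n:Newspaper are assumptions. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.catalogue.org"
            xmlns="http://www.catalogue.org"
            xmlns:n="http://www.example.org/news"
            elementFormDefault="qualified">
    <xsd:import namespace="http://www.example.org/news"
                schemaLocation="Newspaper.xsd"/>
    <xsd:element name="Catalogue">
        <xsd:complexType>
            <xsd:choice maxOccurs="unbounded">
                <!-- Types simplified to xsd:string for this sketch. -->
                <xsd:element name="Book" type="xsd:string"/>
                <xsd:element name="Magazine" type="xsd:string"/>
                <!-- An independently developed element with no shared type ancestry. -->
                <xsd:element ref="n:Newspaper"/>
            </xsd:choice>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

Newspaper needs no relationship to Book or Magazine, but note that it can be added only by someone who is able to edit the <choice> itself.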
Disadvantages:
Nonextensible: Suppose that the Catalogue schema is outside your control. Currently the
variable content container only supports Book and Magazine. Suppose that you have a hard
requirement for your instance documents to use CD as well as Book and Magazine within
Catalogue, e.g.,
<Catalogue>
<Book> ... </Book>
<CD> ... </CD>
<Book> ... </Book>
</Catalogue>
This method requires that the <choice> element in the Catalogue schema be modified to include
the CD element. However, we stipulated that the Catalogue schema is outside your control, so it
cannot be modified. This method has serious extensibility restrictions!
No Semantic Coherence: The <choice> element allows you to group together dissimilar
elements. While that has been touted as an advantage, it is really a double-edged sword. The
elements in the variable content container have no type hierarchy to bind them together, to
provide structural (and, by implication, semantic) coherence among the elements. Thus, when
processing an instance document you can make no assumptions about the structure of the
elements.
Method 3: Implementing variable content containers using an abstract type and
type substitution
Description:
There are three XML Schema concepts that must be understood for implementing this method:
- a complexType can be declared abstract.
- an element declared to be of an abstract type cannot have its type instantiated in instance
documents (that is, the element can be instantiated, but its abstract content may not).
- in instance documents an element with an abstract type must have its content substituted by
content from a non-abstract (concrete) type which derives from the abstract type. This is called
type substitution.
[Figure: <Catalogue> contains <Publication xsi:type="..."> elements. PublicationType is abstract; BookType and MagazineType derive from it.]
Implementation:
Define an abstract base type (PublicationType):
<xsd:complexType name="PublicationType" abstract="true">
    <xsd:sequence>
        <xsd:element name="Author" type="xsd:string"
                     minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
</xsd:complexType>
Declare the container element (Catalogue) to contain an element (Publication), which is of the
abstract type:
<xsd:element name="Catalogue">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element name="Publication"
                         type="PublicationType"
                         maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>
In instance documents, the content of <Publication> can only be of a concrete type which derives
from PublicationType, such as BookType or MagazineType (we saw these type definitions in
Method 1 above).
With this method instance documents will look different than we saw with the above two
methods. Namely, <Catalogue> will not contain variable content. Instead, it will always contain
the same element (Publication). However, that element will contain variable content:
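<Catalogue xmlns="http://www.catalogue.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.catalogue.org
                               Catalogue.xsd">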
<Publication xsi:type="BookType">
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
</Publication>
<Publication xsi:type="MagazineType">
<Title>Natural Health</Title>
<Date>1999</Date>
</Publication>
...
</Catalogue>
Advantages:
Extensible: Same extensibility benefits as method 1. Namely, this method allows you to easily
extend the set of elements that may be used in the variable content container simply by creating
new types which derive from the abstract type, e.g.,
<include schemaLocation="Catalogue.xsd"/>
<complexType name="CDType">
<complexContent>
<extension base="PublicationType" >
<sequence>
<element name="RecordingCompany" type="string"/>
</sequence>
</extension>
</complexContent>
</complexType>
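CD.xsd
Now the content of <Publication> may be BookType, or MagazineType, or CDType.
We have extended the Catalogue schema without modifying it! Here's an example instance
document with the new CD element:
<Catalogue xmlns="http://www.catalogue.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.catalogue.org
                               CD.xsd">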
<Publication xsi:type="BookType">
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
</Publication>
<Publication xsi:type="CDType">
<Title>Timeless Serenity</Title>
<Author>Dyveke Spino</Author>
<Date>1984</Date>
<RecordingCompany>Dyveke Spino Productions</RecordingCompany>
</Publication>
...
</Catalogue>
Minimal Dependencies: This method has fewer dependencies (less coupling) than method 1. To
extend the collection of elements that may appear in a variable content container using method 1
you need access to both the abstract element (Publication) and its type (PublicationType). With
method 3 you only need access to the abstract type. If we assume that in a typical scenario only
the types will be put in publicly accessible schemas, then method 3 is the only viable method.
Scalable Processing: Processing a series of <Publication> elements is scalable. For example, a
stylesheet could process each publication element as follows:
<xsl:for-each select="Publication">
    <!-- do something -->
</xsl:for-each>
As new types are created (e.g., CDType) no change is needed to the code.
Semantic Cohesion: the elements in the variable content container all descend from the same
type hierarchy. This type hierarchy binds them together, giving a structural (and, by implication,
semantic) coherence among the elements.
Control over Namespace Exposure: the variable part of the variable content container is the
set of element declarations that are embedded within type definitions, i.e., local element
declarations. Consequently, we can control exposure of the namespaces of the variable content
container elements. This is consistent with the Best Practice design recommendation we issued
for hide (localize) versus expose namespaces. (See the chapter titled "Hide (Localize) Versus
Expose Namespaces".)
Disadvantages:
No Independent Elements: Same weakness as with method 1. All types must descend from an
abstract type. This requirement prohibits the use of types which do not descend from the abstract
type, as would typically be the situation when the type is in another, independently developed
schema.
Limited Structural Variability: Same weakness as with method 1. Namely, to facilitate strong
type checking you want to have a rich base type, but this is in direct conflict with the desire for
components with vastly different structures, which calls for a weak base type.
Method 4: Implementing variable content containers using a dangling type
Motivation:
Thus far our variable content container has contained complex content (i.e., child elements).
Suppose that we want to create a variable content container to hold simple content. None of the
previous methods can be used. We need a method that allows us to create simpleType variable
content containers.
There is one key XML Schema concept that must be understood for implementing this method:
with an <import> element the schemaLocation attribute is optional
Description:
Let's take an example. Suppose that we desire an element, sensor, which contains the name of a
weather station sensor. For example:
<sensor>Barometric Pressure</sensor>
There are several things to note:
1. This element holds a simpleType
2. Each weather station may have sensors that are unique to it. Consequently, we must design
our schema so that the sensor element can be customized by each weather station
Here's an elegant design for making the contents of <sensor> customizable by each weather
station:
Implementation:
Let's go through the design, step by step. In your schema, declare the sensor element:
<xsd:element name="sensor" type="s:sensor_type"/>
Note that the sensor element is declared to have a type sensor_type, which is in a different
namespace - the sensor namespace:
xmlns:s="http://www.sensor.org"
Now here's the key - when you <import> this namespace, don't provide a value for
schemaLocation! (In an import element schemaLocation is optional.) For example:
<xsd:import namespace="http://www.sensor.org"/>
The instance document must then identify a schema that implements sensor_type. Thus, at run
time (i.e., validation time) we are matching up the reference to sensor_type with an
implementation of sensor_type. For example, an instance document may have this:
xsi:schemaLocation=
    "http://www.weather-station.org weather-station.xsd
     http://www.sensor.org boston-sensors.xsd"
In this instance document schemaLocation is identifying a schema, boston-sensors.xsd, which is
to provide the implementation of sensor_type.
Let's take a look at the schemas and instance documents for the weather station sensor example
we have been considering. Here's the main schema, which contains the dangling type:
weather-station.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.weather-station.org"
            xmlns="http://www.weather-station.org"
            xmlns:s="http://www.sensor.org"
            elementFormDefault="qualified">
    <!-- An import with no schemaLocation! -->
    <xsd:import namespace="http://www.sensor.org"/>
    <xsd:element name="weather-station">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="sensor" type="s:sensor_type"
                             maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
Note that the <import> element does not have a schemaLocation attribute to identify a particular
schema which implements sensor_type. (Stated differently, this schema does not hardcode in the
identity of the schema which is to provide the implementation of sensor_type.) The schema
validator will resolve the reference to sensor_type based upon the collection of schemas that is
provided to it in the instance document.
The Boston weather station creates a schema which implements sensor_type:
boston-sensors.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.sensor.org"
            xmlns="http://www.sensor.org">
    <xsd:simpleType name="sensor_type">
        <xsd:restriction base="xsd:string">
            <xsd:enumeration value="barometer"/>
            <xsd:enumeration value="thermometer"/>
            <xsd:enumeration value="anenometer"/>
        </xsd:restriction>
    </xsd:simpleType>
</xsd:schema>
This schema provides an implementation for the dangling type, sensor_type.
Now an instance document can conform to weather-station.xsd and use boston-sensors.xsd as the
implementation of sensor_type:
boston-weather-station.xml
<weather-station xmlns="http://www.weather-station.org"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation=
                     "http://www.weather-station.org weather-station.xsd
                      http://www.sensor.org boston-sensors.xsd">
    <sensor>thermometer</sensor>
    <sensor>barometer</sensor>
    <sensor>anenometer</sensor>
</weather-station>
In the instance document we provide a schema which implements the dangling type.
Suppose that the London weather station has all the sensors that Boston has, plus some additional
ones that are unique to the London weather patterns. Thus, London will create its own
implementation of sensor_type:
london-sensors.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.sensor.org"
xmlns="http://www.sensor.org">
<xsd:simpleType name="sensor_type">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="barometer"/>
<xsd:enumeration value="thermometer"/>
<xsd:enumeration value="anenometer"/>
<xsd:enumeration value="hygrometer"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
This schema provides a different implementation for the dangling
type, sensor_type.
Note that this implementation of sensor_type allows an additional value that Boston's does not: hygrometer.
Just as with the Boston weather station instance document, the London weather station instance
document will conform to a collection of schemas: weather-station.xsd and london-sensors.xsd:
london-weather-station.xml
<weather-station xmlns="http://www.weather-station.org"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation=
"http://www.weather-station.org weather-station.xsd
http://www.sensor.org london-sensors.xsd">
<sensor>thermometer</sensor>
<sensor>barometer</sensor>
<sensor>hygrometer</sensor>
<sensor>anenometer</sensor>
</weather-station>
The London weather station is able to customize the content of <sensor> by
using london-sensors.xsd, which defines sensor_type appropriately for the
London weather station. Wow!
Summary:
This method represents an extraordinarily powerful design pattern. The key to this design pattern
is:
1. When you declare the variable content container element, give it a type that is in another
namespace, e.g., s:sensor_type
2. When you <import> that namespace, don't provide a value for schemaLocation, e.g.,
<xsd:import namespace="http://www.sensor.org"/>
3. Create any number of implementations of the dangling type, e.g.,
boston-sensors.xsd
london-sensors.xsd
4. In instance documents, identify the schema that you want used to implement the dangling
type, e.g.,
xsi:schemaLocation=
"http://www.weather-station.org weather-station.xsd
http://www.sensor.org london-sensors.xsd"
Both simpleType and complexType:
In our examples we have implemented the dangling type as a simpleType. The implementation of
a dangling type does not have to be a simpleType. A schema could define it as a complexType.
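As a minimal sketch of the complexType case, a third weather station could implement sensor_type with element content instead of an enumeration. The file name tokyo-sensors.xsd and the child elements kind and model are assumptions made for illustration; they do not appear in the examples above.

```xml
<?xml version="1.0"?>
<!-- tokyo-sensors.xsd (hypothetical): sensor_type implemented as a complexType -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.sensor.org"
            xmlns="http://www.sensor.org"
            elementFormDefault="qualified">
    <xsd:complexType name="sensor_type">
        <xsd:sequence>
            <!-- kind and model are illustrative element names -->
            <xsd:element name="kind" type="xsd:string"/>
            <xsd:element name="model" type="xsd:string" minOccurs="0"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>
```

An instance document would select this implementation the same way as before, by listing it in xsi:schemaLocation for http://www.sensor.org. Because this sketch sets elementFormDefault="qualified", the children of <sensor> would appear in the sensor namespace in instance documents.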
Advantages:
Dynamic: A schema which contains a dangling type is very dynamic. It does not statically
hard-code the identity of a schema to implement the type. Rather, it empowers the instance
document author to identify a schema that implements the dangling type. Thus, the type
implementation is provided at instance-document-creation time (rather than at
schema-document-creation time).
Applicable to both Simple and Complex Types: A dangling type can be implemented as either
a simpleType or a complexType. The other methods are only applicable to creating variable
content containers with a complex type.
Disadvantages:
Different Namespace: The implementation of the dangling type must be in another namespace.
It cannot be in the same namespace as the variable content container element. If you have a hard
requirement that the contents of your variable content container have the same namespace as the
container element then this method cannot be employed.
Best Practice
Which method you should use to create your variable content containers ultimately depends on
your requirements. Here are some things to consider.
Use Method 1 (abstract element plus element substitution) when:
- It's okay for all the elements to descend from a common type.
- You need to provide the ability to extend the collection of elements in the variable content
  container without modifying its schema.
- You can live with the container elements all being namespace-exposed in instance documents.
Use Method 2 (<choice> element) when:
- You need to contain a collection of dissimilar, independent elements.
- It is adequate to have an external authority (i.e., a human) verify the collection of legal
  elements. Verification is accomplished by the external authority selecting which elements shall
  be allowed in the <choice> element.
- Growth of the collection of elements is tightly determined by the external authority that
  controls the schema.
Use Method 3 (abstract type with type substitution) when:
- All the elements in the variable content container are of the same type, or derived from the
  same type.
- It's okay to give all the elements in a variable content container a uniform name.
- The collection of elements may grow, independent of the container schema.
- You need to support namespace-hiding.
- You need to support scalable processing.
Use Method 4 (dangling type) when:
- You need a simpleType variable content container.
- You need to extend a simpleType.
- You need very dynamic, customizable content.
Best Practice: Method 4 is by far the most flexible approach. Unfortunately, as of today (August
16, 2001) none of the schema validators have implemented dangling types. The workaround is to
use anyType. For example: <xsd:element name="sensor" type="xsd:anyType"/>. We lose a bit of
type checking with this, but it is the best that we can do today. Encourage the schema validator
developers to support this capability!
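For context, here is a sketch of how the workaround might look inside weather-station.xsd. It assumes the namespace and element names from the earlier example and simply swaps the dangling type for xsd:anyType; the <import> of the sensor namespace is no longer needed.

```xml
<?xml version="1.0"?>
<!-- weather-station.xsd, workaround variant: sensor is typed as xsd:anyType -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.weather-station.org"
            xmlns="http://www.weather-station.org"
            elementFormDefault="qualified">
    <xsd:element name="weather-station">
        <xsd:complexType>
            <xsd:sequence>
                <!-- Any content is accepted here, so misspelled sensor names are not caught -->
                <xsd:element name="sensor" type="xsd:anyType" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```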
XML Schema Versioning
Issue
What is the Best Practice for versioning XML schemas?
Introduction
It is clear that XML schemas will evolve over time and it is important to capture the
schema's version. This write-up summarizes two cases for schema changes and some
options for schema versioning. It then provides some best practice guidelines for XML
schema versioning.
Schema Changes: Two Cases
Consider two cases for changes to XML schemas:
Case 1. The new schema changes the interpretation of some element.
For example, a construct that was valid and meaningful for the previous
schema does not validate against the new schema.
Case 2. The new schema extends the namespace (e.g., by adding new elements),
but does not invalidate previously valid documents.
Versioning Approaches
Some options for identifying a new schema version are to:
1. Change the (internal) schema version attribute.
2. Create a schemaVersion attribute on the root element.
3. Change the schema's targetNamespace.
4. Change the name/location of the schema.
Option 1: Change the (internal) schema version attribute.
In this approach one would simply change the number in the optional version attribute at
the start of the XML schema. For example, in the code below one could change
version="1.0" to version="1.1":
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
attributeFormDefault="unqualified"
version="1.0">
Advantages:
- Easy. Part of the schema specification.
- Instance documents would not have to change if they remain valid with the new
version of the schema (case 2 above).
- The schema contains information that informs applications that it has changed.
An application could interrogate the version attribute, recognize that this is a
new version of the schema, and take appropriate action.
Disadvantages:
- The validator ignores the version attribute. Therefore, it is not an enforceable
constraint.
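One way an application might interrogate the version attribute is with a small XSLT stylesheet run against the schema document itself (not against an instance). This is only a sketch; the wording of the output text is an assumption.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xsl:output method="text"/>
    <!-- Match the root element of the schema document and report its version attribute -->
    <xsl:template match="/xs:schema">
        <xsl:choose>
            <xsl:when test="@version">
                <xsl:text>schema version: </xsl:text>
                <xsl:value-of select="@version"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:text>schema version: (none declared)</xsl:text>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Because the validator ignores the attribute, any action taken on the reported value is up to the application.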
Option 2: Create a schemaVersion attribute on the root element.
With this approach an attribute is included on the element that introduces the namespace.
In the examples below, this attribute is named schemaVersion. This option could be
used in two ways.
Usage A: First, like option 1, this attribute could be used to capture the schema version.
In this case, one could make the attribute required and the value fixed. Then each
instance that used this schema would have to set the value of the attribute to the value
used in the schema. This makes schemaVersion a constraint that is enforceable by the
validator. With the example schema below, the instance would have to include a
schemaVersion attribute with a value of 1.0 for the instance to validate.
<xs:schema xmlns="http://www.exampleSchema"
targetNamespace="http://www.exampleSchema"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="Example">
<xs:complexType>
...
<xs:attribute name="schemaVersion" type="xs:decimal" use="required" fixed="1.0"/>
</xs:complexType>
</xs:element>
Advantages:
- The schemaVersion attribute is an enforceable constraint. Instances would not
validate without the same version number.
Disadvantages:
- The schemaVersion number in the instance must match exactly. This does not
allow an instance to indicate that it is valid using multiple versions of a schema.
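A sketch of an instance that would satisfy the Usage A schema above. The element content is elided in that schema, so it is left out here as well.

```xml
<?xml version="1.0"?>
<!-- Valid under Usage A: schemaVersion is present and equals the fixed value 1.0.
     Omitting the attribute, or writing schemaVersion="1.1", would fail validation. -->
<Example xmlns="http://www.exampleSchema"
         schemaVersion="1.0">
    <!-- element content is elided in the schema above, so it is omitted here -->
</Example>
```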
Usage B: The second approach uses the schemaVersion attribute in an entirely different
way. It no longer captures the version of the schema within the schema (i.e., it is not a
fixed value). Rather, it is used in the instance to declare the version (or versions) of the
schema with which the instance is compatible. This approach would have to be done in
conjunction with option 1 (or an alternative indicator in the schema file to identify its
version).
The schemaVersion attribute's value could be a list, or a convention could be used to
define how this attribute is used. For example, if the convention was that the
schemaVersion attribute declares the latest schema version with which the instance is
compatible, then the example instance below states that the instance should be valid with
schema version 1.2 or earlier.
With this approach, an application could compare the schema version (captured in the
schema file) with the version to which the instance reports that it is compatible.
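If the list form is chosen, the attribute could be declared with a list type. The following is a sketch under that assumption; the type name versionList is hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.exampleSchema"
           targetNamespace="http://www.exampleSchema"
           elementFormDefault="qualified" attributeFormDefault="unqualified"
           version="1.3">
    <!-- versionList (hypothetical name): a whitespace-separated list of decimals -->
    <xs:simpleType name="versionList">
        <xs:list itemType="xs:decimal"/>
    </xs:simpleType>
    <xs:element name="Example">
        <xs:complexType>
            <!-- element content omitted in this sketch -->
            <xs:attribute name="schemaVersion" type="versionList" use="required"/>
        </xs:complexType>
    </xs:element>
</xs:schema>
```

An instance could then write schemaVersion="1.1 1.2 1.3" to state each version with which it claims compatibility. The validator only checks that every item is a decimal; comparing the list against the schema's own version remains the application's job.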
Sample Schema (declares its version as 1.3)
<xs:schema xmlns="http://www.exampleSchema"
targetNamespace="http://www.exampleSchema"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" attributeFormDefault="unqualified"
version="1.3">
<xs:element name="Example">
<xs:complexType>
...
<xs:attribute name="schemaVersion" type="xs:decimal" use="required"/>
</xs:complexType>
</xs:element>
Sample Instance (declares it is compatible with version 1.2,
or with 1.2 and other versions, depending upon the convention used)
<Example schemaVersion="1.2"
xmlns="http://www.exampleSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.exampleSchema MyLocation/Example.xsd">
Advantages:
- Instance documents may not have to change if they remain valid with the new
schema version (case 2).
- Like option 1, an application would receive an indication that the schema has
changed.
- Could provide an alternative to schemaLocation as a means to point to the
correct schema version. This could be desirable where the business practice
requires the use of a schema in a controlled repository, rather than an arbitrary
location.
Disadvantages:
- Requires extra processing by an application. For example, an application
would have to pre-parse the instance to determine the schema version with
which it is compatible, and compare this value to the version number stored in
the schema file.
Option 3: Change the schema's targetNamespace.
In this approach, the schema's targetNamespace could be changed to designate that a
new version of the schema exists. One way to do this is to include a schema version
number in the designation of the target namespace as shown in the example below.
<xs:schema xmlns="http://www.exampleSchemaV1.0"
targetNamespace="http://www.exampleSchemaV1.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" attributeFormDefault="unqualified">
Advantages:
- Applications are notified of a change to the schema (i.e., an application would
not recognize the new namespace).
- Requires action to assure that there are no compatibility problems with the new
schema. At a minimum, the instance documents that use the schema, and
schemas that include the relevant schema, must change to reference the new
targetNamespace. This is both an advantage and a disadvantage.
Disadvantages:
- With this approach, instance documents will not validate until they are changed
to designate the new targetNamespace. However, one does not want to force
all instance documents to change, even if the change to the schema is really
minor and would not impact an instance.
- Any schemas that include this schema would have to change because the
target namespace of the included components must be the same as the target
namespace of the including schema.
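To make the cost concrete, here is a sketch of what an instance author would have to change under this option. The V1.1 namespace and the file name Example.xsd are assumptions chosen to mirror the V1.0 example above.

```xml
<?xml version="1.0"?>
<!-- Was: <Example xmlns="http://www.exampleSchemaV1.0" ...> -->
<Example xmlns="http://www.exampleSchemaV1.1"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.exampleSchemaV1.1 Example.xsd">
    <!-- content unchanged; only the namespace declarations differ -->
</Example>
```

Even though the content is identical, the old instance is in a different namespace as far as a validator or a namespace-aware application is concerned.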
Option 4: Change the name/location of the schema.
This approach changes the file name or location of the schema. This mimics the
convention that many people use for naming their files so that they know which version
is the most current (e.g., append version number or date to end of file name).
Advantages:
Disadvantages:
- As with option 3, this approach forces all instance documents to change,
even if the change to the schema would not impact that instance.
- Any schemas that import the modified schema would have to change since
the import statement provides the name and location of the imported schema.
- Unlike the previous options, with this approach an application receives no
hint that the meaning of various element/attribute names has changed.
- The schemaLocation attribute in the instance document is optional and is not
authoritative even if it is present. It is a hint to help the processor to locate
the schema. Therefore, relying on this attribute is not a good practice (with
the current reading of the specification).
XML Schema Versioning Best Practices
[1] Capture the schema version somewhere in the XML schema.
[2] Identify, in the instance document, the version or versions of the schema with which
the instance is compatible.
[3] Make previous versions of an XML schema available.
This allows applications to use previous versions. It also allows users to migrate to
new versions of the schema as compatibility is assured.
One way to do this is to have applications pre-parse the instance and choose the
appropriate schema based on the version number. For example, one could have the
schemaLocation URI point to a document that includes a list of the locations of the
available versions of the schema. A tool could then be used to obtain the correct
version of the schema. The disadvantage of this approach is that this pre-parsing
requires two passes at the XML instance (one to get the correct version of the schema
and one to validate).
[4] When an XML schema is only extended, (e.g., new elements, attributes, extensions to
an enumerated list, etc.) one should strive to not invalidate existing instance
documents.
For example, if one is adding new elements or attributes, one could consider making
them optional where this makes sense.
Also, one could come up with a convention for schema versioning to indicate whether
the schema changed significantly (case 1) or was only extended (case 2). For
example, for case 1 a version could increment by one (e.g., v1.0 to v2.0) whereas for
case 2 a version could increment by less than one (e.g., v1.2 to v1.3).
In this case, a possible approach would be to do the following with respect to the
schema:
a. Change the schema version number within the schema (e.g., option 1).
b. Record the changes in the schema in a change history.
c. Make the new and previous versions of the schema available (therefore, one
would want to change the file name/location as well).
[5] Where the new schema changes the interpretation of some element (e.g., a construct
that was valid and meaningful for the previous schema does not validate against the
new schema), one should change the target namespace.
In this case, the changes with respect to the schema are the same as with [4], with one
addition:
d. Change the target namespace.
In this case there are also required changes with respect to the instances that use this
schema.
e. Update the instances to reflect the new target namespace.
f. Confirm that there are no compatibility problems with the new schema.
g. Change the attribute that identifies the version/versions of the schema with
which the instance is valid.
h. Update the schema name/location if appropriate.
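Guideline [4] can be sketched as follows: a minor revision adds an element but makes it optional, so instances written against the previous version still validate. The child elements Title and Note are hypothetical, as is the assumption that the previous version was 1.2.

```xml
<?xml version="1.0"?>
<!-- Example.xsd, revised: version raised from 1.2 to 1.3 (a case 2 change) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.exampleSchema"
           targetNamespace="http://www.exampleSchema"
           elementFormDefault="qualified" attributeFormDefault="unqualified"
           version="1.3">
    <xs:element name="Example">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Title" type="xs:string"/>
                <!-- New in 1.3. minOccurs="0" keeps 1.2 instances valid. -->
                <xs:element name="Note" type="xs:string" minOccurs="0"/>
            </xs:sequence>
            <xs:attribute name="schemaVersion" type="xs:decimal" use="required"/>
        </xs:complexType>
    </xs:element>
</xs:schema>
```

The targetNamespace is untouched, which is what distinguishes this from the case 1 procedure in [5].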
Zero, One, or Many Namespaces?
Table of Contents
Issue
Introduction
Example
Heterogeneous Namespace Design
Homogeneous Namespace Design
Chameleon Namespace Design
Impact of Design Approach on Instance Documents
<redefine> - only Applicable to Homogeneous and Chameleon Namespace Designs
Default Namespace and the Chameleon Namespace Design
Avoiding Name Collisions with Chameleon Components
Creating Tools for Chameleon Components
Best Practice
Issue:
In a project where multiple schemas are created, should we give each schema a different
targetNamespace, or should we give all the schemas the same targetNamespace, or should some
of the schemas have no targetNamespace?
[Figure: Managing Multiple Schemas (Schema-1.xsd, Schema-2.xsd, ..., Schema-n.xsd): same
targetNamespace, different targetNamespaces, or no targetNamespace?]
Introduction
In a typical project many schemas will be created. The schema designer is then confronted with
this issue: shall I define one targetNamespace for all the schemas, or shall I create a different
targetNamespace for each schema, or shall I have some schemas with no targetNamespace?
What are the tradeoffs? What guidance would you give someone starting on a project that will
create multiple schemas?
Here are the three design approaches for dealing with this issue:
[1] Heterogeneous Namespace Design:
give each schema a different targetNamespace
[2] Homogeneous Namespace Design:
give all schemas the same targetNamespace
[3] Chameleon Namespace Design:
give the main schema a targetNamespace and give no
targetNamespace to the supporting schemas (the no-namespace
supporting schemas will take-on the targetNamespace of the main
schema, just like a Chameleon)
To describe and judge the merits of the three design approaches it will be useful to take an
example and see each approach in action.
Example: XML Data Model of a Company
Imagine a project which involves creating a model of a company using XML Schemas. One very
simple model is to divide the schema functionality along these lines:
Company schema
Person schema
Product schema
A company is comprised of people and products.
Here are the company, person, and product schemas using the three design approaches.

[1] Heterogeneous Namespace Design
This design approach says to give each schema a different targetNamespace, e.g.,
C.xsd
<xsd:schema targetNamespace="C">
    <xsd:import namespace="A" schemaLocation="A.xsd"/>
    <xsd:import namespace="B" schemaLocation="B.xsd"/>
    ...
</xsd:schema>
A.xsd
<xsd:schema targetNamespace="A">
B.xsd
<xsd:schema targetNamespace="B">
Below are the three schemas designed using this design approach. Observe that each schema has
a different targetNamespace.
Product.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.product.org"
xmlns="http://www.product.org"
elementFormDefault="qualified">
<xsd:complexType name="ProductType">
<xsd:sequence>
<xsd:element name="Type" type="xsd:string" minOccurs="1" maxOccurs="1"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Person.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.person.org"
xmlns="http://www.person.org"
elementFormDefault="qualified">
<xsd:complexType name="PersonType">
<xsd:sequence>
<xsd:element name="Name" type="xsd:string"/>
<xsd:element name="SSN" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Company.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.company.org"
xmlns="http://www.company.org"
elementFormDefault="qualified"
xmlns:per="http://www.person.org"
xmlns:pro="http://www.product.org">
<xsd:import namespace="http://www.person.org"
schemaLocation="Person.xsd"/>
<xsd:import namespace="http://www.product.org"
schemaLocation="Product.xsd"/>
<xsd:element name="Company">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Person" type="per:PersonType" maxOccurs="unbounded"/>
<xsd:element name="Product" type="pro:ProductType" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
Note the three namespaces that were created by the schemas:
http://www.product.org
http://www.person.org
http://www.company.org
[2] Homogeneous Namespace Design
This design approach says to create a single, umbrella targetNamespace for all the schemas, e.g.,
Library.xsd
<xsd:schema targetNamespace="Library">
    <xsd:include schemaLocation="LibraryBookCatalogue.xsd"/>
    <xsd:include schemaLocation="LibraryEmployees.xsd"/>
    ...
</xsd:schema>
LibraryBookCatalogue.xsd
<xsd:schema targetNamespace="Library">
LibraryEmployees.xsd
<xsd:schema targetNamespace="Library">
Below are the three schemas designed using this approach. Observe that all schemas have the
same targetNamespace.
Product.xsd
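<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.company.org"
            xmlns="http://www.company.org"
            elementFormDefault="qualified">
    <xsd:complexType name="ProductType">
        <xsd:sequence>
            <xsd:element name="Type" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>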
Person.xsd
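<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.company.org"
            xmlns="http://www.company.org"
            elementFormDefault="qualified">
    <xsd:complexType name="PersonType">
        <xsd:sequence>
            <xsd:element name="Name" type="xsd:string"/>
            <xsd:element name="SSN" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>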
Company.xsd
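<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.company.org"
            xmlns="http://www.company.org"
            elementFormDefault="qualified">
    <xsd:include schemaLocation="Person.xsd"/>
    <xsd:include schemaLocation="Product.xsd"/>
    <xsd:element name="Company">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Person" type="PersonType" maxOccurs="unbounded"/>
                <xsd:element name="Product" type="ProductType" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>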
Note that all three schemas have the same targetNamespace:
http://www.company.org
Also note the mechanism used for accessing components in other schemas which have the same
targetNamespace: <include>. When accessing components in a schema with a different
namespace the <import> element is used, as we saw above in the Heterogeneous Design.
[3] Chameleon Namespace Design
This design approach says to give the main schema a targetNamespace, and the supporting
schemas have no targetNamespace, e.g.,
Z.xsd
<xsd:schema targetNamespace="Z">
    <xsd:include schemaLocation="Q.xsd"/>
    <xsd:include schemaLocation="R.xsd"/>
    ...
</xsd:schema>
Q.xsd
<xsd:schema>
R.xsd
<xsd:schema>
In our example, the company schema is the main schema. The person and product schemas are
supporting schemas. Below are the three schemas using this design approach:
Product.xsd (no targetNamespace)
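<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
    <xsd:complexType name="ProductType">
        <xsd:sequence>
            <xsd:element name="Type" type="xsd:string" minOccurs="1" maxOccurs="1"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>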
Person.xsd (no targetNamespace)
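<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
    <xsd:complexType name="PersonType">
        <xsd:sequence>
            <xsd:element name="Name" type="xsd:string" minOccurs="1" maxOccurs="1"/>
            <xsd:element name="SSN" type="xsd:string" minOccurs="1" maxOccurs="1"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>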
Company.xsd (main schema, uses the no-namespace-schemas)
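<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.company.org" xmlns="http://www.company.org"
            elementFormDefault="qualified">
    <xsd:include schemaLocation="Person.xsd"/>
    <xsd:include schemaLocation="Product.xsd"/>
    <xsd:element name="Company">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Person" type="PersonType" maxOccurs="unbounded"/>
                <xsd:element name="Product" type="ProductType" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>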
There are two things to note about this design approach:
First, as shown above, a schema is able to access components in schemas that have no
targetNamespace, using <include>. In our example, the company schema uses the components in
Product.xsd and Person.xsd (and they have no targetNamespace).
Second, note the chameleon-like characteristics of schemas with no targetNamespace:
The components in the schemas with no targetNamespace get namespace-coerced. That is, the
components take-on the targetNamespace of the schema that is doing the <include>.
For example, ProductType in Product.xsd gets implicitly coerced into the company
targetNamespace.
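Conceptually, once the <include>s have been processed, Company.xsd behaves as though it had been written as the single schema below. This is a sketch of the effect only; a validator does not produce such a file.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.company.org"
            xmlns="http://www.company.org"
            elementFormDefault="qualified">
    <!-- From Person.xsd, now in http://www.company.org -->
    <xsd:complexType name="PersonType">
        <xsd:sequence>
            <xsd:element name="Name" type="xsd:string"/>
            <xsd:element name="SSN" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
    <!-- From Product.xsd, now in http://www.company.org -->
    <xsd:complexType name="ProductType">
        <xsd:sequence>
            <xsd:element name="Type" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
    <xsd:element name="Company">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Person" type="PersonType" maxOccurs="unbounded"/>
                <xsd:element name="Product" type="ProductType" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```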
"Chameleon effect": this is a term coined by Henry Thompson to describe the ability of
components in a schema with no targetNamespace to take on the namespace of other schemas.
This is powerful!
Impact of Design Approach on Instance Documents
Above we have shown how the schemas would be designed using the three design approaches.
Let's turn now to the instance document. Does an instance document differ depending on the
design approach? All of the above schemas have been designed to expose the namespaces in
instance documents (as directed by: elementFormDefault="qualified"). If they had instead all
used elementFormDefault="unqualified" then instance documents would all have this form:
<c:Company xmlns:c="http://www.company.org"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation=
"http://www.company.org
Company.xsd">
<Person>
<Name>John Doe</Name>
<SSN>123-45-6789</SSN>
</Person>
<Product>
<Type>Widget</Type>
</Product>
</c:Company>
It is when the schemas expose their namespaces in instance documents that differences appear. In
the above schemas, they all specified elementFormDefault="qualified", thus exposing their
namespaces in instance documents. Let's see what the instance documents look like for each
design approach:
[1] Company.xml (conforming to the multiple targetNamespaces version)
<Company xmlns="http://www.company.org"
xmlns:per="http://www.person.org"
xmlns:prod="http://www.product.org"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation=
"http://www.company.org
Company.xsd">
<Person>
<per:Name>John Doe</per:Name>
<per:SSN>123-45-6789</per:SSN>
</Person>
<Product>
<prod:Type>Widget</prod:Type>
</Product>
</Company>
Note that:
- there needs to be a namespace declaration for each namespace
- the elements must all be uniquely qualified (explicitly or with a default namespace)
[2] Company.xml (conforming to the single, umbrella targetNamespace version)
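<Company xmlns="http://www.company.org"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation=
                   "http://www.company.org
                    Company.xsd">
    <Person>
        <Name>John Doe</Name>
        <SSN>123-45-6789</SSN>
    </Person>
    <Product>
        <Type>Widget</Type>
    </Product>
</Company>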
Since all the schemas are in the same namespace the instance document is able to take advantage
of that by using a default namespace.
[3] Company.xml (conforming to the main targetNamespace with supporting no-
targetNamespace version)
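<Company xmlns="http://www.company.org"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation=
                   "http://www.company.org
                    Company.xsd">
    <Person>
        <Name>John Doe</Name>
        <SSN>123-45-6789</SSN>
    </Person>
    <Product>
        <Type>Widget</Type>
    </Product>
</Company>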
Both of the schemas that have no targetNamespace take on the company targetNamespace
(a la the Chameleon effect). Thus, all components are in the same targetNamespace and the
instance document takes advantage of this by declaring a default namespace.
<redefine> - only Applicable to Homogeneous and Chameleon Namespace Designs
The <redefine> element is used to enable access to components in another schema, while
simultaneously giving the capability to modify zero or more of the components. Thus, the
<redefine> element has a dual functionality:
- it does an implicit <include>, thus enabling access to all the components in the referenced
  schema
- it enables you to redefine zero or more of the components in the referenced schema, i.e.,
  extend or restrict components
Example. Consider again the Company.xsd schema above. Suppose that it wishes to use
ProductType in Product.xsd. However, it would like to extend ProductType to include a product
ID. Here's how to do it using redefine:
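<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.company.org"
            xmlns="http://www.company.org"
            elementFormDefault="qualified">
    <xsd:include schemaLocation="Person.xsd"/>
    <xsd:redefine schemaLocation="Product.xsd">
        <xsd:complexType name="ProductType">
            <xsd:complexContent>
                <xsd:extension base="ProductType">
                    <xsd:sequence>
                        <xsd:element name="ID" type="xsd:ID"/>
                    </xsd:sequence>
                </xsd:extension>
            </xsd:complexContent>
        </xsd:complexType>
    </xsd:redefine>
    <xsd:element name="Company">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Person" type="PersonType" maxOccurs="unbounded"/>
                <xsd:element name="Product" type="ProductType" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>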
Now the <Product> element in instance documents will contain both <Type> and <ID>, e.g.,
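<Company xmlns="http://www.company.org"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation=
                   "http://www.company.org
                    Company.xsd">
    <Person>
        <Name>John Doe</Name>
        <SSN>123-45-6789</SSN>
    </Person>
    <Product>
        <Type>Widget</Type>
        <ID>P1001-1-00</ID>
    </Product>
</Company>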
The <redefine> element is very powerful. However, it can only be used with schemas with the
same targetNamespace or with no targetNamespace. Thus, it only applies to the Homogeneous
Namespace Design and the Chameleon Namespace Design.
Name collisions
When a schema uses Chameleon components those components become part of the including
schema's targetNamespace, just as though the schema author had typed the element declarations
and type definitions inline. If the schema <include>s multiple no-namespace schemas then there
will be a chance of name collisions. In fact, the schema may end up not being able to use some of
the no-namespace schemas because their use results in name collisions with other Chameleon
components. To demonstrate the name collision problem, consider this example:
Suppose that there are two schemas with no targetNamespace:
1.xsd
A
B
2.xsd
A
C
Schema 1 creates no-namespace elements A and B. Schema 2 creates no-namespace elements A,
and C. Now if schema 3 <include>s these two no-namespace schemas there will be a name
collision:
3.xsd
targetNamespace="http://www.example.org"
<include schemaLocation="1.xsd"/>
<include schemaLocation="2.xsd"/>
This schema has a name collision - A is defined twice. [Note: it's not an error to have two
elements in the same symbol space, provided they have the same type. However, if they have a
different type then it is an error, i.e., a name collision.]
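Spelled out in full, the collision might look like the three schemas below. The types chosen for A, B, and C are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- 1.xsd: no targetNamespace; declares A and B -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="A" type="xsd:string"/>
    <xsd:element name="B" type="xsd:string"/>
</xsd:schema>
```

```xml
<?xml version="1.0"?>
<!-- 2.xsd: no targetNamespace; declares A (with a different type) and C -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="A" type="xsd:integer"/>
    <xsd:element name="C" type="xsd:string"/>
</xsd:schema>
```

```xml
<?xml version="1.0"?>
<!-- 3.xsd: both Chameleon schemas are coerced into http://www.example.org,
     so 3.xsd ends up with two conflicting declarations of A -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org"
            elementFormDefault="qualified">
    <xsd:include schemaLocation="1.xsd"/>
    <xsd:include schemaLocation="2.xsd"/>
</xsd:schema>
```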
Namespaces are the standard way of avoiding such collisions. Above, if instead the components
in 1.xsd and 2.xsd resided in different namespaces then 3.xsd could have <import>ed them and
there would be no name collision. [Recall that two elements/types can have the same name if the
elements/types are in different namespaces.]
How do we address the name collision problem that the Chameleon design presents? That's next.
Resolving Namespace Collisions using Proxy Schemas
There is a very simple solution to the namespace collision problem: for each no-namespace
schema create a companion namespaced-schema (a proxy schema) that <include>s the no-
namespace schema. Then, the main schema <import>s the proxy schemas.
Z.xsd
<xsd:schema targetNamespace="Z">
    <xsd:import namespace="Z1" schemaLocation="Q-Proxy.xsd"/>
    <xsd:import namespace="Z2" schemaLocation="R-Proxy.xsd"/>
    ...
</xsd:schema>
Q-Proxy.xsd
<xsd:schema targetNamespace="Z1">
    <xsd:include schemaLocation="Q.xsd"/>
R-Proxy.xsd
<xsd:schema targetNamespace="Z2">
    <xsd:include schemaLocation="R.xsd"/>
Q.xsd
<xsd:schema>
R.xsd
<xsd:schema>
With this approach we avoid name collisions. This design approach has the added advantage that
it also enables the proxy schemas to customize the Chameleon components using <redefine>.
Thus, this approach is a two-step process:
1. Create the Chameleon schemas
2. Create a proxy schema for each Chameleon schema
The main schema then <import>s the proxy schemas.
The advantage of this two-step approach is that it enables applications to decide on a domain
(namespace) for the components that they are reusing. Furthermore, applications are able to
refine/customize the Chameleon components. This approach requires an extra step (i.e., creating
proxy schemas) but in return it provides a lot of flexibility.
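A minimal sketch of one proxy schema and of the main schema that uses it. Z, Z1, and Z2 are the placeholder namespace names from the diagram; real schemas would use absolute URIs. The sketch also assumes that Q.xsd and R.xsd each declare a global element named A, which is what would otherwise collide.

```xml
<?xml version="1.0"?>
<!-- Q-Proxy.xsd: gives the Chameleon components of Q.xsd the namespace Z1 -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="Z1"
            elementFormDefault="qualified">
    <xsd:include schemaLocation="Q.xsd"/>
</xsd:schema>
```

```xml
<?xml version="1.0"?>
<!-- Z.xsd: the main schema imports the proxies, so both A elements can coexist -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="Z"
            xmlns:q="Z1"
            xmlns:r="Z2"
            elementFormDefault="qualified">
    <xsd:import namespace="Z1" schemaLocation="Q-Proxy.xsd"/>
    <xsd:import namespace="Z2" schemaLocation="R-Proxy.xsd"/>
    <xsd:element name="stuff">
        <xsd:complexType>
            <xsd:sequence>
                <!-- q:A and r:A are distinct because they live in different namespaces -->
                <xsd:element ref="q:A"/>
                <xsd:element ref="r:A"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

R-Proxy.xsd would follow the same shape as Q-Proxy.xsd, with targetNamespace="Z2" and an <include> of R.xsd.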
Contrast the above two-step process with the below one-step process where the components are
assigned to a namespace from the very beginning:
1-fixed.xsd
targetNamespace="http://www.1-fixed.org"
A
B
2-fixed.xsd
targetNamespace="http://www.2-fixed.org"
A
C
main.xsd
targetNamespace="http://www.main.org"
<xsd:import namespace="http://www.1-fixed.org"
schemaLocation="1-fixed.xsd"/>
<xsd:import namespace="http://www.2-fixed.org"
schemaLocation="2-fixed.xsd"/>
This achieves the same result as the above two-step version. In this example, the components are
not Chameleon. Instead, A, B, and C were hardcoded with a namespace from the very beginning
of their life. The downside of this approach is that if main.xsd wants to <redefine> any of the
elements it cannot. Also, applications are forced to use a domain (namespace) defined by
someone else. These components are in a rigid, static, fixed namespace.
Creating Tools for Chameleon Components
Tools for Chameleon Components
We have seen repeatedly how Chameleon components are able to blend in with the schemas that
use them. That is, they adopt the namespace of the schema that <include>s them.
[Figure: a no-namespace <xsd:schema> (Q.xsd) is <include>d by other schemas, e.g., one with
targetNamespace="Z1", via <xsd:include schemaLocation="Q.xsd"/>.]
Chameleon components take on the namespace of the <include>ing schema
How do you write tools for components that can assume so many different faces (namespaces)?
[Figure: a Tool facing several <xsd:schema> documents, each giving the Chameleon components a
different namespace.]
How does a tool identify components that can assume many faces?
Certainly not by namespaces.
Consider this no-namespace schema:
1.xsd
A
B
Suppose that we wish to create a tool, T, which must process the two Chameleon components A
and B, regardless of what namespace they reside in. The tool must be able to handle the
following situation: imagine a schema, main.xsd, which <include>s 1.xsd. In addition, suppose
that main.xsd has its own element called A (in a different symbol space, so there's no name
collision). For example:
main.xsd
targetNamespace="http://www.example.org"
<include schemaLocation="1.xsd"/>
<element name="stuff">
<complexType>
<sequence>
<element name="A" type="xxx"/>
...
</sequence>
</complexType>
</element>
How would the tool T be able to distinguish between the Chameleon component A and the local
A in an instance document?
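To make the situation concrete, here is a sketch of what a complete main.xsd might look like. It assumes that 1.xsd declares A as a global element, that elementFormDefault is set to qualified, and that the local A has type xsd:integer (the original leaves the type as xxx); all three are assumptions.

```xml
<?xml version="1.0"?>
<!-- main.xsd (sketch) -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org"
            xmlns="http://www.example.org"
            elementFormDefault="qualified">
  <!-- 1.xsd has no targetNamespace, so its A and B take on
       http://www.example.org when included here -->
  <xsd:include schemaLocation="1.xsd"/>
  <xsd:element name="stuff">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Local A, declared inside stuff; unrelated to the Chameleon A -->
        <xsd:element name="A" type="xsd:integer"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Under these assumptions, both the Chameleon A and the local A appear in instance documents as A in the namespace http://www.example.org, so the element name alone does not tell the tool which one it is looking at.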
Chameleon Component Identification
One simple solution is to assign each Chameleon component a globally unique id (a GUID) when
you create it. The XML Schema spec allows you to add an attribute, id, to all element, attribute,
complexType, and simpleType components.
<xsd:element name="Lat_Lon"
             id="http://www.geospacial.org"
             ...>
</xsd:element>
Each component (element, complexType, simpleType, attribute)
in a schema can have an associated id attribute. This can be used
to uniquely identify each Chameleon component, regardless of
its namespace.
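The following sketch shows id attributes on several components of a Chameleon schema. One caveat: the schema for schemas declares id with type xsd:ID, so a strict schema processor may reject values that contain characters such as ":" or "/" (for example a full URL). The sketch therefore uses dotted names; the naming scheme, the type Lat_Lon_Type, and the child elements are all hypothetical.

```xml
<?xml version="1.0"?>
<!-- Chameleon schema (no targetNamespace) with an id on each component -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Lat_Lon" type="Lat_Lon_Type"
               id="geospacial.org.Lat_Lon"/>
  <xsd:complexType name="Lat_Lon_Type"
                   id="geospacial.org.Lat_Lon_Type">
    <xsd:sequence>
      <!-- Each id value must be unique within this schema document -->
      <xsd:element name="latitude" type="xsd:decimal"
                   id="geospacial.org.latitude"/>
      <xsd:element name="longitude" type="xsd:decimal"
                   id="geospacial.org.longitude"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

Note that xsd:ID only guarantees uniqueness within a single schema document; making the values unique across organizations is a matter of naming discipline.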
Note that the id attribute is purely local to the schema; it has no representation in instance
documents. A tool could use this id attribute to locate a Chameleon component, regardless of
what "face" (namespace) it currently wears. That is, the tool can open the schema document
using DOM, and the DOM API will give the tool access to the id value of every component
declared in that schema.
[Figure: a tool facing three <xsd:schema> documents, each containing a component with id="www.geospacial.org".]
A tool can locate the Chameleon component by using the id attribute.
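One possible shape for such a tool is a small XSLT 1.0 stylesheet run against the schema document that defines the component. This is only a sketch: the parameter name component-id and its default value geospacial.org.Lat_Lon are hypothetical, and a real tool would also need to follow <include> elements to reach the included schema documents.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>
  <!-- Hypothetical id value; pass a different one when invoking the tool -->
  <xsl:param name="component-id" select="'geospacial.org.Lat_Lon'"/>
  <xsl:template match="/">
    <!-- Match any schema component carrying the requested id,
         whatever targetNamespace the schema has (or lacks) -->
    <xsl:for-each select="//xsd:*[@id = $component-id]">
      <xsl:value-of select="local-name()"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="@name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The lookup never mentions targetNamespace, which is the point: the id stays the same no matter which schema <include>s the component.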
Best Practice
Above we explored the design space for this issue. We looked at the three design approaches in
action, both schemas and instance documents. So which design is better? Under what
circumstances?
When you are reusing schemas that someone else created you should <import> those schemas,
i.e., use the Heterogeneous Namespace design. It is a bad idea to copy those components into
your namespace, for two reasons: (1) your local copies would soon get out of sync with the other
schemas, and (2) you lose interoperability with any existing applications that process the other
schemas' components. The interesting case (the case we have been considering throughout this
discussion) is how to deal with namespaces in a collection of schemas that you created. Here are
our guidelines for this case:
Use the Chameleon Design:
with schemas which contain components that have no inherent semantics by themselves,
with schemas which contain components that have semantics only in the context of an
<include>ing schema,
when you don't want to hardcode a namespace into a schema, but rather want <include>ing
schemas to be able to provide their own application-specific namespace to the schema
Example. A repository of components - such as a schema which defines an array type, or vector,
linked list, etc - should be declared with no targetNamespace (i.e., Chameleon).
As a rule of thumb, if your schema just contains type definitions (no element declarations) then
that schema is probably a good candidate for being a Chameleon schema.
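A type-only repository of this kind might look like the sketch below. The type name Vector, the element name item, and the id value are hypothetical; the point is only that the schema element carries no targetNamespace attribute.

```xml
<?xml version="1.0"?>
<!-- Type library (sketch): no targetNamespace, so it is a Chameleon schema -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical reusable type: a vector of decimal values -->
  <xsd:complexType name="Vector" id="lib.Vector">
    <xsd:sequence>
      <xsd:element name="item" type="xsd:decimal"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

Any schema that <include>s this file can then refer to Vector as a type in its own targetNamespace.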
Use the Homogeneous Namespace Design:
when all of your schemas are conceptually related
when there is no need to visually identify in instance documents the origin/lineage of each
element/attribute. In this design all components come from the same namespace, so you lose
the ability to identify in instance documents that element A comes from schema X.
Oftentimes that's okay - you don't want to categorize elements/attributes differently. This
design approach is well suited for those situations.
Use the Heterogeneous Namespace Design:
when there are multiple elements with the same name. (This avoids name collisions.)
when there is a need to visually identify in instance documents the origin/lineage of each
element/attribute. In this design the components come from different namespaces, so you have
the ability to identify in instance documents that element A comes from schema X.
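As an illustration, an instance document under the Heterogeneous Namespace design might look like the sketch below. It reuses the namespaces from the 1-fixed.xsd / 2-fixed.xsd example and assumes that each of those schemas declares a global element A and that main.xsd declares an element stuff that references both; the prefixes and the element content are hypothetical.

```xml
<?xml version="1.0"?>
<m:stuff xmlns:m="http://www.main.org"
         xmlns:f1="http://www.1-fixed.org"
         xmlns:f2="http://www.2-fixed.org">
  <!-- The prefix shows which schema each A comes from -->
  <f1:A>value defined by 1-fixed.xsd</f1:A>
  <f2:A>value defined by 2-fixed.xsd</f2:A>
</m:stuff>
```

In the Homogeneous and Chameleon designs both A elements would sit in a single namespace, so this visual lineage would be lost (and the two A declarations would collide).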
Lastly, as we have seen, each component in a schema can be uniquely identified with an id
attribute. This is NOT the same as providing an id attribute on an element in instance documents;
we are talking here about a schema-internal way of identifying each schema component.
Consider identifying each schema component using the id attribute. This will enable a finer
degree of traceability than is possible using namespaces. The combination of namespaces plus
the schema id attribute is a powerful tandem for visually and programmatically identifying
components.
