Make targetNamespace the default namespace, and make two identical copies of all your schemas. Uniquely identify all schema components with the id attribute. Postpone binding schema components to a namespace. Let schemas which <include> your no-namespace schema supply a targetNamespace.
Below is a very brief synopsis of the Best Practice guidelines. Each item is elaborated upon in detail in the guidelines that follow.

1. Make targetNamespace the default namespace.

2. Make two identical copies of all your schemas, where the copies differ only in the value of elementFormDefault (in one copy set elementFormDefault="qualified", in the other copy set elementFormDefault="unqualified").

3. Uniquely identify all schema components with the id attribute. Note: this is NOT the same thing as creating an element with an attribute that has an ID datatype. Rather, what is being referred to here is the capability to associate an id attribute with every schema component (types, elements, attributes, etc). Here are some examples:

    <xsd:element name="elevation" type="xsd:integer" id="flight.aircraft.elevation"/>
    <xsd:complexType name="publication" id="wrox.book.publication"/>

   (The id attribute is of type xsd:ID, so its value must be a valid NCName and cannot contain a colon.) This provides a finer level of granularity for identifying components than namespaces do; namespaces provide only a coarse level of granularity.

4. Postpone decisions as long as possible.

   4.1 Postpone binding schema components to a namespace. Corollary: Don't give schemas a targetNamespace. Let schemas which <include> your no-namespace schema supply a targetNamespace, one that makes sense to the <include>ing schema.

   4.2 Postpone binding a type reference to an implementation, i.e., use "dangling types". Corollary: In an <import> element the schemaLocation attribute is optional. Don't use it.

5. Create extensible schemas.

   5.1 Recognize your limitations as a schema designer, i.e., be smart enough to know that you're not smart enough to anticipate all the varieties of data that an instance document author might need to use in creating an instance document. Corollary: use the <any> element.

6. Recognize that with XML Schemas you will not be able to express all your business rules. Express those business rules using either XSLT or Schematron.

Default Namespace - targetNamespace or XMLSchema?
Table of Contents
    Issue
    Introduction
    Approach 1: Default XMLSchema, Qualify targetNamespace
    Approach 2: Qualify XMLSchema, Default targetNamespace
    Approach 3: No Default Namespace - Qualify both XMLSchema and targetNamespace
    Best Practice

Issue

When creating a schema should XMLSchema (i.e., http://www.w3.org/2001/XMLSchema) be the default namespace, or should the targetNamespace be the default, or should there be no default namespace?

Introduction

Except for no-namespace schemas, every XML Schema uses at least two namespaces - the targetNamespace and the XMLSchema (http://www.w3.org/2001/XMLSchema) namespace, e.g.,

[Figure: Library.xsd is comprised of components from two namespaces. From the targetNamespace (http://www.publishing.org): Library, BookCatalogue, Book, CardCatalogueEntry. From http://www.w3.org/2001/XMLSchema: schema, element, complexType, sequence, annotation, documentation, string, integer. Which namespace should be the default?]

There are three ways to design your schemas, with regard to dealing with these two namespaces:

1. Make XMLSchema the default namespace, and explicitly qualify all references to components in the targetNamespace.
2. Vice versa - make the targetNamespace the default namespace, and explicitly qualify all components from the XMLSchema namespace.
3. Do not use a default namespace - explicitly qualify references to components in the targetNamespace and explicitly qualify all components from the XMLSchema namespace.

Let's look at each approach in detail. In the following discussions we will consider this scenario:

[Figure: Library.xsd (targetNamespace="http://www.publishing.org") <include>s BookCatalogue.xsd and refs the Book element declared in BookCatalogue. BookCatalogue.xsd declares the element Book globally so that it can be reused by other schemas, e.g., the Library schema. The BookCatalogue schema must either have the same namespace as the Library schema, or have no namespace.]

Approach 1: Default XMLSchema, Qualify targetNamespace

Below is a Library schema which demonstrates this design approach. It <include>s a BookCatalogue schema, which contains a declaration for a Book element. The Library schema references (refs) the Book element.

    <?xml version="1.0"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.library.org"
            xmlns:lib="http://www.library.org"
            elementFormDefault="qualified">
        <include schemaLocation="BookCatalogue.xsd"/>
        <element name="Library">
            <complexType>
                <sequence>
                    <element name="BookCatalogue">
                        <complexType>
                            <sequence>
                                <element ref="lib:Book" maxOccurs="unbounded"/>
                            </sequence>
                        </complexType>
                    </element>
                </sequence>
            </complexType>
        </element>
    </schema>

In this schema:
    - the default namespace is XMLSchema
    - the "lib" prefix points to the targetNamespace
    - the reference to Book is qualified

Note that XMLSchema is the default namespace. Consequently, all the components used to construct the schema - element, include, complexType, sequence, schema, etc - have no namespace qualifier on them. There is a namespace prefix, lib, which is associated with the targetNamespace. Any references (using the ref attribute) to components in the targetNamespace (Library, BookCatalogue, Book, etc) are explicitly qualified with lib (in this example there is a ref to lib:Book).

Advantages: If your schema is referencing components from multiple namespaces then this approach gives a consistent way of referring to the components (i.e., you always qualify the reference).

Disadvantages: Schemas which have no targetNamespace must be designed so that the XMLSchema components (element, complexType, sequence, etc) are qualified.
If you adopt this approach to designing your schemas then in some of your schemas you will qualify the XMLSchema components and in other schemas you won't qualify the XMLSchema components. Changing from one way of designing your schemas to another way can be confusing.

Approach 2: Qualify XMLSchema, Default targetNamespace

This design approach is the mirror image of the first approach:

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.library.org"
                xmlns="http://www.library.org"
                elementFormDefault="qualified">
        <xsd:include schemaLocation="BookCatalogue.xsd"/>
        <xsd:element name="Library">
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:element name="BookCatalogue">
                        <xsd:complexType>
                            <xsd:sequence>
                                <xsd:element ref="Book" maxOccurs="unbounded"/>
                            </xsd:sequence>
                        </xsd:complexType>
                    </xsd:element>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
    </xsd:schema>

In this schema:
    - the default namespace is the targetNamespace
    - Book is in the default namespace (thus, no namespace qualifier is required)
    - the "xsd" prefix points to the XMLSchema namespace
    - the XMLSchema components (schema, include, element, complexType, sequence) are qualified

With this approach all the components used to construct a schema are namespace qualified (with xsd:). There is a default namespace declaration that declares the targetNamespace to be the default namespace. Any references to components in the targetNamespace are not namespace qualified (note that the ref to Book is not namespace qualified).

Advantages: Schemas which have no targetNamespace must be designed so that the XMLSchema components (element, complexType, sequence, etc) are qualified. This approach will work whether your schema has a targetNamespace or not. Thus, with this approach you have a consistent approach to designing your schemas - always namespace-qualify the XMLSchema components.
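To make the point about no-targetNamespace schemas concrete, here is a minimal sketch of what a no-namespace BookCatalogue.xsd might look like when the XMLSchema components are qualified. The original BookCatalogue.xsd is not shown in this document, so the BookType definition and its single Title child are assumptions made purely for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical no-namespace BookCatalogue.xsd: there is no targetNamespace
     and no default namespace declaration. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">

    <!-- Book is declared globally so that an <include>ing schema can ref it.
         The unprefixed type reference points at the no-namespace BookType. -->
    <xsd:element name="Book" type="BookType"/>

    <!-- BookType and its content are assumed for this sketch. -->
    <xsd:complexType name="BookType">
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>

</xsd:schema>
```

Because this schema declares no default namespace, the unprefixed reference type="BookType" resolves to the no-namespace BookType component. Had XMLSchema been made the default namespace instead, that same unprefixed reference would be looked up in the XMLSchema namespace and would fail to resolve, which is why a no-namespace schema has to qualify the XMLSchema components.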
Disadvantages: If your schema is referencing components from multiple namespaces then for some references you will namespace-qualify the reference, whereas other times you will not (i.e., when you are referencing components in the targetNamespace). This variable use of namespace qualifiers in referencing components can be confusing.

Approach 3: No Default Namespace - Qualify both XMLSchema and targetNamespace

This design approach does not have a default namespace:

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.library.org"
                xmlns:lib="http://www.library.org"
                elementFormDefault="qualified">
        <xsd:include schemaLocation="BookCatalogue.xsd"/>
        <xsd:element name="Library">
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:element name="BookCatalogue">
                        <xsd:complexType>
                            <xsd:sequence>
                                <xsd:element ref="lib:Book" maxOccurs="unbounded"/>
                            </xsd:sequence>
                        </xsd:complexType>
                    </xsd:element>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
    </xsd:schema>

In this schema:
    - the XMLSchema components (schema, include, element, complexType, sequence) are qualified
    - the "xsd" prefix points to the XMLSchema namespace
    - the "lib" prefix points to the targetNamespace
    - the reference to Book is qualified

Note that the XMLSchema components are explicitly qualified, and so are the references to components in the targetNamespace.

Advantages:
[1] Schemas which have no targetNamespace must be designed so that the XMLSchema components (element, complexType, sequence, etc) are qualified. With this approach all your schemas are designed in a consistent fashion.
[2] If your schema is referencing components from multiple namespaces then this approach gives a consistent way of referencing components (i.e., you always qualify the reference).

Disadvantages: Very cluttered: being very explicit by namespace qualifying all components and all references can be annoying when reading the schema.

Best Practice

There is no clear-cut best practice with regard to this issue.
In large part it is a matter of personal preference.

Should it be an Element or a Type?

Table of Contents
    Issue
    Introduction
    Best Practice

Issue

When should an item be declared as an element versus when should it be defined as a type?

Introduction

This issue is best discussed by way of example:

Example. Should Warranty be declared as an element:

    <xsd:element name="Warranty">
        ...
    </xsd:element>

or as a type:

    <xsd:complexType name="Warranty">
        ...
    </xsd:complexType>

Best Practice

[1] When in doubt, make it a type. You can always create an element from the type, if needed. With a type, other elements can reuse that type.

Example. If you can't decide whether to make Warranty an element or a type, then make it a type:

    <xsd:complexType name="Warranty">
        ...
    </xsd:complexType>

If you decide later that you need a Warranty element, you can create one using the Warranty type:

    <xsd:element name="Warranty" type="Warranty"/>

Recall that elements and types are in different Symbol Spaces. Hence, you can have an element and a type with the same name.

[2] If the item is not intended to be an element in instance documents then define it as a type.

Example. If you will never see this in an instance document:

    <Warranty>
        ...
    </Warranty>

then define Warranty as a complexType.

[3] If the item's content is to be reused by other items then define it as a type.

Example. If other items need to use Warranty's content, then define Warranty as a type:

    <xsd:complexType name="Warranty">
        ...
    </xsd:complexType>
    ...
    <xsd:element name="PromissoryNote" type="Warranty"/>
    <xsd:element name="AutoCertificate" type="Warranty"/>

The example shows two elements - PromissoryNote and AutoCertificate - reusing the Warranty type.

[4] If the item is intended to be used as an element in instance documents, and it's required that sometimes it be nillable and other times not, then it must be defined as a type.

Example. Let's first see how not to do it. Suppose that we create a Warranty element:

    <xsd:element name="Warranty">
        ...
    </xsd:element>

The Warranty element can be reused elsewhere by ref'ing it:

    <xsd:element ref="Warranty"/>

Suppose that we also need a version of Warranty that supports a nil value. You might be tempted to do this:

    <xsd:element ref="Warranty" nillable="true"/>

This is not legal. This "dynamic morphing" capability (i.e., reusing a Warranty element declaration while simultaneously adding nillability) cannot be achieved using elements. The reason for this is that the ref and nillable attributes are mutually exclusive - you can use ref, or you can use nillable, but not both. The only way to accomplish the dynamic morphing capability is by defining Warranty as a type:

    <xsd:complexType name="Warranty">
        ...
    </xsd:complexType>

and then reusing the type:

    <xsd:element name="Warranty" nillable="true" type="Warranty"/>
    ...
    <xsd:element name="Warranty" type="Warranty"/>

In the first case Warranty is nillable. In the second case it's not nillable.

[5] If the item is intended to be used as an element in instance documents and other elements are to be allowed to substitute for the element, then it must be declared as an element.

Example. Suppose that we would like to enable instance document authors to use interchangeably the vocabulary (i.e., tag name) Warranty, Guarantee, or Promise, i.e.,

    <Warranty>
        ...
    </Warranty>
    ...
    <Guarantee>
        ...
    </Guarantee>
    ...
    <Promise>
        ...
    </Promise>

To enable this substitutable-tag-name capability, Warranty, Guarantee, and Promise must be declared as elements, and made members of a substitutionGroup:

    <xsd:element name="Warranty">
        ...
    </xsd:element>
    <xsd:element name="Guarantee" substitutionGroup="Warranty"/>
    <xsd:element name="Promise" substitutionGroup="Warranty"/>

Extending XML Schemas

Table of Contents
    Issue
    Tutorial
        Introduction
        Three Options for Extending XML Schemas
            Supplement with Another Schema Language
            Write Code to Express Additional Constraints
            Express Additional Constraints with an XSLT/XPath Stylesheet
        Advantages/Disadvantages of the Three Options
            Advantages/Disadvantages of Supplementing with Another Schema Language
            Advantages/Disadvantages of Writing Code to Express Additional Constraints
            Advantages/Disadvantages of Expressing Additional Constraints with an XSLT/XPath Stylesheet

Issue

What is Best Practice for checking instance documents for constraints that are not expressible by XML Schemas?

Introduction

XML Schemas is very powerful. However, it is not all-powerful. There are many constraints which cannot be expressed with XML Schemas. Here are some examples:

- Ensure that the value of the aircraft <Elevation> element is greater than the value of the obstacle <Height> element.
- Ensure that:
    - if the value of the attribute, mode, is "air" then the value of the element <Transportation> is either airplane or hot-air balloon.
    - if the value of the attribute, mode, is "water" then <Transportation> is either boat or hovercraft.
    - if the value of the attribute, mode, is "ground" then <Transportation> is either car or bicycle.
- Ensure that the value of <PaymentReceived> is equal to the value of <PaymentDue>, where these elements are in separate documents!

To check all these constraints we will need to supplement XML Schemas with another tool.

Example.
Consider this simple instance document:

    <?xml version="1.0"?>
    <Demo xmlns="http://www.demo.org"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.demo.org demo.xsd">
        <A>10</A>
        <B>20</B>
    </Demo>

With XML Schemas we can check the following constraints:

- the Demo (root) element contains a sequence of elements, A followed by B
- the A element contains an integer
- the B element contains an integer

In fact, here's an XML Schema which implements these constraints:

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.demo.org"
                xmlns="http://www.demo.org"
                elementFormDefault="qualified">
        <xsd:element name="Demo">
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:element name="A" type="xsd:integer"/>
                    <xsd:element name="B" type="xsd:integer"/>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
    </xsd:schema>

XML Schemas does not give us the capability to express the following constraint:

- the value of A must be greater than the value of B

So what do we do to check this constraint? (Interestingly, for the above instance document, the XML Schema that is shown would accept it as valid, whereas in fact it is not, since the value of A is less than the value of B. We need something else to check this constraint.) There are three options.

Three Options for Extending XML Schemas

(1) Supplement with Another Schema Language

There are many other schema languages besides XML Schemas:

- Schematron
- TREX
- RELAX
- SOX
- XDR
- HOOK
- DSD
- Assertion Grammars
- xlinkit

Thus, the first option is to use one (or more) of these schema languages to express the additional constraints. Let's look at one of these languages - Schematron. Using Schematron you embed the additional constraints (as "assertions") within the schema document (within <appinfo> elements). A Schematron engine will then extract the assertions and validate the instance document against the assertions.
[Figure: Schematron extracts the assertions from the <appinfo> elements of the XML Schema, checks the XML data against them, and reports valid/invalid.]

Thus, here's the architecture for determining if your data meets all constraints:

[Figure: Your XML data is checked against the XML Schema by a schema validator, and against the embedded assertions by Schematron. When both report "valid", you know your XML data is valid.]

With Schematron:

- you state a constraint using an <assert> element
- a <rule> is comprised of one or more <assert> elements
- a <pattern> is comprised of one or more <rule> elements

    <pattern>
        <rule>
            <assert> </assert>
            <assert> </assert>
        </rule>
    </pattern>

The <pattern> element is embedded within an XML Schema <appinfo> element.

Here's an example of an assertion:

    <assert test="d:A > d:B">A should be greater than B</assert>

The value of the test attribute is an XPath expression; the element content is a text description of the constraint. In the <assert> element you express a constraint using an XPath expression. Additionally, you state the constraint in natural language. The latter helps make the schemas self-documenting.

The <rule> element is used to specify the context for the <assert> elements:

    <rule context="d:Demo">
        <assert test="d:A > d:B">A should be greater than B</assert>
    </rule>

This is read as: "Within the context of the Demo element we assert that the A element should be greater than the B element."

You can associate an <assert> with a <diagnostic> element. The <diagnostic> element is used for printing error messages when the XML data fails the assertion. The <diagnostic> element is embedded within a <diagnostics> element, which immediately follows the <pattern> element.

    <pattern name="Check A greater than B">
        <rule context="d:Demo">
            <assert test="d:A > d:B" diagnostics="lessThan">
                A should be greater than B
            </assert>
        </rule>
    </pattern>
    <diagnostics>
        <diagnostic id="lessThan">
            Error! A is less than B
            A = <value-of select="d:A"/>
            B = <value-of select="d:B"/>
        </diagnostic>
    </diagnostics>

To identify the Schematron elements, they must be namespace-qualified with sch:.
Here's what demo.xsd looks like after enhancing it with the Schematron elements. The schema document shown earlier is enhanced with Schematron directives:

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.demo.org"
                xmlns="http://www.demo.org"
                xmlns:sch="http://www.ascc.net/xml/schematron"
                elementFormDefault="qualified">
        <xsd:annotation>
            <xsd:appinfo>
                <sch:title>Schematron validation</sch:title>
                <sch:ns prefix="d" uri="http://www.demo.org"/>
            </xsd:appinfo>
        </xsd:annotation>
        <xsd:element name="Demo">
            <xsd:annotation>
                <xsd:appinfo>
                    <sch:pattern name="Check A greater than B">
                        <sch:rule context="d:Demo">
                            <sch:assert test="d:A > d:B"
                                        diagnostics="lessThan">A should be greater than B</sch:assert>
                        </sch:rule>
                    </sch:pattern>
                    <sch:diagnostics>
                        <sch:diagnostic id="lessThan">
                            Error! A is less than B.
                            A = <sch:value-of select="d:A"/>
                            B = <sch:value-of select="d:B"/>
                        </sch:diagnostic>
                    </sch:diagnostics>
                </xsd:appinfo>
            </xsd:annotation>
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:element name="A" type="xsd:integer" minOccurs="1" maxOccurs="1"/>
                    <xsd:element name="B" type="xsd:integer" minOccurs="1" maxOccurs="1"/>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
    </xsd:schema>

Schematron will extract the directives out of the schema document to create a Schematron schema. Schematron will then validate the instance document against the Schematron schema.

The key points to note about using Schematron are:

- The additional constraints are embedded in <appinfo> elements within the XML Schema document
- The constraints are expressed using <assert> elements

(2) Write Code to Express Additional Constraints

The second option is to write some Java, Perl, C++, etc code to check additional constraints.

(3) Express Additional Constraints with an XSLT/XPath Stylesheet

The third option is to write a stylesheet to check the constraints.
[Figure: Your XML data is checked against the XML Schema by a schema validator, and against an XSLT stylesheet containing code to check additional constraints by an XSL processor. When both report "valid", you know your XML data is valid.]

For example, the following stylesheet checks instance documents to see if the contents of the A element is greater than the contents of the B element:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:d="http://www.demo.org"
                    version="1.0">
        <xsl:output method="text"/>
        <xsl:template match="/">
            <xsl:if test="/d:Demo/d:A &lt; /d:Demo/d:B">
                <xsl:text>Error! A is less than B</xsl:text>
                <xsl:text>
    </xsl:text> <!-- line break -->
                <xsl:text>A = </xsl:text><xsl:value-of select="/d:Demo/d:A"/>
                <xsl:text>
    </xsl:text> <!-- line break -->
                <xsl:text>B = </xsl:text><xsl:value-of select="/d:Demo/d:B"/>
            </xsl:if>
            <xsl:if test="/d:Demo/d:A >= /d:Demo/d:B">
                <xsl:text>Instance document is valid</xsl:text>
            </xsl:if>
        </xsl:template>
    </xsl:stylesheet>

Upon running this stylesheet on the above XML data the following output is generated:

    Error! A is less than B
    A = 10
    B = 20

This is exactly what is desired. Thus, the methodology for this third option is:

- check as many constraints as you can using XML Schemas
- for all other constraints write a stylesheet to do the checking

If both the schema validator and the XSL processor generate a positive output then you know that your instance document is valid. This combination of XML Schemas plus stylesheets provides for a powerful constraint checking mechanism.

Advantages/Disadvantages of the Three Options

(1) Supplement with Another Schema Language

Advantages

Collocated Constraints: Above we saw how Schematron can be used to express additional constraints. We saw that you embed the Schematron directives within the XML Schema document. There is something very appealing about having all the constraints expressed within one document rather than being dispersed over multiple documents. [This ability to collocate constraints within the schema document is a feature of Schematron. The other schema languages do not have this capability.]

Simplicity: Many of the schema languages were created in reaction to the complexity and limitations of XML Schemas. Consequently, most of them are relatively simple to learn and use.

Disadvantages

Multiple Schema Languages may be Required: Each schema language has its own capabilities and limitations. Multiple schema languages may be required to express all the additional constraints. For example, while Schematron is very powerful it is not able to express all constraints (for an example, see the ISBN simpleType definition on http://www.xfront.com). Also, Schematron forces you to go through many contortions to express your assertion.
This is due to the fact that it does not have loops and variables.

Yet Another Vocabulary (YAV): There are many schema languages, each with its own vocabulary and semantics. How do you find a schema language with the capability to express your problem's additional constraints? You have to take the time to learn each of the schema languages. Hopefully, you will find one that supports expression of your constraints. Although the languages are relatively easy to learn and use, it still takes time to learn a new vocabulary and semantics.

Questionable Long Term Support: In most cases the schema languages listed above were created by a single author. These authors are busy, very bright people. Someday their interests will move to something else. At that time you may be left with a product which is no longer supported. [Editor's Note: Schematron is basically a few XSLT/XPath stylesheets. Consequently, Schematron will be supported as long as there are XSL processors. Also, the author of RELAX has publicly promised to support RELAX for the next five years.]

(2) Write Code to Express Additional Constraints

Advantages

Full Power of a Programming Language: The advantage of this option is that with a single programming language you can express all the additional constraints.

Disadvantages

Not Leveraging other XML Technologies: There are other XML technologies that could be used to express the additional constraints in a declarative manner, without going through the compiling, linking, executing effort.

(3) Express Additional Constraints with an XSLT/XPath Stylesheet

Advantages

Application Specific Constraint Checking: Each application can create its own stylesheet to check constraints that are unique to the application. We can enhance the schema without touching it!

Core Technology: XSLT/XPath is a core technology which is well supported, well understood, and with lots of material written on it.

Expressive Power: XSLT/XPath is a very powerful language.
Most, if not all, constraints that you might ever need to express can be expressed using XSLT/XPath. Thus you don't have to learn multiple schema languages to express your additional constraints.

Long Term Support: XSLT/XPath is well supported, and will be around for a long time.

Disadvantages

Separate Documents: With this approach you will write your XML Schema document, then you will write a separate XSLT/XPath document to express additional constraints. Keeping the two documents in synch needs to be carefully managed.

Creating Extensible Content Models

Table of Contents
    Issue
    Definition
    Introduction
    Extensibility via Type Substitution
    Extensibility via the <any> Element
    Non-determinism and the <any> element
    Best Practice

Issue

What is Best Practice for creating extensible content models?

Definition

An element has an extensible content model if in instance documents the authors can extend the contents of that element with additional elements beyond what was specified by the schema.

Introduction

    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Title" type="xsd:string"/>
                <xsd:element name="Author" type="xsd:string"/>
                <xsd:element name="Date" type="xsd:string"/>
                <xsd:element name="ISBN" type="xsd:string"/>
                <xsd:element name="Publisher" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

This schema snippet dictates that in instance documents the <Book> elements must always be comprised of exactly 5 elements: <Title>, <Author>, <Date>, <ISBN>, and <Publisher>. For example:

    <Book>
        <Title>The First and Last Freedom</Title>
        <Author>J. Krishnamurti</Author>
        <Date>1954</Date>
        <ISBN>0-06-064831-7</ISBN>
        <Publisher>Harper &amp; Row</Publisher>
    </Book>

The schema specifies a fixed/static content model for the Book element. Book's content must rigidly conform to just the schema specification. Sometimes this rigidity is a good thing. Sometimes we want to give our instance documents more flexibility.

How do we design the schema so that Book's content model is extensible?
Below are two methods for implementing extensible content models.

Extensibility via Type Substitution

Consider this version of the above schema, where Book's content model has been defined using a type definition:

    <xsd:complexType name="BookType">
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
            <xsd:element name="Date" type="xsd:string"/>
            <xsd:element name="ISBN" type="xsd:string"/>
            <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:element name="BookCatalogue">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Book" type="BookType" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

Recall that via the mechanism of type substitutability, the contents of <Book> can be substituted by any type that derives from BookType.

    <Book>
        -- content --
    </Book>

For example, if a type is created which derives from BookType:

    <xsd:complexType name="BookTypePlusReviewer">
        <xsd:complexContent>
            <xsd:extension base="BookType">
                <xsd:sequence>
                    <xsd:element name="Reviewer" type="xsd:string"/>
                </xsd:sequence>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>

then instance documents can create a <Book> element that contains a <Reviewer> element, along with the other five elements:

    <Book xsi:type="BookTypePlusReviewer">
        <Title>My Life and Times</Title>
        <Author>Paul McCartney</Author>
        <Date>1998</Date>
        <ISBN>94303-12021-43892</ISBN>
        <Publisher>McMillin Publishing</Publisher>
        <Reviewer>Roger Costello</Reviewer>
    </Book>

Thus, Book's content model has been extended with a new element (Reviewer)!

In this example, BookTypePlusReviewer has been defined within the same schema as BookType. In general, however, this may not be the case. Other schemas can import/include the BookCatalogue schema and define types which derive from BookType. Thus, the contents of Book may be extended, without modifying the BookCatalogue schema, as we see next:

Extend a Schema, without Touching it!
BookCatalogue.xsd (xmlns="http://www.publishing.org"):

    <xsd:complexType name="BookType">
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
            <xsd:element name="Date" type="xsd:gYear"/>
            <xsd:element name="ISBN" type="xsd:string"/>
            <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:element name="Book" type="BookType"/>

MyTypeDefinitions.xsd (xmlns="http://www.publishing.org"):

    <xsd:include schemaLocation="BookCatalogue.xsd"/>

    <xsd:complexType name="BookTypePlusReviewer">
        <xsd:complexContent>
            <xsd:extension base="BookType">
                <xsd:sequence>
                    <xsd:element name="Reviewer" type="xsd:string"/>
                </xsd:sequence>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>

And here's what an instance document would look like. Its root element declares xmlns="http://www.publishing.org" and xsi:schemaLocation="http://www.publishing.org MyTypeDefinitions.xsd":

    <Book xsi:type="BookTypePlusReviewer">
        <Title>The First and Last Freedom</Title>
        <Author>J. Krishnamurti</Author>
        <Date>1954</Date>
        <ISBN>0-06-064831-7</ISBN>
        <Publisher>Harper &amp; Row</Publisher>
        <Reviewer>Roger L. Costello</Reviewer>
    </Book>

We have type-substituted Book's content with the type specified in the new schema. Thus, we have extended BookCatalogue.xsd without touching it!

This type substitutability mechanism is a powerful extensibility mechanism. However, it suffers from two problems:

Disadvantages:

Location Restricted Extensibility: The extensibility is restricted to appending elements onto the end of the content model (after the <Publisher> element). What if we wanted to extend <Book> by adding elements to the beginning (before <Title>), or in the middle, etc? We can't do it with this mechanism.
Unexpected Extensibility: If you look at the declaration for Book:

    <xsd:element name="Book" type="BookType" maxOccurs="unbounded"/>

and the definition for BookType:

    <xsd:complexType name="BookType">
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
            <xsd:element name="Date" type="xsd:gYear"/>
            <xsd:element name="ISBN" type="xsd:string"/>
            <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>

it is easy to be fooled into thinking that in instance documents the <Book> elements will always contain just <Title>, <Author>, <Date>, <ISBN>, and <Publisher>. It is easy to forget that someone could extend the content model using the type substitutability mechanism. Extensibility is unexpected! Consequently, if you write a program to process BookCatalogue instance documents, you may forget to take into account the fact that a <Book> element may contain more than five children.

It would be nice if there was a way to explicitly flag places where extensibility may occur: "hey, instance documents may extend <Book> at this point, so be sure to write your code taking this possibility into account." In addition, it would be nice if we could extend Book's content model at locations other than just the end. The <any> element gives us these capabilities beautifully, as is discussed in the next section.

Extensibility via the <any> Element

An <any> element may be inserted into a content model to enable instance documents to contain additional elements.
Here's an example showing an <any> element at the end of Book's content model:

    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Title" type="xsd:string"/>
                <xsd:element name="Author" type="xsd:string"/>
                <xsd:element name="Date" type="xsd:string"/>
                <xsd:element name="ISBN" type="xsd:string"/>
                <xsd:element name="Publisher" type="xsd:string"/>
                <xsd:any namespace="##any" minOccurs="0"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

The content of Book is Title, Author, Date, ISBN, Publisher and then (optionally) any well-formed element. The new element may come from any namespace. Note that the <any> element may be inserted at any point, e.g., it could be inserted at the top or in the middle.

In this version of the schema it has been explicitly specified that after the <Publisher> element any well-formed XML element may occur, and that this element may come from any namespace. For example, suppose that the instance document author discovers a schema containing a declaration for a Reviewer element:

    <xsd:element name="Reviewer">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Name">
                    <xsd:complexType>
                        <xsd:sequence>
                            <xsd:element name="First" type="xsd:string"/>
                            <xsd:element name="Last" type="xsd:string"/>
                        </xsd:sequence>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

And suppose that for an instance document author it is important that, in addition to specifying the Title, Author, Date, ISBN, and Publisher of each Book, he/she specify a Reviewer. Because the schema has been designed with extensibility in mind, the instance document author can use the Reviewer element in his/her BookCatalogue:

    <Book>
        <Title>The First and Last Freedom</Title>
        <Author>J. Krishnamurti</Author>
        <Date>1954</Date>
        <ISBN>0-06-064831-7</ISBN>
        <Publisher>Harper &amp; Row</Publisher>
        <rev:Reviewer>
            <rev:Name>
                <rev:First>Roger</rev:First>
                <rev:Last>Costello</rev:Last>
            </rev:Name>
        </rev:Reviewer>
    </Book>
The instance document author has enhanced the instance document with an element that the schema designer may never have envisioned. We have given the instance author a great deal of flexibility in creating the instance document.

An alternate schema design is to create a BookType (as we did above) and embed the <any> element within the BookType:

    <xsd:complexType name="BookType">
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
            <xsd:element name="Date" type="xsd:gYear"/>
            <xsd:element name="ISBN" type="xsd:string"/>
            <xsd:element name="Publisher" type="xsd:string"/>
            <xsd:any namespace="##any" minOccurs="0"/>
        </xsd:sequence>
    </xsd:complexType>

and then declare Book of type BookType:

    <xsd:element name="Book" type="BookType"/>

However, we are then back to the "unexpected extensibility" problem. Namely, after the <Publisher> element any well-formed XML element may occur, and after that, through type substitution, anything could be present.

There is a way to control the extensibility and still use a type. We can add a block attribute to Book:

    <xsd:element name="Book" type="BookType" block="#all"/>

The block attribute prohibits derived types from being used in Book's content model. Thus, by this method we have created a reusable component (BookType), and yet we still have control over the extensibility.

With the <any> element we have complete control over where, and how much, extensibility we want to allow. For example, suppose that we want to allow at most two new elements at the top of Book's content model. Here's how to specify that using the <any> element:

    <xsd:complexType name="Book">
        <xsd:sequence>
            <xsd:any namespace="##other" minOccurs="0" maxOccurs="2"/>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
            <xsd:element name="Date" type="xsd:string"/>
            <xsd:element name="ISBN" type="xsd:string"/>
            <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
Note how the <any> element has been placed at the top of the content model, and it has set maxOccurs="2". Thus, in instance documents the <Book> content will always end with <Title>, <Author>, <Date>, <ISBN>, and <Publisher>. Prior to that, up to two well-formed XML elements may occur.

In summary:

- We can put the <any> element specifically where we desire extensibility.
- If we desire extensibility at multiple locations, we can insert multiple <any> elements.
- With maxOccurs we can specify how much extensibility we will allow.

Non-Determinism and the <any> element

In the above definition of the Book type we used an <any> element at the beginning of the content model. We specified that the element matching <any> must come from an "other" namespace (i.e., it must not be an element from the targetNamespace). If, instead, we had specified namespace="##any" then a schema processor would report a non-deterministic content model error. Let's see why.

A non-deterministic content model is one where, upon encountering an element in an instance document, it is ambiguous which path was taken in the schema document. For example, suppose that we were to declare the Book type using ##any, as follows:

    <xsd:complexType name="Book">
        <xsd:sequence>
            <xsd:any namespace="##any" minOccurs="0" maxOccurs="2"/>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
            <xsd:element name="Date" type="xsd:string"/>
            <xsd:element name="ISBN" type="xsd:string"/>
            <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
And suppose that we have this (snippet of an) instance document:

    <Book>
        <Title>The First and Last Freedom</Title>
        <Author>J. Krishnamurti</Author>
        <Date>1954</Date>
        <ISBN>0-06-064831-7</ISBN>
        <Publisher>Harper &amp; Row</Publisher>
    </Book>

Let's see what happens when a schema validator gets to the <Title> element in this instance document. The schema validator must determine which element declaration in the schema document this Title element corresponds to. Do you see the ambiguity? There is no way to know, without doing a look-ahead, whether the Title element matches the <any> element or the <xsd:element name="Title" .../> declaration. This is a non-deterministic content model: if your schema has a content model which would require a schema validator to look ahead, then your schema is non-deterministic. Non-deterministic schemas are not allowed.

The solution in our example is to declare that the element matching <any> must come from an "other" namespace, as was shown earlier. That works fine in this example, where all the BookCatalogue elements come from the targetNamespace and the <any> content comes from a different namespace. Suppose, however, that the BookCatalogue schema imported element declarations from other namespaces. For example:

    <?xml version="1.0"?>
    <xsd:schema ...
                xmlns:bk="http://www.books.com">
        <xsd:import namespace="http://www.books.com"
                    schemaLocation="Books.xsd"/>
        <xsd:complexType name="Book">
            <xsd:sequence>
                <xsd:any namespace="##other" minOccurs="0"/>
                <xsd:element ref="bk:Title"/>
                <xsd:element name="Author" type="xsd:string"/>
                <xsd:element name="Date" type="xsd:string"/>
                <xsd:element name="ISBN" type="xsd:string"/>
                <xsd:element name="Publisher" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:schema>
Now consider this instance document:

    <Book>
        <bk:Title>The First and Last Freedom</bk:Title>
        <Author>J. Krishnamurti</Author>
        <Date>1954</Date>
        <ISBN>0-06-064831-7</ISBN>
        <Publisher>Harper &amp; Row</Publisher>
    </Book>

When a schema validator encounters bk:Title it will try to validate it against the appropriate element declaration in the schema. But is this the Title referred to by the schema (i.e., in the http://www.books.com namespace), or does this Title match the <any> element? It is ambiguous, and consequently non-deterministic. Thus, this schema is also illegal.

As you can see, prohibiting non-deterministic content models makes the use of the <any> element quite restricted. So, what do you do when you want to enable extensibility at arbitrary locations? Answer: put in an optional <other> element and let its content be <any>. Here's how to do it:

    <xsd:complexType name="Book">
        <xsd:sequence>
            <xsd:element name="other" minOccurs="0">
                <xsd:complexType>
                    <xsd:sequence>
                        <xsd:any namespace="##any" maxOccurs="2"/>
                    </xsd:sequence>
                </xsd:complexType>
            </xsd:element>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
            <xsd:element name="Date" type="xsd:string"/>
            <xsd:element name="ISBN" type="xsd:string"/>
            <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
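As an illustration of the container approach, here is a sketch of an instance snippet in the same style as the earlier ones. The rev prefix is reused from the earlier Reviewer example; the namespace URI bound to it here is purely hypothetical, and a validator would still need access to a declaration for rev:Reviewer, since wildcard content is processed strictly by default.

```xml
<Book>
    <!-- Optional container: extension elements go here, before Title -->
    <other>
        <!-- Hypothetical namespace URI, chosen only for this sketch -->
        <rev:Reviewer xmlns:rev="http://www.example.org/reviewers">
            <rev:Name>
                <rev:First>Roger</rev:First>
                <rev:Last>Costello</rev:Last>
            </rev:Name>
        </rev:Reviewer>
    </other>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
</Book>
```

Because every extension element sits inside <other>, a validator that reaches <Title> never has to choose between a wildcard and the Title declaration.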
Now, instance document authors have an explicit container element (<other>) in which to put additional elements. This isn't an ideal solution, but it's the best that we can do given the rule that schemas may not have non-deterministic content models. Write to the XML Schema working group and tell them that you want the prohibition of non-deterministic content models revoked!

Best Practice

The <any> element is an enabling technology. It turns instance documents from static, rigid structures into rich, dynamic, flexible data objects. It shifts the decision about what data makes sense from the schema designer to the instance document author.

As a schema designer you need to recognize your limitations. You have no way of anticipating all the varieties of data that an instance document author might need in creating an instance document. Be smart enough to know that you're not smart enough to anticipate all possible needs! Design your schemas with flexibility built in.

Definition: an open content schema is one that allows instance documents to contain additional elements beyond what is declared in the schema. As we have seen, this may be achieved by using the <any> (and <anyAttribute>) element in the schema.

Sprinkling <any> and <anyAttribute> elements liberally throughout your schema will make your schema easier to evolve, as the next section describes.

Enabling Schema Evolution using Open Content Schemas

In today's rapidly changing market static schemas will be less commonplace, as the market pushes schemas to quickly support new capabilities. For example, consider the cellphone industry. Clearly, this is a rapidly evolving market. Any schema that the cellphone community creates will soon become obsolete as hardware/software changes extend the cellphone capabilities.
For the cellphone community rapid evolution of a cellphone schema is not just a nicety; the market demands it!

Suppose that the cellphone community gets together and creates a schema, cellphone.xsd. Imagine that every week NOKIA sends out to the various vendors an instance document (conforming to cellphone.xsd), detailing its current product set. Now suppose that a few months after cellphone.xsd is agreed upon NOKIA makes some breakthroughs in their cellphones - they create new memory, call, and display features, none of which are supported by cellphone.xsd. To gain a market advantage NOKIA will want to get information about these new capabilities to its vendors ASAP. Further, they will have little motivation to wait for the next meeting of the cellphone community to consider upgrades to cellphone.xsd. They need results NOW. How does open content help? That is described next.

Suppose that the cellphone schema is declared "open". Immediately NOKIA can extend its instance documents to incorporate data about the new features. How does this change impact the vendor applications that receive the instance documents? The answer is - not at all. In the worst case, the vendor's application will simply skip over the new elements. More likely, however, the vendors are showing the cellphone features in a list box and these new features will be automatically captured with the other features.

Let's stop and think about what has just been described. Without modifying the cellphone schema and without touching the vendors' applications, information about the new NOKIA features has been instantly disseminated to the marketplace! Open content in the cellphone schema is the enabler for this rapid dissemination.

Clearly some types of instance document extensions may require modification to the vendors' applications. Recognize, however, that the vendors are free to upgrade their applications in their own time.
The applications do not need to be upgraded before changes can be introduced into instance documents. At the very worst, the vendors' applications will simply skip over the extensions. And, of course, those vendors do not need to upgrade in lock-step.

To wrap up this example, suppose that several months later the cellphone community reconvenes to discuss enhancements to the schema. The new features that NOKIA first introduced into the marketplace are then officially added into the schema. This completes the cycle. Changes to the instance documents have driven the evolution of the schema.

Global versus Local

Table of Contents

    Issue
    Introduction
    Russian Doll Design
    Salami Slice Design
    Russian Doll Design Characteristics
    Salami Slice Design Characteristics
    Venetian Blind Design
    Venetian Blind Design Characteristics
    Best Practice

Issue

When should an element or type be declared global versus when should it be declared local?

Introduction

[Recall that a component (element, complexType, or simpleType) is "global" if it is an immediate child of <schema>, whereas it is "local" if it is not an immediate child of <schema>, i.e., it is nested within another component.]

What advice would you give to someone who asked you, "In general, when should an element (or type) be declared global versus when should it be declared local?" The purpose of this chapter is to provide answers to that question.

Below is a snippet of an XML instance document. We will explore the different design strategies using this example.
    <Book>
        <Title>Illusions</Title>
        <Author>Richard Bach</Author>
    </Book>

Russian Doll Design

This design approach has the schema structure mirror the instance document structure, e.g., declare a Book element and within it declare a Title element followed by an Author element:

    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Title" type="xsd:string"/>
                <xsd:element name="Author" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

The instance document has all its components bundled together. Likewise, the schema is designed to bundle together all its element declarations. This design represents one end of the design spectrum.

Salami Slice Design

The Salami Slice design represents the other end of the design spectrum. With this design we disassemble the instance document into its individual components. In the schema we define each component (as an element declaration), and then assemble them together:

    <xsd:element name="Title" type="xsd:string"/>

    <xsd:element name="Author" type="xsd:string"/>

    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Title"/>
                <xsd:element ref="Author"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

Note how the schema declared each component individually (Title and Author) and then assembled them together (by ref'ing them) in the creation of the Book component.

These two designs represent opposite ends of the design spectrum. To understand these designs it may help to think in terms of boxes, where a box represents an element or type:

- The Russian Doll design corresponds to having a single box, which has boxes nested within it, which in turn have boxes nested within them, and so on (boxes within boxes, just like a Russian Doll!).
- The Salami Slice design corresponds to having many separate boxes which are assembled together (separate boxes combined together, just like Salami slices brought together in a sandwich!).

Let's examine the characteristics of each of the two designs.
(Doing so will yield insights into another design.)

Russian Doll Design Characteristics

[1] Opaque content. The content of Book is opaque to other schemas, and to other parts of the same schema. The impact of this is that none of the types or elements within Book are reusable.

[2] Localized scope. The region of the schema where the Title and Author element declarations are applicable is localized to within the Book element. The impact of this is that if the schema has set elementFormDefault="unqualified" then the namespaces of Title and Author are hidden (localized) within the schema.

[3] Compact. Everything is bundled together into a tidy, single unit.

[4] Decoupled. With this design approach each component is self-contained (i.e., it doesn't interact with other components). Consequently, changes to the components will have limited impact. For example, if the components within Book change, the impact will be limited since they are not coupled to components outside of Book.

[5] Cohesive. With this design approach all the related data is grouped together into self-contained components, i.e., the components are cohesive.

Salami Slice Design Characteristics

[1] Transparent content. The components which make up Book are visible to other schemas, and to other parts of the same schema. The impact of this is that the types and elements within Book are reusable.

[2] Global scope. All components have global scope. The impact of this is that, irrespective of the value of elementFormDefault, the namespaces of Title and Author will be exposed in instance documents.

[3] Verbose. Everything is laid out and clearly visible.

[4] Coupled. In our example we saw that the Book element depends on the Title and Author elements. If those elements were to change it would impact the Book element. Thus, this design produces a set of interconnected (coupled) components.

[5] Cohesive. With this design approach all the related data is also grouped together into self-contained components. Thus, the components are cohesive.

The two design approaches differ in a couple of important ways:

- The Russian Doll design facilitates hiding (localizing) namespace complexities. The Salami Slice design does not.
- The Salami Slice design facilitates component reuse. The Russian Doll design does not.

Is there a design which facilitates hiding (localizing) namespace complexities, and facilitates component reuse? Yes, there is!

Consider the Book example again. An alternative design is to create a global type definition which nests the Title and Author element declarations within it:

    <xsd:complexType name="Publication">
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:element name="Book" type="Publication"/>

This design has both benefits: it is capable of hiding (localizing) the namespace complexity of Title and Author, and it has a reusable Publication type component.

Venetian Blind Design

With this design approach we disassemble the problem into individual components, as the Salami Slice design does, but instead of creating element declarations, we create type definitions.
Here's what our example looks like with this design approach:

    <xsd:simpleType name="Title">
        <xsd:restriction base="xsd:string">
            <xsd:enumeration value="Mr."/>
            <xsd:enumeration value="Mrs."/>
            <xsd:enumeration value="Dr."/>
        </xsd:restriction>
    </xsd:simpleType>

    <xsd:simpleType name="Name">
        <xsd:restriction base="xsd:string">
            <xsd:minLength value="1"/>
        </xsd:restriction>
    </xsd:simpleType>

    <xsd:complexType name="Publication">
        <xsd:sequence>
            <xsd:element name="Title" type="Title"/>
            <xsd:element name="Author" type="Name"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:element name="Book" type="Publication"/>

This design has:

- maximized reuse (there are four reusable components - the Title type, the Name type, the Publication type, and the Book element)
- maximized the potential to hide (localize) namespaces [note how this has been phrased: "maximized the potential ...". Whether, in fact, the namespaces of Title and Author are hidden or exposed is determined by the elementFormDefault "switch"].

The Venetian Blind design espouses these guidelines:

- Design your schema to maximize the potential for hiding (localizing) namespace complexities.
- Use elementFormDefault to act as a switch for controlling namespace exposure - if you want element namespaces exposed in instance documents, simply turn the elementFormDefault switch "on" (i.e., set elementFormDefault="qualified"); if you don't want element namespaces exposed in instance documents, simply turn the elementFormDefault switch "off" (i.e., set elementFormDefault="unqualified").
- Design your schema to maximize reuse.
- Use type definitions as the main form of component reuse.
- Nest element declarations within type definitions.

Let's compare the Venetian Blind design with the Salami Slice design.
Recall our example:

Salami Slice Design:

    <xsd:element name="Title" type="xsd:string"/>

    <xsd:element name="Author" type="xsd:string"/>

    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Title"/>
                <xsd:element ref="Author"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

The Salami Slice design also results in creating reusable (element) components, but it has absolutely no potential for namespace hiding.

"However", you argue, "suppose that I want namespaces exposed in instance documents. [We have seen cases where this is desired.] So the Salami Slice design is a good approach for me. Right?"

Let's think about this for a moment. What if at a later date you change your mind and wish to hide namespaces (what if your users hate seeing all those namespace qualifiers in instance documents)? You will need to redesign your schema (possibly scrapping it and starting over). Better to adopt the Venetian Blind Design, which allows you to control whether namespaces are hidden or exposed by simply setting the value of elementFormDefault. No redesign of your schema is needed as you switch from exposing to hiding, or vice versa.

[That said ... your particular project may need to sacrifice the ability to turn namespace exposure on and off because you require instance documents to be able to use element substitution. In such circumstances the Salami Slice design approach is the only viable alternative.]

Here are the characteristics of the Venetian Blind Design.

Venetian Blind Design Characteristics

[1] Maximum reuse. The primary components of reuse are type definitions.

[2] Maximum namespace hiding. Element declarations are nested within types, thus maximizing the potential for namespace hiding.

[3] Easy exposure switching. Whether namespaces are hidden (localized) in the schema or exposed in instance documents is controlled by the elementFormDefault switch.

[4] Coupled. This design generates a set of components which are interconnected (i.e., dependent).
[5] Cohesive. As with the other designs, the components group together related data. Thus, the components are cohesive.

Best Practice

[1] The Venetian Blind design is the one to choose where your schemas require the flexibility to turn namespace exposure on or off with a simple switch, and where component reuse is important.

[2] Where your task requires that you make available to instance document authors the option to use element substitution, then use the Salami Slice design.

[3] Where minimizing size and coupling of components is of utmost concern then use the Russian Doll design.

Hide (Localize) Namespaces Versus Expose Namespaces

Table of Contents

    Issue
    Introduction
    Example
    Technical Requirements for Hiding (Localizing) Namespaces
    Best Practice

Issue

When should a schema be designed to hide (localize) within the schema the namespaces of the elements and attributes it is using, versus when should it be designed to expose the namespaces in instance documents?

Introduction

A typical schema will reuse elements and types from multiple schemas, each with different namespaces.

    A.xsd:
    <xsd:schema targetNamespace="A">
        ...
    </xsd:schema>

    B.xsd:
    <xsd:schema targetNamespace="B">
        ...
    </xsd:schema>

    C.xsd:
    <xsd:schema targetNamespace="C">
        <xsd:import namespace="A" schemaLocation="A.xsd"/>
        <xsd:import namespace="B" schemaLocation="B.xsd"/>
        ...
    </xsd:schema>

A schema, then, may be composed of components from multiple namespaces. Thus, when a schema is designed the schema designer must decide whether or not the origin (namespace) of each element should be exposed in the instance documents.

[Figure: two instance documents, each of the form <myDoc schemaLocation="C C.xsd"> ... </myDoc>. In the first, the namespaces of the components are not visible in the instance document. In the second, the namespaces of the components are visible in the instance document.]

A binary switch attribute in the schema is used to control the hiding/exposure of namespaces: by setting elementFormDefault="unqualified" the namespaces will be hidden (localized) within the schema, and by setting elementFormDefault="qualified" the namespaces will be exposed in instance documents.

[Figure: elementFormDefault - the Exposure Switch. The switch has two positions, hide and expose: <xsd:schema elementFormDefault="unqualified"> versus <xsd:schema elementFormDefault="qualified">.]
Example

[Figure: Camera.xsd importing Nikon.xsd, Olympus.xsd, and Pentax.xsd.]

Below is a schema for describing a camera. The camera schema reuses components from other schemas - the camera's <body> element reuses a type from the Nikon schema, the camera's <lens> element reuses a type from the Olympus schema, and the camera's <manual_adapter> element reuses a type from the Pentax schema.

Camera.xsd

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.camera.org"
                xmlns:nikon="http://www.nikon.com"
                xmlns:olympus="http://www.olympus.com"
                xmlns:pentax="http://www.pentax.com"
                elementFormDefault="unqualified">
        <xsd:import namespace="http://www.nikon.com"
                    schemaLocation="Nikon.xsd"/>
        <xsd:import namespace="http://www.olympus.com"
                    schemaLocation="Olympus.xsd"/>
        <xsd:import namespace="http://www.pentax.com"
                    schemaLocation="Pentax.xsd"/>
        <xsd:element name="camera">
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:element name="body" type="nikon:body_type"/>
                    <xsd:element name="lens" type="olympus:lens_type"/>
                    <xsd:element name="manual_adapter" type="pentax:manual_adapter_type"/>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
    </xsd:schema>

This schema is designed to hide namespaces.
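The imported schemas themselves are not listed in this chapter. As a rough sketch of what one of them could contain, here is a possible Nikon.xsd. The type name body_type comes from Camera.xsd, and the description child is inferred from the camera instance documents in this chapter; the rest is an assumption made for illustration, not the actual file.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch of Nikon.xsd; only body_type and description are taken from the text -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.nikon.com"
            elementFormDefault="unqualified">
    <!-- Global type: Camera.xsd refers to it as nikon:body_type -->
    <xsd:complexType name="body_type">
        <xsd:sequence>
            <!-- Local element: with elementFormDefault="unqualified" it appears
                 without a namespace qualifier in instance documents -->
            <xsd:element name="description" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>
```

The design point is that description is declared locally inside a global type, so the type can be reused by Camera.xsd while the element's namespace stays under the control of this schema's own elementFormDefault value.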
Note the three <import> elements for importing the Nikon, Olympus, and Pentax components. Also note that the <schema> attribute elementFormDefault has been set to the value unqualified. This is a critical attribute. Its value controls whether the namespaces of the elements being used by the schema will be hidden or exposed in instance documents (thus, it behaves like a switch turning namespace exposure on/off). Because it has been set to unqualified in this schema, the namespaces will remain hidden (localized) within the schema, and will not be visible in instance documents, as we see here:

Camera.xml (namespaces hidden)

    <?xml version="1.0"?>
    <my:camera xmlns:my="http://www.camera.org"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.camera.org Camera.xsd">
        <body>
            <description>Ergonomically designed casing for easy handling</description>
        </body>
        <lens>
            <zoom>300mm</zoom>
            <f-stop>1.2</f-stop>
        </lens>
        <manual_adapter>
            <speed>1/10,000 sec to 100 sec</speed>
        </manual_adapter>
    </my:camera>

Instance document with namespaces hidden (localized) within the schema. The fact that the <description> element comes from the Nikon schema, the <zoom> and <f-stop> elements come from the Olympus schema, and the <speed> element comes from the Pentax schema is totally transparent to the instance document.

The only namespace qualifier exposed in the instance document is on the <camera> root element. The rest of the document is completely free of namespace qualifiers.
The Nikon, Olympus, and Pentax namespaces are completely hidden (localized) within the schema!

Looking at the instance document one would never realize that the schema got its components from three other schemas. Such complexities are localized to the schema. Thus, we say that the schema has been designed in such a fashion that its component namespace complexities are hidden from the instance document.

On the other hand, if the above schema had set elementFormDefault="qualified" then the namespace of each element would be exposed in instance documents. Here's what the instance document would look like:

Camera.xml (namespaces exposed)

    <?xml version="1.0"?>
    <c:camera xmlns:c="http://www.camera.org"
              xmlns:nikon="http://www.nikon.com"
              xmlns:olympus="http://www.olympus.com"
              xmlns:pentax="http://www.pentax.com"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.camera.org Camera.xsd">
        <c:body>
            <nikon:description>Ergonomically designed casing for easy handling</nikon:description>
        </c:body>
        <c:lens>
            <olympus:zoom>300mm</olympus:zoom>
            <olympus:f-stop>1.2</olympus:f-stop>
        </c:lens>
        <c:manual_adapter>
            <pentax:speed>1/10,000 sec to 100 sec</pentax:speed>
        </c:manual_adapter>
    </c:camera>

Instance document with namespaces exposed.

Note that each element is explicitly namespace-qualified. Also, observe the declaration for each namespace. Due to the way the schema has been designed, the complexities of where the schema obtained its components have been pushed out to the instance document.
Thus, the reader of this instance document is exposed to the fact that the schema obtained the description element from the Nikon schema, the zoom and f-stop elements from the Olympus schema, and the speed element from the Pentax schema.

All Schemas must have a Consistent Value for elementFormDefault!

Be sure to note that elementFormDefault applies just to the schema that it is in. It does not apply to schemas that it includes or imports. Consequently, if you want to hide namespaces then all schemas involved must have set elementFormDefault="unqualified". Likewise, if you want to expose namespaces then all schemas involved must have set elementFormDefault="qualified".

To see what happens when you "mix" elementFormDefault values, let's suppose that Camera.xsd and Olympus.xsd have both set elementFormDefault="unqualified", whereas Nikon.xsd and Pentax.xsd have both set elementFormDefault="qualified":

    Camera.xsd    elementFormDefault="unqualified"
    Nikon.xsd     elementFormDefault="qualified"
    Olympus.xsd   elementFormDefault="unqualified"
    Pentax.xsd    elementFormDefault="qualified"

Here's what an instance document looks like with this "mixed" design:

Camera.xml (mixed design)

    <?xml version="1.0"?>
    <my:camera xmlns:my="http://www.camera.org"
               xmlns:nikon="http://www.nikon.com"
               xmlns:pentax="http://www.pentax.com"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.camera.org Camera.xsd">
        <body>
            <nikon:description>Ergonomically designed casing for easy handling</nikon:description>
        </body>
        <lens>
            <zoom>300mm</zoom>
            <f-stop>1.2</f-stop>
        </lens>
        <manual_adapter>
            <pentax:speed>1/10,000 sec to 100 sec</pentax:speed>
        </manual_adapter>
    </my:camera>

Hiding/exposure mix: this instance document has the Nikon and Pentax namespaces exposed, while the Camera and Olympus namespaces are hidden.

Observe that in this instance document some of the elements are namespace-qualified, while others are not. Namely, those elements from the Camera and Olympus schemas are not qualified, whereas the elements from the Nikon and Pentax schemas are qualified.

Technical Requirements for Hiding (Localizing) Namespaces

There are two requirements on an element for its namespace to be hidden from instance documents:

[1] The value of elementFormDefault must be unqualified.

[2] The element must not be globally declared. For example:

    <?xml version="1.0"?>
    <xsd:schema ...>
        <xsd:element name="foo">
        ...
    </xsd:schema>

The element foo can never have its namespace hidden from instance documents, regardless of the value of elementFormDefault. foo is a global element (i.e., an immediate child of <schema>) and therefore must always be qualified. To enable namespace hiding the element must be a local element.

Best Practice

For this issue there is no definitive Best Practice with respect to whether to design your schemas to hide (localize) namespaces or to expose them. Sometimes it's best to hide the namespaces. Other times it's best to expose the namespaces. Both have their pluses and minuses, as is discussed below.
However, there are Best Practices with regard to other aspects of this issue. They are:

1. Whenever you create a schema, make two copies of it. The copies should be identical, except that in one copy set elementFormDefault="qualified", whereas in the other copy set elementFormDefault="unqualified". If you make two versions of all your schemas then people who use your schemas will be able to implement either design approach - hide (localize) namespaces, or expose namespaces.
2. Minimize the use of global elements and attributes so that elementFormDefault can behave as an "exposure switch". The rationale for this was described above, in "Technical Requirements for Hiding (Localizing) Namespaces".

Advantages of Hiding (Localizing) Component Namespaces within the Schema

The instance document is simple. It's easy to read and understand. There are no namespace qualifiers cluttering up the document, except for the one on the document element (which is okay because it shows the domain of the document). The knowledge of where the schema got its components is irrelevant and localized to the schema.

Design your schema to hide (localize) namespaces within the schema ...

- when simplicity, readability, and understandability of instance documents are of utmost importance.
- when namespaces in the instance document provide no necessary additional information. In many scenarios the users of the instance documents are not XML experts. Namespaces would distract and confuse such users, who are concerned only with structure and content.
- when you need the flexibility of being able to change the schema without impact to instance documents. To see this, imagine that when a schema is originally designed it imports elements/types from another namespace. Since the schema has been designed to hide (localize) the namespaces, instance documents do not see the namespaces of the imported elements.
Then, imagine that at a later date the schema is changed such that instead of importing the elements/types, those elements and types are declared/defined right within the schema (inline). This change from using elements/types from another namespace to using elements/types in the local namespace has no impact on instance documents, because the schema has been designed to shield instance documents from where the components come from.

Advantages of Exposing Namespaces in Instance Documents

If your company spends the time and money to create a reusable schema component, and makes it available to the marketplace, then you will most likely want recognition for that component. Namespaces provide a means to achieve recognition. For example:

    <nikon:description>Ergonomically designed casing for easy handling</nikon:description>

There can be no misunderstanding that this component comes from Nikon. The namespace qualifier provides information on the origin/lineage of the description element.

Another case where it is desirable to expose namespaces is when processing instance documents. Oftentimes the namespace is required to determine how an element is to be processed (e.g., "if the element comes from this namespace then we'll process it in this fashion, if it comes from this other namespace then we'll process it in a different fashion"). If the namespaces are hidden then your application is forced to do a lookup in the schema for every element, which can be unacceptably slow.

Design your schema to expose namespaces in instance documents ...

- when lineage/ownership of the elements is important to the instance document users (such as for copyright purposes).
- when there are multiple elements with the same name but different semantics; you may then want to namespace-qualify them so that they can be differentiated (e.g., publisher:body versus human:body).
[In some cases you have multiple elements with the same name and different semantics, but the context of the element is sufficient to determine its semantics. Example: the title element in <person><title> is easily distinguished from the title element in <chapter><title>. In such cases there is less justification for designing your schema to expose the namespaces.]

- when processing (by an application) of the instance document elements is dependent upon knowledge of the namespaces of the elements.

Note about elementFormDefault and XPath Expressions

We have seen how to design your schema so that elementFormDefault acts as an "exposure switch". Simply change the value of elementFormDefault and it dictates whether or not elements are qualified in instance documents. In general, no other changes are needed in the schema. However, if your schema uses <key> or <unique> then you will need to modify the XPath expressions when you change the value of elementFormDefault.

If elementFormDefault="qualified" then you must qualify all the references in the XPath expressions. Example:

    <xsd:key name="PK">
        <xsd:selector xpath="c:Camera/c:lens"/>
        <xsd:field xpath="c:zoom"/>
    </xsd:key>

Note that each element in the XPath expressions is namespace-qualified.

If elementFormDefault="unqualified" then you must NOT qualify the references in the XPath expressions. Example:

    <xsd:key name="PK">
        <xsd:selector xpath="Camera/lens"/>
        <xsd:field xpath="zoom"/>
    </xsd:key>

Note that none of the elements in the XPath expressions are namespace-qualified. So, as you switch between exposing and hiding namespaces you will need to take the XPath changes into account.

Achieving Maximum Dynamic Capability in your Schemas

Too often schemas are designed in a static, fixed, rigid fashion. Everything is hardcoded when the schema is designed. There is no variability. This is not reflective of nature. Nature constantly changes and evolves. Nothing is fixed.
As a general rule of thumb: more dynamic capability = better schema.

Definition of Dynamic: the ability of a schema to change at run time (i.e., schema validation time). Contrast this with rigid, fixed, static schemas where everything is predetermined and unchanging.

Limiting Dynamic Capability

1. Hardcoding a collection of components to a namespace. When you bind a schema to a targetNamespace you are rigidly fixing the components in that schema to a fixed semantics (inasmuch as a targetNamespace gives semantics to a schema).
2. Hardcoding a reference to a type to the implementation of that type. When you specify in <import> a value for schemaLocation you are rigidly fixing the identity of the schema that implements a type.

Achieving Maximum Dynamic Capability

The key to achieving dynamic schemas is to postpone decisions as long as possible. Here are some ways to do that.

1. Don't hardcode a schema to a targetNamespace. That is, create no-namespace schemas. Let the application which uses the schema decide on a targetNamespace that is appropriate for the application. Thus we postpone binding a schema to a targetNamespace as long as possible, that is, until application-use time. Also, the using application can <redefine> components in the schema. This makes the schema dynamic/morphable. It is not fixed to one namespace (semantics). See the discussion on "Zero One Or Many Namespaces" for more info.

2. Don't hardcode the identity of an <import>ed schema. For example, suppose that you declare an element to have a type from another namespace, e.g.,

    <xsd:element name="sensor" type="s:sensor_type"/>

Observe that sensor_type is from another namespace. Thus, this schema will need to do an <import>. Normally we see <import> elements with two attributes - namespace and schemaLocation. However, schemaLocation is actually optional. When you do specify schemaLocation then you are rigidly fixing the identity of a schema which is to provide an implementation for sensor_type.
We can make things a lot more dynamic by not specifying schemaLocation. Instead, let the instance document author identify a schema that implements sensor_type. This creates a very dynamic schema. The type of the sensor element is not fixed or static. Thus we postpone binding the type reference (type="s:sensor_type") to an implementation of the type as long as possible, that is, until schema-validation time. See the discussion on "Dangling Types" for more info.

XML Namespace Name: URN or URL?

Issue

Is it better to formulate an XML Schema namespace as a URN or a URL? Example: urn:publishing:book versus http://www.publishing.com/book

What is an XML Schema Namespace Name?

- Namespace names are unique values.
- Namespace names are just labels.
- There is no requirement (or expectation) to resolve the namespace to an online resource.
- The XML Schema Part 0: Primer (http://www.w3.org/TR/xmlschema-0) states that target namespaces enable us to distinguish between definitions and declarations from different vocabularies.

What is a Uniform Resource Identifier (URI)?

URI Generic Syntax (RFC 2396, http://www.ietf.org/rfc/rfc2396.txt) defines the following:

- Identifier: An identifier is an object that can act as a reference to something that has identity.
- A URI can be further classified as a locator, a name, or both.
- The term "Uniform Resource Locator" (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than identifying the resource by name or by some other attribute(s) of that resource.
- The term "Uniform Resource Name" (URN) refers to the subset of URI that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable.

The Case for URN

- URNs are easier to conceptualize as a name and not a location.
And since namespaces are intended to uniquely identify something, not locate something, one could argue this is a better marriage.
- Users do not expect URNs to locate an entity/resource as they do with URLs.
- Many tool vendors automatically convert URLs to hyperlinks (i.e., turn them blue and make them clickable), which incorrectly implies that a URL-formatted namespace name is a location.

The Case for URL

- URLs are integral to the World Wide Web (WWW). With a URL, there is potentially a resource as well. That resource could contain documentation (a schema, pointers to other schemas, etc.). If in the future the W3C decides to have a namespace name point to a resource, the appropriate syntax will already be in use and namespace names will not have to change.
- The URL syntax is familiar and memorable to WWW users.
- URL schema names are already managed. (See http://www.w3.org/Addressing/ for more information.) Therefore, it would be easier to ensure namespace names are unique. In other words, with URLs it would be difficult to have two organizations with identical namespace names.
- One could come up with a namespace scheme that would eliminate the current confusion about the namespace URI being a location. For example, the namespace name could be prefaced with something like namespace:// or xmlns:// or ns://.

Best Practice

Whether to use a URN or a URL for an XML Schema namespace name is predominantly a personal preference. However, there seems to be a slight preference for using a URL because it provides the opportunity for pointing to something (e.g., a Resource Directory Description Language (RDDL) document) in the future.
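To make the two formulations concrete, here is a minimal sketch of a schema whose targetNamespace is written as a URN, with the URL alternative noted in a comment. The two namespace names are the examples given under Issue; the Book element is a hypothetical placeholder added purely for illustration.

```xml
<?xml version="1.0"?>
<!-- Namespace name formulated as a URN.
     To use the URL form instead, change both targetNamespace and xmlns
     to "http://www.publishing.com/book". Either way the value is only a
     label; a validator is not required to retrieve anything from it. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:publishing:book"
            xmlns="urn:publishing:book"
            elementFormDefault="qualified">

    <!-- Hypothetical element, present only so the schema declares something -->
    <xsd:element name="Book" type="xsd:string"/>

</xsd:schema>
```

Nothing else in the schema changes between the two forms, which is consistent with the observation that the choice is largely one of preference.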
Creating Variable Content Container Elements

Table of Contents

- Issue
- Introduction
- Example
- Method 1: Implementing variable content containers using an abstract element and element substitution
- Method 2: Implementing variable content containers using a <choice> element
- Method 3: Implementing variable content containers using an abstract type and type substitution
- Method 4: Implementing variable content containers using a dangling type
- Best Practice

Issue

What is the Best Practice for implementing a container element that is to be comprised of variable content?

Introduction

A typical problem when creating an XML Schema is to design a container element (e.g., Catalogue) which is to be comprised of variable content (e.g., Book, or Magazine, or ...).

[Figure: <Catalogue> holds a variable content section, which may be <Book> or <Magazine> or ...]

Catalogue is called a "variable content container".

Some things to consider:

- Do we allow the elements in the variable content container to come from disjoint sources, i.e., do we allow the container element to contain dissimilar, independent, loosely coupled elements?
- How do we design the variable content container so that the kinds of elements which it may contain can grow over time, i.e., how do we design an extensible variable content container?

Example

Throughout this discussion we will consider variable content containers (e.g., <Catalogue>) which are comprised of a collection of elements, where each element is variable. Here's an example of a <Catalogue> container element comprised of two different kinds of elements:

    <Catalogue>
        <Book> ... </Book>
        <Magazine> ... </Magazine>
        <Book> ... </Book>
    </Catalogue>

Below are four methods for implementing variable content containers.

Method 1: Implementing variable content containers using an abstract element and element substitution

Description: There are five XML Schema concepts that must be understood for implementing this method:

- an element can be declared abstract.
- abstract elements cannot be instantiated in instance documents (they are only placeholders).
- in instance documents the abstract element must be substituted by non-abstract (i.e., concrete) elements which have been declared to be in a substitutionGroup with the abstract element.
- elements may be declared to be in a substitutionGroup with the abstract element if and only if their type is the same as, or derives from, the abstract element's type.
- the abstract element and all elements in its substitutionGroup must be declared as global elements.

[Figure: the variable content section of <Catalogue> holds the abstract element Publication; <Book> and <Magazine> are in its substitutionGroup, each "substitutable for" Publication.]

Implementation:

Declare an abstract element (Publication):

    <xsd:element name="Publication" abstract="true" type="PublicationType"/>

Declare a variable content container element (Catalogue) to have as its content the abstract element (a ref to the abstract element declaration):

    <xsd:element name="Catalogue">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Publication" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

Note that maxOccurs="unbounded", so Catalogue may contain a collection (one or more) of Publication elements.

Declare the concrete elements (Book and Magazine) that are to be the contents of the variable content container, and declare them to be in a substitutionGroup with the abstract element:

    <xsd:element name="Book" substitutionGroup="Publication" type="BookType"/>
    <xsd:element name="Magazine" substitutionGroup="Publication" type="MagazineType"/>

In order for Book and Magazine to substitute for Publication, their types (BookType and MagazineType) must derive from Publication's type (PublicationType).
[Figure: type hierarchy - BookType and MagazineType derive from PublicationType.]

Here are the type definitions:

PublicationType - the base type:

    <xsd:complexType name="PublicationType">
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element name="Date" type="xsd:gYear"/>
        </xsd:sequence>
    </xsd:complexType>

BookType - extends PublicationType by adding two new elements, ISBN and Publisher:

    <xsd:complexType name="BookType">
        <xsd:complexContent>
            <xsd:extension base="PublicationType">
                <xsd:sequence>
                    <xsd:element name="ISBN" type="xsd:string"/>
                    <xsd:element name="Publisher" type="xsd:string"/>
                </xsd:sequence>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>

MagazineType - restricts PublicationType by striking out the Author element:

    <xsd:complexType name="MagazineType">
        <xsd:complexContent>
            <xsd:restriction base="PublicationType">
                <xsd:sequence>
                    <xsd:element name="Title" type="xsd:string"/>
                    <xsd:element name="Author" type="xsd:string" minOccurs="0" maxOccurs="0"/>
                    <xsd:element name="Date" type="xsd:gYear"/>
                </xsd:sequence>
            </xsd:restriction>
        </xsd:complexContent>
    </xsd:complexType>

Here is what an instance document looks like with this method:

    <?xml version="1.0"?>
    <Catalogue xmlns="http://www.catalogue.org"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.catalogue.org Catalogue.xsd">
        <Book>
            <Title>Illusions The Adventures of a Reluctant Messiah</Title>
            <Author>Richard Bach</Author>
            <Date>1977</Date>
            <ISBN>0-440-34319-4</ISBN>
            <Publisher>Dell Publishing Co.</Publisher>
        </Book>
        <Magazine>
            <Title>Natural Health</Title>
            <Date>1999</Date>
        </Magazine>
        <Book>
            <Title>The First and Last Freedom</Title>
            <Author>J. Krishnamurti</Author>
            <Date>1954</Date>
            <ISBN>0-06-064831-7</ISBN>
            <Publisher>Harper &amp; Row</Publisher>
        </Book>
    </Catalogue>

Advantages:

Extensible: This method allows you to extend the set of elements that may be used in the variable content container element, even if the schema for the variable content container element is outside your control. For example, suppose that you do not have privilege to modify the above Catalogue schema. Currently, the Catalogue element can only contain Book and Magazine elements. But suppose that your application has a hard requirement for CD elements as well:

    <Catalogue>
        <Book> ... </Book>
        <CD> ... </CD>
        <Magazine> ... </Magazine>
        <Book> ... </Book>
    </Catalogue>

How can you extend the set of elements that Catalogue may be comprised of, without modifying its schema? Answer: You can create your own separate schema which contains a declaration of CD (with a type, CDType, that extends the PublicationType in the Catalogue schema), and declares CD to be in the Publication substitutionGroup:

    <xsd:include schemaLocation="Catalogue.xsd"/>

    <xsd:complexType name="CDType">
        <xsd:complexContent>
            <xsd:extension base="PublicationType">
                <xsd:sequence>
                    <xsd:element name="RecordingCompany" type="xsd:string"/>
                </xsd:sequence>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>

    <xsd:element name="CD" substitutionGroup="Publication" type="CDType"/>

The CD element meets the requirements for being in the variable content container: its type (CDType) derives from PublicationType, and it is a member of the Publication element's substitutionGroup.
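The fragment above omits the enclosing <xsd:schema> element. As a sketch only, a complete CD.xsd might look like the following. It assumes that Catalogue.xsd uses the target namespace http://www.catalogue.org with elementFormDefault="qualified" (both inferred from the instance documents shown, not stated in the schema fragments), because <include> requires the including schema to share the included schema's targetNamespace.

```xml
<?xml version="1.0"?>
<!-- CD.xsd: a sketch of the extension schema.
     Assumption: Catalogue.xsd has targetNamespace http://www.catalogue.org -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.catalogue.org"
            xmlns="http://www.catalogue.org"
            elementFormDefault="qualified">

    <!-- Brings in Catalogue, Publication, and PublicationType unchanged -->
    <xsd:include schemaLocation="Catalogue.xsd"/>

    <!-- CDType derives from PublicationType, so it inherits Title, Author, Date -->
    <xsd:complexType name="CDType">
        <xsd:complexContent>
            <xsd:extension base="PublicationType">
                <xsd:sequence>
                    <xsd:element name="RecordingCompany" type="xsd:string"/>
                </xsd:sequence>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>

    <!-- Global declaration, as membership in a substitutionGroup requires -->
    <xsd:element name="CD" substitutionGroup="Publication" type="CDType"/>

</xsd:schema>
```

Because CD.xsd includes Catalogue.xsd, an instance document only needs to point its xsi:schemaLocation at CD.xsd to pick up both sets of declarations.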
Book, Magazine, and CD may now be used within the Catalogue element, e.g.,

    <?xml version="1.0"?>
    <Catalogue xmlns="http://www.catalogue.org"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.catalogue.org CD.xsd">
        <Book>
            <Title>Illusions The Adventures of a Reluctant Messiah</Title>
            <Author>Richard Bach</Author>
            <Date>1977</Date>
            <ISBN>0-440-34319-4</ISBN>
            <Publisher>Dell Publishing Co.</Publisher>
        </Book>
        <CD>
            <Title>Timeless Serenity</Title>
            <Author>Dyveke Spino</Author>
            <Date>1984</Date>
            <RecordingCompany>Dyveke Spino Productions</RecordingCompany>
        </CD>
        ...
    </Catalogue>

Thus, we see that this method allows us to extend the set of elements that may be used in the Catalogue element, without modifying its schema. Nice!

Semantic Cohesion: The elements in the variable content container all descend from the same type hierarchy (PublicationType). This type hierarchy binds them together, giving a structural (and, by implication, semantic) coherence to all the elements that may be in the variable content container.

Disadvantages:

No Independent Elements: The types of the elements that are to be used in the variable content container must all descend from the abstract element's type (PublicationType). Further, the elements must be in a substitutionGroup with the abstract element. Thus, the variable content container cannot contain elements whose type does not derive from the abstract element's type, or which are not in the substitutionGroup with the abstract element - as would typically be the case with independently developed elements. For example, suppose another schema author creates a Newspaper element, with a type that does not descend from PublicationType. <Catalogue> would not be able to contain the <Newspaper> element.

Limited Structural Variability: Over time a schema will evolve, and the kinds of elements which may occur in the variable content container will typically grow.
There is no way to know a priori in what direction it will grow. The new elements may be conceptually related but structurally vastly different from the original set of elements. The abstract element's type (e.g., PublicationType) may have been a good base type for the original set of elements, which were all structurally related, but may not be a good base type for the new elements, which have vastly different structures. So you are faced with a tradeoff: create a simple base type to support lots of different structures (but then you can make fewer assumptions about the structure of the members), or create a rich base type to support strong data type checking (but then you reduce the ability to add elements with radically different types).

Nonscalable Processing: Processing a collection of differently named elements requires a lot of special-case code. For example, consider a stylesheet to process each element in <Catalogue>:

    <xsl:if test="Book">
        <!-- process Book -->
    </xsl:if>
    <xsl:if test="Magazine">
        <!-- process Magazine -->
    </xsl:if>

This stylesheet snippet suffers from a lack of scalability, i.e., it must be updated as soon as a new element is added.

This argument needs some qualification. If the contents of <Catalogue> are just elements that substitute for the abstract Publication element, then each element can be uniformly processed, as follows:

    <xsl:for-each select="Catalogue/*">
        <!-- process the element -->
    </xsl:for-each>

This stylesheet snippet processes each element within Catalogue, regardless of the element name. This is scalable, and does not break when a new element is added.
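The uniform-processing snippet can be fleshed out into a complete stylesheet. The following is a sketch under stated assumptions: it targets XSLT 1.0, assumes the instance elements are in the http://www.catalogue.org namespace (as in the instance documents shown for this method), and binds that namespace to a prefix, cat, chosen here purely for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:cat="http://www.catalogue.org">

    <xsl:output method="text"/>

    <xsl:template match="/cat:Catalogue">
        <!-- Visit every child element of Catalogue, whatever its name -->
        <xsl:for-each select="*">
            <!-- local-name() yields Book, Magazine, CD, ... without special-casing -->
            <xsl:value-of select="local-name()"/>
            <xsl:text>: </xsl:text>
            <!-- Title comes from PublicationType, so every substitutable element has one -->
            <xsl:value-of select="cat:Title"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

The stylesheet relies only on what PublicationType guarantees (here, Title), so adding a new member such as CD to the substitutionGroup requires no change to it.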
Processing becomes non-scalable when Catalogue contains multiple abstract elements:

    <xsd:element name="Catalogue">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Publication" maxOccurs="unbounded"/>
                <xsd:element ref="Retailer" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

Suppose that both Publication and Retailer are abstract elements, and there can be any number of each kind of element within Catalogue. Here's a sample instance:

    <Catalogue>
        <Book> ... </Book>
        <Magazine> ... </Magazine>
        <Book> ... </Book>
        <MarketBasket> ... </MarketBasket>
        <Macys> ... </Macys>
    </Catalogue>

If you wish to process just the Publication elements (e.g., Book, Magazine) then you will need to write special-case code, as shown above. This is not scalable. Every time a new element is added to the collection of elements that may substitute for the Publication element, your code will have to be updated. This is costly.

No Control over Namespace Exposure: This method requires that the elements which may be used in the variable content container be in a substitutionGroup with the abstract element (e.g., Book and Magazine must be in a substitutionGroup with Publication). A requirement of using substitutionGroup is that all elements must be declared globally. The namespace of global elements can never be hidden in instance documents. As a consequence, there is no way to hide (localize) the namespaces of the elements used in the variable content container. This fails the Best Practice rule which states that you should design your schema to be able to hide or expose namespaces at your discretion (using elementFormDefault as an "exposure switch").
(See the chapter titled "Hide (Localize) Versus Expose Namespaces".)

Method 2: Implementing variable content containers using a <choice> element

Description: This method is quite straightforward - simply list within a <choice> element all the elements which can appear in the variable content container, and embed the <choice> element in the container element.

[Figure: the variable content section of <Catalogue> is a <choice> between <element name="Book"/> and <element name="Magazine"/>.]

Implementation:

Declare within a <choice> element all the elements (e.g., Book, Magazine) that may be used in the variable content container. Embed the <choice> element within the container element (Catalogue):

    <element name="Catalogue">
        <complexType>
            <choice maxOccurs="unbounded">
                <element name="Book" type="BookType"/>
                <element name="Magazine" type="MagazineType"/>
            </choice>
        </complexType>
    </element>

Advantages:

Independent Elements: The elements in the variable content container do not need a common type ancestry. They don't have to be related in any way. Thus, the variable content container can contain dissimilar, independent, loosely coupled elements.

Disadvantages:

Nonextensible: Suppose that the Catalogue schema is outside your control. Currently the variable content container only supports Book and Magazine. Suppose that you have a hard requirement for your instance documents to use CD as well as Book and Magazine within Catalogue, e.g.,

    <Catalogue>
        <Book> ... </Book>
        <CD> ... </CD>
        <Magazine> ... </Magazine>
        <Book> ... </Book>
    </Catalogue>

This method requires that the <choice> element in the Catalogue schema be modified to include the CD element. However, we stipulated that the Catalogue schema is outside your control, so it cannot be modified. This method has serious extensibility restrictions!

No Semantic Coherence: The <choice> element allows you to group together dissimilar elements. While that has been touted as an advantage, it is really a double-edged sword.
The elements in the variable content container have no type hierarchy to bind them together, to provide structural (and, by implication, semantic) coherence among the elements. Thus, when processing an instance document you can make no assumptions about the structure of the elements.

Method 3: Implementing variable content containers using an abstract type and type substitution

Description: There are three XML Schema concepts that must be understood for implementing this method:

- a complexType can be declared abstract.
- an element declared to be of an abstract type cannot have its type instantiated in instance documents (that is, the element can be instantiated, but its abstract content may not).
- in instance documents an element with an abstract type must have its content substituted by content from a non-abstract (concrete) type which derives from the abstract type. This is called type substitution.

[Figure: <Catalogue> contains <Publication xsi:type="..."> elements whose content is the variable content section; BookType and MagazineType derive from the abstract PublicationType.]

Implementation:

Define an abstract base type (PublicationType):

    <xsd:complexType name="PublicationType" abstract="true">
        <xsd:sequence>
            <xsd:element name="Title" type="xsd:string"/>
            <xsd:element name="Author" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element name="Date" type="xsd:gYear"/>
        </xsd:sequence>
    </xsd:complexType>

Declare the container element (Catalogue) to contain an element (Publication), which is of the abstract type:

    <xsd:element name="Catalogue">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Publication" type="PublicationType" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

In instance documents, the content of <Publication> can only be of a concrete type which derives from PublicationType, such as BookType or MagazineType (we saw these type definitions in Method 1 above).
With this method, instance documents will look different from what we saw with the above two methods. Namely, <Catalogue> will not contain variable content. Instead, it will always contain the same element (Publication). However, that element will contain variable content:

    <?xml version="1.0"?>
    <Catalogue xmlns="http://www.catalogue.org"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.catalogue.org Catalogue.xsd">
        <Publication xsi:type="BookType">
            <Title>Illusions The Adventures of a Reluctant Messiah</Title>
            <Author>Richard Bach</Author>
            <Date>1977</Date>
            <ISBN>0-440-34319-4</ISBN>
            <Publisher>Dell Publishing Co.</Publisher>
        </Publication>
        <Publication xsi:type="MagazineType">
            <Title>Natural Health</Title>
            <Date>1999</Date>
        </Publication>
        ...
    </Catalogue>

Advantages:

Extensible: Same extensibility benefits as method 1. Namely, this method allows you to easily extend the set of elements that may be used in the variable content container simply by creating new types which derive from the abstract type, e.g.,

CD.xsd

    <include schemaLocation="Catalogue.xsd"/>

    <complexType name="CDType">
        <complexContent>
            <extension base="PublicationType">
                <sequence>
                    <element name="RecordingCompany" type="string"/>
                </sequence>
            </extension>
        </complexContent>
    </complexType>

Now the content of <Publication> may be BookType, or MagazineType, or CDType. We have extended the Catalogue schema without modifying it!
Here's an example instance document with the new CD type:

    <?xml version="1.0"?>
    <Catalogue xmlns="http://www.catalogue.org"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.catalogue.org CD.xsd">
        <Publication xsi:type="BookType">
            <Title>Illusions The Adventures of a Reluctant Messiah</Title>
            <Author>Richard Bach</Author>
            <Date>1977</Date>
            <ISBN>0-440-34319-4</ISBN>
            <Publisher>Dell Publishing Co.</Publisher>
        </Publication>
        <Publication xsi:type="CDType">
            <Title>Timeless Serenity</Title>
            <Author>Dyveke Spino</Author>
            <Date>1984</Date>
            <RecordingCompany>Dyveke Spino Productions</RecordingCompany>
        </Publication>
        ...
    </Catalogue>

Minimal Dependencies: This method has fewer dependencies (less coupling) than method 1. To extend the collection of elements that may appear in a variable content container using method 1 you need access to both the abstract element (Publication) and its type (PublicationType). With method 3 you only need access to the abstract type. If we assume that in a typical scenario only the types will be put in publicly accessible schemas, then method 3 is the only viable method.

Scalable Processing: Processing a series of <Publication> elements is scalable. For example, a stylesheet could process each Publication element as follows:

    <xsl:for-each select="Publication">
        <!-- do something -->
    </xsl:for-each>

As new types are created (e.g., CDType) no change is needed to the code.

Semantic Cohesion: The elements in the variable content container all descend from the same type hierarchy. This type hierarchy binds them together, giving a structural (and, by implication, semantic) coherence among the elements.

Control over Namespace Exposure: The variable part of the variable content container is the set of element declarations that are embedded within type definitions. These are local declarations, so we can control exposure of the namespaces of the variable content container elements.
This is consistent with the Best Practice design recommendation we issued for hiding (localizing) versus exposing namespaces. (See the chapter titled "Hide (Localize) Versus Expose Namespaces".)

Disadvantages:

No Independent Elements: Same weakness as with method 1. All types must descend from an abstract type. This requirement prohibits the use of types which do not descend from the abstract type, as would typically be the situation when the type is in another, independently developed schema.

Limited Structural Variability: Same weakness as with method 1. Namely, to facilitate strong type checking you want to have a rich base type, but this is in direct conflict with the desire for components with vastly different structures, which calls for a weak base type.

Method 4: Implementing variable content containers using a dangling type

Motivation: Thus far our variable content container has contained complex content (i.e., child elements). Suppose that we want to create a variable content container to hold simple content. None of the previous methods can be used. We need a method that allows us to create simpleType variable content containers.

There is one key XML Schema concept that must be understood for implementing this method: in an <import> element the schemaLocation attribute is optional.

Description: Let's take an example. Suppose that we desire an element, sensor, which contains the name of a weather station sensor. For example:

    <sensor>Barometric Pressure</sensor>

There are several things to note:

1. This element holds a simpleType.
2. Each weather station may have sensors that are unique to it. Consequently, we must design our schema so that the sensor element can be customized by each weather station.

Here's an elegant design for making the contents of <sensor> customizable by each weather station.

Implementation: Let's go through the design, step by step.
In your schema, declare the sensor element:

    <xsd:element name="sensor" type="s:sensor_type"/>

Note that the sensor element is declared to have the type sensor_type, which is in a different namespace, the sensor namespace:

    xmlns:s="http://www.sensor.org"

Now here's the key: when you <import> this namespace, don't provide a value for schemaLocation! (In an import element schemaLocation is optional.) For example:

    <xsd:import namespace="http://www.sensor.org"/>

The instance document must then identify a schema that implements sensor_type. Thus, at run time (i.e., validation time) we are matching up the reference to sensor_type with an implementation of sensor_type. For example, an instance document may have this:

    xsi:schemaLocation=
        "http://www.weather-station.org weather-station.xsd
         http://www.sensor.org boston-sensors.xsd"

In this instance document schemaLocation identifies a schema, boston-sensors.xsd, which is to provide the implementation of sensor_type.

Let's take a look at the schemas and instance documents for the weather station sensor example we have been considering. Here's the main schema, which contains the dangling type:

weather-station.xsd

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.weather-station.org"
                xmlns="http://www.weather-station.org"
                xmlns:s="http://www.sensor.org"
                elementFormDefault="qualified">
        <xsd:import namespace="http://www.sensor.org"/>
        <xsd:element name="weather-station">
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:element name="sensor" type="s:sensor_type"
                                 maxOccurs="unbounded"/>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
    </xsd:schema>

An import with no schemaLocation! Note that the <import> element does not have a schemaLocation attribute to identify a particular schema which implements sensor_type. (Stated differently, this schema does not hardcode the identity of the schema which is to provide the implementation of sensor_type.)
The schema validator will resolve the reference to sensor_type based upon the collection of schemas that is provided to it in the instance document.

The Boston weather station creates a schema which implements sensor_type:

boston-sensors.xsd

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.sensor.org"
                xmlns="http://www.sensor.org"
                elementFormDefault="qualified">
        <xsd:simpleType name="sensor_type">
            <xsd:restriction base="xsd:string">
                <xsd:enumeration value="barometer"/>
                <xsd:enumeration value="thermometer"/>
                <xsd:enumeration value="anenometer"/>
            </xsd:restriction>
        </xsd:simpleType>
    </xsd:schema>

This schema provides an implementation for the dangling type, sensor_type.

Now an instance document can conform to weather-station.xsd and use boston-sensors.xsd as the implementation of sensor_type:

boston-weather-station.xml

    <?xml version="1.0"?>
    <weather-station xmlns="http://www.weather-station.org"
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                     xsi:schemaLocation=
                         "http://www.weather-station.org weather-station.xsd
                          http://www.sensor.org boston-sensors.xsd">
        <sensor>thermometer</sensor>
        <sensor>barometer</sensor>
        <sensor>anenometer</sensor>
    </weather-station>

In the instance document we provide a schema which implements the dangling type.

Suppose that the London weather station has all the sensors that Boston has, plus some additional ones that are unique to the London weather patterns.
Thus, London will create its own implementation of sensor_type:

london-sensors.xsd

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.sensor.org"
                xmlns="http://www.sensor.org"
                elementFormDefault="qualified">
        <xsd:simpleType name="sensor_type">
            <xsd:restriction base="xsd:string">
                <xsd:enumeration value="barometer"/>
                <xsd:enumeration value="thermometer"/>
                <xsd:enumeration value="anenometer"/>
                <xsd:enumeration value="hygrometer"/>
            </xsd:restriction>
        </xsd:simpleType>
    </xsd:schema>

This schema provides a different implementation for the dangling type, sensor_type. Note that this schema has an additional sensor_type value that Boston does not have: hygrometer.

Just as with the Boston weather station instance document, the London weather station instance document will conform to a collection of schemas, weather-station.xsd and london-sensors.xsd:

london-weather-station.xml

    <?xml version="1.0"?>
    <weather-station xmlns="http://www.weather-station.org"
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                     xsi:schemaLocation=
                         "http://www.weather-station.org weather-station.xsd
                          http://www.sensor.org london-sensors.xsd">
        <sensor>thermometer</sensor>
        <sensor>barometer</sensor>
        <sensor>hygrometer</sensor>
        <sensor>anenometer</sensor>
    </weather-station>

The London weather station is able to customize the content of <sensor> by using london-sensors.xsd, which defines sensor_type appropriately for the London weather station. Wow!

Summary: This method represents an extraordinarily powerful design pattern. The key to this design pattern is:

1. When you declare the variable content container element give it a type that is in another namespace, e.g., s:sensor_type
2. When you <import> that namespace don't provide a value for schemaLocation, e.g.,
       <xsd:import namespace="http://www.sensor.org"/>
3. Create any number of implementations of the dangling type, e.g.,
       boston-sensors.xsd
       london-sensors.xsd
4.
In instance documents identify the schema that you want used to implement the dangling type, e.g.,
       xsi:schemaLocation=
           "http://www.weather-station.org weather-station.xsd
            http://www.sensor.org london-sensors.xsd"

Both simpleType and complexType: In our examples we have implemented the dangling type as a simpleType. The implementation of a dangling type does not have to be a simpleType. A schema could define it as a complexType.

Advantages:

Dynamic: A schema which contains a dangling type is very dynamic. It does not statically hard-code the identity of a schema to implement the type. Rather, it empowers the instance document author to identify a schema that implements the dangling type. Thus, the type implementation is provided when the instance document is created, rather than when the schema document is created.

Applicable to both Simple and Complex Types: A dangling type can be implemented as either a simpleType or a complexType. The other methods are only applicable to creating variable content containers with a complex type.

Disadvantages:

Different Namespace: The implementation of the dangling type must be in another namespace. It cannot be in the same namespace as the variable content container element. If you have a hard requirement that the contents of your variable content container have the same namespace as the container element then this method cannot be employed.

Best Practice

Which method you should use to create your variable content containers ultimately depends on your requirements. Here are some things to consider.

Use Method 1 (abstract element plus element substitution) when:
- It's okay for all the elements to descend from a common type.
- You need to provide the ability to extend the collection of elements in the variable content container without modifying its schema.
- You can live with the container elements all being namespace-exposed in instance documents.
Use Method 2 (<choice> element) when:
- You need to contain a collection of dissimilar, independent elements.
- It is adequate to have an external authority (i.e., a human) verify the collection of legal elements. Verification is accomplished by the external authority selecting which elements shall be allowed in the <choice> element.
- Growth of the collection of elements is tightly determined by the external authority that controls the schema.

Use Method 3 (abstract type with type substitution) when:
- All the elements in the variable content container are of the same type, or derived from the same type.
- It's okay to give all the elements in a variable content container a uniform name.
- The collection of elements may grow, independent of the container schema.
- You need to support namespace-hiding.
- You need to support scalable processing.

Use Method 4 (dangling type) when:
- You need a simpleType variable content container.
- You need to extend a simpleType.
- You need very dynamic, customizable content.

Best Practice: Method 4 is by far the most flexible approach. Unfortunately, as of today (August 16, 2001) none of the schema validators have implemented dangling types. The workaround is to use the anyType. For example:

    <xsd:element name="sensor" type="xsd:anyType"/>

We lose a bit of type checking with this, but it is the best that we can do today. Encourage the schema validator developers to support this capability!

XML Schema Versioning

Issue

What is the Best Practice for versioning XML schemas?

Introduction

It is clear that XML schemas will evolve over time and it is important to capture the schema's version. This write-up summarizes two cases for schema changes and some options for schema versioning. It then provides some best practice guidelines for XML schema versioning.

Schema Changes - Two Cases

Consider two cases for changes to XML schemas:

Case 1. The new schema changes the interpretation of some element.
For example, a construct that was valid and meaningful for the previous schema does not validate against the new schema.

Case 2. The new schema extends the namespace (e.g., by adding new elements), but does not invalidate previously valid documents.

Versioning Approaches

Some options for identifying a new schema version are to:

1. Change the (internal) schema version attribute.
2. Create a schemaVersion attribute on the root element.
3. Change the schema's targetNamespace.
4. Change the name/location of the schema.

Option 1: Change the (internal) schema version attribute.

In this approach one would simply change the number in the optional version attribute at the start of the XML schema. For example, in the code below one could change version="1.0" to version="1.1":

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified"
               attributeFormDefault="unqualified"
               version="1.0">

Advantages:
- Easy. Part of the schema specification.
- Instance documents would not have to change if they remain valid with the new version of the schema (case 2 above).
- The schema contains information that informs applications that it has changed. An application could interrogate the version attribute, recognize that this is a new version of the schema, and take appropriate action.

Disadvantages:
- The validator ignores the version attribute. Therefore, it is not an enforceable constraint.

Option 2: Create a schemaVersion attribute on the root element.

With this approach an attribute is included on the element that introduces the namespace. In the examples below, this attribute is named schemaVersion. This option could be used in two ways.

Usage A: First, like option 1, this attribute could be used to capture the schema version. In this case, one could make the attribute required and the value fixed. Then each instance that used this schema would have to set the value of the attribute to the value used in the schema.
This makes schemaVersion a constraint that is enforceable by the validator. With the example schema below, the instance would have to include a schemaVersion attribute with a value of 1.0 for the instance to validate.

    <xs:schema xmlns="http://www.exampleSchema"
               targetNamespace="http://www.exampleSchema"
               xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified"
               attributeFormDefault="unqualified">
        <xs:element name="Example">
            <xs:complexType>
                ...
                <xs:attribute name="schemaVersion" type="xs:decimal"
                              use="required" fixed="1.0"/>
            </xs:complexType>
        </xs:element>

Advantages:
- The schemaVersion attribute is an enforceable constraint. Instances would not validate without the same version number.

Disadvantages:
- The schemaVersion number in the instance must match exactly. This does not allow an instance to indicate that it is valid using multiple versions of a schema.

Usage B: The second approach uses the schemaVersion attribute in an entirely different way. It no longer captures the version of the schema within the schema (i.e., it is not a fixed value). Rather, it is used in the instance to declare the version (or versions) of the schema with which the instance is compatible. This approach would have to be done in conjunction with option 1 (or an alternative indicator in the schema file to identify its version).

The schemaVersion attribute's value could be a list, or a convention could be used to define how this attribute is used. For example, if the convention is that the schemaVersion attribute declares the latest schema version with which the instance is compatible, then the example instance below states that the instance should be valid with schema version 1.2 or earlier. With this approach, an application could compare the schema version (captured in the schema file) with the version with which the instance reports that it is compatible.
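The comparison described for Usage B can be sketched in a few lines of Python (standard library only). This is an illustration under stated assumptions, not a prescribed implementation: the function names are hypothetical, the convention assumed is "schemaVersion names the latest schema version the instance is compatible with", and versions are compared as decimals because the schema types schemaVersion as xs:decimal (so 1.10 equals 1.1 under this reading).

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Abbreviated schema and instance in the spirit of the text's samples.
SCHEMA_V1_3 = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.exampleSchema"
           targetNamespace="http://www.exampleSchema"
           version="1.3"/>
"""

INSTANCE_V1_2 = """
<Example schemaVersion="1.2" xmlns="http://www.exampleSchema"/>
"""


def schema_version(xsd_text):
    """Read the optional version attribute of the xs:schema element."""
    return Decimal(ET.fromstring(xsd_text).get("version"))


def instance_version(xml_text):
    """Read the schemaVersion attribute declared on the root element."""
    return Decimal(ET.fromstring(xml_text).get("schemaVersion"))


def is_compatible(xsd_text, xml_text):
    # Assumed convention: the instance is compatible with the declared
    # version and every earlier one, but not with later schema versions.
    return schema_version(xsd_text) <= instance_version(xml_text)


print(is_compatible(SCHEMA_V1_3, INSTANCE_V1_2))
```

Under the assumed convention a schema at version 1.3 is newer than what the instance claims, so an application would treat the pair as needing attention before validating. A different convention (for example, a list of versions) would need a different comparison.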
Sample Schema (declares its version as 1.3)

    <xs:schema xmlns="http://www.exampleSchema"
               targetNamespace="http://www.exampleSchema"
               xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified"
               attributeFormDefault="unqualified"
               version="1.3">
        <xs:element name="Example">
            <xs:complexType>
                ...
                <xs:attribute name="schemaVersion" type="xs:decimal"
                              use="required"/>
            </xs:complexType>
        </xs:element>

Sample Instance (declares it is compatible with version 1.2, or with 1.2 and other versions, depending upon the convention used)

    <Example schemaVersion="1.2"
             xmlns="http://www.exampleSchema"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.exampleSchema MyLocation\Example.xsd">

Advantages:
- Instance documents may not have to change if they remain valid with the new schema version (case 2).
- Like option 1, an application would receive an indication that the schema has changed.
- Could provide an alternative to schemaLocation as a means to point to the correct schema version. This could be desirable where the business practice requires the use of a schema in a controlled repository, rather than an arbitrary location.

Disadvantages:
- Requires extra processing by an application. For example, an application would have to pre-parse the instance to determine the schema version with which it is compatible, and compare this value to the version number stored in the schema file.

Option 3: Change the schema's targetNamespace.

In this approach, the schema's targetNamespace could be changed to designate that a new version of the schema exists. One way to do this is to include a schema version number in the designation of the target namespace as shown in the example below.
    <xs:schema xmlns="http://www.exampleSchemaV1.0"
               targetNamespace="http://www.exampleSchemaV1.0"
               xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified"
               attributeFormDefault="unqualified">

Advantages:
- Applications are notified of a change to the schema (i.e., an application would not recognize the new namespace).
- Requires action to assure that there are no compatibility problems with the new schema. At a minimum, the instance documents that use the schema, and schemas that include the relevant schema, must change to reference the new targetNamespace. This is both an advantage and a disadvantage.

Disadvantages:
- With this approach, instance documents will not validate until they are changed to designate the new targetNamespace. However, one does not want to force all instance documents to change, even if the change to the schema is really minor and would not impact an instance.
- Any schemas that include this schema would have to change because the target namespace of the included components must be the same as the target namespace of the including schema.

Option 4: Change the name/location of the schema.

This approach changes the file name or location of the schema. This mimics the convention that many people use for naming their files so that they know which version is the most current (e.g., append version number or date to end of file name).

Advantages: (none listed)

Disadvantages:
- As with option 3, this approach forces all instance documents to change, even if the change to the schema would not impact that instance.
- Any schemas that import the modified schema would have to change since the import statement provides the name and location of the imported schema.
- Unlike the previous options, with this approach an application receives no hint that the meaning of various element/attribute names has changed.
- The schemaLocation attribute in the instance document is optional and is not authoritative even if it is present.
  It is a hint to help the processor locate the schema. Therefore, relying on this attribute is not a good practice (with the current reading of the specification).

XML Schema Versioning Best Practices

[1] Capture the schema version somewhere in the XML schema.

[2] Identify in the instance document the version (or versions) of the schema with which the instance is compatible.

[3] Make previous versions of an XML schema available. This allows applications to use previous versions. It also allows users to migrate to new versions of the schema as compatibility is assured. One way to do this is to have applications pre-parse the instance and choose the appropriate schema based on the version number. For example, one could have the schemaLocation URI point to a document that includes a list of the locations of the available versions of the schema. A tool could then be used to obtain the correct version of the schema. The disadvantage of this approach is that this pre-parsing requires two passes at the XML instance (one to get the correct version of the schema and one to validate).

[4] When an XML schema is only extended (e.g., new elements, attributes, extensions to an enumerated list, etc.), one should strive not to invalidate existing instance documents. For example, if one is adding new elements or attributes, one could consider making them optional where this makes sense. Also, one could come up with a convention for schema versioning to indicate whether the schema changed significantly (case 1) or was only extended (case 2). For example, for case 1 a version could increment by one (e.g., v1.0 to v2.0) whereas for case 2 a version could increment by less than one (e.g., v1.2 to v1.3). In this case, a possible approach would be to do the following with respect to the schema:

a. Change the schema version number within the schema (e.g., option 1).
b. Record the changes in the schema in a change history.
c.
Make the new and previous versions of the schema available (therefore, one would want to change the file name/location as well).

[5] Where the new schema changes the interpretation of some element (e.g., a construct that was valid and meaningful for the previous schema does not validate against the new schema), one should change the target namespace. In this case, the changes with respect to the schema are the same as with [4], with one addition:

d. Change the target namespace.

In this case there are also required changes with respect to the instances that use this schema:

e. Update the instances to reflect the new target namespace.
f. Confirm that there are no compatibility problems with the new schema.
g. Change the attribute that identifies the version/versions of the schema with which the instance is valid.
h. Update the schema name/location if appropriate.

Zero, One, or Many Namespaces?

Table of Contents
    Issue
    Introduction
    Example
    Heterogeneous Namespace Design
    Homogeneous Namespace Design
    Chameleon Namespace Design
    Impact of Design Approach on Instance Documents
    <redefine> - only Applicable to Homogeneous and Chameleon Namespace Designs
    Default Namespace and the Chameleon Namespace Design
    Avoiding Name Collisions with Chameleon Components
    Creating Tools for Chameleon Components
    Best Practice

Issue: In a project where multiple schemas are created, should we give each schema a different targetNamespace, or should we give all the schemas the same targetNamespace, or should some of the schemas have no targetNamespace?

[Figure: Managing Multiple Schemas - Same or Different targetNamespaces? (Schema-1.xsd, Schema-2.xsd, ..., Schema-n.xsd) ... or no targetNamespace?]

Introduction

In a typical project many schemas will be created. The schema designer is then confronted with this issue: shall I define one targetNamespace for all the schemas, or shall I create a different targetNamespace for each schema, or shall I have some schemas with no targetNamespace?
What are the tradeoffs? What guidance would you give someone starting on a project that will create multiple schemas?

Here are the three design approaches for dealing with this issue:

[1] Heterogeneous Namespace Design: give each schema a different targetNamespace
[2] Homogeneous Namespace Design: give all schemas the same targetNamespace
[3] Chameleon Namespace Design: give the main schema a targetNamespace and give no targetNamespace to the supporting schemas (the no-namespace supporting schemas will take on the targetNamespace of the main schema, just like a Chameleon)

To describe and judge the merits of the three design approaches it will be useful to take an example and see each approach in action.

Example: XML Data Model of a Company

Imagine a project which involves creating a model of a company using XML Schemas. One very simple model is to divide the schema functionality along these lines:

    Company schema
        Person schema
        Product schema

A company is comprised of people and products. Here are the company, person, and product schemas using the three design approaches.
[1] Heterogeneous Namespace Design

This design approach says to give each schema a different targetNamespace, e.g.,

    C.xsd
        <xsd:schema targetNamespace="C">
            <xsd:import namespace="A" schemaLocation="A.xsd"/>
            <xsd:import namespace="B" schemaLocation="B.xsd"/>
            ...
        </xsd:schema>

    A.xsd
        <xsd:schema targetNamespace="A">

    B.xsd
        <xsd:schema targetNamespace="B">

Below are the three schemas designed using this design approach. Observe that each schema has a different targetNamespace.

Product.xsd

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.product.org"
                xmlns="http://www.product.org"
                elementFormDefault="qualified">
        <xsd:complexType name="ProductType">
            <xsd:sequence>
                <xsd:element name="Type" type="xsd:string" minOccurs="1" maxOccurs="1"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:schema>

Person.xsd

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.person.org"
                xmlns="http://www.person.org"
                elementFormDefault="qualified">
        <xsd:complexType name="PersonType">
            <xsd:sequence>
                <xsd:element name="Name" type="xsd:string"/>
                <xsd:element name="SSN" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:schema>

Company.xsd

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.company.org"
                xmlns="http://www.company.org"
                elementFormDefault="qualified"
                xmlns:per="http://www.person.org"
                xmlns:pro="http://www.product.org">
        <xsd:import namespace="http://www.person.org"
                    schemaLocation="Person.xsd"/>
        <xsd:import namespace="http://www.product.org"
                    schemaLocation="Product.xsd"/>
        <xsd:element name="Company">
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:element name="Person" type="per:PersonType"
                                 maxOccurs="unbounded"/>
                    <xsd:element name="Product" type="pro:ProductType"
                                 maxOccurs="unbounded"/>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
    </xsd:schema>

Note the three namespaces that were created by the schemas:

    http://www.product.org
    http://www.person.org
    http://www.company.org

[2] Homogeneous Namespace Design

This design approach says to create a single, umbrella targetNamespace for all the schemas, e.g.,

    Library.xsd
        <xsd:schema targetNamespace="Library">
            <xsd:include schemaLocation="LibraryBookCatalogue.xsd"/>
            <xsd:include schemaLocation="LibraryEmployees.xsd"/>
            ...
        </xsd:schema>

    LibraryBookCatalogue.xsd
        <xsd:schema targetNamespace="Library">

    LibraryEmployees.xsd
        <xsd:schema targetNamespace="Library">

Below are the three schemas designed using this approach. Observe that all schemas have the same targetNamespace.

Product.xsd

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.company.org"
                xmlns="http://www.product.org"
                elementFormDefault="qualified">
        <xsd:complexType name="ProductType">
            <xsd:sequence>
                <xsd:element name="Type" type="xsd:string" minOccurs="1" maxOccurs="1"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:schema>

Person.xsd

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.company.org"
                xmlns="http://www.person.org"
                elementFormDefault="qualified">
        <xsd:complexType name="PersonType">
            <xsd:sequence>
                <xsd:element name="Name" type="xsd:string"/>
                <xsd:element name="SSN" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:schema>

Company.xsd

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.company.org"
                xmlns="http://www.company.org"
                elementFormDefault="qualified">
        <xsd:include schemaLocation="Person.xsd"/>
        <xsd:include schemaLocation="Product.xsd"/>
        <xsd:element name="Company">
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:element name="Person" type="PersonType"
                                 maxOccurs="unbounded"/>
                    <xsd:element name="Product" type="ProductType"
                                 maxOccurs="unbounded"/>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
    </xsd:schema>

Note that all three schemas have the same targetNamespace:

    http://www.company.org

Also note the mechanism used for accessing components in other schemas which have the same targetNamespace: <include>. When accessing components in a schema with a different namespace the <import> element is used, as we saw above in the Heterogeneous Design.
[3] Chameleon Namespace Design

This design approach says to give the main schema a targetNamespace, and to give the supporting schemas no targetNamespace, e.g.,

    Z.xsd
        <xsd:schema targetNamespace="Z">
            <xsd:include schemaLocation="Q.xsd"/>
            <xsd:include schemaLocation="R.xsd"/>
            ...
        </xsd:schema>

    Q.xsd
        <xsd:schema>

    R.xsd
        <xsd:schema>

In our example, the company schema is the main schema. The person and product schemas are supporting schemas. Below are the three schemas using this design approach:

Product.xsd (no targetNamespace)

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                elementFormDefault="qualified">
        <xsd:complexType name="ProductType">
            <xsd:sequence>
                <xsd:element name="Type" type="xsd:string" minOccurs="1" maxOccurs="1"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:schema>

Person.xsd (no targetNamespace)

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                elementFormDefault="qualified">
        <xsd:complexType name="PersonType">
            <xsd:sequence>
                <xsd:element name="Name" type="xsd:string" minOccurs="1" maxOccurs="1"/>
                <xsd:element name="SSN" type="xsd:string" minOccurs="1" maxOccurs="1"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:schema>

Company.xsd (main schema, uses the no-namespace schemas)

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.company.org"
                xmlns="http://www.company.org"
                elementFormDefault="qualified">
        <xsd:include schemaLocation="Person.xsd"/>
        <xsd:include schemaLocation="Product.xsd"/>
        <xsd:element name="Company">
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:element name="Person" type="PersonType"
                                 maxOccurs="unbounded"/>
                    <xsd:element name="Product" type="ProductType"
                                 maxOccurs="unbounded"/>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
    </xsd:schema>

There are two things to note about this design approach:

First, as shown above, a schema is able to access components in schemas that have no targetNamespace, using <include>. In our example, the company schema uses the components in Product.xsd and Person.xsd (and they have no targetNamespace).

Second, note the chameleon-like characteristics of schemas with no targetNamespace: the components in the schemas with no targetNamespace get namespace-coerced.
That is, the components take on the targetNamespace of the schema that is doing the <include>. For example, ProductType in Product.xsd gets implicitly coerced into the company targetNamespace.

"Chameleon effect" is a term coined by Henry Thompson to describe the ability of components in a schema with no targetNamespace to take on the namespace of other schemas. This is powerful!

Impact of Design Approach on Instance Documents

Above we have shown how the schemas would be designed using the three design approaches. Let's turn now to the instance document. Does an instance document differ depending on the design approach? All of the above schemas have been designed to expose the namespaces in instance documents (as directed by elementFormDefault="qualified"). If they had instead all used elementFormDefault="unqualified" then instance documents would all have this form:

    <?xml version="1.0"?>
    <c:Company xmlns:c="http://www.company.org"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation=
                   "http://www.company.org
                    Company.xsd">
        <Person>
            <Name>John Doe</Name>
            <SSN>123-45-6789</SSN>
        </Person>
        <Product>
            <Type>Widget</Type>
        </Product>
    </c:Company>

It is when the schemas expose their namespaces in instance documents that differences appear. The schemas above all specified elementFormDefault="qualified", thus exposing their namespaces in instance documents.
Let's see what the instance documents look like for each design approach:

[1] Company.xml (conforming to the multiple targetNamespaces version)

    <?xml version="1.0"?>
    <Company xmlns="http://www.company.org"
             xmlns:per="http://www.person.org"
             xmlns:prod="http://www.product.org"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation=
                 "http://www.company.org
                  Company.xsd">
        <Person>
            <per:Name>John Doe</per:Name>
            <per:SSN>123-45-6789</per:SSN>
        </Person>
        <Product>
            <prod:Type>Widget</prod:Type>
        </Product>
    </Company>

Note that:
- there needs to be a namespace declaration for each namespace
- the elements must all be uniquely qualified (explicitly or with a default namespace)

[2] Company.xml (conforming to the single, umbrella targetNamespace version)

    <?xml version="1.0"?>
    <Company xmlns="http://www.company.org"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation=
                 "http://www.company.org
                  Company.xsd">
        <Person>
            <Name>John Doe</Name>
            <SSN>123-45-6789</SSN>
        </Person>
        <Product>
            <Type>Widget</Type>
        </Product>
    </Company>

Since all the schemas are in the same namespace the instance document is able to take advantage of that by using a default namespace.

[3] Company.xml (conforming to the main targetNamespace with supporting no-targetNamespace version)

    <?xml version="1.0"?>
    <Company xmlns="http://www.company.org"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation=
                 "http://www.company.org
                  Company.xsd">
        <Person>
            <Name>John Doe</Name>
            <SSN>123-45-6789</SSN>
        </Person>
        <Product>
            <Type>Widget</Type>
        </Product>
    </Company>

Both of the schemas that have no targetNamespace take on the company targetNamespace (à la the Chameleon effect). Thus, all components are in the same targetNamespace and the instance document takes advantage of this by declaring a default namespace.
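The difference between the instance documents can also be observed programmatically. The following Python sketch (standard library only) collects the namespace of every element in an instance document; the function name element_namespaces and the trimmed sample strings are illustrative assumptions. It inspects element names only and performs no schema validation.

```python
import xml.etree.ElementTree as ET

# Instance for the Heterogeneous design (trimmed from the text).
HETEROGENEOUS = """
<Company xmlns="http://www.company.org"
         xmlns:per="http://www.person.org"
         xmlns:prod="http://www.product.org">
    <Person>
        <per:Name>John Doe</per:Name>
        <per:SSN>123-45-6789</per:SSN>
    </Person>
    <Product>
        <prod:Type>Widget</prod:Type>
    </Product>
</Company>
"""

# Instance for the Homogeneous and Chameleon designs (identical in form).
HOMOGENEOUS = """
<Company xmlns="http://www.company.org">
    <Person>
        <Name>John Doe</Name>
        <SSN>123-45-6789</SSN>
    </Person>
    <Product>
        <Type>Widget</Type>
    </Product>
</Company>
"""


def element_namespaces(xml_text):
    """Return the set of namespace URIs used by elements in the document."""
    namespaces = set()
    for element in ET.fromstring(xml_text).iter():
        tag = element.tag
        if tag.startswith("{"):
            # ElementTree spells qualified names as "{namespace}local".
            namespaces.add(tag[1:].split("}", 1)[0])
        else:
            namespaces.add("")  # element in no namespace
    return namespaces


print(sorted(element_namespaces(HETEROGENEOUS)))
print(sorted(element_namespaces(HOMOGENEOUS)))
```

The Heterogeneous instance reports three namespaces, while the Homogeneous and Chameleon instances report only the company namespace, which is why a single default namespace declaration is enough for them.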
<redefine> - only Applicable to Homogeneous and Chameleon Namespace Designs

The <redefine> element is used to enable access to components in another schema, while simultaneously giving the capability to modify zero or more of the components. Thus, the <redefine> element has a dual functionality:
- it does an implicit <include>, so it enables access to all the components in the referenced schema
- it enables you to redefine zero or more of the components in the referenced schema, i.e., extend or restrict components

Example. Consider again the Company.xsd schema above. Suppose that it wishes to use ProductType in Product.xsd. However, it would like to extend ProductType to include a product ID. Here's how to do it using redefine:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.company.org"
            xmlns="http://www.company.org"
            elementFormDefault="qualified">
    <xsd:include schemaLocation="Person.xsd"/>
    <xsd:redefine schemaLocation="Product.xsd">
        <xsd:complexType name="ProductType">
            <xsd:complexContent>
                <xsd:extension base="ProductType">
                    <xsd:sequence>
                        <xsd:element name="ID" type="xsd:ID"/>
                    </xsd:sequence>
                </xsd:extension>
            </xsd:complexContent>
        </xsd:complexType>
    </xsd:redefine>
    <xsd:element name="Company">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Person" type="PersonType" maxOccurs="unbounded"/>
                <xsd:element name="Product" type="ProductType" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

Now the <Product> element in instance documents will contain both <Type> and <ID>. Because a value of type xsd:ID must be an NCName, it cannot begin with a digit, hence the leading letter in the ID below:

<?xml version="1.0"?>
<Company xmlns="http://www.company.org"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.company.org Company.xsd">
    <Person>
        <Name>John Doe</Name>
        <SSN>123-45-6789</SSN>
    </Person>
    <Product>
        <Type>Widget</Type>
        <ID>P1001-1-00</ID>
    </Product>
</Company>

The <redefine> element is very powerful.
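The example above exercises the extension half of <redefine>. The restriction half can be sketched the same way. The schema below is only an illustration under stated assumptions: it assumes that ProductType in Product.xsd is a sequence holding a single Type element of type xsd:string, and that Product.xsd also specifies elementFormDefault="qualified". Neither detail is confirmed in this section, and the standalone Product element is hypothetical.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.company.org"
            xmlns="http://www.company.org"
            elementFormDefault="qualified">
    <!-- Implicit include of Product.xsd, plus a restricted ProductType. -->
    <xsd:redefine schemaLocation="Product.xsd">
        <xsd:complexType name="ProductType">
            <xsd:complexContent>
                <!-- The base is the original ProductType being redefined. -->
                <xsd:restriction base="ProductType">
                    <xsd:sequence>
                        <!-- Assumed original content: one Type element of xsd:string. -->
                        <!-- The restriction pins its value to "Widget". -->
                        <xsd:element name="Type" type="xsd:string" fixed="Widget"/>
                    </xsd:sequence>
                </xsd:restriction>
            </xsd:complexContent>
        </xsd:complexType>
    </xsd:redefine>
    <!-- Hypothetical global element that uses the redefined type. -->
    <xsd:element name="Product" type="ProductType"/>
</xsd:schema>
```

A restriction must repeat the content model it narrows, which is why the Type element appears again here. Whether extending or restricting, <redefine> pulls in everything from Product.xsd and swaps in the modified ProductType.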
However, <redefine> can only be used with schemas with the same targetNamespace or with no targetNamespace. Thus, it only applies to the Homogeneous Namespace Design and the Chameleon Namespace Design.

Name collisions

When a schema uses Chameleon components those components become part of the including schema's targetNamespace, just as though the schema author had typed the element declarations and type definitions inline. If the schema <include>s multiple no-namespace schemas then there will be a chance of name collisions. In fact, the schema may end up not being able to use some of the no-namespace schemas because their use results in name collisions with other Chameleon components.

To demonstrate the name collision problem, consider this example. Suppose that there are two schemas with no targetNamespace:

    1.xsd    A  B
    2.xsd    A  C

Schema 1 creates no-namespace elements A and B. Schema 2 creates no-namespace elements A and C. Now if schema 3 <include>s these two no-namespace schemas there will be a name collision:

    3.xsd
    targetNamespace="http://www.example.org"
    <include schemaLocation="1.xsd"/>
    <include schemaLocation="2.xsd"/>

This schema has a name collision - A is defined twice. [Note: it's not an error to have two elements in the same symbol space, provided they have the same type. However, if they have a different type then it is an error, i.e., a name collision.]

Namespaces are the standard way of avoiding such collisions. Above, if instead the components in 1.xsd and 2.xsd resided in different namespaces then 3.xsd could have <import>ed them and there would be no name collision. [Recall that two elements/types can have the same name if the elements/types are in different namespaces.]

How do we address the name collision problem that the Chameleon design presents? That's next.
Resolving Namespace Collisions using Proxy Schemas

There is a very simple solution to the name collision problem: for each no-namespace schema create a companion namespaced schema (a proxy schema) that <include>s the no-namespace schema. Then, the main schema <import>s the proxy schemas.

    Q.xsd (no targetNamespace)
        <xsd:schema>
            ...
        </xsd:schema>

    R.xsd (no targetNamespace)
        <xsd:schema>
            ...
        </xsd:schema>

    Q-Proxy.xsd
        <xsd:schema targetNamespace="Z1">
            <xsd:include schemaLocation="Q.xsd"/>
            ...
        </xsd:schema>

    R-Proxy.xsd
        <xsd:schema targetNamespace="Z2">
            <xsd:include schemaLocation="R.xsd"/>
            ...
        </xsd:schema>

    Z.xsd
        <xsd:schema targetNamespace="Z">
            <xsd:import namespace="Z1" schemaLocation="Q-Proxy.xsd"/>
            <xsd:import namespace="Z2" schemaLocation="R-Proxy.xsd"/>
        </xsd:schema>

With this approach we avoid name collisions. This design approach has the added advantage that it also enables the proxy schemas to customize the Chameleon components using <redefine>.

Thus, this approach is a two-step process:
1. Create the Chameleon schemas
2. Create a proxy schema for each Chameleon schema

The main schema <import>s the proxy schemas. The advantage of this two-step approach is that it enables applications to decide on a domain (namespace) for the components that they are reusing. Furthermore, applications are able to refine/customize the Chameleon components. This approach requires an extra step (i.e., creating proxy schemas) but in return it provides a lot of flexibility.

Contrast the above two-step process with the one-step process below, where the components are assigned to a namespace from the very beginning:

    1-fixed.xsd
    targetNamespace="http://www.1-fixed.org"
    A  B

    2-fixed.xsd
    targetNamespace="http://www.2-fixed.org"
    A  C

    main.xsd
    targetNamespace="http://www.main.org"
    <xsd:import namespace="http://www.1-fixed.org" schemaLocation="1-fixed.xsd"/>
    <xsd:import namespace="http://www.2-fixed.org" schemaLocation="2-fixed.xsd"/>

This also avoids the name collision, just as the two-step version does. In this example, however, the components are not Chameleon. Instead, A, B, and C were hardcoded with a namespace from the very beginning of their life. The downside of this approach is that if main.xsd wants to <redefine> any of the elements it cannot, because <redefine> requires the same targetNamespace or none. Also, applications are forced to use a domain (namespace) defined by someone else. These components are in a rigid, static, fixed namespace.

Creating Tools for Chameleon Components

We have seen repeatedly how Chameleon components are able to blend in with the schemas that use them. That is, they adopt the namespace of the schema that <include>s them.
[Figure: Q.xsd, a schema with no targetNamespace, is <include>d (via <xsd:include schemaLocation="Q.xsd"/>) by two schemas, one with targetNamespace="Z1" and one with targetNamespace="Z2". Caption: Chameleon components take on the namespace of the <include>ing schema.]

How do you write tools for components that can assume so many different faces (namespaces)?

[Figure: the same three schemas, with a Tool and a question mark pointing at them. Caption: How does a tool identify components that can assume many faces?]

Certainly not by namespaces. Consider this no-namespace schema:

    1.xsd    A  B

Suppose that we wish to create a tool, T, which must process the two Chameleon components A and B, regardless of what namespace they reside in. The tool must be able to handle the following situation: imagine a schema, main.xsd, which <include>s 1.xsd. In addition, suppose that main.xsd has its own element called A (in a different symbol space, so there's no name collision). For example:

    main.xsd
    targetNamespace="http://www.example.org"
    <include schemaLocation="1.xsd"/>
    <element name="stuff">
        <complexType>
            <sequence>
                <element name="A" type="xxx"/>
                ...
            </sequence>
        </complexType>
    </element>

How would the tool T be able to distinguish between the Chameleon component A and the local A in an instance document?

Chameleon Component Identification

One simple solution is that when you create Chameleon components you assign them a globally unique id (a GUID). The XML Schema spec allows you to add an attribute, id, to all element, attribute, complexType, and simpleType components. Because the id attribute is of type xsd:ID, its value must be an NCName (no colons or slashes) and must be unique within the schema document:

    <xsd:element name="Lat_Lon" id="www.geospacial.org">
        ...
    </xsd:element>

Each component (element, complexType, simpleType, attribute) in a schema can have an associated id attribute. This can be used to uniquely identify each Chameleon component, regardless of its namespace. Note that the id attribute is purely local to the schema. There is no representation in the instance documents.

This id attribute could be used by a tool to locate a Chameleon component, regardless of what face (namespace) it currently wears. That is, the tool can open up the schema document using DOM, and the DOM API will give the tool access to the id value of every component declared there.

[Figure: a Tool pointing at Q.xsd and at the two schemas (targetNamespace="Z1" and targetNamespace="Z2") that <include> it; the component carries id="www.geospacial.org" in each. Caption: A tool can locate the Chameleon component by using the id attribute.]

Best Practice

Above we explored the design space for this issue. We looked at the three design approaches in action, both schemas and instance documents. So which design is better? Under what circumstances?

When you are reusing schemas that someone else created you should <import> those schemas, i.e., use the Heterogeneous Namespace design. It is a bad idea to copy those components into your namespace, for two reasons: (1) soon your local copies would get out of sync with the other schemas, and (2) you lose interoperability with any existing applications that process the other schemas' components.

The interesting case (the case we have been considering throughout this discussion) is how to deal with namespaces in a collection of schemas that you created.
Here are our guidelines for this case:

Use the Chameleon Design:
- with schemas which contain components that have no inherent semantics by themselves,
- with schemas which contain components that have semantics only in the context of an <include>ing schema,
- when you don't want to hardcode a namespace to a schema; rather, you want <include>ing schemas to be able to provide their own application-specific namespace to the schema.

Example. A repository of components - such as a schema which defines an array type, or vector, linked list, etc. - should be declared with no targetNamespace (i.e., Chameleon).

As a rule of thumb, if your schema just contains type definitions (no element declarations) then that schema is probably a good candidate for being a Chameleon schema.

Use the Homogeneous Namespace Design:
- when all of your schemas are conceptually related,
- when there is no need to visually identify in instance documents the origin/lineage of each element/attribute. In this design all components come from the same namespace, so you lose the ability to identify in instance documents that element A comes from schema X. Oftentimes that's okay - you don't want to categorize elements/attributes differently. This design approach is well suited for those situations.

Use the Heterogeneous Namespace Design:
- when there are multiple elements with the same name (to avoid name collisions),
- when there is a need to visually identify in instance documents the origin/lineage of each element/attribute. In this design the components come from different namespaces, so you have the ability to identify in instance documents that element A comes from schema X.

Lastly, as we have seen, in a schema each component can be uniquely identified with an id attribute. (This is NOT the same as providing an id attribute on an element in instance documents. We are talking here about a schema-internal way of identifying each schema component.) Consider identifying each schema component using the id attribute.
This will enable a finer degree of traceability than is possible using namespaces. The combination of namespaces plus the schema id attribute is a powerful tandem for visually and programmatically identifying components.
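As a minimal sketch of the id guideline, the Chameleon schema below attaches an id to every component it defines. The component names, the SSNType pattern, and the dotted id naming scheme are all hypothetical choices for this illustration. The only hard constraints come from the id attribute's xsd:ID type: each value must be an NCName (so no colons or slashes) and must be unique within the schema document.

```xml
<?xml version="1.0"?>
<!-- Hypothetical Chameleon schema: no targetNamespace, every component has an id. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
    <!-- The id stays the same whichever namespace an including schema assigns. -->
    <xsd:complexType name="ProductType" id="company.product.ProductType">
        <xsd:sequence>
            <xsd:element name="Type" type="xsd:string" id="company.product.Type"/>
        </xsd:sequence>
    </xsd:complexType>
    <!-- Simple types can carry an id as well. -->
    <xsd:simpleType name="SSNType" id="company.person.SSNType">
        <xsd:restriction base="xsd:string">
            <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
        </xsd:restriction>
    </xsd:simpleType>
</xsd:schema>
```

A tool that loads this schema document can then find a component with an XPath expression such as //*[@id='company.product.Type'], without needing to know which targetNamespace the component ends up in.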