
XML

Generating XML Data


Creating the File

Using a standard text editor, create a file called slideSample.xml.

Writing the Declaration


<?xml version='1.0' encoding='utf-8'?>

Adding a Comment
<!-- A SAMPLE set of slides -->

Defining the Root Element


After the declaration, every XML file defines exactly one element, known as the root element. Any other
elements in the file are contained within that element. Enter the text highlighted below to define the root
element for this file, slideshow:

<?xml version='1.0' encoding='utf-8'?>

<!-- A SAMPLE set of slides -->

<slideshow>

</slideshow>
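
The single-root rule can be checked with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module, which this tutorial does not otherwise use, and feeds it inline strings instead of slideSample.xml:

```python
import xml.etree.ElementTree as ET

# One root element: the document parses.
root = ET.fromstring("<slideshow></slideshow>")
print(root.tag)

# Two top-level elements: the parser rejects the document.
try:
    ET.fromstring("<slideshow></slideshow><slideshow></slideshow>")
    second_root_accepted = True
except ET.ParseError as err:
    second_root_accepted = False
    print("rejected:", err)
```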

Adding Attributes to an Element


<slideshow
title="Sample Slide Show"
date="Date of publication"
author="Yours Truly"
>
</slideshow>

When you create a name for a tag or an attribute, you can use
hyphens ("-"), underscores ("_"), colons (":"), and periods (".") in
addition to letters and digits, although a name cannot begin with a
hyphen, a period, or a digit. Unlike HTML, values for XML
attributes are always in quotation marks, and multiple attributes are
never separated by commas.
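
To see how a program receives these attributes, you can load the element with a parser. This is a minimal sketch that assumes Python's standard xml.etree.ElementTree module; the inline string is a stand-in for slideSample.xml:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for slideSample.xml (assumed to match the listing above).
SLIDESHOW = """<?xml version='1.0' encoding='utf-8'?>
<!-- A SAMPLE set of slides -->
<slideshow
    title="Sample Slide Show"
    date="Date of publication"
    author="Yours Truly"
    >
</slideshow>"""

# Passing bytes lets the parser apply the encoding named in the declaration.
root = ET.fromstring(SLIDESHOW.encode("utf-8"))

print(root.tag)              # name of the root element
print(root.attrib["title"])  # the value arrives without its quotation marks
print(sorted(root.attrib))   # the three attribute names
```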

Adding Nested Elements


<slideshow
...
>

<!-- TITLE SLIDE -->


<slide type="all">
<title>Wake up to WonderWidgets!</title>
</slide>
</slideshow>

Adding an Empty Element


<item>Why <em>WonderWidgets</em> are great</item>
<item/>
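
An empty element written as <item/> is equivalent to <item></item>. The following sketch, which assumes Python's xml.etree.ElementTree module, parses both spellings and compares what the application sees:

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<item/>")
long_form = ET.fromstring("<item></item>")

# Both spellings yield an element with no text and no child elements.
for elem in (short_form, long_form):
    print(elem.tag, elem.text, len(elem))
```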

Writing Processing Instructions


As you saw in Processing Instructions, the format for a processing instruction is
<?target data?>, where "target" is the target application that is expected to do the
processing, and "data" is the instruction or information for it to process.
<!-- PROCESSING INSTRUCTION -->
<?my.presentation.Program QUERY="exec, tech, all"?>

Notes:

• The "data" portion of the processing instruction can contain spaces, or may even be empty. But there
cannot be any space between the initial <? and the target identifier.
• The data begins after the first space.
• Fully qualifying the target with the complete Web-unique package prefix makes sense, so as to
preclude any conflict with other programs that might process the same data.
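
The split between target and data can be observed with a DOM parser. This sketch assumes Python's standard xml.dom.minidom module and wraps the processing instruction in a bare slideshow element so that the string is a complete document:

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<slideshow><?my.presentation.Program QUERY="exec, tech, all"?></slideshow>'
)

# The processing instruction is the first child node of the root element.
pi = doc.documentElement.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE

print(pi.target)  # everything up to the first space
print(pi.data)    # everything after it, passed through uninterpreted
```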

Introducing an Error
The parser can generate one of three kinds of errors: fatal error, error, and warning.
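
In the XML specification, a document that is not well formed produces the fatal kind of error. The sketch below provokes one by leaving out a closing tag. It assumes Python's xml.etree.ElementTree module, a nonvalidating parser that reports this kind of problem as a ParseError exception:

```python
import xml.etree.ElementTree as ET

# The </slide> end tag is missing, so the document is not well formed.
BROKEN = "<slideshow><slide><title>Overview</title></slideshow>"

try:
    ET.fromstring(BROKEN)
    outcome = "parsed"
except ET.ParseError as err:
    outcome = "fatal error"
    # err.position is a (line, column) pair locating the problem.
    print("not well formed:", err, "at", err.position)
```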

Substituting and Inserting Text


In this section, you'll learn about:

• Handling Special Characters ("<", "&", and so on)


• Handling Text with XML-style syntax

Handling Special Characters

In XML, an entity is an XML structure (or plain text) that has a name. Referencing the entity by name causes
it to be inserted into the document in place of the entity reference. To create an entity reference, the entity
name is surrounded by an ampersand and a semicolon, like this:

&entityName;

Predefined Entities
An entity reference like &amp; contains a name (in this case, "amp") between the start and end delimiters.
The text it refers to (&) is substituted for the name, like a macro in a programming language.

Table 2-1 Predefined Entities

Character    Reference
&            &amp;
<            &lt;
>            &gt;
"            &quot;
'            &apos;
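
When text is generated by a program, the substitutions in the table are usually applied by a helper rather than by hand. This sketch assumes Python's standard xml.sax.saxutils module; the sample string is made up for illustration:

```python
from xml.sax.saxutils import escape, unescape

# escape() replaces &, <, and > by default; the quote characters are opt-in.
QUOTES = {'"': "&quot;", "'": "&apos;"}

raw = 'Market Size < predicted & "approved"'  # hypothetical sample text
escaped = escape(raw, QUOTES)
print(escaped)

# unescape() takes the reverse mapping for the two quote entities.
restored = unescape(escaped, {ref: char for char, ref in QUOTES.items()})
print(restored)
```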

Character References

A character reference like &#8220; contains a hash mark (#) followed by a number. The number is the
Unicode value for a single character, such as 65 for the letter "A", 8220 for the left-curly quote, or 8221 for the
right-curly quote. A hexadecimal value can be given by putting an x after the hash mark, as in &#x41; for "A".
In this case, the "name" of the entity is the hash mark followed by the digits that identify the
character.
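
The parser replaces each character reference with the character it names. The sketch below assumes Python's xml.etree.ElementTree module and uses 65 (hexadecimal 41) for "A" and 8220 and 8221, the Unicode values U+201C and U+201D, for the curly double quotes:

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to "A", then a pair of curly quotes.
elem = ET.fromstring("<item>&#65; &#x41; &#8220;quoted&#8221;</item>")

print(elem.text)
print([hex(ord(ch)) for ch in elem.text if ord(ch) > 127])
```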

Using an Entity Reference in an XML Document

Suppose you wanted to insert a line like this in your XML document:

Market Size < predicted

Add the text highlighted below to your slideSample.xml file, and save a copy of it for future use as
slideSample03.xml:

<!-- OVERVIEW -->


<slide type="all">
<title>Overview</title>
...
</slide>
<slide type="exec">
<title>Financial Forecast</title>
<item>Market Size &lt; predicted</item>
<item>Anticipated Penetration</item>
<item>Expected Revenues</item>
<item>Profit Margin </item>
</slide>
</slideshow>
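
Once the file is parsed, the application sees the character itself, not the reference. This sketch assumes Python's xml.etree.ElementTree module and uses a shortened copy of the exec slide:

```python
import xml.etree.ElementTree as ET

SLIDE = """<slide type="exec">
  <title>Financial Forecast</title>
  <item>Market Size &lt; predicted</item>
  <item>Anticipated Penetration</item>
</slide>"""

slide = ET.fromstring(SLIDE)
first_item = slide.find("item")

# The parsed text holds a real "<" character.
print(first_item.text)

# Writing the element back out escapes the character again.
serialized = ET.tostring(first_item, encoding="unicode").strip()
print(serialized)
```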

Handling Text with XML-Style Syntax

When you are handling large blocks of XML or HTML that include many of the special characters, it
would be inconvenient to replace each of them with the appropriate entity reference. For those situations,
you can use a CDATA section.

A CDATA section works like <pre>...</pre> in HTML, only more so: all whitespace in a
CDATA section is significant, and characters in it are not interpreted as XML. A CDATA section starts with
<![CDATA[ and ends with ]]>, so the text inside it cannot contain the sequence ]]>.

<slide type="tech">
<title>How it Works</title>
<item>First we fozzle the frobmorten</item>
<item>Then we framboze the staten</item>
<item>Finally, we frenzle the fuznaten</item>
<item><![CDATA[Diagram:
    frobmorten <--------------- fuznaten
        |          <3>          ^
        | <1>                   |   <1> = fozzle
        V                       |   <2> = framboze
      Staten--------------------+   <3> = frenzle
                   <2>
]]></item>
</slide>
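
A parser hands the contents of a CDATA section to the application as ordinary text, with the angle brackets and spacing intact. This sketch assumes Python's xml.etree.ElementTree module and uses a one-line excerpt of the diagram:

```python
import xml.etree.ElementTree as ET

ITEM = """<item><![CDATA[Diagram:
        | <1>        |   <1> = fozzle
]]></item>"""

item = ET.fromstring(ITEM)

# "<1>" is delivered as text; no child element is created for it.
print(repr(item.text))
print(len(item))
```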

Creating a Document Type Definition

After the XML declaration, the document prolog can include a DTD, which lets
you specify the kinds of tags that can be included in your XML document. In addition to
telling a validating parser which tags are valid, and in what arrangements, a DTD tells
both validating and nonvalidating parsers where text is expected, which lets the parser
determine whether the whitespace it sees is significant or ignorable.

Basic DTD Definitions

To begin learning about DTD definitions, let's start by telling the parser where text is expected and where
any text (other than whitespace) would be an error. (Whitespace in such locations is ignorable.)

<!-- DTD for a simple "slide show". -->


<!ELEMENT slideshow (slide+)>

As you can see, the DTD tag starts with <! followed by the tag name (ELEMENT). After the tag name
comes the name of the element that is being defined (slideshow) and, in parentheses, one or more items
that indicate the valid contents for that element. In this case, the notation says that a slideshow consists
of one or more slide elements.

Table 2-2 DTD Element Qualifiers

Qualifier    Name             Meaning
?            Question Mark    Optional (zero or one)
*            Asterisk         Zero or more
+            Plus Sign        One or more

You can include multiple elements inside the parentheses in a comma-separated list, and use a qualifier on
each element to indicate how many instances of that element may occur. The comma-separated list tells
which elements are valid and the order they can occur in.

You can also nest parentheses to group multiple items. For example, after defining an image element
(coming up shortly), you could declare that every image element must be paired with a title element in
a slide by specifying ((image, title)+). Here, the plus sign applies to the image/title pair to
indicate that one or more pairs of the specified items can occur.
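
The qualifiers behave much like the quantifiers in a regular expression. The sketch below is only an analogy, not a DTD validator (Python's standard library does not include one). It translates three content models into regular expressions by hand and matches them against an element's sequence of child tags. The helper children_match and the translated patterns are assumptions made for this illustration:

```python
import re
import xml.etree.ElementTree as ET

def children_match(element, model_regex):
    """Match the element's child-tag sequence against a hand-built regex."""
    # A comma follows every tag so that one tag name cannot match
    # the beginning of a longer one.
    sequence = "".join(child.tag + "," for child in element)
    return re.fullmatch(model_regex, sequence) is not None

# Hand translations of content models (only ?, *, + and comma lists).
SLIDESHOW_MODEL = r"(slide,)+"     # (slide+)
SLIDE_MODEL = r"title,(item,)*"    # (title, item*)
PAIRS_MODEL = r"(image,title,)+"   # ((image, title)+)

show = ET.fromstring("<slideshow><slide><title/><item/><item/></slide></slideshow>")
wrong_order = ET.fromstring("<slide><item/><title/></slide>")
pairs = ET.fromstring("<slide><image/><title/><image/><title/></slide>")

print(children_match(show, SLIDESHOW_MODEL))    # one slide satisfies slide+
print(children_match(show[0], SLIDE_MODEL))     # title, then two items
print(children_match(wrong_order, SLIDE_MODEL)) # item before title breaks the order
print(children_match(pairs, PAIRS_MODEL))       # two image/title pairs
```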

Defining Text and Nested Elements

Now that you have told the parser something about where not to expect text, let's see how to tell it where
text can occur. Add the text highlighted below to define the slide, title, and item elements:

<!ELEMENT slideshow (slide+)>


<!ELEMENT slide (title, item*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT item (#PCDATA | item)* >
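
The last definition allows text and nested item elements to be mixed freely. To see what such mixed content looks like to a program, the sketch below parses one item that contains text, a nested item, and more text. It assumes Python's xml.etree.ElementTree module, which does not read the DTD; the sample sentence is adapted from the earlier slide:

```python
import xml.etree.ElementTree as ET

item = ET.fromstring("<item>Why <item>WonderWidgets</item> are great</item>")
nested = item[0]

# ElementTree keeps text that precedes the first child in .text
# and text that follows a child in that child's .tail.
print(repr(item.text))
print(repr(nested.text))
print(repr(nested.tail))
```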

Limitations of DTDs

It would be nice if we could specify that an item contains either text, or text followed by one or more list
items. But that kind of specification turns out to be hard to achieve in a DTD. For example, you might be
tempted to define an item like this:

<!ELEMENT item (#PCDATA | (#PCDATA, item+)) >

That would certainly be accurate, but as soon as the parser sees #PCDATA and the vertical bar, it requires
the remaining definition to conform to the mixed-content model. This specification doesn't, so you get an
error that says: Illegal mixed content model for 'item'. Found &#x28; ..., where
the hex character 28 is the left parenthesis that opens the nested group.

Trying to double-define the item element doesn't work, either. A specification like this:

<!ELEMENT item (#PCDATA) >


<!ELEMENT item (#PCDATA, item+) >

produces a "duplicate definition" warning when the validating parser runs. The second definition is, in fact,
ignored. So it seems that defining a mixed content model (which allows item elements to be interspersed
in text) is about as good as we can do.

Special Element Values in the DTD

Rather than specifying a parenthesized list of elements, the element definition could use one of two special
values: ANY or EMPTY. The ANY specification says that the element may contain any other defined
element, or PCDATA. Such a specification is usually used for the root element of a general-purpose XML
document such as you might create with a word processor. Textual elements could occur in any order in
such a document, so specifying ANY makes sense.

The EMPTY specification says that the element contains no contents. So the DTD for e-mail messages that
let you "flag" the message with <flag/> might have a line like this in the DTD:

<!ELEMENT flag EMPTY>
