
Learning XML

While SGML has a long history of use in specific user communities (humanities, government
documentation), its use as XML "on the Web" is still relatively rare. Support for integrated authoring
and browsing à la HTML is simply not there yet. IE 4.0 does contain hooks for XML viewing using
plug-ins, and both Netscape and Microsoft plan much greater support in later browser versions. For now
though, if you want to write and view XML documents, the available solutions might best be
described as "clunky" [1]. In this lab, you'll get to try one such solution and experience that
"clunkiness" firsthand.

Assumptions
I am assuming that you all know HTML and are literate users of Windows machines and associated
applications. In particular, I expect that you know how to use Notepad/Wordpad, Netscape/IE, and
perform standard desktop manipulations like copying files and folders.

Goals
The goals of this exercise are for you to
• Practice writing XML,
• Use a non-validating XML parser to verify that the XML code you write is well-formed,
• Gain experience generating HTML using an XML source file and a provided style sheet,
• Experience the separation of structure from formatting, and
• Have fun!

Preliminaries
We must do some set-up first. The course folder is called XML on the G drive of your machine
(G:\XML). Your home folder will be a folder called "XML" on the C Drive inside the TEMP folder
(C:\TEMP\XML).
➢ Go to the TEMP folder on your C drive. If the XML folder exists, delete it (put it in the Recycle
Bin).
➢ Create a new folder called XML. This is your home folder -- C:\TEMP\XML
➢ Using Windows Explorer, go to the course folder and copy the "examples" folder to your home
folder.
➢ Open a DOS command window.
➢ In the DOS window, go to the folder you just copied:

C:
cd \TEMP\XML\examples

➢ List the contents of this folder, so you can see what files are there (of course, you can do this
from the Windows desktop as well):

DIR

[1] To some degree, "clunky" makes sense. XML alone is about structure, not formatting/display.

Learning XML Copyright Charles L. Viles, 1998, All rights reserved.


Using an XML parser
The XML parser we will use today is called Lark. Recall that an XML parser reads XML source
code and decides whether or not it is well-formed. Validating parsers also check the structure of a
given XML file against a particular DTD to see whether it is valid. By itself, Lark is a non-validating
parser -- it will only tell you whether the XML is well-formed or not.
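
If you are curious what a well-formedness check looks like outside of Lark, here is a minimal sketch in Python. It is not part of the lab tool set: it assumes Python's standard xml.etree.ElementTree module, which (like Lark) is non-validating. The function name check_well_formed and the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser stops at the first problem it finds.
        return False, str(err)
    return True, None


# Hypothetical samples: the second one never closes <title>.
good = "<course><title>Learning XML</title></course>"
bad = "<course><title>Learning XML</course>"

print(check_well_formed(good))    # (True, None)
print(check_well_formed(bad)[0])  # False
```

Note that nothing here looks at a DTD. The parser only reports whether the tags nest and close properly, which is the same question Lark answers.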

Checking a well-formed, existing file: course.xml


You have copied over several existing XML files in the examples folder. Take one of these,
"course.xml", and use Lark to check the XML for "well-formed-ness".
You have to do this in the MS-DOS window as follows:

jview G:\xml\lark\driver course.xml

If you really want to know what's going on with the above command, ask the instructor. Anyway,
since this example is already "well-formed", you should get output something like the following:
Hello Tim
Lark V1.0 final beta Copyright (c) 1997-98 Tim Bray.
All rights reserved; the right to use these class files for any purpose
is hereby granted to everyone.
Parsing...
Done.

Translation: Lark says the file parses cleanly. No error messages are given, so it's well-formed!

Checking a "messed-up" file, cd.xml


Now check the file cd.xml for well-formedness - you should get some error messages and output
something like:
Hello Tim
Lark V1.0 final beta Copyright (c) 1997-98 Tim Bray.
All rights reserved; the right to use these class files for any purpose
is hereby granted to everyone.
Parsing...
Lark:/export/home/viles/xml/cd.xml:4:12:E:Fatal: Encountered </para> expected </em>
...assumed </em>
Lark:/export/home/viles/xml/cd.xml:19:11:E:Fatal: Encountered </document> expected </para>
...assumed </para>
...assumed </em>
Done.

Lark has found at least two errors, though there may be more.
Recall the "rules" of well-formedness. Minimally, your XML markup should
• respect case-sensitivity
• have ending tags for all starting tags
• have no overlapping elements
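
To see each of these rules fail in isolation, the following sketch feeds one tiny broken document per rule to a parser. It assumes Python's standard xml.etree.ElementTree module rather than Lark, and the sample strings are invented for illustration (they are not taken from cd.xml).

```python
import xml.etree.ElementTree as ET

# One hypothetical sample per rule, plus a correct version for comparison.
samples = {
    "case mismatch": "<para>Hello</Para>",
    "missing end tag": "<document><para>Hello</document>",
    "overlapping elements": "<para><em>Hello</para></em>",
    "well-formed": "<document><para><em>Hello</em></para></document>",
}


def is_well_formed(text):
    """True if the parser accepts the text, False on any parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'NOT well-formed'}")
```

Only the last sample should be reported as ok. The first three each break exactly one of the rules listed above.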

➢ Using Notepad or some other editor of your choosing, fix cd.xml so that it is well-formed. You
will have to repeatedly use Lark as shown above in order to check the XML.

Generating Displayable Content from XML
Well, it ain't easy, because the tool support is not there yet. Conceptually, we want to take the
structural markup in the XML code and combine it with formatting instructions in a "style sheet" to
produce a displayable product. Ideally, the web browser would handle this transparently, but right now
there is little support for viewing XML in browsers.
Of course, rendering the XML in a prettified manner is only one of many things that you might want to
do with that data. The process we will use to get prettified XML is to take three items:
1. The XML document,
2. A supplied stylesheet (written in XSL - eXtensible Style Language), and
3. A conversion program, msxsl,
and use these to generate an HTML file that is palatable to browsers. The conversion program,
msxsl, takes the XML document and the stylesheet and produces the HTML file. For the course.xml
file, you would do this as follows (as always, in the MS-DOS window)

G:\xml\msxsl\msxsl -i course.xml -s course.xsl -o course.html


where the options to the program specify the XML file, the XSL file, and the HTML file respectively.
Ain't this clunky?
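
Conceptually, the conversion step looks something like the sketch below. This is not how msxsl works and it is not XSL syntax. It is a toy stand-in written in Python (standard library only) in which the "style sheet" is just a table that says which HTML tag to use for each XML element. The element names and the STYLE table are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical "style sheet": XML element name -> HTML tag.
STYLE = {"course": "body", "title": "h1", "para": "p", "em": "i"}


def render(elem, style):
    """Recursively turn an XML element into an HTML string."""
    tag = style.get(elem.tag, "span")  # unknown elements fall back to <span>
    parts = [f"<{tag}>", escape(elem.text or "")]
    for child in elem:
        parts.append(render(child, style))
        # Text that follows the child inside this element.
        parts.append(escape(child.tail or ""))
    parts.append(f"</{tag}>")
    return "".join(parts)


source = (
    "<course><title>Learning XML</title>"
    "<para>XML is about <em>structure</em>.</para></course>"
)
page = render(ET.fromstring(source), STYLE)
print(page)
# <body><h1>Learning XML</h1><p>XML is about <i>structure</i>.</p></body>
```

The XML says what each piece of text is. The table decides how it should look. That division of labor is the point of the exercise.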
The choices for a style sheet syntax have still not been worked out completely by the marketplace or
standards organizations. There is current support for "Cascading Style Sheets" (CSS) in both Netscape
and IE, though Microsoft is pushing strongly for the adoption of Extensible Style Language (XSL) as a
standard. Though agreement on the form and substance of XSL is far from settled, we will use XSL
formatting rules in this lab because they fit well with our working tool set.
We have provided separate style sheets for each XML document here, though in practice it is likely
that a single style sheet will be applied to many documents, not just a single one.
➢ Go ahead and run the msxsl command above for course.xml and a similar one for cd.xml.
If all goes well, you should be able to load the resulting HTML file into your web browser for display.
➢ Load the HTML files you have generated with msxsl into your web browser. Take some time
to compare the HTML files with the provided XML files, noting the big difference in the kinds
of markup in each.

Writing your own XML


Now you should be ready to write some XML from scratch - almost. If we were making pizza, then
you now know how to order out. Now we'll get the Chef Boyardee package from Harris Teeter. The
hard part, designing a DTD (making pizza dough from scratch), requires considerably more time than
we have here.
Enough with that pizza metaphor. Now you can use the informal DTD we worked up in the in-class
session to write your own XML document. The particular document of interest is a recent story about
Microsoft from the Washington Post, and it's located at

C:\TEMP\XML\examples\microsoft.xml

Your task is to add well-formed XML markup to this file. Although the file has an .xml extension,
there is no markup in it. Use Notepad, Wordpad, Homesite, or some other text editor to add the
markup. Remember the "well-formedness" rules we talked about in class.
When you think you are done, use the Lark parser to check it.

jview G:\xml\lark\driver microsoft.xml
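
One thing that can trip up hand-written markup of ordinary text: characters such as & and < have special meaning in XML and must be written as &amp; and &lt; in the text itself. The sketch below wraps a few lines of made-up text in markup and checks the result. It assumes Python's standard library rather than Lark, and the element names story, headline, and para are placeholders, not the informal DTD from the in-class session.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Made-up plain text standing in for an unmarked story file.
plain = """Example headline

First paragraph of the story, with an ampersand: AT&T.

Second paragraph."""

# Split on blank lines: first block is the headline, the rest are paragraphs.
blocks = [b.strip() for b in plain.split("\n\n") if b.strip()]
lines = ["<story>", f"  <headline>{escape(blocks[0])}</headline>"]
lines += [f"  <para>{escape(b)}</para>" for b in blocks[1:]]
lines.append("</story>")
marked_up = "\n".join(lines)

# Parsing succeeds only if the markup is well-formed.
root = ET.fromstring(marked_up)
print(marked_up)
```

Without the escape step, the bare & in the first paragraph would make the parser reject the document even though every tag is properly closed.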


Once you have well-formed XML, generate the HTML using the supplied stylesheet found in
microsoft.xsl. The command to do this looks something like

G:\xml\msxsl\msxsl -i microsoft.xml -s microsoft.xsl -o microsoft.html

If successful, the file microsoft.html will have been created. Go ahead and load this file in your
web browser to see what the combination of your markup and the supplied style sheet has yielded.

If you are feeling lucky ...


Try altering any of the supplied style sheets in order to alter the appearance of the document. For
example, consider making paragraphs in a large font and the title in a small font (just to be crazy, eh?).
Start with the cd.xsl stylesheet, as that one is the most straightforward. Note that all of the supplied
stylesheets are very elementary. XSL is far more powerful and flexible than what you have seen here.
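
The idea behind this experiment can be sketched without XSL at all. In the toy Python example below (standard library only), the "style sheet" is a table of font sizes per element. Swapping the table changes the output while the XML source stays untouched. The element names, the sample text, and the size tables are all invented for illustration and do not reflect what is in cd.xml or cd.xsl.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document; it is never modified below.
source = "<cd><title>Sample Title</title><para>Sample paragraph.</para></cd>"


def render(xml_text, sizes):
    """Render each child element as a <p> with a font size taken from `sizes`."""
    root = ET.fromstring(xml_text)
    out = []
    for child in root:
        size = sizes.get(child.tag, "medium")
        out.append(f'<p style="font-size: {size}">{escape(child.text or "")}</p>')
    return "\n".join(out)


# Two hypothetical "style sheets" for the same document.
normal = {"title": "x-large", "para": "medium"}
crazy = {"title": "small", "para": "x-large"}

print(render(source, normal))
print(render(source, crazy))
```

Same XML in, two different looks out. Editing one of the supplied .xsl files does the same thing at a larger scale.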

Further Reading
Books
The following books have been helpful in the preparation of this material.
1. "XML: A Primer" by Simon St. Laurent, MIS Press, 1998. Available from Amazon Books.
2. "The XML Handbook" by Charles F. Goldfarb and Paul Prescod, Prentice Hall, 1998. Available from
Amazon Books.
3. "XML: Extensible Markup Language" by Elliotte Rusty Harold, IDG Books Worldwide, Inc.,
1998. Available from Amazon Books.

Online
This course's XML page: http://ils.unc.edu/viles/xml/
Microsoft's XML page: http://www.microsoft.com/xml/default.asp
World Wide Web Consortium's XML page: http://www.w3.org/XML/
Junglee's XML Reference List: http://www.junglee.com/tech/xml_sparchive.html
Oasis.org's XML Resource, maintained by Robin Cover: http://www.oasis-open.org/cover/xml.html
