Libraries
Roy Tennant
eScholarship
California Digital Library
escholarship.cdlib.org
Introduction
• Goal: introduce you to XML, explain
what it can do in general terms, and
highlight particular uses
• Caveat: you will not learn enough to do
it without further study
04:24 AM 04:24 AM
Outline
• Introduction to XML
• Serving XML to the Web
• Case Studies
• Tips & Advice
• Resources
Introduction to XML
• Extensible Markup Language
• A method of creating and using tags to
identify the structure and contents of a
document — not how it should be
displayed
• The tags used can be arbitrary or can
come from a specification
What it Looks Like
<?xml version="1.0"?>
<book>
  <author>
    <lastname>Tennant</lastname>
    <firstname>Roy</firstname>
  </author>
  <title>The Great American Novel</title>
  <chapter number="1">
    <chaptitle>It Was Dark and Stormy</chaptitle>
    <p>“I’m scared,” I said.</p>
  </chapter>
</book>
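Because the tags identify structure rather than display, software can pull individual pieces out of such a document. A minimal sketch, assuming Python's standard-library ElementTree parser; the function name `describe_book` is hypothetical and used only for illustration:

```python
import xml.etree.ElementTree as ET

# The sample document from the slide (straight quotes used in the text).
BOOK_XML = """<?xml version="1.0"?>
<book>
  <author>
    <lastname>Tennant</lastname>
    <firstname>Roy</firstname>
  </author>
  <title>The Great American Novel</title>
  <chapter number="1">
    <chaptitle>It Was Dark and Stormy</chaptitle>
    <p>"I'm scared," I said.</p>
  </chapter>
</book>"""


def describe_book(xml_text):
    """Return the author, title, and chapter list found in a <book> document."""
    book = ET.fromstring(xml_text)
    author = "%s %s" % (
        book.findtext("author/firstname", default=""),
        book.findtext("author/lastname", default=""),
    )
    # Each chapter carries its number as an attribute and its title as a child.
    chapters = [
        (chapter.get("number"), chapter.findtext("chaptitle", default=""))
        for chapter in book.findall("chapter")
    ]
    return {
        "author": author.strip(),
        "title": book.findtext("title", default=""),
        "chapters": chapters,
    }


if __name__ == "__main__":
    print(describe_book(BOOK_XML))
```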
Two Types of XML
• Well-Formed
• Valid
Well-Formed XML
• Follows general tagging rules:
– All tags begin and end
• But can be minimized if empty: <br/> instead of <br></br>
– Tag names are case-sensitive, so opening and closing tags must match exactly (XHTML also requires lowercase)
– All tags are properly nested:
• <author> <firstname>Mark</firstname>
<lastname>Twain</lastname> </author>
– All attribute values are quoted:
• <subject scheme="LCSH">Music</subject>
• Has an XML declaration, such as <?xml version="1.0"?>, and optionally a document type declaration
• Software can make sure a
document follows these rules
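The software check mentioned above can be as simple as handing the text to a parser and seeing whether it objects. A minimal sketch, assuming Python's standard-library ElementTree parser; `is_well_formed` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        "<author><firstname>Mark</firstname><lastname>Twain</lastname></author>",
        "<br/>",                                          # minimized empty tag
        "<author><firstname>Mark</author></firstname>",   # improperly nested
        "<subject scheme=LCSH>Music</subject>",           # unquoted attribute
        "<Author>Twain</author>",                         # case mismatch
    ]
    for sample in samples:
        print(is_well_formed(sample), sample)
```

This only tests well-formedness. It says nothing about whether the tags are the right ones for a given DTD or schema.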
Valid XML
• Uses only specific tags and rules as codified
by one of:
– A document type definition (DTD)
– A schema definition
• Only the tags listed by the schema or DTD
can be used
• Software can take a DTD or schema and
verify that a document adheres to the rules
• Editing software can prevent an author
from using anything except allowed tags
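To illustrate the "only the listed tags" idea, here is a deliberately simplified sketch in Python. It is not a DTD or schema validator: real validation also checks nesting, ordering, and attributes, and needs a validating parser. The allowed tag set and the name `find_disallowed_tags` are assumptions made up for this example:

```python
import xml.etree.ElementTree as ET

# Hypothetical tag list standing in for what a DTD or schema would declare.
ALLOWED_TAGS = {"book", "author", "lastname", "firstname", "title",
                "chapter", "chaptitle", "p"}


def find_disallowed_tags(xml_text, allowed=ALLOWED_TAGS):
    """Return a sorted list of tag names in the document that are not allowed."""
    root = ET.fromstring(xml_text)
    used = {element.tag for element in root.iter()}
    return sorted(used - set(allowed))


if __name__ == "__main__":
    doc = "<book><title>Untitled</title><blink>Hello</blink></book>"
    print(find_disallowed_tags(doc))
```

An empty result means every tag in the document appears on the list; anything returned is a tag an editor enforcing the rules would have refused.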
Ways to Use XML
• Behind the scenes as a standard and
easily transformed format for
information
• As a transfer syntax, to exchange
information in a machine-parseable
form
• As a method of delivery direct to the
user (not recommended)
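The transfer-syntax use can be sketched as a round trip: one program writes a record as XML and another reads it back. This is a minimal illustration using Python's standard library; the `record` element and the field names are hypothetical, not taken from any particular exchange format:

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Serialize a flat dict of text fields as a <record> element."""
    root = ET.Element("record")
    for name, value in record.items():
        child = ET.SubElement(root, name)
        child.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text):
    """Parse a <record> element back into a dict of text fields."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    sent = {"title": "The Great American Novel", "author": "Roy Tennant"}
    wire = record_to_xml(sent)
    print(wire)
    print(xml_to_record(wire) == sent)
```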
Why is XML Important?
• It is a standard, easily extensible way to
encode loosely structured as well as highly
structured information
• Due to its easy parseability, software can
transform it in countless ways, thereby
allowing:
– Easy migration paths
– Alternative displays
– On-the-fly response to user needs
XML vs. Databases
(a simplistic formula)
Transforming XML: XSLT
• Extensible Stylesheet Language Transformations
(XSLT)
• A markup language and programming syntax
for processing XML
• Is most often used to:
– Transform XML to HTML for delivery to standard
web clients
– Transform XML from one set of XML tags to
another
– Transform XML into another syntax/system
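To make the XML-to-HTML case concrete, the sketch below performs that kind of transformation by hand in Python. It is a stand-in for illustration, not XSLT: a real deployment would express the same mapping as stylesheet templates and run it through an XSLT processor. The function name `book_to_html` and the choice of HTML tags are assumptions:

```python
import xml.etree.ElementTree as ET
from html import escape

SAMPLE = """<book>
  <title>The Great American Novel</title>
  <chapter number="1">
    <chaptitle>It Was Dark and Stormy</chaptitle>
    <p>"I'm scared," I said.</p>
  </chapter>
</book>"""


def book_to_html(xml_text):
    """Map structural <book> markup onto HTML headings and paragraphs."""
    book = ET.fromstring(xml_text)
    parts = ["<html><body>"]
    parts.append("<h1>%s</h1>" % escape(book.findtext("title", default="")))
    for chapter in book.findall("chapter"):
        heading = "Chapter %s: %s" % (
            chapter.get("number", ""),
            chapter.findtext("chaptitle", default=""),
        )
        parts.append("<h2>%s</h2>" % escape(heading))
        for paragraph in chapter.findall("p"):
            parts.append("<p>%s</p>" % escape(paragraph.text or ""))
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(book_to_html(SAMPLE))
```

The source document never says how a chapter title should look; that decision lives entirely in the transformation, which is why the same XML can feed several different displays.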
Required Components for
Serving XML to the Web
• An XML-encoded “document”
• An XSLT stylesheet to…
• …transform it to HTML or XHTML:
– Static
– Dynamic
• A CSS stylesheet (optional)
XML Web Publishing Software
• Required to:
– Apply dynamic transformations to XML
content
– Render HTML dynamically for standard
web browsers
• Just beginning to be available:
– Cocoon: http://xml.apache.org/cocoon/
– AxKit: http://axkit.org/
Case Study: Publishing Books @ the
California Digital Library
• Goals:
– To create highly usable online versions of
books
– To create versions that will migrate easily
as technology changes
– To create an infrastructure that will
support dynamic presentations of the
same content
Case Study: Publishing Books @ the
California Digital Library
• Strategy:
– Mark up the texts in XML
– Serve them dynamically using XML web
publishing software (currently Cocoon)
– Create different displays for different purposes,
and a mechanism for allowing the user to select
their preferred view
– Find and apply an XML-aware search engine
– Create a method by which users can create their
own Adobe Acrobat versions
Architecture diagrams (images not preserved)
• AxKit runs under mod_perl in the web server; Cocoon runs under Tomcat in the web server
• The browser asks the web server for an XML document (“I want this XML doc…”)
• Cocoon applies the XSLT stylesheet (transformation) to the XML document (information)
• The result is an XHTML document with no display markup*
• A CSS stylesheet (presentation) styles the HTML delivered by the web server
* Dynamic document
Case Study: ILL ASAP
Workflow diagram (image not preserved). Labeled components: ILL ASAP, local catalog, OCLC, downloaded requests, XML file, XSL stylesheet, printable XHTML file, Internet Explorer
Service Tasmania Architecture
Case Study: Univ. of Michigan
Tips and Advice
• Begin transitioning to XML now:
– XHTML and CSS for web files, XML for static
documents with long-term worth
• Do not rely on browser support of XML
• DTDs? We don’t need no stinkin’ DTDs!
• Get on the XML4Lib discussion list:
http://sunsite.berkeley.edu/XML4Lib/
• Buy my book!
Resources
• Web sites
• Electronic discussions
• Books
• Magazines and journals
• Individuals