XML for Libraries
Roy Tennant
eScholarship
California Digital Library
escholarship.cdlib.org
Introduction
• Goal: introduce you to XML, explain
what it can do in general terms, and
highlight particular uses
• Caveat: you will not learn enough to do
it without further study

04:24 AM 04:24 AM
Outline
• Introduction to XML
• Serving XML to the Web
• Case Studies
• Tips & Advice
• Resources

Introduction to XML
• Extensible Markup Language
• A method of creating and using tags to
identify the structure and contents of a
document — not how it should be
displayed
• The tags used can be arbitrary or can
come from a specification

What it Looks Like
<?xml version="1.0"?>
<book>
<author>
     <lastname>Tennant</lastname>
     <firstname>Roy</firstname>
</author>
<title>The Great American Novel</title>
<chapter number="1">
     <chaptitle>It Was Dark and Stormy</chaptitle>
     <p> “I’m scared,” I said.</p>
</chapter>
</book>
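Because the tags name the structure rather than the display, software can pull individual pieces out of a document like this one. The following is a minimal sketch in Python using the standard library's ElementTree parser. The function name `describe_book` and the choice of fields are assumptions made for illustration; the element names come from the book example.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the book example, embedded so the sketch is self-contained.
BOOK_XML = """<?xml version="1.0"?>
<book>
  <author>
    <lastname>Tennant</lastname>
    <firstname>Roy</firstname>
  </author>
  <title>The Great American Novel</title>
  <chapter number="1">
    <chaptitle>It Was Dark and Stormy</chaptitle>
    <p>“I’m scared,” I said.</p>
  </chapter>
</book>"""


def describe_book(xml_text):
    """Return a few fields read from a <book> document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    first = root.findtext("author/firstname")
    last = root.findtext("author/lastname")
    return {
        "author": f"{first} {last}",
        "title": root.findtext("title"),
        # Each chapter contributes its number attribute and its title element.
        "chapters": [
            (chapter.get("number"), chapter.findtext("chaptitle"))
            for chapter in root.findall("chapter")
        ],
    }


if __name__ == "__main__":
    print(describe_book(BOOK_XML))
```

The same parsed tree could feed a catalog record, a table of contents, or a search index without touching the source file.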
Two Types of XML
• Well-Formed
• Valid

Well-Formed XML
• Follows general tagging rules:
– Every element has a start tag and an end tag
• But can be minimized if empty: <br/> instead of <br></br>
– Tag names are case-sensitive, so start and end tags must match exactly (XHTML also requires lowercase)
– All tags are properly nested:
• <author> <firstname>Mark</firstname>
<lastname>Twain</lastname> </author>
– All attribute values are quoted:
• <subject scheme="LCSH">Music</subject>
• Usually starts with an XML declaration, and may include a document type declaration
• Software can make sure a
document follows these rules
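As a rough illustration of software checking these rules, the sketch below feeds a few snippets to Python's standard ElementTree parser, which rejects anything that is not well-formed. The helper name `is_well_formed` and the sample snippets are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Illustrative snippets: two that follow the rules and three that break one each.
SAMPLES = {
    "quoted attribute": '<subject scheme="LCSH">Music</subject>',
    "empty element": "<p>line<br/>break</p>",
    "unquoted attribute": "<subject scheme=LCSH>Music</subject>",
    "bad nesting": "<author><firstname>Mark</author></firstname>",
    "case mismatch": "<Title>Tom Sawyer</title>",
}

if __name__ == "__main__":
    for label, snippet in SAMPLES.items():
        ok, message = is_well_formed(snippet)
        print(f"{label}: {'well-formed' if ok else 'rejected: ' + message}")
```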
Valid XML
• Uses only specific tags and rules as codified
by one of:
– A document type definition (DTD)
– A schema definition
• Only the tags listed by the schema or DTD
can be used
• Software can take a DTD or schema and
verify that a document adheres to the rules
• Editing software can prevent an author
from using anything except allowed tags
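To give a feel for what "only the listed tags" means, here is a deliberately simplified sketch. It is not a DTD or schema validator: Python's standard library does not include one, so a small hand-written table of allowed child elements stands in for the DTD. The table `ALLOWED_CHILDREN` and the function `find_violations` are hypothetical, modeled on the book example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD: each element maps to the children it may contain.
ALLOWED_CHILDREN = {
    "book": {"author", "title", "chapter"},
    "author": {"lastname", "firstname"},
    "chapter": {"chaptitle", "p"},
    "lastname": set(),
    "firstname": set(),
    "title": set(),
    "chaptitle": set(),
    "p": set(),
}


def find_violations(xml_text, root_name="book"):
    """List elements that appear where the rule table does not allow them."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != root_name:
        problems.append(f"root element is <{root.tag}>, expected <{root_name}>")
    for parent in root.iter():
        # Elements missing from the table are treated as allowing no children.
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    return problems


if __name__ == "__main__":
    print(find_violations("<book><title>T</title><price>5</price></book>"))
```

A real DTD or schema also controls element order, repetition, and attributes, which this sketch ignores.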

Ways to Use XML
• Behind the scenes as a standard and
easily transformed format for
information
• As a transfer syntax, to exchange
information in a machine-parseable
form
• As a method of delivery direct to the
user (not recommended)

Why is XML Important?
• It is a standard, easily extensible way to
encode loosely-structured as well as highly-
structured information
• Due to its easy parseability, software can
transform it in countless ways, thereby
allowing:
– Easy migration paths
– Alternative displays
– On-the-fly response to user needs

XML vs. Databases
(a simplistic formula)

• If your information is…


– Tightly structured
– Fixed field length
– Massive numbers of individual items
• You need a database
• If your information is…
– Loosely structured
– Variable field length
– Massive record size
• You need XML
Serving XML to the Web
• Directly in native form
• Transformed to static HTML
• Transformed to HTML dynamically

Transforming XML: XSLT
• Extensible Stylesheet Language Transformations (XSLT)
• A markup language and programming syntax
for processing XML
• Is most often used to:
– Transform XML to HTML for delivery to standard
web clients
– Transform XML from one set of XML tags to
another
– Transform XML into another syntax/system
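The first use, turning XML into HTML, can be sketched in a few lines. Note the substitution: XSLT support is not part of Python's standard library, so this example performs the transformation by hand with ElementTree instead of with an XSLT stylesheet. The function name `book_to_html`, the sample document, and the HTML layout are assumptions; the element names follow the book example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Small sample document using the element names from the book example.
SAMPLE = """<book>
  <title>The Great American Novel</title>
  <chapter number="1">
    <chaptitle>It Was Dark and Stormy</chaptitle>
    <p>I was scared.</p>
  </chapter>
</book>"""


def book_to_html(xml_text):
    """Hand-written XML-to-HTML transformation (stand-in for an XSLT stylesheet)."""
    book = ET.fromstring(xml_text)
    title = escape(book.findtext("title", default=""))
    parts = [
        f"<html><head><title>{title}</title></head><body>",
        f"<h1>{title}</h1>",
    ]
    for chapter in book.findall("chapter"):
        number = escape(chapter.get("number", ""))
        heading = escape(chapter.findtext("chaptitle", default=""))
        parts.append(f"<h2>Chapter {number}: {heading}</h2>")
        # Only the direct text of each paragraph is copied in this sketch.
        for para in chapter.findall("p"):
            parts.append(f"<p>{escape(para.text or '')}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(book_to_html(SAMPLE))
```

An XSLT stylesheet expresses the same mapping declaratively, as templates matched against elements, which is what lets one source document drive several different displays.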

Required Components for
Serving XML to the Web
• An XML-encoded “document”
• An XSLT stylesheet to transform it to HTML or XHTML:
– Static
– Dynamic
• A CSS stylesheet (optional)

XML Web Publishing Software
• Required to:
– Apply dynamic transformations to XML
content
– Render HTML dynamically for standard
web browsers
• Just beginning to be available:
– Cocoon: http://xml.apache.org/cocoon/
– AxKit: http://axkit.org/

Case Study: Publishing Books @ the
California Digital Library
• Goals:
– To create highly usable online versions of
books
– To create versions that will migrate easily
as technology changes
– To create an infrastructure that will
support dynamic presentations of the
same content

Case Study: Publishing Books @ the
California Digital Library
• Strategy:
✓ Mark up the texts in XML
✓ Serve them dynamically using XML web
publishing software (currently Cocoon)
✓ Create different displays for different purposes,
and a mechanism for allowing users to select
their preferred view
✓ Find and apply an XML-aware search engine
– Create a method by which users can create their
own Adobe Acrobat versions

[Diagram sequence: dynamic XML publishing]
• Software stacks: AxKit on mod_perl on a web server; Cocoon on Tomcat on a web server
• Request: "I want this XML doc…"
• Cocoon applies an XSLT stylesheet to the XML doc, producing an XHTML document (no display markup)*
• The XHTML document is paired with an HTML stylesheet (CSS)
• Layers: Information (XML doc), Transformation (XSLT stylesheet), Presentation (CSS stylesheet)
* Dynamic document

Case Study: ILL ASAP
[Diagram: ILL ASAP workflow. Labels: OCLC, downloaded requests, XML file, local catalog, XSL stylesheet, printable XHTML file, Internet Explorer]

Service Tasmania Architecture

Case Study: Univ. of Michigan

Tips and Advice
• Begin transitioning to XML now:
– XHTML and CSS for web files, XML for static
documents with long-term worth
• Do not rely on browser support of XML
• DTDs? We don’t need no stinkin’ DTDs!
• Get on the XML4Lib discussion list:
http://sunsite.berkeley.edu/XML4Lib/
• Buy my book!

Resources
• Web sites
• Electronic discussions
• Books
• Magazines and journals
• Individuals
