

XQuery

XQuery Examples Collection
Welcome to the XQuery Examples Collection Wikibook! XQuery is a World Wide Web Consortium recommendation for selecting data from documents and databases.

Current Status
A new release of eXist (1.4) is currently installed and under test. Please note any problems with these examples on the discussion page. See also: Recent Changes.

About this Project


This is a collaborative project and we encourage everyone who is using XQuery to contribute their XQuery examples. All example programs must conform to the Creative Commons 2.5 share-alike with attribution license agreement [1]. Examples are executed on an eXist demo server.
1. Instructors: please sign our Guest Registry if you are using this book for learning or teaching XQuery.
2. Contributors: please see our Naming Conventions to ensure your examples are consistent with the textbook.
3. Learners: if you are looking for an example of a specific XQuery language construct, technique or problem but can't find one, please add a suggestion to the Examples Wanted section.

Introduction
1. Background - A brief history and motivation for the XQuery standard.
2. Benefits - Why use XQuery?
3. Installing and Testing - How to install an XQuery server on your own computer.
4. Naming Conventions - Naming standards used throughout this book.

Example Scripts
Beginning Examples
Examples that do not assume knowledge of functions and modules.
1. HelloWorld - A simple test to see if XQuery is installed correctly.
2. FLWOR Expression - A basic example of how XQuery FLWOR statements work.
3. Sequences - Working with sequences is central to XQuery.
4. XPath examples - Sample XPath expressions for people new to XML and XPath.
5. Regular Expressions - Regular expressions make it easy to parse text.
6. Searching multiple collections - How to search multiple collections in a database.
7. Getting URL Parameters - How to get parameters from the URL.
8. Getting POST Data - How to get XML data posted to an XQuery.

9. Checking for Required Parameters - How to check for a required parameter using if/then/else.
10. Displaying Lists - How to take a list of values in an XML structure and return a comma-separated list.
11. Extracting data from XHTML files - How to use the doc() function to get data from XHTML pages.

12. Displaying data in HTML Tables - How to display XML data in an HTML table.
13. Limiting Result Sets - How to limit the number of records returned in an XQuery.
14. Filtering Words - How to test to see if a word is on a list.
15. Saving and Updating Data - How to have a single XQuery that saves new records or updates existing records.
16. Quantified Expressions - Testing all the items in a sequence.
17. Dates and Time - Sample expressions that work with date and time values.
18. Chaining Web Forms - Passing data from one web page to another using URL parameters, sessions or cookies.

Intermediate Examples
Assumes knowledge of functions and modules.
1. Using XQuery Functions - How to read XQuery function documentation and use XQuery functions
2. Creating XQuery Functions - How to create your own local XQuery functions
3. Returning the Longest String - A function to find the longest string from a list of strings
4. Net Working Days - How to calculate the number of working days between two dates
5. Tag Cloud - Counting and viewing the number of keywords
6. String Analysis - Regular expression string analysis

7. Manipulating URIs - How to get and manage URIs
8. Parsing Query Strings - Parsing query strings using alternate delimiters
9. Splitting Files - Splitting a large XML file into many smaller files
10. Filling Portlets - How to fill regions of a web page with XQuery
11. Filtering Nodes - How to use the identity transform to filter out nodes
12. Limiting Child Trees - You have a tree of information and you want to "prune" it only at a specific level
13. Higher Order Functions - Passing functions as arguments to functions
14. Timing Fibonacci algorithms - A couple of Fibonacci algorithms and timing display
15. Using Intermediate Documents - Analysis of a MusicXML file
16. Formatting Numbers - Using picture formats to format numbers
17. Uploading Files - How to upload files using HTML forms
18. TEI Concordance - How to build a TEI-based concordance

Search
1. Introduction to XML Search - An overview of XML search terminology
2. Basic Search - A simple search page
3. Searching, Paging and Sorting - Searching and viewing search results
4. Keyword Search - full text search with Google-style results
5. Employee Search - an Ajax example
6. Incremental Search of the Chemical Elements - with Ajax
7. Lucene Search - using eXist's Lucene-based fulltext search
8. Incremental Searching - working with a JavaScript client to perform incremental search
9. Advanced Search - creating complex searches using multiple search fields
10. Open Search - creating an OpenSearch file to describe your search page
11. Auto-generation of Index Config Files - scripts to automatically generate the index configuration file


Interaction
1. Adder - Creating a web service that adds two numbers
2. Simple XForms Examples
3. Navigating Collections - an example of an AJAX browser
4. Sending E-mail - How to send an e-mail message from within an XQuery

Creating Custom Views


These examples use reflection on the structure of an XML document, using the name() function, to implement generic functions for XML transformations. A minimal sketch of the idea follows the list.
1. HTML Table View - A generic HTML table representation
2. Tree View - A generic HTML tree representation
3. Grouping Items - how to group items in a report
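Below is a minimal, hypothetical sketch of the reflection idea (the element structure and function name are only for illustration): a generic function builds a table row from any element by iterating over its children and using name() for the labels.

declare function local:generic-row($item as element()) as element(tr) {
   (: one cell per child element, labelled with the element's own name :)
   <tr>{
      for $field in $item/*
      return <td>{name($field)}: {string($field)}</td>
   }</tr>
};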

Transforming Complex XML Documents


XQuery has many features that allow you to transform XML and create full document-style transformation libraries. Unlike traditional "database" documents, complex XML documents have "complex content": in-line elements that appear in unpredictable order. This section provides a foundation for these transformations based on the XQuery typeswitch expression. Typeswitch-based transformations can replace XSLT transforms and can also use database indexes for very fast transforms of large collections. A minimal sketch is shown after the list.
1. Typeswitch Transformations - Using the typeswitch expression for document-style transforms
2. Transformation idioms - Handling common transformation tasks
3. Generating Skeleton Typeswitch Transformation Modules - Using XQuery to generate a skeleton module for typeswitch-based document transformation
4. Web XML Viewer - Using the typeswitch expression to transform an XML document to HTML
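The sketch below shows the shape of a typeswitch-based transform function. The <para> element is hypothetical; real transform modules have one case clause per element in the document vocabulary.

declare function local:transform($node as node()) as node()* {
   typeswitch ($node)
      (: document-specific rule: a para becomes an HTML paragraph :)
      case element(para) return <p>{ for $child in $node/node() return local:transform($child) }</p>
      (: generic rule for other elements: keep the name and recurse into the children :)
      case element() return element {name($node)} { for $child in $node/node() return local:transform($child) }
      (: text nodes, comments and processing instructions are returned unchanged :)
      default return $node
};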

Paginated Reports
Unlike HTML pages, paginated reports use the concept of text flowing between pages. These examples show you how to convert raw XML into high-quality PDF files suitable for printing. The examples use a markup standard called XSL-FO (XSL Formatting Objects).
1. Installing the XSL-FO module - update your 1.4 configuration to get the current software from the Apache web site
2. Generating PDF from XSL-FO files - Converting XSL-FO to PDF files
3. XSL-FO Tables - Generating XSL-FO tables from XML files
4. XSL-FO Images - Embedding images in generated PDF files


Content Publishing
1. Publishing Overview - How to transfer a document from an internal intranet server to a public web site
2. Publishing to Subversion - How to transfer a document from an internal intranet to a public SVN server using SSL and digest authentication

XML Document Comparison and Merging


1. Compare two XML files - using the eXist compare() function to test whether two XML files are exactly the same
2. XML Differences - displaying the difference between two XML files
3. Compare with XQuery - Using XQuery to compare lists
4. Time Comparison with XQuery - Using XQuery to compare dated items
5. Synchronizing Remote Collections - Using lastModified time stamps to see which items have changed
6. Finding Duplicate Documents - Using a hash function to find duplicate documents

Time Based Queries


1. Time Based Queries - using dates and times to limit search results
2. Timing a Query - profiling how long a query takes

XML document kinds


TEI documents
Text Encoding Initiative.
1. TEI Concordance - How to build a TEI-based concordance
2. TEI Document Timeline - Using Simile Timeline to visualize a TEI document

DocBook Documents
1. DocBook to HTML
2. DocBook to PDF
3. DocBook to ePub

OpenOffice
1. OpenOffice to HTML

Office Open XML


1. Office Open XML


XML Schemas
1. XML Schema to Instance
2. XML Schema to XForms
3. XML Schema to SVG

Processing Special Characters


1. Special Characters - dealing with newlines and other special characters.

XQuery and other languages


MusicXML
1. Using Intermediate Documents - Analysing MusicXML documents
2. MusicXML to Arduino

Language Comparisons
Python
1. XQuery and Python

SQL
1. XQuery SQL Module - Calling SQL from within your XQuery
2. XQuery from SQL - Using XQuery to access a classic relational database - Employee/Department/Salary

RDF/OWL
1. List OWL Classes - A simple XQuery script that will display all the OWL classes in an OWL file

Language combination
Excel
1. Excel and XML

JavaScript
1. Navigating Collections - basic AJAX
2. Employee Search - basic AJAX
3. Incremental Search of the Chemical Elements - AJAX
4. DOJO data - basic JSON

SQL
1. XML to SQL


XHTML + Voice
1. Simple RSS reader
2. XHTML + Voice Twitter Radio for Opera

XSLT
1. XQuery and XSLT - Executing an XSLT transform from within XQuery

Data Mashups
Authentication
1. Basic Authentication - Logging in to a remote web server using HTTP Basic Authentication
2. Digest Authentication - Logging in to a remote web server using HTTP Digest Authentication
3. OAuth - A standard for protecting a set of user-owned data within a web service

Wikipedia interaction
1. Wikipedia Page scraping
2. Wikipedia Lookup
3. Wikipedia Events RSS
4. Wiki weapons page

Wikibook applications
1. Wikibook index page
2. Wikibook list of code links

Visualization
1. Graph Visualization
2. Dataflow diagrams
3. Sequence Diagrams
4. Example Sequencer - Step-by-step tutorial

Google Charts
Although the Google Charts functions only work when you are connected to the Internet, these examples show that XQuery is an ideal tool for converting XML data into charts.
1. Google Charts - Using XML and XQuery to generate Google Charts using REST
2. Google Chart Sparkline - A demonstration of how to create a chart using the Google Charts API
3. Google Chart Bullet Bar - A demonstration of how to create a dashboard bullet bar using the Google Charts API
4. Histogram of File Sizes - An XQuery report that generates a histogram of file sizes

There are also sample XForms that can be used to create front-ends in the XForms Tutorial and Cookbook [2]


Digital Dashboards
Digital dashboards are single screens that compress a great deal of information into a single web page. This section leverages many of the Google Charts examples from the prior section.
1. Dashboard Architecture - How to design dashboards that have fast response times

Page Scraping
Page scraping is the process of extracting data from an HTML web page and converting it into well-formed XML. When creating mashup applications this is also known as the harvesting process.
1. Overview of Page Scraping Techniques
2. Page scraping and Yahoo Weather
3. UK shipping forecast
4. BBC Weather Forecast
5. Page scraping and Mashup
6. Simple RSS reader
7. Multiple page scraping and Voting behaviour
8. Link gathering

9. REST interface definition
10. Caching and indexes

Mapping
1. Google Geocoding
2. String Analysis#Location_Mapping - Mapping Car Registrations
3. Flickr GoogleEarth
4. Nationalgrid and Google Maps
5. SMS tracker

Timelines
1. Creating a Timeline - Creating a simple timeline view of events
2. Timelines of Resource - Using creation and modification dates to create timelines
3. TEI Document Timeline - Creating a timeline of all dates within a single TEI document

The Semantic Web


1. DBpedia with SPARQL - Football teams
2. DBpedia with SPARQL and Simile Timeline - Album Chronology - Creating a timeline of album releases using data from Wikipedia
3. DBpedia with SPARQL - Stadium locations
4. The Emp-Dept case study
   1. XML to RDF
   2. SPARQL Tutorial
   3. SPARQL interface
5. Graphing Triples
6. SPARQLing Country Calling Codes
7. Southampton Pubs
8. Alphabet Poster

9. Simile Exhibit - Browser visualizations using the Simile JavaScript libraries
10. Latent Semantic Indexing - Finding the semantic distance between documents

Development Tools
1. Sitemap for Content Management System - XQuery functions can easily perform many common web site content management functions
2. Uptime monitor - use XQuery to monitor a remote web service
3. XQuery IDE - XQuery integrated development environment
4. Image Library - using an XQuery to preview your images
5. XML Schema to Instance - XQuery function to generate a sample XML instance from an XML Schema file (.xsd)
6. Lorum Ipsum text - generating sample text for inserting into test page layouts
7. XQuery and XML Schema - Generating an XML instance document
8. Generating XQDocs - Automating the generation of XQuery documentation for modules and functions
9. XqUSEme [3] - Firefox extension to allow XQueries, including against the loaded document (even against originally non-XML, poorly formed HTML)

Validation
1. Validating a document - Validate a document with an XML Schema
2. Validation using a Catalog - Using a catalog file to validate documents
3. Validating a hierarchy

Path Analysis
1. All Paths - A report of all paths in a document or collection
2. All Leaf Paths - A report of all leaf paths in a document or collection

Security
1. Login and Logout - How to log users in and log them out
2. URL Driven Authorization - How to use URL rewriting to check for valid users
3. Digital Signatures - How to use a custom module to use the XML Digital Signature standards
4. Changing Permissions on Collections and Resources - how to change permissions on collections and resources

Case Studies
1. Fizzbuzz
2. Project Euler
3. Topological Sort
4. Slideshow
5. Sudoku
6. Pachube feed
7. World Temperature records - conversion of text data formats to XML, indexing and data presentation
8. UWE StudentsOnline


eXist db specific Functions and Configuration


Configuration
Installing modules
1. Installing the XSL-FO module

Setting HTTP Headers


1. Setting HTTP Headers

Modules
compression
Function Reference [4]
1. Get zipped XML file
2. Unzipping an Office Open XML docx file - Uncompressing and storing a docx file

ftp client
This module allows you to interact with a remote FTP server. It includes functions for listing, getting and putting files.
1. FTP Client

httpclient
Function Reference [5]
1. Digest Authentication
2. UK shipping forecast

lucene
Function Reference [6] Help [7]
1. Lucene Search

mail
Function Reference [8]
1. Sending E-mail
2. Basic Feedback Form


math
1. Using the Math Module

request
Function Reference [9] Function examples [10]
1. Getting URL Parameters
2. Getting POST Data
3. Checking for Required Parameters
4. Manipulating URIs
5. Parsing Query Strings
6. Adder - simple client-server interaction

scheduler
Function Reference [11] Help [12]
1. XQuery Batch Jobs

sequences
Function Reference [13]
1. Sequences Module - three additional functions (map, fold and filter)

session
Function Reference [14]
1. Basic Session Management - the basics of session management, including getting and setting session variables

subversion
Function Reference [15]
1. Subversion - how to update a subversion repository from within XQuery using the subversion client

transform
Function Reference [16]
1. String Analysis

util
Function Reference [17]
1. Registered Modules : util:registered-modules()
2. Registered Functions : util:registered-functions()
3. Dynamic Module Loading : util:import-module(), util:eval()
4. Higher Order Functions : util:function(), util:call()
5. Timing Fibonacci algorithms : util:function(), util:call(), util:system-time()
6. XMP data : util:binary-doc(), util:binary-to-string(), util:parse()
7. Basic Authentication : util:string-to-binary(), httpclient:get()


validation
Function Reference [18] Help [19]
1. Validating a document

xmldb
Function Reference [20]
1. Saving and Updating Data
2. Splitting Files

xqdoc
Function Reference [21]
1. Generating xqDoc-based XQuery Documentation

xslfo
XSL-FO (XSL Formatting Objects) is a way of converting XML into PDF. Function Reference [22]
1. Installing the XSL-FO module - setting up the XSL-FO module within eXist
2. Generating PDF from XSL-FO files - generating PDF from a FO file
3. XSL-FO Tables - adding tables to your PDF
4. XSL-FO Images - adding images to your PDF
5. XSL-FO SVG - adding SVG images to your PDF

Triggers
1. Using Triggers to Log Events - how to set up a trigger to log store, update and remove events on a collection
2. Using Triggers to assign identifiers - how to use triggers to assign identifiers to new documents or new nodes
3. Sending E-mail - Email is one way to send a notification when a trigger has fired

XQuery Updates
1. Inserting and Updating Attributes
2. Updates and Namespaces - How updates can change serialization


URL Rewriting
1. URL Rewriting Basics - How to make your URLs look nice

Apache Ant Tasks


1. Reindex a Collection

General guidance
eXist Crib sheet

Appendixes
Systems that Support XQuery
Using native and hybrid XML databases that implement XQuery:
1. BaseX - Native open source XML database with visual frontend
2. DataDirect XQuery - Java XQuery engine supporting relational, EDI, flat file and XML input/output
3. eXist - Open source native XML database
4. DB2 pureXML - DB2 9.1 includes the pureXML feature
5. MarkLogic Server - Commercial XML content server
6. Microsoft SQL Server 2005
7. NetKernel
8. Oracle Berkeley DB XML - Open source embedded storage management
9. Oracle XML DB - Oracle Server 11g includes the XML DB (XDB) feature
10. Sedna - Open source native XML database
11. Stylus Studio - XQuery mapping/editing/debugging, ships with Saxon (and SA) and DataDirect XQuery
12. EMC xDB - EMC Documentum xDB commercial native XML database
13. XQilla - Open source XQuery library and command line utility
14. Zorba - Open source XQuery engine, a C++ implementation with C, Java, PHP, Python and Ruby library bindings and a command line utility
15. Qizx - Open source and professional editions of an XQuery engine implemented in Java


Debugging XQuery
1. Gotchas - some pitfalls
2. Ah-has - some ah-ha moments

Other sources
Function Libraries
1. FunctX XQuery Function Library [23] by Priscilla Walmsley

Discussion Groups
1. XQuery General [24]

Indexes
Page index [25] - generated
Index of Application Areas - edited
Key to symbols: (symbol) indicates an XQuery best practice

References
[1] http://creativecommons.org/licenses/by-sa/2.5/
[2] http://en.wikibooks.org/wiki/XForms/Google_Charts
[3] https://addons.mozilla.org/en-US/firefox/addon/5515
[4] http://demo.exist-db.org/exist/functions/compression
[5] http://demo.exist-db.org/exist/functions/httpclient
[6] http://demo.exist-db.org/exist/functions/lucene
[7] http://exist-db.org/lucene.html
[8] http://demo.exist-db.org/exist/functions/mail
[9] http://demo.exist-db.org/exist/functions/request
[10] http://www.cems.uwe.ac.uk/xmlwiki/eXist/request/requestProperties.xq?a=6&b=7#xxxxx
[11] http://demo.exist-db.org/exist/functions/scheduler
[12] http://www.exist-db.org/scheduler.html
[13] http://demo.exist-db.org/exist/functions/sequences
[14] http://demo.exist-db.org/exist/functions/session
[15] http://demo.exist-db.org/exist/functions/svn
[16] http://demo.exist-db.org/exist/functions/transform
[17] http://demo.exist-db.org/exist/functions/util
[18] http://demo.exist-db.org/exist/functions/validation
[19] http://www.exist-db.org/validation.html
[20] http://demo.exist-db.org/exist/functions/xmldb
[21] http://demo.exist-db.org/exist/functions/xqdoc
[22] http://demo.exist-db.org/exist/functions/xslfo
[23] http://www.xqueryfunctions.com/xq/
[24] http://news.gmane.org/gmane.text.xml.xquery.general
[25] http://www.cems.uwe.ac.uk/xmlwiki/util/wikiindex.xq?book=XQuery


Advanced Search
Motivation
You have multiple fields that you would like to search on. You want to allow users to optionally search on specific fields and perform a boolean "AND" when multiple fields are used. For example you may have a database of people. Each person has a first name, last name, e-mail and phone. You want to allow users to search on any single field or multiple fields together. If two fields are entered only records that match both fields will be returned.

Method
We will use a standard HTML form with multiple input and selection fields. We will check each incoming search request for each parameter and if the parameter is not null we will concatenate a single query with many predicates and then evaluate it using the util:eval() function.

Example XML Data Set


In the following example we will use an XML file that contains a list of people. The format will be the following:

<people>
   <person>
      <id>123</id>
      <firstname>John</firstname>
      <lastname>Smith</lastname>
      <phone>(123) 456-7890</phone>
      <email>john.smith@example.com</email>
      <type>faculty</type>
   </person>
   <person>
      <id>456</id>
      <firstname>Sue</firstname>
      <lastname>Jones</lastname>
      <phone>(123) 654-0123</phone>
      <email>sue.jones@example.com</email>
      <type>staff</type>
   </person>
</people>

Background on Predicates
If you have a single "where clause" (called a predicate) you can always append this predicate to the end of an XPath expression. For example, the following FLWOR expression will return all person records in the system:

for $person in collection('/db/apps/directory')//person
return $person

You can now restrict this to only include faculty by adding a predicate:

for $person in collection('/db/apps/directory')//person[type='faculty']
return $person

You can now search for all faculty with a first name of "mark" by just adding an additional predicate:

for $person in collection('/db/apps/directory')//person[type='faculty'][firstname='mark']
return $person


Sample Search Form

Sample HTML code for advanced search form


The following is an HTML form section for this form.

<form method="get" action="advanced-search.xq">
   <label>First Name: </label> <input type="text" name="firstname"/> <br/>
   <label>Last Name: </label> <input type="text" name="lastname"/> <br/>
   <label>E-Mail: </label> <input type="text" name="email" size="40"/> <br/>
   <label>Phone: </label> <input type="text" name="phone"/> <br/>
   <label>Primary Type: </label>
   <select name="type">
      <option value="">- Select -</option>
      <option value="staff">Staff</option>
      <option value="faculty">Faculty</option>

      <option value="student">Students</option>
   </select> <br/>
   <input type="submit" name="Submit"/>
</form>

When the user adds a name of "John" to the first name field, selects a type of "staff" and presses the submit button, the following is an example of the URL created by this form:
advanced-search.xq?firstname=John&lastname=&email=&phone=&type=staff&Submit=Submit+Query


Note that most of the fields are null. Only firstname and type have a value to the right of the equal sign.

Sample Search Service


The search service will have the following code sections.

Getting the URL parameters


The following code fragment will get the URL parameters from the incoming URL request and assign them to XQuery variables.

let $firstname := lower-case(request:get-parameter('firstname', ''))
let $lastname := lower-case(request:get-parameter('lastname', ''))
let $email := lower-case(request:get-parameter('email', ''))
let $phone := lower-case(request:get-parameter('phone', ''))
let $type := lower-case(request:get-parameter('type', ''))

Note that each of the incoming parameters is first converted to lowercase before any comparisons are done.

Building the Predicate Strings


We are now ready to start building our predicates. Since many of the fields will be empty we will only construct a predicate if the variable exists.

let $firstname-predicate :=
   if ($firstname)
   then concat('[lower-case(firstname/text())', " = '", $firstname, "']")
   else ()
let $lastname-predicate :=
   if ($lastname)
   then concat('[lower-case(lastname/text())', " = '", $lastname, "']")
   else ()
let $email-predicate :=
   if ($email)
   then concat('[lower-case(email/text())', " = '", $email, "']")
   else ()
let $phone-predicate :=
   if ($phone)
   then concat("[contains(phone/text(), '", $phone, "')]")
   else ()
let $type-predicate :=
   if ($type)
   then concat('[type/text()', " = '", $type, "']")
   else ()

For the firstname, lastname, and email we are comparing the incoming parameter with the lowercase string in the XML file. With the phone number we are using the contains() function to return all records that have a string somewhere in the phone number. The type is using an exact match since both the case of the data and keyword are

known precisely.
The most challenging aspect of this program is learning how to get the order of the quotes correct. In general I use single quotes for enclosing static strings unless that string itself must contain a single quote; then we use double quotes. The most difficult part is to assemble a string such as [type/text() = 'staff'] and to remember to put the single quotes around the word staff. If you can figure this out the rest will be easy. If you are having trouble you can also break the concat into multiple lines:

concat(
   '[type/text()',
   " = '",
   $type,
   "']"
)

where each line clearly must start and end with the same type of quote.


Concatenating the Query


To create an eval string we just need to create a single long string, starting with the collection and adding each of the predicates. If a parameter was not supplied, its predicate string will be empty.

let $eval-string := concat("collection('/db/apps/directory/data')//person",
   $firstname-predicate,
   $lastname-predicate,
   $email-predicate,
   $phone-predicate,
   $type-predicate
)

The query with just a firstname of "john" and a type of "faculty" would then look like this:

collection('/db/apps/directory/data')//person[lower-case(firstname/text()) = 'john'][type/text() = 'faculty']

Note that some advanced systems will modify the order of the predicates, putting the predicate most likely to narrow the search first. Since there are fewer records with the first name John than there are faculty records, it is more efficient to put the first name predicate before the type predicate. This means that fewer nodes need to be moved from hard disk into RAM and the query will execute much faster.

Executing the Query


The execution of the query is done by passing the eval string to the util:eval() function.

let $persons := util:eval($eval-string)

Displaying the Results


We are now ready to display all of the results. We do this with a FLWOR statement that returns a div element for each hit. Each div contains a single link with the lastname, firstname and type as the link content. When the user clicks on a link, an item viewer is used and the ID of the person is passed to the item viewer.

for $person in $persons
let $id := $person/id/text()
let $lastname := $person/lastname/text()
let $firstname := $person/firstname/text()
order by $lastname, $firstname
return
   <div class="hit">
      <a href="../views/view-item.xq?id={$id}">
         {$lastname}, {$person//firstname/text()} {' '} {$person/type/text()}
      </a>
   </div>


NGram Searching
In your conf.xml file, make sure the following line is uncommented:
<module uri="http://exist-db.org/xquery/ngram" class="org.exist.xquery.modules.ngram.NGramModule" />

Here is the page on the NGram elements to add to your collection.xconf file: NGram Configuration File [1]. After you edit the configuration file, reindex the collection. You can then use any of the following functions: NGram Functions [2]

Acknowledgments
This example has been provided by Eric Palmer and his staff at the University of Richmond, USA.

References
[1] http://exist-db.org/ngram.html
[2] http://demo.exist-db.org/exist/functions/ngram


All Leaf Paths


Motivation
You want to generate a list of all leaf paths in a document or document collection. This process is very useful for getting to know a new data set. Specifically, you will find that the leaf elements in an XML file carry much of the data in data-style markup. These leaf elements frequently carry the most semantics or meaning within the document. They form the basis for a semantic inventory of the document; that is, each leaf element should be able to be associated with a data definition. Leaf elements are also good targets for indexing within your index configuration file.

Method
We will use the functx leaf-elements() function:

functx:leaf-elements($root as node()?) as element()*

This function takes a node as input and returns the sequence of leaf elements below it.
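As a small illustration (assuming the functx:leaf-elements function defined below and the Hamlet sample document), the distinct names of the leaf elements can be listed with:

let $doc := doc('/db/shakespeare/plays/hamlet.xml')
return distinct-values(functx:leaf-elements($doc)/local-name(.))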

Example Output
For the demo play Hamlet that is included in the eXist demo set, the file /db/shakespeare/plays/hamlet.xml will generate the following output:

PLAY
TITLE
FM
P
PERSONAE
PERSONA
PGROUP
GRPDESCR
SCNDESCR
PLAYSUBT
ACT
SCENE
STAGEDIR
SPEECH
SPEAKER
LINE

Source Code to leaf-elements


declare namespace functx = "http://www.functx.com";

declare function functx:leaf-elements($root as node()?) as element()* {
   $root/descendant-or-self::*[not(*)]
};

This query uses the descendant-or-self axis with the predicate [not(*)] to qualify only elements that do not have child elements.


Example XQuery
xquery version "1.0"; declare namespace functx = "http://www.functx.com"; declare function functx:distinct-element-names($nodes as node()*) as xs:string* { distinct-values($nodes/descendant-or-self::*/local-name(.)) }; let $doc := doc('/db/shakespeare/plays/hamlet.xml') let $distinct-element-names := functx:distinct-element-names($doc) let $distinct-element-names-count := count($distinct-element-names) return <ol>{ for $distinct-element-name in $distinct-element-names order by $distinct-element-name return <li>{$distinct-element-name}</li> }</ol>

Adding Attributes
You can also run a query that will get all the distinct attributes. Attributes are all considered leaf data types since they can never have child elements.

declare function functx:distinct-attribute-names($nodes as node()*) as xs:string* {
   distinct-values($nodes//@*/name(.))
};

This query says in effect "get all the distinct attribute names in the input nodes". For the MODS demo file:

doc('/db/mods/01c73f2b05650de2e6124d9d113f40be.xml')

you will get the following attributes:
1. type
2. encoding
3. authority
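For example, the attribute list above can be produced with a single call (a sketch, assuming the function declared above):

functx:distinct-attribute-names(doc('/db/mods/01c73f2b05650de2e6124d9d113f40be.xml'))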


References
Documentation [1] on the xqueryfunctions.com web site.
[1] http://www.xqueryfunctions.com/xq/functx_leaf-elements.html

All Paths
Motivation
You want to generate a list of all unique path expressions in a document. This process is very useful for quickly getting familiar with a new data set. It is also important for making sure that your document-style transforms are accessing all the elements. This process can also be used as a basis for generating index files for a new data set.

Example Output
The list of unique paths for a sample file from the Shakespeare demos on the eXist demo system, at /db/shakespeare/plays/hamlet.xml, would generate the following results:

PLAY
PLAY/TITLE
PLAY/FM
PLAY/FM/P
PLAY/PERSONAE
PLAY/PERSONAE/TITLE
PLAY/PERSONAE/PERSONA
PLAY/PERSONAE/PGROUP
PLAY/PERSONAE/PGROUP/PERSONA
PLAY/PERSONAE/PGROUP/GRPDESCR
PLAY/SCNDESCR
PLAY/PLAYSUBT
PLAY/ACT
PLAY/ACT/TITLE
PLAY/ACT/SCENE
PLAY/ACT/SCENE/TITLE
PLAY/ACT/SCENE/STAGEDIR
PLAY/ACT/SCENE/SPEECH
PLAY/ACT/SCENE/SPEECH/SPEAKER
PLAY/ACT/SCENE/SPEECH/LINE
PLAY/ACT/SCENE/SPEECH/STAGEDIR
PLAY/ACT/SCENE/SPEECH/LINE/STAGEDIR

Note that these path expressions are sorted in document order, that is, the order in which each path first appeared in the document. So you can see that the cast list in the PERSONAE appears before the ACT/SCENE elements. The output can also be sorted in alphabetical order.


Method
We will use the functx libraries. In particular the function: functx:distinct-element-paths($nodes) takes as its input a node and returns a sequence of strings of the path expressions. See Documentation on xqueryfunctions.com [1]

distinct-element-paths function
xquery version "1.0"; declare namespace functx = "http://www.functx.com"; declare function functx:path-to-node($nodes as node()*) as xs:string* { $nodes/string-join(ancestor-or-self::*/name(.), '/') }; declare function functx:distinct-element-paths($nodes as node()*) as xs:string* { distinct-values(functx:path-to-node($nodes/descendant-or-self::*)) }; declare function functx:sort($seq as item()*) as item()* { for $item in $seq order by $item return $item }; let $in-xml := collection("NAMEOFCOLLECTION") return functx:sort(functx:distinct-element-paths($in-xml))

The heart of this query is the single expression:

ancestor-or-self::*/name(.)

which says in effect "get me the element names of all the nodes in the document". The next step is to turn this list into a list of distinct element paths. This is done by the function functx:distinct-element-paths().


Working with a single test document


Use the doc() function with the path to a single document.

Working with a document collection


Use the collection() function with the path to the collection; a sketch of both variants follows.
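As a sketch, the last two lines of the query above become one of the following (the document and collection paths are examples):

(: a single test document :)
let $in-xml := doc('/db/shakespeare/plays/hamlet.xml')
return functx:sort(functx:distinct-element-paths($in-xml))

or

(: a whole collection :)
let $in-xml := collection('/db/shakespeare/plays')
return functx:sort(functx:distinct-element-paths($in-xml))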

Acknowledgments
David Elwell posted this suggestion on the open-exist list on July 22 of 2010

References
[1] http://www.xqueryfunctions.com/xq/functx_distinct-element-paths.html

Alphabet Poster
This toy programme creates alphabet posters using images from Wikipedia, located via dbpedia. It is described in a blog entry [1]

Script
(: This script creates a picture alphabet based on a list of words.
   The pictures are from wikipedia, found via dbpedia.
   This was created for Charlie Taylor (age 5) for his animal alphabet.

   @parameter title - The title string for the poster
   @parameter alphabet - list of comma-separated words, unordered
   @parameter cols - the number of columns in the table layout
   @parameter action - poster: generate the poster; editor: generate the editor for the data
   @author Chris Wallace
   @date 2008-10-22
:)

declare namespace r = "http://www.w3.org/2005/sparql-results#";

declare variable $alphabet := request:get-parameter("alphabet","Ant,Bat");
declare variable $words := tokenize(normalize-space($alphabet)," *, *");
declare variable $title := request:get-parameter("title","Charlie's Animal Alphabet");

declare variable $cols := xs:integer(request:get-parameter("cols",4));
declare variable $action := request:get-parameter("action","edit");


declare variable $query := "
PREFIX : <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT * WHERE {
   :Hedgehog foaf:depiction ?img.
}
";

declare function local:execute-sparql($query as xs:string) {
let $sparql :=
   concat("http://dbpedia.org/sparql?format=xml&amp;default-graph-uri=http://dbpedia.org&amp;query=",
          encode-for-uri($query))
return doc($sparql)
};

declare function local:picture($animal as xs:string) as xs:string {
let $queryx := replace($query,"Hedgehog",replace($animal," ","_"))
let $result := local:execute-sparql($queryx)
return string($result//r:result[1]//r:uri)
};

declare function local:cell($animal as xs:string, $picture as xs:string) as element(td) {
let $letter := substring($animal,1,1)
return
  <td class="cell" valign="top">
     <span class="letter">{$letter} </span> is for
     <div>
        <a href="http://en.wikipedia.org/wiki/{$animal}">
           <img src="{$picture}" alt="{$animal}" title="{$animal}" border="0"/>
        </a>
     </div>
     <span class="word"> {$animal} </span>
  </td>
};

declare function local:poster() as element(div) {
<div>
  <h1>{$title}</h1>
  {let $letters :=
      for $animal in $words
      let $picture := local:picture($animal)
      order by $animal
      return local:cell($animal,$picture)
   let $nrows := xs:integer(ceiling(count($letters) div $cols))
   return
     <table>
       {for $row in (1 to $nrows)
        return
          <tr>
            {for $col in (1 to $cols)
             let $letter := $letters[position() = ($row - 1) * $cols + $col]
             return
               if ($letter) then $letter else <td> </td>
            }
          </tr>
       }
     </table>
  }
</div>
};

declare function local:editor() as element(form) {
<form action="alphabet.xq" method="get">
  <input type="hidden" name="action" value="poster"/>
  <div>
     <label for="title">Title of Alphabet</label>
     <input type="text" name="title" value="{$title}" size="50"/>
  </div>
  <div>
     <label for="cols">Number of Columns</label>
     <input type="text" name="cols" value="{$cols}" size="2"/>
  </div>
  <div>
     <label for="alphabet">Alphabet words, unordered, separated by , </label>
     <br/>
     <textarea name="alphabet" cols="80" rows="5">
        {$alphabet}
     </textarea>
  </div>
  <input type="submit" value="Create Alphabet Poster"/>
</form>
};

declare option exist:serialize "method=xhtml media-type=text/html";

<html>
  <head>
     <title>Alphabet Poster - {$action}</title>
     <style>
        <![CDATA[
        body {font-family:Comic Sans MS;}
        div.cell {margin: 0 5px 10px 0; }
        span.letter {font-size:200%;}
        span.word {display:none;}
        ]]>
     </style>
     <style media="print">
        <![CDATA[
        .nav {display:none}
        span.word {display:block; font-size:120%; font-family:Comic Sans MS; }
        ]]>
     </style>
  </head>
  <body>
   {
    if ($action = "poster")
    then (<span class="nav">
            <a href="alphabet.xq?alphabet={string-join($words,", ")}&amp;title={$title}&amp;cols={$cols}&amp;action=edit"> [edit]</a>
          </span>,
          local:poster()
         )
    else if ($action = "edit")
    then local:editor()
    else ()
   }
  </body>
</html>

References
[1] http://thewallaceline.blogspot.com/2008/10/grandson-charlie-age-nearly-6-rang.html


Auto-generation of Index Config Files


Motivation
You want to automatically generate an index configuration file based on instance data or an XML Schema. Creation of an index configuration file is difficult for new users. To help new users get started it is frequently beneficial to generate a sample collection.xconf file for them, based on a simple analysis of sample instance data or XML Schemas that the users provide.

Index Types
There are several types of indexes you may want to create. Range indexes are very useful when you have identifiers or you want to sort results based on element content. Fulltext indexes are most frequently used for natural-language text that contains full sentences with punctuation.
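As an illustration, a hand-written collection.xconf that combines a range index and a Lucene fulltext index might look like the following (the qnames id and title are hypothetical placeholders for your own element names):

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <fulltext default="none" attribute="no"/>
        <!-- range index on an identifier element -->
        <create qname="id" type="xs:string"/>
        <lucene>
            <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
            <!-- fulltext index on a narrative element -->
            <text qname="title"/>
        </lucene>
    </index>
</collection>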

FullText Indexes
The following is some example code showing how one might do this. Lucene fulltext indexes are most useful when they index full sentences. One approach is to scan an instance document for full sentences, looking for longer strings with punctuation. Although a full implementation would involve the inclusion of a "Natural Language Processor" library such as Apache UIMA, we can begin with some very simple rules. Here are some sample steps in the process for non-mixed-text content (mixed text can also be done but the steps are more complex):
1. get a list of all elements in a sample instance file
2. classify the elements according to whether they have simple or complex content
3. if they have simple content, look for sentences (spaces and punctuation)
4. for each element that has fulltext content, create a Lucene index

Sample Code for Namespace Generation


This creates an index on <foo> with every namespace that is used in the collection. (This fragment assumes that $dataLocation, $indexLocation, $qt — presumably a quote character — and $eq — presumably an equals sign — are bound earlier in the full script; they are not defined here.)

let $defaultNamespaces :=
   for $value in distinct-values(
      for $doc in collection($dataLocation)
      let $ns := namespace-uri($doc/*)
      return if ($ns) then $ns else ()
   )
   return element ns { $value }

let $index1 := "<collection xmlns='http://exist-db.org/collection-config/1.0'><index"

let $index2 :=
   for $ns in $defaultNamespaces
   return concat(' xmlns:ns', index-of($defaultNamespaces,$ns), $eq, $qt, $ns, $qt)

let $index3 := "><fulltext default='none' attribute='no'/><lucene><analyzer class='org.apache.lucene.analysis.standard.StandardAnalyzer'/><analyzer id='ws' class='org.apache.lucene.analysis.WhitespaceAnalyzer'/><text qname='foo'/>"

let $index4 :=
   for $ns in $defaultNamespaces
   let $prefix := concat('ns', index-of($defaultNamespaces,$ns))
   return concat('<text qname=', $qt, $prefix, ':foo', $qt, '/>')

let $index5 := "</lucene></index></collection>"

let $index := util:parse(string-join(($index1,$index2,$index3,$index4,$index5),""))
let $status := xmldb:store($indexLocation, "collection.xconf", $index)
let $result := xmldb:reindex($dataLocation)


Background
XQuery and Functional Programming
XQuery is an example of a functional programming language. Like other functional languages, XQuery variables are immutable, meaning that you can set them once but never change them after that. XQuery functions do not have "side effects", meaning that they do not change data that is not specifically passed to them. Functional programming has recently gained popularity with the rise of the MapReduce algorithms popularized by Google. Google's ability to leverage tens of thousands of CPUs in its data center has shown that functional languages are in many ways superior to procedural languages. But many of the benefits of functional programming go back to mathematical formalisms of the 1930s, including the lambda calculus and the μ-recursive functions. Although the XQuery 1.0 W3C specification does not allow a function to be passed as an argument to a function, most implementations such as eXist support this, so technically the eXist implementation of XQuery is a true functional language even though the W3C standard is not. However, XQuery 1.1 is expected to allow function items as data [1]. A history of functional programming is available at Functional Programming [2]. This article has an excellent historical background on functional programming and why functional programs are ideal for a server environment where reliability is critical.
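For example, a let binding can never be modified; binding the same name again simply creates a new variable that shadows the earlier one (a small illustration):

let $x := 1
let $x := $x + 1   (: a new binding that shadows the first; nothing is mutated :)
return $x          (: returns 2 :)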


XQuery as a W3C Standard


In 1998 Jonathan Robie and Joe Lapp (then the principal architect of WebMethods) created a new query language designed specifically to query XML files, called XQL. In 1998, two query languages, XQL and XML-QL, got a lot of interest within the W3C, and a working group for XML-based query languages was formed. In 1998 the World Wide Web Consortium hosted a conference on query languages[3]. This conference gathered XML and query language experts from around the world and from many fields, and 66 "position papers" were presented. The result was a very large knowledge base of use cases and proposals that served as a basis for a future standardized query language. The working group selected around 90 use cases and compared the ability of seven advanced query languages to execute them. None of the seven was perfect; each had some defects. The working group took the best parts of each of the seven languages and created the XQuery standard. The XSLT language reached recommendation status in 1999, but many people felt that XSLT was too difficult to learn, and because of its XML syntax it was very unfamiliar to many software developers. People with an SQL background had a difficult time learning XSLT.

XQuery and SQL


Studies have shown that people familiar with SQL can quickly learn XQuery. Once developers understand the structure of the FLWOR statement, many SQL concepts such as sorting and selecting distinct values are easily learned.
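For example, a SQL query such as SELECT DISTINCT dept FROM emp ORDER BY dept maps naturally onto a FLWOR expression (the collection and element names here are hypothetical):

for $dept in distinct-values(collection('/db/hr')//emp/dept)
order by $dept
return $dept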

XQuery and XSLT


Many developers once considered XSLT template-style transforms ideal for transforming documents, and XQuery ideal for querying more structured XML data such as a collection of book metadata. Recent work with typeswitch-style transforms has shown that XQuery modules and functions can be used to create document-style transforms that rival most XSLT transforms. And because many XQuery systems leverage document indexes, they can be considerably faster than XSLT transforms, which were never designed to use indexed XML structures.

References
W3C Papers from 1998 on XML Query Languages [4]
[1] http://www.w3.org/TR/xquery-11/#id-inline-func
[2] http://en.wikibooks.org/wiki/Computer_programming/Functional_programming
[3] http://www.w3.org/TandS/QL/QL98/Overview.html
[4] http://www.w3.org/TandS/QL/QL98/pp.html


Basic Authentication
Motivation
You want to use a very basic login process over a secure network such as a secure Intranet or over an SSL connection.

Method
We will use the base64 encoding and decoding tools to generate the right strings.

xquery version "1.0";
let $user := 'Aladdin'
let $password := 'open sesame'
let $credentials := concat($user, ':', $password)
let $encode := util:string-to-binary($credentials)
return
<results>
   <user>{$user}</user>
   <password>{$password}</password>
   <encode>{$encode}</encode>
</results>

This returns the following:

<results>
   <user>Aladdin</user>
   <password>open sesame</password>
   <encode>QWxhZGRpbjpvcGVuIHNlc2FtZQ==</encode>
</results>
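The encoding can be reversed with eXist's util:binary-to-string() function; as a sketch, decoding the value above should return the original credentials string:

let $decoded := util:binary-to-string(xs:base64Binary('QWxhZGRpbjpvcGVuIHNlc2FtZQ=='))
return $decoded   (: Aladdin:open sesame :)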

Sample HTTP GET using Basic Authentication


xquery version "1.0"; declare function local:basic-get-http($uri,$username,$password) { let $credentials := concat($username,":",$password) let $credentials := util:string-to-binary($credentials) let $headers := <headers> <header name="Authorization" value="Basic {$credentials}"/> </headers> return httpclient:get(xs:anyURI($uri),false(), $headers) }; let $host := "http://localhost:8080" let $path := "/exist/rest/db/apps/terms/data/1.xml" let $uri := concat($host, $path)

Basic Authentication let $user := 'my-login' let $password := 'my-password' return local:basic-get-http($uri,$username,$password)


References
Wikipedia Entry on Basic Authentication: http://en.wikipedia.org/wiki/Basic_access_authentication

Basic Feedback Form


Motivation
You want to gather feedback from visitors.

Implementation
A simple HTML form gathers suggested improvements and an email address. The suggestion is emailed to one of the authors and an acknowledgment sent to the submitter. Here the default send-mail client on the eXist implementation at UWE, Bristol is used.

XQuery script
xquery version "1.0";

(: A simple Feedback form using the eXist mail module :)

import module namespace mail="http://exist-db.org/xquery/mail"; declare option exist:serialize "method=xhtml media-type=text/html";

let $comment := normalize-space(request:get-parameter("comment",""))
let $email := normalize-space(request:get-parameter("email",""))
return
<html>
  <head>
    <title>Feedback on the XQuery Wikibook</title>
  </head>
  <body>
    <h1>Feedback on the XQuery Wikibook</h1>
    <form method="post">
      Please let us know how this Wikibook could be improved.<br/>
      <textarea name="comment" rows="5" cols="80"/><br/>
      Your email address <input type="text" name="email" size="60"/>
      <input type="submit" value="Send"/>
    </form>
    {if ($email ne "" and $comment ne "")
     then
       let $commentMessage :=
          <mail>
             <from>{$email}</from>
             <to>kit.wallace@gmail.com</to>
             <subject>Wikibook Feedback</subject>
             <message>
                <text>{$comment}</text>
             </message>
          </mail>
       let $ackMessage :=
          <mail>
             <to>{$email}</to>
             <from>kit.wallace@gmail.com</from>
             <subject>Wikibook Feedback</subject>
             <message>
                <text>Many thanks for your feedback - we appreciate your interest in this collaborative work.</text>
             </message>
          </mail>
       let $sendcomment := mail:send-email($commentMessage,(),())
       let $sendack := mail:send-email($ackMessage,(),())
       return
         if ($sendcomment and $sendack)
         then
           <div>
             <h2>Feedback</h2>
             <p>You suggested that the XQuery Wikibook could be improved by:<br/>
                <em>{$comment}</em>.
                <br/>Thanks for the feedback.</p>
           </div>
         else <p>Something went wrong - please try again</p>
     else if ($comment ne "")
     then <div>Please provide an email address so that we can let you know of progress on your suggestion.</div>
     else ()
    }
  </body>
</html>


Feedback Form [1]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/mail/feedback.xq


Basic Search
Motivation
You want to create a basic HTML search page and search service.

Method
We will create two files. One is an HTML form and the other is a RESTful search service that takes a single parameter from the URL, which is the search query. The search service will search a collection of XML files.

Here is the base path to our test search collection:
/db/test/search

The data to be searched will be in the following collection:
/db/test/search/data

In "Browse Collections" in the admin interface, create the collection "test"; create the collection "search" under it; lastly, create the collection "data" under "search". Upload the two XML documents listed under "Sample Data" to "data"; upload "search-form.xq" and "search.xq" to "search" (instead of uploading, you can Save to URL using oXygen, or use the Webstart client).
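If you prefer to create the collections from a query rather than the admin interface, the following sketch uses eXist's xmldb module (it must be run by a user with write permission on /db):

let $test   := xmldb:create-collection('/db', 'test')
let $search := xmldb:create-collection('/db/test', 'search')
let $data   := xmldb:create-collection('/db/test/search', 'data')
(: store one of the sample documents listed under "Sample Data" below :)
return xmldb:store('/db/test/search/data', '1.xml', <item><fruit>apple</fruit></item>)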

Search Form
/db/test/search/search-form.xq
We will create a basic HTML form that has just one input field for the query.

declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

let $title := 'Basic Search Form'
return
<html>
   <head>
      <title>{$title}</title>
   </head>
   <body>
      <h1>{$title}</h1>
      <form method="GET" action="search.xq">
         <p>
            <strong>Keyword Search:</strong>
            <input name="q" type="text"/>
         </p>
         <p>
            <input type="submit" value="Search"/>
         </p>
      </form>
   </body>
</html>

Note that the action will pass the value from the form to a RESTful service. The only parameter will be "q", the query string.


Search Service
The following file should be placed in /db/test/search/search.xq

/db/test/search/search.xq
xquery version "1.0"; declare option exist:serialize "method=xhtml media-type=text/html indent=yes"; let $title := 'Simple Search RESTful Service' let $data-collection := '/db/test/search/data' (: get the search query string from the URL parameter :) let $q := request:get-parameter('q', '') return <html> <head> <title>{$title}</title> </head> <body> <h1>Search Results</h1> <p><b>Searching for: </b>{$q} in collection: {$data-collection}</p> <ol>{ for $fruit in collection($data-collection)/item[fruit/text() = $q] return <li>{data($fruit)}</li> }</ol> </body> </html>

Running your Search


To test your search service from a URL, copy the following into the browser navigation toolbar:

http://localhost:8080/exist/rest/db/test/search/search.xq?q=apple

You should see a results page headed "Search Results" that lists the matching item, apple.

To drive this service from a form, click the following link or copy it into your browser navigation toolbar:

http://localhost:8080/exist/rest/db/test/search/search-form.xq


Sample Data for /db/test/search/data


/db/test/search/data/1.xml
<item>
   <fruit>apple</fruit>
</item>

/db/test/search/data/2.xml
<item>
   <fruit>banana</fruit>
</item>

Basic Session Management


Motivation
You want to associate some behavior of your web application with a user's login session.

Method
There are several functions provided by eXist and other web servers to manage information associated with a login session.

xquery version "1.0";
let $session-attributes := session:get-attribute-names()
return
<results>
   {for $session-attribute in $session-attributes
    return <session-attribute>{$session-attribute}</session-attribute>
   }
</results>

Before you add any session attributes this might return only a single variable such as:

<results>
   <session-attribute>_eXist_xmldb_user</session-attribute>
</results>

xquery version "1.0";
(: set the group and role :)
let $set-dba-group := session:set-attribute('group', 'dba')
let $set-role-editor := session:set-attribute('role', 'editor')
let $session-attributes := session:get-attribute-names()
return
<results>
   {for $session-attribute in $session-attributes
    return <session-attribute>{$session-attribute}</session-attribute>
   }
</results>

This will return the following attributes:

<results>
   <session-attribute>group</session-attribute>
   <session-attribute>role</session-attribute>
   <session-attribute>_eXist_xmldb_user</session-attribute>
</results>

These attributes will remain associated with the user until the user logs out or their session times out, typically after 15 minutes of inactivity. One sample use of session attributes is to keep track of user interface preferences. For example, if a user wants to have their data sorted by a person's zip code, you can add that to their session:

let $set-sort := session:set-attribute('sort', 'zip-code')
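A stored preference can later be read back with session:get-attribute(); the fallback value used here is hypothetical:

let $sort := session:get-attribute('sort')
return if (exists($sort)) then $sort else 'last-name'   (: fall back to a default when no preference has been set :)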


BBC Weather Forecast



BBC Weather forecasts
Some weather data is available from the BBC as RSS feeds. Currently this includes the current conditions [1] and the 3-day forecast [2]. Lacking a standard set of tags for weather properties, the conditions are expressed in a string, and string parsing is needed to access the elemental data. For other forecasts such as the 24-hour and 5-day forecasts, which are not available as RSS, we must scrape the HTML page. One approach to this task is this Yahoo Pipe [3] which converts the page to an RSS feed. However the data would be more useful converted to XML elements.

Dates and times


In all these pages and feeds there is a problem in assigning a date to a forecast or observation. Dates are often omitted or expressed as a day-of-the-week. This leads to complications in processing both RSS and HTML pages.

24-hour forecast
This script uses the eXist module httpclient to get the HTML, parses the HTML and generates an XML file. This XML could then be transformed via XSLT to a viewable page.

Interface
This script has two parameters:

region - required - a numeric code unique to the BBC (? code list)
area - optional - a sub-region, typically the beginning of the postcode
declare namespace h ="http://www.w3.org/1999/xhtml";

declare function local:day-of-week($date) { ('Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat') [ xs:integer(($date - xs:date('1901-01-06')) div xs:dayTimeDuration('P1D')) mod 7 +1] };

let $area := request:get-parameter("area",())
let $region := request:get-parameter("region","2")
let $url := concat("http://news.bbc.co.uk/weather/forecast/", $region, "?state=fo:B",
                   if (exists($area)) then concat("&amp;area=", $area) else ())
let $doc := httpclient:get(xs:anyURI($url), false(), ())
let $currentDate := current-date()
let $currentTime := current-time()
let $dow := local:day-of-week($currentDate)
return
element forecasts {
   element region {$region},
   if (exists($area)) then element area {$area} else (),
   element source {"BBC"},
   for $row in $doc/httpclient:body//h:table/h:tbody/h:tr
   let $raw-time := normalize-space($row/h:td[1])
   let $time := if (contains($raw-time," ")) then substring-before($raw-time," ") else $raw-time
   let $time := xs:time(concat($time,":00"))
   let $pdow := if (contains($raw-time,"(")) then substring-before(substring-after($raw-time,"("),")") else $dow
   let $date := if ($pdow ne $dow) then $currentDate + xs:dayTimeDuration("P1D") else $currentDate
   return
   element forecast {
      element date {$date},
      element time {$time},
      element dow {$pdow},
      element summary {string($row/h:td[2]//h:p[@class="sum"])},
      element imageurl {string($row/h:td[2]//h:div[@class="summary"]//h:img/@src)},
      element maxTemp {attribute units {"degc"}, $row/h:td[3]//h:span[@class="cent"]/text()},
      element maxTemp {attribute units {"degf"}, $row/h:td[3]//h:span[contains(@class,"fahr")]/text()},
      element windDirection {string($row/h:td[4]//h:span[contains(@class,"wind")]/@title)},
      element windSpeed {attribute units {"mph"}, substring-before($row/h:td[4]//h:span[contains(@class,"mph")], "mph")},
      element windSpeed {attribute units {"kph"}, substring-before($row/h:td[4]//h:span[contains(@class,"kph")], "km/h")},
      element humidity {attribute units {"%"}, normalize-space(substring-before($row/h:td[5]//h:span[contains(@class,"hum")], "%"))},
      element pressure {attribute units {"mb"}, normalize-space(substring-before($row/h:td[5]//h:span[@class="pres"], "mB"))},
      element visibility {normalize-space($row/h:td[5]//h:span[contains(@class,"vis")])}
   }
}

24 hour forecast for Bristol [4]

References
[1] http://newsrss.bbc.co.uk/weather/forecast/3/ObservationsRSS.xml
[2] http://newsrss.bbc.co.uk/weather/forecast/3/Next3DaysRSS.xml
[3] http://pipes.yahoo.com/pipes/pipe.edit?_id=1HlcTL8F3hGF7NSlPxJ3AQ
[4] http://www.cems.uwe.ac.uk/xmlwiki/weather/bbc24hforecast.xq?region=3

Benefits

Benefits of XQuery
The principal benefits of XQuery are:

Expressiveness - XQuery can query many different data structures and its recursive nature makes it ideal for querying tree and graph structures
Brevity - XQuery statements are shorter than similar SQL or XSLT programs
Flexibility - XQuery can query both hierarchical and tabular data
Consistency - XQuery has a consistent syntax and can be used with other XML standards such as XML Schema datatypes

XQuery is frequently compared with two other languages, SQL and XSLT, but has a number of advantages over these.

Advantages over SQL


Unlike SQL, XQuery returns not just tables but arbitrary tree structures. This allows XQuery to directly create XHTML structures that can be used in web pages. XQuery is for XML-based object databases, and object databases are much more flexible and powerful than databases which store in purely tabular format. Unlike XSLT, XQuery can be learned by anyone familiar with SQL. Many of the constructs are very similar, such as the following (a small illustrative FLWOR is sketched after the list):

Ordering Results: Both XQuery and SQL add an order by clause to the query.
Selecting Distinct Values: Both XQuery and SQL have easy ways to select distinct values from a result set.
Restricting Rows: Both XQuery and SQL have a WHERE X=Y clause that can be added to a query.

Another big advantage is that XQuery is essentially the native query language of the World Wide Web. One can query actual web pages with XQuery, but not SQL. Even if one uses SQL-based databases to store HTML/XHTML pages or fragments of such pages, one will miss many of the advantages of XQuery's simple tag/attribute search (which is akin to searching for column names within column names).
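Here is that small FLWOR. The element and attribute names (order, @customer) are invented purely for illustration and are not tied to any particular schema:

for $customer in distinct-values(//order/@customer)   (: selecting distinct values :)
where $customer != ''                                  (: restricting rows :)
order by $customer                                     (: ordering results :)
return <customer>{$customer}</customer>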

Advantages over XSLT


Unlike XSLT, XQuery can be quickly learned by anyone familiar with SQL. XSLT has many patterns that are unfamiliar to many procedural software developers. Also, whereas XSLT is good for use as a static means to convert one type of document to another, for example RSS to HTML, XQuery is a much more dynamic querying tool, useful for pulling out sections of data from large documents and/or large numbers of documents.

The Debate about XQuery vs. XSLT for Document Transformation


There has been a debate of sorts about the merits of the two languages for transforming XML: XSLT and XQuery. A common misconception is that "XQuery is best for querying or selecting XML, and XSLT is best for transforming it." In reality, both methods are capable of transforming XML. Despite XSLT's longer history and larger install base, the "XQuery typeswitch" method of transforming XML provides numerous advantages.

Most people who need to transform XML hear that they need to learn a language called XSLT. XSLT, whose first version was published by the W3C in 1999, was a huge innovation for its time and, indeed, remains dominant. It was one of the very first languages dedicated to transforming XML documents, and it was the first domain-specific language (DSL) to use advanced theories from the world of functional programming to create very reliable, side-effect free transformations. Many XML developers still feel strongly indebted to this groundbreaking language, since it helped them see a new model of software development: one focused around the transformation of models and empowering them to fuse both the requirements and documentation of a transformation routine into a single, modular program.

On the other hand, learning XSLT requires overcoming a very substantial learning curve. XSLT's difficulty is due, in part, to one of the key design decisions by its architects: to express the transformation rules using XML itself, rather than creating a brand new syntax and grammar for storing the transformation rules. XSLT's unique approach to transformation rules also contributes to the steepness of the learning curve. The learning curve can be overcome, but it is fair to say that this learning curve has created an opening for an alternative approach. XQuery has filled this demand for an alternative among a growing community of users: they find XQuery has a lower learning curve, it meets their needs for transforming XML, and, together with XQuery's other advantages, it has become a compelling "all-in-one" language.

Like XSLT, XQuery was created by the W3C to handle XML. But instead of expressing the language in XML syntax, the architects of XQuery chose a new syntax that would be more familiar to users of server-side scripting languages such as PHP, Perl, or Python. XQuery was designed to be familiar to users of relational database query languages such as SQL, while still remaining true to functional programming practices. Despite its relative youth (XQuery 1.0 was only released in 2007 when XSLT had already reached its version 2.0), XQuery was born remarkably mature. XML servers like eXist-db and MarkLogic were already using XQuery as their language for querying XML and performing web server operations (obviating the need for learning PHP, Perl, or Python). So, in the face of the XSLT community's contention that "XSLT is best for transforming documents and XQuery is best for querying databases", this community of users was surprised to find that XQuery has entirely replaced their need for XSLT. They have come to argue unabashedly that they prefer XQuery for this purpose.

How does XQuery accomplish the task of transforming XML? The primary technique in XQuery for transforming XML is a little-known expression added by the authors of XQuery, called "typeswitch." Although it is quite simple, typeswitch enables XQuery to perform nearly the full set of transformations that XSLT does. A typeswitch expression quickly looks at a node's type, and depending on the node's type, performs the operation you specify for that type of node. What this means is that each distinct element of a document can have its own rule, and these rules can be stored in modular XQuery functions. This humble addition to the XQuery language allows developers to transform documents with complex content and unpredictable order - something commonly believed to be best reserved for the domain of XSLT.

Despite the differences in syntax and approach to transformation, a growing community has actually come to see the XQuery typeswitch expression as a valid, even superior, way to store their document transformation logic. By structuring a set of XQuery functions around the typeswitch expression, you can achieve the same result as XSLT-style transforms while retaining the benefits of XQuery: ease of learning and integration with native XML databases. Even more important for those users of native XML databases, the availability of typeswitch means that they only need to learn a single language for their database queries, web server operations, and document transformations.
These XQuery typeswitch routines have proved easy to build, test, and maintain - some believe easier than XSLT. XQuery typeswitch has given these users a high degree of agility, allowing them to master XQuery fully rather than splitting their time and attention between XQuery and XSLT. That said, there is still a large body of legacy XSLT transforms that work well, and there are XSLT developers who see little benefit from transitioning to a typeswitch-style XQuery. Both are valid approaches to document transformation. A natural tension has arisen between the proponents of XQuery typeswitch and XSLT, each promoting what they are most comfortable with and believe to be superior. In practice you might be best served by trying both techniques and determining what style is right for you and your organization. Without presuming a background or interest in XSLT, this article and its companion article help you to understand the key patterns for using XQuery typeswitch for your XML transformation needs.


Caching and indexes



Motivation
The views of the data about individual teams or groups need to be supplemented with indexes to the resources for which those views are appropriate. Generating the indexes on demand is one approach but it loads the SPARQL server. Given the batch nature of the DBpedia extract, it makes more sense to cache the index data and use the cache to generate an index page. (Triggering the cache refresh is another problem!)

Non-caching approach
The following script generates an index page with links to the HTML view and the timeline view for each artist.
declare option exist:serialize "method=xhtml media-type=text/html";

declare variable $query := "
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
   ?group skos:subject <http://dbpedia.org/resource/Category:Rock_and_Roll_Hall_of_Fame_inductees>.
}
";

declare function local:clean($text) {
   let $text := util:unescape-uri($text, "UTF-8")
   let $text := replace($text, "\(.*\)", "")
   let $text := replace($text, "_", " ")
   return $text
};

let $category := request:get-parameter("category", "")
let $categoryx := replace($category, "_", " ")
let $queryx := replace($query, "Rock_and_Roll_Hall_of_Fame_inductees", $category)
let $sparql := concat("http://dbpedia.org/sparql?default-graph-uri=",
                      escape-uri("http://dbpedia.org", true()),
                      "&amp;query=", escape-uri($queryx, true()))
let $result := doc($sparql)
return
<html>
   <body>
      <h1>{$categoryx}</h1>
      <table border="1">
      {
         for $row in $result/table//tr[position() > 1]
         let $resource := substring-after($row/td[1], "resource/")
         let $name := local:clean($resource)
         order by $name
         return
            <tr>
               <td>{$name}</td>
               <td><a href="group2html.xq?group={$resource}">HTML</a></td>
               <td><a href="groupTimeline.xq?group={$resource}">Timeline</a></td>
            </tr>
      }
      </table>
   </body>
</html>


Index examples
Rock and Roll Groups [1]

Caching Approach
Two scripts are needed - one to generate the data to cache, the other to generate the index page. The approach is illustrated with an index to Rock and Roll groups based on the Wikipedia category Rock and Roll Hall of Fame inductees.

Generate the index data


This script generates an XML file. A further development would store the XML directly to the database but it could also be saved manually to the appropriate location. It is parameterised by a category.
declare variable $query := "
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
   ?group skos:subject <http://dbpedia.org/resource/Category:Rock_and_Roll_Hall_of_Fame_inductees>.
}
";

declare function local:clean($text) { let $text:= util:unescape-uri($text,"UTF-8") let $text := replace($text,"\(.*\)","") let $text := replace($text,"_"," ") return $text };

declare function local:table-to-seq($table) {
   let $head := $table/tr[1]
   for $row in $table/tr[position() > 1]
   return
      <tuple>
        {for $cell at $i in $row/td
         return element {$head/th[position() = $i]} {string($cell)}
        }
      </tuple>
};


let $category := request:get-parameter("category", "Rock_and_Roll_Hall_of_Fame_inductees")
let $queryx := replace($query, "Rock_and_Roll_Hall_of_Fame_inductees", $category)
let $sparql := concat("http://dbpedia.org/sparql?default-graph-uri=",
                      escape-uri("http://dbpedia.org", true()),
                      "&amp;query=", escape-uri($queryx, true()))
let $result := doc($sparql)/table
let $groups := local:table-to-seq($result)
return
<ResourceList category="{$category}">
   {for $group in $groups
    let $resource := substring-after($group/group, "resource/")
    let $name := local:clean($resource)
    order by $name
    return <resource id="{$resource}" name="{$name}"/>
   }
</ResourceList>

Note: I guess a better approach would be to use triples here, saved to a local triple store.

HTML index page


This script, groupList, uses the cached index data.

declare option exist:serialize "method=xhtml media-type=text/html";

let $list := //ResourceList[@category="Rock_and_Roll_Hall_of_Fame_inductees"]
return
<html>
   <body>
      <h1>Rock Groups</h1>
      <table border="1">
      {
         for $resource in $list/resource
         order by $resource/@name
         return
            <tr>
               <td>{string($resource/@name)}</td>
               <td><a href="group2html.xq?group={$resource/@id}">HTML</a></td>
               <td><a href="groupTimeline.xq?group={$resource/@id}">Timeline</a></td>
            </tr>
      }
      </table>
   </body>
</html>


Execute
Rock and Roll groups [2]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/RDF/groupIndex.xq?category=Rock_and_Roll_Hall_of_Fame_inductees
[2] http://www.cems.uwe.ac.uk/xmlwiki/RDF/groupList.xq

Chaining Web Forms


Motivation
You want to create a series of web pages that pass information from one page to the next. This is very typical in web application development for example in the creation of "wizards" that ask the user for a series of questions on separate web pages.

Methods
We will use three methods to demonstrate this:

on the client using URL parameters and hidden form fields
on the client in cookies
on the server using sessions

Using URL Parameters and Hidden Form Fields


In this method we will use a series of HTML forms in successive pages. Each page will gather some information and pass this information on to the next form by adding additional parameters to the URL. We will use the request:get-parameter functions to get the key-value pairs from the URL.

Our first form will ask the user for their name. The second will ask them their favorite color. Here is an example of the first form:

question-1.html

<html>
   <head>
      <title>Question 1: Your Name</title>
   </head>
   <body>
      <h1>Question 1</h1>
      <form action="02-web-form.xq">
         <span class="label">Please enter your first name:</span>
         <input type="text" name="name"/><br/>
         <input type="submit" value="Next Question"/>
      </form>
   </body>
</html>

The URL is passed to the second form and we will use the request:get-parameter() function to get the value from the URL. Here is the XQuery function for question 2.

question-2.xq
xquery version "1.0"; declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=yes indent=yes"; let $name := request:get-parameter('name', '') let $title := 'Question 2: Enter Your Favorite Color' return <html> <head> <title>{$title}</title> </head> <body> <h1>{$title}</h1> <form action="03-result.xq"> <span class="label">Hello {$name}. Please enter your favorite color:</span> <input type="hidden" name="name" value="{$name}"/> <input type="text" name="color"/><br/> <input type="submit" value="Results"/> </form> </body> </html>


Note that we are storing the incoming name in a hidden input field in the form. The value of the hidden field must take the value of the incoming {$name} parameter. The last page just gets the two input parameters and displays them in an HTML page. If you look at the URL it will be of the format: result.xq?name=dan&color=blue result.xq xquery version "1.0"; declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=yes indent=yes"; let $name := request:get-parameter('name', '') let $color := request:get-parameter('color', '')

let $title := 'Result'
return
<html>
   <head>
      <title>{$title}</title>
   </head>
   <body>
      <h1>{$title}</h1>
      <p>Hello {$name}. Your favorite color is {$color}</p>
   </body>
</html>

Discussion

This method is the preferred method since it does not require the client browser to support cookies. It also does not require the users to have a login or manage sessions. Sessions have the disadvantage that if the user gets interrupted halfway through the process their session information will be lost and all the data they entered will need to be re-entered. Note that although the first "name" parameter is not visible in the second form, the value is visible in the URL. So the term "hidden" does not apply to the URL, only the form.


Using Cookies
In this example we will use the following functions for setting and getting cookies:

response:set-cookie($name as xs:string, $value as xs:string) empty()
request:get-cookie-value($cookie-name as xs:string) xs:string?

The first form is identical to the example above, but the second form now stores the incoming name in a cookie rather than in a hidden field:
xquery version "1.0"; declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=yes indent=yes"; (: get the input and set the name cookie :) let $name := request:get-parameter('name', '') let $set-cookie := response:set-cookie('name', $name) let $title := 'Question 2: Enter Your Favorite Color' return <html> <head> <title>{$title}</title> </head> <body> <h1>{$title}</h1> <form action="03-result.xq"> <span class="label">Hello {$name}. Please enter your favorite color:</span>



<input type="text" name="color"/><br/> <input type="submit" value="Results"/> </form> </body> </html>


The previous form set the name cookie's value, and the final result page reads it back:

xquery version "1.0";

declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=yes indent=yes";

let $name := request:get-cookie-value('name')
let $color := request:get-parameter('color', '')
let $title := 'Result From Cookies'
return
<html>
   <head>
      <title>{$title}</title>
   </head>
   <body>
      <h1>{$title}</h1>
      <p>Hello {$name}. Your favorite color is {$color}</p>
   </body>
</html>

Discussion

Using cookies can be complex and you must be very careful that your cookies are not changed by another application from the same domain. Your design must also consider the fact that browsers and users can disable cookies.

Using Sessions
The last method is to use server session values to store the key-value data. This will be very similar to the last example but we will use the eXist Session [1] module functions to set and get the values. Here are the two calls we will need:

session:set-attribute($name as xs:string, $value as item()*) empty()
session:get-attribute($name as xs:string) xs:string*

You only need to change a few lines of the 2nd form. Just change the lines to the following:

(: get the name and set the session :)
let $name := request:get-parameter('name', '')
let $set-session := session:set-attribute('name', $name)

and in the final result script just get the data from the session:

let $name := session:get-attribute('name')

Discussion

Using sessions can also be complex if you are new to session management. There are many rules that govern session timeouts, and both the web server and database server may need to be configured to take your users' needs into account. Session management may also not be appropriate for public web sites that have policies against collecting information on the web server.


Trade off Analysis


There are many points to consider. Storing information in URLs has many advantages since users can start a multi-step form and come back later to finish. As long as they do not shut down their browser the URL parameters will remain. Cookies will remain on the client until the user takes some action to remove them. These are very useful when you do not want to have a person re-enter data for each session. Cookies tend to be ideal for storing user preferences when you do not have the ability to store them on the server. Sessions are most useful when you have users authenticate with a login, but data is lost when the users log out or their session times out.

References
[1] http://demo.exist-db.org/functions/session

Changing Permissions on Collections and Resources


Motivation
You want to change permissions on a set of collections and resources.

Method
There are two functions we will use. For collections:

xmldb:chmod-collection($collection, $perm)

and for resources:

xmldb:chmod-resource($collection, $resource, $perm)

The $perm argument is a decimal number. As of eXist 1.5 you can use the function xmldb:string-to-permissions("rwurwu---") to get this decimal number.
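As a minimal sketch of the two calls (the collection path, resource name and permission string are purely illustrative):

(: make one collection and one of its documents readable, but not writable, by others :)
xmldb:chmod-collection('/db/test/search', xmldb:string-to-permissions('rwurwur--')),
xmldb:chmod-resource('/db/test/search/data', '1.xml', xmldb:string-to-permissions('rwurwur--'))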

Sample to get decimal values for guest permissions


xquery version "1.0";

<results>
   <guest-none>{xmldb:string-to-permissions("rwurwu---")}</guest-none>
   <guest-read>{xmldb:string-to-permissions("rwurwur--")}</guest-read>
   <guest-read-write>{xmldb:string-to-permissions("rwurwurw-")}</guest-read-write>
   <guest-all>{xmldb:string-to-permissions("rwurwurwu")}</guest-all>
</results>


Returns the following <results> <guest-none>504</guest-none> <guest-read>508</guest-read> <guest-read-write>510</guest-read-write> <guest-all>511</guest-all> </results>

Recursive Script to Remove All Guest Permissions


xquery version "1.0"; declare function local:chmod-collection($collection) { xmldb:chmod-collection($collection, xmldb:string-to-permissions("rwurwu---")), for $child in xmldb:get-child-collections($collection) return local:chmod-collection(concat($collection, "/", $child)) }; system:as-user('my-login', 'password', ( local:chmod-collection("/db/collection"), for $doc in collection("/db/collection") return xmldb:chmod-resource(util:collection-name($doc), util:document-name($doc), xmldb:string-to-permissions("rwurwu---")) ) )

Warning: removing all guest permissions breaks several features. You must then run many functions as a non-guest user.

Compare two XML files



Motivation
You want to compare two XML files. If the files are the same you want to return true and if not you want to return false. Note: if you want to see the actual differences, see the ../XML Differences example.

Method
We will use the xdiff:compare() function that comes built in to eXist. To use this you pass two nodes to the compare function: xdiff:compare($node1 as node(), $node2 as node())

Sample Source Code


Assume you have two different XML files:

xquery version "1.0";

import module namespace xdiff="http://exist-db.org/xquery/xmldiff"
   at "java:org.exist.xquery.modules.xmldiff.XmlDiffModule";

let $doc1 := '/db/apps/xml-diffs/data/diff1.xml'
let $doc2 := '/db/apps/xml-diffs/data/diff2.xml'
return
<results>
   <result>diff of 1,1: {xdiff:compare(doc($doc1), doc($doc1))}</result>
   <result>diff of 1,2: {xdiff:compare(doc($doc1), doc($doc2))}</result>
   <result>diff of 2,2: {xdiff:compare(doc($doc2), doc($doc2))}</result>
</results>

The result will be:

<results>
   <result>diff of 1,1: true</result>
   <result>diff of 1,2: false</result>
   <result>diff of 2,2: true</result>
</results>

Compare with XQuery



Methods
We will use a variety of functions that iterate through the lists. For each example we will perform some comparison with the second list.

Simple iteration and test for missing elements


In this first example, we will use a simple for loop to go through each item on a linear list. We then will check to see if that item is anywhere on the second list (regardless of order). If it is on the second list we will display the item from the first list. If not we will output "missing". This is useful if you want to find out if a local collection is missing files from some remote collection. xquery version "1.0"; (: Compare of two linear lists :) let $list1 := <list1> <item>a</item> <item>b</item> <item>c</item> <item>d</item> </list1>

let $list2 := <list1> <item>a</item> <item>c</item> <item>e</item> </list1> return <missing>{ for $item1 in $list1/item let $item-text := $item1/text() return <test item="{$item-text}"> {if ($list2/item/text()=$item-text) then ($item1) else <missing>{$item-text}</missing> } </test> }</missing> Note that the conditional expression:

if ($list2/item/text() = $item-text)

Tests to see if the $item-text is anywhere in list2. If it occurs anywhere this expression will return true().


Sample Results
<missing> <test item="a"> <item>a</item> </test> <test item="b"> <missing>b</missing> </test> <test item="c"> <item>c</item> </test> <test item="d"> <missing>d</missing> </test> </missing> Note that this will not report any items on the second list that are missing from the first list.

Using Quantified Expressions


This can be rewritten using XQuery quantified expressions. There are two reasons for this. First the XQuery optimizer can frequently run quantified expressions much faster and some people feel they are easier to read. See XQuery/Quantified Expressions for more details. In this second example the list assignments are the same but we will only display the items from list 1 that are missing from list 2. <missing>{ for $item1 in $list1/item return if (some $item2 in $list2/item satisfies $item2/text() = $item1/text()) then () else $item1 }</missing> This returns: <missing> <item>b</item> <item>d</item> </missing> We are now ready to modularize this missing function so that we can pass any two lists to find missing elements.



Creating a Missing XQuery Function


Our next step is to create an XQuery function that compares any two lists and returns the items in the first list that are not in the second list.

declare function local:missing($list1 as node()*, $list2 as node()*) as node()* {
   for $item1 in $list1/item
   let $item-text := $item1/text()
   return
      if (some $item2 in $list2/item satisfies $item2/text() = $item1/text())
      then ()
      else $item1
};

We can rewrite the output function to use this function: <results> <missing-from-2>{local:missing($list1, $list2)}</missing-from-2> <missing-from-1>{local:missing($list2, $list1)}</missing-from-1> </results> Note that the order of the lists has been reversed in the second call to the missing() function. The second pass looks for items on list2 that are not on list1. Running this query generates the following output: <results> <missing-from-2> <item>b</item> <item>d</item> </missing-from-2> <missing-from-1> <item>e</item> </missing-from-1> </results>



Creating HTML Difference Lists


We can use CSS to style the output of these reports.

Screen Image

HTML Diff Report using CSS

Sample Data
This example uses full words of items to show text highlighting: let $list1 := <list> <item>apples</item> <item>bananas</item> <item>carrots</item> <item>kiwi</item> </list>

let $list2 := <list> <item>apples</item> <item>carrots</item> <item>grapes</item> </list> The following function uses HTML div and span elements and adds class="missing" to each div that is missing. The CSS file will highlight this background. declare function local:missing($list1 as node()*, $list2 as node()*) as node()* { for $item1 in $list1/item return if (some $item2 in $list2/item satisfies $item2/text() = $item1/text()) then <div>{$item1/text()}</div> else <div> {attribute {'class'} {'missing'}} {$item1/text()}

   </div>
};

We then use the following CSS file to highlight the differences. Each missing element must have a class="missing" attribute to be highlighted in this report.

body {font-family: Ariel,Helvetica,sans-serif; font-size: large;}
h2 {padding: 3px; margin: 0px; text-align: center; font-size: large; background-color: silver;}
.left, .right {border: solid black 1px; padding: 5px;}
.missing {background-color: pink;}
.left {float: left; width: 190px}
.right {margin-left: 210px; width: 190px}

<body>
   <h1>Missing Items Report</h1>
   <div class="left">
      <h2>List 1</h2>
      {for $item in $list1/item return <div>{$item/text()}</div>}
   </div>
   <div class="right">
      <h2>List 2</h2>
      {for $item in $list2/item return <div>{$item/text()}</div>}
   </div>
   <br/>
   <div class="left">
      <h2>List 1 Missing from 2</h2>
      {local:missing($list1, $list2)}
   </div>
   <div class="right">
      <h2>List 2 Missing from 1</h2>
      {local:missing($list2, $list1)}
   </div>
</body>


Collation
If the lists are in sorted order, or can be sorted into order, an alternative approach is to recursively collate the two lists. The core algorithm looks like: declare function local:merge($a, $b as item()* ) as item()* { if (empty($a) and empty($b)) then () else if (empty ($b) or $a[1] lt $b[1]) then ($a[1], local:merge(subsequence($a, 2), $b)) else if (empty($a) or $a[1] gt $b[1]) then ($b[1],local:merge($a, subsequence($b,2))) else (: a and b matched :) ($a[1], $b[1], local:merge(subsequence($a,2), subsequence($b,2)))

};


With the example above, we can merge two lists.

let $list1 := <list> <item>apples</item> <item>bananas</item> <item>carrots</item> <item>kiwi</item> </list>

let $list2 := <list> <item>apples</item> <item>carrots</item> <item>grapes</item> </list> return <result> {local:merge($list1/item,$list2/item) } </result> Execute [1] The actions on merge will depend on the application and the algorithm can be modified to output only mismatched items on one or other list, and handle matching items appropriately. For example, to display the merged list as HTML, we might modify the algorithm to:
declare function local:merge($a, $b as item()*) as item()* {
   if (empty($a) and empty($b))
   then ()
   else if (empty($b) or $a[1] lt $b[1])
   then (<div class="left">{$a[1]/text()}</div>, local:merge(subsequence($a, 2), $b))
   else if (empty($a) or $a[1] gt $b[1])
   then (<div class="right">{$b[1]/text()}</div>, local:merge($a, subsequence($b, 2)))
   else (<div class="match">{$a[1]/text()}</div>, local:merge(subsequence($a, 2), subsequence($b, 2)))
};

Execute [2]



References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Basics/collate1.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/Basics/collate3.xq

Creating a Timeline
Motivation
You want to create a timeline of event data. Timelines show events in a horizontal scrolling view.

Method
We will use the JavaScript client Timeline widgets provided by the Simile-Widgets project [1]. In this example we will be using the timeline 2.2.0 API calls.

To do this we need to transform a list of event dates into the proper formats and then create an HTML page that includes calls to the Simile JavaScript libraries.

Steps
1. View sample Event XML File format
2. View HTML template that loads XML file
3. Create XQuery Function that generates the HTML template and loads the appropriate XML data file

Our first example will use a list of non-Duration Events (Instant Events). We will explore duration events and other events in a future chapter. We will then create a simple XQuery module with a single function that loads a simple timeline.

Sample XML File Using Standard XML Date Formats


Most XML dates use ISO 8601 coding. To use this format you must put a date format attribute in the data file.

<data date-time-format="iso8601">
   <event start="2009-01-01" isDuration="false">
      First Day of January, 2009
   </event>
   <event start="2009-02-01" isDuration="false">
      First Day of February, 2009
   </event>
</data>

Note that the data file must specify iso8601 as the date-time-format when ISO 8601 dates are used.

HTML Driver Template


The sample HTML file shows how this XML file is loaded using the Timeline.loadXML() function.
<html> <head> <script src="http://static.simile.mit.edu/timeline/api-2.2.0/timeline-api.js" type="text/javascript"></script> <script type="text/javascript">

<![CDATA[

var tl;


function onLoad() {
   var eventSource = new Timeline.DefaultEventSource();
   var bandInfos = [
     Timeline.createBandInfo({
         eventSource:    eventSource,
         date:           "Jan 01 2009 00:00:00 GMT",
         width:          "70%",
         intervalUnit:   Timeline.DateTime.MONTH,
         intervalPixels: 100
     }),
     Timeline.createBandInfo({
         eventSource:    eventSource,
         date:           "Jan 01 2009 00:00:00 GMT",
         width:          "30%",
         intervalUnit:   Timeline.DateTime.YEAR,
         intervalPixels: 200
     })
   ];
   bandInfos[1].syncWith = 0;
   bandInfos[1].highlight = true;

   tl = Timeline.create(document.getElementById("my-timeline"), bandInfos);
   Timeline.loadXML("example-01.xml", function(xml, url) { eventSource.loadXML(xml, url); });
}

var resizeTimerID = null; function onResize() { if (resizeTimerID == null) { resizeTimerID = window.setTimeout(function() { resizeTimerID = null; tl.layout(); }, 500); } } ]]> </script> </head> <body onload="onLoad();" onresize="onResize();"> <h1>Timeline Template</h1> <div id="my-timeline" style="height: 150px; border: 2px solid blue">

</div>

<noscript> This page uses Javascript to show you a Timeline. Please enable Javascript in your browser to see the full page. Thank you. </noscript> </body> </html>


Sample Image
This will produce the following example:

Sample Timeline Output For Two Events

Sample XML Event File Using Non-Standard XML Date Formats


<data> <event start="Jan 01 2009 00:00:00 GMT" isDuration="false" title="First Day of the New Year"> First Day of the New Year</event> <event start="Feb 01 2009 00:00:00 GMT" isDuration="false" title="First Day of the Feb"> First Day of the Feb</event> </data>
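Step 3 of the method calls for an XQuery that produces this event XML. Below is a minimal sketch: the collection path /db/test/timeline/events and the stored <event date="..." title="..."/> record format are assumptions for illustration, not part of the Simile API.

xquery version "1.0";

declare option exist:serialize "method=xml media-type=text/xml indent=yes";

(: build the Timeline event file from locally stored event records :)
let $events := collection('/db/test/timeline/events')//event
return
<data date-time-format="iso8601">
   {for $event in $events
    return
      <event start="{$event/@date}" isDuration="false">
         {string($event/@title)}
      </event>
   }
</data>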

References
[1] http://code.google.com/p/simile-widgets/wiki/Timeline

Creating XQuery Functions



Motivation
You want to avoid duplication of XQuery code or create more modular XQuery programs.

Method
Use XQuery functions to encapsulate any chunk of XQuery code with a function wrapper. Any time you see a grouping of XQuery or XML code in your XQuery program that you would like to standardize, it is good design to start creating your own XQuery functions.

Static Content
Static content is content that is fixed and is not changed by the use of parameters. XQuery functions are ideal for storage of static content libraries. For example, if all your HTML pages have the same block of code that has your logo and header text, you can create a simple XQuery function that encodes this functionality. Here is the HTML code you want to standardize on: <div class="web-page-header"> <img src="images/mylogo.jpg" alt="Our Logo"/> <h1>Acme Widgets Inc.</h1> </div> declare function local:header() as node() { <div class="web-page-header"> <img src="images/mylogo.jpg" alt="Our Logo"/> <h1>Acme Widgets Inc.</h1> </div> }; When you want to reference this you just call the function by placing it in your HTML page and enclosing it in curly braces: <html> <head> <title>Sample Web Page</title> </head> <body> {local:header()} </body> </html> Note that these functions names are preceded by "local:". This is the default namespace of a function invoked only in the same XQuery main module. If you want to store your functions in a separate file, you can do so. Such a file is called a "library module". To make use of the functions in this module, you need to "import" the module in the prolog of your query. The benefit of storing your code in functions and modules is that if you ever need to make a change to a function, you only have to make the change in one location, rather than in the many locations where you've copied and pasted the same code.

The following file, which we will save as webpage.xqm, is an example of this (note also the addition of a footer function):

module namespace webpage='http://www.example.com/webpage';

declare function webpage:header() as node() {
   <div class="web-page-header">
      <img src="images/mylogo.jpg" alt="Our Logo"/>
      <h1>Acme Widgets Inc.</h1>
   </div>
};

declare function webpage:footer() as node() {
   <div class="web-page-footer">
      <img src="images/mylogo.jpg" alt="Our Logo"/>
      <p>Acme Widgets Inc.</p>
   </div>
};

The module begins with a declaration of the module's namespace. Here we use the prefix "webpage" bound to an arbitrary namespace URI.


Static Page Assembly


To use this function module you must import the module at the top of your XQuery file (the following "import module" expression assumes your XQuery file is in the same directory as the module file, webpage.xqm): xquery version "1.0"; import module namespace webpage='http://www.example.com/webpage' at 'webpage.xqm'; let $title := 'Sample Web Page' return <html> <head> <title>{$title}</title> </head> <body> {webpage:header()} <h1>{$title}</h1> <div class="content">Content goes here.</div> {webpage:footer()} </body> </html> Example [1]


Dynamic Content
Unlike static content, dynamic content can be modified by including parameters into the function. One very common approach is to use a "page-assembler" function that includes parameters such as the document title and content. Here is an example of this function.

Dynamic Page Assembler Function


xquery version "1.0";

declare function webpage:assemble-page($title as xs:string, $content as node()) as node() {
   <html>
      <head>
         <title>{$title}</title>
      </head>
      <body>
         {webpage:header()}
         <h1>{$title}</h1>
         <div class="content">{$content}</div>
         {webpage:footer()}
      </body>
   </html>
};

Your web pages can now all reference a central page assembler like the following:

import module namespace webpage='http://www.example.com/webpage' at 'webpage.xqm';

let $title := 'Sample Web Page'
let $content := <p>Content goes here.</p>
return webpage:assemble-page($title, $content)

Example [2]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/functions/staticpage.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/eXist/functions/dynamicpage.xq

Dates and Time



Motivation
You want a quick reference page of sample functions that work with dates and times.

Method
We will provide a sample list of XQuery expressions and their results.

Current Date
This function returns the current date on the system that is executing the XQuery in W3C XML Schema date format: current-date() Result: 2010-05-28-05:00 Note that the "-05:00" is the offset from GMT of the server.

Current Time
current-time() Result: 07:02:11.616-05:00

Current Date and Time


current-dateTime()

Result: 2010-05-28T06:59:05.526-05:00

Note that by default the letter 'T' separates the date and the time. The digits after the decimal point are milliseconds.

One Week Ago


xs:date(current-dateTime()) - xs:dayTimeDuration('P7D') Result: 2010-05-21-05:00

DBpedia with SPARQL - Stadium locations



At the risk of being repetitious, here is another script which mashes up data from DBpedia with GoogleMaps, this time to show the location of all venues in a supplied Wikipedia Category of venues.

Examples
Football Venues in England
kml [1] GoogleMap [2]

Football Venues in Scotland


kml [3] GoogleMap [4]

Script
(: This accepts a category of stadiums and generates a kml map of all stadiums :)

declare namespace r = "http://www.w3.org/2005/sparql-results#";

declare variable $query := "
PREFIX p: <http://dbpedia.org/property/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
   ?ground skos:subject <http://dbpedia.org/resource/Category:Football_venues_in_England>.
   ?ground geo:long ?long.
   ?ground geo:lat ?lat.
   ?ground rdfs:label ?groundname.
   OPTIONAL {?ground foaf:depiction ?image .}.
   OPTIONAL {?club p:ground ?ground. ?club rdfs:label ?clubname. FILTER (lang(?clubname) = 'en')}.
   OPTIONAL {?ground foaf:page ?wiki.}.
   FILTER (lang(?groundname) = 'en').
}
";

declare function local:execute-sparql($query as xs:string) {
   let $sparql := concat("http://dbpedia.org/sparql?format=xml&amp;default-graph-uri=http://dbpedia.org&amp;query=",
                         encode-for-uri($query))
   return doc($sparql)
};


declare function local:sparql-to-tuples($rdfxml) {
   for $result in $rdfxml//r:result
   return
      <tuple>
      {
         for $binding in $result/r:binding
         return
            if ($binding/r:uri)
            then element {$binding/@name} {
                    attribute type {"uri"},
                    string($binding/r:uri)
                 }
            else element {$binding/@name} {
                    attribute type {$binding/@datatype},
                    string($binding/r:literal)
                 }
      }
      </tuple>
};

declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";

let $category := request:get-parameter("category","Football_venues_in_England")
let $queryx := replace($query,"Football_venues_in_England",$category)
let $result := local:execute-sparql($queryx)
let $tuples := local:sparql-to-tuples($result)

let $x := response:set-header('Content-disposition','Content-disposition: inline;filename=stadiums.kml;')

return

<Document> <name>{replace($category,"_"," ")}</name> <Style id="stadium"> <IconStyle> <Icon><href>http://maps.google.com/mapfiles/kml/shapes/ranger_station.png</href> </Icon> </IconStyle>



</Style> { for $groundid in distinct-values($tuples/ground) let $groundTuples := $tuples[ground=$groundid] let $ground := $groundTuples [1] let $name := string($ground/groundname) let $lat := xs:decimal($ground/lat) let $long := xs:decimal($ground/long) let $clubs := string-join($groundTuples/clubname,", ") let $wiki := string($ground/wiki) let $description := <div> Ground of {$clubs} {if ($ground/image) then (<br/>,<img src="{$ground/image}"/>) else () } <br/> <a href='{$groundid}'>DBpedia</a> <a href='{$wiki}'>Wikipedia</a> <a href="http://images.google.co.uk/images?q=stadium+{$name}">Google Images</a> </div> return <Placemark> <name>{$name}</name> <description> {util:serialize($description,"method=xhtml")} </description> <Point> <coordinates>{concat($long, ",",$lat,",0")}</coordinates> </Point> <styleUrl>#stadium</styleUrl> </Placemark> } </Document>


References
[1] http://www.cems.uwe.ac.uk/xmlwiki/RDF/stadium2kml.xq?category=Football_venues_in_England
[2] http://maps.google.co.uk/maps?q=http://www.cems.uwe.ac.uk/xmlwiki/RDF/stadium2kml.xq?category=Football_venues_in_England
[3] http://www.cems.uwe.ac.uk/xmlwiki/RDF/stadium2kml.xq?category=Football_venues_in_Scotland
[4] http://maps.google.co.uk/maps?q=http://www.cems.uwe.ac.uk/xmlwiki/RDF/stadium2kml.xq?category=Football_venues_in_Scotland

Delivery Status Report



A common task is the need to integrate local data with related data on another site. Although an increasing number of sites provide an RSS feed, it is often necessary to scrape web pages to get the relevant data.

Delivery Status Reporting


One such case is where a company needs to monitor the status of their deliveries which use a courier service. In this mock example, a company maintains their own records of all deliveries commissioned, the delivery service used and the service's consignment number. As an XML file, this might look like:
<DeliveryList> <Delivery><CustomerName>Fred Flintstone</CustomerName><Service>CityLink</Service> <ConsignmentNo>RZL14823</ConsignmentNo></Delivery> <Delivery><CustomerName>Bill Bailey</CustomerName><Service>CityLink</Service> <ConsignmentNo>RZL14869</ConsignmentNo></Delivery> <Delivery><CustomerName>Jack and Jill</CustomerName><Service>CityExpress</Service> <ConsignmentNo>RXL9999</ConsignmentNo></Delivery> </DeliveryList>

Integrated Report
The following script shows how the local delivery data can be combined with the data for this delivery obtained from the delivery company. In this case, the delivery company City-Line provides a page for each consignment reporting its status. The script loops over the relevant deliveries and constructs the appropriate URL to read the page for each delivery. The page is input to an HTML-to-XML conversion (used in the Yahoo Weather feed), and then specific elements are retrieved from the HTML to build an extract in XML of the page. This XML data is then combined with the local data to create a combined report.
import module namespace fwiki = "http://www.cems.uwe.ac.uk/xmlwiki" at "../reports/util.xqm";

declare option exist:serialize "method=xhtml media-type=text/html";

declare variable $citylinkURL := "http://www.city-link.co.uk/pod/podfrm.php?JobNo=ZZZZ";

declare function local:get-consignment($consNo) {
   let $citylinkURL := replace($citylinkURL,"ZZZZ",$consNo)
   let $page := fwiki:html-to-xml($citylinkURL)
   return
      <Consignment>
         <CustomerReference>
            {string($page//table[@id="this_table_holds_the_summary_info"]/tr[1]/td[2])}
         </CustomerReference>
         <ScheduledDeliveryDate>
            {string($page//table[@id="this_table_holds_the_summary_info"]/tr[1]/td[4])}
         </ScheduledDeliveryDate>
         <DeliveryStatus>
            {string($page//table[@id="this_table_holds_the_detailed_status_desc"]/tr[1]/td[2])}
         </DeliveryStatus>
      </Consignment>
};


let $report := <Report> {for $delivery in //Delivery[Service="CityLink"] let $citylink := local:get-consignment($delivery/ConsignmentNo) return <Delivery> {$delivery/*} {$citylink/*} </Delivery> } </Report> return fwiki:element-seq-to-table($report)

Show Report [1]

Notes
1. In production, a simple script to extract and store the delivery data in the database could be scheduled to run every hour to reduce the demands on the sites used in this application.
2. The script uses a generic function to convert any simple tabular XML to an HTML table.
3. The mapping between HTML elements and XML depends on the stability of this page. The paths are simplified by the presence of ids for the relevant tables.
4. A production system must be able to detect HTTP errors and act accordingly. This would require more control over the HTTP requests and responses. This facility is provided by the HTTP module in later releases of eXist. The simplistic approach taken here to obtain the XML would need to be replaced.

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Scrape/deliveryReport.xq

Digest Authentication

Motivation
The API you are using uses digest authentication, for example the Talis platform [1]. There is no direct support for this in the eXist httpclient module, but one can be written in XQuery.

The following implementation is based on the description and examples in w:Digest_authentication.

Modules and concepts


eXist httpclient: for the basic POST operation
eXist util: for uuid generation and md5 encoding

XQuery Module
module namespace http ="http://www.cems.uwe.ac.uk/xmlwiki/http"; declare namespace httpclient= "http://exist-db.org/xquery/httpclient";

Two functions transform between a comma-delimited list of name="value" pairs and an XML representation: The first function takes strings in the following format: string="value",string1="value2",string3="value3" Note that the replace function removes all double quotes from the right side of each expression.

Supporting Functions
The following two functions convert key-value encoded strings of this form: key1="value1",key2="value2",key3="value3" into XML structures of the form: <field name="key1" value="value1"/> <field name="key2" value="value2"/> <field name="key3" value="value3"/> Here are the supporting functions: declare function http:string-to-nvs($string) { let $nameValues := tokenize($string,", ") return for $f in $nameValues let $nv := tokenize($f,"=") return <field name = "{$nv[1]}" value="{replace($nv[2],'"','')}"/> }; declare function http:nvs-to-string($nvs) { string-join( for $field in $nvs return concat ($field/@name, '="',$field/@value,'" ') , ", ")

};
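For instance (an illustrative call, not part of the module itself), a challenge fragment is converted like this:

http:string-to-nvs('realm="example.com", nonce="abc123"')

(: returns:
   <field name="realm" value="example.com"/>
   <field name="nonce" value="abc123"/>
:)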


Post With Digest Function


The main function handles a POST operation in two steps. The first POST will get a 401 response (should check this). The Digest is constructed and sent back with the second POST.
declare function http:post-with-digest($host, $path, $username, $password, $doc, $header) {
   let $uri := xs:anyURI(concat($host, $path))

   (: send an HTTP request to the server - called the challenge :)
   let $request := httpclient:post($uri, "dummy", false(), $header)

   (: The server responds with the 401 response code. In this response the server provides the
      authentication realm and a randomly-generated, single-use value called a nonce.
      We get the realm and the nonce by pulling the WWW-Authenticate value out of the response. :)
   let $first-response := substring-after($request//httpclient:header[@name="WWW-Authenticate"]/@value, "Digest ")

   (: now we get the nonce, realm and the optional quality of protection out of the first response :)
   let $fields := http:string-to-nvs($first-response)
   let $nounce := $fields[@name="nonce"]/@value
   let $realm := $fields[@name="realm"]/@value
   let $qop := $fields[@name="qop"]/@value

   (: create a client nonce using a Universally Unique Identifier :)
   let $cnonce := util:uuid()

   (: this is the nonce count :)
   let $nc := "00000001"

   let $HA1 := util:md5(concat($username, ":", $realm, ":", $password))

   (: TODO: if the quality of protection (qop) is "auth-int", then HA2 is MD5(method:digestURI:MD5(entityBody));
      if qop is "auth" it is the following :)
   let $HA2 := util:md5(concat("POST:", $path))

   let $response := util:md5(concat($HA1, ":", $nounce, ":", $nc, ":", $cnonce, ":", $qop, ":", $HA2))
   (: note that if the qop directive is unspecified, then the response should be md5(HA1:nonce:HA2) :)


   (: here are the new headers :)
   let $newfields := (
      <field name="username" value="{$username}"/>,
      <field name="uri" value="{$path}"/>,
      <field name="cnonce" value="{$cnonce}"/>,
      <field name="nc" value="{$nc}"/>,
      <field name="response" value="{$response}"/>
   )
   let $authorization := concat("Digest ", http:nvs-to-string(($fields, $newfields)))
   let $header2 :=
      <headers>
         {$header/header}
         <header name="Authorization" value='{$authorization}'/>
      </headers>
   return httpclient:post($uri, $doc, false(), $header2)
};

Note that under eXist 1.4 the util:md5($string) function has been deprecated. You should now use the util:hash($string, 'md5') function, where the second parameter is the type of hash.

Example
In this example, an RDF file is POSTed to the Talis server.
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
import module namespace http = "http://www.cems.uwe.ac.uk/xmlwiki/http" at "http.xqm";

let $rdf := doc("/db/RDF/dataset.rdf")/rdf:RDF
let $path := "/store/mystore/meta"
let $username := "myusername"
let $password := "mypassword"
let $host := "http://api.talis.com"
let $header :=
   <headers>
      <header name="Content-Type" value="application/rdf+xml"/>
   </headers>
return http:post-with-digest($host, $path, $username, $password, $rdf, $header)


References
http://en.wikipedia.org/wiki/Digest_access_authentication - Wikipedia page on Digest Authentication
http://technet.microsoft.com/en-us/library/cc780170%28WS.10%29.aspx - Microsoft Technet article

References
[1] http://www.talis.com/platform/

Digital Signatures
Motivation
You want to verify that a document sent to you has not been modified.

Method
We will use the W3C Digital Signature standard, along with the standard Java functions to sign and verify the signature of a document. Warning: this program is not working yet.

Creating a Local Keystore


To use the function you will need to create a local key store to store your information in. In production systems the key store is stored on an internal server but in this example we will store it in the eXist database as a binary file. The following shell command shows how the keytool program that comes with the Java JRE can be used to generate a keystore file:
/usr/java/bin/keytool -genkeypair -dname "cn=Test Certificate, ou=MyDivision, o=MyCompany, c=US" -alias eXist -keypass kpi135 -keystore /tmp/keystore.pem -storepass ab987c -validity 180

After you run this command, put the generated /tmp/keystore.pem file into the database collection as /db/test/dig-sig/keystore.pem.

Adding a XQuery Function Wrapper Module


We will add a custom jar file to our $EXIST_HOME/lib/extensions area called x-krypt.jar. After this file has been loaded we need to add the following line to the $EXIST_HOME/conf.xml in the xquery/builtin-modules area (around line 780): <module class="ro.kuberam.xcrypt.XcryptModule" uri="http://kuberam.ro/x-crypt" />

Adding a Digital Signature to a File


After rebooting the server the following can be executed:
xquery version "1.0"; let $keystore-file-path := '/db/test/dig-sig/keystore.txt'

return if ( not(util:binary-doc-available($keystore-file-path)) ) then <error><message>Keystore File {$keystore-file-path} Not Available</message></error> else let $doc := <data><a>1</a><b>7</b><c/><c/></data> let $certificate-details := <digital-certificate> <keystore-type>JKS</keystore-type> <keystore-name>{$keystore-file-path}</keystore-name> <keystore-password>ab987c</keystore-password> <key-alias>eXist</key-alias> <private-key-password>kpi135</private-key-password> </digital-certificate> let $signed-doc := x-crypt:generate-signature($doc, "inclusive", "", "DSA_SHA1", "ds", "enveloped", $certificate-details ) return <results> <doc>{$doc}</doc> <keystore-file-path>{$keystore-file-path}</keystore-file-path> </results>


Validating a Digital Signature


The same process that was used to sign an XML document can be used to verify its signature.

DocBook to HTML

Motivation
You would like to convert DocBook documents to HTML format.

Method
We will use an XQuery transform that converts sample instance documents into an XQuery typeswitch module. To begin this process you can use any tool that generates an instance document from the XML Schema. You can then edit this document to include only the elements that you want to transform. You can then run this file through the tool to generate the typeswitch XQuery module.
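The generated module is essentially a set of recursive typeswitch rules, one per element. As a minimal hand-written sketch (the handful of element names and their HTML mappings are chosen for illustration, and the DocBook 5 namespace is ignored for brevity), such a transform looks like this:

declare function local:children($node as node()) as item()* {
   for $child in $node/node() return local:transform($child)
};

declare function local:transform($node as node()) as item()* {
   typeswitch ($node)
      case element(title) return <h2>{string($node)}</h2>
      case element(para) return <p>{local:children($node)}</p>
      case element(emphasis) return <em>{local:children($node)}</em>
      case text() return $node
      default return local:children($node)
};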

References
Chris Wallace has provided a tool that converts DocBook XML into a typeswitch transformation module; see ../Generating Skeleton Typeswitch Transformation Modules/ and the DocBook to HTML Typeswitch Transform [1].

References
[1] http://exist.svn.sourceforge.net/svnroot/exist/branches/dmccreary/docs/webapp/docs/docbook5/docbook2xhtml-v2.xqm

DOJO data
Motivation
You want to use XQuery with the DOJO JavaScript library, which uses a variation of JSON syntax.

Method
DOJO is a framework for developing rich client-side applications in JavaScript, from small enhancements up to complete web applications. At some point you may want to deliver your data in a form that you or other people can easily consume from DOJO. DOJO specifies its own idiosyncratic way of wrapping data in JSON-formatted objects so that it can be consumed by many of its widgets: trees, grids, comboboxes, input fields and so on. The example below (note the use of single quotes, which makes this invalid JSON) is taken from the DOJO documentation:

{ identifier: 'abbr',
  label: 'name',
  items: [
    { abbr: 'ec', name: 'Ecuador', capital: 'Quito' },
    { abbr: 'eg', name: 'Egypt',   capital: 'Cairo' }
  ]}

Now if, for example, you want to feed an incremental user-input widget from a server-side search, XQuery (in eXist at least) makes this a piece of cake. Read the script below as an introduction to the concept; it can very likely be optimized. The search itself uses a Lucene full-text index, which returns results very quickly.

xquery version "1.0";
import module namespace json="http://www.json.org";
declare namespace request="http://exist-db.org/xquery/request";
declare option exist:serialize "method=html media-type=text/javascript";

(: where the data lives :)
let $coll := "/db/apps/myapp/data"

(: what we are looking for, sanitize remote input :)
let $tmp := xs:string(request:get-parameter("q", ""))
let $querystring := replace($tmp, "[^0-9a-zA-Z\-,. ]", "")
let $query :=
    <query>
        <near slop="10" ordered="no">{$querystring}</near>
    </query>
return
    (: fetch results, don't forget to create an index in collection.xconf :)
    let $hits := collection($coll)//article[ft:query(., $query)]
    let $count := count($hits)
    let $result :=
        <result>
            <identifier>id</identifier>
            <label>title</label>
            <count>{$count}</count>
            {
            for $item in $hits
            return
                <items>
                    <id>{string($item/@id)}</id>
                    {$item/title}
                </items>
            }
        </result>
    return json:xml-to-json($result)

The XQuery extension function json:xml-to-json($node as node()) does all the magic. In the $result variable the data structure is created in the way DOJO wants it (by default), as shown above. Another thing to note: DOJO expects the identifier to be unique; it is up to you to design your data to satisfy this. Also note that, as of the eXist trunk of early September 2010, numbers in the output are quoted, so it is up to you to convert them on the client for optimal processing.


Dynamic Module Loading


Motivation
You want to conditionally import a module. For example, the module might provide the functions used to style a web page, such as the header, footer and breadcrumbs.

Method
Module import
We will use the XQuery function util:import-module(). This function has three arguments:

$namespace - the full URI of the module that you are loading, such as http://example.com/my-module
$prefix - the prefix you want to use to reference each function in the module, such as style
$location - the database path that you will be loading the module from, either an absolute path such as /db/modules/my-module.xqm or a relative path such as my-module.xqm

For example, the following will import a module called my-module from the /db/modules collection:
util:import-module(xs:anyURI('http://example.com/my-module'), 'style', xs:anyURI('/db/modules/my-module.xqm'))

The function xs:anyURI is used to cast each string into the URL type.

Function invocation
Because the namespace is declared dynamically, the imported functions have to be invoked using util:eval. The input to this function is a string containing an XQuery expression, e.g. util:eval('style:header()').

Example
The following will randomly load one of two style modules.

xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=yes indent=yes";

let $module :=
    if (math:random() < 0.5)
    then util:import-module(xs:anyURI('http://example.com/style-a'), 'style', xs:anyURI('style-a.xqm'))
    else util:import-module(xs:anyURI('http://example.com/style-b'), 'style', xs:anyURI('style-b.xqm'))
return
    <html>
        <head>
            <title>Test of Dynamic Module Import</title>
            {util:eval('style:import-css()')}
        </head>
        <body>
            {util:eval('style:header()')}
            {util:eval('style:breadcrumb()')}
            <h1>Test of Dynamic Module Import</h1>
            {util:eval('style:footer()')}
        </body>
    </html>

Run [1]

Style A Module
Here is an example of a style module. It has four functions. One to import the CSS files, one for the header, one for the navigation breadcrumb and one for the footer. xquery version "1.0"; module namespace style='http://example.com/style-a'; declare function style:import-css() { <link type="text/css" rel="stylesheet" href="style-a.css"/> }; declare function style:header() { <div class="header"> <h1>Header for Style A</h1> </div> }; declare function style:breadcrumb() { <div class="breadcrumb"> <h1>Breadcrumb for Style A</h1> </div> }; declare function style:footer() { <div class="footer"> <h1>Footer for Style A</h1> </div> };


Style A CSS
body { color: blue; }

Style B Module
xquery version "1.0"; module namespace style='http://example.com/style-b'; declare function style:import-css() { <link type="text/css" rel="stylesheet" href="style-b.css"/> }; declare function style:header() { <div class="header"> <h1>Header for Style B</h1> </div> }; declare function style:breadcrumb() { <div class="breadcrumb"> <h1>Breadcrumb for Style B</h1> </div> }; declare function style:footer() { <div class="footer"> <h1>Footer for Style B</h1> </div> };

Style B CSS
body { color: red; }

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/dynamicModule.xq

Examples Wanted
Examples wanted
If you would like examples of XQuery code to be added to the Wikibook, please list your suggestions here.

Suggestions
XUpdate xquery examples for xml data mining

eXist demo server


The sample scripts are executed on a server running a release of the eXist XML database. This server is based in the Faculty of Computing, Engineering and Mathematical Sciences [1] at the University of the West of England, Bristol [2]. The demo server is currently running release 1.4.0. We appreciate the use of this free server and hope you use its resources respectfully.

References
[1] http://www.uwe.ac.uk/cems
[2] http://www.uwe.ac.uk

Extracting data from XHTML files


Motivation
You want to perform an XQuery on XHTML files that use the XHTML namespace.

Method
We will start our XQuery by adding the default namespace for XHTML. declare default element namespace "http://www.w3.org/1999/xhtml";

Sample Source Code


Assume that you have a well-formed XHTML file stored at /db/test/index.xhtml.

xquery version "1.0";
declare default element namespace "http://www.w3.org/1999/xhtml";
declare option exist:serialize "method=html media-type=text/html indent=yes";

let $doc := doc('/db/test/index.xhtml')
let $body := $doc/html/body/*
return
    <html>
        <head>
            <title>Replace Head</title>
        </head>
        <body>
            {$body}
        </body>
    </html>

Filling Portlets
Motivation
You want to be able to create reports that work with industry-standard portals. These systems use div tags with standardized class attributes. For example the searchbox for a page will have the following XHTML:

<div class="portal-searchbox"> </div>

Method
We will create a report that is structured as a set of divs with the appropriate class attributes. We can then take the URL for this report and add it to the portal management system, and our report will automatically be styled according to the central portal style sheet. The following divs need to be filled by XQueries (a sketch of one way to fill them follows this list):

portal-wrapper
portal-top
portal-header
portal-breadcrumbs
portal-searchbox
portal-advanced-search
portal-footer
portal-colophon
portal-personaltools
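The sketch below shows one possible shape for such a report. The local function names and the content of each div are hypothetical; only the class attribute values are taken from the list above.

xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/html";

(: hypothetical helper - fills the search box slot :)
declare function local:searchbox() as element(div) {
    <div class="portal-searchbox">
        <form method="get" action="search.xq">
            <input type="text" name="q"/>
        </form>
    </div>
};

(: hypothetical helper - fills the header slot :)
declare function local:header() as element(div) {
    <div class="portal-header">
        <h1>My Report</h1>
    </div>
};

(: the report: one div per portal slot, wrapped in portal-wrapper :)
<div class="portal-wrapper">
    {local:header()}
    {local:searchbox()}
    <div class="portal-footer">Generated by XQuery</div>
</div>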

Flickr GoogleEarth
Flickr photos which are geo-coded can be used to generate a Google Earth overlay. [** API not functional on this server yet **] Select Photos [1]. Below is the code for the Flickr API to KML transformation; $flickrKey is my Flickr API key (not shown).
declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml";

declare function local:callFlickr($method, $params) {
    doc(concat("http://api.flickr.com/services/rest/?method=", $method,
               "&amp;api_key=", $flickrKey,
               "&amp;", string-join($params, "&amp;")))
};

let $username := request:request-parameter("username","") let $tags := string-join(request:request-parameter("tags",""),",")

let $user := string(local:callFlickr("flickr.people.findByUsername",concat("username=",$username))//user/@id) return <Folder> <name>Places for {$username} tagged {$tags}</name> { for $photo in local:callFlickr("flickr.photos.search",(concat("user_id=",$user),concat("tags=",$tags)))//photo let $photo_id := string($photo/@id) let $details := local:callFlickr("flickr.photos.getInfo",concat("photo_id=",$photo_id))//photo

where exists($details/location) return <Placemark> <name>{string($details/title)}</name> <description> {let $url := string(local:callFlickr("flickr.photos.getSizes",concat("photo_id=",$photo_id))//size[@label="Small"]/@source) return util:serialize(<div> <a href="http://www.flickr.com/photos/{string($details/owner/@nsid)}/{$photo_id}"><img src="{$url}"/></a> </div>,()) } <div>{string($details/description)}</div> </description> <Point> <coordinates>{string($details/location/@longitude)},{string($details/location/@latitude)},0</coordinates> </Point> </Placemark> } </Folder>


References
[1] http://www.cems.uwe.ac.uk/xmlwiki/selectFlickr.xq

FLWOR Expression
Motivation
You have a sequence of items and you want to create a report that contains these items.

Method
We will use a basic XQuery FLWOR expression to iterate through each of the items in a sequence. The five parts of a FLWOR expression are:

for - specifies which items in the sequence you want to select (optional)
let - creates temporary names used in the return clause (optional)
where - limits the items returned (optional)
order by - changes the order of the results (optional)
return - specifies the structure of the data returned (required)

Here is a simple example of a FLWOR expression:

for $book in doc("catalog.xml")/books/book
let $title := $book/title/text()
let $price := $book/price/text()
where xs:decimal($price) gt 50.00
order by $title
return
    <book>
        <title>{$title}</title>
        <price>{$price}</price>
    </book>

This XQuery FLWOR expression will return all books that have a price over $50.00. Note that we have not just one but two let clauses after the for clause. We also add a where clause to restrict the results to books over $50.00. The results are sorted by title, and the result is a new sequence of book elements, each containing both the price and the title.

Using the "to" function to generate a range of values


You can also express a range of values from one number to another by placing the keyword "to" between two numbers in a sequence. The following generates a list of values from 1 to 10.

xquery version "1.0";
<list>
    {
    for $i in (1 to 10)
    return <value>{$i}</value>
    }
</list>

Formatting Numbers
Motivation
You want an easy way to format numbers by specifying the picture format of the number. So for example if you want to format numbers with a leading dollar sign, commas and two decimal places you would use the following "picture format": format-number($my-decimal, "$,000.00") If the input number was 1234 the output would be $1,234.00

Method 1 - XSLT Wrapper


The format-number() function is a standard function in XSLT 1.0 and XPath 2.0, and is also included in the draft XQuery 1.1 Requirements [1]. To use it with eXist we will write a wrapper around the Saxon XSLT format-number() function. To do this you will need to do the following:

1. download a copy of the Saxon9B XSLT processor from http://prdownloads.sourceforge.net/saxon/saxonb9-1-0-2j.zip
2. unzip the package and copy three jar files (saxon9.jar, saxon9-dom.jar and saxon9-xpath.jar) into your eXist lib/endorsed folder
3. comment out the following line in your eXist conf.xml file: <transformer class="org.apache.xalan.processor.TransformerFactoryImpl"/>
4. un-comment the three lines that enable Saxon to be used as the default XSLT processor
5. restart your eXist server
6. add a function that wraps a simple XSLT stylesheet - see the example code below

Source Code
We will create an XQuery function that takes two arguments: a decimal number and a string that specifies the picture format. We will pass both to a small XSLT stylesheet.
(: the numeric picture format function from XPath 2.0.
   To work with eXist we must enable Saxon as the default XSLT engine.
   See the conf.xml file in the eXist folder for details. :)
declare function local:format-number($n as xs:decimal, $s as xs:string) as xs:string {
    string(transform:transform(
        <any/>,
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
            <xsl:template match='/'>
                <xsl:value-of select="format-number({$n},'{$s}')"/>
            </xsl:template>
        </xsl:stylesheet>,
        ()
    ))
};

Usage
The XSLT 1.0 format-number() [2] function takes two arguments. The first is a decimal number and the second is a string that represents a picture of the output you desire. The format string is defined in the Java class DecimalFormat
[3]

If you want comma-separated values:

local:format-number($my-decimal, ',000')

If you want a leading dollar sign:

local:format-number($my-decimal, '$,000')

The format for negative numbers is specified in a second picture, separated from the first by a semicolon. If you want negative numbers to have a minus sign:

local:format-number($my-decimal, '0,000.00;-0,000.00')

Run tests
Run [4]

Method 2 - XQuery function


[external links broken] Minollo posted an XQuery implementation of format-number() on his blog; his code passed a suite of tests. A minimal pure-XQuery sketch of comma grouping is shown below.
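The sketch below is not Minollo's code; it handles integers only, with no decimal places, currency symbols or arbitrary picture strings.

declare function local:format-with-commas($n as xs:integer) as xs:string {
    let $s := string(abs($n))
    let $len := string-length($s)
    (: insert a comma before every group of three digits, counted from the right :)
    let $grouped :=
        string-join(
            for $i in 1 to $len
            return concat(
                if ($i gt 1 and ($len - $i + 1) mod 3 eq 0) then "," else "",
                substring($s, $i, 1)),
            "")
    return concat(if ($n lt 0) then "-" else "", $grouped)
};

local:format-with-commas(-1234567)   (: returns -1,234,567 :)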

Test
Run tests [5]

Discussion
It is our sincere hope that a future version of XQuery includes functions that allow the developer to easily format both numbers and dates.

Reference
Blog posting on XML Connections blog on format-number() written in XQuery [6]

References
[1] http://www.w3.org/TR/xquery-11-requirements/#numeric-formatting
[2] http://www.w3.org/TR/xslt#format-number
[3] http://java.sun.com/j2se/1.4.2/docs/api/java/text/DecimalFormat.html
[4] http://www.cems.uwe.ac.uk/xmlwiki/Test/formatnumber-xslt.xq
[5] http://www.cems.uwe.ac.uk/xmlwiki/Test/formatnumber-xquery.xq
[6] http://www.xml-connection.com/2007/08/formatting-numbers-in-xquery-10.html

Generating PDF from XSL-FO files


Motivation
You want to generate documents with precise page layout from XML documents, for example to PDF.

Approach
Typically, the steps required to generate a PDF document are:

retrieve or compute the base XML document
transform it to XSL-FO, perhaps using XSLT
transform the XSL-FO to PDF using Apache FOP

Method
We will use a built-in function to convert XSL-FO into PDF. (See ../Installing the XSL-FO module/ if this module is not installed and configured.)

Using the xslfo:render() function


The function is the xslfo:render(). It has the following structure:
let $pdf-binary := xslfo:render($input-xml-fo-document, 'application/pdf', $parameters)

The resulting binary can be saved directly to the database, where it will be stored as a non-searchable binary resource. You can then view it by providing a link to the file, or you can send it directly to the browser by using the response:stream-binary() function as follows:
return response:stream-binary($pdf-binary, 'application/pdf', 'myGeneratedPDF.pdf')

Example XQuery to Generate PDF


The following program will generate a PDF document with the text "Hello World".

xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";
declare namespace xslfo="http://exist-db.org/xquery/xslfo";

let $fo :=
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
        <fo:layout-master-set>
            <fo:simple-page-master master-name="my-page">
                <fo:region-body margin="1in"/>
            </fo:simple-page-master>
        </fo:layout-master-set>
        <fo:page-sequence master-reference="my-page">
            <fo:flow flow-name="xsl-region-body">
                <fo:block>Hello World!</fo:block>
            </fo:flow>
        </fo:page-sequence>
    </fo:root>
let $pdf := xslfo:render($fo, "application/pdf", ())
return response:stream-binary($pdf, "application/pdf", "output.pdf")

Execute [1]

Notes on Installing XSL-FO


Enabling the XSL-FO Module
Make sure that the module extension is loaded. You can do this by going to the $EXIST_HOME/conf.xml file and un-commenting the following line (around line 769):
<module class="org.exist.xquery.modules.xslfo.XSLFOModule" uri="http://exist-db.org/xquery/xslfo"> <parameter name="processorAdapter" value="org.exist.xquery.modules.xslfo.ApacheFopProcessorAdapter"/> </module

Where the two possible values for the processorAdapter parameter are:
org.exist.xquery.modules.xslfo.ApacheFopProcessorAdapter for Apache's FOP org.exist.xquery.modules.xslfo.RenderHouseXepProcessorAdapter for RenderHouse's XEP

If the module is correctly loaded then you should see it in the function documentation. Make sure that you have correctly edited $EXIST_HOME/extensions/build.properties so that the XSL-FO module is included. Change:

# XSL FO transformations (Uses Apache FOP)
include.module.xslfo = false

to:

include.module.xslfo = true

Also make sure that the build file can access the correct fop.jar file from the Apache web site.

Downloading XSL-FO Jar Files


eXist comes with a sample ant task that can automatically download the FOP distribution zip file, extract the three jar files we need and remove the rest. Here is the ant target from the eXist 1.4 $EXIST_HOME/modules/build.xml:
<target name="prepare-libs-xslfo" unless="libs.available.xslfo" if="include.module.xslfo.config"> <echo message="Load: ${include.module.xslfo}"/> <echo message="------------------------------------------------------"/> <echo message="Downloading libraries required by the xsl-fo module"/> <echo message="------------------------------------------------------"/> <!-- Apache FOP .95 --> <get src="${include.module.xslfo.url}" dest="fop-0.95-bin.zip" verbose="true" usetimestamp="true" /> <unzip src="fop-0.95-bin.zip" dest="${top.dir}/${lib.user}"> <patternset>



<include name="fop-0.95/build/fop.jar"/> <include name="fop-0.95/lib/batik-all-1.7.jar"/> <include name="fop-0.95/lib/xmlgraphics-commons-1.3.1.jar"/> </patternset> <mapper type="flatten"/> </unzip> <delete file="fop-0.95-bin.zip"/> </target>


Note that fop 1.0 is now available so you can change this task to be the following:
<target name="prepare-libs-xslfo" unless="libs.available.xslfo" if="include.module.xslfo.config"> <echo message="Load: ${include.module.xslfo}"/> <echo message="------------------------------------------------------"/> <echo message="Downloading libraries required by the xsl-fo module"/> <echo message="------------------------------------------------------"/>

<!-- Download the Apache FOP Processor from the Apache Web Site--> <get src="${include.module.xslfo.url}" dest="fop-1.0-bin.zip" verbose="true" usetimestamp="true" /> <unzip src="fop-1.0-bin.zip" dest="${top.dir}/${lib.user}"> <patternset> <include name="fop-1.0/build/fop.jar"/> <include name="fop-1.0/lib/batik-all-1.7.jar"/> <include name="fop-1.0/lib/xmlgraphics-commons-1.3.1.jar"/> </patternset> <mapper type="flatten"/> </unzip> <delete file="fop-1.0-bin.zip"/> </target>

Sample Transcript
The following is a sample transcript:

prepare-xslfo:
    [echo] Load: true
    [echo] ------------------------------------------------------
    [echo] Downloading libraries required by the xsl-fo module
    [echo] ------------------------------------------------------
    [fetch] Getting: http://apache.cs.uu.nl/dist/xmlgraphics/fop/binaries/fop-1.0-bin.zip
    [fetch] To: C:\DOCUME~1\DANMCC~1\LOCALS~1\Temp\FetchTask8407348433221748527tmp
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] Expanding: C:\DOCUME~1\DANMCC~1\LOCALS~1\Temp\FetchTask8407348433221748527tmp into C:\ws\exist-trunk\lib\us


At the end of this process you should see the following three jar files in your $EXIST_HOME/lib/extensions folder:
cd $EXIST_HOME/lib/extensions
$ ls -l
-rwxrwxrwx+ 1 Dan McCreary None 3318083 2010-12-10 09:23 batik-all-1.7.jar
-rwxrwxrwx+ 1 Dan McCreary None 3079811 2010-12-10 09:23 fop.jar
-rwxrwxrwx+ 1 Dan McCreary None  569113 2010-12-10 09:23 xmlgraphics-commons-1.4.jar

If you do not see these files you can manually copy them from a download of the XSL-FO binaries. Now go to the $EXIST_HOME directory and type "build". You should not see any error messages; if you do, go to the build file and fix or remove the errors. After you reboot you should be able to use the XSL-FO module to convert files into PDF.

Using Config File for External References


When you reference an image you must either use an absolute reference (and make sure that the server has read access) or use a relative path reference. The root for relative path references can be set in the FOP configuration file.

xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";
declare namespace xslfo="http://exist-db.org/xquery/xslfo";

let $fop-config :=
    <fop version="1.0">
        <!-- Base URL for resolving relative URLs -->
        <base>http://localhost:8080/exist/rest/db/nosql/pdf/images</base>
    </fop>
let $fo := doc('/db/test/xslfo/fo-templates/samle-fo-file-with-external-references.fo')
let $pdf := xslfo:render($fo, "application/pdf", (), $fop-config)
return response:stream-binary($pdf, "application/pdf", "output.pdf")
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="my-page">
            <fo:region-body margin="0.5in"/>
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="my-page">
        <fo:flow flow-name="xsl-region-body">
            <fo:block>Test of external SVG reference</fo:block>
            <fo:block>
                SVG Chart Test
                <fo:external-graphic content-width="7.5in" scaling="uniform" src="url(my-test-image.png)"/>
                <fo:external-graphic content-width="7.5in" scaling="uniform" src="url(chart.svg)"/>
            </fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>

Including SVG Images in your PDF files


When you create PDF documents you can include "line art" in SVG format directly in the PDF files. There are some translation issues from SVG to PDF, but much of the line art converts very well. To get SVG rendering to work within eXist when you reference SVG images, see http://xmlgraphics.apache.org/fop/0.95/graphics.html#batik, which says you must pass -Djava.awt.headless=true to the JVM when it starts up. In your $EXIST_HOME/startup.bat or $EXIST_HOME/startup.sh you will need to add the following:
set JAVA_OPTS="-Xms128m -Xmx512m -Dfile.encoding=UTF-8 -Djava.endorsed.dirs=%JAVA_ENDORSED_DIRS% -Djava.awt.headless=true"

If you are using the "wrapper" tool to start your sever you will need to add the following lines to the $EXIST_HOME/tools/wrapper/conf/wrapper.conf # make AWT load the fonts for SVG rendering inside of XSLFO wrapper.java.additional.6=-Djava.awt.headless=true

Using Inline SVG


One easy way to test your configuration is to use an inline reference to an SVG image, using the fo:instream-foreign-object element. The following is an example.
<fo:block>
    Test of inline SVG reference.
    <fo:block>
        <fo:instream-foreign-object content-width="7.5in" scaling="uniform">
            <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" height="200" width="200">
                <circle cx="100" cy="100" r="40" stroke="black" stroke-width="2" fill="blue"/>
            </svg>
        </fo:instream-foreign-object>
    </fo:block>
</fo:block>

Sample External SVG Reference


Note this assumes you have configured your <base> URL in the FOP configuration file.
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="my-page">
            <fo:region-body margin="0.5in"/>
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="my-page">
        <fo:flow flow-name="xsl-region-body">
            <fo:block>Test of external SVG reference</fo:block>
            <fo:block>
                SVG Chart Test
                <fo:external-graphic content-width="7.5in" scaling="uniform" src="url(chart.svg)"/>
            </fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>

Notes
See ../XSL-FO Tables/ and ../XSL-FO Images/ for how to add print-quality tables and charts to your document. When you follow trunk, conf.xml sometimes gets reset to the defaults, and you have to re-enable XSL-FO processing in conf.xml. The error printed if you miss this reads like: "cannot compile xquery: err:xpst0017 call to undeclared function: xslfo:render".

Acknowledgments
The user Dmitriy has been helpful in the creation of the procedure for installation on systems that do not have source code.

Discussion
The steps to enable the FOP module should be listed somewhere in the eXist administrative site and removed from this Wikibook.

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/xsl-fo/helloworld.xq

Generating Skeleton Typeswitch Transformation Modules


Motivation
For document types which contain many different tags, such as TEI or DocBook, it is tedious and error-prone to write transformation code by hand. We can use XQuery to generate the basic text for an XQuery module. This article uses the same example as ../Transformation idioms/.

Example
Starting with a simple list of the tags in this document, we can generate a module which performs an identity transform on a document containing these tags.

import module namespace gen = "http://www.cems.uwe.ac.uk/xmlwiki/gen" at "gen.xqm";

let $tags := ("websites","sites","site","uri","name","description")
let $config :=
    <config>
        <modulename>coupland</modulename>
        <namespace>http://www.cems.uwe.ac.uk/xmlwiki/coupland</namespace>
    </config>
return gen:create-module($tags, $config)

Here is the XML output [1] and the text XQuery file [2] created by adding the line

declare option exist:serialize "method=text media-type=text/text";

to the script. If we save this script as, say coupid.xqm, we can use it to generate the transformed document: import module namespace coupland = "http://www.cems.uwe.ac.uk/xmlwiki/coupland" at "coupid.xqm"; let $doc := doc("/db/Wiki/eXist/transformation/Coupland1.xml")/* return coupland:convert($doc)

Generate [3] We can also check if the identity transformation has retained the full structure of the document: import module namespace coupland = "http://www.cems.uwe.ac.uk/xmlwiki/coupland" at "coupid.xqm"; let $doc := doc("/db/Wiki/eXist/transformation/Coupland1.xml")/*

return <compare>{deep-equal($doc,coupland:convert($doc))}</compare>

Compare [4]


Module design
The generated module looks like this: module namespace coupland = "http://www.cems.uwe.ac.uk/xmlwiki/coupland"; (: conversion module generated from a set of tags :) declare function coupland:convert($nodes as node()*) as item()* { for $node in $nodes return typeswitch ($node) case element(websites) return coupland:websites($node) case element(sites) return coupland:sites($node) case element(site) return coupland:site($node) case element(uri) return coupland:uri($node) case element(name) return coupland:name($node) case element(description) return coupland:description($node) default return coupland:convert-default($node) }; declare function coupland:convert-default($node as node()) as item()* { $node }; declare function coupland:websites($node as element(websites)) as item()* { element websites{ $node/@*, coupland:convert($node/node()) } }; declare function coupland:sites($node as element(sites)) as item()* { element sites{ $node/@*, coupland:convert($node/node()) } };

declare function coupland:site($node as element(site)) as item()* {
    element site { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:uri($node as element(uri)) as item()* {
    element uri { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:name($node as element(name)) as item()* {
    element name { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:description($node as element(description)) as item()* {
    element description { $node/@*, coupland:convert($node/node()) }
};


The function convert($nodes) contains the typeswitch statement to dispatch the node to one of the tag functions. Each tag function creates an element of that name, copies the attributes and then recursively calls the convert function passing the child nodes. The default action defined in the function convert-default merely copies the node.

Generation Function
This function generates the code for an XQuery module which performs an identity transformation. There are two parameters tags - a sequence of tags config - an XML node containing definitions of the module name, module prefix and module namespace. declare variable $gen:cr := "&#13;"; declare function gen:create-module($tags as xs:string*, $config as element(config) ) as element(module) { let $modulename := $config/modulename/text() let $prefix := $config/prefix/text() let $pre:= concat($modulename,":",$prefix)

Generating Skeleton Typeswitch Transformation Modules let $namespace := ($config/namespace,"http://mysite/module")[1]/text() return <module> module namespace {$modulename} = "{$namespace}"; (: conversion module generated from a set of tags :) <function> declare function {$pre}convert($nodes as node()*) as item()* {{ {$gen:cr} for $node in $nodes return typeswitch ($node) {for $tag in $tags return <s>case element({$tag}) return {$pre}{replace($tag,":","-")}($node) </s> } default return {$pre}convert-default($node) }}; </function> <function> declare function {$pre}convert-default($node as node()) as item()* {{ {$gen:cr} $node }}; </function> {for $tag in $tags return <function> declare function {$pre}{replace($tag,":","-")}($node as element({$tag})) as item()* {{ {$gen:cr} element {$tag} {{ $node/@*, {$pre}convert($node/node()) }}{$gen:cr} }}; </function> } </module> };

95


Generating the tags


All tags in the document or corpus need to be handled by the identity transformation, so it would be better to generate the list of tags from the document or corpus itself. The following function returns a sequence of tags in alphabetical order.

declare function gen:tags($docs as node()*) as xs:string* {
    for $tag in distinct-values($docs//*/name(.))
    order by $tag
    return $tag
};

and we can modify the calling script: let $doc := doc("/db/Wiki/eXist/transformation/Coupland1.xml") let $tags := gen:tags($doc) let $config := <config> <modulename>coupland</modulename> <namespace>http://www.cems.uwe.ac.uk/xmlwiki/coupland</namespace> </config> return gen:create-module($tags, $config) Generate [5]

User-defined function template


The module generator function generates a fixed code pattern for each tag. We can allow the user to customize this pattern by passing a callback function which generates the code pattern, as an alternative to modifying the generator code itself. The modified function code has the following changes. Function signature:
declare function gen:create-module($tags as xs:string*, $callback as function, $config as element(config) ) as element(module) {

generating each tag function: <function> declare function {$pre}{replace($tag,":","-")}($node as element({$tag})) as item()* {{ {$gen:cr} {util:call($callback,$tag,$pre)}{$gen:cr} }}; </function> To generate a basic transformation to HTML, with HTML elements being copied while non-HTML elements are converted to div elements with an additional class attribute, we define the function to create the code body, create the function reference and call the convert function:

Generating Skeleton Typeswitch Transformation Modules import module namespace gen = "http://www.cems.uwe.ac.uk/xmlwiki/gen" at "gen.xqm"; declare namespace fx = "http://www.cems.uwe.ac.uk/xmlwiki/fx"; declare variable $fx:html-tags := ("p","a","em","q"); declare function fx:tag-code ($tag as xs:string, $pre as xs:string) { if ($tag = $x:html-tags) then <code> element {$tag} {{ $node/@*, {$pre}convert($node/node()) }} </code> else <code> element div {{ attribute class {{"{$tag}" }}, $node/(@* except class), {$pre}convert($node/node()) }} </code> }; declare option exist:serialize "method=text media-type=text/text"; let $doc := doc("/db/Wiki/eXist/transformation/Coupland1.xml") let $tags := gen:tags($doc) let $callback := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/x","fx:tag-code"),2) let $config := <config> <modulename>coupland</modulename> <namespace>http://www.cems.uwe.ac.uk/xmlwiki/coupland</namespace> </config> return gen:create-module($tags, $callback, $config) Generate [6]



Customising the generator


Another customization of the generator which may be required is to add an additional $options parameter to all signatures and calls. This provides a mechanism for passing configuration parameters around the functions to control the transformation. A sketch of the resulting dispatch function is shown below.
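The sketch below shows what the generated dispatch function might then look like; it is illustrative only (the element names come from the coupland example above, and only two cases are shown) rather than output actually produced by the generator as written.

declare function coupland:convert($nodes as node()*, $options as element(options)?) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case element(site) return coupland:site($node, $options)
            case element(name) return coupland:name($node, $options)
            default return coupland:convert-default($node, $options)
};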

Transforming with XSLT


When the conversion is complex, requiring restructuring, context-dependent transformations and reordering, it is not clear that the XQuery typeswitch approach is better or worse than the XSLT equivalent. For comparison here is the equivalent XSLT.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:template match="/websites"> <html> <head> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/> <title>Web Sites by Coupland</title> <link rel="stylesheet" href="../../css/blueprint/screen.css" type="text/css" media="screen, projection"/> <link rel="stylesheet" href="../../css/blueprint/print.css" type="text/css" media="print"/> <!--[if IE ]><link rel="stylesheet" href="../../css/blueprint/ie.css" type="text/css" media="screen, projection" /><![endif]--> <link rel="stylesheet" href="screen.css" type="text/css" media="screen"/> </head> <body> <div class="container"> <h1>Design web sites by Ken Coupland</h1>

<xsl:apply-templates select="category"> <xsl:sort select="name"/> </xsl:apply-templates> </div> </body> </html> </xsl:template>

<xsl:template match="websites/category"> <div> <div class="span-10"> <h3> <xsl:value-of select="name"/> </h3> <h4> <xsl:value-of select="subtitle"/> </h4> <xsl:copy-of select="description/node()"/> </div> <div class="span-14 last"> <xsl:apply-templates select="../sites/site"> <xsl:sort select="(sortkey,name)[1]" order="ascending"/>



</xsl:apply-templates> </div> <hr /> </div> </xsl:template> <xsl:template match="site/category">


</xsl:template> <xsl:template match="site"> <h3> <xsl:value-of select="name"/> </h3> <span><a href="{uri}">Link</a></span>

<div class="site"> <xsl:apply-templates select="* except (uri,name,sortkey)"/> </div> </xsl:template>

<xsl:template match="description"> <p> <xsl:copy-of select="node()"/> </p> </xsl:template>

<xsl:template match="image"> <img src="{uri}"/>

</xsl:template>

</xsl:stylesheet>

and XQuery to apply this server-side:

declare option exist:serialize "method=xhtml media-type=text/html";

let $doc := doc("/db/Wiki/eXist/transformation/Coupland1.xml")
let $ss := doc("/db/Wiki/eXist/transformation/tohtml.xsl")
return transform:transform($doc, $ss, ())

Transform to HTML via XSLT [7]


References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidxml.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidtext.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidtrans.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidcompare.xq
[5] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidtext2.xq
[6] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidtext3.xq
[7] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidtransxsl.xq

Generating xqDoc-based XQuery Documentation


Motivation
You want to create high-quality documentation of your XQuery functions and modules.

Method
xqDoc [1] is a standard for formatting comments in XQuery modules. The eXist system comes with an XQuery module which parses XQuery modules containing comments in this format and generates XML in the xqDoc XML format [2]. This XML can then be transformed into other formats such as HTML, PDF, DocBook or ePub.

Generating the XML


You can automatically generate an XML file from an XQuery module using the following syntax:
let $my-doc := xqdm:scan(xs:anyURI('xmldb:exist:///db/my-modules/my-module.xqm'))

Note that the string must be converted to a data type of anyURI.
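For example, the following sketch scans a module and stores the generated xqDoc XML back into the database for later transformation. The collection and file names are hypothetical, and the target collection must already exist:

xquery version "1.0";

let $xqdoc := xqdm:scan(xs:anyURI('xmldb:exist:///db/my-modules/my-module.xqm'))
return xmldb:store('/db/my-modules/docs', 'my-module.xqdoc.xml', $xqdoc)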

Sample Output
Sample XQuery script
xquery version "1.0"; (:~ : This is a simple module which contains a single function : @author Dan McCreary : @version 1.0 : @see http://xqdoc.org : :) module namespace simple = "http://simple.example.com"; (:~ : this function accepts two integers and returns the sum : : @param $first - the first number : @param $second - the second number : @return the sum of $first and $second

: @author Dan McCreary
: @since 1.1
:
:)
declare function simple:add($first as xs:integer, $second as xs:integer) as xs:integer {
    $first + $second
};


Sample xqDoc Output


The scanner will generate the following XML:
<xqdoc:xqdoc xmlns:xqdoc="http://www.xqdoc.org/1.0"> <xqdoc:control> <xqdoc:date>Mon Mar 15 22:34:08 GMT 2010</xqdoc:date> <xqdoc:version>1.0</xqdoc:version> </xqdoc:control> <xqdoc:module type="library"> <xqdoc:uri>http://simple.example.com</xqdoc:uri> <xqdoc:name>/db/Wiki/eXist/xqdoc/test.xqm</xqdoc:name>

<xqdoc:comment> <xqdoc:description> This is a simple module which contains a single function</xqdoc:description> <xqdoc:author> Dan McCreary</xqdoc:author> <xqdoc:version> 1.0</xqdoc:version> <xqdoc:see> http://xqdoc.org</xqdoc:see>

</xqdoc:comment> <xqdoc:body xml:space="preserve">xquery version "1.0";

(:~ : This is a simple module which contains a single function : @author Dan McCreary : @version 1.0 : @see http://xqdoc.org : :) module namespace simple = "http://simple.example.com";

(:~ : this function accepts : : @param $first - the first number : @param $second - the second number : @return the sum of $first and $second : @author Dan McCreary : @since 1.1 two integers and returns the sum



: :) declare function simple:add($first as xs:integer, $second as xs:integer) as xs:integer { $first + $second }; </xqdoc:body> </xqdoc:module> <xqdoc:functions> <xqdoc:function> <xqdoc:comment> <xqdoc:description> this function accepts sum</xqdoc:description> two integers and returns the


<xqdoc:author> Dan McCreary</xqdoc:author> <xqdoc:param> $first - the first number </xqdoc:param> <xqdoc:param> $second - the second number</xqdoc:param> <xqdoc:return> the sum of $first and $second</xqdoc:return> <xqdoc:since> 1.1 </xqdoc:since>

</xqdoc:comment> <xqdoc:name>add</xqdoc:name> <xqdoc:signature>add($first as xs:integer, $second as xs:integer) as xs:integer</xqdoc:signature> <xqdoc:body xml:space="preserve">declare function simple:add($first as xs:integer, $second as xs:integer) as xs:integer{ $first + $second };</xqdoc:body> </xqdoc:function> </xqdoc:functions> </xqdoc:xqdoc>

Execute [3] Geodesy module [4]

Known Problems
The parser used by xqDoc is slightly different from the standard XQuery parser in eXist, so in some cases an XQuery that works with eXist will fail under the xqDoc parser.

Old-style variable declarations are still supported in eXist but not by the xqDoc parser. For example the following declaration:

declare variable $foo:bar { 'Hello World' };

is valid in eXist XQuery, but xqDoc only supports the standard declaration syntax:

declare variable $foo:bar := 'Hello World';

Comments must be valid XML text. This is more restrictive than in XQuery; for example < and & must be expressed as &lt; and &amp;.


References
[1] http://xqdoc.org/
[2] http://xqdoc.org/xqdoc-1.0.xsd
[3] http://www.cems.uwe.ac.uk/xmlwiki/eXist/xqdoc/test.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/eXist/xqdoc/geodoc.xq

Get zipped XML file


Motivation
You want to process XML documents from the web which are contained in a zip file.

Implementation
This script uses the unzip function in the eXist compression module. The function uses higher-order functions to filter the required components of the zipped file and to process each component.

The Unzip Function


The unzip function has five parameters, two of which are XQuery functions that are passed to the unzip function; each of these functions in turn has its own parameters. Here is the general layout of the function:

compression:unzip(
    $zip-data as xs:base64Binary,
    $entry-filter as function,
    $entry-filter-param as xs:anyType*,
    $entry-data as function,
    $entry-data-param as xs:anyType*) as item()*

It unzips all the resources and folders from the provided data, calling user-defined functions to determine what to extract and how to store each resource or folder:

$zip-data - the zip file data
$entry-filter - a user-defined function for filtering resources from the zip file. The function takes three parameters, e.g. user:unzip-entry-filter($path as xs:string, $data-type as xs:string, $param as item()*) as xs:boolean. $data-type may be 'resource' or 'folder'. $param is a sequence with any additional parameters, for example a list of extracted files. If the return value is true() the entry is processed and passed to the entry-data function, otherwise the resource is skipped.
$entry-filter-param - a sequence of additional parameters for the filter function
$entry-data - a user-defined function for storing an extracted resource from the zip file. The function takes four parameters, e.g. user:unzip-entry-data($path as xs:string, $data-type as xs:string, $data as item()?, $param as item()*). $data-type may be 'resource' or 'folder'. $param is a sequence with any additional parameters.
$entry-data-param - a sequence of additional parameters for the storing function

In the first example, we know that there is only one XML file and we intend to process the XML in the script. Later examples store the file or files for later processing.


Extracting a single zipped file


declare namespace fw = "http://www.cems.uwe.ac.uk/xmlwiki/fw"; declare function fw:filter($path as xs:string, $type as xs:string, $param as item()*) as xs:boolean { (: pass all :) true() };

declare function fw:process($path as xs:string,$type as xs:string, $data as item()? , $param as item()*) { (: return the XML :) $data };

let $uri := request:get-parameter("uri","http://www.iso.org/iso/iso_3166-1_list_en.zip") let $zip := httpclient:get(xs:anyURI($uri), true(), ())/httpclient:body/text() let $filter := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:filter"),3) let $process := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:process"),4) let $xml := compression:unzip($zip,$filter,(),$process,()) return $xml

Execute [1]

Sample XML Output


<ISO_3166-1_List_en xml:lang="en"> <ISO_3166-1_Entry> <ISO_3166-1_Country_name>AFGHANISTAN</ISO_3166-1_Country_name> <ISO_3166-1_Alpha-2_Code_element>AF</ISO_3166-1_Alpha-2_Code_element> </ISO_3166-1_Entry> <ISO_3166-1_Entry> <ISO_3166-1_Country_name>LAND ISLANDS</ISO_3166-1_Country_name> <ISO_3166-1_Alpha-2_Code_element>AX</ISO_3166-1_Alpha-2_Code_element> </ISO_3166-1_Entry> ... </ISO_3166-1_List_en>

How the Process Function Works


The compression:unzip() function calls the process function for each component in the zip archive it finds. This is known as a callback function. You can place any valid XQuery code in the process function to do what you would like with each input file such as list or store it. For example the following process function will list all the items in a zip file, their path, their type and the root node if the item is an XML file. declare function t:process($path as xs:string, $type as xs:string, $data as item()? , $param as item()*) { (: return a list of the items in the zip file. :) <item path="{$path}" type="{$type}">{name($data/*)}</item>

};

Running this on an Office Open XML file returns the following:

<item path="[Content_Types].xml" type="resource">Types</item>
<item path="_rels/.rels" type="resource">Relationships</item>
<item path="word/_rels/document.xml.rels" type="resource">Relationships</item>
<item path="word/document.xml" type="resource">w:document</item>
<item path="word/theme/theme1.xml" type="resource">a:theme</item>
<item path="word/settings.xml" type="resource">w:settings</item>
<item path="word/fontTable.xml" type="resource">w:fonts</item>
<item path="word/webSettings.xml" type="resource">w:webSettings</item>
<item path="docProps/app.xml" type="resource">Properties</item>
<item path="docProps/core.xml" type="resource">cp:coreProperties</item>
<item path="word/styles.xml" type="resource">w:styles</item>

Storing the unzipped File


You probably want to store the unzipped documents in the database. We can modify the process function to do this. We can use the third parameter to pass in the directory in which to store each file. In addition we need to create a collection to hold the unzipped files.
declare namespace fw = "http://www.cems.uwe.ac.uk/xmlwiki/fw";

declare function fw:filter($path as xs:string, $type as xs:string, $param as item()*) as xs:boolean { (: pass all :) true() };

declare function fw:process($path as xs:string,$type as xs:string, $data as item()? , $param as item()*) { (: store the XML in the nominated directory :)

xmldb:store($param/@directory, $path, $data) };

let $baseCollection := "/db/apps/zip/data/" let $uri := request:get-parameter("uri","http://www.iso.org/iso/iso_3166-1_list_en.zip") let $unzipCollection := request:get-parameter("dir","temp") let $zip := httpclient:get(xs:anyURI($uri), true(), ())/httpclient:body/text() let $filter := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:filter"),3) let $process := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:process"),4)

let $login :=

xmldb:login("/db","admin","password")

let $fullPath := concat($baseCollection, $unzipCollection)



let $mkdir := if (xmldb:collection-available($fullPath)) then () else xmldb:create-collection($baseCollection, $unzipCollection)


let $store := compression:unzip($zip,$filter,(),$process,<param directory="{$fullPath}"/>) return $store

Unzipping a zip archive


Zip files commonly contain multiple files. In particular, Microsoft Word .docx and Excel .xlsx files are zipped collections of XML files which together define the document or spreadsheet. When documents are stored in the eXist database, the mime type (media type) is inferred from the file suffix using the mime-types.xml file. Alternatively the mime type can be set explicitly when the document is stored. We assume here that filenames in the zip file are simple; if there is a directory structure, this needs additional coding.
declare namespace fw = "http://www.cems.uwe.ac.uk/xmlwiki/fw";

declare function fw:filter($path as xs:string, $type as xs:string, $param as item()*) as xs:boolean { (: pass all :) true() };

declare function fw:process($path as xs:string,$type as xs:string, $data as item()? , $param as item()*) { (: store the XML in the nominated directory :)

(: we need to encode the filename to account for filenames with illegal characters like [Content_Types].xml :) let $path := xmldb:encode($path) (: ensure mime type is set properly for .rels files which are xml alternatively you could add this mime type to the mime-types.xml configuration file :) return if (ends-with($path, '.rels')) then xmldb:store($param/@directory, $path, $data, 'application/xml') else xmldb:store($param/@directory, $path, $data) };

let $baseCollection := "/db/apps/zip/data/" let $uri := request:get-parameter("uri","http://www.iso.org/iso/iso_3166-1_list_en.zip") let $unzipCollection := request:get-parameter("dir","temp") let $zip := httpclient:get(xs:anyURI($uri), true(),



())/httpclient:body/text() let $filter := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:filter"),3) let $process := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:process"),4)


let $login :=

xmldb:login("/db","admin","password")

let $fullPath := concat($baseCollection, $unzipCollection) let $mkdir := if (xmldb:collection-available($fullPath)) then () else xmldb:create-collection($baseCollection, $unzipCollection)

let $store := compression:unzip($zip,$filter,(),$process,<param directory="{$fullPath}"/>) return <result> {for $file in $store return <file>{$file}</file> } </result>

Zips with a directory structure


Most zip files contain a directory tree of files. This directory structure needs to be recreated in the database as the files are unzipped. We can modify the process function to create database collections as necessary, assuming that higher directories are referenced before sub directories.
declare namespace fw = "http://www.cems.uwe.ac.uk/xmlwiki/fw";

declare function fw:filter($path as xs:string, $type as xs:string, $param as item()*) as xs:boolean { (: filter any files which are not required :) if (ends-with($path,".bin")) then false() else true() };

declare function fw:process($path as xs:string,$type as xs:string, $data as item()? , $param as item()*) { (: parse the path and create a collection if necessary :) let $steps := tokenize($path,"/") let $nsteps := count($steps) let $filename := $steps[$nsteps] let $collection := string-join(subsequence($steps,1,$nsteps - 1 ),"/") let $baseCollection := string($param/@collection) let $fullCollection := concat($baseCollection,"/",$collection) let $mkdir :=



if (xmldb:collection-available($fullCollection)) then () else xmldb:create-collection($baseCollection, $collection)


let $filename := xmldb:encode($filename) return xmldb:store($fullCollection, $filename, $data) };

let $baseCollection := "/db/apps/zip/data/" let $path := request:get-parameter("path","http://www.iso.org/iso/iso_3166-1_list_en.zip") let $unzipCollection := request:get-parameter("dir","temp")

let $zip :=

httpclient:get(xs:anyURI($path), true(),

())/httpclient:body/text()

let $filter := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:filter"),3) let $process := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:process"),4)

let $login :=

xmldb:login("/db","admin","password")

let $collection := concat($baseCollection, $unzipCollection) let $mkdir := if (xmldb:collection-available($collection)) then () else xmldb:create-collection($baseCollection, $unzipCollection)

let $store := compression:unzip($zip,$filter,(),$process,<param collection="{$collection}"/>) return <result> {for $file in $store return <file>{$file}</file> } </result>


Processing stored zip files


It may be desirable to store the zip files in the database as binary resources before they are unzipped. By default, files with a .zip suffix are stored as binary data. To store .docx and .xlsx files in eXist, you will need to add these suffixes to the corresponding entry in the $EXIST_HOME/mime-types.xml configuration file. Change

<mime-type name="application/zip" type="binary"> <description>ZIP archive</description> <extensions>.zip</extensions> </mime-type> to

<mime-type name="application/zip" type="binary"> <description>ZIP archive and Office Open XML</description> <extensions>.zip,.docx,.xlsx,.pptx</extensions> </mime-type> You will need to reboot the server for this change to take effect. The basic script remains the same with minor modifications let $path := request:get-parameter("path","http://www.iso.org/iso/iso_3166-1_list_en.zip") let $unzipCollection := request:get-parameter("dir","temp") let $zip := if (starts-with($path,"http")) then httpclient:get(xs:anyURI($path), true(), ())/httpclient:body/text() else util:binary-doc($path)

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Codes/getCountries.xq

Google Chart Bullet Bar


Motivation
You want to create a "digital dashboard" that displays several key performance indicators using a graphical presentation.

Bullet Bars
From the article Bullet Bars with Google Charts [1] we can see that it is easy to create bullet bars using the Google Charts API.

Here is the template that we would like to customize:

Sample URL
http://chart.apis.google.com/chart?cht=bhs&chs=150x30&chd=t:70&chm=r,ff0000,0,0.0,0.5|r,ffff00,0,0.5,0.75|r,00A000,0,0.75,1.0|r,000000,0,0.8,0.81&chco=000000&chbh=10

Sample Screen Image

terms used in dashboard gauge

http://chart.apis.google.com/chart? cht=bhs& chs=150x30& chd=t:70& chm=r,ff0000,0,0.0,0.5| r,ffff00,0,0.5,0.75| r,00A000,0,0.75,1.0| r,000000,0,0.8,0.81& chco=000000&chbh=10

Input Parameters for the Bullet Bar


Here are the parameters:

width - width of the bar in pixels
height - height of the bar in pixels
danger-value - width of the red part of the gauge
warn-value - width of the yellow part of the gauge
ok-value - width of the green part of the gauge
target-value - distance to the vertical black target bar
actual-value - value of the central black line

Sample XQuery Function


Note that the widths of the danger, warn and ok bars (red, yellow and green) are expressed as percentages from 0 to 100.

declare function local:bullet-bar(
    $height as xs:decimal,
    $width as xs:decimal,
    $danger as xs:decimal,
    $warn as xs:decimal,
    $ok as xs:decimal,
    $target as xs:decimal,
    $actual as xs:decimal ) as xs:string {
    let $danger-width-percent := $danger div 100
    let $warn-width-percent := $warn div 100
    let $ok-width-percent := $ok div 100
    let $target-width-percent := $target div 100
    let $target-plus-one := $target-width-percent + 0.01
    return
        concat(
            'http://chart.apis.google.com/chart?cht=bhs&amp;chs=', $width, 'x', $height,
            '&amp;chd=t:', $actual,
            '&amp;chm=r,ff0000,0,0.0,', $danger-width-percent,
            '|r,ffff00,0,', $danger-width-percent, ',', $warn-width-percent,
            '|r,00A000,0,', $warn-width-percent, ',', $ok-width-percent,
            '|r,000000,0,', $target-width-percent, ',', $target-plus-one,
            '&amp;chco=000000&amp;chbh=', round($height * 0.6)
        )
};
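For example, the following fragment (with made-up threshold and actual values) embeds the generated chart URL in an HTML img element:

<img src="{local:bullet-bar(30, 150, 50, 75, 100, 80, 70)}"
     alt="Bullet bar showing an actual value of 70 against a target of 80"/>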

References
[1] http://broadcast.oreilly.com/2008/11/creating-bullet-bars-with-goog.html

Google Chart Sparkline


Motivation
You want to create an easy-to-use specialized chart using a generic charting service such as Google Charts

Method
The Google Chart API [1] creates PNG-format charts from data passed in the URL. One use of the service is to generate a Tufte sparkline [2]. This script uses random data to generate a small sparkline-like graphic; with a bit more work, additional features such as minimum, maximum and normal bands could be added. A line chart (cht=lc) includes axes, but these can be removed by using an undocumented feature in which the chart type is specified as lfi [3]. The script uses function overloading in XQuery, which allows two functions to have the same name but different numbers of parameters. The more general function has parameters for the sequence of values and the min and max to be used in scaling the values. The second function (with the same name) accepts only the values and calculates the min and max from the data before calling the more general function to complete the task.
(: This script illustrates the use of the Google Chart API to generate a sparkline-like graphic :)

declare option exist:serialize "method=html media-type=text/html";

declare function local:simple-encode(
    $vals as xs:decimal*,
    $min as xs:decimal,
    $max as xs:decimal
) as xs:string {
    (: encode the sequence of numbers as a string according to the simple encoding scheme.
       the data values are encoded in the characters A-Z, a-z, 0-9, giving a range from 0 to 61 :)
    let $scale := 62.0 div ($max - $min)
    let $simpleEncode := "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    return
        string-join(
            for $x in $vals
            let $n := floor(($x - $min) * $scale)
            return substring($simpleEncode, $n + 1, 1)
        , "")
};

declare function local:simple-encode($vals as xs:decimal*) as xs:string {
    (: compute the minimum and maximum values, then call the more general function to encode :)
    let $min := min($vals)
    let $max := max($vals)
    return local:simple-encode($vals, $min, $max)
};

declare function local:sparkline(
    $data as xs:decimal*,
    $fontHeight as xs:integer,
    $pointSize as xs:integer,
    $label as xs:string
) as element(span) {
    (: create a span element containing the line chart of the data,
       the name of the data set and the last data value.
       fontHeight and pointSize are defined in pixels :)
    let $codeString := local:simple-encode($data)
    let $width := count($data) * $pointSize
    let $last := $data[last()]
    let $title := concat("Graph of ", $label, " data: ", count($data),
                         " values, min ", min($data), " max ", max($data))
    return
        <span>
            <img src="http://chart.apis.google.com/chart?chs={$width}x{$fontHeight}&amp;chd=s:{$codeString}&amp;cht=lfi"
                 alt="{$title}" title="{$title}"/>
            <font style="font-size:{$fontHeight}px"> {$label}&#160;{$last}</font>
        </span>
};

(: generate some random data :)
let $data :=
    for $i in (1 to 100)
    return floor(math:random() * 10)
return local:sparkline($data, 15, 1, "Random")
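As a quick sanity check of the simple encoding, here is a minimal sketch (assuming the local:simple-encode functions above are in scope) that encodes three values against an explicit 0 to 10 scale:

(: with scale 62 div 10 = 6.2, the values 0, 5 and 9 map to positions 0, 31 and 55,
   i.e. the characters 'A', 'f' and '3', so this call returns the string "Af3" :)
local:simple-encode((0, 5, 9), 0, 10)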

Random Sparkline [4]

References
[1] http://code.google.com/apis/chart/
[2] http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR
[3] http://24ways.org/2007/tracking-christmas-cheer-with-google-charts
[4] http://www.cems.uwe.ac.uk/xmlwiki/Graph/randomSparkline.xq

Google Charts
Motivation
You want an XQuery function to create charts using the Google Chart API service [1].

Method
We will create a simple XQuery function that takes the required parameters of a Google Chart (i.e. chart data, size, colors, and labels). It will then construct a URL with the correct values. You can then embed this URL in your XQuery to display the chart.

URL Encoded Parameters


Google Charts uses URL parameters to encode the values for charts like this one [1]. Here is a clearer rendering of the URL parameters for the chart:

http://chart.apis.google.com/chart?
    chxl=0:|Jan|Feb|Mar|Apl|May|Jun|1:|10|50|100
    &chxr=0,-5,100
    &chxt=x,r
    &chbh=a
    &chs=300x150
    &cht=bvs
    &chco=0000FF
    &chd=t:10,20,30,40,50,60
    &chp=0.05
    &chtt=Downloads+Per+Month

Source Code
Here is an example function:

declare function utility:graph($type, $colors, $size, $markers, $data, $alt,
                               $title, $barwidthandspacing, $linestyles) {
    let $parameters :=
        <Parameters>
            <Parameter label="chco" value="{$colors}"/>
            <Parameter label="chl" value="{$markers}"/>
            <Parameter label="chtt" value="{$title}"/>
            <Parameter label="chbh" value="{$barwidthandspacing}"/>
            <Parameter label="chls" value="{$linestyles}"/>
        </Parameters>
    let $src :=
        concat('http://chart.apis.google.com/chart?',
            'cht=', $type,
            '&amp;chs=', $size,
            (: join the optional parameters that have a value into a single string :)
            string-join(
                for $parameter in $parameters//Parameter[@value ne '']
                return concat('&amp;', $parameter/@label, '=', $parameter/@value)
            , ''),
            '&amp;chd=t:', $data)
    return
        <img alt="{$alt}" src="{$src}"/>
};
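For example, a call along these lines (a minimal sketch; the argument values are illustrative, not from a real data set) would produce an img element pointing at a 300x150 blue bar chart of six data points:

(: hypothetical call to the utility:graph function defined above :)
utility:graph('bvs', '0000FF', '300x150', '', '10,20,30,40,50,60',
              'Downloads per month', 'Downloads+Per+Month', 'a', '')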

Checking for required parameters


The following adds a fuller set of Google Chart parameters, and gives each parameter a @required attribute:
declare function utility:graph($type, $size, $data, $title, $barwidthandspacing,
    $linestyles, $colors, $labels, $markers, $axes, $axislabels, $axislabelpositions,
    $axisrange, $axisstyles, $zeroline, $ticklength, $margin, $fill, $grid,
    $legend, $legendplacement, $alt) {
    let $parameters :=
        <Parameters>
            <Parameter label="cht"   value="{$type}" required="true"/>
            <Parameter label="chs"   value="{$size}" required="true"/>
            <Parameter label="chd"   value="{$data}" required="true"/>
            <Parameter label="chtt"  value="{$title}" required="false"/>
            <Parameter label="chbh"  value="{$barwidthandspacing}" required="false"/>
            <Parameter label="chls"  value="{$linestyles}" required="false"/>
            <Parameter label="chco"  value="{$colors}" required="false"/>
            <Parameter label="chl"   value="{$labels}" required="false"/>
            <Parameter label="chm"   value="{$markers}" required="false"/>
            <Parameter label="chxt"  value="{$axes}" required="false"/>
            <Parameter label="chxl"  value="{$axislabels}" required="false"/>
            <Parameter label="chxp"  value="{$axislabelpositions}" required="false"/>
            <Parameter label="chxr"  value="{$axisrange}" required="false"/>
            <Parameter label="chxs"  value="{$axisstyles}" required="false"/>
            <Parameter label="chp"   value="{$zeroline}" required="false"/>
            <Parameter label="chxtc" value="{$ticklength}" required="false"/>
            <Parameter label="chma"  value="{$margin}" required="false"/>
            <Parameter label="chf"   value="{$fill}" required="false"/>
            <Parameter label="chg"   value="{$grid}" required="false"/>
            <Parameter label="chdl"  value="{$legend}" required="false"/>
            <Parameter label="chdlp" value="{$legendplacement}" required="false"/>
        </Parameters>
    let $optional-parameters :=
        string-join(
            for $parameter in $parameters//Parameter[@required = 'false'][@value ne '']
            return concat('&amp;', $parameter/@label, '=', $parameter/@value)
        , '')
    let $src :=
        concat('http://chart.apis.google.com/chart?',
            'cht=', $type,
            '&amp;chs=', $size,
            $optional-parameters,
            '&amp;chd=t:', $data)
    return
        <img alt="{$alt}" src="{$src}"/>
};

Acknowledgments
Fraser Hore and Dmitriy Shabanov posted these examples to the eXist mailing list.

Resources
Sample XML Schema for checking Google Chart Parameters [2]

References
[1] http://chart.apis.google.com/chart?chxl=0:|Jan|Feb|Mar|Apl|May|Jun|1:|10|50|100&chxr=0,-5,100&chxt=x,r&chbh=a&chs=300x150&cht=bvs&chco=0000FF&chd=t:10,20,30,40,50,60&chp=0.05&chtt=Downloads+Per+Month
[2] http://code.google.com/p/xrx/source/browse/trunk/20-google-charts/schemas/google-charts.xsd

Graphing Triples
The RDF Validation service [1] can be used to graph RDF, but since this expands prefixed names to full URIs the graphs can look rather unreadable as examples. This service is for drawing simple triple graphs where each triple is defined in a local XML format in which each triple has attributes subject, property and object. [XQuery function to convert RDF to N3 needed]

Subjects and objects are drawn as nodes, triples as arcs with the property as the label. If the subject or object contains ':' or starts with 'http://', then the node is shown as an ellipse. If it starts with '_' it is a blank node and an unnamed circle is shown; otherwise the node is assumed to be a literal and is drawn in a box.

Endpoint
[2]

Parameters
- url : URL of triples in the XML format illustrated in the example below
- dir : LR - left to right (default), TB - top to bottom [rankdir in Graphviz]
- title : title for the graph (default: none)

Example
From page 11 of the RDF primer [3]
<?xml version="1.0" encoding="UTF-8"?>
<graph>
    <triple subject="exstaff:85740" property="exterms:address" object="exaddressid:87540"/>
    <triple subject="exaddressid:87540" property="exterms:street" object="1501 Grant Avenue"/>
    <triple subject="exaddressid:87540" property="exterms:city" object="Bedford"/>
    <triple subject="exaddressid:87540" property="exterms:state" object="Massachusetts"/>
    <triple subject="exaddressid:87540" property="exterms:postalcode" object="01730"/>
</graph>

dot Output [4]

digraph { rankdir='LR'
"exstaff:85740" [label="exstaff:85740" shape=ellipse];
"exaddressid:87540" [label="exaddressid:87540" shape=ellipse];
"1501 Grant Avenue" [label="1501 Grant Avenue" shape=box];
"Bedford" [label="Bedford" shape=box];
"Massachusetts" [label="Massachusetts" shape=box];
"01730" [label="01730" shape=box];
"exstaff:85740" -> "exaddressid:87540" [label="exterms:address"];
"exaddressid:87540" -> "1501 Grant Avenue" [label="exterms:street"];
"exaddressid:87540" -> "Bedford" [label="exterms:city"];
"exaddressid:87540" -> "Massachusetts" [label="exterms:state"];
"exaddressid:87540" -> "01730" [label="exterms:postalcode"];
}

GIF Image

http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2image.php?url=http://www.cems.uwe.ac.uk/xmlwiki/RDF/triple2dot.xq?url%3Dhttp://www.cems.uwe.ac.uk/xmlwiki/RDF/egtriples4.xml

Usage
Either save the generated gif, or use 5clicks [5] or similar to capture the image on the screen. One way to print large GIF images is to save the image, then insert it into an Excel spreadsheet. Excel will print the image over multiple pages. Reduced in size and with page borders removed, even large graphs can be printed and then taped together.

Source
declare option exist:serialize "method=text media-type=text/text";
declare variable $nl := "&#10;";
declare variable $url := request:get-parameter("url",());
declare variable $dir := request:get-parameter("dir","LR");
declare variable $title := request:get-parameter("title","");

let $graph := doc($url)
return
(
    "digraph ", $title, " { rankdir='", $dir, "' ", $nl,
    for $node in distinct-values(($graph//triple/@subject, $graph//triple/@object))
    let $nodetype :=
        if (contains($node, ":") or starts-with($node, "http://"))
        then concat('label="', $node, '" shape=ellipse')
        else if (starts-with($node, "_"))
        then 'shape=circle'
        else concat('label="', $node, '" shape=box')
    return concat('"', $node, '" [', $nodetype, '];', $nl)
    ,
    for $triple in $graph//triple
    return (
        concat('"', $triple/@subject, '" -> "', $triple/@object, '" [label="', $triple/@property, '"];'),
        $nl
    )
    ,
    "} ", $nl
)

This script would be improved by the use of an intermediate XML structure and an XSLT script to convert it to dot.
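As a rough sketch of what such an intermediate structure might look like (the element and attribute names here are hypothetical, not part of the script above), the node-shape and edge decisions could be captured in XML first and then serialized to dot by a separate transform:

<!-- hypothetical intermediate form produced by a first pass over the triples -->
<dot-graph rankdir="LR">
    <node id="exstaff:85740" shape="ellipse"/>
    <node id="1501 Grant Avenue" shape="box"/>
    <edge from="exstaff:85740" to="exaddressid:87540" label="exterms:address"/>
</dot-graph>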



References
[1] http://www.w3.org/RDF/Validator/
[2] http://www.cems.uwe.ac.uk/xmlwiki/RDF/triple2dot.xq
[3] http://www.w3.org/TR/REC-rdf-syntax/
[4] http://www.cems.uwe.ac.uk/xmlwiki/RDF/triple2dot.xq?url=http://www.cems.uwe.ac.uk/xmlwiki/RDF/egtriples4.xml
[5] http://www.screen-capture.net/

Grouping Items
Motivation
You have many items in a set of data that have a category associated with them. You want to create a report that sorts the items by a category.

Method
We will perform the query in three steps:

1. Use a FLWOR expression to create a sequence of the distinct categories using the distinct-values() function.
2. For each item in the category sequence, select all items that belong to that category. This is done by adding a predicate (where clause) to the end of our XPath selector. This takes the form data/item[x=y], where the item is added to the sequence if x=y returns true.
3. For each result set in the FLWOR expression, return the category name and then all the items in that category.

Sample Data
<items>
    <item>
        <name>item #1</name>
        <category>red</category>
    </item>
    <item>
        <name>item #2</name>
        <category>green</category>
    </item>
    <item>
        <name>item #3</name>
        <category>red</category>
    </item>
    <item>
        <name>item #4</name>
        <category>blue</category>
    </item>
    <item>
        <name>item #5</name>
        <category>red</category>
    </item>
    <item>
        <name>item #6</name>
        <category>blue</category>
    </item>
    <item>
        <name>item #7</name>
        <category>green</category>
    </item>
    <item>
        <name>item #8</name>
        <category>red</category>
    </item>
</items>

Sample Query
The following XQuery demonstrates this technique. Note that the distinct values for all the categories are stored in the $distinct-categories variable.

xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/html";

(: load the items :)
let $data := doc('/db/mdr/apps/training/labs/04-group-by/data.xml')/items
let $distinct-categories := distinct-values($data/item/category/text())
return
<html>
    <body>
        <table border="1">
            <thead>
                <tr>
                    <th>Category</th>
                    <th>Items</th>
                </tr>
            </thead>
            <tbody>
            {
                for $category in $distinct-categories
                return
                    <tr>
                        <td>{$category}</td>
                        <td>{string-join($data/item[category=$category]/name/text(), ', ')}</td>
                    </tr>
            }
            </tbody>
        </table>
    </body>
</html>

In the query above, the statement $data/item[category=$category] reads as "get all the items from the data set that have a category element equal to the current category". The string-join() function just puts a comma and a space between the items in the output stream for readability.

Sample Output
Category   Items
red        item #1, item #3, item #5, item #8
green      item #2, item #7
blue       item #4, item #6

Discussion
Note that you are not restricted to having an item be in a single category. Adding multiple categories to an item will not require any changes to the script. You can also add new categories to this list at any time without changing the program above. As long as there are range indexes for the category element the list of all categories will be created very quickly, even for millions of records.
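As a rough sketch of such an index (assuming eXist's legacy range index, configured in a collection.xconf stored for the data collection; adjust the path and syntax for your eXist version), the configuration might look like this:

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <!-- range index on the category element so that distinct-values() and the
             category=$category predicate can be answered from the index -->
        <create qname="category" type="xs:string"/>
    </index>
</collection>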

Guest Registry
Dan McCreary
I am using this book for teaching XQuery to my students. Most of my work is in the government and financial sectors. I am using XQuery with REST and XForms.

Jim Fuller
I am using this book for teaching some aspects of XQuery to people I mentor professionally and also plan to use for a University course in Prague this September.

Rajamani Marimuthu
I am using this book for creating samples and some real-time applications for users, and also for teaching my colleagues and learners.

Dominique Rabeuf
I am using this book for creating XForms/XQuery samples and applications

Chris Cargile
I am using this book for creating XQuery samples for research purposes

Joe Wicentowski
I am using this book to capture community innovations in XQuery and develop best practices. I am using XQuery with REST and XForms.

Esteban Arango Medina


I am using this book to learn about XQuery, I need it for a project where I'm working in my university (EAFIT University).

Higher Order Functions


Motivation
You have a sequence of items and you would like to perform several sequential operations on each of the items in the sequence.

Method
You would like to use a single function where you pass a series of functions as parameters to that function.

Background on Functional Languages


Functional languages are languages that treat functions as first-class data types. They frequently have a function to which you pass a list of items together with the function to perform on each of those items. Just like XML transformations, functional languages are ideal when you have many small tasks to perform on a large number of items. Functional languages are excellent for these tasks since the actual order in which the functions get executed on items does not have to be guaranteed. The developer does not have to be concerned about waiting for a transformation on item 1 to finish before the system starts on item 2.

The Google MapReduce algorithm is an example of a functional system. MapReduce allows a data set such as "all web sites" to be treated as a sequence of items. MapReduce then has different processors each receive small items of work that can be processed independently. For more on functional languages see Functional programming on Wikipedia and Functional Programming on Wikibooks.

Because XQuery is also a functional language, you can have confidence that a large list of items passed off to an XQuery function can run independently on many processors without the concern of incorrect results if the items are processed out of order.

Simple example
In the following example we will declare two functions. We will then process a list of words by applying these functions to each of the items in the sequence. We will do this by passing the function name as an argument to another function.

NOTE: This only appears to work in eXist 1.3. eXist versions 1.2.X have the wrong data type associated with the QName() function.

The eXist system needs to turn each function into a function identifier. To do this it needs to call util:function(). util:function takes two arguments: the qualified name of the function (the prefix and the function name) and the arity of the function. The arity of a function is the number of arguments that the function takes. The data type of the first argument must be of type QName. The data type of the arity, the second parameter, is an integer.

util:function($function as xs:QName, $arity as xs:integer) as function

declare namespace fw = "http://www.cems.uwe.ac.uk/xmlwiki/fw";

declare function fw:apply($words as xs:string*, $my-function as function) {
    for $word in $words
    return util:call($my-function, $word)
};

declare function fw:f1($string) {
    string-length($string)
};

declare function fw:f2($string) {
    substring($string, 1, 1)
};

let $f1 := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw", "fw:f1"), 1)
let $mywords := ("red", "green", "purple")
return
    <hofs>
        <data>{$mywords}</data>
        <hof>
            <task>length of each string</task>
            <result>{fw:apply($mywords, $f1)}</result>
        </hof>
        <hof>
            <task>Initial letter of each string</task>
            <result>{
                fw:apply($mywords,
                    util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw", "fw:f2"), 1))
            }</result>
        </hof>
    </hofs>

Execute [1]
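For comparison, XQuery 3.0 (supported in eXist 2.x and later) has native higher-order functions, so the util:function()/util:call() pair is no longer needed. This is a minimal sketch under that assumption, not part of the original example:

declare function local:apply($words as xs:string*, $f as function(xs:string) as item()*) {
    for $word in $words
    return $f($word)
};

(: string-length#1 is a named function reference with arity 1 :)
local:apply(("red", "green", "purple"), string-length#1)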

References
Jim Fuller Article on IBM DeveloperWorks [2] - this has an excellent example of how to use Higher Order Functions using Saxon.

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Test/hof.xq
[2] http://www.ibm.com/developerworks/edu/x-dw-x-advxquery.html

Histogram of File Sizes


Motivation
You have a large number of files in one or more collections and you want to generate a histogram chart that shows the relative distribution of the number of files in each size range.

Method
We will use the xmldb:size() function to generate a list of all the file sizes in a given collection. We can then transform this list into a series of strings that can be passed to the Google Charts line chart function. The format of the size function is the following:

xmldb:size($collection, $document-name)

This function returns the number of bytes in a file. Our first step is to create a sequence of numbers that represents all of the sizes of resources in a collection:

let $sizes :=
    for $file in xmldb:get-child-resources($collection)
    let $size := xmldb:size($collection, $file)
    return $size

This can also be done with a combination of the collection() function and the util:document-name() function:

let $sizes :=
    for $file in collection($collection)/*
    let $name := util:document-name($file)
    let $size := xmldb:size($collection, $name)
    return $size

Sample Program
xquery version "1.0"; declare option exist:serialize "method=xhtml media-type=text/html"; (: Put the collection you want to analyze here :) let $collection := "/db/test" (: How many bytes in each section :) let $increment := 10000 (: How many divisions :) let $divisions := 20 (: Color for the lines or the bars in RRGGBB :) let $color := '0000FF'

Histogram of File Sizes

126

(: For vertical bar chart use 'bvs', for line chart use 'lc', for spark line (no axis) use 'ls' :) let $chart-type := 'bvs' (: this is the max size of a google chart - 30K pixels. number is the width, the second is the height. :) let $chart-size := '600x500' let $uriapi := 'http://chart.apis.google.com/chart?' let $sizes := for $file in xmldb:get-child-resources($collection) let $size := xmldb:size($collection, $file) return $size (: the raw data counts for each range. The 't' is just a marker that it is true that we are in this range. :) let $raw-data := for $range in (0 to $divisions) let $min := $range * $increment let $max := ($range + 1) * $increment return count( for $number in $sizes return if ($number gt $min and $number lt $max) then ('t') else () ) let $max-value := max($raw-data) (: scale to the max height :) let $scaled-data := for $num in $raw-data return string(floor($num div ($max-value div 500))) (: join the strings with commas to get a comma separated list :) let $data-csv := string-join($scaled-data, ',') (: construct the URL :) let $chart-uri := concat($uriapi, 'cht=', $chart-type, '&amp;chs=', $chart-size, '&amp;chco=', $color, '&amp;chd=t:', $data-csv) (: return the results in an HTML page :) <html> The first

Histogram of File Sizes <head><title>Google Chart Histogram View of {$collection}</title></head> <body> <h1>Google Chart Histogram View of {$collection}</h1> <p><img src="{request:encode-url(xs:anyURI($chart-uri))}"/></p> </body> </html>

127

Sample Result
http://chart.apis.google.com/chart?cht=ls&chs=500x500&chco=0000FF&chd=t:83,6,13,37,85,414,500,87,41,31,11,16,9,12,5,7,4,4,3,1,1

Discussion
To run the query you will need to customize the name of the collection that you are analyzing. After you run the query you can check that the results are what you expect and then copy the resulting URL into a browser.

Note that if there are files over the maximum size covered by the top range, an additional count of these file sizes should be added:

let $top-range := $increment * ($divisions + 1)
let $top-count :=
    count(
        for $num in $sizes
        return if ($num > $top-range) then ('t') else ()
    )

This query could also be parametrized using the request:get-parameter() function so that many of the values passed to the Google chart can be set as URL parameters of the XQuery.
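As a minimal sketch of that parametrization (the parameter names here are illustrative), the hard-coded settings at the top of the script could be replaced with request parameters that fall back to sensible defaults:

(: hypothetical parametrized settings for the histogram script :)
let $collection := request:get-parameter('collection', '/db/test')
let $increment  := xs:integer(request:get-parameter('increment', '10000'))
let $divisions  := xs:integer(request:get-parameter('divisions', '20'))
let $chart-type := request:get-parameter('chart-type', 'bvs')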

Image Library
Motivation
You want a script that will display a small thumbnail of all the images in an image collection. The images may have many file suffixes (jpg, png, gif etc).

Method
We will write an XQuery that finds all the child resources in the collection that have the correct file types.

Source Code
xquery version "1.0"; declare option exist:serialize "method=xhtml media-type=text/html";

(: look for the collection parameter in the incoming URL. assume a default collection like /db/images. :)

If not

let $collection := request:get-parameter('collection', '/db/images')

(: you can also change the number of images per row :) let $images-per-row := request:get-parameter('images-per-row', 10)

(: first get all the files in the collection :) let $all-files := xmldb:get-child-resources($collection)

(: now just get the files with known image file type extensions :) let $image-files := for $file in $all-files[ ends-with(.,'.png') or ends-with(.,'.jpg') or ends-with(.,'.tiff') or ends-with(.,'.gif')] return $file

let $image-count := count($image-files) let $rows-count := xs:integer(ceiling($image-count div $images-per-row)) return <html> <head> <title>Images for collection {$collection}</title> </head> <body> Images in collection: {$collection} <table>{ for $row return in (1 to $rows-count)

Image Library
<tr>{ for $col in (1 to $images-per-row) let $n := ($row - 1 ) * $images-per-row + $col

129

return if ($n <= $image-count) then let $image := $images[position = $n ] let $path := concat('/exist/rest', $collection, '/', $image) return <td> <a href="{$path}"><img src="{$path}" height="100px" width="100px"/></a> </td> else <td/> }</tr> }</table> </body> </html> (: blank cells at the end of the last row :)
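Assuming the script above is saved as image-library.xq (a hypothetical name) somewhere under /db, it could be invoked through eXist's REST interface with a URL along these lines, overriding both parameters:

http://localhost:8080/exist/rest/db/image-library.xq?collection=/db/images&images-per-row=5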

Incremental Searching
Motivation
You have a large data set and you want to use JavaScript to asynchronously communicate with a server to narrow the scope of the search as a user types.

US Zip Code Example


There are around 43,000 5-digit zip codes in the US, and there are a number of applications that convert a zip code to a location, for example Ben Fry's [1] Java applet written using Processing [2]. This example uses a client-side XHTML page that uses Ajax to request a subset of codes from a server-side search of an XML database of zip codes, updating the page dynamically.

The HTML page


Generated by an XQuery although the content is static.
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

<html xmlns="http://www.w3.org/1999/xhtml" > <head> <title>ZIP Code to City and State using XmlHttpRequest</title> <script language="javascript" src="ajaxzip.js"/> </head> <body> <h1>US Zipcode decoder</h1>

Incremental Searching
<form onSubmit="getList(); return false"> <p>ZIP code: <input type="text" size="5" name="zip" id="zip" onkeyup="getList();" onfocus="getList();" /> e.g. 95472 </p> </form> <div id="list"/> </body> </html>

130

Javascript
Uses XMLHttpRequest to request the subset and innerHTML to update the page.
function updateList() {
    if (http.readyState == 4) {
        var divlist = document.getElementById('list');
        divlist.innerHTML = http.responseText;
        isWorking = false;
    }
}

function getList() {
    if (!isWorking && http) {
        var zipcode = document.getElementById("zip").value;
        http.open("GET", "getzip.xq?zipcode=" + escape(zipcode), true);
        // this sets the call-back function to be invoked when a response from the HTTP request is returned
        http.onreadystatechange = updateList;
        isWorking = true;
        http.send(null);
    }
}

function getHTTPObject() {
    var xmlhttp;
    /*@cc_on
    @if (@_jscript_version >= 5)
        try {
            xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
        } catch (e) {
            try {
                xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
            } catch (E) {
                xmlhttp = false;
            }
        }
    @else
        xmlhttp = false;
    @end @*/
    if (!xmlhttp && typeof XMLHttpRequest != 'undefined') {
        try {
            xmlhttp = new XMLHttpRequest();
            xmlhttp.overrideMimeType("text/xml");
        } catch (e) {
            xmlhttp = false;
        }
    }
    return xmlhttp;
}

var http = getHTTPObject();   // create the HTTP Object
var isWorking = false;

XQuery search
The server-side XQuery performs the search in the XML database and generates the XHTML. This uses the eXist full text index and the eXist-specific &= operator.

let $zipcode := request:get-parameter("zipcode", ())
return
    <div>
    {
        if (string-length($zipcode) > 1)  (: too slow :)
        then
            let $search := concat('^', $zipcode)
            for $zip in //Zipcode[matches(Code, $search)]
            return <div>{string-join(($zip/Code, $zip/Area, $zip/State), ' ')}</div>
        else ()
    }
    </div>

XML data file


The data was originally a CSV file, converted to XML using Excel and the XML add-in. This is a sample of the data:

<Zipcodes>
    <Zipcode>
        <Code>210</Code>
        <Area>Portsmouth</Area>
        <State>NH</State>
    </Zipcode>
    <Zipcode>
        <Code>211</Code>
        <Area>Portsmouth</Area>
        <State>NH</State>
    </Zipcode>
    <Zipcode>
        <Code>212</Code>
        <Area>Portsmouth</Area>
        <State>NH</State>
    </Zipcode>
    <Zipcode>
        <Code>213</Code>
        <Area>Portsmouth</Area>
        <State>NH</State>
    </Zipcode>
    ...

Execute [3]

References
[1] http://acg.media.mit.edu/people/fry/zipdecode/
[2] http://processing.org/
[3] http://www.cems.uwe.ac.uk/xmlwiki/ajax/zipcode.xq

Index of Application Areas


AJAX
1. ../Employee Search

Diagrams
1. ../Graph Visualization 2. ../Sequence Diagrams

Geocoding
1. ../Google Geocoding 2. ../Nationalgrid and Google Maps 3. ../String Analysis

Dates
1. ../Net Working Days

Development
1. ../XQuery IDE

E-learning
1. ../Example Sequencer

eXist Native XML database


1. ../eXist Crib sheet 2. ../Index of eXist modules and features



Graphs
1. ../Graph Visualization 2. ../Topological Sort

Mathematics
1. ../Project Euler

Page Scraping
1. 2. 3. 4. 5. ../Delivery Status Report ../Page scraping and Yahoo Weather ../Wikipedia Page scraping ../Wikipedia Lookup ../Wiki weapons page

Pipelines
1. ../Page_scraping_and_Yahoo_Weather

Puzzles
1. ../Fizzbuzz

Photos
1. ../Flickr GoogleEarth 2. ../XMP data

Python
1. ../XQuery and Python

Regular Expressions
1. ../String Analysis

RSS
1. ../Simple RSS reader

Strings
1. ../String Analysis 2. ../Tag Cloud

SQL
1. ../XML to SQL 2. ../XQuery from SQL



Tables
1. ../Searching,Paging and Sorting 2. ../Table View

Trees
1. ../Tree View 2. ../Validating a hierarchy

URL Parameters
1. ../Getting URL Parameters 2. ../Checking for Required Parameters 3. ../Parsing Query Strings

VoiceXML
1. Simple RSS reader

Weather
1. Page scraping and Yahoo Weather

XForms
1. ../Simple XForms Examples

XSLT
1. ../XQuery and XSLT

Index of eXist modules and features




Modules
1. request
2. response
3. system
4. util

Index of XQuery features


Crib sheets
1. ../eXist Crib sheet

Gotchas
1. ../Gotchas

Regular Expressions
1. ../String Analysis/

URL Parameters
1. ../Getting URL Parameters/ 2. ../Checking for Required Parameters/ 3. ../Parsing Query Strings/

XPath Navigation
1. ../XPath examples/

Inserting and Updating Attributes


Motivation
You want to insert or update attributes in a document.

(Note: The XQuery Update syntax below is specific to eXist and is not necessarily identical to that in the W3C XQuery Update spec. Full documentation of eXist's XQuery Update syntax can be found at http://exist-db.org/update_ext.html)

Example Input Document


<root>
    <message>Hello World</message>
</root>

Example of Attribute Insert


xquery version "1.0"; let $doc := doc('/db/test.xml')/root let $update := update insert attribute foo {'bar'} into $doc return $doc

Result Document
<root foo="bar"> <message>Hello World</message> </root>

Example of Attribute Update


let $doc := doc('/db/test/update-attribute/root.xml')/root
return update value $doc/@foo with 'new-value'

Result Document
<root foo="new-value"> <message>Hello World</message> </root>

Installing and Testing


Installing the eXist native XML database
Steps for Installing eXist on Microsoft Windows Systems

The full up-to-date quick start installation instructions [1] are available on the eXist web site. This set of instructions is slightly simplified for the needs of the cookbook.

1. Download the JAR file (Java Archive File) from http://www.exist-db.org
2. To install it, you will need a JDK (Java Developer Kit) installed on your system, preferably version 1.5 or higher.
   1. To test this, type "javac -version" at the command prompt.
   2. If you do not have a JDK installed, install the (not bundled) "Java SE Development Kit (JDK)" JDK 6 Update XX from http://java.sun.com/javase/downloads/index.jsp
3. Double-click the jar file. This will automatically install the application. Warning: do not use the default installation path that includes "Program Files"; this will not work because of the spaces in the path name. You MUST pick a path that does not have spaces, C:\eXist for example. Complete the installation by selecting the defaults. It is strongly advised to set the admin password.
4. On Windows, start up the eXist database by selecting it from the Start menu, or by double-clicking C:\eXist\bin\startup.bat. This will start up a DOS console. Give it a few seconds to start up the server.
5. Bring up your web browser to the address http://localhost:8080/exist [2]. It shows a local copy of the eXist site.
6. Click Admin in the Administration menus (on the left). This section allows you to add users and passwords.
7. You may choose to "Import Example Data" if you like.
8. You can now enter simple queries by going to the page http://localhost:8080/exist/sandbox/ [3]

Adding a WebDAV Client


If you are using Windows, you can also add a WebDAV folder that will allow you to drag-and-drop XML and XQuery files between your Windows desktop and an eXist collection (see WebDAV Windows [4]). Note that the Windows interface is somewhat buggy: you cannot copy files from one eXist system directly to another eXist system. You must first copy them to a Windows file system and then copy them to the second eXist system.

References
[1] http://exist-db.org/quickstart.html
[2] http://localhost:8080/exist
[3] http://localhost:8080/exist/sandbox/
[4] http://exist-db.org/webdav.html

Installing the XSL-FO module


You may have to work with your eXist system administrator to first enable the XSL-FO XQuery module in order for these examples to run. This can be done by using the following steps.

Step 1: Enable the XSLFO XQuery Module


Edit the conf.xml file in the $EXIST_HOME directory. You must un-comment the xslfo module.
<module class="org.exist.xquery.modules.xslfo.XSLFOModule" uri="http://exist-db.org/xquery/xslfo" />

Step 2: Edit the extensions build.properties file


If your eXist was not built from source, the fo module and the supporting fop.jar file must also be included in your exist-modules.jar file on all production systems. This can be done by downloading the eXist source and changing the build.properties setting to true. For details please see the eXist build instructions [1].

The file is $EXIST_HOME/lib/extensions/build.properties or $EXIST_HOME/extensions/build.properties:

# XSL FO transformations (Uses Apache FOP)
include.module.xslfo = true

If you are running release 1.4 of eXist, the Apache FOP version referenced might no longer be available. You may have a line like the following in your build.properties:

Step 3: Update the path to the new XSL-FO zip file in the configuration file
include.module.xslfo.url = http://apache.cs.uu.nl/dist/xmlgraphics/fop/binaries/fop-0.95-bin.zip

You will find that the Apache project removed the old binaries (not always best practice). This is the new line it should be replaced with:
include.module.xslfo.url = http://apache.cs.uu.nl/dist/xmlgraphics/fop/binaries/fop-1.0-bin.zip

Additionally, change references to the deprecated version in $EXIST_HOME/extensions/modules/build.xml to point to the newer version, e.g.:

<!-- Apache FOP -->
<get src="${include.module.xslfo.url}" dest="fop-1.0-bin.zip" verbose="true" usetimestamp="true"/>
<unzip src="fop-1.0-bin.zip" dest="${top.dir}/${lib.user}">
    <patternset>
        <include name="fop-1.0/build/fop.jar"/>
        <include name="fop-1.0/lib/batik-all-*.jar"/>
        <include name="fop-1.0/lib/xmlgraphics-commons-*.jar"/>
        <include name="fop-1.0/lib/avalon-*.jar"/>
    </patternset>
    <mapper type="flatten"/>
</unzip>
<delete file="fop-1.0-bin.zip"/>



Step 4 Download the new XSLFO Zip file


When you run a build the next time, this will download the Apache FOP jar file from the Apache web site:

cd $EXIST_HOME/extensions/modules
ant -version
ant prepare-libs-xslfo

Sample from the build log:

prepare-libs-xslfo:
    [echo] Load: true
    [echo] ------------------------------------------------------
    [echo] Downloading libraries required by the xsl-fo module
    [echo] ------------------------------------------------------
    [get] Getting: http://apache.cs.uu.nl/dist/xmlgraphics/fop/binaries/fop-1.0-bin.zip
    [get] To: C:\ws\eXist-1.4dev\extensions\modules\fop-1.0-bin.zip
    [get] ....................................................
    [get] last modified = Thu Jul 31 09:47:44 CDT 2008
    [unzip] Expanding: C:\workspace\exist\extensions\modules\fop-1.0-bin.zip into C:\workspace\exist\lib\user

Step 5 Verify that the new XSLFO Library is available


Note that in addition to the fop binary, the following jar file is required by the FOP processor:

$EXIST_HOME/lib/user/xmlgraphics-commons-1.3.1.jar

On UNIX:

$ sudo cp -v xmlgraphics-commons-1.4.jar $EXIST_HOME/lib/user
`xmlgraphics-commons-1.4.jar' -> `/usr/local/exist/lib/user/xmlgraphics-commons-1.4.jar'

This can be downloaded from the Apache XML Graphics Commons Distribution Mirror [2]

Step 6 Copy the Java files


Copy the jar files $EXIST_HOME/lib/user/fop.jar and the new $EXIST_HOME/lib/extensions/exist-modules.jar onto your production systems. Note that you should make sure the source and destination systems are the same version.

If you want to have FOP hyphenate your text, put fop-hyphen.jar into $EXIST_HOME/lib/user/ and restart eXist. fo:blocks with the attribute hyphenate set to true in your fo or xsl file will then be hyphenated, provided you add a language attribute to the fo:page-sequence. The patterns for many languages are available here: http://sourceforge.net/projects/offo/

Note that your exist-modules.jar file can then be placed on systems that do not have the source code. There is currently no method to use FOP without downloading the source. You can verify that your jar file contains the XSLFO classes by running the jar command shown in the verification step below.

You can now check that the jar files are correctly installed in the user libraries directory:

$ ls -l $EXIST_HOME/lib/user/*.jar

Which returns:

total 7480
-rwxrwxrwx 1 root     root     56290   Nov  3  2009 activation-1.1.1.jar
-rw-rw-r-- 1 ec2-user ec2-user 3318083 Jul 12  2010 batik-all-1.7.jar
-rw-rw-r-- 1 ec2-user ec2-user 3079811 Jul 12  2010 fop.jar
-rwxrwxrwx 1 root     root     434812  Nov  3  2009 mail-1.4.2.jar
-rwxrwxrwx 1 root     root     117470  Nov  3  2009 nekohtml-1.9.11.jar
-rw-r--r-- 1 root     root     569113  Nov 12 17:28 xmlgraphics-commons-1.4.jar

Build The Extensions


According to the eXist web site [3], the "build extensions" target should now recompile the extensions. Type the following into the shell:

$ ./build.{sh|bat} extension-modules

Verify that the jar files are in the right position


$ jar tf exist-extensions.jar

Or, on a Windows system, you can rename the jar file to end with .zip, uncompress it and check that the xslfo classes are in the exist-modules file.

References
[1] http://exist.sourceforge.net/building.html
[2] http://www.apache.org/dyn/closer.cgi/xmlgraphics/commons
[3] http://demo.exist-db.org/exist/building.xml#build-system

Introduction to XML Search


Motivation
You want to have a high-level overview of XML Search terms and technologies.

Search Terms
- Structured Search - retaining and using the structure of a document to aid in search result ranking
- Search Document - a document or document fragment that is returned as the result of a search. Note that in the search examples the word "document" may imply an entire XML document or a fragment or item in such a document.
- Search Query - a word or phrase to find
- Boolean Search - a search for an XML document that is either true or false, with no ordering of search results
- Search Hit - a match of a query to a document or document fragment
- Hit Scoring - a method of assigning a weight to a search result for sorting. For example, if a term occurs more frequently in a document it might receive a higher score
- Search Ranking - an ordered list of search results
- Global Search - searching one or more item types in a database
- Item Viewer - a program used to view a specific item type in a collection

References
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008: online edition [1]

References
[1] http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html

Keyword Search
Motivation
You want to create a Google-style keyword search interface to an XML database with relevance-ranked, full-text search of selected nodes and search results in which the keyword in context is highlighted, as shown below.

Method
Our search engine will receive keywords from a simple HTML form, assigning them to the variable $q. Then it (1) parses the keywords, (2) constructs the scope of the query, (3) executes the query, (4) scores and sorts the hits according to the score, (5) shows the linked results with a summary containing the keyword highlighted in context, and (6) paginates the results.

Note: This tutorial was written against eXist 1.3, which was a development version of eXist; since then eXist 1.4 has been released, which altered several aspects of eXist slightly. This article has not yet been fully updated to account for the changes. The most notable changes are that (1) the kwic.xql file referenced here is now a built-in module and (2) the previous default fulltext search index (whose search operator appears below as &=) is disabled by default in favor of the new, Lucene-based fulltext index, which speeds both search and scoring considerably. The changes required to make the code work with 1.4 will be extensive, but nonetheless the article is instructive in its current form. Lastly, this example will not run under versions prior to 1.3.



Example Collections and Data


Let's assume that you have three collections:

/db/test
/db/test/articles
/db/test/people

The articles and people collections contain XML files with different schemas: "articles" contains structured content, and "people" contains biographical information about people mentioned in the articles. We want to search both collections using a full-text keyword search, and we want to search specific nodes of each collection: the body of the articles and the names of the people. Fundamentally, our search string is:

for $hit in (collection('/db/test/articles')/article/body,
             collection('/db/test/people')/person/biography)[. &= $q]

Note: "&=" is an eXist fulltext search operator, and it will return nodes that match the tokenized contents of $q. See [1] for more information.

Assume you have two collections:

Collection A File='/db/test/articles/1.xml'

<article id="1" xmlns="http://en.wikibooks.org/wiki/XQuery/test">
    <head>
        <author id="2"/>
        <posted when="2009-01-01"/>
    </head>
    <body>
        <title>A Day at the Races</title>
        <div>
            <head>So much for taking me out to the ballgame</head>
            <p>My dad, <person target="1">John</person>, was a great guy, but he sure was a bad driver...</p>
            <p>...</p>
        </div>
    </body>
</article>

Collection B File='/db/test/people/2.xml'
<person id="2" xmlns="http://en.wikibooks.org/wiki/XQuery/test"> <name>Joe Doe</name> <role type="author"/> <contact type="e-mail">joeschmoe@mail.net</contact> <biography>Joe Doe was born in Brooklyn, New York, and he now lives in Boston, Massachusetts.</biography> </person>



Search Form
File='/db/test/search.xq'

xquery version "1.0";
declare namespace test="http://en.wikibooks.org/wiki/XQuery/test";
declare option exist:serialize "method=xhtml media-type=text/html";

<html>
    <head><title>Keyword Search</title></head>
    <body>
        <h1>Keyword Search</h1>
        <form method="GET">
            <p>
                <strong>Keyword Search:</strong>
                <input name="q" type="text"/>
            </p>
            <p>
                <input type="submit" value="Search"/>
            </p>
        </form>
    </body>
</html>

Note that the form element can also contain an action attribute such as action="search.xq" to specify the XQuery function to use.

Receive Search Submission


It's nice to show the received keywords in the search field, so we capture the search submission in the variable $q using the request:get-parameter() function. We change the input element so it contains the value of $q as soon as there is a value.

let $q := xs:string(request:get-parameter("q", ""))
...
<input name="q" type="text" value="{$q}"/>

Filter Search Parameters


In order to prevent [2] XQuery [3] injection [4] attacks [5], it is good practice to force the $q variable into a type of xs:string and to filter out unwanted characters from the search parameters.

let $q := xs:string(request:get-parameter("q", ""))
let $filtered-q := replace($q, "[&amp;&quot;-*;-`~!@#$%^*()_+-=\[\]\{\}\|';:/.,?(:]", "")

An alternative method of filtering is to only allow characters that are in a whitelist:

let $q := xs:string(request:get-parameter("q", ""))
let $filtered-q := replace($q, "[^0-9a-zA-Z\-,. ]", "")

Construct Search Scope


In the context of a native XML database, the scope of a search can be very fine-grained, using the full expressive power of XPath. We can choose to target specific collections, documents, and nodes within documents. We can also target specific element namespaces, and we can use predicates to limit results to elements with a specific attribute. In the case of our example, we will target two collections and a specific XPath for each case. We create this search scope as a sequence of XPath expressions:

let $scope := (
    collection('/db/test/articles')/article/body,
    collection('/db/test/people')/people/person/biography
)

Construct Search String and Execute Search


Although we could execute our search directly using the example above (under "Example Collections and Data"), we'll have much more flexibility if we first construct our search as a string and then execute it using the util:eval() function.

let $search-string := concat('$scope', '[. &amp;= "', $filtered-q, '"]')
let $hits := util:eval($search-string)

Score and Sort Search Results


Without sorting, the results would come back in "document order" -- the order in which the database executed the search. Results can be sorted according to any criteria: alphabetical order, date order, the number of keyword matches, etc. We will use a simple relevance algorithm to score our results: the number of keyword matches divided by the string length of the matching node. Using this algorithm, a hit with 1 match that is 10 characters long will score higher than a hit with 2 matches that is 100 characters in length.

let $sorted-hits :=
    for $hit in $hits
    let $keyword-matches := text:match-count($hit)
    let $hit-node-length := string-length($hit)
    let $score := $keyword-matches div $hit-node-length
    order by $score descending
    return $hit



Show Results with Highlighted Keyword in Context


We want to show each result as an HTML div element containing 3 components: The title of the hit, a summary with an excerpt of the hit showing the keywords highlighted in context, and a link to display the full hit. Depending on the collection, these components will be constructed differently; we use the collection as the 'hook' to drive the display of each type of result. (Note: Other 'hooks' could be used, including namespace, node name, etc.) We will create our highlighted keyword search summary by importing a module called kwic.xql and using a function inside called kwic:summarize(). The kwic:summarize() function highlights the first matching keyword term in a hit, and returns the surrounding text. kwic.xql was written by Wolfgang Meier and is distributed in eXist version 1.3b. We will place kwic.xql in the eXist database inside the /db/test/ collection.
xquery version "1.0"; import module namespace kwic="http://exist-db.org/xquery/kwic" at "xmldb:exist:///db/test/kwic.xql"; ... let $results := for $hit in $sorted-hits[position() = ($start to $end)] let $collection := util:collection-name($hit) let $document := util:document-name($hit) let $base-uri := replace(request:get-url(), 'search.xq$', '') let $config := <config xmlns="" width="60"/> return if ($collection = '/db/test/articles') then let $title := doc(concat($collection, '/', $document))//test:title/text() let $summary := kwic:summarize($hit, $config) let $url := concat('view-article.xq?article=', $document) return <div class="result"> <p> <span class="title"><a href="{$url}">{$title}</a></span><br/> {$summary/*}<br/> <span class="url">{concat($base-uri, $url)}</span> </p> </div> else if ($collection = '/db/test/people') then let $title := doc(concat($collection, '/', $document))//test:name/text() let $summary := kwic:summarize($hit, $config) let $url := concat('view-person.xq?person=', $document) return <div class="result"> <p> <span class="title"><a href="{$url}">{$title}</a></span><br/> {$summary/*}<br/>

Keyword Search
<span class="url">{concat($base-uri, $url)}</span> </p> </div> else let $title := concat('Unknown result. Collection: ', $collection, '. Document: ', $document, '.') let $summary := kwic:summarize($hit, $config) let $url := concat($collection, '/', $document) return <div class="result"> <p> <span class="title"><a href="{$url}">{$title}</a></span><br/> {$summary/*}<br/> <span class="url">{concat($base-uri, $url)}</span> </p> </div>


Paginate and Summarize Results


In order to reduce the result list to a manageable number, we can use URL parameters and XPath predicates to return only 10 results at a time. To do so, we need to define two new variables: $perpage and $start. As the user retrieves each page of results, the $start value will be passed to the server as a URL parameter, driving a new set of results using the XPath predicate.

let $perpage := xs:integer(request:get-parameter("perpage", "10"))
let $start := xs:integer(request:get-parameter("start", "0"))
let $end := $start + $perpage
let $results :=
    for $hit in $sorted-hits[$start to $end]
    ...

We also need to provide links to each page of results. To do so, we will mimic Google's pagination links, which start by displaying 10 results per page, grow up to 20 results per page, and show previous and next results. Our pagination links will only show if there are more than 10 results, and will be a simple HTML list that can be styled with CSS.
let $perpage := xs:integer(request:get-parameter("perpage", "10")) let $start := xs:integer(request:get-parameter("start", "0")) let $total-result-count := count($hits) let $end := if ($total-result-count lt $perpage) then $total-result-count else $start + $perpage let $number-of-pages := xs:integer(ceiling($total-result-count div $perpage)) let $current-page := xs:integer(($start + $perpage) div $perpage) let $url-params-without-start := replace(request:get-query-string(), '&amp;start=\d+', '') let $pagination-links :=

Keyword Search
if ($total-result-count = 0) then () else <div id="search-pagination"> <ul> { (: Show 'Previous' for all but the 1st page of results :) if ($current-page = 1) then () else <li><a href="{concat('?', $url-params-without-start, '&amp;start=', $perpage * ($current-page - 2)) }">Previous</a></li> }

148

{ (: Show links to each page of results :) let $max-pages-to-show := 20 let $padding := xs:integer(round($max-pages-to-show div 2)) let $start-page := if ($current-page le ($padding + 1)) then 1 else $current-page - $padding let $end-page := if ($number-of-pages le ($current-page + $padding)) then $number-of-pages else $current-page + $padding - 1 for $page in ($start-page to $end-page) let $newstart := $perpage * ($page - 1) return ( if ($newstart eq $start) then (<li>{$page}</li>) else <li><a href="{concat('?', $url-params-without-start, '&amp;start=', $newstart)}">{$page}</a></li> ) }

{ (: Shows 'Next' for all but the last page of results :) if ($start + $perpage ge $total-result-count) then () else <li><a href="{concat('?', $url-params-without-start, '&amp;start=', $start + $perpage)}">Next</a></li> } </ul> </div>

Keyword Search We should also provide a plain English summary of the search results, in the form "Showing all 5 of 5 results", or "Showing 10 of 1200 results." let $how-many-on-this-page := (: provides textual explanation about how many results are on this page, : i.e. 'all n results', or '10 of n results' :) if ($total-result-count lt $perpage) then concat('all ', $total-result-count, ' results') else concat($start + 1, '-', $end, ' of ', $total-result-count, ' results')

149

Putting it All Together


Here is the complete search.xq, with some CSS to make the results look nice. This search XQuery is quite long, and lends itself well to refactoring by moving sections of code into separate functions.

File='/db/test/search.xq'

xquery version "1.0";

import module namespace kwic="http://exist-db.org/xquery/kwic"
    at "xmldb:exist:///db/test/kwic.xql";

declare namespace test="http://en.wikibooks.org/wiki/XQuery/test";

declare option exist:serialize "method=xhtml media-type=text/html";

let $q := xs:string(request:get-parameter("q", ""))
let $filtered-q := replace($q, "[&amp;&quot;-*;-`~!@#$%^*()_+-=\[\]\{\}\|';:/.,?(:]", "")
let $scope := (
    collection('/db/test/articles')/test:article/test:body,
    collection('/db/test/people')/test:person/test:biography
    )
let $search-string := concat('$scope', '[. &amp;= "', $filtered-q, '"]')
let $hits := util:eval($search-string)
let $sorted-hits :=
    for $hit in $hits
    let $keyword-matches := text:match-count($hit)
    let $hit-node-length := string-length($hit)
    let $score := $keyword-matches div $hit-node-length
    order by $score descending
    return $hit
let $perpage := xs:integer(request:get-parameter("perpage", "10"))
let $start := xs:integer(request:get-parameter("start", "0"))
let $total-result-count := count($hits)
let $end :=
    if ($total-result-count lt $perpage)
    then $total-result-count
    else $start + $perpage
let $results :=
    for $hit in $sorted-hits[position() = ($start + 1 to $end)]
    let $collection := util:collection-name($hit)
    let $document := util:document-name($hit)
    let $config := <config xmlns="" width="60"/>
    let $base-uri := replace(request:get-url(), 'search.xq$', '')
    return
        if ($collection = '/db/test/articles')
        then
            let $title := doc(concat($collection, '/', $document))//test:title/text()
            let $summary := kwic:summarize($hit, $config)
            let $url := concat('view-article.xq?article=', $document)
            return
                <div class="result">
                    <p>
                        <span class="title"><a href="{$url}">{$title}</a></span><br/>
                        {$summary/*}<br/>
                        <span class="url">{concat($base-uri, $url)}</span>
                    </p>
                </div>
        else if ($collection = '/db/test/people')
        then
            let $title := doc(concat($collection, '/', $document))//test:name/text()
            let $summary := kwic:summarize($hit, $config)
            let $url := concat('view-person.xq?person=', $document)
            return
                <div class="result">
                    <p>
                        <span class="title"><a href="{$url}">{$title}</a></span><br/>
                        {$summary/*}<br/>
                        <span class="url">{concat($base-uri, $url)}</span>
                    </p>
                </div>
        else
            let $title := concat('Unknown result. Collection: ', $collection, '. Document: ', $document, '.')
            let $summary := kwic:summarize($hit, $config)
            let $url := concat($collection, '/', $document)
            return
                <div class="result">
                    <p>
                        <span class="title"><a href="{$url}">{$title}</a></span><br/>
                        {$summary/*}<br/>
                        <span class="url">{concat($base-uri, $url)}</span>
                    </p>
                </div>
let $number-of-pages := xs:integer(ceiling($total-result-count div $perpage))
let $current-page := xs:integer(($start + $perpage) div $perpage)
let $url-params-without-start := replace(request:get-query-string(), '&amp;start=\d+', '')
let $pagination-links :=
    if ($number-of-pages le 1)
    then ()
    else
        <ul>
        {
            (: Show 'Previous' for all but the 1st page of results :)
            if ($current-page = 1)
            then ()
            else <li><a href="{concat('?', $url-params-without-start, '&amp;start=', $perpage * ($current-page - 2))}">Previous</a></li>
        }
        {
            (: Show links to each page of results :)
            let $max-pages-to-show := 20
            let $padding := xs:integer(round($max-pages-to-show div 2))
            let $start-page :=
                if ($current-page le ($padding + 1))
                then 1
                else $current-page - $padding
            let $end-page :=
                if ($number-of-pages le ($current-page + $padding))
                then $number-of-pages
                else $current-page + $padding - 1
            for $page in ($start-page to $end-page)
            let $newstart := $perpage * ($page - 1)
            return (
                if ($newstart eq $start)
                then (<li>{$page}</li>)
                else <li><a href="{concat('?', $url-params-without-start, '&amp;start=', $newstart)}">{$page}</a></li>
            )
        }
        {
            (: Shows 'Next' for all but the last page of results :)
            if ($start + $perpage ge $total-result-count)
            then ()
            else <li><a href="{concat('?', $url-params-without-start, '&amp;start=', $start + $perpage)}">Next</a></li>
        }
        </ul>
let $how-many-on-this-page :=
    (: provides a textual explanation about how many results are on this page,
       i.e. 'all n results', or '10 of n results' :)
    if ($total-result-count lt $perpage)
    then concat('all ', $total-result-count, ' results')
    else concat($start + 1, '-', $end, ' of ', $total-result-count, ' results')
return
<html>
    <head>
        <title>Keyword Search</title>
        <style>
            body {{ font-family: arial, helvetica, sans-serif; font-size: small }}
            div.result {{
                margin-top: 1em; margin-bottom: 1em;
                border-top: 1px solid #dddde8; border-bottom: 1px solid #dddde8;
                background-color: #f6f6f8;
            }}
            #search-pagination {{
                display: block; float: left; text-align: center; width: 100%;
                margin: 0 5px 20px 0; padding: 0; overflow: hidden;
            }}
            #search-pagination li {{
                display: inline-block; float: left; list-style: none; padding: 4px;
                text-align: center; background-color: #f6f6fa;
                border: 1px solid #dddde8; color: #181a31;
            }}
            span.hi {{ font-weight: bold; }}
            span.title {{ font-size: medium; }}
            span.url {{ color: green; }}
        </style>
    </head>
    <body>
        <h1>Keyword Search</h1>
        <div id="searchform">
            <form method="GET">
                <p>
                    <strong>Keyword Search:</strong>
                    <input name="q" type="text" value="{$q}"/>
                </p>
                <p>
                    <input type="submit" value="Search"/>
                </p>
            </form>
        </div>
        {
            if (empty($hits))
            then ()
            else (
                <h2>Results for keyword search &quot;{$q}&quot;. Displaying {$how-many-on-this-page}.</h2>,
                <div id="searchresults">{$results}</div>,
                <div id="search-pagination">{$pagination-links}</div>
            )
        }
    </body>
</html>




Latent Semantic Indexing


Motivation
You have a collection of documents and you want to find out which documents are most similar to a given document.

Method
We will use a text-mining technique called "Latent Semantic Indexing". We will first create a matrix of all concept words (terms) by all the documents. Each cell holds the frequency count of a term in a document. We then send this term-document matrix to a service that performs a standard Singular Value Decomposition or SVD. SVD is a very compute-intensive algorithm that can take many hours or days of calculation if you have a large number of words and documents. The SVD service then returns a set of "Concept Vectors" that can be used to group related documents.
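As a minimal sketch of the term-counting step (the function name and the literal arguments below are illustrative, not part of the original script; the full script later normalises each count by the total word count):

declare function local:term-frequency($term as xs:string, $title as xs:string) as xs:integer {
   (: count how often $term occurs among the whitespace-separated words of $title :)
   count(tokenize(lower-case($title), '\s+')[. = lower-case($term)])
};

local:term-frequency('xrx', 'XRX Tutorial and Cookbook')
(: returns 1 :)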

Sample Data
To keep the example simple, we will just use the document titles, not the full documents. Here are some document titles:

XQuery Tutorial and Cookbook
XForms Tutorial and Cookbook
Auto-generation of XForms with XQuery
Building RESTful Web Applications with XRX
XRX Tutorial and Cookbook
XRX Architectural Overview
The Return on Investment of XRX

Our first step will be to build a Word-Document Matrix. This matrix has one row for each word and one column for each document. We will do this in several steps.
1. Get all the words from all the documents and put them into a single sequence
2. Create a list of the distinct words that are not "stop words"
3. For each word:
   1. For each document count the frequency that this word appears in the document


Sample Word-Document Matrix


Each row of the sample matrix is a word (Applications, Architectural, Auto-generation, Building, Cookbook, Investment, Overview, RESTful, Return, Tutorial, Web, XForms, XQuery, XRX) and each column is one of the seven documents (1 to 7). A cell holds the normalised frequency 0.03125 (1 divided by the 32 words in the titles) where the word occurs in that document's title, and is blank otherwise.

Sample Program Source


xquery version "1.0"; declare option exist:serialize "method=xhtml media-type=text/html indent=yes"; (: this is where we get our data :) let $app-collection := '/db/apps/latent-semantic-analysis' let $data-collection := concat($app-collection , '/data') (: get all the titles where $titles is a sequence of titles :) let $titles := collection($data-collection)/html/head/title/text() let $doc-count := count($titles) (: A list of words :) let $stopwords := <words> <word>a</word> <word>and</word> <word>in</word> <word>the</word> <word>of</word> <word>or</word> <word>on</word> <word>over</word>



<word>with</word> </words> (: a sequence of words in all the document titles :) (: the \s is the generic whitespace regular expression :) let $all-words := for $title in $titles return tokenize($title, '\s') (: just get a distinct list of the sorted words that are not stop words :) let $concept-words := for $word in distinct-values($all-words) order by $word return if ($stopwords/word = lower-case($word)) then () else $word let $total-word-count := count($all-words) return <html> <head> <title>All Document Words</title> </head> <body> <p>Doc count =<b>{$doc-count}</b> Word count = <b>{$total-word-count}</b></p> <h2>Documents</h2> <ol> {for $title in $titles return <li>{$title}</li> } </ol> <h2>Word-Document Matrix</h2> <table border="1"> <thead> <tr> <th>Word</th> {for $doc at $count in $titles return <th>{$count}</th> } </tr>




</thead> {for $word in $concept-words return <tr> <td>{$word}</td> {for $title in $titles return <td>{if (contains($title, $word)) then (1 div $total-word-count) else (' ')}</td> } </tr> } </table> </body> </html>


Creating Sigma Values


The Sigma matrix is a matrix that is multiplied by both the word vectors and the document vectors: [Word Document Matrix] = [Word Vectors] X [Sigma Values] X [Document Vectors]
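In conventional linear-algebra notation (a standard textbook formulation, not taken from this article) the same decomposition reads:

$$ A = U \Sigma V^{T} $$

where A is the word-document matrix, the columns of U are the word vectors, \Sigma holds the sigma (singular) values on its diagonal and the rows of V^{T} are the document vectors.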

Limiting Child Trees


Motivation
You have a tree of data and you want to limit the results to a given level of a tree.

Sample Data
Assume we have an org chart that has the following structure:

<position title="President" name="Peg Prez">
   <position title="Vice President" name="Vic Vicepres">
      <position title="Director" name="Dan Director">
         <position title="Manager" name="Marge Manager">
            <position title="Supervisor" name="Sue Supervisor">
               <position title="Project Manager" name="Pete Project"/>
            </position>
         </position>
      </position>
      <position title="CFO" name="Barb Beancounter"/>
   </position>
   <position title="CIO" name="Tracy Technie"/>
</position>

To display an org chart you only want to display the individual and their direct reports.


Approach
We will use computed element and attribute constructors.

let $positions := doc('/db/my-org/apps/hr/data/positions.xml')/position
return
   for $subelement in $positions/position
   return element {name($subelement)}
      {for $attribute in $subelement/@*
       return attribute {name($attribute)} {$attribute}
       , $subelement/text()}
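With the sample data above, the loop copies each direct report with its attributes and text but without its own subtree, so (assuming the nesting shown in the sample, where the Vice President and the CIO report directly to the President) the copied children look like:

<position title="Vice President" name="Vic Vicepres"/>
<position title="CIO" name="Tracy Technie"/>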

Link gathering
Motivation
You want to gather the links on a blog page.

Method
We use the doc() function to perform an HTTP GET on a remote web page. If the page is a well-formed XML file you can then extract all the unordered list items by appending a ul step to the doc() call. This script fetches the blog page and selects the urls in the link section, which reference other blog articles. Each referenced article is fetched and the urls marked as external are selected. The result is returned as XML.
declare namespace q = "http://www.w3.org/1999/xhtml";

<results> { let $nav := doc("http://www.advocatehope.org/tech-tidbits/theory-of-the-web-as-one-big-database")//q:ul[@class="portletNavigationTree navTreeLevel0"] for $href in $nav//@href let $page := data($href) let $content := doc($page)//q:div[@id="content"] for $links in $content//q:a[@title="external-link"] return <link>{ data($links/@href) }</link> } </results>

Execute [1]


Version 2
Dropping the intermediate variables allows the structure to be seen more clearly: declare namespace q = "http://www.w3.org/1999/xhtml"; let $uri := "http://www.advocatehope.org/tech-tidbits/theory-of-the-web-as-one-big-database" return <results> { for $page in doc($uri)//q:ul[@class="portletNavigationTree navTreeLevel0"]//@href for $link in doc($page)//q:div[@id="content"]//q:a[@title="external-link"]/@href return <link>{data($link)}</link> } </results> Execute [2]

Repository Schemas
Daniel is proposing a standard for supporting the extraction of data such as this from a site. Such a schema would define a view of a set of documents sufficient to allow the extraction above to be based on the schema. We can go some way towards this with a view schema represented as an ER model, with added implementation-dependent paths.
<model name="blog-links">

<type name="url" datatype="string"/> <entity name="page" > max="N" path="//q:ul[@class='portletNavigationTree navTreeLevel0']//@href" path="//q:div[@id='content']//q:a[@title='external-link']/@href" type="page"/> type="url"/>

<attribute name="inner"

<attribute name="external" max="N" </entity> </model>

This schema can then be used by a generic link gathering script:

let $start := request:get-parameter("page",())
let $view := request:get-parameter("view",())
let $schema := doc($view)
let $inner := $schema//entity[@name='page']/attribute[@name='inner']/@path
let $external := $schema//entity[@name='page']/attribute[@name='external']/@path
return
   <results>
   {
      for $page in util:eval(concat('doc($start)',$inner))
      for $link in util:eval(concat('doc($page)',$external))
      return <link>{string($link)}</link>
   }
   </results>

This script now performs the task of link gathering on any site whose page structure can be defined in terms of the schema with appropriate paths. Execute [3]


Relative and absolute URIs


The previous version works only if the URIs are absolute. A little more work is needed if not: declare namespace q = "http://www.w3.org/1999/xhtml"; declare variable $start := request:get-parameter("page",()); declare variable $view := request:get-parameter("view",()); declare variable $schema := doc($view); declare variable $base := substring-before($start,local:local-uri($start)); declare function local:local-uri($uri) { if (contains($uri,"/")) then local:local-uri(substring-after($uri,"/")) else $uri }; declare function local:absolute-uri($url) { if (starts-with($url,"http://")) then $url else concat($base,$url) }; let $inner := $schema//entity[@name='page']/attribute[@name='inner']/@path let $external := $schema//entity[@name='page']/attribute[@name='external']/@path let $starturi := local:absolute-uri($start) return <results> { for $page in util:eval(concat('doc($starturi)',$inner)) let $pageuri := local:absolute-uri($page) for $link in util:eval(concat('doc($pageuri)',$external)) return <link>{string($link)}</link> }

</results>

So with a different schema - same model, different paths:
<model name="site-links"> <type name="url" datatype="string"/> <entity name="page" >

161

<attribute name="inner" max="N" path="//div[@class='nav']//a/@href" type="page"/> <attribute name="external" max="N" path="//div[@class='content']//a/@href" type="url"/> </entity> </model>

which is a view schema of this test site [4] Execute [5]

Virtual Paths
The navigation path is still hard-coded in the script. We would like to write path expressions where the steps are defined in the schema. This path would then be interpreted in the context of the schema.

View Schema
In this example, the test site has been expanded to include a separate index page and some additional components in the view:
<model name="site-links"> <entity name="externalPage"> <attribute name="title" path="/head/title"/> </entity> <entity name="index"> <attribute name="link" max="N" path="//div[@class='index']//a/@href" type="page"/> </entity> <entity name="page"> <attribute name="title" path="//head/title"/> <attribute name="inner" max="N" path="//div[@class='nav']//a/@href" type="page"/> <attribute name="external" max="N" path="//div[@class='content']//a/@href" type="externalPage"/> <attribute name="author" min="0" path="//div[@class='content']/span[@class='author']"/> </entity> </model>

Index [6]


Path language
This prototype uses a simple path language.The step -> dereferences a relative or absolute URL. Where a step is recognised as an attribute of the current entity, the associated path expression is used, otherwise the step is executed as XPath. The first step identifies the (entity) type of the initial document. For example: index/link/->/title List the titles of the pages in the index. import module namespace vp ="http://www.cems.uwe.ac.uk/xmlwiki/vp" at "../Gov/vp.xqm"; let $uri := "http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/index.html" let $schema := "/db/Wiki/Gov/site3.xml" return <result> {vp:process-path($uri,"index/link/->/title",$schema) } </result> Run [7] index/link/->/author/string(.) List the authors of the pages referenced in the index. import module namespace vp ="http://www.cems.uwe.ac.uk/xmlwiki/vp" at "../Gov/vp.xqm"; let $uri := "http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/index.html" let $schema := "/db/Wiki/Gov/site3.xml" return <result> {vp:process-path($uri,"index/link/->/author/string(.)",$schema) } </result> Run [8] page/inner/->/external List the url of all distinct external links of all pages referenced by the index page. import module namespace vp ="http://www.cems.uwe.ac.uk/xmlwiki/vp" at "../Gov/vp.xqm"; declare option exist:serialize "method=xhtml media-type=text/html"; let $uri := "http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/index.html" let $schema := "/db/Wiki/Gov/site3.xml" return <ul> {for $uri in distinct-values(vp:process-path($uri,"index/link/->/external",$schema))

order by $uri return <li> <a href="{$uri}">{string($uri)}</a> </li> } </ul> Run [9] page/inner/->/inner/->/title List the titles of pages linked to the initial page. import module namespace vp ="http://www.cems.uwe.ac.uk/xmlwiki/vp" at "../Gov/vp.xqm"; let $uri := "http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/test1.html" let $schema := "/db/Wiki/Gov/site3.xml" return <result> {vp:process-path($uri,"page/inner/->/inner/->/title",$schema) } </result> Run [10]


Script
The core function processes a virtual path in the context of a schema. declare function vp:process-steps($nodes,$context,$steps,$base,$schema) { if (empty($steps)) then $nodes else let $step := $steps[1] let $entity := $schema//entity[@name=$context] return if ( $step = "->" ) then let $newnodes := for $node in $nodes return vp:get-doc($node,$base) return vp:process-steps($newnodes, $context, subsequence($steps,2),$base,$schema) else if ($entity/attribute[@name=$step]) then let $attribute :=$entity/attribute[@name=$step] let $next :=

string($schema//entity[@name=$attribute/@type]/@name) let $path := string($attribute/@path) let $newnodes := for $node in $nodes let $newnode := util:eval(concat("$node",$path)) return $newnode return vp:process-steps($newnodes, $next, subsequence($steps,2),$base,$schema) else let $newnodes := for $node in $nodes let $newnode := util:eval(concat("$node/",$step)) return $newnode return vp:process-steps($newnodes, $context, subsequence($steps,2),$base,$schema) };


Acknowledgments
This example is based on an article by Daniel Bennett [11].

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Gov/urlDiscover.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/Gov/urlDiscover3.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/Gov/urlDiscover5.xq?page=http://www.advocatehope.org/tech-tidbits/theory-of-the-web-as-one-big-database&view=/db/Wiki/Gov/blog.xml
[4] http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/test1.html
[5] http://www.cems.uwe.ac.uk/xmlwiki/Gov/urlDiscover6.xq?page=http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/test1.html&view=/db/Wiki/Gov/site.xml
[6] http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/index.html
[7] http://www.cems.uwe.ac.uk/xmlwiki/Gov/test1.xq
[8] http://www.cems.uwe.ac.uk/xmlwiki/Gov/test2.xq
[9] http://www.cems.uwe.ac.uk/xmlwiki/Gov/test3.xq?
[10] http://www.cems.uwe.ac.uk/xmlwiki/Gov/test4.xq
[11] http://www.advocatehope.org/tech-tidbits/theory-of-the-web-as-one-big-database


List OWL Classes


Motivation
You want a simple XQuery program that will extract all the OWL classes from an OWL file that is coded using RDF.

Method
We will start by just selecting all the classes in the file that have a name. In this example the names are stored in the rdf:ID attribute of the class like the following:

<owl:Class rdf:ID="Wine">

For our example we will use the wine ontology [1] used in the W3C OWL Guide [2]. Our XQuery will specifically get all the owl:Class elements in the file. Here is a simple XQuery that returns all the classes in the wine ontology. To use this script, load it into a collection such as /db/apps/owl/views/classes.xq; the RDF data files can be loaded into /db/apps/owl/data.

/db/apps/owl/views/classes.xq

xquery version "1.0";

declare namespace xsd="http://www.w3.org/2001/XMLSchema";
declare namespace rdfs="http://www.w3.org/2000/01/rdf-schema#";
declare namespace rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace owl="http://www.w3.org/2002/07/owl#";

declare option exist:serialize "method=xhtml media-type=text/html indent=yes"; let $title := 'List of OWL Classes' let $data-collection := '/db/apps/owl/data' let $file := request:get-parameter('file', 'wine.rdf') let $file-path := concat($data-collection, '/', $file) (: we only want classes that have an ID. Other classes are not named classes. :) let $classes := doc($file-path)//owl:Class[@rdf:ID] (: sort the list :) let $ordered-classes := for $class in $classes order by $class/@rdf:ID return $class return <html>

<head> <title>{$title}</title> </head> <body> <file>File Path: {$file-path}</file> <p>Number of Classes = {count($classes)}</p> <ol> {for $class in $ordered-classes let $class-name := string($class/@rdf:ID) return <li>{$class-name}</li> } </ol> </body> </html>


Sample Results
The results will be an HTML file with an ordered list:

File Path: /db/org/syntactica/apps/owl/data/wine.rdf
Number of Classes = 74
1. AlsatianWine
2. AmericanWine
3. Anjou
4. Beaujolais
5. Bordeaux
6. Burgundy
7. CabernetFranc
8. CabernetSauvignon
9. CaliforniaWine
10. Chardonnay
...

Other Tools
There are several other tools for working with OWL files that are very useful. One is to list all of the properties in an OWL file or list all the properties of a class. These reports can then be used to load the class or property into an XForms application for editing/versioning/workflow and approval.
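A hedged sketch of such a property listing (whether your ontology names properties with rdf:ID, as the wine ontology does, is an assumption to check; the file location mirrors the script above):

xquery version "1.0";

declare namespace rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace owl="http://www.w3.org/2002/07/owl#";

let $file-path := '/db/apps/owl/data/wine.rdf'
for $property in doc($file-path)//(owl:ObjectProperty | owl:DatatypeProperty)[@rdf:ID]
order by $property/@rdf:ID
return <li>{string($property/@rdf:ID)}</li>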

References
[1] http://www.w3.org/TR/owl-guide/wine.rdf
[2] http://www.w3.org/TR/owl-guide/


Login and Logout


Motivation
You want to log users into the system and log them out.

Method
We will use the following functions to create login and logout forms:

xmldb:login($collection, $user, $password, true())
session:create()
session:invalidate()

Logging In
To login we need to first create a new session and then use this session to store our login information: session:create() xmldb:login($collection, $user, $password, true()) This changes the effective user executing the current query and stores that user information into the HTTP session, so subsequent queries within the same session will also execute with the same user rights. Note that you must use "true()" as the fourth argument to the login function.
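A minimal sketch of a login handler that puts these calls together (the parameter names, the /db collection and the redirect target are illustrative assumptions, not part of the original article):

xquery version "1.0";

let $user := request:get-parameter("user", ())
let $password := request:get-parameter("password", ())
(: create the session first, then log in with true() so the user is stored in the session :)
let $session := session:create()
return
   if (xmldb:login("/db", $user, $password, true()))
   then response:redirect-to(xs:anyURI("index.xq"))   (: illustrative landing page :)
   else <p>Login failed for user {$user}</p>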

Logging Out
To log a user out use session:invalidate(). This, like session:clear(), will remove the user binding from the session, which means that the next call to the query will run as guest. However, the currently executing query will still use the old non-guest user until it completes.

(: if we are already logged in, are we logging out - i.e. set permissions back to guest :)
if (request:get-parameter("logout",()))
then (
   let $null := xdb:login("/db", "guest", "guest")
   let $inval := session:invalidate()
   return false()
)
else (
   (: we are already logged in and we are not the guest user :)
   true()
)

In this example we have both a call to xdb:login() as guest and a call to session:invalidate(). We want to do both: clear the session for future queries as well as reset the current user for the rest of the query.


Timeout setting
You can also change the default timeout setting by changing the Jetty configuration file here:

$EXIST_HOME/tools/jetty/etc/webdefault.xml

By default the configuration file sets the session timeout to 30 minutes:

<session-config>
   <session-timeout>30</session-timeout>
</session-config>

Note: In the future there may be an xmldb:logout function which combines both steps. Another approach could be to handle the login/logout within a controller.xql and thus separate it from the main query.

Lorum Ipsum text


Motivation
You want to create realistically-sized example XML for testing or demonstration. Lorum ipsum text is often used to fill out the contents and it would be useful to add this text wherever needed in an XML file. We explore two approaches, one based on modifying the text and the other modifying the XML.

Approach 1 : string replacement


The places in the incomplete XML file where lorum ipsum text is to be placed are marked with an ellipsis "...". The XML file is read, serialised to a string, split into parts, and the parts re-assembled adding a randomly chosen section of the lorum ipsum text in place of each ellipsis. The string is then turned back into XML for output. The base lorum ipsum text is stored as an XML file: http://www.cems.uwe.ac.uk/xmlwiki/apps/lorumipsum/words.xml

Concepts used
XML <> string conversion : The script uses a pair of functions from the exist util module (util:serialize and util:parse) to convert back and forth between XML and a string. This allows the XML text to be operated on as a simple string before being converted back to XML
recursion : interpolating the random text into the original string requires a recursive function
regular expressions : reg exps are used to tokenise the lorum ipsum text and the incomplete XML file containing ellipsis
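A tiny round trip illustrating the serialize/parse pair (the element literal is just an illustration; util:parse is the eXist 1.x name, later renamed util:parse-xml):

let $xml := <greeting>Hello</greeting>
let $as-string := util:serialize($xml, "method=xml")
return util:parse($as-string)
(: gives back the element <greeting>Hello</greeting> :)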

XQuery
declare function local:join-random($parts,$words) { if (count($parts) > 1) then let $randomtext :=string-join(subsequence ($words,util:random(100), util:random(100))," ") return string-join(($parts[1],$randomtext, local:join-random(subsequence($parts,2), $words)),"")

else $parts }; let $lorumipsum := doc("/db/Wiki/apps/lorumipsum/words.xml")/lorumipsum let $words := tokenize($lorumipsum,"\s+") let $file := request:get-parameter("file",()) let $doc := doc($file)/* let $docText := util:serialize($doc,"media-type=text/xml method=xml") let $parts := tokenize($docText, "\.\.\.") let $completedText := local:join-random($parts,$words) return util:parse($completedText)


Example
incomplete XML [1] XML with ellipsis replaced with ipsum lorum text [2]

Explanation
the lorum ipsum text is split into words by tokenising on whitespace
the incomplete XML is fetched and the root element accessed
this element is converted to a string using the util:serialize function, then tokenized with the pattern "\.\.\." (not "..." since . means any single character in regular expressions)
the recursive function join-random() joins the first of a sequence of strings with a random stretch of the lorum ipsum text, with the remainder of the strings similarly joined
the expanded text is converted back to an XML element using util:parse()

Improvements
the lorum ipsum text itself could be generated rather than stored
the script could be parameterized for the lorum ipsum file, allowing different, perhaps more realistic text to be used
the lorum ipsum words are passed as a parameter to the recursive function. This could be defined in a global variable instead
it would be better to use the httpclient module to fetch the files and control the caching via headers - here the file is being cached

Approach 2 - XML replacement


The choice of ellipsis as a marker is problematic if this is to appear in the text. The conversion into text and back to XML is an overhead. An alternative approach would be to use an XML element, for example <ipsum/>, to mark the places where ipsum lorum text is to appear and replace every occurrence with random words. The replacement of a specific element anywhere in the XML tree can be accomplished by modifying the identity transformation discussed in XQuery/Filtering_Nodes.


Concepts
recursion - to copy an arbitrary XML tree, replacing a given element with random text.

XQuery
declare variable $lorumipsum := doc("/db/Wiki/apps/lorumipsum/words.xml")/lorumipsum; declare variable $words := tokenize($lorumipsum,"\s+"); declare variable $marker:= "ipsum"; declare function local:copy-with-random($element as element()) as element() { element {node-name($element)} {$element/@*, for $child in $element/node() return if ($child instance of element()) then if (name($child) = $marker) then subsequence($words,util:random(100),util:random(100)) else local:copy-with-random($child) else $child } }; let $file := request:get-parameter("file",()) let $root := doc($file)/* return local:copy-with-random($root)

Explanation
the sequence of ipsum lorum words is held in a global variable to avoid passing it as a parameter to the recursive function
the copy-with-random() function recursively copies the elements and items in a tree to a new tree
when the element with the name "ipsum" is encountered, a selection of ipsum lorem text is returned instead of the original element


Example
incomplete XML [3] XML with ellipsis replaced with ipsum lorum text [4]

Discussion
The second approach is simpler. Performance is about the same.

Acknowledgements
the sample XML is an extract from "Search: The Graphics Web Guide", Ken Coupland,Laurence King Publishing (2002)

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Coupland/ex-flat.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/apps/lorumipsum/complete.xq?file=http://www.cems.uwe.ac.uk/xmlwiki/Coupland/ex-flat.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/Coupland/ex-flat-x.xml
[4] http://www.cems.uwe.ac.uk/xmlwiki/apps/lorumipsum/complete2.xq?file=http://www.cems.uwe.ac.uk/xmlwiki/Coupland/ex-flat-x.xml

Lucene Search
Motivation
You want to perform a full text keyword search on one or more XML documents. This is done using the Lucene index extensions to eXist.

Background
The Apache Lucene full text search framework was added to eXist 1.4 as a full text index, replacing the previous native full text index. The new Lucene full text search framework is faster, more configurable, and more feature-rich than eXist's legacy full text index. It will also be the basis for an implementation of the W3C's full text extensions for XQuery. eXist associates a distinct node-id with each node in an XML document. This node-id is used as the Lucene document ID in the Lucene index files, that is, each XML node becomes a Lucene document. This means that you can customize to a very high degree the search weight of keyword matches to every node in your document. So, for example, a match of a keyword within a title can have a higher score than a match in the body of a document. This means that a search hit retrieving a document title in a large number of documents will have a higher probability of being ranked first in your search results. This means your searches will have higher Precision and Recall than search systems that do not retain document structure.


eXist and Lucene Documentation


The following is the eXist documentation on how to use Lucene: eXist Lucene Documentation [7] eXist supports the full Lucene Query Parser Syntax (with the exception of "fielded search"): Lucene Query Parser Syntax [1]

Sample XML File


<test> <p n="1">this paragraph tests the things made up by <name>ron</name>.</p> <p n="2">this paragraph tests the other issues made up by <name>edward</name>.</p> </test>

Setting up a Lucene Index


In order to perform Lucene-indexed, full text searching of this document, we need to create an index configuration file, collection.xconf, describing which elements and attributes should be indexed, and the various details of that indexing:
<collection xmlns="http://exist-db.org/collection-config/1.0">
   <index>
      <!-- Enable the legacy full text index for comparison with Lucene -->
      <fulltext default="all" attributes="no"/>
      <!-- Lucene index is configured below -->
      <lucene>
         <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
         <analyzer id="ws" class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
         <text match="//test"/>
      </lucene>
   </index>
</collection>

Notes: If your test data are saved in db/test, you should save collection.xconf in db/system/config/db/test. Index configuration files are always saved in a directory structure inside system/config/db which is isomorphic to the directory structure of db. After you create or update this index configuration file, you will need to reindex the data. You can do this either by using the eXist Java-based admin client, selecting the test collection and choosing "Reindex collection", or by using the xmldb:reindex() [2] function, supplying xmldb:reindex('/db/test') in eXide or in the XQuery Sandbox. Although the legacy full text index is not needed for Lucene-based search, we have explicitly enabled it here for this example configuration in order to point out the expressive similarities between the Lucene and legacy search functions/operators (i.e. Lucene's ft:query() vs. the legacy full text index's &=, |=, near(), text:match-all(), text:match-any()).


Indexing Strategies
You can either define a Lucene index on a single element or attribute name (qname="...") or on a node path (match="..."). If you define an index on a qname, such as <text qname="test"/>, an index is created on <test> alone. What is passed to Lucene is the string value of <test>, which includes the text of all its descendant text nodes. With such an index, one cannot search for the nodes below <test>, e.g. for <p> or <name>, since such nodes have all been collapsed. If you want to be able to query descendant nodes, you should set up additional indexes on these, such as <text qname="p"/> or <text qname="name"/>. If you define an index on a node path, as above with <text match="//test"/>, the node structure below <test> is maintained in the index and you can still query descendant nodes, such as <p> or <name>. This can be seen as a shortcut to establishing an index on all elements below <test>. Be aware that, according to the documentation, this feature is "subject to change" [3]. When deciding which approach to use, you should consider which parts of your document will be of interest as context for full text query. How narrow or broad to make it is best decided when considering concrete search scenarios.
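A sketch of what a qname-based configuration for the same sample document might look like (this replaces the match-based <text> element shown earlier; the analyzer line is repeated from the configuration above):

<collection xmlns="http://exist-db.org/collection-config/1.0">
   <index>
      <lucene>
         <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
         <text qname="p"/>
         <text qname="name"/>
      </lucene>
   </index>
</collection>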

Standard Lucene query syntax


eXist can process Lucene searches expressed in two kinds of query syntax, Lucene's standard query syntax and an XML syntax specific to eXist. In this section the standard query syntax is presented. This is the syntax one can expect a user to input in a search field. A search for "Ron" in the current context will be expressed as [ft:query(., 'ron')]. The first argument holds the nodes to be searched, here ".", the current context node. The second argument supplies the query string, here simply the word "ron". The ft:query() function allows the use of Lucene wildcards. "?" can be used for a single character and "*" for zero, one or more characters: "edward" is found with "ed?ard" and "e*d". Lucene standard query syntax does not allow "*" and "?" to occur in the beginning of a word. Fuzzy searches, with "~" at the end of a word, make it possible to retrieve "ron" through "don~". One can quantify the fuzziness, by appending a number between 0.0f and 1.0f, making it possible to retrieve "ron" by [ft:query(., 'don~0.6')], but not by [ft:query(., 'don~0.7')]. The amount of fuzziness is based on the Levenshtein Distance, or Edit Distance algorithm.[4]. The default is 0.5. The boolean operators "AND" and "OR" can be used, with the expected semantics. There is a variant notation for this: [ft:query(., 'edward AND ron')] can also be written [ft:query(., '+edward +ron')]. [ft:query(., '+edward ron')] would require "edward", but not "ron", to be present. "NOT" can also be used: [ft:query(., 'edward NOT ron')] finds "edward" without "ron". "NOT" can also be represented with "-": [ft:query(., '+edward -ron')]. Operators can be grouped with parentheses, as in [ft:query(., '(edward OR ron) NOT things')]. Phrases can be searched for by putting them in quotation marks: [ft:query(., '"other issues"')]. Fields, proximity searches, range searches, boosting, and escaped reserved characters are not supported in eXist with queries using Lucene's standard query syntax. Boosting can be effected during indexing: eXist Lucene Documentation [5]. See Lucene Query Parser Syntax [1]
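A few of these forms collected into one snippet against the sample document above (the query strings are taken from the descriptions in this section; the 'edward AND issues' combination is an assumption based on the sample data, since both words occur in the second paragraph):

(: wildcard: ? stands for a single character, so this finds "edward" :)
collection('/db/test')//p[ft:query(., 'ed?ard')],

(: fuzzy search: "don~0.6" is close enough to match "ron" :)
collection('/db/test')//p[ft:query(., 'don~0.6')],

(: boolean AND: both words must occur in the same p :)
collection('/db/test')//p[ft:query(., 'edward AND issues')],

(: phrase search :)
collection('/db/test')//p[ft:query(., '"other issues"')]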


Indexing
Since we have indexed the <test> element as a path, the index includes descendant nodes, and queries for nested elements therefore also return hits: collection('/db/test')/test/p/name[ft:query(., 'edward')] collection('/db/test')/test/p[ft:query(name, 'edward')] If we had indexed the qname test with <text qname="test"/>, we would not be able to do so.

Stopwords
The standard Lucene analyser, activated in the above collection.xconf file with <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>, applies the Lucene default list of English stop words and removes the following words from the index: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with. If you wish to make these words searchable, comment out the StandardAnalyzer, remove id="ws" from <analyzer id="ws" class="org.apache.lucene.analysis.WhitespaceAnalyzer"/> and reindex the collection. Todo: How can the list of stopwords be customised?

Ranking
Lucene assigns a relevance score or rank to each match. The more frequently a word occurs in a document, the higher the score. This score is preserved by eXist and can be accessed through the score function, which returns a decimal value. for $m in collection('/db/test')//p[ft:query(., 'tests ron')] let $score := ft:score($m) order by $score descending return <hit score="{$score}">{$m}</hit> The higher the score, the more relevant is the hit.

Boosting Values
The configuration file can be set up to apply higher search weights to specific elements within your document. So for example a match of a keyword in the title of a book will rank that search higher than matches in the body of the book.

Legacy Full Text Search Vs. Lucene XML Search


The following queries are equivalent (apart from the index used):

Matching any terms


To express the "match any" (|=) legacy style full text query using the new Lucene query function: collection('/db/test')//p[. |= 'tests edward'] you would use the following: collection('/db/test')//p[ft:query(., <query>

<bool> <term occur="should">tests</term> <term occur="should">edward</term> </bool> </query>)]


matching all terms


To express the "match all" (&=) legacy full text query using the new Lucene query function: collection('/db/test')//p[. &= 'tests edward'] you would use the following: collection('/db/test')//p[ft:query(., <query> <bool> <term occur="must">tests</term> <term occur="must">edward</term> </bool> </query>)]

matching no terms
To express the "match none" (not + |=) legacy full text query using the new Lucene query function: collection('/db/test')//p[not(. |= 'issues edward')] you would use the following: collection('/db/test')//p[not(ft:query(., <query> <bool> <term occur="should">issues</term> <term occur="should">edward</term> </bool> </query>))] Note that the last one could not be expressed as: collection('/db/test')//p[ft:query(., <query> <bool> <term occur="not">issues</term> <term occur="not">edward</term> </bool> </query>)] because Lucene's NOT operator can't be used on its own, without the presence of a 'positive' search term.


XML Query Syntax vs. Default Lucene Syntax


The following queries are equivalent, and can be tested against the Shakespeare examples shipped with eXist, by supplying them as the value for $query in this XQuery snippet:

declare option exist:serialize "highlight-matches=both";
let $query := 'query'
return //SPEECH[ft:query(., $query)]
search type 'atomic', match any term Lucene syntax fillet snake XML syntax <query> <term>fillet</term> <term>snake</term> </query> <query> <bool> <term occur="must">fillet</term> <wildcard occur="must">snake</wildcard> </bool> </query> <query> <bool> <term occur="not">fillet</term> <term occur="must">snake</term> </bool> </query> <query> <bool> <term occur="must">fillet</term> <wildcard occur="must">sn*e</wildcard> </bool> </query> <query> <bool> <term occur="must">fillet</term> <regex occur="must">sn.*e</regex> </bool> </query> "fillet snake" <query> <phrase>fillet snake</phrase> </query> <query> <near>fillet snake</near> </query> proximity search "fillet snake"~1 <query> <near slop="1"> <term>fillet</term> <term>snake</term> </near> </query>

'atomic', match all terms

+fillet +snake

'atomic', match only some terms

-fillet +snake

'atomic', with wildcards

+fillet +sn*e

'atomic', with regex

phrase search

<query> <near slop="1" ordered="no"> <term>snake</term> <term>fillet</term> </near> </query> snake~ <query> <fuzzy>snake</fuzzy> </query> <query> <fuzzy min-similarity="0.3">snake</fuzzy> </query>

proximity search, unordered

fuzzy search, no similarity parameter

fuzzy search, with similarity parameter

snake~0.3

Mind the gaps in the table above! In standard Lucene syntax you can't express:
regular expressions: this is a unique feature of eXist's XML query syntax, by means of the <regex> element
ordering of proximity search terms: this is a unique feature of eXist's XML query syntax, by means of the @ordered attribute on <near>
Finally, a more complex case, in which boolean operators are grouped to override default priority rules:
search type Lucene syntax XML syntax <query> <bool> <bool occur="must"> <term occur="should">fillet</term> <term occur="should">malice</term> </bool> <term occur="must">snake</term> </bool> </query>

groups of boolean search operators (fillet OR malice) AND snake

Note how:
grouping in standard Lucene syntax can be expressed with nesting in XML syntax
for nested <bool> operators, the @occur attribute can be specified as well

Notes on Using Wildcard


Note that if you include a wildcard in your string the <wildcard> element must be used to enclose the string. The following:

//SPEECH[ft:query(., 'fenny sna*')]

is equivalent to:

xquery version "1.0";
let $query :=
   <query>
      <term>fen</term>
      <wildcard>sna*</wildcard>
   </query>
return

//SPEECH[ft:query(., $query)]


References
eXist Lucene XML Syntax [6] blog posting by Ron Van den Branden

References
[1] http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
[2] http://demo.exist-db.org/exist/functions/xmldb/reindex
[3] http://exist-db.org/lucene.html#N1018D
[4] http://en.wikipedia.org/wiki/Levenshtein_distance
[5] http://exist-db.org/lucene.html#N102D5
[6] http://rvdb.wordpress.com/2010/08/04/exist-lucene-to-xml-syntax

Multiple page scraping and Voting behaviour


Often the necessary data is spread over multiple web pages. Here is an example where data is taken from multiple pages to gather together the voting behaviour of a member in the US House of Representatives. An index of the issues in any session of the House are provided by pages such as [1]. For here, one can see that the pages reporting on any of sequentially numbered votes are generated by queries such as [2] The results are returned as an XML page rendered in a browser using XSLT. The XQuery doc() function retrieves the underlying XML. The following query aggregates the voting behavior for a specific member over 6 specific votes: <results> {for $i in 10 to 15 let $path := concat("http://clerk.house.gov/evs/2007/roll0",$i,".xml") let $report := doc($path) let $bill := $report//vote-metadata let $specificvote := $report//recorded-vote[legislator/@name-id = "E000215"] let $result := concat(data($specificvote//legislator)," voted ",data($specificvote/vote)," ",data($bill/vote-question)," of ",data($bill//legis-num)) return <result>{$result}</result> } </results> Execute [3] More generally, the following function will return an XML node containing the extracted data. In general the vote pages encode the roll number with leading zeros, with minimum length of 3 digits: declare function local:voting($repid as xs:string, $year as xs:integer, $rollnumbers as xs:integer*) { for $rollno in $rollnumbers let $zeropaddedrollnum := concat(string-pad("0",max((0,3 -

string-length(xs:string($rollno))))),xs:string($rollno)) let $path := concat("http://clerk.house.gov/evs/",$year,"/roll",$zeropaddedrollnum,".xml") let $report := doc($path) let $bill := $report//vote-metadata let $specificvote := $report//recorded-vote[legislator/@name-id = $repid] return <result> <year>{$year}</year> {$bill/rollcall-num} {$bill/vote-question} {$bill/legis-num} {$specificvote/legislator} {$specificvote/vote} </result> }; <report> {local:voting("E000215",2007,10 to 15)} </report> Execute [4] Note. It would be preferable to use the asp endpoint since this does not involve the complication arising here from leading zeros, but that produces mal-formed XML (??)


References
[1] http://clerk.house.gov/evs/2007/ROLL_000.asp
[2] http://clerk.house.gov/cgi-bin/vote.asp?year=2007&rollnumber=10
[3] http://www.cems.uwe.ac.uk/xmlwiki/Gov/voting.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/Gov/voteAnalysis.xq


MusicXML to Arduino
Motivation
You want to play music available in MusicXML format on an Arduino.

Approach
Fetch the MusicXML file (either plain XML or compressed) and transform one monophonic part to code to be included in an Arduino sketch.

Script
(: ~ : convert a monotonic part in a MusicXML score to an Arduino code fragment suitable to include in a sketch : :@param uri - the uri of the MusicXML file :@param part - the id of the part to be converted to midi notes :@return text containing Arduino statements to : : set the tempo, : define the array of midi notes a : define a parallel array of note durations in beats :@author Chris Wallace :) (: offsets of the letters ABCDEFG from C :) declare namespace fw = "http://www.cems.uwe.ac.uk/xmlwiki/fw"; declare variable $fw:step2offset := (9,11,0,2,4,5,7); declare function fw:filter($path as xs:string, $type as xs:string, $param as item()*) as xs:boolean { (: pass all :) true() }; declare function fw:process($path as xs:string,$type as xs:string, $data as item()? , $param as item()*) { (: return the XML :) $data }; declare function fw:unzip($uri) { let $zip := httpclient:get(xs:anyURI($uri), true(), ())/httpclient:body/text() let $filter := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:filter"),3)

let $process := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:process"),4) let $xml := compression:unzip($zip,$filter,(),$process,()) return $xml };


declare function fw:MidiNote($thispitch as element() ) as xs:integer { let $step := $thispitch/step let $alter := if (empty($thispitch/alter)) then 0 else xs:integer($thispitch/alter) let $octave := xs:integer($thispitch/octave) let $pitchstep := $fw:step2offset [ string-to-codepoints($step) - 64] return 12 * ($octave + 1) + $pitchstep + $alter } ; declare function fw:mxl-to-midi ($part) { for $note in $part//note return element note { attribute midi { if ($note/rest) then 0 else fw:MidiNote($note/pitch)}, attribute duration { ($note/duration, 1) [1] } } }; declare function fw:notes-to-arduino ($notes as element(note)*) as element(code) { (: create the two int arrays for inclusion in an Arduino sketch :) <code> int note_midi[] = {{&#10; { string-join( for $midi at $i in $notes/@midi return concat(if ($i mod 10 eq 0) then "&#10;" else (),$midi) ,", ") } }}; int note_duration[] = {{&#10; { string-join( for $duration at $i in $notes/@duration return concat(if ($i mod 10 eq 0) then "&#10;" else (),$duration) ,", ") }

}};
</code>
};

declare option exist:serialize "method=text media-type=text/text";

let $uri := request:get-parameter("uri",())
let $part := request:get-parameter("part","P1")
let $format := request:get-parameter("format","xml")
let $doc := if ($format = "xml")
            then doc($uri)
            else if ($format = "zip")
            then fw:unzip($uri)
            else ()

(: get the requested part :)
let $part := $doc//part[@id = $part]

(: use the data in the first measure to set the tempo :)
let $measure := $part/measure[1]
let $tempo := (xs:integer($measure/sound/@tempo), 100)[1]

(: convert the notes into an internal XML format :)
let $notes := fw:mxl-to-midi($part)

return
(: generate the sketch fragment :)
<sketch>
int tempo = {$tempo};
{fw:notes-to-arduino($notes) }
</sketch>
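As a worked example of the fw:MidiNote conversion above (the pitch element literal is illustrative and assumes the function declarations above are in scope):

let $pitch := <pitch><step>C</step><octave>4</octave></pitch>
return fw:MidiNote($pitch)
(: step C gives offset 0, so the result is 12 * (4 + 1) + 0 + 0 = 60, i.e. middle C :)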


Examples
1. Good King Wenceslas [1]
2. HTML Form interface [2]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Music/mxl2arduino.xq?uri=http://www.hymnsandcarolsofchristmas.com/Hymns_and_Carols/XML/Good_King_Wenceslas2.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/Music/mxl2arduinoForm.xq?uri=http://www.hymnsandcarolsofchristmas.com/Hymns_and_Carols/XML/Good_King_Wenceslas2.xml


Naming Conventions
Guidelines for wikibook authors
Our goal is to allow many people to contribute examples while giving our readers a consistent user experience. In that light we would like all of our authors to use some of the following standards. Make sure you use the source tags to surround your code. If it is XML code use the lang="xml" attribute.

<source lang="xml">
...xml code here...
</source>

Try to keep the examples as simple as you can to demonstrate the core concepts.

Sample XQuery File


xquery version "1.0"; let $message := 'Hello World!' return <results> <message>{$message}</message> </results> Each XQuery file should begin with the word xquery and the version number. xquery version "1.0";

Complex XQueries should have comments using the XQuery comment syntax:

xquery version "1.0";
(: This is a comment :)

File Name Conventions


Please use three space characters to indent your XQuery and XML examples. Do not use tabs. We use three characters because we use PDF output of these books and the printed pages have limited width. Please do not exceed 70 characters per line. This helps formatting for printed versions of this Wikibook. For XQuery scripts stored inside the database please use the .xq suffix. If you are running on a system that MUST be compatible with the three-letter Microsoft DOS file name extension please use the .xql file suffix. For XQuery modules please use the suffix .xqm. For each module that has unit tests use the suffix -test.xq.


Navigating Collections
Motivation
You want to browse collections using an HTML web page and narrow your choices as you type.

Method
We will first create a server-side script that takes a single parameter. This is the collection path that the user is entering into an input field in a web page. With each character the user types, the list of possible sub-collections is narrowed. There are three parts to this application: 1) the server-side XQuery script 2) the HTML form 3) the JavaScript file that implements the AJAX functions.

Sample Server-Side Script


get-child-collections.xq
xquery version "1.0";

declare function local:substring-before-last-slash($arg as xs:string?) as xs:string {
   if (matches($arg, '/'))
   then replace($arg,'^(.*)/.*','$1')   (: by default matching is eager :)
   else ''
};
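(: Example (hypothetical call, not part of the original script):
   local:substring-before-last-slash('/db/test/articles') returns '/db/test',
   because the greedy (.*) group captures everything up to the last slash. :)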

let $title := 'Get child collections starts with prefix'

(: if we don't get any value then use the root collection :) let $collection := request:get-parameter("collection", '') let $before-last-slash := local:substring-before-last-slash($collection) let $after-last-slash := substring-after($collection, concat($before-last-slash, '/'))

let $sub-collections := xmldb:get-child-collections($before-last-slash)

(: collection={$collection}<br/> before last "/"={$before-last-slash}<br/> after last "/"={$after-last-slash}<br/> :) return <div class="results">{ if (count($sub-collections) = 0) then

<h1>There are no subcollections of {$collection}</h1> else <div class="selections">{ for $child in $sub-collections let $child-path := concat($before-last-slash, '/', $child) order by $child return if (starts-with($child, $after-last-slash)) then <div class="selection"><a href="browse.xq?collection={$child-path}/">{$child}</a></div> else () }</div> }</div>


browse.xq
xquery version "1.0"; declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=no indent=yes

doctype-public=-//W3C//DTD&#160;XHTML&#160;1.0&#160;Transitional//EN

doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

let $title := "Browse Collections (AJAX)"

let $collection := request:get-parameter("collection", '/db/')

return <html xmlns="http://www.w3.org/1999/xhtml" > <head> <title>{$title}</title> <script type="text/javascript" src="ajax-collection.js"/> <style type="text/css"> td {{background-color: #efe; font-size:14px;}} th {{background-color: #ded; text-align: right; padding:3px; font-size:12px;}} </style> </head> <body onload="getList();"> <h1>{$title}</h1> <p>{$collection}</p> <form onsubmit="getList(); return false" action="get"> <span> <label for="collection">Collection:</label> <input type="text" size="50" name="collection" id="collection" title="collection" onkeyup="getList();" onfocus="getList();" value="{$collection}"/>

</span> </form> <!-- this is where the results are placed --> <div id="results"/> </body> </html>


ajax-collection.js function updateList() { if (http.readyState == 4) { var divlist = document.getElementById('results'); divlist.innerHTML = http.responseText; isWorking = false; } } function getList() { if (!isWorking && http) { var collectionid = document.getElementById("collection").value; http.open("GET", "get-child-collections.xq?collection=" + collectionid); http.onreadystatechange = updateList; // this sets the call-back function to be invoked when a response from the HTTP request is returned isWorking = true; http.send(null); } } function getHTTPObject() { var xmlhttp; /*@cc_on @if (@_jscript_version >= 5) try { xmlhttp = new ActiveXObject("Msxml2.XMLHTTP"); } catch (e) { try { xmlhttp = new ActiveXObject("Microsoft.XMLHTTP"); } catch (E) { xmlhttp = false; } } @else xmlhttp = false; @end @*/ if (!xmlhttp && typeof XMLHttpRequest != 'undefined') { try {

xmlhttp = new XMLHttpRequest();
xmlhttp.overrideMimeType("text/xml");
} catch (e) {
xmlhttp = false;
}
}
return xmlhttp;
}

var http = getHTTPObject();   // create the HTTP Object
var isWorking = false;


OAuth
Motivation
You want to log in to a web service that supports the OAuth protocol.

Background
OAuth is an open protocol that allows secure API authorization in a simple and standard way from desktop and web applications. Like OpenID, OAuth allows other web services to use your private data without giving out your passwords.

Terminology
Consumer Key - When you register as a developer with an OAuth service provider they will send you an API key to use with their service. This is typically about a 65-character string composed of digits and letters.
Service Provider - an organization like LinkedIn, Google, or Twitter that has some of your data protected behind a web service.
Token - a somewhat long string of computer-generated letters and numbers used in OAuth data exchanges. These strings are hard to guess, and are paired with a secret key to protect the token from being used by unauthorized parties. OAuth defines two different types of tokens: a request token and an access token.

Steps
We will perform this process in the following steps:
1. Request a Token
2. Sign
3. etc.
Here is an example of the structure that contains OAuth information (from the 28msec web site):
<oa:service-provider realm="example.com/oauth"> <oa:request-token> <oa:url></oa:url> <oa:http-method>GET</oa:http-method> </oa:request-token>

<oa:user-authorization> <oa:url></oa:url> </oa:user-authorization> <oa:access-token> <oa:url></oa:url> <oa:http-method>GET</oa:http-method> </oa:access-token> <oa:supported-signature-methods> <oa:method>HMAC-SHA1</oa:method> </oa:supported-signature-methods> <oa:oauth-version>1.0</oa:oauth-version> <oa:authentication> <oa:consumer-key>your consumer key</oa:consumer-key> <oa:consumer-key-secret>your consumer secret</oa:consumer-key-secret> </oa:authentication> </oa:service-provider>
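A very rough sketch of step 1 (requesting a token) with eXist's httpclient module; the endpoint URL is a placeholder and the required oauth_* signing parameters are deliberately left out, since computing the HMAC-SHA1 signature is not covered here:

xquery version "1.0";

declare namespace httpclient="http://exist-db.org/xquery/httpclient";

(: placeholder request-token endpoint, normally taken from the <oa:request-token> element :)
let $request-token-url := 'https://api.example.com/oauth/request_token'
(: the signed oauth_consumer_key, oauth_nonce, oauth_timestamp and oauth_signature
   parameters must be appended to this URL before the call will be accepted :)
return httpclient:get(xs:anyURI($request-token-url), false(), ())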


References
http://oauth.net/
http://hueniverse.com/2007/10/beginners-guide-to-oauth-part-ii-protocol-workflow/
http://sausalito.28msec.com/latest/index.php?id=working_with_oauth
Examples of XML definitions for Service Provider Structures [1]
MarkLogic Facebook OAuth module [2]
Norm Walsh on OAuth [3]

References
[1] http://sausalito.28msec.com/latest/index.php?id=service_provider_structures
[2] http://github.com/marklogic/comoms/blob/master/src/oauth.xqy
[3] http://norman.walsh.name/2010/09/25/oauth


Open Search
Motivation
You want to allow users to search your site using a tool such as the search boxes in the upper right corner of many web browsers. You want to publish your search interface using standardized documents. Note that this has not been made to work yet. It is currently in development.

Example of OpenSearch XML Configuration File


<?xml version="1.0" encoding="UTF-8"?> <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"> <ShortName>Search Shakespeare Plays</ShortName> <Description>Use local exist to search for keywords in demo Shakespeare plays that are included in the eXist demos.</Description> <Tags>example web</Tags> <Contact>yourname@yoursite.com</Contact> <Url type="application/application+xml" template="http://localhost:8080/exist/rest/db/search/search.xq?q={searchTerms}"/> </OpenSearchDescription>

This file will tell the search tool to take the search terms out of the search text field and perform an HTTP GET on the local XQuery script on your default eXist instance running on your local system. If you change your hostname, port or path you just need to update the URL in the XML configuration file.
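The search.xq named in the template is not shown here; a minimal sketch of what it might look like (the collection path and the SPEECH/SPEAKER element names are assumptions based on the Shakespeare demo data, and a Lucene index on SPEECH is assumed):

xquery version "1.0";

let $q := request:get-parameter('q', '')
return
   <results q="{$q}">
   {
      for $hit in collection('/db/shakespeare/plays')//SPEECH[ft:query(., $q)]
      return <hit>{$hit/SPEAKER/text()}</hit>
   }
   </results>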


Overview of eXist search functions and operators


In catching up with updating from eXist-1.2.6 to eXist-1.4.x, one thing struck me. Despite eXist's elaborate documentation, I was feeling a bit lost in the myriad of overlapping search functions and their modalities (both default and fallback behaviours). I think an overview grouping the different search-related functions in one place would help. I don't know if this could be interesting to complete the eXist documentation; at this stage, however, I'm not too sure about all functions, and maybe others may see more interesting approaches for such an overview. Therefore, I thought maybe the Wikibook could be an interesting place for such information. The table below provides an overview, with the following colour codes:
I'm not 100% sure -- all clarifications welcome!
not in eXist function documentation

overview of eXist search functions and operators


The table lists the search functions and operators (fn:matches(); fn:contains(), fn:starts-with(), fn:ends-with(); the comparison operators =, <, <=, >, >=; text:match-any(), text:match-all(); text:fuzzy-match-all(), text:fuzzy-match-any(); text:matches-regex(); near(), &=, |=; ft:query(); ngram:contains(), ngram:starts-with(), ngram:ends-with()) against the query types they support (wildcards, regex) and the index each uses (brute force fallback, range index, legacy full text index, Lucene full text index, ngram index).


Overview of Page Scraping Techniques


Motivation
You want a toolkit for pulling information out of web pages, even if those pages are not well-formed XML files.

Method
XQuery is an ideal toolkit for manipulating well-formed HTML; you need only use the doc() function, e.g. doc('http://www.example.org/index.html') or doc('/db/path/to/index.html'). But, if a webpage is not well-formed XML, you will get errors about the source not being well-formed. Luckily, there are programs that transform HTML files into well-formed XML files. eXist provides several such tools. One is the httpclient module's get function, httpclient:get(). To use this function you need to enable the httpclient module, by modifying the conf.xml file so that the module is loaded the next time you start eXist. Uncomment the following line: <module class="org.exist.xquery.modules.httpclient.HTTPClientModule" uri="http://exist-db.org/xquery/httpclient" /> For example the following example performs an HTTP GET on the list of all the feeds from the IBM web site: let $feeds-url := 'http://www.ibm.com/ibm/syndication/us/en/?cm_re=footer-_-ibmfeeds-_-top_level' let $data := httpclient:get(xs:anyURI($feeds-url), true(), <Headers/>) return $data Sometimes the HTML is so malformed that even httpclient:get() will not be able to salvage the HTML. For example, if an element has two @id elements, you will get the error, "Error XQDY0025: element has more than one attribute 'id'". In this case, you may need to download the HTML source and clean up the HTML just enough so that eXist can parse the rest. Then, store the file in your database, and use the util:parse-html() function (which passes the text through the Neko HTML parser to make it well-formed). The following XQuery will clean up HTML (saved as text file, because it is still malformed): let $html-txt := util:binary-to-string(util:binary-doc('/db/html-file-saved-as-text.txt')) let $data := util:parse-html($html-txt) return $data

Testing your HTTP Client with a Simple Echo Script


Once you have the results of the GET you can echo them back to inspect the response structure. Source code for echo.xq:

xquery version "1.0";
declare namespace httpclient="http://exist-db.org/xquery/httpclient";

let $feeds-url := 'http://www.ibm.com/ibm/syndication/us/en/?cm_re=footer-_-ibmfeeds-_-top_level'
let $http-get-data := httpclient:get(xs:anyURI($feeds-url), true(), <Headers/>)
return
    <echo-results>
        {$http-get-data}
    </echo-results>

Pachube feed
Motivation
You want to create a feed for the Pachube [1] application. A Pachube application allows you to store, share & discover realtime sensor, energy and environment data from objects, devices & buildings around the world. This provides a platform for sensor data integration. History gathered by Pachube can be presented in various formats and used by other applications to mashup feeds.

Modules and concepts


eXist httpclient to GET, POST and PUT eXist scheduler for job scheduling eXist update extension server-side XSLT

Tower Bridge
The idea of a feed of the open/closed status of Tower Bridge in London was borrowed from @ni [2]. A Twitter [3] stream provides the base data for a simple status feed. The RSS feed [4] from this stream is read by an XQuery script, the status deduced from the text and an XML file representing the current status updated. This XML file has an attached XSLT stylesheet so that when the file is pulled on schedule from the eXist database, it is first transformed on the server-side into the EEML [5] format required for Pachube feeds. As configured on the UWE server, this uses Saxon XSLT-2.

XQuery script
let $rss := httpclient:get(xs:anyURI("http://twitter.com/statuses/user_timeline/14012942.rss"), false(), ())/httpclient:body
let $lastChange := $rss//item[1]
let $bridgeStatus := doc("/db/Wiki/Pachube/bridge.xml")/data/status
return
    if (exists($lastChange) and exists($bridgeStatus))
    then
        let $open := if (contains($lastChange/description, "opening")) then "1" else "0"
        return
            update replace $bridgeStatus with
                element status {
                    attribute bridge {$open},
                    attribute lastChange {$lastChange/pubDate},
                    attribute lastUpdate {current-dateTime()}
                }
    else ()

1. httpclient is used here because doc() throws an error about duplicate namespace declarations - under investigation

Bridge status
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="http://www.cems.uwe.ac.uk/xmlwiki/Pachube/bridge.xsl"?> <data> <status bridge="0" lastChange="Mon, 14 Dec 2009 12:09:02 +0000" lastUpdate="2009-12-14T16:57:00.679Z"/> </data>

XSLT
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:output media-type="application/xml" method="xml" indent="yes"/>

<xsl:template match="/data">

<eeml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.eeml.org/xsd/005" xsi:schemaLocation="http://www.eeml.org/xsd/005 http://www.eeml.org/xsd/005/005.xsd" version="5">

<environment updated="{current-dateTime()}">

<title>Tower Bridge </title>

<feed>http://www.cems.uwe.ac.uk/xmlwiki/Pachube/bridge.xml</feed>

<description>The status of the lifting Tower Bridge: 1 is open , 0

is closed. </description>

<email>kit.wallace@gmail.com</email>

<location exposure="outdoor" domain="physical" disposition="fixed">

<name>Tower Bridge</name>

<lat>51.5064186</lat>

<lon>-0.074865818</lon>

</location>

<data id="0">

<tag>bridge open</tag>

<value minValue="0" maxValue="1">

<xsl:value-of select="status/@bridge"/>

</value>

</data>

</environment>

</eeml>

</xsl:template>

</xsl:stylesheet>

Job scheduling
The XQuery update script is invoked by the eXist job scheduler every 1 minute:
let $login := xmldb:login("/db", "user", "password")
let $del := scheduler:delete-scheduled-job("BRIDGE")
let $job := scheduler:schedule-xquery-cron-job("/db/Wiki/Pachube/pollbridgerss.xq", "0 0/1 * * * ?", "BRIDGE")
return $job
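To confirm that the job was registered, the scheduler module can list the current jobs. This is a small check sketch; the credentials are the same placeholders as above and the wrapper element name is my own.

let $login := xmldb:login("/db", "user", "password")
return
    <scheduled>{scheduler:get-scheduled-jobs()}</scheduled>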


Feed view
There is a public view of the Feed as processed by Pachube : http://www.pachube.com/feeds/3922

Discussion
The Pachube interface refreshes the automatic feeds every 15 minutes (for the free service). Since typical bridge lifts last 10 minutes, there is a likelihood that a lift will be missed. The alternative is to push changes to Pachube when detected.

Weather Push Feed


Many amateur weather stations use Weather Display [6] software. This software writes current observations to a space-delimited text file to support interfaces to viewing software, such as the Flash Weather Display Live. The text files are generally web accessible, so that any client has access to this raw data, although it is polite to ask for access. One such station is run by Martyn Hicks at http://www.martynhicks.co.uk/weather/data.php located in Horfield, Bristol. The raw data file is http://www.martynhicks.co.uk/weather/clientraw.txt

In this Push implementation, a manual feed is defined in Pachube via the API by POSTing a full EEML document. An XML descriptor file defines the mapping between values in the data file and data streams in the feed. A scheduled XQuery script reads the data file and transforms it via the mapping file to EEML format prior to PUTing to the Pachube API.

XQuery feed creation


The feed is defined in an EEML document which is POSTed to the Pachube API. A return code of 201 indicates that the feed has been created. let $url := xs:anyURI("http://www.pachube.com/api/feeds/") let $headers := <headers> <header name="X-PachubeApiKey" value="...api key ...."/> </headers> let $feed := <eeml xmlns="http://www.eeml.org/xsd/005"> <environment updated="{current-dateTime()}"> <title>Horfield Weather</title> <description>The weather observed by a weather station run by Martyn Hicks. Public interface is http://www.martynhicks.co.uk/weather/data.php </description> <email>kit.wallace@gmail.com</email> <location exposure="outdoor" domain="physical" disposition="fixed"> <name>Horfield</name> <lat>51.4900</lat> <lon>-2.5805</lon> </location> </environment> </eeml> return

httpclient:post($url,$feed,false(),$headers)

Mapping File
An XML document defines the origin of the raw data, the Pachube appid and the mapping from data values (1-based) to data streams (numbered from 1 in document order). <weatherfeed xmlns = "http://www.cems.uwe.ac.uk/xmlwiki/wdl"> <data>http://www.martynhicks.co.uk/weather/clientraw.txt</data> <appid>4013</appid> <format> <field n="2" unit="kts">Average Wind Speed</field> <field n="4" unit="degrees">Wind Direction</field> <field n="5" unit="Celcius">Temperature</field> <field n="7" unit="hPa">Barometer</field> </format> </weatherfeed>

Update script
This script is scheduled to run every minute (as above). The mapping file namespace needs to be declared: declare namespace wdl = "http://www.cems.uwe.ac.uk/xmlwiki/wdl"; First a function to read the raw data file and tokenize to a sequence of values:
declare function local:client-data ($rawuri) { let $headers := element headers{ element header { attribute name {"Cache-Control"}, attribute value {"no-cache"} } } let $raw := httpclient:get(xs:anyURI($rawuri),false(),$headers )/httpclient:body return tokenize($raw,"\+") };

Then a function to transform from the sequence of values to the Pachube data channels:

declare function local:data-to-eeml($data, $format) {
    for $field at $id in $format/wdl:field
    let $name := string($field)
    let $index := xs:integer($field/@n)
    return
        element data {
            attribute id {$id},
            element tag {string($field)},
            element value {$data[$index]},
            element unit {string($field/@unit)}
        }

};

The main line fetches the feed definition file (here hard-coded but it could be passed in as a parameter). The data values are obtained, the EEML generated and PUT to the Pachube API.
let $feed := doc("/db/Wiki/Pachube/horfieldweather.xml")/wdl:weatherfeed let $data := local:client-data($feed/wdl:data) let $appid := $feed/wdl:appid let $APIKey := "eeda7c27ff8b7c49e8529e4eb4b3f57724c5b609db0d22904df11edd4742e92c" let $url := xs:anyURI(concat( "http://www.pachube.com/api/",$appid)) let $headers := <headers> <header name="X-PachubeApiKey" value="{$APIKey}"/> </headers> let $eeml:= <eeml xmlns="http://www.eeml.org/xsd/005"> <environment updated="{current-dateTime()}"> {local:data-to-eeml($data,$feed/wdl:format)} </environment> </eeml> return httpclient:put($url,$eeml,false(),$headers)


Feed view
There is a public view of the Pachube feed at http://www.pachube.com/feeds/4013

Weather Pull Feed


The alternative approach is simpler and relies on Pachube to pull data on their schedule. In this example, weather station data consolidated by WeatherUnderground [7] and republished as XML is transformed to EEML.

WeatherUnderground Feed
A typical XML feed for a station in WeatherUnderground is http://api.wunderground.com/weatherstation/WXCurrentObXML.asp?ID=IBAYOFPL1

XSLT transform
This XML can be transformed to EEML using XSLT:
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output media-type="application/xml" method="xml" indent="yes"/> <xsl:template match="/current_observation"> <eeml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.eeml.org/xsd/005" xsi:schemaLocation="http://www.eeml.org/xsd/005 http://www.eeml.org/xsd/005/005.xsd" version="5">

<environment updated="{current-dateTime()}"> <title>Weather Report</title> <location exposure="outdoor" domain="physical" disposition="fixed"> <name> <xsl:value-of select="location/full"/> </name> <lat> <xsl:value-of select="location/latitude"/> </lat> <lon><xsl:value-of select="location/longitude"/> </lon> </location> <data id="1"> <tag>Average Wind speed</tag> <value> <xsl:value-of select="round-half-to-even(wind_mph * 1.15077945,1)"/> </value> <unit>kts</unit> </data> <data id="2"> <tag>Wind Direction</tag> <value> <xsl:value-of select="wind_degrees"/> </value> <unit>degrees</unit> </data> <data id="3"> <tag>Temperature</tag> <value> <xsl:value-of select="temp_c"/> </value> <unit>Celcius</unit> </data> <data id="4"> <tag>Barometric Pressure</tag> <value> <xsl:value-of select="pressure_mb"/> </value> <unit>hPA</unit> </data> </environment> </eeml> </xsl:template> </xsl:stylesheet>


Connecting XML to XSLT


A simple XQuery script accepts one parameter, the station id, fetches the XML feed and transforms using XSLT to EEML:
let $id := request:get-parameter("id",()) let $ss := doc("/db/Wiki/Pachube/weatherunderground.xsl") let $data := doc(concat("http://api.wunderground.com/weatherstation/WXCurrentObXML.asp?ID=",$id)) return transform:transform($data,$ss,())

The script can be invoked: http://www.cems.uwe.ac.uk/xmlwiki/Pachube/weatherunderground.xq?id=IBAYOFPL1. Since this script is parameterised, it could be used with any WeatherUnderground station.

Pachube Feed
An automatic feed can be created - http://www.pachube.com/feeds/4037 which uses this feed.

NOAA Feed
We can adopt a similar approach with the feeds for US ICAO stations [8]. NOAA provide XML feeds such as http://www.weather.gov/xml/current_obs/KEWR.xml. The format is nearly the same as the WeatherUnderground feed and is documented: http://www.weather.gov/view/current_observation.xsd. Update rate is hourly but there is no way currently to configure Pachube to update at that frequency.

XSLT
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output media-type="application/xml" method="xml" indent="yes"/> <xsl:template match="/current_observation"> <eeml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.eeml.org/xsd/005" xsi:schemaLocation="http://www.eeml.org/xsd/005 http://www.eeml.org/xsd/005/005.xsd" version="5"> <environment updated="{current-dateTime()}"> <title>NOAA Weather Report</title> <location exposure="outdoor" domain="physical" disposition="fixed"> <name> <xsl:value-of select="location"/> </name> <lat> <xsl:value-of select="latitude"/> </lat> <lon><xsl:value-of select="longitude"/> </lon> </location> <data id="1"> <tag>Average Wind speed</tag>

<value> <xsl:value-of select="wind_kt"/> </value> <unit>kts</unit> </data> <data id="2"> <tag>Wind Direction</tag> <value> <xsl:value-of select="wind_degrees"/> </value> <unit>degrees</unit> </data> <data id="3"> <tag>Temperature</tag> <value> <xsl:value-of select="temp_c"/> </value> <unit>Celcius</unit> </data> <data id="4"> <tag>Barometric Pressure</tag> <value> <xsl:value-of select="pressure_mb"/> </value> <unit>hPA</unit> </data> </environment> </eeml> </xsl:template> </xsl:stylesheet>


XQuery Script
let $id := request:get-parameter("id",()) let $ss := doc("/db/Wiki/Pachube/NOAA.xsl") let $data := doc(concat("http://www.weather.gov/xml/current_obs/",$id,".xml")) return transform:transform($data,$ss,())


Feed
The transformed XML http://www.cems.uwe.ac.uk/xmlwiki/Pachube/NOAA.xq?id=KEWR is the basis for the manual feed http://www.pachube.com/feeds/4047

XSLT only
If Pachube supported XSLT on the server side, the whole task could be handled by a single XSLT script. For the sake of generalisation, it's helpful to provide an interface which allows parameters to be passed to the script, but it is not necessary:
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output media-type="application/xml" method="xml" indent="yes"/> <xsl:param name="station" select="'KEWR'"/> <xsl:template match="/"> <xsl:variable name="url" select='concat("http://www.weather.gov/xml/current_obs/",$station,".xml")'></xsl:variable> <xsl:apply-templates select="doc($url)/current_observation"/> </xsl:template> <xsl:template match="current_observation"> <eeml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.eeml.org/xsd/005" xsi:schemaLocation="http://www.eeml.org/xsd/005 http://www.eeml.org/xsd/005/005.xsd" version="5"> <environment updated="{current-dateTime()}"> <title>NOAA Current weather for {station_id}</title> <location exposure="outdoor" domain="physical" disposition="fixed"> <name> <xsl:value-of select="location"/> </name> <lat> <xsl:value-of select="latitude"/> </lat> <lon> <xsl:value-of select="longitude"/> </lon> </location> <data id="1"> <tag>Average Wind speed</tag> <value> <xsl:value-of select="wind_kt"/> </value> <unit>kts</unit> </data> <data id="2"> <tag>Wind Direction</tag> <value> <xsl:value-of select="wind_degrees"/>

</value> <unit>degrees</unit> </data> <data id="3"> <tag>Temperature</tag> <value> <xsl:value-of select="temp_c"/> </value> <unit>Celcius</unit> </data> <data id="4"> <tag>Barometric Pressure</tag> <value> <xsl:value-of select="pressure_mb"/> </value> <unit>hPA</unit> </data> </environment> </eeml> </xsl:template> </xsl:stylesheet>


The server can just run this standalone to generate the EEML feed. This small XQuery script uses the SAXON processor on the eXist platform: transform:transform((),doc("/db/Wiki/Pachube/NOAA3.xsl"),()) XSLT [9] Execute [10] (currently fails - under investigation)
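One possible variation to investigate (a sketch only, not the author's script) is to pass a dummy source node and supply the station identifier through the transform module's parameters element, which the stylesheet's $station parameter would then pick up:

transform:transform(<dummy/>,
    doc("/db/Wiki/Pachube/NOAA3.xsl"),
    <parameters>
        <param name="station" value="KORS"/>
    </parameters>)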

XSLT Feed service


Automatic Feed
The XSLT conversion could be provided as a service by an eXist db. It would need:

an XML database of connection definitions e.g.

<PachubeFeeds>
    <PachubeFeed id="2001">
        <data>http://www.weather.gov/xml/current_obs/KORS.xml</data>
        <xslt>http://www.cems.uwe.ac.uk/xmlwiki/Pachube/NOAA7.xsl</xslt>
        <params/>
    </PachubeFeed>
    <PachubeFeed id="2002">
        <data>http://www.weather.gov/xml/current_obs/KEWR.xml</data>
        <xslt>http://www.cems.uwe.ac.uk/xmlwiki/Pachube/NOAA7.xsl</xslt>
        <params/>
    </PachubeFeed>
</PachubeFeeds>

a script to locate and execute a feed:

let $id := request:get-parameter("id",())
let $feed := doc("/db/Wiki/Pachube/feeds.xml")//PachubeFeed[@id=$id]
return
    transform:transform(doc($feed/data), doc($feed/xslt), $feed/params)

Pachube automatic feeds can now be created with a URL like http://www.cems.uwe.ac.uk/xmlwiki/Pachube/getFeed.xq?id=2001 e.g. http://www.pachube.com/feeds/4661

a user interface and database to allow users to register, create and edit feeds

There are issues here with loading and with unsafe code in the stored XSLT.


Output
Similarly output processing of either the current EEML or a specific datastream's csv history could be provided with a bit of code and XSLT. Since this may require authentication, API keys would have to be stored on this database too. Jobs could be generated and scheduled to implement triggers but this will need a timed pull of the required data. Code is needed to convert the history feeds provided by Pachube to XML since these are only available in CSV. Once in XML, XSLT can transform to the format required. Of course it would be preferable if Pachube provided XML feeds in addition to the CSV feeds.

Archive

The full archive is provided as a csv file. We can convert that to XML with the following script:
import module namespace csv = "http://www.cems.uwe.ac.uk/xmlwiki/csv" at "../lib/csv.xqm";

let $feed := request:get-parameter("feed","") let $stream := request:get-parameter("stream","") let $archiveurl := concat("http://www.pachube.com/feeds/",$feed,"/datastreams/",$stream,"/archive.csv") let $data:= csv:get-data($archiveurl) let $rows := tokenize($data,$csv:newline)

let $now := current-dateTime()
return
<history feed="{$feed}" stream="{$stream}" dateTime="{$now}" count="{count($rows)}">
    {for $row in $rows
     let $point := tokenize($row,",")
     return <value dateTime="{$point[1]}">{$point[2]}</value>
    }
</history>

http://www.cems.uwe.ac.uk/xmlwiki/Pachube/getArchive.xq?feed=4037&stream=2

24 Hour History

In the csv stream, the values are untimed. The time has to be estimated and calculated using xs:dayTimeDuration:
import module namespace csv = "http://www.cems.uwe.ac.uk/xmlwiki/csv" at "../lib/csv.xqm";


let $feed := request:get-parameter("feed","")
let $stream := request:get-parameter("stream","")
let $historyurl := concat("http://www.pachube.com/feeds/",$feed,"/datastreams/",$stream,"/history.csv")
let $data := csv:get-data($historyurl)
let $values := tokenize($data,",")
let $now := current-dateTime()
let $then := $now - xs:dayTimeDuration("P1D")
return
<history feed="{$feed}" stream="{$stream}" dateTime="{$now}" count="{count($values)}">
    {for $value at $i in $values
     let $dt := $then + xs:dayTimeDuration(concat("PT",15*$i,"M"))
     return <value dateTime="{$dt}">{$value}</value>
    }
</history>

http://www.cems.uwe.ac.uk/xmlwiki/Pachube/getHistory.xq?feed=4037&stream=2

References
[1] http://www.pachube.com/
[2] http://twitter.com/ni
[3] http://twitter.com/towerbridge
[4] http://twitter.com/statuses/user_timeline/14012942.rss
[5] http://www.eeml.org/
[6] http://www.weather-display.com/index.php
[7] http://www.wunderground.com/
[8] http://www.weather.gov/xml/current_obs/
[9] http://www.cems.uwe.ac.uk/xmlwiki/Pachube/NOAA3.xsl
[10] http://www.cems.uwe.ac.uk/xmlwiki/Pachube/NOAA3.xq

Publishing Overview
Motivation
You have a workflow process that allows an internal team to review web content before it is transferred to a public web site. When the documents have been marked "approved for publication" they must be transferred to a public web server in a controlled way.

Methods
There are many ways to transfer XML documents from one server to another. This document describes a set of basic methods that may vary based on your local configuration.

Simple Publication Workflows


Many organizations have strict policy guidelines on who has permission to publish content to a public web site. Before content is transferred to a public web site, documents intended for publication typically go through a series of stages:

draft - documents that are in the very early stages of a quality control process
under-review - documents that are being reviewed by an editorial team for content quality, spelling and typographical errors
approved-for-publication - documents that have been approved for publication on a public web site

All documents that have been marked approved-for-publication can then be transferred from an internal content site to the public web server, as sketched below. In general only specified users with specified roles are allowed to mark documents as approved-for-publication.
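A minimal sketch of selecting the documents that are ready to transfer; the collection path, the document element and the status element are assumptions, since the real names depend on your content model:

for $doc in collection('/db/cms/content')//document[status = 'approved-for-publication']
return base-uri($doc)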


Simple Publishing Scripts


There are several options for creating publishing scripts. We will begin with a very simple script and then add features.

Publishing with HTTP PUTs and DELETEs


The simplest way to publish documents is to use the eXist (or EXQuery) httpclient library (http://exist-db.org/xquery/httpclient). This library has PUT and DELETE operations that can be used to programmatically add and delete web content on your publication server. Here is the signature of the httpclient:put() function:

httpclient:put($url as xs:anyURI, $content as node(), $persist as xs:boolean, $request-headers as element()?) as item()

Pros: very simple to use
Cons: no central audit trail of who published what and when
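A hedged example of a single PUT; the collection, resource name and target URL below are placeholders for illustration:

let $doc := doc('/db/cms/content/article-123.xml')
let $target := xs:anyURI('http://public.example.com/exist/rest/db/site/article-123.xml')
(: the empty headers element means no extra request headers are sent :)
return httpclient:put($target, $doc, false(), <headers/>)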

Publishing with a POST service


An alternative is to create a central publishing service on your public web site that will coordinate all publishing events. This can be done by using the HTTP POST client function and then writing a single publication service to catch and log all publication events.

httpclient:post($url as xs:anyURI, $content as node(), $persist as xs:boolean, $request-headers as element()?) as item()

You must remember to cast the URL to xs:anyURI. For example:

let $post-status := httpclient:post(xs:anyURI('http://example.com/db/get-doc.xq'), $doc-to-publish, true(), ())

Publishing with a Callback


It is frequently easier to instruct a web service on the public web server that a new resource is ready to be published, rather than pushing the entire file to the public web site in the request. Only a URL to the resource is sent to the public web server, together with the following parameters:

the user publishing the document
any authentication credentials
the type (publish or delete)
a comment on the reason for publication or deletion
the identifier of the document to be pulled from the central content management system by the public web server, or the id of the document to be deleted

The public web server then calls a function to load that resource from the internal content management system. This can be done with standard URL parameters. Note that in this case the passwords will be in the web log files. Example of getting URL parameters: publish-with-callback.xq

(: The user that will execute the login :)
let $user := request:get-parameter('user', '')
(: The pass that will execute the login :)
let $pass := request:get-parameter('pass', '')
(: The full URL of the document we are going to bring over :)
let $url := request:get-parameter('url', '')
(: the /db location we are going to put the new document into :)
let $db-loc := request:get-parameter('db-loc', '')
(: This is the document fetched from the internal CMS server :)
let $get-doc := doc($url)


Note that this style is more secure since only documents that exist on the internal content management system are candidates for publishing.
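A sketch of how the callback script might finish by storing the fetched document; the way the resource name is derived from the URL is an assumption for illustration:

(: continue from $user, $pass, $db-loc and $get-doc above :)
let $login := xmldb:login($db-loc, $user, $pass)
(: use the last path segment of the source URL as the resource name :)
let $file-name := tokenize($url, '/')[last()]
return xmldb:store($db-loc, $file-name, $get-doc)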

Publishing Audit Logs


If you use a central web service for publishing you can now log all publishing events in a single centralized log file. This file can then be used to report and audit who changed what content on the public web site and when. The following example shows how all publishing events can be added to a log file that shows what users published what files and when they were published or deleted. In the example below the type code should be set to be publish or delete. let $audit-log := <publish-event> <type-code>publish|delete</type-code> <user>{$user}</user> <dateTime>{current-dateTime()}</dateTime> <db-loc>{$db-loc}</db-loc> </publish-event> (: check that the log file exists and if not then create it :) let $check-log-exists := if (doc-available($log-file)) then () else xmldb:store( functx:substring-before-last($log-file, '/'), functx:substring-after-last($log-file, '/'), <publish-events/> ) (: this inserts the audit record at the end of the log file :) let $update := update insert $audit-log into doc($log-file)/publish-events


Using Certificates
It is sometimes not possible to create a secure connection between an internal CMS system and the publishing web site. An alternative method is to provide certificates to each system that is authorized to publish documents to the publishing server.

Publishing to Subversion
Motivation
You want to have a single button on a content management system that will copy a file to a remote subversion repository.

Method
We will configure our subversion repository on a standard Apache server that is configured with an SSL certificate. This will encrypt all communication between the intranet system and the remote subversion server. We will also set the authentication to be Basic Authentication.

Apache Configuration File


<Location "/testsvn/">
    DAV svn
    AuthName "svntest"
    AuthType Basic
    SVNParentPath /Library/Subversion/RepositoryTest
    SVNAutoversioning on
    <Limit GET HEAD OPTIONS CONNECT POST PROPFIND PUT DELETE PROPPATCH MKCOL COPY MOVE LOCK UNLOCK>
        Require user testuser1 testuser2 testuser3
    </Limit>
</Location>

HTTP Put Function for Basic Authentication


HTTP Basic authentication requires the client to concatenate the user name and password with a colon separating the two strings and then base64-encode that string. This is then sent in the HTTP header with the key "Authorization". The HTTP header must look like the following:

Authorization: Basic BASE64-ENCODED-USER-PASSWORD

The following XQuery function performs this process.

declare function http:put-basic-auth($url, $content, $username, $password, $in-header) as node() {
    let $credentials := concat($username, ':', $password)
    let $encode := util:base64-encode($credentials)
    let $value := concat('Basic ', $encode)
    let $new-headers :=
        <headers>
            {$in-header/header}

            <header name="Authorization" value="{$value}"/>
        </headers>
    let $response := httpclient:put($url, $content, false(), $new-headers)
    return $response
};

To put a file you just need to supply the URL of the correct content area, the content to be inserted, the user name and password, and any additional request headers.
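For example, a call might look like this; the repository URL, resource name and credentials are placeholders:

let $content := doc('/db/cms/content/article-123.xml')
let $url := xs:anyURI('https://svn.example.com/testsvn/articles/article-123.xml')
return http:put-basic-auth($url, $content, 'testuser1', 'secret', <headers/>)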


Monitoring HTTP Client Libraries


Debugging authentication protocols is very difficult if you do not have the correct tools. One of the most useful tools is to enable logging for the httpclient module. <category name="org.apache.commons.httpclient" additivity="false"> <priority value="debug"/> <appender-ref ref="console"/> </category>

References
Wikipedia entry on Basic access authentication

Quantified Expressions
Motivation
You have a list of items in a sequence and you want to test to see if any or all of the items match a condition. The result of the test on the sequence will be either true or false.

Method
Quantified expressions have a format very similar to a FLWOR expression, with the following two changes:

1. instead of the word for you will use either the word some or every
2. instead of the where/order/return clauses you will use the word satisfies

The quantified expression always takes a sequence as its input and returns a boolean true/false as its output. Here is an example of a quantified expression which checks to see if there are any books that contain the word "cat". Assume you have a collection of books, where each book is a single XML file with a title element such as this:

<book>
   ...
   <title>The Cat With Nine Lives</title>
   ...
</book>

some $book in collection($collection)/book
satisfies (contains(lower-case($book/title/text()), 'cat'))

This expression will return true as long as at least one book contains the word "cat" in the title. Note that the quantified expression cannot be used to indicate which book title contains the word "cat", only that the word "cat" occurs in at least one title in your collection. Quantified expressions can often be rewritten as a single XPath expression with a predicate. In the above case the expression would be:

let $has-a-cat-book := exists(collection($collection)/book/title[contains(lower-case(./text()), 'cat')])


The variable $has-a-cat-book will be set to true() if any book contains the word "cat". Some XQuery parsers can optimize quantified expressions better, and some people feel that quantified expressions are more readable than a single XPath expression.
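The every form works the same way; for example, the following sketch (using the same collection as above) returns true only if every book title in the collection mentions "cat":

every $book in collection($collection)/book
satisfies contains(lower-case($book/title/text()), 'cat')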

Registered Functions
Motivation
You want a list of all functions or all modules and their functions.

Method
There are two functions that we can use to get a list of functions in the current run-time system: util:registered-functions() util:registered-functions($module) The first function returns all registered functions, the second returns all registered functions for a given module.

List all registered functions


The following XQuery creates a list of all the XQuery functions in alphabetical order. The output will depend on the modules configured in the installation. xquery version "1.0"; import module namespace util = "http://exist-db.org/xquery/util"; <results>{ for $function in util:registered-functions() order by $function return <function>{ $function}</function> } <total-count>{count(util:registered-functions())}</total-count> </results> Run [1] <results> <function>compression:gzip</function>

<function>compression:tar</function>
<function>compression:zip</function>
<function>datetime:count-day-in-month</function>
<function>datetime:date-for</function>
...

Note that if there is no namespace prefix, the function is an XPath library function (or from the math module, which also appears without a prefix).


Listing all functions by module


A more useful format would be to list functions by module:

xquery version "1.0";
import module namespace util = "http://exist-db.org/xquery/util";

<results>{
    for $module in util:registered-modules()
    order by $module
    return
        <module>
            <module-uri>{$module}</module-uri>
            {for $function in util:registered-functions($module)
             order by $function
             return <function>{$function}</function>
            }
        </module>
}</results>

Run [2]

Sample Output:

<results>
    <module>
        <module-uri>http://exist-db.org/xquery/compression</module-uri>
        <function>compression:gzip</function>
        <function>compression:tar</function>
        <function>compression:zip</function>
    </module>
    <module>
        <module-uri>http://exist-db.org/xquery/datetime</module-uri>
        <function>datetime:count-day-in-month</function>
        <function>datetime:date-for</function>
...


References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/util/registered-functions.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/eXist/util/registered-module-functions.xq

Registered Modules
Motivation
You want to check to see if a module is loaded in your runtime system.

Method
Some modules that you may need are not loaded into the runtime engine when the server starts. If this is the case you may have to dynamically load a module.
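A small guard pattern before calling into an optional module; this is a sketch only, using the mail module URI as the example:

let $has-mail := 'http://exist-db.org/xquery/mail' = util:registered-modules()
return
    if ($has-mail)
    then 'mail module is available'
    else 'mail module is not loaded - enable it in conf.xml and restart eXist'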

Listing current modules in the runtime


xquery version "1.0"; let $modules := util:registered-modules() return <results>{ for $module in $modules order by $module return <module>{ $module}</module> } </results> Run [1]

Sample Results
<results> <module>http://exist-db.org/xquery/compression</module> <module>http://exist-db.org/xquery/datetime</module> <module>http://exist-db.org/xquery/examples</module> <module>http://exist-db.org/xquery/file</module> <module>http://exist-db.org/xquery/httpclient</module> <module>http://exist-db.org/xquery/image</module> <module>http://exist-db.org/xquery/mail</module> <module>http://exist-db.org/xquery/math</module> <module>http://exist-db.org/xquery/ngram</module> <module>http://exist-db.org/xquery/request</module> <module>http://exist-db.org/xquery/response</module> <module>http://exist-db.org/xquery/scheduler</module> <module>http://exist-db.org/xquery/session</module> <module>http://exist-db.org/xquery/sql</module>

<module>http://exist-db.org/xquery/system</module>
<module>http://exist-db.org/xquery/text</module>
<module>http://exist-db.org/xquery/transform</module>
<module>http://exist-db.org/xquery/util</module>
<module>http://exist-db.org/xquery/validation</module>
<module>http://exist-db.org/xquery/xmldb</module>
<module>http://exist-db.org/xquery/xmldiff</module>
<module>http://www.w3.org/2005/xpath-functions</module>
</results>


References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/util/registered-modules.xq

Regular Expressions
Motivation
You want to test to see if a text matches a specific pattern of characters.
You want to replace patterns of text with other patterns.
You have text with repeating patterns and you would like to break the text up into discrete items.

Method
To deal with the above three problems, XQuery has the following functions:

matches($input, $regex) - returns true if the input matches the regular expression
replace($input, $regex, $string) - replaces the parts of the input string that match the regular expression with a new string
tokenize($input, $regex) - splits the input at matches of the regular expression and returns the resulting sequence of items

Through these functions we have access to the powerful syntax of regular expressions.

Summary of Regular Expressions


Regular expressions ("regex") are a field unto themselves. If you wish to derive full benefit from this way of describing strings with patterns, you should consult a separate introduction. Priscilla Walmsley's XQuery (Chapter 18) has a clear summary of the functionality offered.

fn:matches($input, $regex, $flags) takes a string and a regular expression as input. If the regular expression matches any part of the string, the function returns true; if it does not match, it returns false. Enclose the pattern with anchors (^ at the beginning and $ at the end) if you only want the function to return true when the pattern matches the entire string.

fn:replace($input, $regex, $string, $flags) takes a string, a regular expression, and a replacement string as input. It returns a new string in which all matches of the pattern in the input string are replaced with the replacement string. You can use $1 to $99 to re-insert groups of characters captured with parentheses into the replacement string.

fn:tokenize($input, $regex, $flags) returns a sequence of strings consisting of all the substrings in the input string between the matches of the pattern. The sequence does not contain the matches themselves.

In regular expressions, most characters represent themselves, so you are not obliged to use the special regex syntax in order to use these three functions. A dot (.) represents any character except a newline. Immediately following a character or an expression such as a dot, one can add a quantifier which tells how many times the character should be repeated: "*" for "0, 1 or many times", "?" for "0 or 1 times", and "+" for "1 or many times". The combination "*?" matches the shortest substring that matches the pattern. NB: this only scratches the surface of the subject of regular expressions!

The three functions all accept an optional flags parameter to set matching modes. The following four flags are available:

i makes the regex match case insensitive.
s enables "single-line" or "dot-all" mode. In this mode the dot matches every character, including newlines, so the string is treated as a single line.
m enables "multi-line" mode. In this mode the anchors "^" and "$" match before and after newlines in the string, in addition to applying to the string as a whole.
x enables "free-spacing" mode. In this mode whitespace in the regex pattern is ignored. This is mainly used when a complicated regex has been divided over several lines, but the newlines are not intended to be matched.

If you do not use a flag, you can just leave the slot empty or write "".


Examples of matches()
let $input := 'Hello World' return (matches($input, 'Hello') = true(), matches($input, 'Hi') = false(), matches($input, 'H.*') = true(), matches($input, 'H.*o W.*d') = true(), matches($input, 'Hel+o? W.+d') = true(), matches($input, 'Hel?o+') = false(), matches($input, 'hello', "i") = true(), matches($input, 'he l lo', "ix") = true() , matches($input, '^Hello$') = false(), matches($input, '^Hello') = true() )

Execute [1]

Examples of tokenize()
(let $input := 'red,orange,yellow,green,blue' return deep-equal( tokenize($input, ',') , ('red','orange','yellow','green','blue')) , let $input := 'red, orange, yellow, green,blue' return deep-equal(tokenize($input, ',\s*') , ('red','orange','yellow','green','blue')) , let $input := 'red , orange , yellow , green , blue' return not(deep-equal(tokenize($input, ',\s*') , ('red','orange','yellow','green','blue'))) ,

let $input := 'red , orange , yellow , green , blue'
return deep-equal(tokenize($input, '\s*,\s*') , ('red','orange','yellow','green','blue'))
)


In the second example, "\s" represents one whitespace character and thus matches the newline before "orange" and the tab character before "yellow". It is quantified with "*" so the pattern removes whitespace after the comma, but not before it. To remove all whitespace, use the pattern '\s*,\s*'. Execute [2]

Examples of replace()
( let $input := 'red,orange,yellow,green,blue' return ( replace($input, ',', '-') = 'red-orange-yellow-green-blue' ) , let $input := 'Hello World' return ( replace($input, 'o', 'O') = "HellO WOrld" , replace($input, '.', 'X') = "XXXXXXXXXXX" , replace($input, 'H.*?o', 'Bye') = "Bye World" ) , let $input := 'HellO WOrld' return ( replace($input, 'o', 'O', "i") = "HellO WOrld" ) , let $input := 'Chapter 1 Chapter 2 ' return ( replace($input, "Chapter (\d)", "Section $1.0") = "Section 1.0 Section 2.0 ") ) In the last example, "\d" represents any digit; the parenthesis around "\d" binds the variable "$1" to whatever digit it matches; in the replacement string, this variable is replaced by the matched digit. Execute [3]


Larger examples
XQuery/Incremental Search of the Chemical Elements Uses Ajax and a regular expression to search for a chemical element

References
The Regular Expression Library has more than 2,600 sample regular expressions: Regular Expression Library [4]
This page has a very useful summary of the regular expression patterns: Regular Expression Cheat Sheet [5]
This page describes how to use Regular Expressions within XQuery and XPath: XQuery and XPath Regular Expressions [6]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Basics/matches1.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/Basics/tokenize.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/Basics/replace.xq
[4] http://regexlib.com/
[5] http://regexlib.com/CheatSheet.aspx
[6] http://www.regular-expressions.info/xpath.html

REST interface definition


REST interfaces, and in particular the URL languages used to invoke services, have no equivalent to SOAP's WSDL. This example looks at creating a simple XML schema for defining such an interface, and at using an XQuery script to create a generic interface to the site based on the interface definition.

Example REST definition


Here is a somewhat partial definition of the del.icio.us interface using a home-made schema. Parameters are defined using unique local names for each parameter, and then the services supported by the interface are defined using templates with curly braces delimiting the names of parameters, to be replaced by their actual values.
<?xml version="1.0" encoding="UTF-8"?>

<interface>

<name>del.icio.us</name>

<description><p>An almost complete description of the REST interface of the

del.icio.us social

bookmark site, excluding services requiring login</p>

<p> Chris Wallace May 2009</p></description>

<endpoint>http://del.icio.us/</endpoint>

<parameters>

<parameter>

<name>user-id</name>

<purpose>User Identifier</purpose>

<default>morelysq</default>

<tag>model</tag>

</parameter>

<parameter>

<name>tag</name>



<purpose>a group of bookmarks</purpose>


<default>xml</default>

<tag>model</tag>

</parameter>

<parameter>

<name>url</name>

<purpose>bookmark</purpose>

<default>http://xml.com/</default>

<tag>model</tag>

</parameter>

<parameter>

<name>tagview</name>

<purpose>the tag list appearance</purpose>

<options>

<option>list</option>

<option>cloud</option>

</options>

<default>list</default>

<tag>ui</tag>

</parameter>

<parameter>

<name>tagsort</name>

<purpose>the order of tags in the tag list</purpose>

<options>

<option>alpha</option>

<option>freq</option>

</options>

<default>list</default>

<tag>ui</tag>

</parameter>

<parameter>

<name>minfreq</name>

<purpose>the minimum frequency of a tag to appear in the tag list</purpose>

<options>

<option>1</option>

<option>2</option>

<option>5</option>

</options>

<default>1</default>

<tag>ui</tag>

</parameter>

<parameter>

<name>bundleview</name>

<purpose>whether bundles are shown</purpose>

<options>

<option>show</option>

<option>hide</option>



</options>


<default>show</default>

<tag>ui</tag>

</parameter>

<parameter>

<name>pageno</name>

<purpose>the page of the bookmark list</purpose>

<format>[0-9]+</format>

<default>1</default>

<tag>ui</tag>

</parameter>

<parameter>

<name>count</name>

<purpose>the number of bookmarks to shown per page</purpose>

<options>

<option>10</option>

<option>25</option>

<option>50</option>

<option>100</option>

</options>

<default>10</default>

<tag>ui</tag>

</parameter>

<parameter>

<name>search</name>

<purpose>search string</purpose>

<tag>ui</tag>

</parameter>

<parameter>

<name>scope</name>

<purpose>search scope</purpose>

<options>

<option>user</option>

<option>all</option>

<option>web</option>

</options>

<default>all</default>

<tag>ui</tag>

</parameter>

<parameter>

<name>helptopic</name>

<purpose>a page of the help manual</purpose>

<default>urlhistory</default>

<tag>help</tag>

</parameter>

</parameters>

<services>



<service>


<template/>

<purpose>Home Page</purpose>

<tag>home</tag>

</service>

<service>

<template>{user-id}</template>

<purpose>View a user's public bookmarks</purpose>

<tag>user</tag>

</service>

<service>

<template>{user-id}?settagview={tagview}&amp;settagsort={tagsort}&amp;setminfreq={minfreq}&amp;setbundleview={bundleview}&amp;page={pageno}&amp;setcount={count}</template>

<purpose>View a user's public bookmarks, controlling its appearance</purpose>

<tag>user</tag>

</service>

<service>

<template>rss/{user-id}</template>

<purpose>Get an RSS feed of a user's bookmarks - limited to the

latest 20 items</purpose>

<tag>user</tag>

<tag>RSS</tag>

</service>

<service>

<template>{user-id}/{tag}</template>

<purpose>View a user's bookmarks by tag</purpose>

<tag>user</tag>

<tag>tag</tag>

</service>

<service>

<template>tag/{tag}</template>

<purpose>View tagged bookmarks</purpose>

<tag>tag</tag>

</service>

<service>

<template>network/{user-id}</template>

<purpose>View a user's network and their tags</purpose>

<tag>user</tag>

<tag>network</tag>

</service>

<service>

<template>subscriptions/{user-id}</template>

<purpose>View a user's subscriptions - i.e.watched bookmarks</purpose>

<tag>user</tag>

<tag>subscriptions</tag>

</service>

<service>



<template>for/{user-id}</template>


<purpose>View links suggested to a user</purpose>

<tag>user</tag>

<tag>links</tag>

</service>

<service>

<template>rss/tag/{tag}</template>

<purpose>Get an RSS feed of tagged bookmarks</purpose>

<tag>tag</tag>

<tag>RSS</tag>

</service>

<service>

<template>popular/{tag}</template>

<purpose>View popular tagged bookmarks</purpose>

<tag>tag</tag>

</service>

<service>

<template>popular/</template>

<purpose>View today's popular bookmarks</purpose>

<tag>current</tag>

</service>

<service>

<template>popular/?new</template>

<purpose>View today's new popular bookmarks</purpose>

<tag>current</tag>

</service>

<service>

<template>url?url={url}</template>

<purpose>View Bookmarks for a URL</purpose>

<tag>url</tag>

</service>

<service>

<template>help/</template>

<purpose>Help index</purpose>

<tag>help</tag>

</service>

<service>

<template>help/{helptopic}</template>

<purpose>View a page of the help manual</purpose>

<tag>help</tag>

</service>

<service>

<template>search/?fr=del_icio_us&amp;p={search}&amp;searchtype={scope}</template>

<purpose>Search for string in different scopes</purpose>

<tag>search</tag>

</service>



<service>


<template>rss/</template>

<purpose>RSS hotlist feed</purpose>

<tag>current</tag>

<tag>RSS</tag>

</service>

<service>

<template>rss/tag/{tag}</template>

<purpose>RSS feed for a tag</purpose>

<tag>tag</tag>

<tag>RSS</tag>

</service>

<service>

<template>html/{user-id}/</template>

<purpose>Get a contolled HTML extract of a user's tag</purpose>

<tag>user</tag>

<tag>html</tag>

</service>

</services>

</interface>

Generate the interface


An XQuery script takes one parameter called uri, the uri of the XML interface description. The script creates a generic interface based on this definition, regenerating the service urls when values are changed in the form and the form refreshed. del.icio.us interface [1]

The Script
declare namespace rest = "http://ww.cems.uwe.ac.uk/xmlwiki/rest";

(: declare global variables :)

declare variable $uri := request:get-parameter("_uri",()); declare variable $index := request:get-parameter("_index","tag"); declare variable $interface := doc(concat($uri,"?r=",math:random()))/interface;

declare function rest:template-parameters($template as xs:string) as xs:string* {
(: parse the template to get the parameters :)
    distinct-values(
        for $p in subsequence(tokenize($template,"\{"),2)
        return substring-before($p,"}")
    )
};
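For example, applied to one of the del.icio.us templates above, the function returns the two parameter names:

rest:template-parameters("{user-id}/{tag}")
(: returns ("user-id", "tag") :)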


declare function rest:parameter-value($name as xs:string) as xs:string? {
    let $parameter := $interface/parameters/parameter[name=$name]
    return (request:get-parameter($name,$parameter/default),"")[1]
};

declare function rest:replace-template-parameters($template as xs:string, $names as xs:string*) as xs:string {
(: recursively replace the template parameters by their current values :)
    if (empty($names))
    then $template
    else
        let $name := $names[1]
        let $value := rest:parameter-value($name)
        let $templatex :=
            if (exists($value))
            then replace($template, concat("\{",$name,"\}"), $value)
            else $template
        return rest:replace-template-parameters($templatex, subsequence($names,2))
};
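Substituting the current values back in then works like this (with the defaults from the delicious.xml definition, user-id falls back to morelysq when no request parameter is supplied):

rest:replace-template-parameters("rss/{user-id}", rest:template-parameters("rss/{user-id}"))
(: returns "rss/morelysq" when no user-id request parameter is supplied :)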

(:

interface generation

:)

declare function rest:parameter-input-field($parameter as element(parameter)) as element(span)? {
(: create a parameter field in the parameter input form :)
    let $name := $parameter/name
    let $value := rest:parameter-value($name)
    return
        <span class="input">
            <label for="{$name}">
                {if ($index = "parameter")
                 (: if the index is by parameter, generate a link to that part of the index :)
                 then <a href="#{$name}">{string($name)}</a>
                 else $name
                }
            </label>
            {if ($parameter/options)
             then
                <select name="{$name}" title="{$parameter/purpose}">
                    {for $option in $parameter/options/option
                     return
                        <option value="{$option}" title="{$option/@label}">
                            {if ($option = $value) then attribute selected {"true"} else ()}
                            {string($option)}
                        </option>
                    }
                </select>
             else
                <input type="text" name="{$name}" title="{$parameter/purpose}"
                       value="{$value}" size="{string-length($value)+1}"/>
            }
        </span>
};

declare function rest:parameter-form() { <form method="post" action="interface.xq">

<div class="subhead"> interface <div class="group"> <label for="_uri" > uri </label> <input type="text" name="_uri" value="{$uri}" size="80"/> </div> </div> {for $tag in distinct-values($interface/parameters/parameter/tag) return <div> <div class="subhead">{$tag} </div> <div class="group"> { for $parameter in $interface/parameters/parameter[tag=$tag] return rest:parameter-input-field($parameter) } </div> </div> } <hr/> Index services by <select name="_index">

{for $index in ("parameter","tag")
 return
    if ($index = request:get-parameter("index","tag"))
    then <option value="{$index}" selected="true">{$index}</option>
    else <option value="{$index}">{$index}</option>
}
</select>
<hr/><input type="submit" value="refresh"/>
</form>

};


declare function rest:service-link ($service as element(service) )as element(tr) { <div> <div class="label">{string($service/purpose)}</div> { let $names := rest:template-parameters($service/template) let $filledTemplate := rest:replace-template-parameters($service/template,$names) let $uri := if (starts-with($service/template,"http://")) then $filledTemplate

else concat($interface/endpoint,$filledTemplate) return <div class="link"><a href="{$uri}">../{$filledTemplate}</a> </div> } </div> };

declare function rest:parameter-index() {
    <div id="index">
        <h2>Parameter index</h2>
        {for $parameter in $interface/parameters/parameter
         let $name := $parameter/name
         let $match := concat("{",$name,"}")
         order by lower-case($name)
         return
            <div>
                <div class="subhead"><a name="{$name}">{string($name)}</a></div>
                <div class="group">
                    {for $service in $interface//service[contains(template,$match)]
                     return rest:service-link($service)
                    }
                </div>
            </div>
        }
    </div>
};

declare function rest:tag-index() {
    <div id="index">
        <h2>Tag index</h2>
        {for $tag in distinct-values($interface//service/tag)
         order by lower-case($tag)
         return
            <div>
                <div class="subhead">{$tag}</div>
                <div class="group">
                    {for $service in $interface//service[tag=$tag]
                     return rest:service-link($service)
                    }
                </div>
            </div>
        }
    </div>
};


declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=no indent=yes

doctype-public=-//W3C//DTD&#160;XHTML&#160;1.0&#160;Transitional//EN

doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

<html>
    <head>
        <title>Interface for {string($interface/name)}</title>
        <link rel="stylesheet" type="text/css" href="screen.css" />
    </head>
    <body>
        <h1>{string($interface/name)} interface</h1>
        {$interface/description/(text(),*)}
        <div id="parameters">
            {rest:parameter-form()}
        </div>
        {if (exists($interface))
         then
            <div id="services">
                <h2>Interface properties</h2>
                <div class="group">
                    <div class="label">Interface definition
                        <div class="link"><a href="{$uri}">{$uri}</a></div>
                    </div>
                    <div class="label">Service endpoint
                        <div class="link"><a href="{$interface/endpoint}">{string($interface/endpoint)}</a></div>
                    </div>
                </div>
                {if ($index = "parameter")



then rest:parameter-index()


else if ($index= "tag") then rest:tag-index() else () } </div> else () } </body> </html>

Discussion
Architecture
The script uses a common layered architecture in which low level functions operate on the base data model, and these functions are in turn used by functions which generate the user interface. Finally class and id hooks in the generated XHTML link with CSS to style the page. Determining how many layers to use and how the layers should interface is a central design decision in XQuery application, as it is in other technologies. Several alternatives are worth considering: the script generates an intermediate XML structure which is transformed server- or client-side with XSLT; the script generates an XForm in place of the HTML form; the whole task is handled client-side with JavaScript; client-side AJAX interfaces with a base XQuery script. Handling this design space is one of the challenges of web development.

Cache busting
For scripts running inside a proxy server, as these scripts are on the UWE server, repeated access to the same url in the doc() function will return the cached file. To break the cache, a random number is added to the URL.

Global Variables
The script uses variable declarations to define some global variables used in the script functions. Global variables feel like a reversion to Fortran COMMON and similar horrors, except that these are all constant once defined. Nonetheless, the dependence on these variables is not explicit. An alternative would be to explicitly pass this data down through the functions. An alternative script using this style, passing a single node which composed the data into a single 'object', executes several times slower, is more verbose and arguably no more understandable.



Recursion
Replacement of the multiple parameters in a template is a recursive function, successively replacing each parameter throughout the template in turn.

Other interface
Flickr [2]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/URLTemplates/interface.xq?_uri=http://www.cems.uwe.ac.uk/xmlwiki/URLTemplates/delicious.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/URLTemplates/interface.xq?_uri=http://www.cems.uwe.ac.uk/xmlwiki/URLTemplates/flickr.xml

Returning the Longest String


Motivation
You want to create a very simple function that you can pass a sequence of strings and it will return the longest string(s) in that sequence.

Sample Program
xquery version "1.0"; declare function local:max-length($string-seq as xs:string*) as xs:string+ { let $max := max (for $s in $string-seq return string-length($s)) return $string-seq[string-length(.) = $max] }; let $tags := <tags> <tag>Z</tag> <tag>Ze</tag> <tag>Zee</tag> <tag>Zen</tag> <tag>Zenith</tag> <tag>nith</tag> <tag>ith</tag> <tag>Zenth</tag> </tags> return <results> <max-string>{local:max-length(($tags/tag))}</max-string> </results>


Results
<results> <max-string>Zenith</max-string> </results> Execute [1]

Discussion
This XQuery creates a local function that takes zero or more strings: $string-seq as xs:string* and returns one or more strings: as xs:string+ It uses the max() XPath function that looks at a sequence of values and returns the highest. Note that if there are several strings in the input set that each have the same max length, it will return all strings of max length. If you only want the first returned, add "[1]" to the return expression: return $string-seq[string-length(.) = $max][1]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Basics/longestString.xq

Saving and Updating Data


Motivation
You have some web forms (such as XForms) that need to save their data to your database. You want a single XQuery that will be used to save new data and update existing data. If a form is used to update an existing record, you can assume that it has an <id> tag in the XML document being saved. New documents will need to have a new document id created for them.

Method
We use HTTP POST data and scan for a specific element like id. If the record does not have an id element, we know that we must create a new record. Note that there is no sequence number generated in this example yet. If there is an id parameter, we will delete the old file and save the new data into the same file. Note that there is no backup or archive.

Sample Program to Store/Remove a single XML file


xquery version "1.0"; declare namespace exist = "http://exist.sourceforge.net/NS/exist"; declare namespace request="http://exist-db.org/xquery/request"; declare namespace xmldb="http://exist-db.org/xquery/xmldb"; declare option exist:serialize "method=xhtml media-type=text/xml



indent=yes"; (: Call this like this: For new records:


http://localhost:8080/exist/rest/db/xquery-examples/save-test/new-update-save.xq?new=true For updates where each record has an id: http://localhost:8080/exist/rest/db/xquery-examples/save-test/new-update-save.xq?id=123 :) (: replace this with your document, for example use request:get-data() :) let $my-doc := <data> <id>123</id> <message>Hello World</message> </data>

let $id := $my-doc/id let $collection := 'xmldb:exist:///db/xquery-examples/save-test' (: this logs you in; you can also get these variables from your session variables :) let $login := xmldb:login($collection, 'mylogin', 'my-password')

(: replace this with a unique file name with a sequence number :) let $file-name := 'test-save.xml' return if (not($id)) then ( let $store-return-status := xmldb:store($collection, $file-name, $my-doc) return <message>New Document Created {$store-return-status} at {$collection}/{$file-name}</message> ) else ( let $remove-return-status := xmldb:remove($collection, $file-name) let $store-return-status := xmldb:store($collection, $file-name, $my-doc) return <message>Document {$id} has been successfully updated</message>)


Sample Program to Insert an XML item at the end of an XML file


Sometimes you do not want to create a new XML file, but instead append the results to the end of an existing XML file. This can be done using the XQuery update operations.

W3C Candidate Recommendation for XQuery Update Operations [1]
eXist 1.2/1.3/1.4 syntax for XQuery Update Operations [2]

For example, in eXist 1.2/1.3/1.4 the operation syntax is the following:

let $append-result := update insert <item/> into doc("myfile.xml")/items
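A slightly fuller sketch of the same idea, using eXist's update extension syntax; the file path /db/xquery-examples/items.xml and its <items> root element are assumptions made only for this illustration:

xquery version "1.0";
(: append a new, time-stamped item to the end of an existing items file :)
let $new-item := <item added="{current-dateTime()}">Hello World</item>
return update insert $new-item into doc('/db/xquery-examples/items.xml')/items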

References
[1] http://www.w3.org/TR/xquery-update-10
[2] http://www.exist-db.org/update_ext.html

Searching multiple collections


Motivation
You want to find records in multiple collections.

Method
There are several ways to do this. The simplest way is to put both collections in a parent collection and start your search at the parent. Let's assume you have three collections:

/db/test
/db/test/a
/db/test/b

To get all the books in both collections a and b, just specify the parent collection, which is /db/test:

for $book in collection('/db/test')//book

Note that the double forward slash // will find the books anywhere in the base collection or any of its child collections.

If you have two collections at different locations in the collection hierarchy, you can simply specify each collection and join them together using sequence concatenation. This is the default operation of enclosing two sequences in parentheses. For example, if you have two sequences, a and b, the concatenation of the two sequences is just (a, b).

Assume you have two collections that have books in the following collections:


Collection A
File='/db/test/a/books.xml'

<books>
   <book id="47">
      <title>Moby Dick</title>
      <author>Herman Melville</author>
      <published-date>1851-01-01</published-date>
      <price>$19.95</price>
      <review>The adventures of the wandering sailor in pursuit of a ferocious whale.</review>
   </book>
   <book id="48">
      <title>The Great Gatsby</title>
      <author>F. Scott Fitzgerald</author>
      <published-date>1925-05-10</published-date>
      <price>$29.95</price>
      <review>Chronicles of an era during the roaring 1920s when the US economy soared.</review>
   </book>
</books>

Collection B
File='/db/test/b/books.xml'

<books>
   <book id="49">
      <title>Catch-22</title>
      <author>Joseph Heller</author>
      <published-date>1961-01-01</published-date>
      <price>$19.95</price>
      <review>A satirical, historical novel set during the later stages of World War II from 1943 onwards.</review>
   </book>
   <book id="48">
      <title>Lolita</title>
      <author>Vladimir Nabokov</author>
      <published-date>1955-01-01</published-date>
      <price>$19.95</price>
      <review>A man becomes obsessed with a 12-year-old girl.</review>
   </book>
</books>

The following query would operate on both collections:

xquery version "1.0";

let $col-a := '/db/test/a'
let $col-b := '/db/test/b'
return
   <books>{
      for $book in (collection($col-a)//book, collection($col-b)//book)
      return $book
   }</books>

If you wanted to only return the titles you could use the following:

xquery version "1.0";

let $col-a := '/db/test/a'
let $col-b := '/db/test/b'
return
   <books>{
      for $book in (collection($col-a)//book, collection($col-b)//book)
      return $book/title
   }</books>

This would return the following results:

<books>
   <title>Moby Dick</title>
   <title>The Great Gatsby</title>
   <title>Catch-22</title>
   <title>Lolita</title>
</books>


Sending E-mail
Motivation
You want to send an e-mail message from within an XQuery. This is frequently done when a report has finished running or when a key event, such as a task update, has occurred.

Method
eXist provides a simple interface to e-mail.

Format of the send-email function


mail:send-email($email as element()+, $server as xs:string?, $charset as xs:string?) as xs:boolean+

where

$email - The email message in the following format:

<mail>
   <from/>
   <reply-to/>
   <to/>
   <cc/>
   <bcc/>
   <subject/>
   <message>
      <text/>
      <xhtml/>
   </message>
   <attachment filename="" mimetype="">xs:base64Binary</attachment>
</mail>

$server - The SMTP server. If empty, it tries to use the local sendmail program.
$charset - The charset value used in the "Content-Type" message header (defaults to UTF-8).

Sample Code
xquery version "1.0"; (: Demonstrates sending an email through Sendmail from eXist :) declare namespace mail="http://exist-db.org/xquery/mail"; declare variable $message { <mail> <from>John Doe &lt;sender@domain.com&gt;</from> <to>recipient@otherdomain.com</to> <cc>cc@otherdomain.com</cc> <bcc>bcc@otherdomain.com</bcc> <subject>A new task is waiting your approval</subject> <message>

Sending E-mail <text>A plain ASCII text message can be placed inside the text elements.</text> <xhtml> <html> <head> <title>HTML in an e-mail in the body of the document.</title> </head> <body> <h1>Testing</h1> <p>Test Message 1, 2, 3</p> </body> </html> </xhtml> </message> </mail> }; if ( mail:send-email($message, 'mail server', ()) ) then <h1>Sent Message OK :-)</h1> else <h1>Could not Send Message :-(</h1>

233

References

eXist mail module, eXist send-email function [1]

[1] http://demo.exist-db.org/exist/functions/mail/send-email


Sequences
Motivation
You want to manipulate a sequence of items. These items may be very similar to each other or they may be of very different types.

Method
We begin with some simple examples of sequences and then look at the most common sequence operators. XQuery uses the word sequence as a generic name for an ordered container of items. Understanding how sequences work in XQuery is central to understanding how the language works. The use of generic sequences of items is central to functional programming and stands in sharp contrast to other programming languages such as Java or JavaScript, which provide multiple methods and functions to handle key-value pairs, dictionaries, arrays and XML data. The wonderful thing about XQuery is that you only need to learn one set of concepts and a very small list of functions to manipulate data quickly.

Examples
Creating sequences of characters and strings
You use parentheses to contain a sequence, commas to delimit items, and quotes to delimit string values:

let $sequence := ('a', 'b', 'c', 'd', 'e', 'f')

Note that you can use single or double quotes, but for most character strings a single quote is used.

let $sequence := ("apple", 'banana', "carrot", 'dog', "egg", 'fig')

You can also intermix data types. For example, the following sequence has three strings and three integers in the same sequence.

let $sequence := ('a', 'b', 'c', 1, 2, 3)

You can then pass the sequence to any XQuery function that works with sequences of items. For example, the count() function takes a sequence as input and returns the number of items in the sequence.

let $count := count($sequence)

To see the results you can create a simple XQuery that displays the items using a FLWOR expression.

Viewing items in a sequence


xquery version "1.0"; let $sequence := ('a', 'b', 'c', 'd', 'e', 'f') let $count := count($sequence)

Sequences

235

return <results> <count>{$count}</count> <items> {for $item in $sequence return <item>{$item}</item> } </items> </results> Execute [1] <results> <count>6</count> <items> <item>a</item> <item>b</item> <item>c</item> <item>d</item> <item>e</item> <item>f</item> </items> </results>

Viewing specified items inside a sequence


One can specify and view individual items within a sequence using the bracketed predicate expression [] and indicating the positions of the items you are interested in viewing.

xquery version "1.0";

let $sequence := ('a', 'b', 'c', 'd', 'e', 'f')
let $position := $sequence[position() = (1, 3, 4)]
return
   <results>
      <count>{$position}</count>
      <items>
         {for $item in $sequence
          return <item>{$item}</item>}
      </items>
   </results>

Results:

<results>
   <count>a c d</count>
   <items>
      <item>a</item>
      <item>b</item>
      <item>c</item>
      <item>d</item>
      <item>e</item>
      <item>f</item>
   </items>
</results>

Adding XML elements to your sequence


You can also store XML elements in a sequence.
let $sequence := ('apple', <banana/>, <fruit type="carrot"/>, <animal type='dog'/>, <vehicle>car</vehicle>)

Although you can use parentheses to create a sequence of XML items, a common practice is to use XML tags to begin and end a sequence and to store all items as XML elements. One suggestion is to use items as the element name to hold generic sequences of items. Here is an example of this:

let $items :=
   <items>
      <banana/>
      <fruit type="carrot"/>
      <animal type='dog'/>
      <vehicle>car</vehicle>
   </items>

The other convention is to put each individual item in its own item element and to place each item on a separate line if the list of items gets long.

let $items :=
   <items>
      <item>banana</item>
      <item>
         <fruit type="carrot"/>
      </item>
      <item>
         <animal type='dog'/>
      </item>
      <item>
         <vehicle>car</vehicle>
      </item>
   </items>

The following FLWOR expression can then be used to display each of these items:

xquery version "1.0";

let $sequence :=
   <items>
      <item>banana</item>
      <item>
         <fruit type="carrot"/>
      </item>
      <item>
         <animal type='dog'/>
      </item>
      <item>
         <vehicle>car</vehicle>
      </item>
   </items>
return
   <results>
      {for $item in $sequence/item
       return <item>{$item}</item>}
   </results>

This will return the following XML:

<results>
   <item>
      <item>banana</item>
   </item>
   <item>
      <item>
         <fruit type="carrot"/>
      </item>
   </item>
   <item>
      <item>
         <animal type="dog"/>
      </item>
   </item>
   <item>
      <item>
         <vehicle>car</vehicle>
      </item>
   </item>
</results>

Note that when the resulting XML is returned, only double quotes are present in the output.


Counting Items
You can count the number of items in a sequence by using the count() function. If your items are stored inside a wrapper element, add /* to the end of the path to count its child elements.
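A small illustration of this, using a throwaway items wrapper:

let $items :=
   <items>
      <item>a</item>
      <item>b</item>
      <item>c</item>
   </items>
(: count the child elements of the wrapper :)
return count($items/*)   (: returns 3 :)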

Common Sequence Functions


There are only a handful of functions you will need to use with sequences. We will review these functions and also show you how to create new functions using combinations of them. Here are the three most common non-mathematical functions used with sequences. These three are the real workhorses of XQuery sequences; you can spend days writing XQueries and never need functions beyond these three.

count($seq as item()*) - used to count the number of items in a sequence. Returns a non-negative integer.
distinct-values($seq as item()*) - used to remove duplicate items in a sequence. Returns another sequence.
subsequence($seq as item()*, $start as int, $num as int) - used to return only a subset of items in a sequence. Returns another sequence.

All of these functions have a parameter type of item()*, which is read "zero or more items". Note that both the distinct-values() function and the subsequence() function take in a sequence and return a sequence. This comes in very handy when you are creating recursive functions.

Along with count() there are also a few sequence functions that calculate sums, averages, minima and maxima:

sum($seq as item()*) - used to sum the values of numbers in a sequence
avg($seq as item()*) - used to calculate the average (arithmetic mean) of numbers in a sequence

min($seq as item()*) - used to find the minimum value of a sequence of numbers
max($seq as item()*) - used to find the maximum value of a sequence of numbers

These functions are designed to work on numeric values and all return numeric values. You may want to use the number() function when working with sequences of strings. You may find that you can perform many tasks just by learning these few XQuery functions. You can also build most other sequence operations from these functions.
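A short illustration of these functions on a literal sequence of prices (the values are invented for the example):

let $prices := (19.95, 29.95, 19.95, 9.99)
return
   <summary>
      <count>{count($prices)}</count>
      <distinct>{distinct-values($prices)}</distinct>
      <first-two>{subsequence($prices, 1, 2)}</first-two>
      <total>{sum($prices)}</total>
      <average>{avg($prices)}</average>
      <lowest>{min($prices)}</lowest>
      <highest>{max($prices)}</highest>
   </summary>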

Occasionally Used Sequence Functions


In addition there are some functions that are occasionally used:
insert-before($seq as item()*, $position as int, $inserts as item()*) - for inserting new items anywhere in a sequence

remove($seq as item()*, $position as int) - removes an item from a sequence
reverse($seq as item()*) - reverses the order of items in a sequence
index-of($seq as anyAtomicType()*, $target as anyAtomicType()) - returns a sequence of integers that indicate where an item is within a sequence (index counting starts at 1)

These last two functions can be used in conjunction with the bracketed predicate expression [], which operates on an item's position within a sequence.

last() - when used in a predicate returns the last item in a sequence so (1,2,3)[last()] returns 3

position() - returns the position of the context item in the sequence currently being processed; it is most often used inside a predicate
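A quick illustration of the occasionally used functions on a small literal sequence:

let $seq := ('a', 'b', 'c', 'd')
return
   <results>
      <insert-before>{insert-before($seq, 2, 'X')}</insert-before>   <!-- a X b c d -->
      <remove>{remove($seq, 3)}</remove>                             <!-- a b d -->
      <reverse>{reverse($seq)}</reverse>                             <!-- d c b a -->
      <index-of>{index-of($seq, 'c')}</index-of>                     <!-- 3 -->
      <last>{$seq[last()]}</last>                                    <!-- d -->
   </results>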

Example of Sum Function


Let's imagine that we have a basket of items and we want to count the total number of items in the basket.

let $basket :=
   <basket>
      <item>
         <department>produce</department>
         <type>apples</type>
         <count>2</count>
      </item>
      <item>
         <department>produce</department>
         <type>banana</type>
         <count>3</count>
      </item>
      <item>
         <department>produce</department>
         <type>pears</type>
         <count>5</count>
      </item>
      <item>
         <department>hardware</department>
         <type>nuts</type>
         <count>7</count>
      </item>
      <item>
         <department>packaged-goods</department>
         <type>nuts</type>
         <count>20</count>
      </item>
   </basket>

To sum the counts of each item we will need to use an XPath expression to get the item counts:

$basket/item/count

We can then total this sequence and return the result:

return
   <total>{sum($basket/item/count)}</total>

Execute [2]


Tests on Sequences
You can also test to see if a sequence contains one or all of the items in another set. There are several methods to do this.

Finding if an Item is in a Sequence


Users find that XQuery is easy to use since it tries to do the right thing based on the data types you give it. XQuery checks whether you have a sequence, an XML element or a single string and performs the most logical operation. This behavior keeps your code compact and easy to read. If you are comparing an element with a string, XQuery will look inside the element and get the string for you, so you do not explicitly need to tell XQuery to use the content of the element. If you are comparing a sequence of items with a string using the "=" operator, XQuery will look for that string in the sequence and return true() if the string is in the sequence. It just works!

For example, if we have the sequence:

let $sequence := ('a', 'b', 'c', 'd', 'e', 'f')

then:

if ($sequence = 'd') then true() else false()

returns true() because 'd' is found in the sequence of letters, and:

if ($sequence = 'x') then true() else false()

returns false() because 'x' is not in the sequence.

You can use the index-of() function to see where an item appears in the sequence. If the item is in the sequence it returns one or more positions; you can then return true() or false() depending on whether the item was found:

let $sequence := ('a', 'b', 'c', 'd', 'e', 'f')
let $item := 'x'
return
   if (index-of($sequence, $item))
   then true()
   else false()

Recall that index-of() returns the empty sequence if $item is not found in $sequence, and the effective boolean value of an empty sequence is false.

You can also use a quantified expression:

some $str in $sequence satisfies ($str = 'e')

which will also return the correct result. See the Wikibook article here: XQuery/Quantified_Expressions


Sorting Sequence
There is no "sort" function in XQuery. To sort your sequence you just create a new sequence that contains a FLOWR loop of your items with the order statement in it. For example if you have a list of items with titles as one of the elements you can use the following to sort the items by title: let $sorted-items := for $item in $items order by $item/title/text() return $item You can return the items sorted by their element name : let $sorted-items := for $item in $items order by name($item) return $item You can also use descending with order by to reverse the order : for $item in $items order by name($item) descending return $item If you want to sort with your own order by creating a seperate sequence and using the index-of function to find where this item is in the sequence : for $i in /root/* let $order := ("b", "a", "c") let $name := name($i) order by index-of($order, $i) return $i

Set Operations: Concatenation, Unions, Intersections and Exclusions


XQuery also provides functions to join sets and to find items that are in both sets. Assume that we have two sets that contain overlapping items:

let $sequence-1 := ('a', 'b', 'c', 'd')
let $sequence-2 := ('c', 'd', 'e', 'f')

Concatenation

You can concatenate the two sequences by doing the following:

let $both := ($sequence-1, $sequence-2)

or

for $item in ($sequence-1, $sequence-2)
return $item

which will return:

a b c d c d e f

Union

You can also create a "union" set that removes duplicates among all items in the two sets by using the distinct-values() function:

distinct-values(($sequence-1, $sequence-2))

This will return the following:

a b c d e f

Note that the "c d" pair is not repeated.

Intersection

You can use a variation of this to find the intersection: the items of sequence-1 that are also in sequence-2:

distinct-values($sequence-1[. = $sequence-2])

This will return only items that are in BOTH sequence-1 AND sequence-2:

c d

The way you read this is "for each item in sequence-1, if this item (.) is also in sequence-2 then return it."

Exclusion

The last set operation you might want is "exclusion", where we find all items in the first sequence that are NOT in the second sequence:

distinct-values($sequence-1[not(. = $sequence-2)])

This will return:

a b

Returning Duplicates

The following example returns a list of all items that occur more than once in a sequence. This process is known as "duplicate detection".

xquery version "1.0";

let $seq := ('a', 'b', 'c', 'd', 'e', 'f', 'b', 'c')
let $distinct-value := distinct-values($seq)

(: for each distinct item, if its count is greater than 1 then return it :)
let $duplicates :=
   for $item in $distinct-value
   return
      if (count($seq[. = $item]) > 1)
      then $item
      else ()

return
   <results>
      <sequence>{string-join($seq, ', ')}</sequence>
      <distinct-values>{$distinct-value}</distinct-values>
      <duplicates>{$duplicates}</duplicates>
   </results>

This returns:

<results>
   <sequence>a, b, c, d, e, f, b, c</sequence>
   <distinct-values>a b c d e f</distinct-values>
   <duplicates>b c</duplicates>
</results>

You can also return only the items that occur exactly once by swapping the then and else branches:

if (count($seq[. = $item]) > 1) then () else $item

Creating Sequences of Letters


You can use the codepoints functions to convert letters to numbers and numbers to letters. For example, to generate a list of all the letters from a to z you can write the following XQuery:

let $number-for-a := string-to-codepoints('a')
let $number-for-z := string-to-codepoints('z')
for $letter in ($number-for-a to $number-for-z)
return codepoints-to-string($letter)

This will return a sequence of the following:


('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z')

Execute [3]

Creating Letter Collections


You can also use this to create a list of subcollections:

let $data-collection := '/db/apps/terms/data'
let $number-for-a := string-to-codepoints('a')
let $number-for-z := string-to-codepoints('z')
for $letter in ($number-for-a to $number-for-z)
return xmldb:create-collection($data-collection, codepoints-to-string($letter))

This is a very common way to store related files in subcollections.


Counting Items
It is very common to need to count your items as you go through them. You can do this by adding "at $count" to your FLWOR loop:

for $item at $count in $sequence
return
   <item>
      <count>{$count}</count>
      {if ($count mod 2) then <odd/> else <even/>}
   </item>

Note that the modulo operator ($count mod 2) returns 1 for odd numbers, which gets converted to true(), and zero for even numbers, which gets converted to false(). You can use this technique to make alternating rows of tables different colors.

Combining Sequence Operations


It is very common to need to "chain" sequence operations in a linear sequence of steps. For example if you wanted to sort a list of sequences and then select the first 10 items your query might look like the following: (: this gets a list of names items from the input :) let $input-sequence := doc('/db/apps/items-manager/data')//item/name/text() let $sorted-items := for $item in $input-sequence order by $item return $item return <ol>{ for $item at $count subsequence($sorted-items, 1, 10) return <li> (: this puts an even or odd class attribute in the li :) {$name }{if ($count mod 2) then attribute class {'odd'} else attribute class {'even'}} </li> }</ol> This technique can be used to paginate results for search results so that users see the first 10 results of a search. A control can then be used to get the next N items from the search result.


References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/fn/sequence1.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/eXist/fn/sequence2.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/eXist/fn/codepoints1.xq

Sequences Module
Motivation
You want to perform a function on a sequence. You can use one of the following functions: map, fold or filter.

Method
Here is the structure of these three functions.
sequences:map($func as function, $seqA as item()*, $seqB as item()*) as item()*

sequences:fold($func as function, $seq as item()*, $start as item())
sequences:filter($func as function, $seq as item()*) as item()*

Each of them takes an XQuery function as the first argument.

Map
The map function applies the function item passed as the first argument to every item of the input sequence in turn, returning the concatenation of the resulting sequences in order. W3C page on map [1]
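For comparison, a sketch of the same three operations using the higher-order functions built into standard XQuery 3.1 (fn:for-each, fn:filter, fn:fold-left) rather than this module's functions; it assumes an XQuery 3.1 processor:

xquery version "3.1";
let $seq := (1, 2, 3, 4, 5)
return
   <results>
      <map>{for-each($seq, function($n) { $n * $n })}</map>
      <filter>{filter($seq, function($n) { $n mod 2 = 0 })}</filter>
      <fold>{fold-left($seq, 0, function($acc, $n) { $acc + $n })}</fold>
   </results>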

References
[1] http://www.w3.org/TR/xpath-functions-30/#func-map


Setting HTTP Headers


Motivation
You want to put information in your outgoing HTTP headers to control aspects such as web caching and ETags.
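As a minimal sketch of the basic mechanism, eXist's response:set-header() sets a header on the current HTTP response; the header values below are only examples:

xquery version "1.0";
declare namespace response="http://exist-db.org/xquery/response";

(: set two caching-related headers before returning the page body :)
let $headers := (
   response:set-header("Cache-Control", "public, max-age=3600"),
   response:set-header("X-Content-Type-Options", "nosniff")
)
return <headers-set/>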

Sample Module
The following module was provided by Thomas White.

File tw_stream-binary-cached.xql

xquery version "1.0" encoding "UTF-8";

module namespace cached-binary = "http://www.thomas-white.net/xqm/stream-binary-cached.1.0";

declare default function namespace "http://www.w3.org/2005/xpath-functions";

import module namespace xdb="http://exist-db.org/xquery/xmldb";
import module namespace cache = "http://exist-db.org/xquery/cache";
import module namespace datetime = "http://exist-db.org/xquery/datetime";
import module namespace util = "http://exist-db.org/xquery/util";

declare option exist:serialize "method=xml media-type=text/xml";

declare function cached-binary:eTag(
   $pathToBinaryResource as xs:string,
   $last-modified as xs:dateTime,
   $domain-tag as xs:string
) as xs:string {
   concat( $domain-tag, '-',
           util:document-id( $pathToBinaryResource ), '-',
           fn:translate( fn:substring($last-modified,1,19), ':-T', '') )
};

declare function cached-binary:eTag-from-uri(
   $pathToBinaryResource as xs:string,
   $domain-tag as xs:string
) as xs:string {
   cached-binary:eTag(
      $pathToBinaryResource,
      xdb:last-modified( util:collection-name( $pathToBinaryResource ),
                         util:document-name( $pathToBinaryResource )),
      $domain-tag)
};

declare function cached-binary:stream-binary-with-cache-headers(
   $original-path as xs:string?,
   $pathToBinaryResource as xs:string,
   $expiresAfter as xs:dayTimeDuration?,
   $must-revalidate as xs:boolean,
   $doNotCache as xs:string,
   $domain as xs:string?
) {
   if( fn:string-length($pathToBinaryResource) = 0
       or not( util:binary-doc-available( $pathToBinaryResource )) )
   then (
      response:set-status-code( 404 ),
      concat( $original-path, ' ( ', $pathToBinaryResource, ' ) not found!')
      (: ($original-path, $pathToBinaryResource)[1] :)
   )
   else (
      let $coll := util:collection-name( $pathToBinaryResource )
      let $file := util:document-name( $pathToBinaryResource )
      let $last-modified := xdb:last-modified( $coll, $file )
      let $ETag := cached-binary:eTag( $pathToBinaryResource, $last-modified, $domain )
      let $if-modified-since := request:get-header('If-Modified-Since')
      let $expire-after := if( empty($expiresAfter) )
                           then xs:dayTimeDuration( "P365D" )   (: 365 day expiry period :)
                           else $expiresAfter
      let $content-type := (
         util:declare-option('exist:serialize',
            concat("media-type=", xdb:get-mime-type( xs:anyURI( $pathToBinaryResource )))),
         response:set-header( "Pragma", 'o' )
      )
      return
         if( not($doNotCache = 'true')
             and ( ( request:get-header('If-None-Match') = $ETag )   (: ETag :)
                   or ( fn:string-length($if-modified-since) > 0
                        and datetime:parse-dateTime( $if-modified-since, 'EEE, d MMM yyyy HH:mm:ss Z' ) <= $last-modified ) ))
         then (
            response:set-status-code( 304 ),
            response:set-header( "Cache-Control",
               concat('public, max-age=', $expire-after div xs:dayTimeDuration('PT1S') ))
            (: 24h=86,400 , must-revalidate :)
         )
         else (
            let $maxAge := $expire-after div xs:dayTimeDuration('PT1S')
            let $headers := (
               response:set-header( "ETag", $ETag ),
               response:set-header( "Last-Modified",
                  datetime:format-dateTime( $last-modified, 'EEE, d MMM yyyy HH:mm:ss Z' )),
               response:set-header( "Expires",
                  datetime:format-dateTime( dateTime(current-date(), util:system-time()) + $expire-after,
                                             'EEE, d MMM yyyy HH:mm:ss Z' )),
               if( $doNotCache = 'true' )
               then (
                  response:set-header( "Cache-Control", 'no-cache, no-store, max-age=0, must-revalidate' ),
                  response:set-header( "X-Content-Type-Options", 'nosniff' )
               )
               else
                  response:set-header( "Cache-Control",
                     concat( 'public, max-age=', $maxAge,
                             if( $must-revalidate ) then ', must-revalidate' else '' ))
            )
            return
               response:stream-binary(
                  util:binary-doc( xs:anyURI( $pathToBinaryResource )),
                  xdb:get-mime-type( xs:anyURI( $pathToBinaryResource )),
                  xs:anyURI( ($original-path, $pathToBinaryResource)[1] ))
         )
   )
};

(:
HTTP/1.1 200 OK
Date: Fri, 30 Oct 1998 13:19:41 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
Content-Length: 1040
Content-Type: text/html

Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"

Cache-Control: public, max-age=1728000
Expires: Thu, 06 Aug 2009 10:04:13 GMT
Date: Fri, 17 Jul 2009 10:04:13 GMT
Content-Type: text/javascript; charset=UTF-8
ETag: "ih2h6n8r44hc"
Last-Modified: Fri, 05 Sep 2003 02:11:15 GMT
X-Content-Type-Options: nosniff
:)

The following main module calls the library function above, taking its arguments from the request parameters:

xquery version "1.0" encoding "UTF-8";

declare default function namespace "http://www.w3.org/2005/xpath-functions";

import module namespace request = "http://exist-db.org/xquery/request";
import module namespace cached-binary = "http://www.thomas-white.net/xqm/stream-binary-cached.1.0"
   at "tw_stream-binary-cached.xql";

cached-binary:stream-binary-with-cache-headers(
   request:get-parameter("url", ()),
   request:get-parameter("uri", 'no-uri'),
   xs:dayTimeDuration(request:get-parameter("expire", 'P30D')),
   xs:boolean(request:get-parameter("must-revalidate", 'false') = 'true'),
   request:get-parameter("doNotCache", ''),
   request:get-parameter("domain", '')
)


Simile Exhibit
Motivation
You want to create a Simile Exhibit output of an XML file. To do this we will need to convert XML to JSON file format.

Method
You have a file of contributors to a book and you would like to create a map of their locations.

<contributors>
   <contributor>
      <author-name>John Doe</author-name>
      <bio>John is a software developer interested in the semantic web.</bio>
      <location>New York, NY</location>
      <image-url>http://www.example.com/images/john-doe.jpg</image-url>
   </contributor>
   <contributor>
      <author-name>Sue Anderson</author-name>
      <bio>Sue is an XML consultant and is interested in XQuery.</bio>
      <location>San Francisco, CA</location>
      <image-url>http://www.example.com/images/sue-anderson.jpg</image-url>
   </contributor>
</contributors>

XQuery to Output JSON File


First we must transform our XML file into JSON format. This is a little tricky because JSON requires the curly brace characters to be added to the output. This can be done by creating special variables that contain those strings. In the XQuery header we also have to change the serialization method from our traditional XML to text/plain. We also have to wrap our item output in a string-join() function to prevent the last comma from being serialized.

JSON files are just another file format for storing hierarchical data, just like XML. JSON is used mostly by JavaScript developers who are either not familiar with XML or don't have XML editing tools to validate file formats. JSON does allow nesting of complex data but does not support many XML features such as namespaces. Unlike XQuery, JSON does not permit a "dash" character in a label unless you put quotes around the label, so note that the image-url property label has quotes around it.

xquery version "1.0";
declare option exist:serialize "method=text media-type=text/plain";

let $document := '/db/apps/exhibit/data/contributors.xml'

(: special characters such as left and right curly brace and newline :)
let $lcb := '{'
let $rcb := '}'
let $nl := '&#10;'

(: json file header and footer as well as item header and footer :)
let $json-header := concat($lcb, $nl, ' "items" : [ ')
let $json-footer := concat($nl, ' ]', $nl, $rcb)
let $item-header := concat($nl, ' ', $lcb, ' ')
let $item-footer := concat(' ', $rcb)
return
<results>{$json-header}
{ string-join(
   for $contributor in doc($document)/contributors/contributor
   return
      <item>{$item-header}
      label: "{$contributor/author-name/text()}",
      location: "{$contributor/location/text()}",
      "image-url": "{$contributor/image-url/text()}"
      {$item-footer}</item>
   , ', ')
}{$json-footer}</results>

Sample JSON Output


{ "items" : [ { label: location: "image-url": }, { label: location: "image-url": } ] } "John Doe", "New York, NY", "http://www.example.com/images/john-doe.jpg" "Sue Anderson", "San Francisco, CA", "http://www.example.com/images/sue-anderson.jpg"

Alternative Approach
An alternative is to use the fact that curly braces can be escaped in XQuery by doubling. Since the output is being serialized as text, all elements will be serialised, so there is no need to serialise items separately.

xquery version "1.0";
declare option exist:serialize "method=text media-type=text/plain";

let $document := '/db/Wiki/JSON/contributors.xml'
return
<result>
{{ "items" : [
   { string-join(
      for $contributor in doc($document)/contributors/contributor
      return
         <item>
         {{
            label: "{$contributor/author-name}",
            location: "{$contributor/location}",
            "image-url": "{$contributor/image-url}"
         }}
         </item>
      , ', ')
   }
 ]
}}
</result>

Execute [1]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/JSON/asJSON4.xq

Sitemap for Content Management System


Motivation
You want to use eXist to manage your web site and use each collection as a web-content folder. You want a way to automatically create a sitemap for the site so that as you add a new collection in your web root folder the site navigation menus will automatically be updated.

Method
We will use the eXist get-child-collections() function to get all of the child collections for a root collection. We create a recursive function that traverses the collection tree. From the eXist function library, here is the description of get-child-collections function.
xmldb:get-child-collections($a as xs:string) as xs:string*

Returns a sequence of strings containing all the child collections of the collection specified in $a. The collection parameter can either be a simple collection path or an XMLDB URI.

If we have a collection called /db/webroot we could pass this string as a parameter to this function and all the child collections would be returned as sequence of strings. We can then create a recursive function that works on each of these child collections.


Sample use of get-child-collections()


Here is a very simple use of the function get-child-collections(). You just pass it a single argument, which is a path to a collection. It will return a sequence of all the child collections in that collection.

xquery version "1.0";

let $children := xmldb:get-child-collections('/db/webroot')
return
   <results>
      <children>{$children}</children>
   </results>

Sitemap Function: Version 1


declare function local:sitemap($collection as xs:string) as node()* {
   if (empty(xmldb:get-child-collections($collection)))
   then ()
   else
      <ol>{
         for $child in xmldb:get-child-collections($collection)
         return
            <li>
               <a href="{concat('/exist/rest', $collection, '/', $child)}">{$child}</a>
               {local:sitemap(concat($collection, '/', $child))}
            </li>
      }</ol>
};

This recursive function takes a single string argument and returns a complex node. The result is an HTML ordered-list structure. It first tests whether the collection has any child collections. If there are none, it just returns the empty sequence. If there are child collections, it creates a new ordered list and iterates through all the children, creating a new list item for each child and then calling itself on that child. Note that this could have been written so that the function only calls itself when there are child collections. This pattern of recursion over the collection tree occurs frequently in XQuery functions.

Sample Sitemap Program


We can now call this program within an XHTML page template to create a web page:

Source Code
xquery version "1.0"; declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

declare function local:sitemap($collection as xs:string) as node()* { if (empty(xmldb:get-child-collections($collection))) then ()

Sitemap for Content Management System


else <ol>{ for $child in xmldb:get-child-collections($collection) return <li> <a href="{concat('/exist/rest', $collection, '/', $child)}">{$child}</a> {local:sitemap(concat($collection, '/', $child))} </li> }</ol> };

254

<html> <head> <title>Sitemap</title> </head> <body> <h1>Sitemap for collection /db/webroot</h1> {local:sitemap('/db/webroot')} </body> </html>

Adding Titles
Sometimes the title for the navigation bar will be different from the name of the collection. By convention, collection names are usually short lowercase names without spaces or uppercase letters. Navigation bars typically have labels that contain spaces and uppercase letters. Here is an example that uses a lookup table to look up the title from an XML file.
xquery version "1.0"; declare function local:sitemap($collection as xs:string) as node()* { if (empty(xmldb:get-child-collections($collection))) then () else <ol>{ for $child in xmldb:get-child-collections($collection) let $db-path := concat($collection, '/', $child) let $path := concat('/exist/rest', $collection, '/', $child) let $lookup := doc('/db/apps/sitemap/06-collection-titles.xml')/code-table/item[$db-path=path]/title/text() order by $child return <li> <a href="{if (empty($lookup)) then ($path) else (concat($path, "/index.xhtml"))}"> {if (empty($lookup)) then ($child) else ($lookup)} </a>

Sitemap for Content Management System


{local:sitemap(concat($collection, '/', $child))} </li> }</ol> };

255

Screen Image

Note that the child collections are all sorted alphabetically. In some cases this may not be the order in which you would like to display your site navigation menus. You can add a sort-order element to the XML file that provides the titles and use that field to sort the child collections.
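A sketch of that suggestion: if each item in the titles file carries a sort-order element (a hypothetical addition to the file shown below), the FLWOR can order by that value instead of by collection name. The fragment assumes $collection is bound as in the function above, and uses 999 as a default for items without a sort-order:

for $child in xmldb:get-child-collections($collection)
let $item := doc('/db/apps/sitemap/06-collection-titles.xml')
             /code-table/item[path = concat($collection, '/', $child)]
(: items with no sort-order element fall to the end :)
order by xs:integer(($item/sort-order, 999)[1])
return $child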

Collection Titles file


Put this file in the following location: /db/apps/sitemap/06-collection-titles.xml

<code-table>
   <item>
      <path>/db/webroot/about</path>
      <title>About</title>
   </item>
   <item>
      <path>/db/webroot/training</path>
      <title>Training</title>
   </item>
   <item>
      <path>/db/webroot/faqs</path>
      <title>Frequently Asked Questions</title>
   </item>
   <item>
      <path>/db/webroot/training/xforms</path>
      <title>XForms</title>
   </item>
   <item>
      <path>/db/webroot/training/rest</path>
      <title>ReST</title>
   </item>
   <item>
      <path>/db/webroot/training/xquery</path>
      <title>XQuery</title>
   </item>
   <item>
      <path>/db/webroot/training/tei</path>
      <title>Text Encoding Initiative</title>
   </item>
   <item>
      <path>/db/webroot/training/exist</path>
      <title>eXist</title>
   </item>
   <item>
      <path>/db/webroot/products</path>
      <title>Products</title>
   </item>
   <item>
      <path>/db/webroot/support</path>
      <title>Support</title>
   </item>
</code-table>

Customizing Your Sitemap Function


Not all collections should be displayed in a sitemap. Some collections may contain private administrative data that you do not want to display in a public sitemap. There are two ways to handle this. You can keep a general "published collections" list in a separate XML file. Alternatively, you can store a small XML file in each collection that describes the properties of that collection. By default a collection may be either public or private, depending on how you write your function. The second option is more portable if you and your associates are each building web applications you would like to share. By standardizing on your collection-properties.xml files you can store properties in a collection and then exchange them with other eXist sites just by exchanging the collections as folders that can be zipped.
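A sketch of the second option; the file name collection-properties.xml and its structure are hypothetical choices for this illustration:

(: each collection may hold a small properties file such as:
   <collection-properties>
      <visibility>public</visibility>
   </collection-properties>
   Only collections marked public (or with no properties file) are listed. :)
for $child in xmldb:get-child-collections($collection)
let $props-uri := concat($collection, '/', $child, '/collection-properties.xml')
let $props := if (doc-available($props-uri)) then doc($props-uri) else ()
where empty($props) or $props/collection-properties/visibility = 'public'
return $child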

Counting Files In Collection and SubCollections


Here is the pseudocode:

declare function local:count-files-in-collection($collection as xs:string) as xs:integer {
   let $child-collections := xmldb:get-child-collections($collection)
   return
      if (empty($child-collections))
      then (: return the count of the files in this collection :)
      else (: for each subcollection call local:count-files-in-collection($child) :)
};
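A minimal working version of this pseudocode, assuming read access to the collections; xmldb:get-child-resources() is the eXist function that lists the resources (files) in a collection:

declare function local:count-files-in-collection($collection as xs:string) as xs:integer {
   (: files directly in this collection :)
   count(xmldb:get-child-resources($collection))
   +
   (: plus the files in each subcollection, counted recursively :)
   sum(
      for $child in xmldb:get-child-collections($collection)
      return local:count-files-in-collection(concat($collection, '/', $child))
   )
};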


Slideshow
Motivation
Despite being a 20-year-old program, albeit enhanced over the years, Microsoft PowerPoint is the ubiquitous presentation software. It provides a wide range of functionality, but most of us use it for simple text slides, perhaps with a bit of animation. However, PowerPoint does not cleanly separate the content of the presentation from its presentation (as slides, in printed form, as an index) and appearance (styles, colors), is an expensive proprietary product, and is overweight for many tasks. Thus there is value in using simple XML tools to provide similar functionality.

Prior art
There are a number of approaches to using XML technologies to provide light-weight, non-proprietary presentation software. These typically rely on a web browser as the rendering engine (a design choice not open to Robert Gaskins [1] in 1984). Core problems are the task of dividing the presentation into separate slides and supporting navigation in the slide sequence.

Slidy [2] by Dave Raggett - Slidy XML, XSLT, JavaScript
S5 [3] by Eric Meyer
DocBook [4] by Norman Walsh - DocBook, XSLT
[5] DocBook, CSS, Opera presentation mode

This project uses XQuery as the server-side language.

Presentation format
Other approaches use a defined vocabulary, but the choice here is to use XHTML with a little additional markup to define slide boundaries and slideshow properties. This provides a wide range of needed functionality such as formatting, linking, images, and embedded video.
<ss:slideshow xmlns="http://www.w3.org/1999/xhtml" <ss:css> <ss:slide>slide.css</ss:slide> <ss:print/> </ss:css> <ss:header>DSA 2008 - Lecture 1 - Chris Wallace</ss:header> <ss:footer/> <ss:slide> <h1>Teaching approach</h1> <ul> <li>1 lecture a week <ul> <li>Interaction using SMS whiteboard and Multi-choice cube</li> </ul> </li> <li>1 2-hour workshop every 2 weeks - write down the weeks you have been allocated</li> <li>2 hour Research time every 2 weeks (alternating with the xmlns:ss="http://www.cems.uwe.ac.uk/xmlwiki/slideshow">

Slideshow
workshops) - independent study with tutor support </li> <li>Teaching resources in UWEOnline and in the <a href="https://www.cems.uwe.ac.uk/studentwiki/index.php/UFIEKG-20-2/2008">studentWiki</a> </li> </ul> </ss:slide> ...

258

Here I have used two namespaces: the default namespace is XHTML, used in the slide body; the ss namespace is used for the slideshow elements, which define slideshow properties, the master slide, and the slide boundaries. This is a very minimal format which would be expanded in future.

The Script
The XML document defining the slideshow content needs to be transformed into slides for projection and into a print format. In this implementation, both versions are generated from the same script.

Namespaces
The two namespaces must be declared; an arbitrary prefix is used for the XHTML namespace, which is the default namespace in the slideshow document.

declare namespace ss = "http://www.cems.uwe.ac.uk/xmlwiki/slideshow";
declare namespace h = "http://www.w3.org/1999/xhtml";

Parameters
The slide parameters are the URI of the slideshow document (whether a database document or an external document), the slide number and the mode - slide or print. The parameters are passed in a semicolon-delimited query string rather than in the more usual key=value form because I was unable to get the & separator to work in JavaScript.

declare variable $params := tokenize(request:get-query-string(), ";");
declare variable $uri := $params[1];
declare variable $n := xs:integer(($params[2], 1)[1]);
declare variable $mode := ($params[3], "slide")[1];

Fetching the XML


The slideshow is fetched - this is faster if stored in the database but it may also be an external file.

declare variable $slideshow := doc($uri)/ss:slideshow;
declare variable $slides := $slideshow/ss:slide;
declare variable $count := count($slides);


The Slide Show


Show a Slide

A function generates the div holding a slide. The global variables define the common slideshow properties.

declare function local:show-slide($slide as element(ss:slide)) as element(div) {
   <div class="slide">
      <span class="header">{$slideshow/ss:master/ss:header/node()}</span>
      {$slide}
      <span class="footer">{$n}/{$count} &#160; {$slideshow/ss:footer/node()}</span>
   </div>
};

Contents Slide

The <h1> element in each slide is used to generate a contents slide, numbered 0.

declare function local:show-contents() as element(div) {
   <div class="contents">
      <span class="header">{$slideshow/ss:header/node()}</span>
      <h1>Contents</h1>
      <ul>
         {for $slide at $i in $slides
          return
             <li>{$i} &#160;<a href="slide.xql?{$uri};{$i}">{string($slide/h:h1)}</a></li>
         }
      </ul>
      <span class="footer">0/{$count} &#160; {$slideshow/ss:footer/node()}</span>
   </div>
};

Navigation

Navigation is handled by a JavaScript function which handles keypress events and is attached to the page body. This code is different for each slide, so it is generated for each slide. The keypress mapping is based partly on the codes generated by a common wireless presenter, the Labtec Notebook presenter [6], which is designed for use with PowerPoint. Documentation on the device was hard to find, so the key mapping was analysed by capturing the keypresses observed by a simple JavaScript:

left and right buttons: PageUp and PageDown to step forwards and backwards
bottom key: 'b' to blank the screen
top button: toggle between F5 for fullscreen and Esc for edit mode

Other key mappings were added to allow the cursor keys to be used and to go to print mode. Note that in generating this JavaScript code, { } brackets need to be doubled.

declare function local:keypress-script() as element(script) {
   let $prev := if ($n > 0) then $n - 1 else 1
   let $next := if ($n < $count) then $n + 1 else $count
   return
      <script type="text/javascript">
         function keypress(e) {{
            var code = e.keyCode
            if (code==34 || code==39) document.location = "slide.xql?{$uri};{$next}"   // Page Down or right arrow : next
            if (code==33 || code==37) document.location = "slide.xql?{$uri};{$prev}"   // Page Up or left arrow : previous
            if (code==66 || code==38) document.location = "slide.xql?{$uri};0"         // b or up arrow : index
            if (code==36) document.location = "slide.xql?{$uri};1"                     // Home : first
            if (code==35) document.location = "slide.xql?{$uri};{$count}"              // End : last
            if (code==80 || code==40) document.location = "slide.xql?{$uri};0;print"   // p or down arrow : print
         }}
      </script>
};

Generate

declare option exist:serialize "method=xhtml media-type=text/html";

if ($mode="slide")
then
   <html>
      <head>
         <title>{string($slideshow/ss:title)} - Slides</title>
         <link rel="stylesheet" type="text/css" href="{$slideshow/ss:css/ss:slide}"/>
         {local:keypress-script()}
      </head>
      <body onkeydown="keypress(event)">
         {if ($n=0)
          then local:show-contents()
          else local:show-slide($slides[$n])
         }
      </body>
   </html>
else ...

Print format
Other functions generate a printable version of the slide show. This comprises:

Contents Page

declare function local:print-contents() as element(div) {
   <div class="contents">
      <h2>Contents</h2>
      <ul>
         {for $slide at $i in $slides
          return
             <li>{$i} . <a href="slide.xql?{$uri};{$i}">{string($slide/h:h1)}</a></li>
         }
      </ul>
   </div>
};

Slides

declare function local:print-slides() as element(div)* {
   for $slide at $i in $slides
   return $slide
};

Links

The URIs for links are not visible in the printed slides, so it is useful to add a final page listing together all the links which appear in the slides.

declare function local:print-links() as element(div) {
   <div class="links">
      <h1>Links</h1>
      <ul>
         {for $slide at $i in $slides
          for $link in $slide//h:a
          order by upper-case($link)
          return
             <li>{string($link)} : <em>{string($link/@href)}</em></li>
         }
      </ul>
   </div>
};

Generate Print View

If the mode is "print" then generate the print format:

.. else
<html>
   <head>
      <title>{string($slideshow/ss:title)} - Print</title>
      <link rel="stylesheet" type="text/css" href="{$slideshow/ss:css/ss:print}"/>
   </head>
   <body>
      {local:print-contents()}
      {local:print-slides()}
      {local:print-links()}
   </body>
</html>

Execute
An introductory lecture [7] - incomplete CW 18/09/08

References
[1] http://www.robertgaskins.com/
[2] http://www.w3.org/Talks/Tools/Slidy/#(1)
[3] http://meyerweb.com/eric/tools/s5/
[4] http://docbook.sourceforge.net/
[5] http://www.thingbag.net/docbook/sig032503/enus/misc/operashow.html
[6] http://www.labtec.com/index.cfm/gear/details/EUR/EN,crid=29,contentid=730
[7] http://www.cems.uwe.ac.uk/xmlwiki/SlideShow/slide.xql?/db/Wiki/SlideShow/DSA1a.xml

SMS tracker
Motivation
BrightKite [1] provides a micro-blogging service for your location: you send an address and a message to the service, which geocodes the address, maps it, finds other users nearby and forwards the post to other micro-blogs. However, for UK users the service lacks an SMS interface. The following scripts provide a basic SMS tracker service, allowing a user to text an address and a message to an SMS service and see that location on a generated map. This simple application does not provide the social aspects of BrightKite, being confined to creating a simple track.

Implementation
Dependencies
eXist-db Modules

xmldb - to update the track
datetime - for date formatting
util - serialize to convert XML to CDATA

Other

an SMS two-way service
Google Geocoding service
KML-based mapping such as GoogleMap or GoogleEarth


The Track structure


Each track is represented as a single XML file, containing a unique name, a title, one or more mobile phone numbers and a list of events. Each event is time-stamped and contains the original address, its latitude and longitude when geo-coded and a message. A local namespace is used for the XML data and for associated functions. Full address geo-coding is not supported in the UK due to copyright restrictions on full addresses and postcodes.
<?xml version="1.0" encoding="UTF-8"?> <track xmlns="http://www.cems.uwe.ac.uk/exist/geo" > <name>wiki</name> <mobile>44771234578</mobile> <title>Demo Track</title> <entries> <entry date="2008-06-12T09:56:08.593Z"> <address>bristol parkway station</address> <location latitude="53.580320" longitude="-0.683640" ambiguous="true"/> <message>Waiting for the paddington train</message> </entry> <entry date="2008-06-12T10:30:51.454Z"> <address>swindon</address> <location latitude="51.558418" longitude="-1.781985"/> <message>Nice empty train</message> </entry> <entry date="2008-06-12T10:51:12.429Z"> <address>didcot parkway</address> <location latitude="51.610994" longitude="-1.242799"/> <message>Grey and its been raining</message> </entry> ...

In-bound messages
Inbound messages have the structure:

geo {address} ! {message}

for example: geo bristol parkway station ! Waiting for the paddington train

SMS messages are sent to the UWE SMS two-way service described here [2]. The router uses the first word to route the message to the associated service, in this case track2sms.xq. This service is invoked via HTTP, passing the prefix (prefix), the originating mobile number (from) and the text of the message (text) following the prefix. The script uses the originating mobile number to find the associated track. If there is one, the message is parsed into the address and message text. The address is passed to the Google geocoding service. If the address is recognised, a new event is created and appended to the rest of the events in the track, and a confirmation is returned to the originator (via the SMS two-way service).
declare namespace geo = "http://www.cems.uwe.ac.uk/exist/geo";
declare namespace kml = "http://earth.google.com/kml/2.0";

declare variable $geo:googleKey := "ABQIAAAAVehr0_0wqgw_UOdLv0TYtxSGVrvsBPWDlNZ2fWdNTHNT32FpbBR1ygnaHxJdv-8mkOaL2BJb4V_yOQ";
declare variable $geo:googleUrl := "http://maps.google.com/maps/geo?q=";

declare function geo:geocode($address as xs:string) as element(geo:location)* {
   let $address := normalize-space($address)
   let $address := encode-for-uri($address)
   let $url := concat($geo:googleUrl, $address, "&amp;output=xml&amp;key=", $geo:googleKey)
   let $response := doc($url)
   for $placemark in $response//kml:Placemark
   let $point := $placemark/kml:Point/kml:coordinates
   let $latlong := tokenize($point, ",")
   return <geo:location latitude="{$latlong[2]}" longitude="{$latlong[1]}"/>
};

declare variable $sep := "!";
declare variable $from := request:get-parameter("from", ());
declare variable $text := request:get-parameter("text", ());
declare variable $track := //geo:track[geo:mobile = $from];
declare variable $now := string(adjust-dateTime-to-timezone(current-dateTime()));

declare option exist:serialize "method=text media-type=text/text indent=yes";

if (exists($track))
then
   let $address :=
      if (contains($text, $sep))
      then normalize-space(substring-before($text, $sep))
      else normalize-space($text)
   let $message := substring-after($text, $sep)
   let $location := geo:geocode($address)
   return
      if (exists($location) and count($location) = 1)
      then
         let $update :=
            update insert
               <entry xmlns="http://www.cems.uwe.ac.uk/exist/geo" date="{$now}">
                  <address>{$address}</address>
                  {$location}
                  <message>{$message}</message>
               </entry>
            into $track/geo:entries
         return concat("Reply: ", $address, " is at lat: ", $location/@latitude, " long: ", $location/@longitude)
      else concat("Reply: ", $track/name, " address: ", $address, " not geocoded or ambiguous ", $text, ":", $message)
else ()

Generating the Map


The track is identified by name and a KML file of the events on the track is generated.
declare namespace geo = "http://www.cems.uwe.ac.uk/exist/geo";

declare namespace kml = "http://earth.google.com/kml/2.1" ; declare function geo:entry-to-kml($entry element(Placemark) { let $location := $entry/geo:location let $latlong := concat($location/@latitude," ",$location/@longitude) let $dt := datetime:format-dateTime($entry/@date,"yy/MM/dd HH:mm") let $popup := <div xmlns="http://www.w3.org/1999/xhtml"> <h3>{string($entry/geo:address)}</h3> <p> {string($entry/geo:message)} </p> </div> return <Placemark> <name>{$dt} &#160;{string($entry/geo:title)}</name> <description> {util:serialize($popup,"method=xhtml")} </description> <Point> <coordinates> {string-join(($location/@longitude,$location/@latitude),",")} </coordinates> </Point> </Placemark> }; as element(geo:entry)) as

declare option exist:serialize

"method=xml indent=yes

media-type=application/vnd.google-earth.kml+xml"; declare variable $name := request:get-parameter("name",());

declare variable $track := //geo:track[geo:name=$name];

SMS tracker
let $dummy := response:set-header('Content-Disposition',concat('inline;filename=',$name,'.kml;')) return <kml xmlns="http://earth.google.com/kml/2.1" <Folder> <name>{$name}</name> <title>{$track/geo:title}</title> { for $entry in $track//geo:entry return } </Folder> </kml> geo:entry-to-kml($entry) >

266

Example Map
Google Map [3] Note that one address has been miscoded but the feedback allowed the address to be changed and resent.

To do
1. edit track to remove or correct bad geo-coding
2. add events from a browser

References
[1] http://brightkite.com/
[2] http://en.wikibooks.org/wiki/XQuery/String_Analysis#SMS_service
[3] http://maps.google.co.uk/maps?q=http://www.cems.uwe.ac.uk/xmlwiki/Tracker/map.xq?name=wiki


Southampton Pubs
Pubs of Southampton
Data on Pubs in Southampton [1] has been collected by a couple of enthusiasts. John Goodwin created an RDF representation of this data [2] and an interface [3] to the data.

Conversion to KML
The RDF is straightforward to convert to KML. The RDF uses a number of namespaces, not all of which are used in this extract.

March 2009: This script was discovered to be broken. The base RDF file had been changed to add a new namespace for addresses [4] in place of the local pub namespace. Since there is of course no notification of such changes, the user of published RDF data sets is not in a much better position than the web scraper, unless the application is written to first check that the vocabulary assumed by the application is still used. However, there is no mechanism for expressing the mixture of vocabularies used in an RDF dataset. If there were, at least the interfaces could be checked by comparing a similar definition of the parts actually used in this application.

5 March 2008: Sadly John has been forced to take down this data set due to adverse reaction to one of the pub reviews.
declare namespace rdf= "http://www.w3.org/1999/02/22-rdf-syntax-ns#"; declare namespace rdfs= "http://www.w3.org/2000/01/rdf-schema#"; declare namespace pub= "http://www.johngoodwin.me.uk/pubs/"; declare namespace geo ="http://www.w3.org/2003/01/geo/wgs84_pos#"; declare namespace con ="http://www.w3.org/2000/10/swap/pim/contact#"; declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none"; let $x := response:set-header('Content-disposition','Content-disposition: inline;filename=sotonpubs.kml;') let $pubs := doc("http://www.johngoodwin.me.uk/pubs/models/pubs.rdf")/rdf:RDF return <Folder> {for $pub in $pubs/rdf:Description let $description := <div> <div style="color:gray">{concat($pub/con:address//con:street," ", $pub/con:address//con:postalCode)}</div> <div style="color:blue">{string($pub/pub:description)}</div> <hr/> <div style="font-size:10pt">{$pub/pub:dateSurveyed}</div> </div> return

Southampton Pubs
<Placemark> <name>{string($pub/rdfs:label) } </name> <description>{ <Point> <coordinates>{concat($pub/geo:long,",",$pub/geo:lat,",0")}</coordinates> </Point> </Placemark> } </Folder> util:serialize($description,"method=xhtml")}</description>

268

On GoogleMap [5]

References
[1] http://www.pubsinsouthampton.co.uk/
[2] http://www.johngoodwin.me.uk/pubs/models/pubs.rdf
[3] http://www.johngoodwin.me.uk/pubs/pubindex.html
[4] http://www.w3.org/2000/10/swap/pim/contact#
[5] http://maps.google.co.uk/maps?q=http:%2F%2Fwww.cems.uwe.ac.uk%2Fxmlwiki%2FRDF%2Fmappubs.xql

SPARQLing Country Calling Codes


Motivation
Stimulated by Henry Story's blog entry [1], the following script works on the same problem. This script uses the functions defined in a previous module to execute a SPARQL query on the dbpedia server and to convert SPARQL query results to tuples.

First attempt
import module namespace fr="http://www.cems.uwe.ac.uk/wiki/fr" "fr.xqm"; declare variable $query := " PREFIX : <http://dbpedia.org/resource/> PREFIX p: <http://dbpedia.org/property/> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT * WHERE { ?resource p:callingCode ?callingCode. } "; declare option exist:serialize "method=xhtml media-type=text/html"; <html> <head> <title>Country Calling codes</title> </head> at

SPARQLing Country Calling Codes <body> <h1>Country Calling codes</h1> <table border="1"> { for $country in fr:sparql-to-tuples(fr:execute-sparql($query)) let $name := fr:clean($country/resource) order by $name return <tr> <td><a href="{$country/resource}">{$name}</a></td> <td>{$country/callingCode}</td> </tr> } </table> </body> </html> Run [2] In this script the resource uri is parsed to get the local name part of the resource URI in the fr:clean() function. The more sound alternative is to filter the multilingual rdfs:label property: SELECT * WHERE { ?resource p:callingCode ?callingCode. ?resource rdfs:label ?name. FILTER (lang(?name) = 'en') } Run [3] but this query is naturally much slower.


Discussion
This query returns a set of dbpedia resources which have a callingCode property. However, it includes resources which are not countries and it proves quite difficult to identify which resources are countries. It might be expected that either the skos:subject or rdfs:type predicates would identify countries, but this is not the case. Of course, what entities are classified as countries is a debatable issue, as is currently illustrated by Kosova and by the documentation on ISO 3166. Perhaps countries are better identified by properties. There is a property countryCode which looks promising: The SPARQL query becomes: PREFIX : <http://dbpedia.org/resource/> PREFIX p: <http://dbpedia.org/property/> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT * WHERE { ?resource p:callingCode ?callingCode. ?resource p:countryCode ?countryCode. } Run [4]

However this shows that many countries have incomplete data in dbpedia, or that the coding of this property is inconsistent. This is not surprising because there are a number of types of country codes, which result in different definitions of country:

ISO 3166-1 alpha-3 [5]
ISO 3166-1 alpha-2 [6]
ISO 3166-1 numeric [7]
IOC country codes [8]
License plate numbers [9]
Top-level domain codes [10]


Wikipedia scraping
In fact, International Calling Codes are listed in a Wikipedia entry [11]. Thus a more direct approach would be to generate the table by scraping Wikipedia directly. However, now we err in the opposite direction, in that there are calling codes for telecom services as well as countries, and the format of numbers and names is inconsistent - some entries have multiple numbers, some numbers have a leading +, some countries have appended synonyms, etc. In this script, the path expression selects the second table with class "wikitable sortable".

declare namespace h = "http://www.w3.org/1999/xhtml";

let $url := "http://en.wikipedia.org/wiki/International_calling_codes"
let $wikipage := doc($url)
let $section := $wikipage//h:table[@class="wikitable sortable"][2]
return $section

Jan 2010 - the page layout had changed, so the previous path to this table:

let $section := $wikipage//h:a[@name="Alphabetical_Listing"]/../following-sibling::h:table[1]

had to be changed to the current:

let $section := $wikipage//h:table[@class="wikitable sortable"][2]

Wikipedia [12]

Export as RDF
An alternative is to export this table as RDF. Here the resource is the dbpedia resource and the property is defined in the dbpedia property namespace.

declare namespace h = "http://www.w3.org/1999/xhtml";
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace p = "http://dbpedia.org/property/";

let $url := "http://en.wikipedia.org/wiki/International_calling_codes"
let $wikipage := doc($url)
let $section := $wikipage//h:table[@class="wikitable sortable"][2]
return
<rdf:RDF xmlns:p="http://dbpedia.org/property/">
{
   for $row in $section/h:tr[h:td]
   let $country := string($row/h:td[1])
   let $code := string($row/h:td[2]/h:a[1])
   let $code := replace($code, "\*", "")
   let $resource := concat("http://dbpedia.org/resource/", replace($country, " ", "_"))
   return
      <rdf:Description rdf:about="{$resource}">
         <p:internationalcallingCode>{$code}</p:internationalcallingCode>
      </rdf:Description>
}
</rdf:RDF>

Similarly, the structure of this table changed, so this code needed to be updated.

RDF [13]


References
[1] http://blogs.sun.com/bblfish/entry/sparqling_calling_codes
[2] http://www.cems.uwe.ac.uk/xmlwiki/RDF/countryCodes.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/RDF/countryCodes2.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/RDF/countryCodes1.xq
[5] http://en.wikipedia.org/wiki/ISO_3166-1_alpha-3
[6] http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
[7] http://en.wikipedia.org/wiki/ISO_3166-1_numeric
[8] http://en.wikipedia.org/wiki/List_of_IOC_country_codes
[9] http://en.wikipedia.org/wiki/List_of_international_license_plate_codes
[10] http://en.wikipedia.org/wiki/Country_code_top-level_domain
[11] http://en.wikipedia.org/wiki/International_calling_code
[12] http://www.cems.uwe.ac.uk/xmlwiki/Scrape/wikicallingcodes.xq
[13] http://www.cems.uwe.ac.uk/xmlwiki/RDF/wikicallingcodesrdf.xq


Special Characters
Motivation
You want to control where you put newlines and quote characters in your output.

Method
We will create XQuery variables that refer to decimal-encoded character values using the &#NN; notation, where NN is the decimal number of the character in the character set. We can then insert these variables anywhere in the output stream.

Example Program
In this example we will create a variable $nl and have it refer to the newline character. We will then put this in the middle of a string.

xquery version "1.0";
let $nl := "&#10;"
let $quote := "&#34;"
let $string := concat("Hello", $nl, "World")
return $string

Returns:

Hello
World

The following shows how both quote and newline special characters can be created.

let $nl := "&#10;"
let $quote := "&#34;"
let $string := concat($quote, "Hello", $nl, "World", $quote)
return $string

Returns:

"Hello
World"

Note that the string length of these variables (string-length($nl) and string-length($quote)) is only one character.

Other Useful Escape Characters


let $open-curly := "&#123;"    (: for { :)
let $closed-curly := "&#125;"  (: for } :)
let $space := "&#32;"          (: space :)
let $tab := "&#9;"             (: tab :)
let $ampersand := "&#38;"      (: ampersand :)

Reference

For other characters see the following table: http://www.asciitable.com/


Splitting Files
Motivation
You have a single large XML document with many consistent records in it. You want to split it into many smaller documents so that each can be edited by a separate user. There are many good reasons to split large files up. Some have to do with how much data you want to load into an editor at a time or how you want to publish individual files to a remote site. eXist and many other systems do versioning and keep date/time stamps for each file. Using smaller files these functions may be easier to do.

Method
We will create an XQuery that will iterate through all the records in the document. For each record we will use the XQuery function to store a document in a collection. The format of this function is: xmldb:store($collection, $filename, $data) Where: $collection is a string that holds the path to the collection we will be storing the data for each record. For example '/db/test/data' $filename is the name of the file. The name can either be derived from the data or it can be generated by a sequence counter in the split query. For example 'Hello.xml" or "1.xml". $data is the data we will be storing into the file

Sample Input XML Document


<root> <row> <Term>Hi</Term> <Definition>An informal short greeting.</Definition> </row> <row> <Term>Hello</Term> <Definition>A more formal greeting.</Definition> </row> </root>

Sample XQuery
xquery version "1.0"; let $input-document := '/db/test/input.xml' let $collection := '/db/test/terms' (: the login used must have write access to the collection :) let $output-collection := xmldb:login($collection, 'my-login',

Splitting Files 'my-password') return <SplitResults>{ for $term-data in doc($input-document)/root/row (: For brevity we will create a file name with the term name. Change this to be an ID function if you want :) let $term-name := $term-data/Term/text() let $documentname := concat($term-name, '.xml') let $store-return := store($collection, $documentname, $term-data) return <store-result> <store>{$term-name}</store> <documentname>{$documentname}</documentname> </store-result> }</SplitResults>


Using A Sequence Counter for Artificial Keys


Sometimes a record contains no element that can be used as a unique key, or the available elements are not appropriate to use as a key. In this case you will want to use a counter to give each XML document a unique number. The sequence number generated is called an "artificial key", since it is not directly related to any data element in the record. You can achieve this by adding a positional variable to your for loop: just add "at $count" after the for variable, like the following:

for $term-data at $count in $input-file/row

The store function can then use the $count variable to create a file name with this number:

let $filename := concat($count, '.xml')

A complete sketch combining the counter with xmldb:store() is shown below.
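Putting these pieces together, a minimal sketch of a counter-based split might look like the following (the input document and collection paths are the same illustrative ones used above):

xquery version "1.0";

let $input-document := '/db/test/input.xml'
let $collection := '/db/test/terms'
return
<SplitResults>{
   for $term-data at $count in doc($input-document)/root/row
   (: the artificial key becomes the file name, e.g. 1.xml, 2.xml ... :)
   let $filename := concat($count, '.xml')
   return
      <store-result>
         <documentname>{xmldb:store($collection, $filename, $term-data)}</documentname>
      </store-result>
}</SplitResults>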

Adding a ID to each item using the XQuery update Operator


Once you have inserted the data into a collection you will then want to assign each item a unique ID. This is called an artificial key, since it is created by an artificial import process and is not related to the data inside the item. Artificial keys are usually assigned by the computer system that stores the data, not derived from the data.

<item>
   <person-name>John Doe</person-name>
   ...
</item>

You can automatically add an ID to each item by doing the following:

for $item at $count in $items
   update insert <id>{$count}</id> preceding $item/person-name

After this update the new ID element will be inserted before the person-name element:

<item>
   <id>47</id>
   <person-name>John Doe</person-name>
   ...
</item>

It is a best practice to make sure that items do not already have an ID element:

for $item at $count in $items[not(id)]
   update insert <id>{$count}</id> preceding $item/person-name

This prevents duplicate ids from being added if the script gets run twice. You can also modify this to start the count one higher than the largest id in a collection.

(: get the largest ID in the collection :)
let $largest-id := max( collection($my-collection)/*/id/text() )
let $offset := $largest-id + 1
for $item at $count in $items[not(id)]
   update insert <id>{$count + $offset}</id> preceding $item/person-name


References
The split pattern is documented on the Enterprise Integration Patterns [1] web site. Note that the pattern is called "Splitter" despite the fact that the name in the URL is "Sequencer". Also note that the size of the file you select to load into the client has a large impact on the way that concurrent edits are performed, and therefore on what data needs to be locked for editing. See XRX Locking Grain Design [2] for more information.

References
[1] http://www.eaipatterns.com/Sequencer.html
[2] http://www.oreillynet.com/xml/blog/2008/05/xrx_locking_grain_design.html


Subversion
Motivation
You want to be able to access a Subversion (SVN) repository, including checking out the repository's files directly into the eXist database and committing changed files back to the repository, using XQuery.

Method
A subversion XQuery module has been added to the bleeding edge development version of eXist 1.5. You can use it to query remote subversion servers, and even to check out a remote repository to store the repository's contents in the database. (If you do check out a repository, note that the subversion repository's files, including its many ".svn" files, will be stored directly in your database.) As of May 2011, the subversion module can perform most, but not all, common subversion functions.

Installation Steps
Building the subversion extension
As with all eXist extensions that are not enabled by default, you need to instruct eXist's build process to include the extension. You should first copy the file $EXIST_HOME/extensions/build.properties to a new file called $EXIST_HOME/extensions/local.build.properties. This local file will be used by the build process, but it will be ignored by your subversion client so that you don't accidentally commit it to the eXist repository. You should now locate the following line:

#SVN extension
include.feature.svn = false

Change false to true:

include.feature.svn = true

Save the local.build.properties file. With these changes you must now rebuild (i.e. recompile) eXist so that the subversion extension is included in eXist's jar files.

Enable the subversion module in conf.xml


To ensure the module is available when you start eXist, un-comment the following lines in your $EXIST_HOME/conf.xml file
<module uri="http://exist-db.org/xquery/versioning/svn" class="org.exist.versioning.svn.xquery.SVNModule" />

Save conf.xml. Now you can start eXist, and the subversion module will be ready for you to use. You can build the subversion function documentation at http://localhost:8080/exist/admin/admin.xql?panel=fundocs and then access it at http://localhost:8080/exist/functions/subversion. You should now be able to test the subversion XQuery functions. This should look very similar to the function listings on the eXist demo site here: http://demo.exist-db.org/exist/xquery/functions.xql [1]


Current Status
Subversion repositories can be accessed over HTTP and HTTPS, both anonymously and with username/password authentication.

The following functions have been tested to work:

subversion:checkout($repository-uri as xs:string, $database-path as xs:string) xs:long
subversion:checkout($repository-uri as xs:string, $database-path as xs:string, $login as xs:string, $password as xs:string) xs:long
subversion:get-latest-revision-number($repository-uri as xs:string, $login as xs:string, $password as xs:string) xs:long
subversion:info($database-path as xs:string) element()
subversion:list($repository-uri as xs:string) element()
subversion:log($repository-uri as xs:string, $login as xs:string, $password as xs:string, $start-revision as xs:integer?, $end-revision as xs:integer?) element()
subversion:status($database-path as xs:string) element()
subversion:update($database-path as xs:string) xs:long
subversion:update($database-path as xs:string, $login as xs:string, $password as xs:string) xs:long
subversion:add($database-path as xs:string) empty()

The following works in some cases but has buffer errors for some sizes of commits:

subversion:commit($database-path as xs:string, $message as xs:string?, $login as xs:string, $password as xs:string) xs:long

The following functions are not yet confirmed to work and are still being tested:

subversion:clean-up($database-path as xs:string) empty()
subversion:lock($database-path as xs:string, $message as xs:string?) empty()
subversion:revert($database-path as xs:string) empty()
subversion:unlock($database-path as xs:string) empty()

Examples
Querying a Remote Repository
subversion:get-latest-revision-number() The subversion:get-latest-revision-number() function queries the remote SVN repository, returning the latest revision number. For example:
xquery version "1.0"; import module namespace subversion = "http://exist-db.org/xquery/versioning/svn"; let $repository-uri := xs:anyURI('https://exist.svn.sourceforge.net/svnroot/exist/trunk/eXist/webapp/eXide') let $username := '' let $password := '' return subversion:get-latest-revision-number($repository-uri, $username, $password)


This query returns the following result: 14458 subversion:info() Once you have done a checkout of a resource from subversion you can query that resource locally and find out more about it. subversion:info('/db/apps/faqs/data') This will return: <info uri="/db/cms/apps/faqs/data"> <info local-path="/db/apps/faqs/data" URL="https://www.example.com/repo/trunk/db/apps/faq/data" Repository-UUID="db6794ef-7b42-44a9-8912-f63d0efeae0f" Revision="10" Node-Kind="dir" Schedule="normal" Last-Changed-Author="dmccreary" Last-Changed-Revision="8" Last-Changed-Date="Thu Sep 01 15:03:04 CDT 2011"/> subversion:list() The subversion:list() function lists the contents of a remote repository, returning the results as an XML node:
xquery version "1.0"; let $repository-uri := xs:anyURI('https://exist.svn.sourceforge.net/svnroot/exist/trunk/eXist/webapp/scripts/') return subversion:list($repository-uri)

This script will return the following result: <entries> <entry type="directory">edit_area</entry> <entry type="directory">jquery</entry> <entry type="directory">openid-selector</entry> <entry type="directory">syntax</entry> <entry type="directory">yui</entry> <entry type="file">fundocs.js</entry> <entry type="file">main.js</entry> <entry type="file">prototype.js</entry> </entries>

Subversion subversion:log() The subversion:log() function queries the remote SVN repository, returning the log of changes as an XML node. For example, this query will return show the log of changes between two arbitrary revision numbers (note that substituting empty nodes () for $start-revision and/or $end-revision will return a more open-ended log of revisions):
xquery version "1.0"; let $repository-uri := xs:anyURI('https://exist.svn.sourceforge.net/svnroot/exist/trunk/eXist/webapp/eXide') let $username := '' let $password := '' let $start-revision := 14300 let $end-revision := 14350 return subversion:log($repository-uri, $username, $password, $start-revision, $end-revision)


The results of this query are as follows (note that the @revtype values are 'A' for item added, 'D' for item deleted, 'M' for item modified, and 'R' for item replaced):
<log uri="https://exist.svn.sourceforge.net/svnroot/exist/trunk/eXist/webapp/eXide" start="14300"> <entry rev="14331" author="wolfgang_m" date="2011-04-29T07:00:54.297-04:00"> <message>[feature] eXide - a web-based XQuery IDE for eXist. Features: fast syntax highlighting, ability to edit huge XQuery files, code completion for functions and variables, code templates, powerful navigation, on-the-fly compilation, generation of app skeletons, integration with app repository... This is the initial checkin of eXide.</message> <paths> <path revtype="A">/trunk/eXist/webapp/eXide/templates</path> <path revtype="A">/trunk/eXist/webapp/eXide/collections.xql</path> <path revtype="A">/trunk/eXist/webapp/eXide/session.xql</path> ....etc.... <path revtype="A">/trunk/eXist/webapp/eXide/scripts/ace/cockpit.js</path> <path revtype="A">/trunk/eXist/webapp/eXide/index.html</path> </paths> </entry> <entry rev="14346" author="wolfgang_m" date="2011-04-30T08:35:23.395-04:00"> <message>[website] eXide: fixed completion popup window (support mouse, extra "close" link if popup looses focus); improved auto-indent in editor after { and (.</message> <paths> <path revtype="M">/trunk/eXist/webapp/eXide/src/mode-xquery.js</path> <path revtype="M">/trunk/eXist/webapp/eXide/src/util.js</path> <path revtype="M">/trunk/eXist/webapp/eXide/eXide.css</path>

</paths> </entry> </log>


Getting the Last 10 Commit Messages The log function can be combined with the get-latest-revision-number function to get the last 10 commit messages in the system. let $latest-version := subversion:get-latest-revision-number($repo-url, $svn-account, $svn-password) (: if we have more than 10 revisions then get them all, else start with one :) let $start := if ($latest-version gt 10) then $latest-version - 10 else 1 return <last-10-commit-messages> {subversion:log($repo-url, $svn-account, $svn-password, $start , $latest-version)//*:message} </last-10-commit-messages>

Operating on a Local Working Copy


subversion:checkout() The following example checks out eXist's "functions" app to the "/db/svn" collection:
xquery version "1.0";

let $repository-uri := xs:anyURI('https://exist.svn.sourceforge.net/svnroot/exist/trunk/eXist/webapp/functions/') let $destination-path := '/db/svn' let $version := subversion:checkout($repository-uri, $destination-path)

return concat('Revision ', $version, ' successfully checked out to collection ', $destination-path)

This returns: Revision 14457 successfully checked out to collection /db/svn The /db/svn collection will now contain the following files: .svn (collection) controller.xql filter.xql functions.xql

subversion:add()

After you have run a checkout you are ready to do a subversion:commit() or a subversion:add(). Both of these functions take a single argument, which is the database collection path you want to send to your subversion server.

subversion:update()

Assuming we have already checked out a repository to /db/svn, we can update the working copy to the latest revision using the subversion:update() function:

xquery version "1.0";
let $working-copy := '/db/svn'
let $update := subversion:update($working-copy)
return concat('Successfully updated to revision ', $update)


This script will return the following result: Successfully updated to revision 14457 You can also get updates from a secure site by using subversion:update($working-copy, $user, $password) subversion:status() The subversion:status() function returns the status of files in the local working copy. For example, assuming you have checked out the repository https:/ / exist. svn. sourceforge. net/ svnroot/ exist/ trunk/ eXist/ webapp/ functions/ to the /db/svn collection, you can get the status of its files with the following query: xquery version "1.0"; let $destination-path := '/db/svn' return subversion:status($destination-path)

The results will be:


<status> <entry status="normal" locked="false" working-revision="14490" last-changed-revision="13019" author="joewiz" path="/db/svn/controller.xql"/> <entry status="normal" locked="false" working-revision="14490" last-changed-revision="10350" author="wolfgang_m" path="/db/svn/filter.xql"/> <entry status="normal" locked="false" working-revision="14490" last-changed-revision="13019" author="joewiz" path="/db/svn/functions.xql"/> <entry status="normal" locked="false" working-revision="14490" last-changed-revision="13019" author="joewiz" path="/db/svn"/> </status>

References
[1] http://demo.exist-db.org/exist/xquery/functions.xql


Sudoku
Sudoku solver in XQuery

A Puzzle
A sudoku puzzle can be expressed in matrix form. Here is part of one from a Times book of sudokus. <?xml version="1.0" encoding="UTF-8"?> <sudoku name="Times 1 p1"> <matrix> <row> <col/> <col>6</col> <col>1</col> <col/> <col>3</col> <col/> <col/> <col>2</col> <col/> </row> <row> <col/> <col>5</col> <col/> <col/> <col/> <col>8</col> <col>1</col> <col/> <col>7</col> </row> <row> <col/>

The Main script


The main script is passed a URL referencing the problem XML file. The matrix format is converted to a sequence of cells, the puzzle solved, the resultant cell list converted back to a matrix and the matrix printed. The elapsed time of the solution search is computed and displayed after the initial problem and the solution.

import module namespace su = 'http://www.cems.uwe.ac.uk/wiki/sudoku' at 'sudoku4.xqm';

declare option exist:serialize 'method=xhtml media-type=text/html';

declare function local:duration-as-ms($t) {
   round((minutes-from-duration($t) * 60 + seconds-from-duration($t)) * 1000)
};

let $url := request:get-parameter('url', ())
let $sudoku := doc($url)/sudoku
let $p := $sudoku/matrix
let $pc := su:matrix-to-cells($p)
let $start := util:system-time()
let $ps := su:solve($pc)
let $finish := util:system-time()
let $elapsedms := local:duration-as-ms($finish - $start)
let $s := su:cells-to-matrix($ps)
return
<div>
   <h1>Solving Sudoku problem {string($sudoku/@name)}</h1>
   <table border='1'>
      <tr>
         <td>{su:matrix-to-table($p)}</td>
         <td>{su:matrix-to-table($s)}</td>
      </tr>
   </table>
   <p>Elapsed time in milliseconds : {$elapsedms}</p>
</div>

Functions
This module defines the necessary functions to support a brute force, depth-first search of the solution tree. Two representations of a sudoku puzzle are used here: nested columns within rows - element(matrix) - the input format list of cells with explicit row and column numbers - element(cells) The algorithm starts with the cell list representation. The number of possible solutions to every empty square is calculated. If there there is a cell with only one value, that cell is added to the list of cells and the algorithm continues. If there is more than one possible value for a cell, the algorithm iterates over the possible values, positing that each in turn is the correct value. If there is no possible value, that partial solution is infeasible and that solution path is abandoned, returning null and the next possible cell value will be tried. declare function su:matrix-to-table($s as element(matrix)) as element(table) { <table class="sudoku"> { for $r in $s/row return <tr> { for $c in $r/col return <td>{string($c)}</td> } </tr>

Sudoku } </table> }; declare function su:matrix-to-cells($s as element(matrix)) as element(cell)* { for $i in (1 to 9) for $j in (1 to 9) let $c := $s/row[$i]/col[$j] return if ($c/text()) then <cell row='{$i}' col='{$j}'>{string($c)}</cell> else () }; declare function su:cells-to-matrix($s as element(cell)*) as element(matrix) { <matrix> { for $i in (1 to 9) return <row> { for $j in (1 to 9) let $c := $s[@row = $i][@col = $j] return <col>{string($c)}</col> } </row> } </matrix> }; declare function su:block($s as element(cell)*, $i as xs:integer, $j as xs:integer ) as element(cell)+ { (: return the block of 9 cells containing $i, $j :) let $tci := (($i - 1) idiv 3 * 3 ) + 1 let $tcj := (($j - 1) idiv 3 * 3 ) + 1 return $s[@row = ($tci to $tci + 2)][@col = ($tcj to $tcj + 2)] }; declare function su:row($s as element(cell)*,$i as xs:integer) as element(cell)+ { (: return the cells in row $i :) $s[@row = $i] }; declare function su:col($s as element(cell)* ,$j as xs:integer) as element(cell)+{


Sudoku (: return the cells in column $j :) $s[@col = $j] }; declare function su:values($s as element(cell)*, $i as xs:integer, $j as xs:integer) as xs:integer* { (: return the set (sequence) of values in a cell's row, column and block :) distinct-values( (su:row($s,$i) ,su:col($s,$j) , su:block($s,$i,$j) )) }; declare function su:missing-values($s as element(cell)*,$i as xs:integer,$j as xs:integer) as xs:integer* { (: return the numbers missing from 1 to 9 i.e. the possible values for cell $i , $j :) let $vals := su:values($s,$i,$j) return (1 to 9) [not(. = $vals)] }; declare function su:missing-cells($s as element(cell)*) as element(cells)* { for $i in (1 to 9) for $j in (1 to 9) where empty($s[@row = $i][@col = $j]) return let $m := su:missing-values($s,$i,$j) return <cell row='{$i}' col='{$j}' n='{count($m)}'>{$m}</cell> }; declare function su:best-cell($s as element(cell)*) as element(cell)* { (: return (one of ) the cells with the minimum number of possible values :) let $empty := su:missing-cells($s) let $min := min( $empty/@n) return ($empty[@n = $min])[1] }; declare function su:search-for-solution($s as element(cell)*, $cell as element(cell), $posvalues as xs:string*) { (: recursive search of a set of possible values for a cell :) if (empty($posvalues)) then () else let $pos:= $posvalues[1] (: choose the first :)


Sudoku let $posit := <cell row='{$cell/@row}' col='{$cell/@col}'>{$pos}</cell> let $sol := su:solve(($s,$posit)) (: try with this posited value for the cell :) return if ($sol ) (: a solution :) then $sol else (: continue with the rest of the possible values :) su:search-for-solution($s, $cell, subsequence($posvalues,2)) }; declare function su:solve($s as element(cell)*) as element(cell)* { (: solve a sudoku problem - $s is a sequence of cells with values :) let $cell:= su:best-cell($s) return if (empty($cell) ) then $s (: solved :) else if ( $cell/@n=0) (: infeasible :) then () else if ($cell/@n = 1) (: forced move :) then su:solve(($s,$cell)) else (: multiple possible, so do depth-first search :) su:search-for-solution($s, $cell, tokenize($cell, ' ' )) };


Execution
With a few problems from the Times book of Sudoku problems: solve Puzzle 1 [1] solve Puzzle 2 [2] solve Puzzle 100 [3] - the last

Discussion
This code requires eXist 1.3 or above to run.

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Sudoku/su6.xql?url=http://www.cems.uwe.ac.uk/xmlwiki/Sudoku/tp1.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/Sudoku/su6.xql?url=http://www.cems.uwe.ac.uk/xmlwiki/Sudoku/tp2.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/Sudoku/su6.xql?url=http://www.cems.uwe.ac.uk/xmlwiki/Sudoku/tp100.xml


Synchronizing Remote Collections


Motivation
You want to update items on collections that are new or newer than another collection.

Method
Many databases store creation dates and last-modified dates along with resources. These dates can be used to see if a local collection is out of sync with a remote collection. An XQuery script can be written that lists only the new files, or the files that are newer than those in your local collection. For the eXist database, here are the two functions used to access the timestamps:

xmldb:last-modified($collection, $resource)
xmldb:created($collection, $resource)

Where:

$collection is the path to the collection (xs:string)
$resource is the name of the resource (xs:string)

For example:

let $my-file-last-modified := xmldb:last-modified('/db/test', 'myfile.xml')

will return the date and time that the file myfile.xml in the collection /db/test was last modified. The format of the timestamp is the XML Schema dateTime format [1]:

"2009-06-04T07:50:04.828-05:00"

For example, this indicates the time is 7:50 am on June the 4th, 2009 in US Central Time (daylight saving), which is 5 hours behind Coordinated Universal Time (UTC).

Sample Recursive Collection Last Modified Function


You can combine the xmldb:last-modified() function with another function, xmldb:get-child-collections($collection), which returns all of the child collections of the current collection. By calling itself recursively, a function can find all the last-modified dates within a collection and all its subcollections. Here is a sample XQuery function that returns a list of all the last-modified date-times of the resources in a collection and all of the subcollections under it.

declare function local:collection-last-modified($collection as xs:string) as node()* {
   <collection>
      {attribute {'cid'} {$collection}}
      {
      for $resource in xmldb:get-child-resources($collection)
      return
         <resource>
            {attribute {'id'} {$resource}}
            {attribute {'last-modified'} {xmldb:last-modified($collection, $resource)}}
         </resource>,
      if (exists(xmldb:get-child-collections($collection)))
      then (
         for $child in xmldb:get-child-collections($collection)
         order by $child
         return
            (: note the recursion here :)
            local:collection-last-modified(concat($collection, '/', $child))
      )
      else ()
      }
   </collection>
};

Note that two attributes are added to each resource. One is the resource id, which must be unique in each collection, and the other is the date the resource was last modified.


Sample Driver
You can call this function by simply passing the collection root you wish to start at. xquery version "1.0"; let $collection := '/db/test' return <last-modified-report> {local:collection-last-modified($collection)} </last-modified-report> This returns the following file:
<last-modified-report> <collection cid="/db/test"> <resource id="get-remote-collection.xq" last-modified="2009-04-29T08:16:06.104-05:00"/> <collection cid="/db/test/views"> <resource id="get-site-mod-dates.xq" last-modified="2009-04-30T09:01:58.599-05:00"/> <resource id="site-last-modified.xq" last-modified="2009-04-30T09:07:10.016-05:00"/> </collection> </collection> </last-modified-report>


Driving Syncs with Apache Ant


You can now use these transforms to create batch files that will transfer only the files that have changed or are new. Many databases provide Apache Ant tasks that have functions that extract and store operations. Here is a sample Apache ant target that does an extract on a local file and stores it on a remote file. <target name="push-bananas"> <xdb:extract xmlns:xdb="http://exist-db.org/ant" uri="xmldb:exist://${local-host}/exist/xmlrpc/db/test/sync" resource="bananas.xml" destfile="C:/backup/db/test/sync/bananas.xml" user="${local-user}" password="${local-pass}" /> <xdb:store xmlns:xdb="http://exist-db.org/ant" uri="xmldb:exist://${remote-host}/exist/xmlrpc/db/test/sync" srcfile="/backup/db/test/sync/bananas.xml" createcollection="true" user="${remote-user}" password="${remote-pass}" /> </target> Note that the the following properties must be set in this Ant file. <property name="local-host" value="localhost"/> <property name="local-user" value="admin"/> <property name="local-pass" value="put-local-pw-here"/> <property name="remote-host" value="example.com"/> <property name="remote-user" value="admin"/> <property name="remote-pass" value="put-remote-pw-here"/>

References
[1] http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#dateTime


TEI Concordance
Motivation
You want to build a multi-lingual concordance from parallel texts already in the TEI format (see http://www.tei-c.org/).

Architecture
There are three steps in this example:
1. preprocessing the texts to enable easier indexing, which is done in XSLT 2.0
2. querying the text to return a tei:entry (see http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DI.html), which is done in XQuery
3. processing the tei:entry into HTML, which is done in XSLT 1.0 in the browser

In this particular example the languages in use are English and te reo Māori. It assumes that structural tags have 'n' attributes with URLs pointing to the original source of the data.

Preprocessing the text


This stylesheet splits the text into words (tei:w, see http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-w.html) and records on each a 'lemma' or normalised form of the word and the language it's in. These are re-calculated to allow indexes to be built of them.
<?xml version="1.0"?> <xsl:stylesheet version="2.0" xmlns="http://www.tei-c.org/ns/1.0" xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output indent="yes"/>

<!-- This is a simple stylesheet that inserts word tags around words (and implicitly defines what those words are) -->

<xsl:variable name="lowernormal" select="'qwertyuiopasdfghjklzxcvbnmaeiouaeiou'"/> <xsl:variable name="upper" select="'QWERTYUIOPASDFGHJKLZXCVBNM'"/>

<xsl:variable name="drop" select="'{}()*'"/> <xsl:variable name="punctuation" select="'.:;,!?'"/> <xsl:variable name="regexp" select="('.:;,!?')*()"/>

<xsl:template match="@*|node()" priority="-1"> <xsl:copy> <xsl:apply-templates select="@*|node()"/> </xsl:copy> </xsl:template>

<xsl:template match="text()[normalize-space()]"> <xsl:variable name='orig' select="."/>


<xsl:variable name='lang' select="$orig/ancestor::*[normalize-space(@xml:lang)][1]/@xml:lang"/>

<xsl:analyze-string select="." regex="[\p{{L}}\p{{N}}]+"> <xsl:matching-substring>

<xsl:variable name="normalised"> <xsl:call-template name="normal"> <xsl:with-param name="string" select="translate(.,$upper,$lowernormal)"/> </xsl:call-template> </xsl:variable>

<xsl:element name="w" namespace="http://www.tei-c.org/ns/1.0"> <xsl:attribute name="xml:lang"><xsl:value-of select="$lang"/></xsl:attribute> <xsl:attribute name="lemma"><xsl:value-of select="$normalised"/></xsl:attribute> <xsl:value-of select="."/> </xsl:element>

</xsl:matching-substring> <xsl:non-matching-substring> <xsl:value-of select="."/> </xsl:non-matching-substring> </xsl:analyze-string> </xsl:template>

<xsl:template name="normal"> <xsl:param name="string"/>

<xsl:if test="string-length($string) &gt; 0"> <xsl:if test="not(compare(substring($string,1,1),substring($string,2,1))=0)"> <xsl:value-of select="substring($string,1,1)"/> </xsl:if> <xsl:call-template name="normal"> <xsl:with-param name="string" select="substring($string,2)"/> </xsl:call-template> </xsl:if> </xsl:template>

</xsl:stylesheet>

Querying the text


The query builds a single <tei:entry> tag containing multiple <tei:cit>, one for each hit. A processing instruction is used to associate the TEI with a stylesheet.
xquery version "1.0"; declare default element namespace "http://www.tei-c.org/ns/1.0"; declare option exist:serialize "method=xml media-type=application/xml process-xsl-pi=yes indent=yes";

TEI Concordance

292

let $target := 'xml-stylesheet', $content := 'href="teiresults2htmlresults.xsl" type="text/xsl" ' return <TEI> <teiHeader> <!-- substantial header information needs to go here to be well formed TEI --> </teiHeader> <text> <body> <div> { let $collection := '/db/kupu/korero', $q := request:get-parameter('kupu', 'mohio'), $lang := request:get-parameter('reo', 'mi'), $first := request:get-parameter('kotahi', 1) cast as xs:decimal, $last := 25 + $first return <entry xml:lang="{$lang}" n="{$last}"> <form> <orth>{$q}</orth> </form>{ for $word at $count in subsequence(collection($collection)//w[@lemma=$q][@xml:lang=$lang], $first, $last) let $this := $word/ancestor::*[@n][1] let $thisid := $this/@xml:id let $url := $this/@n let $lang := $word/@xml:lang let $that := if ( $this/@corresp ) then ( $this/../../*/*[concat('#',@xml:id)=$this/@corresp] ) else ( "no corresp" ) return <cit n="{$url}" corresp="#{$word/@xml:id}"> {$this} {$that} </cit> }</entry> } </div> processing-instruction {$target} {$content},

document {

TEI Concordance
</body> </text> </TEI> }


Transformation to HTML
The TEI is transformed into HTML in the browser following the processing instruction:
<?xml version="1.0"?>

<xsl:stylesheet version="1.0"

xmlns:html="http://www.w3.org/1999/xhtml"

xmlns:tei="http://www.tei-c.org/ns/1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output indent="yes"/>

<xsl:variable name="title"><xsl:value-of select="//tei:orth/text()"/></xsl:variable>

<xsl:variable name="lang"><xsl:value-of select="//tei:entry/@xml:lang"/></xsl:variable>

<xsl:template match="@*|node()" priority="-1">

<xsl:copy>

<xsl:apply-templates select="@*|node()"/>

</xsl:copy>

</xsl:template>

<xsl:template match="/">

<html:html xml:lang="{$lang}" >

<html:head>

<html:title><xsl:value-of select="$title"/></html:title>

<html:meta property="dc:title" xml:lang="{$lang}" content="{$title}"/>

</html:head>

<html:body xml:lang="{$lang}" >

<xsl:apply-templates select="/tei:TEI/tei:text/tei:body/tei:div/tei:entry"/>

</html:body>

</html:html>

</xsl:template>

<xsl:template match="@xml:id" />

<xsl:template match="tei:entry">

<html:h2 xml:lang="mi">He Kupu Tawhito</html:h2>

<html:h1 xml:lang="mi">Kupu matua: <html:span class="hit-word" style="font-style: italic" xml:lang="{$lang}"><xsl:value-of select="$title"/></html:span></html:h1>

<html:div>

<xsl:apply-templates select="tei:cit"/>

</html:div>

<xsl:variable name="url"><xsl:value-of select="concat('kupu.xql?reo=', @xml:lang, '&amp;kupu=', tei:form/tei:orth/text(), '&amp;kotahi=', @n)"/></xsl:variable>

<html:div> <html:p> <html:a href="{$url}" style="font-style: italic">Panuku</html:a> </html:p> </html:div>

</xsl:template>


<xsl:template match="tei:cit" >

<html:div>

<xsl:apply-templates select="node()"/>

</html:div>

<html:hr/>

</xsl:template>

<xsl:template match="tei:p">

<html:div>

<xsl:apply-templates select="node()"/>

<html:a href="{@n}" alt="ko te tohutoro"

style="font-style: italic"></html:a>

</html:div>

</xsl:template>

<xsl:template match="tei:w">

<xsl:variable name="url"><xsl:value-of select="concat('kupu.xql?reo=', @xml:lang, '&amp;kupu=', @lemma)"/></xsl:variable>

<xsl:choose>

<xsl:when test="concat('#',@xml:id)=../../@corresp">

<html:span class="hit-word" style="font-style: italic"><html:a href="{$url}" alt="">

<xsl:apply-templates select="node()"/>

</html:a></html:span>

</xsl:when>

<xsl:otherwise>

<html:a href="{$url}">

<xsl:apply-templates select="node()"/>

</html:a>

</xsl:otherwise>

</xsl:choose>

</xsl:template>

</xsl:stylesheet>


TEI Document Timeline


Motivation
You want to create a timeline of the dates with a single TEI document.

Approach
TEI [1] documents may include date elements in any of the sections of the document - in the meta-data, in the document publication details, in front and back matter as well as in the body of the text. Let's assume that we want a time line showing dates in the text body. We will use the Simile Timeline [2] Javascript API to create a browsable timeline in an HTML page.

Extracting timeline dates


TEI documents store dates in the date element in the following format: <date when="1861-03-16">March 16</date> or <date when="1861">1861</date> We will write an XQuery script that will extract all of the date elements in the body of a TEI document and generate a Simile Timeline.

Getting the dates


Dates are used throughout the sections of a TEI document, but we are most likely to be interested in dates in the body of the text.

let $dates := doc($tei-document)//tei:body//tei:date

For example:

<date when="1642-01">January 1642</date>
<date when="1616">1616</date>
<date when="1642">1642</date>
<date when="1642-08-13">13 August 1642</date>
<date when="1643-05">May</date>
<date when="1643-07">July 1643</date>

Transforming to Simile events


We can then transform this sequence of date elements into the format that is needed by Simile. [3] <data>{ for $date in $dates return <event start='{$date/@when}' > {$date/text()} </event> }</data>

TEI Document Timeline Note that there are two path expressions in the above query. The first expression $date/@when extracts the when attribute of the date element. The second path expression $date/text() extracts the body text of the date element, i.e. the text between the begin and end date tags: <date when="1642-08-13">13 August 1642</date>


Sample XQuery to Extract Dates from TEI File


xquery version "1.0"; declare namespace tei = "http://www.tei-c.org/ns/1.0"; (: get the file name from the URL parameter :) let $file := request:get-parameter('file', '') (: this is where we will get our TEI documents :) let $data-collection := '/db/Wiki/TEI/docs' (: open the document :) let $tei-document := concat($data-collection, '/', $file) (: get all dates in the body of the document :) let $dates := doc($tei-document)//tei:body//tei:date return <data>{ for $date in $dates return <event start='{$date/@when}'> {$date/text()} </event> }</data> For example, here are the dates in the TEI document "The Discovery of New Zealand" by J. C. Beaglehole, produced by the New Zealand Electronic Text Centre [4] Execute [5]

Discussion
TEI dates are generally XML dates, which are recognised by the Simile timeline API. However, TEI also supports the encoding of partial and relative dates such as <date when="--01-01">New Year's Day</date>, so dates really need filtering using a suitable regular expression. One option is to check the date format with the XQuery "castable as" expression, as in the sketch below.
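A minimal sketch of such a filter, reusing the $tei-document variable and tei namespace from the script above, keeps only @when values that are full dates, year-months or years, and so drops relative values like "--01-01":

let $dates := doc($tei-document)//tei:body//tei:date
return
<data>{
   for $date in $dates
   let $when := string($date/@when)
   (: accept xs:date, xs:gYearMonth or xs:gYear values; drop anything else :)
   where $when castable as xs:date
         or $when castable as xs:gYearMonth
         or $when castable as xs:gYear
   return <event start="{$when}">{$date/text()}</event>
}</data>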


Creating the Timeline bubbles


Providing Context
We can enhance the timeline by providing some context for the date in the timeline bubble. One approach is to include some of the preceding and following text. Each date node is part of a parent node, e.g. <date when="1777-02-12">12 February 1777</date> is a child node in
<p>Cook left Queen Charlotte's Sound for the fourth time on <date when="1774-11-10">10 November</date>. He returned for a fifth visit on <date when="1777-02-12">12 February 1777</date> and remained a fortnight; but this last voyage contributed nothing to the discovery of New Zealand. The discoverer was bound for the northern hemisphere, and for his death.</p>

We need to access the mixture of elements and text nodes on either side of the target date. For example, preceding this node are a text node ("Cook left.."), a date node and another text node ("He returned .."). Following the target date is the text node ("and remained ..."). We can select these nodes using the preceding-sibling and following-sibling axes: let $nodesbefore := $date/preceding-sibling::node() let $nodesafter := $date/following-sibling::node()

A crude approach to construct a context string is to join the node strings and extract a suitable substring. The text after: let $after := string-join($nodesafter, ' ') let $afterString := substring($after,1,100)

and the text before: let $before := string-join($nodesbefore,' ') let $beforeString := substring($before,string-length($before)101,100)

We can then create an XML fragment with the target date in bold: let $context := <div> {concat('...', $beforeString,' ')} <b>{$date/text()}</b> {concat($afterString,' ...')} </div> Finally the element needs to be serialized and added to the event:

TEI Document Timeline return <event start='{$when}' title='{$when}' > {util:serialize($context,("method=xhtml","media-type=text/html"))} </event> Execute [6]


Improved Context
The context is extracted from the parent node without regard to word or sentence boundaries. Splitting on word boundaries would be better. let $nodesafter := $date/following-sibling::node() (: join the nodes, then split on space :) let $after := tokenize(string-join($nodesafter, ' '),' ') (: get the first $scope words :) let $afterwords := subsequence($after,1,$scope) (: join the subsequence of words, and suffix with ellipsis if the paragraph text has been truncated :) let $afterString := concat (' ',string-join($afterwords,' '),if (count($after) > $scope) then '... ' else '') Similarly, the text before the target date: let $nodesbefore := $date/preceding-sibling::node() let $before := tokenize(string-join($nodesbefore,' '),' ') let $beforewords := subsequence($before,count($before) - $scope + 1,$scope) let $beforeString := concat (if (count($before) > $scope) then '... ' else '',string-join($beforewords,' '),' ') Splitting on sentence boundaries would be even better. We can use the pattern '\. ' as the marker. This may not be entirely accurate but false positives will merely shorten the context. The ellipsis is not now needed. $scope now is the number of sentences on either side. let $nodesafter := $date/following-sibling::node() (: join the nodes, then split on the pattern fullstop space :) let $after := tokenize(string-join($nodesafter, ' '),'\. ') (: get the first $scope sentences :) let $afterSentences := subsequence($after,1,$scope) (: join the subsequence of sentences :) let $afterString := concat (' ',string-join($afterSentences,'. '))

Similarly for the beforeString. let $nodesbefore := $date/preceding-sibling::node() let $before := tokenize(string-join($nodesbefore,' '),'\. ')

TEI Document Timeline let $beforeSentences := subsequence($before,count($before) - $scope + 1,$scope) let $beforeString := concat (string-join($beforeSentences,'. '),'. ')


Execute [7]

Discussion
In addition, each event could link into the full text of the document. (to do)

Generating an HTML page


Since the event stream is parameterised by the source document, the HTML page containing the timeline also needs to be parameterised, so we will generate it using another XQuery script.

Simile API
The definition of the timeline layout uses the SIMILE timeline Javascript API. To define the basic bands: function onLoad(file,start) { var theme = Timeline.ClassicTheme.create(); theme.event.label.width = 400; // px theme.event.bubble.width = 300; theme.event.bubble.height = 300; var eventSource = new Timeline.DefaultEventSource(); var bandInfo = [ Timeline.createBandInfo({ eventSource: eventSource, theme: theme, trackGap: 0.2, trackHeight: 1, date: start, width: "90%", intervalUnit: Timeline.DateTime.YEAR, intervalPixels: 45 }), Timeline.createBandInfo({ date: start, width: "10%", intervalUnit: Timeline.DateTime.DECADE, intervalPixels: 50 }) ]; bandInfo[1].syncWith = 0; bandInfo[1].highlight = true;


Timeline.create(document.getElementById("my-timeline"), bandInfo); Timeline.loadXML("dates.xq?file="+file, function(xml, url) { eventSource.loadXML(xml, url); }); } Note that the bands are set for YEAR and DECADE which are appropriate for historical texts. The function has two parameters: the source file and the start year. The events are generated by a call to the transformation script in the previous section.
Timeline.loadXML("dates.xq?file="+file, function(xml, url) { eventSource.loadXML(xml, url); });

Setting the Start date


The start date is the earliest date in the sequence of dates. We can find this by ordering the dates using the order by clause and then selecting the first item in the sequence. let $orderedDates := for $date in $doc//tei:body//tei:date/@when order by $date return $date let $start := $orderedDates[1]

We can retrieve the Document title and author

Full script
xquery version "1.0";

declare namespace tei = "http://www.tei-c.org/ns/1.0"; declare option exist:serialize "method=xhtml media-type=text/html";

let $file:= request:get-parameter('file','') let $data-collection := '/db/Wiki/TEI/docs' let $tei-document := concat($data-collection, '/', $file) let $doc := doc($tei-document) (: get the title and author from the titleStmt element :) let $header := $doc//tei:titleStmt (: there may be several titles, differentiated by the type property just take the first :) let $doc-title := string(($header/tei:title)[1])

let $doc-author := string(($header/tei:author/tei:name)[1])

(: get the start date :) let $orderedDates := for $date in $doc//tei:body//tei:date/@when order by $date return $date let $start := $orderedDates[1]


return <html> <head> <title>TimeLine: {$doc-title}</title> <script src="http://simile.mit.edu/timeline/api/timeline-api.js" type="text/javascript"></script> <script <![CDATA[ function onLoad(file,start) { var theme = Timeline.ClassicTheme.create(); theme.event.label.width = 400; // px theme.event.bubble.width = 300; theme.event.bubble.height = 300; type="text/javascript">

var eventSource = new Timeline.DefaultEventSource();

var bandInfo = [ Timeline.createBandInfo({ eventSource: theme: trackGap: trackHeight: date: width: intervalUnit: eventSource, theme, 0.2, 1, start, "90%", Timeline.DateTime.YEAR,

intervalPixels: 45 }), Timeline.createBandInfo({ date: width: intervalUnit: start, "10%", Timeline.DateTime.DECADE,

intervalPixels: 50 })

]; bandInfo[1].syncWith = 0; bandInfo[1].highlight = true;

Timeline.create(document.getElementById("my-timeline"), bandInfo); Timeline.loadXML("dates.xq?file="+file, function(xml, url) { eventSource.loadXML(xml, url); });

} ]]> </script> </head> <body onload="onLoad('{$file}','{$start}');">



   <h1>Timeline of <em>{$doc-title}</em> by {$doc-author}</h1>
   <div id="my-timeline" style="height: 700px; border: 1px solid #aaa"></div>
</body>
</html>


Examples
Beaglehole Timeline [8] Buck [9] Dates in this encoding are confined to the Bibliography and are publication rather than subject events.

Discussion
Simile Timeline has a problem displaying many events on closely related dates, so not all events may appear on the timeline.

References
[1] http://www.tei-c.org/
[2] http://www.simile-widgets.org/timeline/
[3] http://simile.mit.edu/wiki/How_to_Create_Event_Source_Files
[4] http://www.nzetc.org/
[5] http://www.cems.uwe.ac.uk/xmlwiki/TEI/dates-ex2.xq?file=BeaDisc.xml
[6] http://www.cems.uwe.ac.uk/xmlwiki/TEI/dates.xq?file=BeaDisc.xml
[7] http://www.cems.uwe.ac.uk/xmlwiki/TEI/dates3.xq?file=BeaDisc.xml
[8] http://www.cems.uwe.ac.uk/xmlwiki/TEI/timeline2.xq?file=BeaDisc.xml
[9] http://www.cems.uwe.ac.uk/xmlwiki/TEI/timeline2.xq?file=BucExpl.xml

The Emp-Dept case study


The Employee-Department case study used in the XQuery from SQL comparison is used here to explore the XQuery/SPARQL pairing.

Conversion to RDF
Taking as the starting point the XML documents defining the three tables:

Emp [1]
Dept [2]
Salgrade [3]

These documents are converted to RDF using an XQuery script guided by a mapping file. The generated RDF is cached and accessed by an XQuery script to de-reference the resource URIs. Individual resource URIs are re-written in Apache to calls on an XQuery script which retrieves the fragment of RDF from the cached file. Thus:

an employee [4]
a department [5]
The full RDF [6] (need to change the rewrite rule to fix this strange URI) - this should be replaced by a query on the SPARQL endpoint.
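The mapping file and conversion script are not reproduced here. As a very rough illustration of the kind of transformation involved, the sketch below turns each employee row into an rdf:Description. The property namespace, the property names (p:ename, p:job, p:deptno) and the element names assumed for emp.xml are all hypothetical choices for this sketch, not the vocabulary actually used in the cached RDF; only the resource URI pattern follows the examples above.

declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
(: hypothetical property namespace used only in this sketch :)
declare namespace p = "http://www.cems.uwe.ac.uk/empdept/property/";

let $emps := doc('http://www.cems.uwe.ac.uk/xmlwiki/empdept/emp.xml')//Emp
return
<rdf:RDF xmlns:p="http://www.cems.uwe.ac.uk/empdept/property/">
{
   for $emp in $emps
   return
      <rdf:Description rdf:about="{concat('http://www.cems.uwe.ac.uk/empdept/emp/', $emp/EmpNo)}">
         <p:ename>{string($emp/Ename)}</p:ename>
         <p:job>{string($emp/Job)}</p:job>
         <p:deptno rdf:resource="{concat('http://www.cems.uwe.ac.uk/empdept/dept/', $emp/DeptNo)}"/>
      </rdf:Description>
}
</rdf:RDF>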


RDF browsing
This RDF can be browsed with an RDF browser such as Disco [7], the OpenLink RDF browser [8], the OpenLink Data Explorer [9], or Tabulator as an add-in to Opera [10] or Firefox [11].

Querying with SPARQL


The RDF can be queried with SPARQL. The same queries used in the SQL/XQuery comparison are expressed in SPARQL in this tutorial.
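The tutorial itself is not included on this page, but to give the flavour, a query of the kind discussed, listing each employee with the name of their department, might look like the sketch below when run through the same fr helper functions used earlier. The property names (p:ename, p:deptno, p:dname) follow the hypothetical vocabulary of the conversion sketch above, and the endpoint exposing the generated RDF is assumed, so treat this as an illustration rather than a working query against the cached data.

import module namespace fr = "http://www.cems.uwe.ac.uk/wiki/fr" at "fr.xqm";

declare variable $query := "
PREFIX p: <http://www.cems.uwe.ac.uk/empdept/property/>
SELECT ?ename ?dname WHERE {
   ?emp  p:ename  ?ename.
   ?emp  p:deptno ?dept.
   ?dept p:dname  ?dname.
}
";

(: list employees grouped by department name :)
for $tuple in fr:sparql-to-tuples(fr:execute-sparql($query))
order by $tuple/dname, $tuple/ename
return <employee dept="{$tuple/dname}">{string($tuple/ename)}</employee>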

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/empdept/emp.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/empdept/dept.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/empdept/salgrade.xml
[4] http://www.cems.uwe.ac.uk/empdept/emp/7369
[5] http://www.cems.uwe.ac.uk/empdept/dept/30
[6] http://www.cems.uwe.ac.uk/empdept/all/all
[7] http://www4.wiwiss.fu-berlin.de/rdf_browser/
[8] http://demo.openlinksw.com/DAV/JS/rdfbrowser/index.html
[9] http://demo.openlinksw.com/rdfbrowser2/
[10] http://widgets.opera.com/widget/5053/
[11] http://dig.csail.mit.edu/2007/tab/

Time Based Queries


Motivation
You want to find items in a collection based on date-time information.

Method
By default, XML files use standard ISO dateTime structures to store temporal information. There are two main XML data types related to storing time: xs:date - for storing just the date in YYYY-MM-DD format. xs:dateTime - for storing both the date and time. There are many other structures for storing just year, month, day and time etc. but this example will only cover dates and dateTimes.

Sample Event Structure


In the following example we will store "event" data using a simple date structures. Events will have just a single start date or a date range with a start date and end date: <event> <id>6</id> <name>Architecture Tradeoff Analysis</name> <start-date>2011-04-07</start-date> <end-date>2011-04-21</end-date> </event> You can find all events that are occurring during any specific point in time with the following xquery structure.

Source from events-at-time.xq:

(: get a URL parameter to this XQuery :)
let $date := xs:date(request:get-parameter('date', ''))

(: create a sequence of all events :)
let $events := collection('/db/apps/timelines/data')//event

(: return all events that start before the date AND end after the date :)
return
   for $event in $events[xs:date(./start-date/text()) lt $date and xs:date(./end-date/text()) gt $date]
   return $event


You can also set up very fast searches based on these date-time structures, even for large collections of 100,000 items, once you learn how to configure range indexes on xs:dateTime values. See http://www.w3.org/TR/xmlschema-2/#dateTime for how date-time structures work.

References
The following page has instructions on indexing: http://demo.exist-db.org/exist/indexing.xml Make sure to read section 2.2 on range indexes. I would also use the xs:date and xs:dateTime structures in your range indexes. Your collection configuration file (see http://demo.exist-db.org/exist/indexing.xml#idxconf) might have the following lines if you are tracking document creation and modification dateTimes:

<create qname="start-date" type="xs:date"/>
<create qname="end-date" type="xs:date"/>
<create qname="created-dateTime" type="xs:dateTime"/>
<create qname="last-modified-dateTime" type="xs:dateTime"/>

Note that all eXist collections and resources also have both these dates in their metadata. You can use the xmldb module to get these timestamps:

http://demo.exist-db.org/exist/functions/xmldb/created
http://demo.exist-db.org/exist/functions/xmldb/last-modified
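As a small illustration of querying those collection timestamps, the sketch below lists the resources in a collection that were modified in the last seven days; the collection path is a placeholder.

xquery version "1.0";

let $collection := '/db/apps/timelines/data'
let $cutoff := current-dateTime() - xs:dayTimeDuration('P7D')
for $resource in xmldb:get-child-resources($collection)
(: keep only resources whose last-modified timestamp is after the cutoff :)
where xmldb:last-modified($collection, $resource) gt $cutoff
order by xmldb:last-modified($collection, $resource) descending
return
   <resource name="{$resource}"
             last-modified="{xmldb:last-modified($collection, $resource)}"/>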


Other Resources
Timeline mashups in this Wikibook
Gantt Chart using JQuery [1]
Timeline SVG reports
Gantt Charts from XML data using Anychart [2]

References
[1] http://plugins.jquery.com/project/ganttView
[2] http://anychart.com/products/anygantt/gallery/

Time Comparison with XQuery


Motivation
You have two identical lists of items with timestamps. You want to compare the items to see what items are newer.

Method
We will write a function that compares the timestamps of the items in two lists.

Sample Data Sets


let $list1 := <list> <item dateTime="2009-06-01T11:59:00.000-05:00">apples</item> <item dateTime="2009-02-01T11:59:00.000-05:00">bananas</item> <item dateTime="2009-02-01T11:59:00.000-05:00">carrots</item> <item dateTime="2009-02-01T11:59:00.000-05:00">eggplant</item> <item dateTime="2009-02-01T11:59:00.000-05:00">grapes</item> <item dateTime="2009-02-01T11:59:00.000-05:00">oranges</item> </list> let $list2 := <list> <item dateTime="2009-01-01T11:59:00.000-05:00">apples</item> <item dateTime="2009-02-01T11:59:00.000-05:00">bananas</item> <item dateTime="2009-03-01T11:59:00.000-05:00">carrots</item> <item dateTime="2009-02-01T11:58:00.000-05:00">eggplant</item> <item dateTime="2009-02-01T12:00:00.000-05:00">grapes</item> <item dateTime="2009-04-01T11:59:00.000-05:00">oranges</item> </list>


Sample XQuery Function


declare function local:older($list1 as node()*, $list2 as node()*) as node()* {
   for $item1 in $list1/item
   let $item2 := $list2/item[./text() = $item1/text()]
   return
      <div>
         {attribute {'class'} {
            if (xs:dateTime($item1/@dateTime) lt xs:dateTime($item2/@dateTime))
            then 'older'
            else 'newer'
         }}
         {$item1/text()}
      </div>
};

Comparison Screen Image

Sample Test Driver


<html>
   <head>
      <style language="text/css">
         <![CDATA[
         body {font-family: Arial,Helvetica,sans-serif; font-size: medium;}
         h2 {padding: 3px; margin: 0px; text-align: center; font-size: large; background-color: silver;}
         .left, .right {border: solid black 1px; padding: 5px;}
         .older {background-color: pink;}
         .left {float: left; width: 390px}
         .right {margin-left: 410px; width: 390px}
         ]]>
      </style>
   </head>
   <body>
      <h1>Older Items on Second List Report</h1>
      <div class="left">
         <h2>List 1</h2>
         {for $item in $list1/item
          return <div>{$item/text()} dateTime={string($item/@dateTime)}</div>}
      </div>
      <div class="right">
         <h2>List 2</h2>
         {for $item in $list2/item
          return <div>{$item/text()} dateTime={string($item/@dateTime)}</div>}
      </div>
      <br/>
      <p>The pink items are older items.</p>
      <div class="left">
         <h2>Items on 2 Older Than 1</h2>
         {local:older($list1, $list2)}
      </div>
      <div class="right">
         <h2>Items on 1 Older Than 2</h2>
         {local:older($list2, $list1)}
      </div>
   </body>
</html>

Execute [1]


Collating
Alternatively, two ordered lists can be collated to derive a set of updates. Here each item is wrapped in a div which carries the added information about the merge: items in list 1 but not list 2 are flagged as additions, items in list 2 but not list 1 as deletions, and items which are newer in list 1 than in list 2 as newer.
declare function local:merge($a as node()*, $b as node()*) as node()* {
   if (empty($a) and empty($b))
   then ()
   else if (empty($b) or $a[1] lt $b[1])
   then (<div class="add">{$a[1]}</div>, local:merge(subsequence($a, 2), $b))
   else if (empty($a) or $a[1] gt $b[1])
   then (<div class="delete">{$b[1]}</div>, local:merge($a, subsequence($b, 2)))
   else (<div class="{if (xs:dateTime($a[1]/@dateTime) gt xs:dateTime($b[1]/@dateTime)) then "newer" else "older"}">
            {$a[1]}
         </div>,
         local:merge(subsequence($a, 2), subsequence($b, 2))
        )
};

The sample data and main script are changed slightly:

declare option exist:serialize "method=xhtml media-type=text/html";

let $list1 :=
   <list>
      <item dateTime="2009-06-01T11:59:00.000-05:00">apples</item>
      <item dateTime="2009-02-01T11:59:00.000-05:00">bananas</item>
      <item dateTime="2009-02-01T11:59:00.000-05:00">carrots</item>
      <item dateTime="2009-02-01T11:59:00.000-05:00">cabbage</item>
      <item dateTime="2009-02-01T11:59:00.000-05:00">eggplant</item>
      <item dateTime="2009-02-01T11:59:00.000-05:00">grapes</item>
   </list>
let $list2 :=
   <list>
      <item dateTime="2009-01-01T11:59:00.000-05:00">apples</item>
      <item dateTime="2009-02-01T11:59:00.000-05:00">bananas</item>
      <item dateTime="2009-03-01T11:59:00.000-05:00">carrots</item>
      <item dateTime="2009-02-01T11:58:00.000-05:00">eggplant</item>
      <item dateTime="2009-02-01T12:00:00.000-05:00">grapes</item>
      <item dateTime="2009-04-01T11:59:00.000-05:00">oranges</item>
   </list>
return
<html>
   <head>
      <style language="text/css">
         <![CDATA[
         body {font-family: Arial,Helvetica,sans-serif; font-size: medium;}
         h2 {padding: 3px; margin: 0px; text-align: center; font-size: large; background-color: silver;}
         .left, .right {border: solid black 1px; padding: 5px;}
         .newer {background-color: lightgreen;}
         .older {background-color: pink;}
         .delete {background-color: red;}
         .add {background-color: green;}
         .left {float: left; width: 390px}
         .right {margin-left: 410px; width: 390px}
         ]]>
      </style>
   </head>
   <body>
      <h1>Update Report</h1>
      <div class="left">
         <h2>List 1</h2>
         {for $item in $list1/item
          return <div>{$item/text()} dateTime={string($item/@dateTime)}</div>}
      </div>
      <div class="right">
         <h2>List 2</h2>
         {for $item in $list2/item
          return <div>{$item/text()} dateTime={string($item/@dateTime)}</div>}
      </div>
      <br/>
      <p>Green are new, light green are newer and red to be removed</p>
      <div class="left">
         <h2>Merged Lists</h2>
         {local:merge($list1/item, $list2/item)}
      </div>
   </body>
</html>

Execute [2]


References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Basics/compareTimes.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/Basics/compareTimes2.xq

Timelines of Resource
Motivation
You want to create a timeline using the creation and modification dates of a collection.

Method
Many file systems and XML databases, including eXist, automatically keep two dates for each resource: the creation date and the date the resource was last modified. These dates are required by systems that perform incremental backups of resources. We can use them to automatically create a timeline report of one or more collections. Such reports can serve as audit trails and can help you find out who changed what, and when. Here are the two functions that we will use in these examples:
xmldb:last-modified($collection as xs:string, $resource as xs:string) as xs:dateTime?
xmldb:created($collection as xs:string, $resource as xs:string) as xs:dateTime

Note that you can also call xmldb:created with a single parameter to see when a collection itself was created. Our query takes a single parameter, the database path of the collection we wish to build a timeline for, and displays the creation and modification dates of its resources. Here is a sample query fragment that lists all the child resources of a collection and formats the data according to the timeline event XML structure.

let $collection := '/db/test'


return
<data date-time-format="iso8601">{
   for $child in xmldb:get-child-resources($collection)
   return (
      <event start="{xmldb:created($collection, $child)}" isDuration="false">{$child} created</event>,
      <event start="{xmldb:last-modified($collection, $child)}" isDuration="false">{$child} last-modified</event>
   )
}</data>

This will return a file in the following format:


<data date-time-format="iso8601">
   <event start="2009-02-17T12:50:55.992-06:00" isDuration="false">foo.xq created</event>
   <event start="2009-02-18T15:12:47.529-06:00" isDuration="false">foo.xq last-modified</event>
   <event start="2008-11-25T13:53:23.877-06:00" isDuration="false">bar.xq created</event>
   <event start="2008-11-25T14:22:27.798-06:00" isDuration="false">bar.xq last-modified</event>
   <event start="2008-11-25T15:39:40.445-06:00" isDuration="false">foo.xhtml created</event>
   <event start="2008-11-25T15:41:51.547-06:00" isDuration="false">bar.xhtml last-modified</event>
   <event start="2009-02-06T14:24:34.74-06:00" isDuration="false">hello-world.xml created</event>
   <event start="2009-02-06T15:13:24.251-06:00" isDuration="false">hello-world.xml last-modified</event>
   <event start="2008-11-25T14:07:00.273-06:00" isDuration="false">test.xml created</event>
   <event start="2008-11-25T14:07:00.273-06:00" isDuration="false">test.xml last-modified</event>
</data>
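As an aside, the same two functions can drive simple housekeeping queries on a collection. The sketch below (the collection path is just an example) lists the resources of a collection ordered by their last-modified timestamp, newest first:

let $collection := '/db/test'
for $child in xmldb:get-child-resources($collection)
order by xmldb:last-modified($collection, $child) descending
return
   <resource name="{$child}"
             last-modified="{xmldb:last-modified($collection, $child)}"/>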

Timing Fibonacci algorithms


Fibonacci algorithms
Fibonacci is the computer science 'hello world'. A recursive function which follows the definition of the Fibonacci series looks pretty much the same in every language. Here is the XQuery version:

declare function myfn:fib-recur($n as xs:integer) as xs:integer? {
   if ($n < 0) then ()
   else if ($n = 0) then 0
   else if ($n = 1) then 1
   else myfn:fib-recur($n - 1) + myfn:fib-recur($n - 2)
};

The empty sequence is used as the return value where the function is undefined, so the return type is an optional integer. This formulation is top-down, goal-oriented and close to the mathematical definition, but requires repeated re-evaluation of intermediate values. So how about bottom-up, starting with the knowns and working up to the goal?

declare function myfn:fib-itr-x($n as xs:integer, $m1 as xs:integer, $m2 as xs:integer) as xs:integer {
   if ($n = 1) then $m2
   else myfn:fib-itr-x($n - 1, $m2, $m1 + $m2)
};

with a 'front-door' function to get started:

declare function myfn:fib-itr($n as xs:integer) as xs:integer? {
   if ($n < 0) then ()
   else if ($n = 0) then 0
   else myfn:fib-itr-x($n, 0, 1)
};

Iterative solutions in which variables are updated look rather messy by comparison with this tail-recursive formulation, a style essential to many algorithms in XQuery.

Timing
Just how much worse is the recursive formulation? We need to time the calls, and now we really could do with higher order functions, so that either fib function can be passed to a timer function to execute. Enter eXist's function modules, which raise XQuery from an XML query language to a viable web application platform. The util module provides two functions:
* util:function(qname, arity) - creates a function template which can be passed around as a value
* util:call(function, params) - evaluates a function template with the given parameters

so we can create the recursive function template with:

let $call-fib-recur := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/myfn", "myfn:fib-recur"), 1)

The timer function takes a function template, a sequence of parameters to be passed to the function and a repetition count. The timing is based on system time; the time difference is converted to seconds and then to milliseconds:

declare function myfn:time-call($function as function, $params as item()*, $reps as xs:integer) as xs:decimal {
   let $start := util:system-time()
   let $result := for $i in 1 to $reps return util:call($function, $params)
   let $end := util:system-time()
   let $runtimems := (($end - $start) div xs:dayTimeDuration('PT1S')) * 1000
   return $runtimems div $reps
};


and call it as myfn:time-call($call-fib-recur,10,5)

An intermediate data structure


Calling the timer with n ranging from 1 to max will generate the required data. Rather than simply outputting this data, we build an intermediate XML data structure containing the results. This can then be transformed into different representations or analysed later, perhaps to fit a curve to the data or to export it to a spreadsheet.

let $runs :=
   <dataset>
      {for $n in -1 to $max
       return
          <tuple>
             <n>{$n}</n>
             <fib>{myfn:fib-itr($n)}</fib>
             <recursive>{myfn:time-call($call-fib-recur, $n, $reps)}</recursive>
             <iterative>{myfn:time-call($call-fib-itr, $n, $reps)}</iterative>
          </tuple>
      }
   </dataset>

Results as a table
This data structure can be transformed to a table by iterating over the tuples.

declare function myfn:dataset-as-table($dataset) as element(table) {
   <table>
      <tr>
         {for $data in $dataset/*[1]/* return <th>{name($data)}</th>}
      </tr>
      {for $tuple in $dataset/*
       return
          <tr>
             {for $data in $tuple/* return <td>{string($data)}</td>}
          </tr>
      }
   </table>
};

Here the XPath name() function is used to convert from the tag names to strings. This reflection allows very generic functions to be written and is a key technique for making the transition from problem-specific structures to generic functions. Note that the dataset parameter has not been typed. This is because the function makes only minimal requirements of the structure, which would need a permissive schema language to express.


Results as a graph
For graphing, this basic matrix could be imported directly into Excel or, thanks to the wonderful GoogleCharts, turned into a simple line graph. Selected columns of the dataset are extracted and joined with commas, then all series are joined with pipes.

declare function myfn:dataset-as-chart($dataset, $vars as xs:string+) as element(img) {
   let $series :=
      for $var in $vars
      return string-join($dataset/*/*[name(.) = $var], ",")
   let $points := string-join($series, "|")
   let $chartType := "lc"
   let $chartSize := "300x200"
   let $uri := concat("http://chart.apis.google.com/chart?",
                      "cht=", $chartType, "&amp;chs=", $chartSize, "&amp;chd=t:", $points)
   return <img src="{$uri}"/>
};

The final script


Finally, adding some intro, some page layout and CSS, the final script looks like:
declare namespace myfn = "http://www.cems.uwe.ac.uk/xmlwiki/myfn";

declare function myfn:time-call($function as function, $params as item()*, $reps as xs:integer) as xs:decimal {
   let $start := util:system-time()
   let $result := for $i in 1 to $reps return util:call($function, $params)
   let $end := util:system-time()
   let $runtimems := (($end - $start) div xs:dayTimeDuration('PT1S')) * 1000
   return $runtimems div $reps
};

declare function myfn:fib-recur($n as xs:integer) as xs:integer? {
   if ($n < 0) then ()
   else if ($n = 0) then 0
   else if ($n = 1) then 1
   else myfn:fib-recur($n - 1) + myfn:fib-recur($n - 2)
};

declare function myfn:fib-itr($n as xs:integer) as xs:integer? {
   if ($n < 0) then ()
   else if ($n = 0) then 0
   else myfn:fib-itr-x($n, 0, 1)
};

declare function myfn:fib-itr-x($n as xs:integer, $m1 as xs:integer, $m2 as xs:integer) as xs:integer {
   if ($n = 1) then $m2
   else myfn:fib-itr-x($n - 1, $m2, $m1 + $m2)
};

declare function myfn:dataset-as-chart($dataset, $vars as xs:string+) as element(img) {
   let $series :=
      for $var in $vars
      return string-join($dataset/*/*[name(.) = $var], ",")
   let $points := string-join($series, "|")
   let $chartType := "lc"
   let $chartSize := "300x200"
   let $uri := concat("http://chart.apis.google.com/chart?",
                      "cht=", $chartType, "&amp;chs=", $chartSize, "&amp;chd=t:", $points)
   return <img src="{$uri}"/>
};

declare function myfn:dataset-as-table($dataset) as element(table) {
   <table>
      <tr>
         {for $data in $dataset/*[1]/* return <th>{name($data)}</th>}
      </tr>
      {for $tuple in $dataset/*
       return
          <tr>
             {for $data in $tuple/* return <td>{string($data)}</td>}
          </tr>
      }
   </table>
};

declare option exist:serialize
   "method=xhtml media-type=text/html omit-xml-declaration=no indent=yes
    doctype-public=-//W3C//DTD&#160;XHTML&#160;1.0&#160;Transitional//EN
    doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

let $max := xs:integer(request:get-parameter("max", 1))
let $reps := xs:integer(request:get-parameter("reps", 1))
let $call-fib-recur := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/myfn", "myfn:fib-recur"), 1)
let $call-fib-itr := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/myfn", "myfn:fib-itr"), 1)
let $runs :=
   <dataset>
      {for $n in -1 to $max
       return
          <tuple>
             <n>{$n}</n>
             <fib>{myfn:fib-itr($n)}</fib>
             <recursive>{myfn:time-call($call-fib-recur, $n, $reps)}</recursive>
             <iterative>{myfn:time-call($call-fib-itr, $n, $reps)}</iterative>
          </tuple>
      }
   </dataset>
return
<html>
   <head>
      <title>Fibonacci with XQuery</title>
      <style type="text/css">
         <![CDATA[
         body {background-color: #FFFFDD;}
         #graph {float: right; width: 50%; padding: 10px;}
         #table {float: left; width: 50%; padding: 10px;}
         td, th {border-right: 1px solid #FFFF00; border-bottom: 1px solid #FFFF00; padding: 6px 6px 6px 12px;}
         ]]>
      </style>
   </head>
   <body>
      <h1>Fibonacci from 1 to {$max} with {$reps} repetitions</h1>
      <p>Recursive and iterative methods in XQuery on eXist.db</p>
      <div id="graph">
         {myfn:dataset-as-chart($runs, ("recursive", "iterative"))}
      </div>
      <div id="table">
         {myfn:dataset-as-table($runs)}
      </div>
   </body>
</html>


Execution
Execute [1] (with preset limits) on the CEMS server.

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Literate/fibgraph.xq

Transformation idioms
Motivation
Document transformation using the basic typeswitch statement applies the same transformation to an element independent of where it occurs in the document. The transformation also preserves document order since it processes elements in document order. In comparison with XSLT, XQuery lacks some mechanisms such as modes, priority and numbering. This article addresses some of these limitations.

Example
The example uses a custom XML schema to mark up the contents of the book "Search: The Graphics Web Guide" by Ken Coupland, a compendium of websites. This document is formatted with a site-specific schema. The document contains site elements, each tagged with a category, and also category elements which provide a commentary on the category. For comparison, this dataset is used in a student case study which uses XSLT for the transformations. [Sample file [1]]

Identity Transformation
module namespace coupland = "http://www.cems.uwe.ac.uk/xmlwiki/coupland";
(: conversion module generated from a set of tags :)

declare function coupland:convert($nodes as node()*) as item()* {
   for $node in $nodes
   return
      typeswitch ($node)
         case element(websites) return coupland:websites($node)
         case element(sites) return coupland:sites($node)
         case element(site) return coupland:site($node)
         case element(uri) return coupland:uri($node)
         case element(name) return coupland:name($node)
         case element(description) return coupland:description($node)
         default return coupland:convert-default($node)
};

declare function coupland:convert-default($node as node()) as item()* {
   $node
};

declare function coupland:websites($node as element(websites)) as item()* {
   element websites { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:sites($node as element(sites)) as item()* {
   element sites { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:site($node as element(site)) as item()* {
   element site { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:uri($node as element(uri)) as item()* {
   element uri { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:name($node as element(name)) as item()* {
   element name { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:description($node as element(description)) as item()* {
   element description { $node/@*, coupland:convert($node/node()) }
};



Customising the identity transformation


The module code is only a basic skeleton which we would edit to customize the transformation. In this example we will transform the document to HTML. This will require editing a number of the element converters.

Default action
Change the convert-default function to provide a different default action. For example:

declare function coupland:convert-default($node) {
   if ($node instance of element())
   then coupland:convert($node/node())
   else $node
};

would include the content of the node but remove the tag and its attributes.

Change element name


Site descriptions will be rendered as divs:

declare function coupland:description($node as element(description)) as item()* {
   element div { $node/@*, coupland:convert($node/node()) }
};

Ignore element
The 'class' element is not needed:

declare function coupland:class($node as element(class)) as item()* {
   ()
};

Define transformation
The image element should be transformed to an html img element using the uri as the source:

declare function coupland:image($node as element(image)) as item()* {
   element div {
      element img { attribute src { $node } }
   }
};


Transformation depends on context


By default all elements with the same name anywhere in the document are transformed in the same way. Often this is not what is required:

declare function coupland:name($node as element(name)) as item()* {
   if ($node/parent::node() instance of element(site))
   then element h3 { $node/@*, coupland:convert($node/node()) }
   else element h1 { $node/@*, coupland:convert($node/node()) }
};

Reordering elements
Each site is to be rendered in the order name, uri and then the rest of the sub-elements:

declare function coupland:site($node as element(site)) as item()* {
   element div {
      element div {
         coupland:convert($node/name),
         coupland:convert($node/uri)
      },
      coupland:convert($node/(node() except (uri, name)))
   }
};

Numbering categories
The xsl:number instruction provides a mechanism to generate hierarchical section numbers and is very powerful. In specific cases we can generate numbers using functions. For example, to number the categories we can use this function, which creates a number for a node within a sequence of siblings. Note that the number is based on the order of nodes in the original document, not the transformed document (as xsl:number does).

declare function coupland:number($node) as xs:string {
   concat(count($node/preceding-sibling::node()[name(.) = name($node)]) + 1, ". ")
};

and call this function when transforming category names:

element h2 {
   $node/@*,
   coupland:number($node/parent::node()),
   coupland:convert($node/node())
}


Parameterisation
The transformation can clearly be applied to different documents, but often the same transformation is to be used in different contexts. XSLT provides parameters and variables which are global to all templates. In XQuery we can either declare global variables in the module or pass one or more parameters around the functions (module generation is helpful here), as sketched below. ....
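As a rough sketch of the second approach (the $params element and its compact attribute are invented for illustration, not part of the Coupland example), each conversion function accepts an extra parameter carrying the transformation settings and simply hands it on when it recurses:

declare function coupland:convert($nodes as node()*, $params as element(params)) as item()* {
   for $node in $nodes
   return
      typeswitch ($node)
         case element(description) return coupland:description($node, $params)
         (: ... the other cases as before, each passing $params along ... :)
         default return $node
};

declare function coupland:description($node as element(description), $params as element(params)) as item()* {
   (: the parameter steers a rendering decision :)
   element {if ($params/@compact = 'yes') then 'span' else 'div'} {
      $node/@*,
      coupland:convert($node/node(), $params)
   }
};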

Generating an index
XSLT uses the mode mechanism to allow the same element to be processed in multiple ways. A common use case is where the same transformation must generate both an index and the content. Several approaches suggest themselves. We could mimic the XSLT approach by passing an additional mode parameter in the calls and choosing which transformation to apply in each function. Alternatively we could append the mode to the function name. It is more difficult to use context (either global or passed) because the mode would need to be updated. The simplest approach is to use two typeswitch transformations and combine the results at a higher level, as sketched below. This clearly separates the two modes of transformation. The technique of module generation is helpful here.
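A minimal sketch of the last approach, assuming a second dispatch function (here called coupland:index, an invented name) that emits only list items for site names and recurses through everything else; the two passes are then simply concatenated in the calling script:

declare function coupland:index($nodes as node()*) as item()* {
   for $node in $nodes
   return
      typeswitch ($node)
         case element(name) return
            if ($node/parent::node() instance of element(site))
            then <li><a href="#{encode-for-uri(string($node))}">{string($node)}</a></li>
            else ()
         case element() return coupland:index($node/node())
         default return ()
};

(: in the main script: an index pass followed by the full content pass :)
(: <ul>{coupland:index($doc)}</ul>, coupland:convert($doc) :)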

Complex transformation
The overall HTML document can be structured in the transformer for the root element. The page uses the blueprint stylesheets. Each category of site is rendered, with the sites which are classified in that category.
declare function coupland:websites($node as element(websites)) as item()* {
   (: the root element, so convert to html :)
   <html>
      <head>
         <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
         <title>Web Sites by Coupland</title>
         <link rel="stylesheet" href="../../css/blueprint/screen.css" type="text/css" media="screen, projection"/>
         <link rel="stylesheet" href="../../css/blueprint/print.css" type="text/css" media="print"/>
         <!--[if IE ]><link rel="stylesheet" href="../../css/blueprint/ie.css" type="text/css" media="screen, projection" /><![endif]-->
         <link rel="stylesheet" href="screen.css" type="text/css" media="screen"/>
      </head>
      <body>
         <div class="container">
            {for $category in $node/category
             order by $category/class
             return
                <div>
                   <div class="span-10">
                      {coupland:convert($category)}
                   </div>
                   <div class="span-14 last">
                      {for $site in $node/sites/site[category = $category/class]
                       order by ($site/sortkey, $site/name)[1]
                       return coupland:convert($site)
                      }
                   </div>
                   <hr/>
                </div>
            }
         </div>
      </body>
   </html>
};


Completed transformation
The full XQuery module now looks like this:
module namespace coupland = "http://www.cems.uwe.ac.uk/xmlwiki/coupland";
(: conversion module generated from a set of tags :)

declare function coupland:convert($nodes as node()*) as item()* {
   for $node in $nodes
   return
      typeswitch ($node)
         case element(category) return coupland:category($node)
         case element(class) return coupland:class($node)
         case element(description) return coupland:description($node)
         case element(em) return coupland:em($node)
         case element(hub) return coupland:hub($node)
         case element(image) return coupland:image($node)
         case element(name) return coupland:name($node)
         case element(p) return coupland:p($node)
         case element(q) return coupland:q($node)
         case element(site) return coupland:site($node)
         case element(sites) return coupland:sites($node)
         case element(sortkey) return coupland:sortkey($node)
         case element(subtitle) return coupland:subtitle($node)
         case element(uri) return coupland:uri($node)
         case element(websites) return coupland:websites($node)
         default return coupland:convert-default($node)
};

declare function coupland:convert-default($node as node()) as item()* {
   $node
};

declare function coupland:category($node as element(category)) as item()* {
   if ($node/parent::node() instance of element(site))
   then ()
   else element div { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:class($node as element(class)) as item()* {
   ()
};

declare function coupland:description($node as element(description)) as item()* {
   element div { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:em($node as element(em)) as item()* {
   element em { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:hub($node as element(hub)) as item()* {
   element hub { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:image($node as element(image)) as item()* {
   element div {
      element img { attribute src { $node } }
   }
};

declare function coupland:name($node as element(name)) as item()* {
   if ($node/parent::node() instance of element(site))
   then
      element span {
         attribute style {"font-size: 16pt"},
         $node/@*,
         coupland:convert($node/node())
      }
   else
      element h1 {
         $node/@*,
         coupland:number($node/parent::node()),
         coupland:convert($node/node())
      }
};

declare function coupland:p($node as element(p)) as item()* {
   element p { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:q($node as element(q)) as item()* {
   element q { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:site($node as element(site)) as item()* {
   element div {
      element div {
         coupland:convert($node/name),
         coupland:convert($node/uri)
      },
      coupland:convert($node/(node() except (uri, name)))
   }
};

declare function coupland:sites($node as element(sites)) as item()* {
   for $site in $node/site
   order by $site/sortkey
   return coupland:convert($site)
};

declare function coupland:sortkey($node as element(sortkey)) as item()* {
   ()
};

declare function coupland:subtitle($node as element(subtitle)) as item()* {
   element div { $node/@*, coupland:convert($node/node()) }
};

declare function coupland:uri($node as element(uri)) as item()* {
   <span>
      {element a {
         attribute href { $node },
         "Link"
      }}
   </span>
};

declare function coupland:websites($node as element(websites)) as item()* {
   (: the root element, so convert to html :)
   <html>
      <head>
         <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
         <title>Web Sites by Coupland</title>
         <link rel="stylesheet" href="../../css/blueprint/screen.css" type="text/css" media="screen, projection"/>
         <link rel="stylesheet" href="../../css/blueprint/print.css" type="text/css" media="print"/>
         <!--[if IE ]><link rel="stylesheet" href="../../css/blueprint/ie.css" type="text/css" media="screen, projection" /><![endif]-->
         <link rel="stylesheet" href="screen.css" type="text/css" media="screen"/>
      </head>
      <body>
         <div class="container">
            {for $category in $node/category
             order by $category/class
             return
                <div>
                   <div class="span-10">
                      {coupland:convert($category)}
                   </div>
                   <div class="span-14 last">
                      {for $site in $node/sites/site[category = $category/class]
                       order by ($site/sortkey, $site/name)[1]
                       return coupland:convert($site)
                      }
                   </div>
                   <hr/>
                </div>
            }
         </div>
      </body>
   </html>
};

Transformed to HTML [2].

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/Coupland1.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidtrans2.xq

Typeswitch Transformations
Motivation
You have an XML document that you want to transform into a different format of XML. You want to control and customize the transformation process, and you want a modular way to store the transformation rules so that you or others can easily modify and maintain them.

Background on using XQuery vs. XSLT for Document Transformation


You may have heard the conventional wisdom that "XQuery is best for querying or selecting XML, and XSLT is best for transforming it." In reality, both methods are capable of transforming XML. Despite XSLT's somewhat longer history and larger install base, the "XQuery typeswitch" method of transforming XML provides numerous advantages. These are covered in more detail in XQuery Benefits.

Method
We will use XQuery's typeswitch expression to transform an XML document from one form into another. The basic approach is simple and straightforward: For each XML node in the input document, we will specify what should be created in the output document. The typeswitch expression performs this core function of identifying what happens to each node in the source document. We will write an XQuery function that takes a node, tests it using a typeswitch expression, and dispatches that node to the appropriate handler function, which transforms the node into the new format and sends any child elements back to the main function using the passthru function. This recursive routine effectively crawls through an entire node and its children, transforming them into the target format. Once the structure has been set up, the transform is easy to modify, even if there is very complex nesting of the tags within the input document. (The tail recursion technique will be familiar to discerning users of XSLT, but there is absolutely no XSLT prerequisite for this article.)

Example Data
Suppose you have a simple XML document that you would like to transform:

Sample Input Document


<bill>
   <btitle>This is the Bill title</btitle>
   <section-id>1</section-id>
   <bill-text>This is the text with <strike>many</strike> examples.</bill-text>
</bill>

Sample Output Document


Here is the format that you would like to turn the source input into:

<Bill>
   <BillTitleText>This is the Bill title</BillTitleText>
   <BillSectionID>1</BillSectionID>
   <BillText>This is the text with <del>many</del> examples.</BillText>
</Bill>


Example Transformation With Typeswitch


The most effective way to use the typeswitch expression to transform XML is to create a series of XQuery functions. In this way, we can cleanly separate the major actions of the transformation into modular functions. (In fact, the library of functions can be saved into an XQuery library module, which can then be reused by other XQueries.) The "magic" of this typeswitch-style transformation is that once you understand the basic pattern and structure of the functions, you can adapt them to your own data. You'll find that the structure is so modular and straightforward that it's even possible to teach others the basics of the pattern in a short period of time and empower them to maintain and update the transformation rules themselves. The first function in our module is where the typeswitch expression is located. This function is conventionally called the "dispatch" function:

declare function local:dispatch($node as node()) as item()* {
   typeswitch($node)
      case text() return $node
      case element(bill) return local:bill($node)
      case element(btitle) return local:btitle($node)
      case element(section-id) return local:section-id($node)
      case element(bill-text) return local:bill-text($node)
      case element(strike) return local:strike($node)
      default return local:passthru($node)
};

Notice that the typeswitch expression tests the input node against a list of criteria: is the node a text node, a bill element, or a btitle element, or a section-id element, etc? If it's a text node (e.g. "This is the Bill title"), we simply return the text, unmodified. (Note that the text() node test comes first since text() is likely to be the single most plentiful node type in a text-rich document, and placing the most common type first improves performance.) If instead the node is a bill element, then we pass the node to the aptly-named local:bill() function for bill-specific handling. The local:bill() function (see below) turns the <bill> element into a <Bill> element. It then passes the contents of the bill element to the local:passthru() function. If our node doesn't match any of the pre-defined rules, then the typeswitch expression resorts to the required final "default" (think: "fallback") statement; this default is used for all nodes that don't match any of the preceding tests. In our example, the default expression sends nodes without matches to the local:passthru() function. (Typeswitch isn't limited to matching text() and element() nodes; it can also match the other node types: processing-instruction() and comment(), but not typically attribute(). Attributes are conventionally dealt with inside the handler function of the attribute's parent element, rather than in the core typeswitch function.)
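For example, a handler that needs to keep or add attributes typically just emits them at the start of the element constructor. The following is only an illustrative variant of the local:bill() handler defined later; the @status attribute and the BillDate attribute are invented for this sketch and are not part of the sample data:

declare function local:bill($node as element(bill)) as element() {
   <Bill>
      {$node/@status}                          (: copy an existing attribute through unchanged :)
      {attribute BillDate {current-date()}}    (: or construct a new attribute :)
      {local:passthru($node)}
   </Bill>
};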

The Passthru Function


The passthru() function recurses through a given node's children, handing each of them back to the main typeswitch operation.

declare function local:passthru($nodes as node()*) as item()* {
   for $node in $nodes/node()
   return local:dispatch($node)
};

(Note: This is such a simple function that it may appear extraneous. Why not simply replace instances of local:passthru($node) with local:dispatch($node/node())? Its primary benefit is that it simplifies the code, relieving you of the burden of typing an extra "/node()" for each recursion. A secondary benefit is that it introduces the possibility of filtering a node before it is sent to the typeswitch routine.)
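As a small illustration of that second point (purely a sketch, not part of the original example), a passthru variant could drop comments and processing instructions before dispatching:

declare function local:passthru($nodes as node()*) as item()* {
   for $node in $nodes/node()[not(self::comment() or self::processing-instruction())]
   return local:dispatch($node)
};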


Functions to Handle Each Element


declare function local:bill($node as element(bill)) as element() {
   <Bill>{local:passthru($node)}</Bill>
};

declare function local:btitle($node as element(btitle)) as element() {
   <BillTitle>{local:passthru($node)}</BillTitle>
};

declare function local:section-id($node as element(section-id)) as element() {
   <BillSectionID>{local:passthru($node)}</BillSectionID>
};

declare function local:strike($node as element(strike)) as element() {
   <del>{local:passthru($node)}</del>
};

declare function local:bill-text($node as element(bill-text)) as element() {
   <BillText>{local:passthru($node)}</BillText>
};

Execute the transformation


We can now write a query that takes the source XML and uses the local:dispatch() function to transform the input into the target format:

let $input :=
   <bill>
      <btitle>This is the Bill title</btitle>
      <section-id>1</section-id>
      <bill-text>This is the text with <strike>many</strike> examples.</bill-text>
   </bill>
return local:dispatch($input)

Execute [1]

Compact approach
While the above approach is recommended as the most modular, extensible approach, it is perfectly acceptable to express the same transformation using a more compact, self-contained function:

declare function local:transform($nodes as node()*) as item()* {
   for $node in $nodes
   return
      typeswitch($node)
         case text() return $node
         case element(bill) return element Bill {local:transform($node/node())}
         case element(btitle) return element BillTitle {local:transform($node/node())}
         case element(section-id) return element BillSectionID {local:transform($node/node())}
         case element(strike) return element del {local:transform($node/node())}
         case element(bill-text) return element BillText {local:transform($node/node())}
         default return local:transform($node/node())
};


Besides the fact that this function is entirely self-contained (beginning with a FLWOR expression and using $node/node() to recurse through child nodes), notice that the function uses computed element constructors to accomplish the transformation.

Conclusion
This is the heart of the XQuery Typeswitch approach to XML document transformation. On the basis of this simple pattern, entire libraries have been written to transform source formats like TEI, DocBook, and Office OpenXML documents into other formats like XHTML, XSL-FO, and each other. While we can create typeswitch modules by hand, building them up element by element, we can also use XQuery to generate a skeleton typeswitch module; see this article's companion article, XQuery/Generating_Skeleton_Typeswitch_Transformation_Modules. In addition to the "skeleton generator", this article also provides examples of more complex transformation patterns with XQuery typeswitch: changing an element's name, ignoring an element, transforming differently based on the context of the element, reordering elements. It also provides a detailed comparison of XQuery and XSLT's approaches to the same example transformation, so it is useful for readers coming from the world of XSLT.

References
DocBook to XHTML [2] - sample code that converts DocBook to XHTML in Dan McCreary's eXist branch
W3C XQuery Typeswitch definition [3]
Comparison of typeswitch and XSLT apply-templates [4]
i18n example by Ryan Semerau [5]
typeswitch in BEA/Oracle mapper [6]
Dec 2002 article by Per Bothner about using typeswitch to transform XML to HTML in xml.com [7]
Transforming XML Structures With a Recursive typeswitch Expression [8] (from the MarkLogic "Application Developer's Guide")


References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/eg1.xq
[2] https://exist.svn.sourceforge.net/svnroot/exist/branches/dmccreary/docs/webapp/docs/docbook5/docbook2xhtml-v2.xqm
[3] http://www.w3.org/TR/xquery/#id-typeswitch
[4] http://developer.marklogic.com/blog/tired-of-typeswitch
[5] http://xquerywebappdev.wordpress.com/2010/05/05/non-obtrusive-i18n
[6] http://download.oracle.com/docs/cd/E14981-01/wli/docs1031/dtguide/dtguideMapper.html#wp1399341
[7] http://www.xml.com/pub/a/2002/12/23/xquery.html?page=1
[8] http://developer.marklogic.com:8040/4.2doc/docapp.xqy#display.xqy?fname=http://pubs/4.2doc/xml/dev_guide/typeswitch.xml

UK shipping forecast
Motivation
The UK shipping forecast is prepared by the UK met office 4 times a day and published on the radio, the Met Office web site [1] and the BBC web site [2]. However it is not available in a computer readable form. Tim Duckett recently blogged about creating a Twitter stream [3]. He uses Ruby to parse the text forecast. The textual form of the forecast is included on both the Met Office and BBC sites. However as Tim points out, the format is designed for speech, compresses similar areas to reduce the time slot and is hard to parse. The approach taken here is to scrape a JavaScript file containing the raw area forecast data.

Implementation
Dependencies

eXist-db Modules

The following scripts use these eXist modules:
* request - to get HTTP request parameters
* httpclient - to GET and POST
* scheduler - to schedule scraping tasks
* datetime - to format dateTimes
* util - base64 conversions
* xmldb - for database access

Other

UK Met Office web site

Met Office page


The Met Office page shows an area-by-area forecast, but this part of the page is generated by JavaScript from data in a generated JavaScript file [4]. In this file, the data is assigned to multiple arrays. A typical section looks like:

// Bailey
gale_in_force[28] = "0";
gale[28] = "0";
galeIssueTime[28] = "";
shipIssueTime[28] = "1725 Sun 06 Jul";
wind[28] = "Northeast 5 to 7.";

weather[28] = "Showers.";
visibility[28] = "Moderate or good.";
seastate[28] = "Moderate or rough.";
area[28] = "Bailey";
area_presentation[28] = "Bailey";
key[28] = "Bailey";
// Faeroes
...


Area Forecast
JavaScript conversion
This function fetches the current JavaScript data using the eXist httpclient module, converts the base64 data to a string, picks out the required area data and parses the code to generate an XML structure using the JavaScript array names.
declare namespace httpclient = "http://exist-db.org/xquery/httpclient";

declare function met:get-forecast($area as xs:string) as element(forecast)? {
   let $jsuri := "http://www.metoffice.gov.uk/lib/includes/marine/gale_and_shipping_table.js"
   (: fetch the javascript source and locate the text of the body of the response :)
   let $base64 := httpclient:get(xs:anyURI($jsuri), true(), ())/httpclient:body/text()
   (: this is base64 encoded, so decode it back to text :)
   let $js := util:binary-to-string($base64)
   (: isolate the section for the required area, prefixed with a comment :)
   let $areajs := normalize-space(substring-before(substring-after($js, concat("// ", $area)), "//"))
   return
      if ($areajs = "")
      then ()   (: area not found :)
      else
         (: build an XML element containing elements for each of the data items,
            using the array names as the element names :)
         <forecast>
            {for $d in tokenize($areajs, ";")[position() < last()]
                (: JavaScript statements are terminated by ";" - ignore the last empty token :)
             let $ds := tokenize(normalize-space($d), " *= *")
                (: separate the LHS and RHS of the assignment statement :)
             return
                element {replace(substring-before($ds[1], "["), "_", "")}
                   (: element name is the array name, converted to a legal name :)
                   {replace($ds[2], '"', '')}
                   (: element text is the RHS minus quotes :)
            }
         </forecast>
};


For example, the output for one selected area is:

<forecast>
   <galeinforce>0</galeinforce>
   <gale>0</gale>
   <galeIssueTime/>
   <shipIssueTime>0505 Mon 07 Jul</shipIssueTime>
   <wind>Northwest backing west 5 to 7.</wind>
   <weather>Squally showers.</weather>
   <visibility>Moderate or good.</visibility>
   <seastate>Moderate or rough.</seastate>
   <area>Fastnet</area>
   <areapresentation>Fastnet</areapresentation>
   <key>Fastnet</key>
</forecast>

Format the forecast as text


The forecast data needs to be formatted into a string:

declare function met:forecast-as-text($forecast as element(forecast)) as xs:string {
   concat(
      $forecast/weather,
      " Wind ", $forecast/wind,
      " Visibility ", $forecast/visibility,
      " Sea ", $forecast/seastate
   )
};

Area Forecast
Finally these functions can be used in a script which accepts a shipping area name and returns an XML message:

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

let $area := request:get-parameter("area", "Lundy")
let $forecast := met:get-forecast($area)
return
   <message area="{$area}" dateTime="{$forecast/shipIssueTime}">
      {met:forecast-as-text($forecast)}
   </message>

Lundy [5]
Fastnet [6]

Message abbreviation
To create a message suitable for texting (160 characters) or tweeting (140 character limit), the message can be compressed by abbreviating common words.

Abbreviation dictionary
A dictionary of words and abbreviations is created and stored locally. The dictionary has been developed using some of the abbreviations in Tim Duckett's Ruby implementation.

<dictionary>
   <entry full="west" abbrev="W"/>
   <entry full="westerly" abbrev="Wly"/>
   ..
   <entry full="variable" abbrev="vbl"/>
   <entry full="visibility" abbrev="viz"/>
   <entry full="occasionally" abbrev="occ"/>
   <entry full="showers" abbrev="shwrs"/>
</dictionary>

The full dictionary [7]

Abbreviation function
The abbreviation function breaks down the text into words, replaces words with abbreviations and builds the text up again:

declare function met:abbreviate($forecast as xs:string) as xs:string {
   string-join(
      (: lowercase the string, append a space (to ensure a final . is matched) and tokenise :)
      for $word in tokenize(concat(lower-case($forecast), " "), "\.? +")
      return
         (: if there is an entry for the word, use its abbreviation, otherwise use the unabbreviated word :)
         (/dictionary/entry[@full = $word]/@abbrev, $word)[1]
      , " ")   (: join the words back up with a space separator :)
};

Abbreviated Message
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

let $area := request:get-parameter("area", "Lundy")
let $forecast := met:get-forecast($area)
return
   <message area="{$area}" dateTime="{$forecast/shipIssueTime}">
      {met:abbreviate(met:forecast-as-text($forecast))}
   </message>

Lundy [8]
Fastnet [9]

All Areas forecast


This function is an extension of the area forecast. The parse uses the comment separator to break up the script, ignoring the first and last sections and the area name in the comment.

declare function met:get-forecast() as element(forecast)* {
   let $jsuri := "http://www.metoffice.gov.uk/lib/includes/marine/gale_and_shipping_table.js"
   let $base64 := httpclient:get(xs:anyURI($jsuri), true(), ())/httpclient:body/text()
   let $js := util:binary-to-string($base64)
   for $js in tokenize($js, "// ")[position() > 1][position() < last()]
   let $areajs := concat("gale", substring-after($js, "gale"))
   return
      <forecast>
         {for $d in tokenize($areajs, ";")[position() < last()]
          let $ds := tokenize(normalize-space($d), " *= *")
          return
             element {replace(substring-before($ds[1], "["), "_", "")}
                {replace($ds[2], '"', '')}
         }
      </forecast>
};

XML version of forecast


This script returns the full Shipping Forecast in XML:

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

<ShippingForecast>
   {met:get-forecast()}
</ShippingForecast>

Execute [10]


RSS version of forecast


XSLT would be suitable for transforming this XML to RSS format ...
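Since the rest of this article works in XQuery, here is a rough XQuery sketch of the same idea rather than XSLT; the channel title, link and description are invented placeholders, and only the stored forecast elements already shown above are used:

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

declare option exist:serialize "method=xml media-type=application/rss+xml";

<rss version="2.0">
   <channel>
      <title>UK Shipping Forecast</title>
      <link>http://www.metoffice.gov.uk/</link>
      <description>Scraped area forecasts (illustrative feed only)</description>
      {for $forecast in met:get-forecast()
       return
          <item>
             <title>{string($forecast/area)}</title>
             <description>{met:forecast-as-text($forecast)}</description>
          </item>
      }
   </channel>
</rss>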

SMS service
One possible use of this data would be to provide an on-request SMS service, taking an area name and returning the abbreviated forecast. The complete set of forecasts is created, and the one for the area supplied in the message is selected and returned as an abbreviated message.

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

let $area := lower-case(request:get-parameter("text", ()))
let $forecast := met:get-forecast()[lower-case(area) = $area]
return
   if (exists($forecast))
   then concat("Reply: ", met:abbreviate(met:forecast-as-text($forecast)))
   else concat("Reply: Area ", $area, " not recognised")

The calling protocol is determined by the SMS service installed at UWE, described here [2].

Execute [11]

Caching
Fetching the JavaScript on demand is neither efficient nor acceptable net behaviour, and since the forecast times are known, it is preferable to fetch the data on a schedule, convert to the XML form and save in the eXist database and then use the cached XML for later requests.

Store XML forecast


import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

declare variable $col := "/db/Wiki/Met/Forecast";

if (xmldb:login($col, "user", "password"))   (: a user who has write access to the Forecast collection :)
then
   let $forecast := met:get-forecast()
   let $forecastDateTime := met:timestamp-to-xs-date(($forecast/shipIssueTime)[1])   (: convert to xs:dateTime :)
   let $store :=
      xmldb:store(
         $col,                      (: collection to store the forecast in :)
         "shippingForecast.xml",    (: file name - overwrite is OK here as we only want the latest :)
         <ShippingForecast at="{$forecastDateTime}">   (: then the constructed XML to be stored :)
            {$forecast}
         </ShippingForecast>
      )
   return
      <result>
         Shipping forecast for {string($forecastDateTime)} stored in {$store}
      </result>
else ()

The timestamp used on the source data is converted to an xs:dateTime for ease of later processing.

declare function met:timestamp-to-xs-date($dt as xs:string) as xs:dateTime {
   (: convert timestamps in the form 0505 Tue 08 Jul to xs:dateTime :)
   let $year := year-from-date(current-date())   (: assume the current year since none is provided :)
   let $dtp := tokenize($dt, " ")
   let $mon := index-of(("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"), $dtp[4])
   let $monno := if ($mon < 10) then concat("0", $mon) else $mon
   return
      xs:dateTime(concat($year, "-", $monno, "-", $dtp[3], "T",
                         substring($dtp[1], 1, 2), ":", substring($dtp[1], 3, 4), ":00"))
};

Reducing the forecast data


The raw data contains redundant elements (several versions of the area name) and elements which are normally empty (all gale-related elements when there is no gale warning), but lacks a case-normalised area name to use as a key. The following function performs this restructuring:

declare function met:reduce($forecast as element(forecast)) as element(forecast) {
   <forecast>
      {attribute area {lower-case($forecast/area)}}
      {$forecast/*
         [not(name(.) = ("shipIssueTime", "area", "key"))]
         [if (../galeinforce = "0")
          then not(name(.) = ("galeinforce", "gale", "galeIssueTime"))
          else true()
         ]
      }
   </forecast>
};

There would be a case for using XSLT for this transformation. The caching script applies this transformation to the forecast before saving.


SMS via cache


The revised SMS script can now access the cache. First a function to get the stored forecast:
declare function met:get-stored-forecast($area as xs:string) as element(forecast) {
   doc("/db/Wiki/Met/Forecast/shippingForecast.xml")/ShippingForecast/forecast[@area = $area]
};

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

let $area := lower-case(normalize-space(request:get-parameter("text", ())))
let $forecast := met:get-stored-forecast($area)
return
   if (exists($forecast))
   then concat("Reply: ",
               datetime:format-dateTime($forecast/../@at, "HH:mm"), " ",
               met:abbreviate(met:forecast-as-text($forecast)))
   else concat("Reply: Area ", $area, " not recognised")

In this script, the selected forecast for the input area extracted by the met function call is a reference to the database element, not a copy. Thus it is still possible to navigate back to the parent element containing the timestamp. The eXist datetime functions are wrappers for the Java class java.text.SimpleDateFormat [12], which defines the date formatting syntax.

Lundy [13]
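As an aside, the same SimpleDateFormat patterns cover longer forms too; a minimal sketch using only the formatting function already shown above:

datetime:format-dateTime(current-dateTime(), "EEE dd MMM HH:mm")
(: yields something like "Mon 07 Jul 05:05" :)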

Job scheduling
eXist includes a scheduler module which is a wrapper for the Quartz scheduler [14]. Jobs can only be created by a DBA user. For example, to set a job to fetch the shipping forecast on the hour:

let $login := xmldb:login("/db", "admin", "admin password")
let $job := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq", "0 0 * * * ?")
return $job


where "0 0 * * * ?" means to run at 0 seconds, 0 minutes past every hour of every day of every month, ignoring the day of the week. To check on the set of scheduled jobs, including system schedule jobs: let $login := xmldb:login( "/db", "admin", "admin password" ) return scheduler:get-scheduled-jobs()

It would be better to schedule jobs on the basis of the update schedule for the forecast. These times are 0015, 0505, 1130 and 1725. They cannot be fitted into a single cron pattern, so multiple jobs are required. Because jobs are identified by their path, the same URL cannot be used for all instances, so a dummy parameter is added.

Discussion: The times are one minute later than the published times. This may not be enough slack to account for discrepancies in timing on both sides. Clearly a push from the UK Met Office would be better than this pull scraping. The scheduler clock runs in local time (BST), as are the publication times.

let $login := xmldb:login("/db", "admin", "admin password")
let $job1 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=1", "0 16 0 * * ?")
let $job2 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=2", "0 6 5 * * ?")
let $job3 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=3", "0 31 11 * * ?")
let $job4 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=4", "0 26 17 * * ?")
return ($job1, $job2, $job3, $job4)

Forecast as kml
Sea area coordinates
The UK Met Office provides a clickable map [1] of forecasts, but a KML map would be nice. The coordinates [15] of the sea areas can be captured and manually converted to XML.

<?xml version="1.0" encoding="UTF-8"?>
<boundaries>
   <boundary area="viking">
      <point latitude="61" longitude="0"/>
      <point latitude="61" longitude="4"/>
      <point latitude="58.5" longitude="4"/>
      <point latitude="58.5" longitude="0"/>
   </boundary>
   ...

The boundary for an area is accessed by two functions. In this idiom one function hides the document location and returns the root of the document; subsequent functions use this base function to get the document and then apply further predicates to filter as required.

declare function met:area-boundaries() as element(boundaries) {
   doc("/db/Wiki/Met/shippingareas.xml")/boundaries
};

declare function met:area-boundary($area as xs:string) as element(boundary) {
   met:area-boundaries()/boundary[@area = $area]
};


The centre of an area can be roughly computed by averaging the latitudes and longitudes:

declare function met:area-centre($boundary as element(boundary)) as element(point) {
   <point latitude="{round(sum($boundary/point/@latitude) div count($boundary/point) * 100) div 100}"
          longitude="{round(sum($boundary/point/@longitude) div count($boundary/point) * 100) div 100}"/>
};

kml Placemark
We can generate a kml Placemark from a forecast:

declare function met:forecast-to-kml($forecast as element(forecast)) as element(Placemark) {
   let $area := $forecast/@area
   let $boundary := met:area-boundary($area)
   let $centre := met:area-centre($boundary)
   return
      <Placemark>
         <name>{string($forecast/areapresentation)}</name>
         <description>{met:forecast-as-text($forecast)}</description>
         <Point>
            <coordinates>
               {string-join(($centre/@longitude, $centre/@latitude), ",")}
            </coordinates>
         </Point>
      </Placemark>
};


kml sea area


Since we have the area coordinates, we can also generate the boundaries as a line in kml.

declare function met:sea-area-to-kml($area as xs:string, $showname as xs:boolean) as element(Placemark) {
   let $boundary := met:area-boundary($area)
   return
      <Placemark>
         {if ($showname) then <name>{$area}</name> else ()}
         <LineString>
            <coordinates>
               {string-join(
                  for $point in $boundary/point
                  return string-join(($point/@longitude, $point/@latitude, "0"), ",")
                , " ")
               }
            </coordinates>
         </LineString>
      </Placemark>
};

Generate the kml file


import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

(: set the media type for a kml file :)
declare option exist:serialize "method=xml indent=yes media-type=application/vnd.google-earth.kml+xml";

(: set the file name and extension when saved, to allow GoogleEarth to be invoked :)
let $dummy := response:set-header('Content-Disposition', 'inline;filename=shipping.kml;')

(: get the latest forecast :)
let $shippingForecast := met:get-stored-forecast()
return
<kml>
   <Folder>
      <name>{datetime:format-dateTime($shippingForecast/@at, "EEEE HH:mm")}
            UK Met Office Shipping forecast</name>
      {for $forecast in $shippingForecast/forecast
       return (
          met:forecast-to-kml($forecast),
          met:sea-area-to-kml($forecast/@area, false())
       )
      }
   </Folder>
</kml>

raw kml [16] on GoogleMap [17]


Push messages
An alternative use of this data is to provide a channel to push the forecasts through as soon as they are received. The channel could be a SMS alert to subscribers or a dedicated Twitter stream which users could follow.

Subscription SMS
This service should allow a user to request an alert for a specific area or areas. The application requires:
* a data structure to record subscribers and their areas
* a web service to register a user, their mobile phone number and initial area [to do]
* an SMS service to change the required area and turn messaging on or off
* a scheduled task to push the SMS messages when the new forecast has been obtained

Document Structure
<subscriptions>
   <subscription>
      <username>Fred Bloggs</username>
      <password>hafjahfjafa</password>
      <mobilenumber>447777777</mobilenumber>
      <area>lundy</area>
      <status>off</status>
   </subscription>
   ...
</subscriptions>

XML Schema (to be completed)
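The registration web service itself is still marked as a to-do above. Purely as a sketch of one way it might work (the request parameter names and the use of eXist's update extension are assumptions, and the password is stored unhashed only to keep the example short), a registration script could append a new subscription element to this document:

let $login := xmldb:login("/db", "user", "password")
let $subscription :=
   <subscription>
      <username>{request:get-parameter("username", ())}</username>
      <password>{request:get-parameter("password", ())}</password>
      <mobilenumber>{request:get-parameter("from", ())}</mobilenumber>
      <area>{lower-case(request:get-parameter("area", "lundy"))}</area>
      <status>on</status>
   </subscription>
return (
   update insert $subscription into doc("/db/Wiki2/shippingsubscriptions.xml")/subscriptions,
   <result>subscription added</result>
)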

Access control
Access to this document needs to be controlled. The first level of access control is to place the file in a collection which is not accessible via the web. On the UWE server, the root (via mod-rewrite) is the collection /db/Wiki, so resources in this directory and its subdirectories are accessible, subject to the access settings on the file, but files in parent or sibling directories are not. So this document is stored in the directory /db/Wiki2. The URL of this file, relative to the external root, is http://www.cems.uwe.ac.uk/xmlwiki/../Wiki2/shippingsubscriptions.xml [18] but access fails.

The second level of control is to set the owner and permissions on the file. This is needed because a user on a client behind the firewall, using the internal server address, would gain access to this file. By default, world permissions are set to read and update. Removing this access requires the script to log in to read as group or owner. Ownership and permissions can be set either via the web client or by functions in the eXist xmldb module.


SMS push
This function takes a subscription, formulates a text message and calls a general sms:send function to send it. This interfaces with our SMS service provider.

declare function met:push-sms($subscription as element(subscription)) as element(result) {
   let $area := $subscription/area
   let $forecast := met:get-stored-forecast($area)
   let $time := datetime:format-dateTime($forecast/../@at, "EE HH:mm")
   let $text := encode-for-uri(concat($area, " ", $time, " ", met:abbreviate(met:forecast-as-text($forecast))))
   let $number := $subscription/mobilenumber
   let $sent := sms:send($number, $text)
   return <result number="{$number}" area="{$area}" sent="{$sent}"/>
};

SMS push subscriptions


First we need to get the active subscriptions. The functions follow the same idiom used for boundaries:

declare function met:subscriptions() {
    doc("/db/Wiki2/shippingsubscriptions.xml")/subscriptions
};

declare function met:active-subscriptions() as element(subscription)* {
    met:subscriptions()/subscription[status = "on"]
};

and then to iterate through the active subscriptions and report the result:

declare function met:push-subscriptions() as element(results) {
    <results>
        {
        let $dummy := xmldb:login("/db", "webuser", "password")
        for $subscription in met:active-subscriptions()
        return met:push-sms($subscription)
        }
    </results>
};

This script iterates through the currently active subscriptions and calls the push-SMS function for each one.

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

met:push-subscriptions()


This task could be scheduled to run after the caching task has run, or the caching script could be modified to invoke the subscription task when it has completed. However, eXist also supports triggers, so the task could instead be triggered by the database event raised when the forecast file has been stored.
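As an illustration, the push task could be registered with the same eXist scheduler module used elsewhere in this book. The script path and cron expression below are illustrative assumptions, chosen so that the push runs shortly after the hourly forecast cache is refreshed:

(: a sketch: run the subscription push at ten minutes past each hour :)
let $login := xmldb:login("/db", "admin", "password")
return
    scheduler:schedule-xquery-cron-job("/db/Wiki/Met/pushSubscriptions.xq", "0 10 * * * ?")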

Subscription editing by SMS


A message format is required to edit the status of the subscription and to change the subscription area:

metsub [ on | off | <area> ]

If the area is changed, the status is set to on. The area is validated against a list of area codes. These are extracted from the boundary data:

declare function met:area-names() as xs:string* {
    met:area-boundaries()/boundary/string(@area)
};

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

let $login := xmldb:login("/db", "user", "password")
let $text := normalize-space(request:get-parameter("text", ()))
let $number := request:get-parameter("from", ())
let $subscription := met:get-subscription($number)
return
    if (exists($subscription))
    then
        let $update :=
            if ($text = "on")
            then update replace $subscription/status with <status>on</status>
            else if ($text = "off")
            then update replace $subscription/status with <status>off</status>
            else if (lower-case($text) = met:area-names())
            then (
                update replace $subscription/area with <area>{$text}</area>,
                update replace $subscription/status with <status>on</status>
            )
            else ()
        return
            let $subscription := met:get-subscription($number)  (: get the subscription post update :)
            return concat("Reply: forecast is ", $subscription/status, " for area ", $subscription/area)
    else ()



Twitter
Twitter [19] has a simple REST API to update the status. We can use this to tweet the forecasts to a Twitter account. Twitter uses Basic Access Authentication, and a suitable XQuery function to send a message on behalf of a username/password, using the eXist httpclient module, is:

declare function met:send-tweet($username as xs:string, $password as xs:string, $tweet as xs:string) as xs:boolean {
    let $uri := xs:anyURI("http://twitter.com/statuses/update.xml")
    let $content := concat("status=", encode-for-uri($tweet))
    let $headers :=
        <headers>
            <header name="Authorization" value="Basic {util:string-to-binary(concat($username,':',$password))}"/>
            <header name="Content-Type" value="application/x-www-form-urlencoded"/>
        </headers>
    let $response := httpclient:post($uri, $content, false(), $headers)
    return $response/@statusCode = '200'
};

A script is needed to access the stored forecast and tweet the forecast for an area. Different Twitter accounts could be set up for each shipping area. The script will need to be scheduled to run after the full forecast has been acquired. In this example, the forecast for a given area is tweeted to a hard-coded Twitter account:

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

declare variable $username := "kitwallace";
declare variable $password := "mypassword";
declare variable $area := request:get-parameter("area", "lundy");

let $forecast := met:get-stored-forecast($area)
let $time := datetime:format-dateTime($forecast/../@at, "HH:mm")
let $message := concat($area, " at ", $time, ":", met:abbreviate(met:forecast-as-text($forecast)))
return <result>{met:send-tweet($username, $password, $message)}</result>

Chris Wallace's Twitter [20]



To do
Creating and editing subscriptions
This task is ideal for XForms.

Triggers
Use a trigger to push the SMS messages when the forecast update has completed.

References
[1] http://www.metoffice.gov.uk/weather/marine/shipping_forecast.html
[2] http://www.bbc.co.uk/weather/coast/shipping/index.shtml
[3] http://www.adoptioncurve.net/archives/2008/03/twittering-the-shipping-forecast.php
[4] http://www.metoffice.gov.uk/lib/includes/marine/gale_and_shipping_table.js
[5] http://www.cems.uwe.ac.uk/xmlwiki/Met/shipping.xq
[6] http://www.cems.uwe.ac.uk/xmlwiki/Met/shipping.xq?area=Fastnet
[7] http://www.cems.uwe.ac.uk/xmlwiki/Met/shippingdictionary.xml
[8] http://www.cems.uwe.ac.uk/xmlwiki/Met/shippingabbrev.xq
[9] http://www.cems.uwe.ac.uk/xmlwiki/Met/shippingabbrev.xq?area=Fastnet
[10] http://www.cems.uwe.ac.uk/xmlwiki/Met/shippingfull.xq
[11] http://www.cems.uwe.ac.uk/xmlwiki/Met/met2SMS1.xq?text=lundy
[12] http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html
[13] http://www.cems.uwe.ac.uk/xmlwiki/Met/met2SMS.xq?text=Lundy
[14] http://www.opensymphony.com/quartz/
[15] http://www.users.zetnet.co.uk/tempusfugit/marine/area_coord.htm
[16] http://www.cems.uwe.ac.uk/xmlwiki/Met/forecast2kml.xq
[17] http://maps.google.co.uk/maps?q=http://www.cems.uwe.ac.uk/xmlwiki/Met/forecast2kml.xq
[18] http://www.cems.uwe.ac.uk/xmlwiki/../Wiki2/shippingsubscriptions.xml
[19] http://twitter.com
[20] http://twitter.com/kitwallace



Unzipping an Office Open XML docx file


Motivation
You want to uncompress a docx file.

Method
We will use the compression:unzip() function used in the prior example and pass it a local function that handles each uncompressed entry.

File Names
Some file names in docx files, such as '[Content_Types].xml', are not valid URIs, so these must be renamed to names with valid URIs. Here is a typical list of the path names in a docx file:

<item path="[Content_Types].xml" type="resource">Types</item>
<item path="_rels/.rels" type="resource">Relationships</item>
<item path="word/_rels/document.xml.rels" type="resource">Relationships</item>
<item path="word/document.xml" type="resource">w:document</item>
<item path="word/theme/theme1.xml" type="resource">a:theme</item>
<item path="word/settings.xml" type="resource">w:settings</item>
<item path="word/fontTable.xml" type="resource">w:fonts</item>
<item path="word/webSettings.xml" type="resource">w:webSettings</item>
<item path="docProps/app.xml" type="resource">Properties</item>
<item path="docProps/core.xml" type="resource">cp:coreProperties</item>
<item path="word/styles.xml" type="resource">w:styles</item>

Note that there are three subfolders created (_rels, word and docProps). The XML files are stored in these folders.

unzip-docx function
The following function is used to unzip a docx file. Its name must be passed as a parameter to the unzip function to tell it what to do with each entry in the docx file. Note that you must pass in parameters to this function from the calling function. unzip-docx function:
declare function local:unzip-docx($path as xs:string, $data-type as xs:string, $data as item()?, $param as item()*) {
    if ($param[@name eq 'list']/@value eq 'true')
    then
        <item path="{$path}" data-type="{$data-type}"/>
    else
        let $base-collection := $param[@name="base-collection"]/@value/string()
        let $zip-collection :=
            concat(
                functx:substring-before-last($param[@name="zip-filename"]/@value, '.'),
                '_',
                functx:substring-after-last($param[@name="zip-filename"]/@value, '.'),
                '_parts/'
            )
        let $inner-collection := functx:substring-before-last($path, '/')
        let $filename := if (contains($path, '/')) then functx:substring-after-last($path, '/') else $path
        (: we need to encode the filename to account for filenames with illegal characters like [Content_Types].xml :)
        let $filename := xmldb:encode($filename)
        let $target-collection := concat($base-collection, $zip-collection, $inner-collection)
        let $mkdir :=
            if (xmldb:collection-available($target-collection))
            then ()
            else xmldb:create-collection($base-collection, concat($zip-collection, $inner-collection))
        let $store :=
            (: ensure mimetype is set properly for .docx rels files :)
            if (ends-with($filename, '.rels'))
            then xmldb:store($target-collection, $filename, $data, 'application/xml')
            else xmldb:store($target-collection, $filename, $data)
        return
            <result object="{$path}" destination="{concat($target-collection, '/', $filename)}"/>
};

unzip function
declare function local:unzip($base-collection as xs:string, $zip-filename as xs:string, $action as xs:string) {
    if (not($action = ('list', 'unzip')))
    then <error>Invalid action</error>
    else
        let $file := util:binary-doc(concat($base-collection, $zip-filename))
        let $entry-filter := util:function(QName("local", "local:unzip-entry-filter"), 3)
        let $entry-filter-params := ()
        let $entry-data := util:function(QName("local", "local:unzip-docx"), 4)
        let $entry-data-params := (
            if ($action eq 'list') then <param name="list" value="true"/> else (),
            <param name="base-collection" value="{$base-collection}"/>,
            <param name="zip-filename" value="{$zip-filename}"/>
        )
        let $login := xmldb:login('/db', 'admin', '')
        (: recursion :)
        let $unzip := compression:unzip($file, $entry-filter, $entry-filter-params, $entry-data, $entry-data-params)
        return
            <results action="{$action}">{$unzip}</results>
};
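The function above also expects a local:unzip-entry-filter function, which is not shown in this example. A minimal sketch that simply accepts every entry is given below; the three-argument signature is assumed from the util:function reference above, and the parameter types are illustrative:

(: a sketch of the entry filter assumed above: accept every entry in the zip :)
declare function local:unzip-entry-filter($path as xs:string, $data-type as xs:string, $param as item()*) as xs:boolean {
    true()
};

A more selective filter could, for example, return true() only for paths that start with 'word/'.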


Sample Driver
let $collection := '/db/test/'
let $zip-filename := 'hello-world.docx'
let $action := 'unzip'   (: valid actions: 'list', 'unzip' :)
return
    local:unzip($collection, $zip-filename, $action)
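Once the parts have been stored, the main document part can be queried in place. The following is a sketch; the collection name follows the naming pattern produced by local:unzip-docx above, so '/db/test/hello-world_docx_parts' is an assumption derived from that code rather than a fixed eXist convention:

(: a sketch: extract the plain text of each paragraph from the unzipped document part :)
declare namespace w="http://schemas.openxmlformats.org/wordprocessingml/2006/main";

for $p in doc('/db/test/hello-world_docx_parts/word/document.xml')//w:p
return string-join($p//w:t, '')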

Updates and Namespaces


Motivation
You want to update an XML file and understand how the XQuery update process might impact the namespace of the document the next time it is viewed.

Method
We will use the XQuery update statement to change the values of XML documents and note how the default output changes with respect to how the default namespaces and namespace prefixes are rendered. XQuery updates can impact how namespaces are rendered.

Example
Suppose we have a task that includes a default namespace like the following XML document.

Example XML document using a default namespace:

<task xmlns="http://www.example.com/task">
    <id></id>
    <task-name>Task Name</task-name>
    <task-description>Task Description</task-description>
</task>

In the following example we refer to this XML document as $doc. Most people that want to view XML prefer to use a default namespace and not clutter the entire document with unnecessary prefixes.

Suppose we have just saved this file and we now want to add an ID value to the <id> element. After we update this XML file with the following update statement we note that the serialization will change.

update replace $doc/task:task/task:id with <task:id>123</task:id>

The next time we view this document we see the following:

<task:task xmlns:task="http://www.example.com/task">
    <task:id>123</task:id>
    <task:task-name>Task Name</task:task-name>
    <task:task-description>Task Description</task:task-description>
</task:task>

This is known as a "fully qualified" document, where the namespace prefix of every element is shown explicitly. It is technically equivalent to the prior example, but it may not be what you would like.


Updating the Element Value, not the entire Element


To get around this we can just update the text() node of the id instead of the entire id element:

update replace $doc//task:id/text() with 123

or you can use the update value syntax:

update value $doc//task:id with 123

Using either of these techniques will not dirty up your default namespace structures.
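For the task: prefix used in these expressions to resolve, the query prolog needs a matching namespace declaration. A minimal, self-contained sketch is shown below; the document path is illustrative:

(: a sketch: bind the task prefix and update only the element value,
   leaving the default-namespace serialization untouched :)
declare namespace task = "http://www.example.com/task";

let $doc := doc("/db/test/task.xml")
return update value $doc//task:id with "123"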

Acknowledgments
Joe Wicentowski was kind enough to make these observations and provide samples.



Uploading Files
Motivation
You want to upload files to your eXist database using simple HTML forms.

Method
We will use the HTML <input> element in the web form and the xmldb:store() function in an XQuery script.

HTML Form
We will use a standard HTML form but we will add an enctype="multipart/form-data" attribute.

<form enctype="multipart/form-data" method="post" action="upload-document.xq">
    <fieldset>
        <legend>Upload Document:</legend>
        <input type="file" name="file"/>
        <input type="submit" value="Upload"/>
    </fieldset>
</form>

Screen Image:

XQuery
On the server side, we will use request:get-uploaded-file-name() to get the name of the incoming file and request:get-uploaded-file-data() to get the data from the file. We can then use the xmldb:store() function to save the file. File: upload-document.xq
let $collection := '/db/test/upload-test'
let $filename := request:get-uploaded-file-name('file')
(: make sure you log in as a user that has write access to this collection :)
let $login := xmldb:login($collection, 'admin', 'my-admin-password')
let $store := xmldb:store($collection, $filename, request:get-uploaded-file-data('file'))
return
    <results>
        <message>File {$filename} has been stored at collection={$collection}.</message>
    </results>



Acknowledgments
This example was posted on the eXist-open mailing list by Rémi Arnaud on Nov. 05, 2010.

Uptime monitor
Motivation
You would like to monitor the service availability of several web sites or web services. You would like to do this all with XQuery and store the results in XML files. You would also like to see "dashboard" graphical displays of uptime. There are several commercial services (Pingdom [1], Host-tracker [2]) which will monitor the performance of your web sites in terms of uptime and response time.

Although the production of a reliable service requires a network of servers, the basic functionality can be performed using XQuery in a few scripts.

Method
This approach focuses on the uptime and response time of web pages. The core approach is to use the eXist job scheduler to execute an XQuery script at regular time intervals. This script performs an HTTP GET on a URI and records the statusCode of the site in an XML data file. The operation is timed to gather response times from elapsed time (valid on a lightly used server) and the test results are stored. Reports can then be run from the test results, and alerts sent when a site is observed to be down. Even though this is a prototype, access to fine-grained data has already revealed some response time issues on one of the sites at the University. Watch list [3]

Conceptual Model
This ER model was created in QSEE, which can also generate SQL or XSD.

In this notation the bar indicates that Test is a weak entity with existence dependence on Watch.



Mapping ER model to Schemas


Watch-Test relationship

Since Test is dependent on Watch, the Watch-Test relationship can be implemented as composition, with the multiple Test elements contained in a Log element which is itself a child of the Watch element. Tests are stored in chronological order.

Watch Composition

Two possible approaches:
- add the Log as an element amongst the base data for the Watch:
  Watch
      uri
      name
      Log
          Test
- construct a Watch element which contains the Watch base data as a WatchSpec element together with the Log:
  Watch
      WatchSpec (the Watch entity)
          uri
          name
      Log

The second approach preserves the original Watch entity as a node, and also fits with the use of XForms, allowing the whole WatchSpec node to be included in a form. However it introduces a difficult-to-name intermediate, and results in paths like $watch/WatchSpec/uri when $watch/uri would be more natural. Here we choose the first approach on the grounds that it is not desirable to introduce intermediate elements in anticipation of a simpler implementation of a particular interface.

Watch entity

A Watch entity may be implemented as a file or as an element in a collection. Here we choose to implement Watch as an element in a Monitor container in a single document. However this is a difficult decision and the XQuery code should hide it as much as possible.

Attribute implementation

Watch attributes are mapped to elements. Test attributes are mapped to attributes.

Schema

Model Generated

QSEE will generate an XML Schema [4]. In this mapping, all relationships are implemented with foreign keys, with key and keyref used to describe the relationship. In this case, the schema would need to be edited to implement the Watch-Test relationship by composition.

By Inference

This schema has been generated by Trang (in oXygen) from an example document created as the system runs.

Compact Relax NG:

element Monitor {
    element Watch {
        element uri { xsd:anyURI },
        element name { text },
        element Log {
            element Test {
                attribute at { xsd:dateTime },
                attribute responseTime { xsd:integer },
                attribute statusCode { xsd:integer }
            }+
        }
    }+
}

XML Schema: XML Schema [5]

Designed Schema

Editing the QSEE-generated schema results in a schema which includes the restriction on statusCodes. XML Schema [6]

Test Data

An XQuery script transforms an XML Schema (or a subset thereof) to a random instance of a conforming document. Random Document [7]

The constraint that Tests are in ascending order of the at attribute is not defined in this schema. The generator needs to be helped to generate useful test data by additional information about the length of strings and the probability distribution of enumerated values, iterations and optional elements.


Equivalent SQL implementation


CREATE TABLE Watch(
    uri  VARCHAR(8) NOT NULL,
    name VARCHAR(8) NOT NULL,
    CONSTRAINT pk_Watch PRIMARY KEY (uri)
);

CREATE TABLE Test(
    at           TIMESTAMP NOT NULL,
    responseTime INTEGER NOT NULL,
    statusCode   INTEGER NOT NULL,
    uri          VARCHAR(8) NOT NULL,
    CONSTRAINT pk_Test PRIMARY KEY (at, uri)
);

ALTER TABLE Test
    ADD INDEX (uri),
    ADD CONSTRAINT fk1_Test_to_Watch FOREIGN KEY(uri) REFERENCES Watch(uri)
        ON DELETE RESTRICT ON UPDATE RESTRICT;

In the relational implementation the primary key uri of Watch is the foreign key of Test. There would be an advantage in adding a system-generated id to use in place of this meaningful URI, both to remove the redundancy created and to reduce the size of the foreign key. However a mechanism is then needed to allocate unique ids.


Implementation
Dependencies

eXist-db modules:
- xmldb - for database update and login
- datetime - for date formatting
- util - for the system-time function
- httpclient - for HTTP GET
- scheduler - to schedule the monitoring task
- validation - for database validation

Other:
- Google Charts

Functions
All functions are held in a single XQuery module:

module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor";

Database Access
Access to the Monitor database, which may be a local database document or a remote document:

declare function monitor:get-watch-list($base as xs:string) as element(Watch)* {
    doc($base)/Monitor/Watch
};

A specific Watch entity is identified by its URI:

let $wl := monitor:get-watch-list("/db/Wiki/Monitor3/monitor.xml")

Further references to a Watch are by reference, e.g.

declare function monitor:get-watch-by-uri($base as xs:string, $uri as xs:string) as element(Watch)* {
    monitor:get-watch-list($base)[uri = $uri]
};


Executing Tests
The test does an HTTP GET on the uri. The GET is bracketed by calls to util:system-time() to compute the elapsed wall-clock time in milliseconds. The test report includes the statusCode.
declare function monitor:run-test($watch as element(Watch)) as element(Test) {
    let $uri := $watch/uri
    let $start := util:system-time()
    let $response := httpclient:get(xs:anyURI($uri), false(), ())
    let $end := util:system-time()
    let $runtimems := (($end - $start) div xs:dayTimeDuration('PT1S')) * 1000
    let $statusCode := string($response/@statusCode)
    return
        <Test at="{current-dateTime()}" responseTime="{$runtimems}" statusCode="{$statusCode}"/>
};

The generated test is appended to the end of the log:

declare function monitor:put-test($watch as element(Watch), $test as element(Test)) {
    update insert $test into $watch/Log
};

To execute the test, a script logs in, iterates through the Watch entities and, for each, executes the test and stores the result:

import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";

let $login := xmldb:login("/db/", "user", "password")
let $base := "/db/Wiki/Monitor3/Monitor.xml"
for $watch in monitor:get-watch-list($base)
let $test := monitor:run-test($watch)
let $update := monitor:put-test($watch, $test)
return $update



Job scheduling
A job is scheduled to run this script every 5 minutes:

let $login := xmldb:login("/db", "user", "password")
return
    scheduler:schedule-xquery-cron-job("/db/Wiki/Monitor/runTests.xq", "0 0/5 * * * ?")

Index page
The index page is based on a supplied Monitor document, by default the production database.
import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";

declare option exist:serialize "method=xhtml media-type=text/html";

declare variable $heading := "Monitor Index";
declare variable $base := request:get-parameter("base", "/db/Wiki/Monitor3/Monitor.xml");

<html>
    <head>
        <title>{$heading}</title>
    </head>
    <body>
        <h1>{$heading}</h1>
        <ul>
            {for $watch in monitor:get-watch-list($base)
             return
                <li>{string($watch/name)}&#160;&#160;
                    <a href="report.xq?base={encode-for-uri($base)}&amp;uri={encode-for-uri($watch/uri)}">Report</a>
                </li>
            }
        </ul>
    </body>
</html>

In this implementation, the URI of the monitor document is passed to dependent scripts as a URL parameter. An alternative would be to pass this data via a session variable. View [3]



Reporting
Reporting draws on the log of Tests for a Watch:

declare function monitor:get-tests($watch as element(Watch)) as element(Test)* {
    $watch/Log/Test
};

Overview Report
The basic report shows summary data about the watched URI and an embedded chart of response time over time. Up-time is the ratio of tests with a status code of 200 to the total number of tests.
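As a standalone illustration of that uptime calculation, the helper below could be added to the module. It is a sketch only and is not part of the original monitor.xqm:

(: a sketch: uptime as a percentage for one Watch, using the functions defined above :)
declare function monitor:uptime($watch as element(Watch)) as xs:double {
    let $tests := monitor:get-tests($watch)
    let $up := $tests[@statusCode = "200"]
    return round(count($up) div count($tests) * 100)
};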
import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";

declare option exist:serialize "method=xhtml media-type=text/html";

let $base := request:get-parameter("base", ())
let $uri := request:get-parameter("uri", ())
let $watch := monitor:get-watch-by-uri($base, $uri)
let $tests := monitor:get-tests($watch)
let $countAll := count($tests)
let $uptests := $tests[@statusCode = "200"]
let $last24hrs := $tests[position() > ($countAll - 24 * 12)]
let $heading := concat("Performance results for ", string($watch/name))
return
<html>
    <head>
        <title>{$heading}</title>
    </head>
    <body>
        <h3><a href="index.xq">Index</a></h3>
        <h1>{$heading}</h1>
        <h2><a href="{$watch/uri}">{string($watch/uri)}</a></h2>
        {if (empty($tests))
         then ()
         else
            <div>
                <table border="1">
                    <tr>
                        <th>Monitoring started</th>
                        <td>{datetime:format-dateTime($tests[1]/@at, "EE dd/MM HH:mm")}</td>
                    </tr>
                    <tr>
                        <th>Latest test</th>
                        <td>{datetime:format-dateTime($tests[last()]/@at, "EE dd/MM HH:mm")}</td>
                    </tr>
                    <tr>
                        <th>Minimum response time</th>
                        <td>{min($tests/@responseTime)} ms</td>
                    </tr>
                    <tr>
                        <th>Average response time</th>
                        <td>{round(sum($tests/@responseTime) div count($tests))} ms</td>
                    </tr>
                    <tr>
                        <th>Maximum response time</th>
                        <td>{max($tests/@responseTime)} ms</td>
                    </tr>
                    <tr>
                        <th>Uptime</th>
                        <td>{round(count($uptests) div count($tests) * 100)} %</td>
                    </tr>
                    <tr>
                        <th>Raw Data</th>
                        <td><a href="testData.xq?base={encode-for-uri($base)}&amp;uri={encode-for-uri($uri)}">View</a></td>
                    </tr>
                    <tr>
                        <th>Response Distribution</th>
                        <td><a href="responseDistribution.xq?base={encode-for-uri($base)}&amp;uri={encode-for-uri($uri)}">View</a></td>
                    </tr>
                </table>
                <h2>Last 24 hours</h2>
                {monitor:responseTime-chart($last24hrs)}
                <h2>1 hour averages</h2>
                {monitor:responseTime-chart(monitor:average($tests, 12))}
            </div>
        }
    </body>
</html>

View [8]
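The report above calls a monitor:average function that is not shown elsewhere in this chapter. A minimal sketch of one possible implementation is given below; it collapses each run of $n consecutive Tests into a single averaged Test so the result can be fed straight to monitor:responseTime-chart. The function body is an assumption, not the original code:

(: a sketch: average the response times of each group of $n consecutive tests :)
declare function monitor:average($tests as element(Test)*, $n as xs:integer) as element(Test)* {
    for $i in (0 to (count($tests) idiv $n) - 1)
    let $group := subsequence($tests, $i * $n + 1, $n)
    return
        <Test at="{$group[1]/@at}"
              responseTime="{round(sum($group/@responseTime) div count($group))}"
              statusCode="{$group[last()]/@statusCode}"/>
};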



Response time graph


The graph is generated using the Google Chart API [1]. The default vertical scale from 0 to 100 fits the typical response time. In this simple example, the graph is left unadorned and unexplained.

declare function monitor:responseTime-chart($test as element(Test)*) as element(img) {
    let $points := string-join($test/@responseTime, ",")
    let $chartType := "lc"
    let $chartSize := "300x200"
    let $uri := concat("http://chart.apis.google.com/chart?",
                       "cht=", $chartType, "&amp;chs=", $chartSize, "&amp;chd=t:", $points)
    return <img src="{$uri}"/>
};

Response Time Frequency Distribution


The frequency distribution summarises the response times. First the distribution itself is computed as a sequence of groups. The interval calculation is crude and uses 11 groups to fit with Google Chart.

declare function monitor:response-distribution($test as element(Test)*) as element(Distribution) {
    let $times := $test/@responseTime
    let $min := min($times)
    let $max := max($times)
    let $range := $max - $min
    let $step := round($range div 10)
    return
        <Distribution>
            {for $i in (0 to 10)
             let $low := $min + $i * $step
             let $high := $low + $step
             return
                <Group i="{$i}" mid="{round(($low + $high) div 2)}"
                       count="{count($times[. >= $low][. < $high])}"/>
            }
        </Distribution>
};

This grouped distribution can then be charted as a bar chart. Scaling is needed in this case.

declare function monitor:distribution-chart($distribution as element(Distribution)) as element(img) {
    let $maxcount := max($distribution/Group/@count)
    let $scale := 100 div $maxcount
    let $points := string-join($distribution/Group/xs:string($scale * @count), ",")
    let $chartType := "bvs"
    let $chartSize := "300x200"
    let $uri := concat("http://chart.apis.google.com/chart?",
                       "cht=", $chartType, "&amp;chs=", $chartSize, "&amp;chd=t:", $points)
    return <img src="{$uri}"/>
};

Finally a script to create a page:

import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";

declare option exist:serialize "method=xhtml media-type=text/html";

let $base := request:get-parameter("base", ())
let $uri := request:get-parameter("uri", ())
let $watch := monitor:get-watch($base, $uri)
let $tests := monitor:get-tests($watch)
let $heading := concat("Distribution for ", string($watch/name))
let $distribution := monitor:response-distribution($tests)
return
<html>
    <head>
        <title>{$heading}</title>
    </head>
    <body>
        <h1>{$heading}</h1>
        {monitor:distribution-chart($distribution)}
        <br/>
        <table border="1">
            <tr>
                <th>I</th>
                <th>Mid</th>
                <th>Count</th>
            </tr>
            {for $group in $distribution/Group
             return
                <tr>
                    <td>{string($group/@i)}</td>
                    <td>{string($group/@mid)}</td>
                    <td>{string($group/@count)}</td>
                </tr>
            }
        </table>
    </body>
</html>



Validation
The eXist validation module provides functions for validating a document against a schema. The Monitor document links to a schema:

let $doc := "/db/Wiki/Monitor3/Monitor.xml"
return
    <report>
        <document>{$doc}</document>
        {validation:validate-report(doc($doc))}
    </report>

Execute [9]

Alternatively, a document can be validated against any schema:

let $schema := "http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/trangmonitor.xsd"
let $doc := "/db/Wiki/Monitor3/Monitor.xml"
return
    <report>
        <document>{$doc}</document>
        <schema>{$schema}</schema>
        {validation:validate-report(doc($doc), xs:anyURI($schema))}
    </report>

Execute [10]

This is used to check that the randomly generated instance is valid:

let $schema := request:get-parameter("schema", ())
let $file := doc(concat("http://www.cems.uwe.ac.uk/xmlwiki/XMLSchema/schema2instance.xq?file=", $schema))
return
    <result>
        <schema>{$schema}</schema>
        {validation:validate-report($file, xs:anyURI($schema))}
        {$file}
    </result>

Execute [11]



Downtime alerts
The purpose of a monitor is to alert those responsible for a site to its failure. Such an alert might be by SMS, email or some other channel. The Watch entity will need to be augmented with configuration parameters.

Check if failed
First it is necessary to calculate whether the site is down. monitor:failing() returns true() if all tests in the past $watch/failMinutes minutes have not returned a statusCode of 200.

declare function monitor:failing($watch as element(Watch)) as xs:boolean {
    let $now := current-dateTime()
    let $lastTestTime := $now - $watch/failMinutes * xs:dayTimeDuration("PT1M")
    let $recentTests := $watch/Log/Test[@at > $lastTestTime]
    return
        every $t in $recentTests satisfies not($t/@statusCode = "200")
};

Check if alert already sent


If this test is executed repetitively by a scheduled job, an Alert message on the appropriate channel can be generated. However, the Alert message would then be sent every time the condition is true. It would be better to send an Alert less frequently. One approach would add Alert elements to the log, interspersed with the Tests. This does not affect the code which accesses Tests, but allows us to inhibit Alerts when one has been sent recently. alert-sent() will be true if an alert has been sent in the last $watch/alertMinutes minutes.

declare function monitor:alert-sent($watch as element(Watch)) as xs:boolean {
    let $now := current-dateTime()
    let $lastAlertTime := $now - $watch/alertMinutes * xs:dayTimeDuration("PT1M")
    let $recentAlerts := $watch/Log/Alert[@at > $lastAlertTime]
    return
        exists($recentAlerts)
};

Alert notification task


The task to check the monitor log iterates through the Watches and, for each, checks whether it is failing and no Alert has been sent in the period. If so, a message is constructed and an Alert element is added to the Log. The use of the Log to record Alert events means that no other state needs to be held, and the period with which this task executes is unrelated to the Alert period.
import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";

let $login := xmldb:login("/db/", "user", "password")
let $base := "/db/Wiki/Monitor3/Monitor.xml"
for $watch in monitor:get-watch-list($base)
return
    if (monitor:failing($watch) and not(monitor:alert-sent($watch)))
    then
        let $update := update insert <Alert at="{current-dateTime()}"/> into $watch/Log
        let $alert := monitor:send-alert($watch, $message)
        return true()
    else false()
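monitor:send-alert itself is not defined in this chapter; how the alert is delivered depends on the channel chosen (SMS, email, or something else). A minimal placeholder sketch that simply writes to the eXist log is shown below; in a real deployment the util:log call would be replaced by, for example, the sms:send interface described in the shipping-forecast chapter:

(: a placeholder sketch, not part of the original module :)
declare function monitor:send-alert($watch as element(Watch), $message as xs:string) as xs:boolean {
    let $log := util:log("warn", concat("ALERT for ", $watch/uri, ": ", $message))
    return true()
};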


Discussion
Alert events could be added to a separate AlertLog, but it is arguably easier to add a new class of events to the existing Log than to create a separate sequence for each. There may also be cases where the sequential relationship between Tests and Alert events is useful.

To do
- add create/edit Watch
- detect missing tests
- support analysis for date ranges by filtering tests by date prior to analysis
- improve the appearance of the charts

References
[1] http://www.pingdom.com/
[2] http://host-tracker.com/
[3] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/index.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/qseemonitor.xsd
[5] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/trangmonitor.xsd
[6] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/designmonitor.xsd
[7] http://www.cems.uwe.ac.uk/xmlwiki/XMLSchema/schema2instance.xq?file=http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/designmonitor.xsd
[8] http://www.cems.uwe.ac.uk/xmlwiki/Monitor/report.xq?uri=http://www.google.co.uk/
[9] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/validate.xq
[10] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/validateschema.xq
[11] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/validaterandom.xq?schema=http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/monitor.xsd



URL Driven Authorization


Motivation
You want to check whether a user is logged in before a page is rendered.

Method
This example will use a custom controller.xql file to do this. Sample controller.xql fragment:

(: Protected resource: user is required to log in with valid credentials.
   If the login fails or no credentials were provided, the request is redirected to the login.xml page. :)
else if ($exist:resource eq 'protected.xml') then
    let $login := local:set-user()
    return
        if ($login)
        then
            <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
                {$login}
                <view>
                    <forward url="style.xql"/>
                </view>
            </dispatch>
        else
            <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
                <forward url="login.xml"/>
                <view>
                    <forward url="style.xql"/>
                </view>
            </dispatch>
else
    (: everything else is passed through :)
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <cache-control cache="yes"/>
    </dispatch>
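The fragment relies on a local:set-user() function which is not shown. One possible sketch is given below; the request parameter names, the xmldb:login check, and the xquery.user/xquery.password attribute names are assumptions rather than part of the original example:

(: a sketch: read credentials from the request and, if they authenticate,
   expose them to the forwarded query as request attributes :)
declare function local:set-user() as element()* {
    let $user := request:get-parameter("user", ())
    let $password := request:get-parameter("password", ())
    return
        if (exists($user) and xmldb:login("/db", $user, $password))
        then (
            <set-attribute xmlns="http://exist.sourceforge.net/NS/exist" name="xquery.user" value="{$user}"/>,
            <set-attribute xmlns="http://exist.sourceforge.net/NS/exist" name="xquery.password" value="{$password}"/>
        )
        else ()
};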



URL Rewriting Basics


Motivation
You want to take simple, short, intuitive and well designed incoming URLs and map them to the appropriate structures in your database. You want to achieve the ideal of 'cool URLs' and make your XQuery apps portable within your database and to other databases.

Method
A typical URL in eXist has a format similar to the following: http://www.example.com:8080/exist/rest/db/app/search.xq?q=apple You want users to access this page through a cooler [1], less platform-dependent URL such as: http://www.example.com/search?q=apple In order to transform your URLs into the latter, cooler form, you need to understand the fundamentals of URLs in eXist.

Parts of a URL
Fundamentally, eXist's URLs consist of 3 parts:
1. The Hostname and Port: In the example above the hostname is www.example.com and the port is 8080
2. The Web Application Context: In the example above the context is /exist
3. The Path: In the example above the path is /rest/db/app/search.xq?q=apple

Customizing an eXist URL can mean targeting one or more of the 3 parts.

Rewriting Primer
Some methods below make use of eXist's URL-rewriting facility, which conceptually lets your application follow an MVC (model-view-controller) design. eXist 1.5 comes preconfigured with a working setup that embodies these principles:
1. The collection that lives below /db/myapp/, which is exposed through the REST servlet via /exist/rest/db/myapp/, can at the same time be reached through URL-rewriting at the location /exist/apps/myapp/.
2. Placing a controller.xql inside /db/myapp/ determines how the data (the model) inside this collection gets presented in the space created by URL-rewriting: it controls the view of the model.

Please read further below on how to configure URL-rewriting in version 1.4.1 of eXist to get the same setup.

Customizing URLs
Changing the Port
The port for eXist's default web server (Jetty) is 8080, and it is set in $EXIST_HOME/tools/jetty/etc/jetty.xml on line 51. You can modify this file, or you can set the port by passing the -Djetty.port=80 flag on startup. Note that how you change the port depends on how you start eXist. If you start eXist from a UNIX or DOS shell you must change the bin/startup.sh or startup.bat file. If you start eXist automatically using the UNIX tools/wrapper/exist.sh wrapper or as a Windows Service you need to change the jetty.xml file. Restart eXist. Now, with this change made, your URL will look like: http://www.example.com/exist/rest/db/app/search.xq?q=apple

instead of: http://www.example.com:8080/exist/rest/db/app/search.xq?q=apple On Unix (including Mac OS X) and Linux, you will need to run eXist as root in order to bind to port 80. Otherwise the server won't start.


Changing the Web Application Context


To trim your server's web application context from /exist to /, go to line 134 of the same $EXIST_HOME/tools/jetty/etc/jetty.xml file and change the following:

From: <Arg>/exist</Arg>
To: <Arg>/</Arg>

Restart eXist. Now, with this change made, your URL will look like:

http://www.example.com/rest/db/app/search.xq?q=apple

instead of:

http://www.example.com/exist/rest/db/app/search.xq?q=apple

Customizing the Remainder of the URL


In customizing the remainder of the URL, eXist's URL Rewriting feature becomes both powerful and challenging. (See the eXist documentation on URL rewriting [2] for complete documentation on this aspect of URLs in eXist.) The heart of eXist's URL Rewriting is a file that controls the URLs for its portion of your site; this file is called controller.xql, and you place it at the root of your web application directory. It controls all of the URLs in its directory and in child directories (although child directories can contain their own controller.xql files - more on this later). If your web application is stored on the filesystem, you would likely place controller.xql in the /webapp directory. If your web application is stored in the eXist database, you might put it in the /db collection.

In our running example app, where would you store your controller.xql?

Current form: http://www.example.com/rest/db/app/search.xq?q=apple
Goal URL: http://www.example.com/search?q=apple

A natural location for the controller.xql would be the /db/app collection, because search.xq (and presumably the other .xq files) are stored in this collection or beneath it. Given this location for our app's root controller.xql, we need to tell eXist to look for the root controller.xql in the /db/app collection. We do this by editing the controller-config.xml file in the /webapp/WEB-INF folder. Comment out lines 27-28, and add the following:

<root pattern="/*" path="xmldb:exist:///db/app"/>

Then restart eXist. This new root pattern will forward all URL requests (/*) to the /db/app collection. Now, with this change made, your URL will look like:

http://www.example.com/search.xq?q=apple

instead of:

http://www.example.com/rest/db/app/search.xq?q=apple

The final step in customizing the URL is to create a controller.xql file that will take a request for /search?q=apple and pass this request to the search.xq file along with the q parameter.

A basic controller.xql file that will accomplish this goal is as follows:

xquery version "1.0";

(:~ Default controller XQuery. Forwards '/search' to search.xq in the same directory
    and passes all other requests through. :)

(: Root path: forward to search.xq in the same collection (or directory) as the controller.xql :)
if (starts-with($exist:path, '/search'))
then
    let $query := request:get-parameter("q", ())
    return
        <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
            <forward url="search.xq"/>
            <set-attribute name="q" value="{$query}"/>
        </dispatch>
(: Let everything else pass through :)
else
    <ignore xmlns="http://exist.sourceforge.net/NS/exist">
        <cache-control cache="yes"/>
    </ignore>

Note that $exist:path is a variable that eXist makes available to controller.xql files. The value of $exist:path is always equal to the portion of the requested URL that comes after the controller's root directory. A request to '/search' will cause $exist:path to be '/search'. Save this query as controller.xql and place it in your /db/app collection. Congratulations! Our URL is now in the very cool form we had envisioned:

http://www.example.com/search?q=apple

instead of:

http://www.example.com/search.xq?q=apple

This $exist:path variable is one of 5 such variables available to controller.xql files. (See the full URL Rewriting documentation for more information on each.) These variables give you very fine control over the URLs requested as well as eXist's own internal paths to your app's resources. Since you may wish to re-route a URL request based on the URL parameters (e.g. q=apple), you may wish to retrieve the URL parameter using the request:get-parameter() function, and then to explicitly pass this parameter to the target query using the <add-parameter> element, as in the example controller.xql file.

Thus, in customizing the "path" section of the URL, we have actually paid attention to 3 items:
1. The root pattern and path to its root controller directory (recall the <root> element inside the controller-config.xml file)
2. The remainder of the path after the controller directory
3. The URL parameters included as part of the URL


This simple example only touches the surface of what you can do with URL Rewriting. Using URL Rewriting not only gives your apps 'cool URLs', but it also allows your apps to be much more portable, both on your server and in getting your apps onto other servers.


Further considerations
Defining multiple 'roots'
If you want your main app to live in /db/app but you still want to access apps such as the admin app ('/webapp/admin') stored on the filesystem, add a <root> element to controller-config.xml declaring the root pattern you want to associate with the filesystem's /webapp directory. Replace your current root elements with the following:

<root pattern="/fs" path="/"/>
<root pattern="/*" path="xmldb:exist:///db/app"/>

This will pass all URL requests beginning with /fs to the filesystem's webapp directory. All other URLs will still go to the /db/app directory.

Using multiple controller.xql files


While you can get along fine with only one controller.xql (or even none!), eXist allows controller.xql files to be placed at any level of a root controller hierarchy, as defined in the controller-config.xml's <root> element(s). This allows the controller.xql files to be highly specific to the concerns of a given directory. eXist searches for the deepest controller.xql file that matches the deepest level of the URL request, working up toward the root controller.xql.

The importance of order in the controller.xql logic


Make sure that you arrange your conditional expressions in the proper order so that the rules are evaluated in that order, and no rules are inadvertently evaluated first. In other words, if another rule matches URLs beginning with '/sea', the URL rewriter would always pass '/search' URLs to that rule instead of your '/search' rule.
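A hypothetical fragment illustrating the point: the more specific '/search' test must come before the broader '/sea' test, otherwise '/search' requests never reach their intended rule. The paths and target queries here are illustrative only:

if (starts-with($exist:path, '/search')) then
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <forward url="search.xq"/>
    </dispatch>
else if (starts-with($exist:path, '/sea')) then
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <forward url="sea.xq"/>
    </dispatch>
else
    <ignore xmlns="http://exist.sourceforge.net/NS/exist">
        <cache-control cache="yes"/>
    </ignore>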

Variable Standards
The code inside of controller.xql gets passed some variables in addition to the usual ones. The controller.xql below does not do any forwarding, but instead prints their values, and the path to the document requested, if there is one:

xquery version "1.0";

declare namespace exist="http://exist.sourceforge.net/NS/exist";
import module namespace text="http://exist-db.org/xquery/text";

declare variable $exist:root external;
declare variable $exist:prefix external;
declare variable $exist:controller external;
declare variable $exist:path external;
declare variable $exist:resource external;

let $document := concat($exist:root, (: $exist:prefix, :) $exist:controller, $exist:path)
return
    <dummy>
        <exist-root>{$exist:root}</exist-root>
        <exist-prefix>{$exist:prefix}</exist-prefix>
        <exist-controller>{$exist:controller}</exist-controller>
        <exist-path>{$exist:path}</exist-path>
        <exist-resource>{$exist:resource}</exist-resource>
        <document>{$document}</document>
    </dummy>


Acknowledgments
Joe Wicentowski contributed the core of this article to the eXist-open mailing list on Mon, 19 Oct 2009. It was subsequently edited by Dan McCreary and Joe Wicentowski into its present form.

References
[1] http://www.w3.org/Provider/Style/URI
[2] http://exist-db.org/urlrewrite.html

Using Intermediate Documents


Processing XML often involves the creation of intermediate XML fragments for subsequent processing. Here is an example of two approaches, one using multiple passes on the same data, the other a constructed intermediate view of the data.

MusicXML
MusicXML [1] is an XML application for recording music scores. There is a range of software which produces and consumes MusicXML. There are two styles of MusicXML with two related schemas, one in which measures are within parts (partwise), the other in which parts are within measures (timewise). An example of a MusicXML partwise score is Mozart's Piano Sonata in A Major, K. 331 [2]. Here is a sample definition of a note:

<note>
    <pitch>
        <step>A</step>
        <octave>3</octave>
    </pitch>
    <duration>2</duration>
    <voice>3</voice>
    <type>eighth</type>
    <stem>down</stem>
    <staff>2</staff>
    <beam number="1">begin</beam>
    <notations>
        <slur type="stop" number="1"/>
    </notations>
</note>



Notes Range
The Recordare site has some sample code to demonstrate the use of XQuery to process MusicXML [3]. The first script finds the lowest and highest notes in the score. The script shown on the site is not conformant to the current XQuery standard, but a few minor changes bring it up to date.

declare function local:MidiNote($thispitch as element(pitch)) as xs:integer {
    let $step := $thispitch/step
    let $alter := if (empty($thispitch/alter)) then 0 else xs:integer($thispitch/alter)
    let $octave := xs:integer($thispitch/octave)
    let $pitchstep :=
        if ($step = "C") then 0
        else if ($step = "D") then 2
        else if ($step = "E") then 4
        else if ($step = "F") then 5
        else if ($step = "G") then 7
        else if ($step = "A") then 9
        else if ($step = "B") then 11
        else 0
    return 12 * ($octave + 1) + $pitchstep + $alter
};

let $doc := doc("/db/Wiki/Music/examples/MozartPianoSonata.xml")
let $part := $doc//part[./@id = "P1"]
let $highnote := max(for $pitch in $part//pitch return local:MidiNote($pitch))
let $lownote := min(for $pitch in $part//pitch return local:MidiNote($pitch))
let $highpitch := $part//pitch[local:MidiNote(.) = $highnote]
let $lowpitch := $part//pitch[local:MidiNote(.) = $lownote]
let $highmeas := string($highpitch[1]/../../@number)
let $lowmeas := string($lowpitch[1]/../../@number)
return
    <result>
        <low-note>{$lowpitch[1]}
            <measure>{$lowmeas}</measure>
        </low-note>
        <high-note>{$highpitch[1]}
            <measure>{$highmeas}</measure>
        </high-note>
    </result>

With output:

<result>
    <low-note>
        <pitch>
            <step>D</step>
            <octave>2</octave>
        </pitch>
        <measure>3</measure>
    </low-note>
    <high-note>
        <pitch>
            <step>E</step>
            <octave>6</octave>
        </pitch>
        <measure>5</measure>
    </high-note>
</result>

execute [4]


Ancestor access
The path to the measure in which a note is located

let $highmeas := string($highpitch[1]/../../@number)

uses a fixed set of steps back up the hierarchy. This limits the application of this script to one type of MusicXML schema because the position of the measure in the hierarchy is different in the two schemas. When the script was written, the ancestor axis was not supported, but it is now, so those lines are more generally expressible as:

let $highmeas := string($highpitch/ancestor::measure/@number)

Note-to-midi
The function to convert notes to midi numbers uses nested if-then-else expressions. XQuery lacks a switch expression which might be used, but a clearer approach would be to use a look-up table, defined either locally in the script or stored in the database. Here a sequence of notes is created as a look-up table. This is bound to a global variable which is used in a revised note-to-midi function:

declare variable $NOTESTEP :=
    (
        <note name="C" stepNo="0"/>,
        <note name="D" stepNo="2"/>,
        <note name="E" stepNo="4"/>,
        <note name="F" stepNo="5"/>,
        <note name="G" stepNo="7"/>,
        <note name="A" stepNo="9"/>,
        <note name="B" stepNo="11"/>
    );

declare function local:MidiNote($thispitch as element(pitch)) as xs:integer {
    let $alter := xs:integer(($thispitch/alter, 0)[1])
    let $octave := xs:integer($thispitch/octave)
    let $pitchstepNo := xs:integer($NOTESTEP[@name = $thispitch/step]/@stepNo)
    return 12 * ($octave + 1) + $pitchstepNo + $alter
};


Intermediate XML
The original script required repeated access to the original MusicXML source. An alternative approach would be to create an intermediate structure to hold the midi notes and use this in subsequent analysis. This structure is a computed view of the original notes augmented with derived data - the midi note and the measure.

let $midiNotes :=
    for $pitch in $part//pitch
    return
        <pitch>
            {$pitch/*}
            <midi>{local:MidiNote($pitch)}</midi>
            <measure>{string($pitch/../../@number)}</measure>
        </pitch>

and this view is then used to locate the high and low notes and their position in the score:

let $highnote := max($midiNotes/midi)
let $lownote := min($midiNotes/midi)
let $highpitch := $midiNotes[midi = $highnote]
let $lowpitch := $midiNotes[midi = $lownote]

Revised script
declare variable $NOTESTEP :=
    (
        <note name="C" step="0"/>,
        <note name="D" step="2"/>,
        <note name="E" step="4"/>,
        <note name="F" step="5"/>,
        <note name="G" step="7"/>,
        <note name="A" step="9"/>,
        <note name="B" step="11"/>
    );

declare function local:MidiNote($thispitch as element(pitch)) as xs:integer {
    let $alter := xs:integer(($thispitch/alter, 0)[1])
    let $octave := xs:integer($thispitch/octave)
    let $name := $thispitch/step
    let $pitchstep := xs:integer($NOTESTEP[@name = $name]/@step)
    return 12 * ($octave + 1) + $pitchstep + $alter
};

let $doc := doc("/db/Wiki/Music/examples/MozartPianoSonata.xml")
let $part := $doc//part[./@id = "P1"]
let $midiNotes :=
    for $pitch in $part//pitch
    return
        <pitch>
            {$pitch/*}
            <midi>{local:MidiNote($pitch)}</midi>
            <measure>{string($pitch/ancestor::measure/@number)}</measure>
        </pitch>
let $highnote := max($midiNotes/midi)
let $lownote := min($midiNotes/midi)
return
    <result>
        <low-note>
            {$midiNotes[midi = $lownote]}
        </low-note>
        <high-note>
            {$midiNotes[midi = $highnote]}
        </high-note>
    </result>


execute [5]



Discussion
Although arguably a cleaner, more direct design, the second script relies on the construction of temporary XML nodes which are then the subject of XPath expressions. These temporary XML nodes are handled differently in different implementations. In older versions of eXist, each is written to a temporary document in the database, which creates a performance overhead and problems of garbage collection. In the 1.3 release, intermediate XML nodes remain in memory, resulting in a major performance improvement. There is, however, another problem with this approach: the size of the intermediate node may exceed pre-set, but configurable, limits on the size of constructed nodes.

References
[1] http://www.recordare.com/xml.html
[2] http://www.cems.uwe.ac.uk/xmlwiki/Music/examples/MozartPianoSonata.xml
[3] http://www.recordare.com/good/max2002%2Dupdate.html
[4] http://www.cems.uwe.ac.uk/xmlwiki/Music/noterange1.xq
[5] http://www.cems.uwe.ac.uk/xmlwiki/Music/noterange3.xq

Using Triggers to assign identifiers


Motivation
You want to automagically assign unique identifiers to all new incoming documents or XML nodes, no matter whether they are created from a controller that you scripted, result from an HTTP PUT operation in a REST call, are the product of a WebDAV COPY operation, or are uploaded from the Java admin RPC client.

Method
We will create a trigger that will fire on all document store operations. The trigger will modify the document to be stored in the database. Our identifier will be taken from the output of the util:uuid() function. A simple assignment will suffice; no special authority, such as an incremental counter, will be necessary.

System Configuration
This example assumes that the documents that you want to tag with identifiers live below /db/my-collection. eXist triggers are declared in a configuration file that is placed in the /db/system/config area with the above path added to it. Such a file with the relevant lines will look like this: /db/system/config/db/my-collection/collection.xconf
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <triggers>
        <trigger event="store" class="org.exist.collections.triggers.XQueryTrigger">
            <parameter name="url" value="xmldb:exist://localhost/db/triggers/assign-id.xq"/>
        </trigger>
    </triggers>
</collection>

Now every time a store or update event happens to this collection the XQuery script /db/triggers/assign-id.xq gets run.

Beware! You cannot, due to limitations of the current (<=1.5dev) design, attach more than one XQuery script to the same trigger event. Only the trigger declared last for an event will be used.


XQuery Script
The script will add the uuid as an attribute to the root element of the incoming document, overwriting any uuid attribute that is already there.

NOTE: These examples do not work reliably. As soon as your XQuery causes an exception in the thread it runs in, there is a great chance that it will hang indefinitely, e.g. if you store a binary resource below the path it works on. Further operations on the processed resource then will NOT trigger the script until a restart of the whole database.

xquery version "1.0";
(: An XQueryTrigger that adds a uuid to all documents when they are stored in the database. :)

declare namespace util="http://exist-db.org/xquery/util";

declare variable $local:triggerEvent external;
declare variable $local:eventType external;
declare variable $local:collectionName external;
declare variable $local:documentName external;
declare variable $local:document external;

declare variable $local:coll := "/db/my-collection";
declare variable $local:uuid := string($local:document/@uuid);
declare variable $local:match := collection($local:coll)/*[@uuid = $local:uuid];

(: This is still the xquery prolog: from my experiments, an xquery trigger MUST NOT have
   an xquery body. A severe limit: no conditionals allowed, just straight procedural action. :)

util:log('debug', '### assign-id.xq trigger fired ###'),
update insert attribute {'uuid'} {util:uuid()} into doc($local:documentName)/*,
util:log('debug', '### assign-id.xq trigger done ###')



References
Guide to configuring eXist triggers [1]

References
[1] http://exist-db.org/triggers.html

Using Triggers to Log Events


Motivation
You want to log all changes in files of a single collection.

Method
We will create a trigger that logs these events. The trigger will append an entry to a log file. There are six trigger event types:
- store: fired when a document is created in the collection or a sub-collection
- update: fired when a document is updated in the collection or a sub-collection
- remove: fired when a document is deleted from the collection or a sub-collection
- create: fired when a sub-collection is created
- rename: fired when a sub-collection is renamed
- delete: fired when a sub-collection is deleted

Sample Code
NOTE: These examples do not work reliably!

In this example we will log all store, update and remove events from the collection /db/my-collection. Here is a sample trigger configuration file. This file is placed in the /db/system/config area with the same database path added to it that you want to monitor: /db/system/config/db/my-collection

Here is what the trigger file looks like:

collection.xconf

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <triggers>
        <trigger event="store, update, remove, create, rename, delete"
                 class="org.exist.collections.triggers.XQueryTrigger">
            <parameter name="url" value="xmldb:exist://localhost/db/triggers/log-changes.xq"/>
            <parameter name="test" value="test-value"/>
        </trigger>
    </triggers>
</collection>

Note that the trigger operations are listed in the event attribute, separated by commas. When any of these operations fires, the XQuery /db/triggers/log-changes.xq gets run. You can pass parameters to this query using the parameter element.


XQuery logger
xquery version "1.0";

declare namespace request="http://exist-db.org/xquery/request";
declare namespace response="http://exist-db.org/xquery/response";
declare namespace session="http://exist-db.org/xquery/session";
declare namespace xdb="http://exist-db.org/xquery/xmldb";
declare namespace util="http://exist-db.org/xquery/util";

declare variable $local:triggerEvent external;
declare variable $local:eventType external;
declare variable $local:collectionName external;
declare variable $local:documentName external;
declare variable $local:document external;
declare variable $local:test external;
declare variable $local:triggersLogFile := "triggersLog2.xml";

(: create the log file if it does not exist :)
if (not(doc-available(concat("/db/", $local:triggersLogFile))))
then xmldb:store("/db", $local:triggersLogFile, <events/>)
else (),

(: append an entry describing this event to the log file :)
update insert
    <event ts="{current-dateTime()}"
           event="{$local:triggerEvent}"
           eventType="{$local:eventType}"
           test-1="{$local:test}"
           collectionName="{$local:collectionName}"
           documentName="{$local:documentName}">
        {$local:document}
    </event>
into doc(concat("/db/", $local:triggersLogFile))/events
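The accumulated log can be inspected with an ordinary query. The following is a sketch; the file name simply follows the $local:triggersLogFile variable used in the logger above:

(: a sketch: list the logged events, most recent first :)
for $e in doc("/db/triggersLog2.xml")/events/event
order by xs:dateTime($e/@ts) descending
return
    <event ts="{$e/@ts}" type="{$e/@event}" document="{$e/@documentName}"/>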

References
Guide to configuring eXist triggers [1]



Using XQuery Functions


Motivation
You would like to use an existing XQuery function. You need to be able to understand how functions work and how they are documented.

Method
The XQuery 1.0 specification has many built-in functions for handling strings, URIs and other data. To use a built-in function you need to know its inputs, their data types, and the form of output it creates.

Understanding XQuery Parameters and Return Types


XQuery is a strongly typed language, so functions are usually carefully designed to work with a restricted set of data types. When you pass data into a function using a parameter, you must specify its type, using combinations of elements, nodes, items, sequences or any of the XML Schema data types. In addition to the data type, a suffix called an occurrence indicator, such as "+", "?" or "*", follows the data type to indicate whether a parameter is optional or could have multiple values. Here are the three occurrence indicators and their meanings:
? matches zero or one items
* matches zero or more items
+ matches one or more items
Care should be taken to understand the difference between an argument that is a single sequence and one that is a repeating set of items. There are around 111 built-in functions in the XQuery language, but many find that 10% of the XQuery functions are used 90% of the time. Here are some of the most commonly used functions:
string-length($string as xs:string) as xs:integer - returns the length of a string as the number of its characters

Example: string-length("Hello") returns 5, since the string "Hello" is five characters long.
concat($arg1 as xs:anyAtomicType?, $arg2 as xs:anyAtomicType?, ...) as xs:string - concatenates its arguments into a single string.

The function does not accept a sequence of values, just individual atomic values passed as separate arguments. Example: concat('big', 'red', 'ball') returns "bigredball"
string-join($sequence as xs:string*, $delimiter as xs:string) as xs:string - combines the items in a sequence, separating them with a delimiter

Example: string-join(('big', 'red', 'ball'), '-') returns "big-red-ball" Note that string-join takes as its first argument a single sequence of items, whereas concat takes zero or more strings as arguments.
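To see how parameter types and occurrence indicators fit together, here is a minimal sketch of a user-defined function (the function name and data are illustrative, not part of the standard library):

declare function local:initials($names as xs:string*) as xs:string {
   (: $names accepts zero or more strings because of the * indicator :)
   string-join(
      for $name in $names
      return upper-case(substring($name, 1, 1)),
      ".")
};

local:initials(("john", "fitzgerald", "kennedy"))
(: returns "J.F.K" :)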


Sample data types


xs:date refers to the built-in atomic schema type named xs:date
attribute()? refers to an optional attribute node
element() refers to any element node
element(po:shipto, po:address) refers to an element node that has the name po:shipto and has the type annotation po:address (or a schema type derived from po:address)
element(*, po:address) refers to an element node of any name that has the type annotation po:address (or a type derived from po:address)
element(customer) refers to an element node named customer with any type annotation
schema-element(customer) refers to an element node whose name is customer (or is in the substitution group headed by customer) and whose type annotation matches the schema type declared for a customer element in the in-scope element declarations
node()* refers to a sequence of zero or more nodes of any kind
item()+ refers to a sequence of one or more nodes or atomic values
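As an illustration only (the element names and data are invented), a user-defined function signature might combine these sequence types like this:

declare function local:shipped-orders($orders as element(order)*, $limit as xs:integer) as element(order)* {
   (: accepts zero or more order elements and returns zero or more of them :)
   subsequence($orders[@shipped = 'true'], 1, $limit)
};

local:shipped-orders(
   (<order id="1" shipped="true"/>, <order id="2" shipped="false"/>, <order id="3" shipped="true"/>),
   2)
(: returns the order elements with id 1 and 3 :)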

Useful References
For the authoritative documentation of all the functions available in eXist, including those not defined in XQuery, see the XQuery Function Documentation [1]. For detailed information about how to specify sequence types see XQuery Sequence Types [1]. For the standard XQuery and XPath function library, you may also refer to XQuery 1.0 and XPath 2.0 Functions and Operators [2]. See also the detailed documentation of functions with examples from noted XQuery expert Priscilla Walmsley in the FunctX XQuery Function Library [3].

References
[1] http://www.w3.org/TR/xquery/#id-sequencetype-syntax
[2] http://www.w3.org/TR/xpath-functions
[3] http://www.xqueryfunctions.com/xq


UWE StudentsOnline
This site has been developed to support staff, students and prospective students in the Faculty of Computing, Engineering and Mathematical Sciences (CEMS) at the University of the West of England, Bristol, UK. The public face of this site is Students Online [1], with an intranet called FOLD. This site is implemented in XQuery with some XSLT on eXist-db. (more)

References
[1] http://www.cems.uwe.ac.uk/studentsonline

Validating a document
Motivation
You want to validate a document using an XML Schema.

Method
Note: Validation is a very complex topic. eXist comes with default settings that may prevent files from being added that are associated with a namespace once a schema is saved in the registry. Please be aware of these factors, which are documented here [19]. eXist supports a validation module that includes a validate() function to validate an XML file against a grammar file such as an XML Schema.
validation:validate($input-doc as item(), $schema-uri as xs:anyURI) as xs:boolean

where:
$input-doc is the document you want to validate
$schema-uri is a URI to the XML Schema you want to use to validate the document. Note that this must be of type xs:anyURI.
This function returns a single true/false value, which is true if the document is valid according to the XML Schema.

Sample Code
xquery version "1.0"; let $doc := <root> <element>test</element> </root> let $schema := '/db/test/validate/schema.xsd' (: you must run this every time the XML Schema file changes! :) let $clear := validation:clear-grammar-cache() let $result := if (validation:validate($doc, $schema))

Validating a document then "PASS" else "FAIL" return <results> {$result} </results>

382

Sample XML Schema


Here is a sample XML Schema to validate a very small XML file. In eXist 1.3 only XML files with a namespace are supported.

<xs:schema xmlns="http://example.com"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com"
           elementFormDefault="qualified">
   <xs:element name="root">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="my-data"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xs:schema>

There are two important points about this XML Schema. The targetNamespace="http://example.com" attribute indicates that this XML Schema is targeting the http://example.com namespace. The elementFormDefault="qualified" attribute indicates that as the XML parser reads the root of the file, the target namespace should be used. Without these two attributes the validation will not work.

Sample XML file to be Validated


<root xmlns="http://example.com"> <my-data>test</my-data> </root>

Getting Error Messages


The validate() function only returns a simple boolean true/false value. If there is an error in your XML file this function is not very useful for finding the errors. To assist with this process there is another function called validate-report(). It has the same arguments:

validation:validate-report($input-doc, $schema-uri)

The result can be modified to be the following:

let $result :=
   if (validation:validate($input-doc, $schema-uri))
   then "The XML File is Valid"
   else (
      "The XML File is Not Valid",
      validation:validate-report($input-doc, $schema-uri)
   )

References
Documentation on validation in eXist [19]

Validation using a Catalog


Motivation
You have a library of XML Schemas that you want to associate with namespaces.

Method
An XML Catalog file contains a list of URIs and the files used to validate them. For example, the following is a catalog file that describes how DocBook files should be validated:

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <public publicId="-//OASIS//DTD XML DocBook V4.1.2//EN"
           uri="/db/grammar/docbook.dtd"/>
   <uri name="http://www.oasis-open.org/committees/docbook/"
        uri="/db/grammar/docbook.dtd"/>
</catalog>
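A minimal sketch of how this might be used, assuming the catalog is stored at /db/grammar/catalog.xml, that a DocBook document exists at the (invented) path /db/docs/my-book.xml, and that your eXist version accepts a catalog path as the grammar argument of the validation functions:

xquery version "1.0";

(: validate against whatever grammar the catalog resolves for this document :)
let $doc := doc("/db/docs/my-book.xml")
return validation:validate-report($doc, xs:anyURI("/db/grammar/catalog.xml"))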

References
http://atomic.exist-db.org/articles/Validating%20XML%20in%20eXist.pdf


Web XML Viewer


Motivation
You want to view XML documents in your web browser, rendered with HTML markup.

Method
We will use an XQuery function that uses the dispatch pattern and the typeswitch expression.

Sample Input
<aaa a1="A1" a2="A2" a3="A3"> <bbb b1="B1" b2="B2" b3="B3">BBB</bbb> <ccc c1="C1" c2="C2" c3="C3"> <ddd d1="D1" d2="D2" d3="D3">DDD</ddd> <eee> <fff>FFF</fff> </eee> </ccc> </aaa>

Sample XML to HMTL Function


(: sequence dispatcher :)
declare function xml-to-html:dispatch($input as node()*, $depth as xs:integer) as item()* {
   let $left-margin := concat('margin-left: ', ($depth * 5), 'px')
   for $node in $input
   return
      typeswitch ($node)
         case text() return
            normalize-space(<span class="d">{$node}</span>)
         case element() return
            <div class="element" style="{$left-margin}">
               <span class="t">&lt;{name($node)}</span>
               {
               (: for each attribute create two spans for the name and value :)
               for $att in $node/@*
               return (
                  <span class="an"> {name($att)}=</span>,
                  <span class="av">"{string($att)}"</span>
               )
               }
               {normalize-space(<span class="t">&gt;</span>)}
               {
               (: now get the sub elements :)
               for $c in $node
               return xml-to-html:dispatch($c/node(), $depth + 1)
               }
               <span class="t">&lt;/{name($node)}&gt;</span>
            </div>
         (: otherwise pass it through. Used for comments and PIs :)
         default return $node
};

Sample Driver
xquery version "1.0"; import module namespace xml-to-html="http://example.com/xml-to-html" at "xml-to-html.xqm"; let $title := 'View XML as HTML' let $input := <aaa a1="A1" a2="A2" a3="A3"> <bbb b1="B1" b2="B2" b3="B3">BBB</bbb> <ccc c1="C1" c2="C2" c3="C3"> <ddd d1="D1" d2="D2" d3="D3">DDD</ddd> <eee> <fff>FFF</fff> </eee> </ccc> </aaa> let $output := xml-to-html:xml-to-html($input, 1) return <html> <head> <title>{$title}</title> <link type="text/css" rel="stylesheet" href="syntax-colors-oxygen.css"/> </head> <body> <div class="xml"> {$output} </div> </body> </html>


Sample Output
<div class="xml"> <div class="element" style="margin-left: 5px"> <span class="t">&lt;aaa</span> <span class="an">a1=</span> <span class="av">"A1"</span> <span class="an">a2=</span> <span class="av">"A2"</span> <span class="an">a3=</span> <span class="av">"A3"</span>&gt;<div class="element" style="margin-left: 10px"> <span class="t">&lt;bbb</span> <span class="an">b1=</span> <span class="av">"B1"</span> <span class="an">b2=</span> <span class="av">"B2"</span> <span class="an">b3=</span> <span class="av">"B3"</span>&gt;BBB<span class="t">&lt;/bbb&gt;</span> </div> <div class="element" style="margin-left: 10px"> <span class="t">&lt;ccc</span> <span class="an">c1=</span> <span class="av">"C1"</span> <span class="an">c2=</span> <span class="av">"C2"</span> <span class="an">c3=</span> <span class="av">"C3"</span>&gt;<div class="element" style="margin-left: 15px"> <span class="t">&lt;ddd</span> <span class="an">d1=</span> <span class="av">"D1"</span> <span class="an">d2=</span> <span class="av">"D2"</span> <span class="an">d3=</span> <span class="av">"D3"</span>&gt;DDD<span class="t">&lt;/ddd&gt;</span> </div> <div class="element" style="margin-left: 15px"> <span class="t">&lt;eee</span>&gt;<div class="element" style="margin-left: 20px"> <span class="t">&lt;fff</span>&gt;FFF<span class="t">&lt;/fff&gt;</span> </div> <span class="t">&lt;/eee&gt;</span> </div> <span class="t">&lt;/ccc&gt;</span> </div> <span class="t">&lt;/aaa&gt;</span> </div> </div>


Sample CSS File


File: syntax-colors-oxygen.css

/* Begin and end tag delimiter */
.t {color: blue;}
/* Attribute name and equal sign */
.an {color: orange;}
/* Attribute values */
.av {color: orange;}
/* Element data content */
.d {color: black;}

Screen Image


Wikibook list of code links


Motivation
This Wikibook contains links to code samples executed on a University server. We need to keep track of all the links so that we can ensure that they remain live, so that all links can be executed by a test bed and to support changes to the directory structure or filenames.

Approach
The script is similar to the index script at the beginning, to get the list of pages in the book. Then it fetches each page and extracts the anchor tags whose href links to the UWE eXist site. The WikiBook page is linked from the page title and the actual URL is listed.
declare namespace h = "http://www.w3.org/1999/xhtml";
declare option exist:serialize "method=xhtml media-type=text/html";

let $book := request:get-parameter("book","XQuery")
let $base := "http://en.wikibooks.org"
let $indexPage := doc(concat($base,"/wiki/Category:",$book,"?x"))
let $pages := $indexPage//h:div[@id="mw-pages"]//h:li
return

<html>
   <head>
      <title>Index of {$book} code samples</title>
   </head>
   <body>
      <h1>Index of {$book} code samples</h1>
      <ul>
      {
      for $letter in distinct-values($pages/upper-case(substring(substring-after(.,'/'),1,1)))[string-length(.) = 1]
      for $page in $pages[starts-with(upper-case(substring-after(.,'/')),$letter)]
      let $title := string($page)
      let $url := concat($base,$page/h:a/@href)
      let $refs := doc($url)//h:a[starts-with(@href,"http://www.cems.uwe.ac.uk/xmlwiki")]
      order by $title
      return
         if (exists($refs))
         then
            <div>
               <li><a href="{$url}">{$title}</a>
                  <ul>
                  {
                  for $ref in $refs
                  return <li>{string($ref/@href)}</li>
                  }
                  </ul>
               </li>
            </div>
         else ()
      }
      </ul>
   </body>
</html>

Code samples [1]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/util/wikicode.xq

Wikipedia Events RSS


Rather than simply representing the Wikipedia events as a re-designed page, a more useful form would be to generate an RSS feed since this is not provided by Wikipedia.
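The book leaves this recipe as a stub; the following is only a minimal sketch of such a feed, reusing the page-scraping approach from the next recipe. The hard-coded date and the assumption that the individual events sit in list items inside the description cell are illustrative, not taken from the book:

declare namespace h = "http://www.w3.org/1999/xhtml";
declare option exist:serialize "method=xml media-type=application/rss+xml";

(: wrap one day's scraped events in an RSS 2.0 envelope :)
let $date := "2007_September_24"   (: hypothetical date in Wikipedia's URL format :)
let $url := concat("http://en.wikipedia.org/wiki/Portal:Current_events/", $date)
let $events := doc($url)//h:td[@class="description"]
return
<rss version="2.0">
   <channel>
      <title>Wikipedia current events</title>
      <link>{$url}</link>
      <description>Events scraped from the Wikipedia current events portal</description>
      {
      for $event in $events/h:ul/h:li
      return
         <item>
            <title>{substring(normalize-space($event), 1, 80)}</title>
            <description>{normalize-space($event)}</description>
         </item>
      }
   </channel>
</rss>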

Wikipedia Page scraping


Page scraping allows any web page to be the source of raw data suitable for transformation. This example takes the data on the Wikipedia current events page 24 September 2007 [1] and transforms it to a simple HTML page [2]. The key components of an XQuery page scraper are:
1. The fn:doc function, which accepts a URL and retrieves the page as XML. Many pages are not well-formed XML, but the Wikipedia pages are.
2. Setting a namespace if the page has a default namespace. This page has a default namespace of "http://www.w3.org/1999/xhtml", so a namespace must be declared and its namespace prefix used in path expressions which access the page's XML.
3. Identification of a path to the selected content. In this case, the content is located in a td tag with a class of 'description'.
4. Re-basing any relative URLs. Here the links to Wikipedia articles have relative URLs. To re-base these, the XML is serialized to a string with util:serialize(), the relative URLs edited with replace, and the string converted back to XML using util:parse().


Sample XQuery to Extract Data from Wikipedia Current Events Page


In this example, there is some date re-formatting to do since the date format in the page's URL is not the XML-formatted date. Links to the previous and next days are included, making use of XQuery date arithmetic.
declare namespace h= "http://www.w3.org/1999/xhtml" ; declare option exist:serialize "method=xhtml media-type=text/html indent=yes"; declare variable $months :=

("January","February","March","April","May","June","July","August","September","October","November","December") ; declare function local:wikidate($date as xs:date) as xs:string { concat(year-from-date($date),"_", $months[month-from-date($date)],"_", day-from-date($date) ) }; declare function local:displaydate($date as xs:date) as xs:string { concat(day-from-date($date)," ", $months[month-from-date($date)],", ", year-from-date($date) ) };

declare function local:add-base($element , $base as xs:string, $delimiter as xs:string) {

let $evtext := util:serialize($element,()) let $evtext := replace($evtext, concat ("href=",$delimiter,"/"), concat("href=",$delimiter,$base,"/") ) return util:parse($evtext) };

let $date := xs:date(request:get-parameter("date",()))
let $wikidate := local:wikidate($date)
let $url := concat("http://en.wikipedia.org/wiki/Portal:Current_events/", $wikidate)
let $wikipage := doc($url)
let $desc := $wikipage//h:td[@class="description"]
let $nextDay := $date + xs:dayTimeDuration("P1D")
let $previousDay := $date - xs:dayTimeDuration("P1D")
return
<html>
   <body>
      <h1>Current events from <a href="{$url}">Wikipedia</a></h1>
      <h2>Wiki Events for
         <span style="font-size:12;"><a href="wikidate.xq?date={$previousDay}">{local:displaydate($previousDay)}</a></span>&#160;
         {local:displaydate($date)}
         <span style="font-size:12;"><a href="wikidate.xq?date={$nextDay}">{local:displaydate($nextDay)}</a></span>&#160;
      </h2>
      {local:add-base($desc/*, "http://en.wikipedia.org", '"')}
   </body>
</html>

References
[1] http://en.wikipedia.org/wiki/Portal:Current_events/2007_September_24
[2] http://www.cems.uwe.ac.uk/xmlwiki/wikidate.xq?date=2007-09-24

World Temperature records


Introduction
The Met Office [1] recently released the temperature records for about 1600 stations world-wide. Each station record is available online as a text file, for example Stornoway [2]. This case study describes a project to make this data available as XML. The home page is http://www.cems.uwe.ac.uk/xmlwiki/Climate/index.html

Parsing temperature record to XML


The first task is to convert the plain text to XML. The main page explains the format of this text file. The code 030260 is the station code defined by the World Meteorological Organisation. It appears that the files are stored in country code directories. (actually these are Blocks in WMO parlance)

Remote data files


The task of using HTTP to GET a remote data file is a common task for which functions already exist in an XQuery module. This module declares a constant used in the parsing:

declare variable $csv:newline := "&#10;";

And the basic function to get text, which may be plain text or base64-encoded:

(:~
 : Get a file via HTTP and convert the body of the HTTP response to text.
 : The HTTP Pragma header forces the script to get the latest version.
 :
 : @param $uri - URI of the text file to read
 : @param $binary - true if the data is base64 encoded
 : @return the body of the response as text, or null
 :)
declare function csv:get-data($uri as xs:string, $binary as xs:boolean) as xs:string? {
   let $headers :=
      element headers {
         element header {attribute name {"Pragma"}, attribute value {"no-cache"}}
      }
   let $response := httpclient:get(xs:anyURI($uri), true(), $headers)
   return
      if ($response/@statusCode eq "200")
      then
         let $raw := $response/httpclient:body
         return
            if ($binary)
            then util:binary-to-string($raw)
            else xmldb:decode($raw)
      else ()
};
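For example, the Stornoway record referenced in the introduction [2] can be fetched as plain text with:

csv:get-data("http://www.metoffice.gov.uk/climatechange/science/monitoring/reference/03/030260", false())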

Parsing Function
We will create an XQuery module containing functions to carry out the parsing:

module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met";

The csv module needs to be imported:
import module namespace csv = "http://www.cems.uwe.ac.uk/xmlwiki/csv" at "../lib/csv.xqm";

Now the function to parse the MET climate data:


(:~
 : GET and parse a MET office temperature record as documented in
 : http://www.metoffice.gov.uk/climatechange/science/monitoring/subsets.html
 :
 : @param $station  the station number
 : @return the temperature record as an ad hoc XML structure matched closely
 :         to the terms used in the original record
 :)
declare function met:station-to-xml($station as xs:string) as element(TemperatureRecord)? {
   (: this is the directory for all temperature records in a country :)
   let $country := substring($station, 1, 2)
   (: construct the URI for the corresponding record :)
   let $uri := concat("http://www.metoffice.gov.uk/climatechange/science/monitoring/reference/", $country, "/", $station)
   (: GET and convert to plain text :)
   let $data := csv:get-data($uri, false())
   return
      if (empty($data))
      then ()
      else
         (: split into two sections :)
         (: the first section contains the meta data in the form of name=value statements :)
         let $headertext := substring-before($data, "Obs:")
         let $headers := tokenize($headertext, $csv:newline)
         (: the second section is the temperature record, year by year :)
         let $temperatures := substring-after($data, "Obs:")
         let $years := tokenize($temperatures, $csv:newline)
         return
            element TemperatureRecord {
               element sourceURI {$uri},
               (: the original temperature record :)
               for $header in $headers
               (: split each line into a name and its value; to create a valid XML name, just remove any spaces :)
               let $name := replace(substring-before($header, "="), " ", "")
               let $value := normalize-space(substring-after($header, "="))
               where $name ne ""
               return
                  (: create an XML element with the name :)
                  element {$name} {
                     (: these names have values which are a list of space-separated temperatures :)
                     if ($name = ("Normals", "Standarddeviations"))
                     then
                        for $temp in tokenize($value, "\s+")
                        return element temp_C {$temp}
                     (: these names contain redundant hyphens :)
                     else if ($name = ("Name", "Country"))
                     then replace($value, "-", "")
                     (: the convention for signing longitudes in this data is the reverse of the usual E +, W - convention :)
                     else if ($name = "Long")
                     then - xs:decimal($value)
                     else $value
                  },
               for $year in $years
               let $value := tokenize($year, "\s+")
               where $year ne ""
               return
                  element monthlyAverages {
                     (: the first value in the row is the year :)
                     attribute year {$value[1]},
                     (: the remainder are the temperatures for the months Jan to Dec;
                        generate all months, but those with no reading (indicated by -99.0) will be empty :)
                     for $i in (2 to 13)
                     let $temp := $value[$i]
                     return
                        element temp_C {
                           if ($temp ne '-99.0') then $temp else ()
                        }
                  }
            }
};

Main Script
The main script uses these functions to convert a given station's record:
(:~
 : convert climate station file to XML
 : @param station  id of station
 :)
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

let $station := request:get-parameter("station",())
return met:station-to-xml($station)

Stornoway [3]

WMO stations
The station ids are based on those defined by the World Meteorological Organisation. There is a full list of all stations available online as a text file [4] with supporting documentation [5]. A typical record is
00;000;PABL;Buckland, Buckland Airport;AK;United States;4;65-58-56N;161-09-07W;;;7;;

The format of these records is:
1. Block Number - 2 digits representing the WMO-assigned block.
2. Station Number - 3 digits representing the WMO-assigned station.
3. ICAO Location Indicator - 4 alphanumeric characters; not all stations in this file have an assigned location indicator. The value "----" is used for stations that do not have an assigned location indicator.
4. Place Name - Common name of station location.
5. State - 2 character abbreviation (included for stations located in the United States only).
6. Country Name - Country name is ISO short English form.
7. WMO Region - digits 1 through 6 representing the corresponding WMO region; 7 stands for the WMO Antarctic region.
8. Station Latitude - DD-MM-SSH where DD is degrees, MM is minutes, SS is seconds and H is N for northern hemisphere or S for southern hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
9. Station Longitude - DDD-MM-SSH where DDD is degrees, MM is minutes, SS is seconds and H is E for eastern hemisphere or W for western hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
10. Upper Air Latitude - DD-MM-SSH where DD is degrees, MM is minutes, SS is seconds and H is N for northern hemisphere or S for southern hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
11. Upper Air Longitude - DDD-MM-SSH where DDD is degrees, MM is minutes, SS is seconds and H is E for eastern hemisphere or W for western hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
12. Station Elevation (Ha) - The station elevation in meters. Value is omitted if unknown.
13. Upper Air Elevation (Hp) - The upper air elevation in meters. Value is omitted if unknown.
14. RBSN indicator - P if station is defined by the WMO as belonging to the Regional Basic Synoptic Network, omitted otherwise.


Conversion to XML
A function is needed to convert from the DD-MM-SSH format of latitudes and longitudes. This is complicated by the variations in this format. These variations all appear in the data:
DD-MMH
DD-MH
DD-MM-SH
DD-MM-SSH

Because this format occurs in other data, it has been added to a general module of geographic functions.
declare function geo:lz($n as xs:string?) as xs:integer {
   xs:integer(concat(string-pad("0", 2 - string-length($n)), $n))
};

(:~
 : @param $s - input string in the format of DD-MMH, DD-MH, DD-MM-SH or DD-MM-SSH,
 :             where H is N, S, E or W
 : @return decimal degrees
 :)
declare function geo:dms-to-decimal($s as xs:string) as xs:decimal {
   let $hemi := substring($s, string-length($s), 1)
   let $rest := substring($s, 1, string-length($s) - 1)
   let $f := tokenize($rest, "-")
   let $deg := geo:lz($f[1])
   let $min := geo:lz($f[2])
   let $sec := geo:lz($f[3])
   let $dec := $deg + ($min + $sec div 60) div 60
   let $dec := round-half-to-even($dec, 6)
   return
      if ($hemi = ("S", "W")) then - $dec else $dec
};
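A quick sanity check of the conversion (the input values are invented for illustration; this assumes the geo module is in scope):

(geo:dms-to-decimal("51-30-26N"),   (: returns 51.507222 :)
 geo:dms-to-decimal("02-35-45W"))   (: returns -2.595833 :)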

The geo module has to be imported:


import module namespace geo = "http://www.cems.uwe.ac.uk/xmlwiki/geo" at "../lib/geo.xqm";

Parsing the station data.


(:~
 : @param $station  string describing a station
 :
 : Upper Air data is ignored at present.
 :)
declare function met:WMO-to-xml($station as xs:string) as element(station) {
   let $f := tokenize(normalize-space($station), ";")
   (: this constructs the equivalent id used in the temperature records :)
   let $cid := concat($f[1], $f[2], "0")
   return
      element station {
         element block {$f[1]},
         element number {$f[2]},
         element id {$cid},
         if ($f[3] eq "----") then () else element ICAO {$f[3]},
         element placeName {$f[4]},
         if ($f[5] ne "") then element state {$f[5]} else (),
         element country {$f[6]},
         element WMORegion {$f[7]},
         element latitude {geo:dms-to-decimal($f[8])},
         element longitude {geo:dms-to-decimal($f[9])},
         if ($f[12] ne "") then element elevation {$f[12]} else (),
         if ($f[14] = "P") then element RBSN {} else ()
      }
};

Generating the WMO XML file


The XQuery script GETs the text file and converts each line to an XML station element. The elements are then inserted into an empty XML file one by one.
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
import module namespace csv = "http://www.cems.uwe.ac.uk/xmlwiki/csv" at "../csv.xqm";

<results>
{
(: create the empty XML document :)
let $login := xmldb:login("/db/Wiki/Climate", "user", "password")
let $store := xmldb:store("/db/Wiki/Climate/Stations", "metstations.xml", <stations/>)
let $doc := doc($store)/stations
(: get the text list of stations and convert :)
let $station-list := "http://weather.noaa.gov/data/nsd_bbsss.txt"
let $csv := csv:get-data($station-list, false())
for $data in tokenize($csv, $csv:newline)
where $data ne ""
return
   let $station := met:WMO-to-xml($data)
   let $update := update insert $station into $doc
   return <station>{$station/id}</station>
}
</results>


Indexing
There are 11,000-odd stations in total. These need to be indexed for efficient access. In eXist, indexes are defined in a configuration file, one per collection (directory). For the collection in which the station XML document is to be written, the configuration file is:

<collection xmlns="http://exist-db.org/collection-config/1.0">
   <index>
      <create qname="id" type="xs:string"/>
      <create qname="country" type="xs:string"/>
   </index>
</collection>

This means that all XML documents in the collection will be indexed on the qnames id and country wherever these appear in the XML structure. Indexing will be performed when a document is added to the collection or an existing document is updated. A re-index can be forced if required. If the station data is stored in the collection /db/Wiki/Climate/Stations, this configuration file will be stored in /db/system/config/db/Wiki/Climate/Stations as configuration.xconf
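With these indexes in place, lookups by id or country are resolved through the range indexes rather than by scanning the whole document. A small check, using the Stornoway station id introduced earlier and the country value from the sample record above:

(: look up a station by id and count the stations in a country :)
let $stations := doc("/db/Wiki/Climate/Stations/metstations.xml")//station
return
   <check>
      {$stations[id = "030260"]/placeName}
      <us-stations>{count($stations[country = "United States"])}</us-stations>
   </check>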

WMO Station set binding


Since the code will reference this collection in a number of places, we add a constant to reference the set of stations to the library module:
declare variable $met:WMOStations := doc ("/db/Wiki/Climate/Stations/metstations.xml")//station;

Temperature Station list


A full listing of stations is needed to provide an index. This data is not provided as a simple file, but is encoded on the HTML page as a JavaScript array.

locations[1]=["409380|Afghanistan, Islamic State Of / Afghanistan, Etat Islamique D'|Herat","409480|Afghanistan, Islamic State Of / Afghanistan, Etat Islamique D'|Kabul Airport","409900|Afghanistan, Islamic State Of / Afghanistan, Etat Islamique D'|Kandahar Airport"];
...

However there is no location data here, so we will get that from the WMO station list. The approach taken to converting this to XML was:
1. View source on the HTML page
2. Locate the station list
3. Copy the text
4. Save as a text file in the eXist database
5. A script reads this file and parses it to XML
6. The resultant XML is augmented with latitude and longitude from the WMO station data
7. The final XML document is stored in the database in the same Stations directory

(:~
 : convert the text representation of MET stations from the WMO list to XML
 :)
<stationList>
{
(: get the raw data from a text file stored as base64 in the eXist database :)
let $text := util:binary-to-string(util:binary-doc("/db/Wiki/Climate/cstations.txt"))
(: ; separates the stations in each country :)
for $country in tokenize($text, ";")
(: the station list is the array element content i.e. the string between =[ and ] :)
let $stationlist := substring-before(substring-after($country, "=["), "]")
(: The stations in each country are comma-separated, but commas are also used within
   the names of countries and stations. However a comma followed by a double quote
   is the required separator. :)
let $stations := tokenize($stationlist, ',"')
for $station in $stations
(: some cleanup of names is needed :)
let $data := replace(replace($station, '"', ""), " ", "")
(: Each station is in the format of  Stationid | Country (English name / French name) | Location :)
let $f := tokenize($data, "\|")
let $id := $f[1]
let $country := tokenize($f[2], "/")
let $WMOStation := $met:WMOStations[id = $id]
(: create a station element containing the id, country and english station name :)
return
   element station {
      element id {$f[1]},
      element country {normalize-space($country[1])},
      element location {$f[3]},
      $WMOStation/latitude,
      $WMOStation/longitude
   }
}
</stationList>

Storing this file in the same Stations collection means that it will be indexed on the same element names, id and country, as the full WMO station data.

Temperature station list [6]

Climate station set binding


This set of stations will also be referenced in several places so we define a variable:
declare variable $met:tempStations := doc ("/db/Wiki/Climate/Stations/tempstations.xml")//station;

Visualizing the data


We will use XSLT to transform this XML to a presentation of the location of the station and charts of the temperatures. The initial stylesheet was developed by Dave Challender. ( explanation to be added )
<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet xmlns:msxsl="urn:schemas-microsoft-com:xslt"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"

exclude-result-prefixes="msxsl">

<!--

Authored by Dave Callender, minor mods by Chris Wallace

-->

<xsl:output method="html"/>

<xsl:param name="start-year" select="1000"/>

<xsl:param name="end-year" select="3000"/>

<xsl:template match="Station">

<html>

<head>

<script type="text/javascript" src="http://www.google.com/jsapi"/>

<title>

<xsl:value-of select="station/placeName"/>

<xsl:text> </xsl:text>

<xsl:value-of select="station/country"/>

</title>

</head>

<body>

<xsl:apply-templates select="station"/>

<xsl:apply-templates select="TemperatureRecord" mode="googlevis"/>

<xsl:apply-templates select="TemperatureRecord" mode="table"/>

<xsl:apply-templates select="TemperatureRecord" mode="smoothed"/>

</body>

</html>

</xsl:template>

<!--

Visualization of the full temperature record -->

<xsl:template match="TemperatureRecord" mode="googlevis">

<p/>

<p>Google visualization timeline (takes no account of standard

deviation etc.)</p>

<div id="chart_div" style="width: 700px; height: 440px;"/>

<p/>

<script type="text/javascript">



google.load('visualization', '1',


{'packages':['annotatedtimeline']});

google.setOnLoadCallback(drawChart);

function drawChart() {

var data = new google.visualization.DataTable();

data.addColumn('date', 'Date');

data.addColumn('number', 'temp');

data.addRows([

<xsl:apply-templates select="monthlyAverages[@year][@year &gt;= $start-year][@year &lt;= $end-year]" mode="googlevis"/>

[null,null]

]);

var chart = new

google.visualization.AnnotatedTimeLine(document.getElementById('chart_div'));

chart.draw(data, {displayAnnotations: true});

</script>

</xsl:template>

<xsl:template match="temp_C" mode="googlevis">

<xsl:if test="(node())">

<xsl:text>[new Date(</xsl:text>

<xsl:value-of select="../@year"/>

<xsl:text>,</xsl:text>

<xsl:value-of select="position() - 1 "/>

<!-- Google viz uses 0-based arrays -->

<xsl:text>,15),</xsl:text>

<xsl:value-of select="."/>

<xsl:text>],

</xsl:text>

</xsl:if>

</xsl:template>

<!--

Vizualisation of the smoothed data

-->

<xsl:template match="TemperatureRecord" mode="smoothed">

<p/>

<p>Almost totally meaningless - sum all temps for a year and

divide by 12 (only do if all 12

data points) but shows a bit of playing with data</p>

<p/>

<div id="smoothed_chart_div" style="width: 700px; height: 440px;"/>

<script type="text/javascript">

google.load('visualization', '1',

{'packages':['annotatedtimeline']});

google.setOnLoadCallback(drawChartSmoothed);

function drawChartSmoothed()



{


var data = new google.visualization.DataTable();

data.addColumn('date', 'Date');

data.addColumn('number', 'temp');

data.addRows([

<xsl:apply-templates select="monthlyAverages[@year][@year &gt;= $start-year][@year &lt;=$end-year]" mode="smoothed"/>

[null,null]

]);

var chart = new

google.visualization.AnnotatedTimeLine(document.getElementById('smoothed_chart_div'));

chart.draw(data, {displayAnnotations: true});

</script>

</xsl:template>

<xsl:template match="monthlyAverages" mode="smoothed">

<xsl:if test="count(temp_C[node()])=12">

<xsl:text>[new Date(</xsl:text>

<xsl:value-of select="@year"/>

<xsl:text>,5,15),</xsl:text>

<xsl:value-of select="sum(temp_C[node()]) div 12"/>

<xsl:text>],

</xsl:text>

</xsl:if>

</xsl:template>

<!--

Data tabulated -->

<xsl:template match="TemperatureRecord" mode="table">

<table border="1">

<tr>

<td>Year</td>

<td>Jan</td>

<td>Feb</td>

<td>Mar</td>

<td>Apr</td>

<td>May</td>

<td>Jun</td>

<td>Jul</td>

<td>Aug</td>

<td>Sep</td>



<td>Oct</td>


<td>Nov</td>

<td>Dec</td>


</tr>

<xsl:apply-templates

select="monthlyAverages[@year][@year &gt;=

$start-year][@year &lt; $end-year]"

mode="table"/>

</table>

</xsl:template>

<xsl:template match="monthlyAverages" mode="table">

<tr>

<td>

<xsl:value-of select="@year"/>

</td>

<xsl:apply-templates select="temp_C" mode="table"/>

</tr>

</xsl:template>

<xsl:template match="temp_C" mode="table">

<td>

<xsl:value-of select="."/>

</td>

</xsl:template>

<xsl:template match="Number">

<p> Station Number:&#160; <xsl:value-of select="."/>

</p>

</xsl:template>

<xsl:template match="station">

<h1>

<xsl:value-of select="placeName"/>

<xsl:text>, </xsl:text>

<xsl:value-of select="country"/>

<xsl:text> </xsl:text>

</h1>

<a href="http://maps.google.com/maps?q={latitude},{longitude}">

<img

src="http://maps.google.com/maps/api/staticmap?zoom=11&amp;maptype=hybrid&amp;size=400x300&amp;sensor=false&amp;key=ABQIAAAAVehr0_0wqgw_UOdLv0TYtxSGVrvsBPWDlNZ2fWdNTHNT32FpbBR1ygnaHxJdv-8mkOaL2BJb4V_yOQ&amp;markers=color:blue|{latitude},{longitude}"

alt="{placeName}"/>

</a>

</xsl:template>

<xsl:template match="@* | node()">

<xsl:copy>

<xsl:apply-templates select="@* | node()"/>

</xsl:copy>

</xsl:template>



</xsl:stylesheet>


Multiple formats
We would like to present either the original XML or the HTML visualisation page. We could use two scripts, or combine them into one script with a parameter to indicate how the output is to be rendered. eXist functions allow the serialization of the output and the mime-type to be set dynamically.
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

let $id := request:get-parameter("station",())
let $render := request:get-parameter("render",())
let $station := doc("/db/Wiki/Climate/Stations/metstations.xml")//station[id = $id]
let $tempStation := doc("/db/Wiki/Climate/Stations/tempstations.xml")//station[id = $id]
let $temp := if ($tempStation) then met:station-to-xml($id) else ()
let $station :=
   <Station>
      {$station}
      {$temp}
   </Station>
return
   if ($render = "HTML")
   then
      let $ss := doc("/db/Wiki/Climate/FullHTMLMet-V2.xsl")
      let $options := util:declare-option("exist:serialize", "method=xhtml media-type=text/html")
      let $start-year := request:get-parameter("start", "1000")
      let $end-year := request:get-parameter("end", "2100")
      let $params :=
         <parameters>
            <param name="start-year" value="{$start-year}"/>
            <param name="end-year" value="{$end-year}"/>
         </parameters>
      return transform:transform($station, $ss, $params)
   else
      let $header := response:set-header("Access-Control-Allow-Origin", "*")
      return $station

Stornoway HTML [7] Stornoway XML [8]


Simple HTML index


We can use the stored station list to create a simple HTML index.
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

declare option exist:serialize "method=xhtml media-type=text/html";

<html>
   <head>
      <title>Index of Temperature Record Stations</title>
   </head>
   <body>
      <h1>Index of Temperature Record Stations</h1>
      {
      for $country in distinct-values($met:tempStations/country)
      order by $country
      return
         <div>
            <h3>{$country}</h3>
            {
            for $station in $met:tempStations[country = $country]
            let $id := $station/id
            order by $station/location
            return
               <span><a href="station.xq?station={$id}&amp;render=HTML">{string($station/location)}</a> </span>
            }
         </div>
      }
   </body>
</html>

Temperature Station list [9]

Station Map
We can also generate a (large) KML overlay, with links to each station's page. We need a function to transform a station into a Placemark with a link to the HTML station page:

declare function met:station-to-placemark($station) {
   let $description :=
      <div>
         <a href="http://www.cems.uwe.ac.uk/xmlwiki/Climate/station.xq?station={$station/id}&amp;render=HTML">Temperature Record</a>
      </div>
   return
      <Placemark>
         <name>{string($station/location)}, {string($station/country)}</name>
         <description>{util:serialize($description, "method=xhtml")}</description>
         <Point>
            <coordinates>{string($station/longitude)},{string($station/latitude)},0</coordinates>
         </Point>
      </Placemark>
};

Then the main script iterates over all the temperature stations to generate the full KML file.
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

declare option exist:serialize "method=xml media-type=application/vnd.google-earth.kml+xml indent=yes omit-xml-declaration=yes";

let $x := response:set-header('Content-Disposition', 'attachment;filename=country.kml')
return
<kml xmlns="http://www.opengis.net/kml/2.2">
   <Folder>
      <name>Stations</name>
      {
      for $station in $met:tempStations
      return met:station-to-placemark($station)
      }
   </Folder>
</kml>

Full KML [10] KML rendered via GoogleMaps [11]

Work in progress
Resource URIs RDF

References
[1] http://www.metoffice.gov.uk/climatechange/science/monitoring/subsets.html
[2] http://www.metoffice.gov.uk/climatechange/science/monitoring/reference/03/030260
[3] http://www.cems.uwe.ac.uk/xmlwiki/Climate/temp2xml.xq?station=030260
[4] http://weather.noaa.gov/data/nsd_bbsss.txt
[5] http://weather.noaa.gov/tg/site.shtml
[6] http://www.cems.uwe.ac.uk/xmlwiki/Climate/Stations/tempstations.xml
[7] http://www.cems.uwe.ac.uk/xmlwiki/Climate/station.xq?station=030260&render=HTML
[8] http://www.cems.uwe.ac.uk/xmlwiki/Climate/station.xq?station=030260
[9] http://www.cems.uwe.ac.uk/xmlwiki/Climate/tempStations.xq
[10] http://www.cems.uwe.ac.uk/xmlwiki/Climate/stationskml.xq
[11] http://maps.google.com/maps?q=http://www.cems.uwe.ac.uk/xmlwiki/Climate/stationskml.xq


XHTML + Voice
Motivation
You want your browser to read Twitter updates aloud, using a text-to-speech extension built into the browser.

Method
XHTML + Voice is supported by the Opera Browser with the Voice extension installed. In this simple application it is used as a browser-based Text-to-Speech engine.

Twitter Radio
This script creates a simple text-to-speech version of Twitter Search. Obama [1]

Limitations
Window has to be active for the T2S to play on refresh.
The cleaned text to speak is held in a div which is rendered as white-on-white text. Initially it was output as a block in the header but it did not seem possible to apply styles. Applying a style of display:none hid the text from the T2S engine as well!
Transforming the atom content to a string suitable to speak needs more work. Retweets and similar tweets could be removed using Levenshtein distance.
Male and female voices are assigned randomly by tweet. I'd like to cache the voice assigned to a tweeter so that tweets are consistently spoken in the same voice.
The initial load doesn't seem to trigger playing, hence the play button, but this also re-fetches the page. This is an ideal situation to use AJAX instead of refresh.
The T2S engine is quite good at rendering the text but it needs to be helped in places, for example by replacing texting abbreviations with their expanded form.
declare namespace atom = "http://www.w3.org/2005/Atom";

declare variable $n := xs:integer(request:get-parameter("n", 6));
declare variable $search := request:get-parameter("search", "");
declare variable $timestamp := request:get-parameter("timestamp", ());
declare variable $seconds := $n * 12;
declare variable $noise := (
   "<b>", "</b>", "&lt;.+?&gt;",
   "http://[^ ]+", "#\w+",
   "RT *@\w+", "@\w+",
   "[\[\]\\=:;()_?!~\|]", '"',
   "\.\.+"
);

declare function local:clean ($talk as xs:string, $noise as xs:string*) as xs:string { if (empty($noise)) then $talk else local:clean(replace($talk,string($noise[1])," "),subsequence($noise,2)) };

declare function local:clean($talk as xs:string) as xs:string { local:clean($talk,$noise) };
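For example (the tweet text is invented), the cleaner removes each noise pattern in turn:

local:clean("RT @bob see http://example.com #news !!")
(: the link, hashtag, mention and punctuation are replaced by spaces, leaving "see" surrounded by whitespace :)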

declare option exist:serialize "method=xhtml media-type=application/xv+xml";

let $entries := doc(concat("http://search.twitter.com/search.atom?lang=en&amp;q=",encode-for-uri($search)))//atom:entry let $entries := if (exists($timestamp)) then $entries[atom:published>$timestamp]

else $entries let $entries := $entries[position() <= $n] let $newtimestamp := if (exists($entries)) then

string($entries[1]/atom:published) else $timestamp let $entries := reverse($entries) return <html xmlns="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/vxml" xmlns:xv="http://www.voicexml.org/2002/xhtml+voice" xmlns:ev="http://www.w3.org/2001/xml-events" > <head> <meta http-equiv="refresh" content="{$seconds};url=?search={encode-for-uri($search)}&amp;timestamp={$newtimestamp}&amp;n={$n}"/> <title>Tweets matching {$search}</title> <vxml:form id="say"> <vxml:block> <vxml:prompt src="#news"/> </vxml:block> </vxml:form> <link rel="stylesheet" type="text/css" href="voice2.css" title="Normal"/> </head> <body ev:event="load" ev:handler="#say" >

<h1>Twitter radio, listening to tweets matching {$search}</h1> <form method="get" >

Listen for <input type="text" name="search" value="{$search}" /> Max items <input type="text" name="n" value="{$n}" size="4" /> <input type="submit" value="Tune"/> <button ev:event="click" ev:handler="#say">Play</button> </form> {for $entry in $entries return <div> <a href="{$entry/atom:author/atom:uri}">{substring-before($entry/atom:author/atom:name, "(")}</a> &#160;


{util:parse(concat("<span xmlns='http://www.w3.org/1999/xhtml' >",$entry/atom:content/(text(),*),"</span>"))} </div> } <div id="news"> {for $entry in $entries return <p class='{if (math:random ()< 0.5) then "male" else "female"}'> { local:clean($entry/atom:content/text())}

</p> } </div> </body> </html>

With the style sheet:

.male { voice-family: male; pause-after: 1.5s; }
.female { voice-family: female; pause-after: 1s; }
#news { color: white; background-color: white; }

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Twitter/twitterRadio.xq?search=Obama&n=6


XML Differences
Motivation
You want to find the differences between two XML files and output a "colored diff" file of the differences.

Background on XML Differences


Unlike plain text files, XML has structural properties that must be considered when comparing two XML files. For example, when comparing the attributes of an element, the order in which the attributes appear in a file is not significant. The following two lines are technically the same even though the order of the attributes is different:

<myelement attr1="abc" attr2="def"/>
<myelement attr2="def" attr1="abc"/>

XML differencing also tends to ignore the spaces and tabs used when indenting an XML file to make it more readable. So the traditional Longest Common Subsequence (LCS) algorithms used by tools such as UNIX diff, GNU diff, or the Subversion diff will not usually give us the results that we desire. [1]

XML Differencing Algorithms


There are many different algorithms for doing comparisons between tree-structured data. Because hierarchical data can be so complex, each algorithm will have different precision and performance considerations. There are also many options to consider. For example:
Do you want to ignore XML comments?
Do you want to ignore Processing Instructions (PIs)?
Do you want to ignore case (uppercase/lowercase) differences?
Do you want to ignore whitespace between elements?
Can you assume that the structure of the XML documents being compared is identical and only the text is different?
Are you interested if the order of attributes changes?
Do you want your differences algorithm to output a list of changes to be made on the first or second file?
For our first version we will just do a simple scan of the elements and text within the elements.

Method
We will create a recursive XQuery function that compares all the nodes of an XML file.

XML Difference Output Format


We want to create an XML output format that allows the user to easily display the output using a side-by-side file comparison method. For example the output might look like:

<xml-diffs>
   <parameters>
      <output-format-code>xml</output-format-code>
      <show-original-indicator>false</show-original-indicator>
   </parameters>
   <diff>
      <change>...</change>
   </diff>
   <diff>
      <addition>...</addition>
   </diff>
   <diff>
      <deletion>...</deletion>
   </diff>
</xml-diffs>

Formatting the output for HTML and CSS


The above output could be considered raw semantic markup, without concern for how the web site wants to display the output using standard HTML div blocks and CSS. As a second step we can place the output in two HTML div blocks, one for the initial file (usually on the left) and one for the second file (usually on the right), with the changes marked using tags. Each div will have a class property that allows the CSS file to place the output anywhere on an HTML page. For example, the block holding the original file may be placed on the left and additions may be styled in green.

Algorithm
The O(ND) difference algorithm was originally designed to compare text files, using line breaks as the fundamental unit of comparison. We will need to modify it to recursively compare XML elements and attributes. An XML comparison should also not report differences in the order of attributes. To be continued...

References
[1] "S. Chawathe, A. Rajaraman, H. Garcia-Molina and J. Widom" ("June 1996"). "Change Detection in Hierarchically Structured Information". "Proceedings of the ACM SIGMOD International" "Conference on Management of Data, Montreal".

An O(ND) Difference Algorithm and its Variations" by Eugene Myers Algorithmica Vol. 1 No. 2, 1986, p 251 (http://xmailserver.org/diff2.pdf) X-Diff: An Effective Change Detection Algorithm for XML Documents Yuan Wang, David J. DeWitt, Jin-Yi Cai, [[University of Wisconsin (http://www.cs.wisc.edu/niagara/papers/xdiff.pdf)] Madison


XML Schema to Instance


Motivation
You would like to generate a sample XML instance from an XML Schema file (XSD). This is very useful for example if you would like to dynamically generate blank XForms instances for creating new documents from a schema.

Method
[Note: See also the article XQuery/XQuery and XML Schema which has the same objective. If the author would like to contact me, I'm trying to get this code working to compare with the code I developed. ChrisWallace (talk) 15:09, 13 May 2009 (UTC)ChrisWallace]

Create an XQuery function that reads in a URI to an XML Schema file (.xsd), along with a set of display parameters, and generates a sample XML instance. These parameters are:
1. $schemaURI = the location of the .xsd file (e.g. db/cms/schemas/MySchema.xsd)
2. $rootElementName = the root element for the sample XML file you wish to generate (i.e. it doesn't have to be the root of the whole schema)
3. $maxOccurances = for elements with a maxOccurs attribute greater than one, how many times should the element be repeated in the sample instance?
4. $optionalElements = Should optional elements (i.e. minOccurs="0") be included? 'true' or 'false'
5. $optionalAttributes = Should optional attributes (i.e. use="optional") be included? 'true' or 'false'
6. $choiceStrategy = Where there is a choice between elements or groups of elements, should the sample include a random selection from the choices, or simply use the first choice? 'random' or 'first'
Call the function with the following:
xquery version "1.0"; (: Query which calls the function :)

import module namespace content ="/db/cms/modules/content" at "/db/cms/modules/content.xqm";

return content:xsd-to-instance('/db/cms/content_types/schemas/Genericode.xsd','CodeList','1','true','true','random')

Notes
The function currently cannot dynamically set the namespaces in the sample instance. Any assistance in getting this to work would be much appreciated.
The function requires that the xsd file use the xs namespace prefix (i.e. xs:element). Attempts to use a wildcard prefix in the xpath statements did not work for some reason (i.e. $xsdFile/*:schema/*:element). An alternative approach is to determine the prefix, assign it to a variable and then concatenate it to all the xpath statements (e.g. $xsdFile/concat($prefix,':schema/',$prefix,'element')), but that makes for some pretty ugly code. Another alternative is to use another function to reset whatever the xsd file prefix is to xs. This does work fine but adds a bit more code. Any more efficient alternative suggestions would be welcome.

The function uses two internal queries assigned to variables, $subElementsQuery and $attributesQuery, which are then called using util:eval. This enables the recursive collection of sub-elements and attributes without having to call an external function. These two queries could just as easily have been declared as external functions.

XSD to Instance Function


xquery version "1.0"; (: Content Module :)

module namespace content ="/db/cms/modules/content"; declare namespace request="http://exist-db.org/xquery/request"; declare namespace util="http://exist-db.org/xquery/util";

(: Function :) declare function content:xsd-to-instance($schemaURI,$rootElementName,$maxOccurances,$optionalElements,$optionalAttributes,$choiceStrategy) { (: TO DO: - Handle substitution groups - Dynamically include namespacees - Handle any xsd file prefix (e.g. xs:element or xsd:element) :) (: Get the main xsd file :) let $xsdFile := doc($schemaURI) (: Determine the namespace prefix for the xsd file (e.g. xs, xsd or none) :) let $xsdFileNamespacePrefix := substring-before(name($xsdFile/*[1]),':') (: get the root element based on the root element name given in the function parameters :) let $rootElement := $xsdFile//xs:element[@name = $rootElementName] (: Gather the namespace prefixes and namespaces included in the xsd file :) let $namespaces := let $prefixes := in-scope-prefixes($xsdFile/xs:schema) return <Namespaces> {for $prefix in $prefixes return <Namespace prefix="{$prefix}" URI="{namespace-uri-for-prefix($prefix,$xsdFile/xs:schema)}"/>} </Namespaces> (: Determine the namespace prefix and namespace for the root element :) let $rootElementNamespace := $xsdFile/xs:schema/@targetNamespace let $rootElementNamespacePrefix := $namespaces/Namespace[@URI = $rootElementNamespace]/@prefix (: If the root element is a complex type, locate the complex type (not sure why the [1] predicate is required) :)



let $rootElementType := substring-after($rootElement[1]/@type,':') let $namespacePrefix := substring-before($rootElement[1]/@type,':') let $schemaFromPrefixQuery := string(" let $namespace := namespace-uri-for-prefix($namespacePrefix,$xsdFile/*[1]) let $schemaLocation := if($namespace = $xsdFile/xs:schema/@targetNamespace or $namespace = '') then $schemaURI else $xsdFile//xs:import[@namespace = $namespace]/@schemaLocation let $schema := if($schemaLocation = $schemaURI) then $xsdFile else doc($schemaLocation) return $schema ")


let $rootElementTypeSchema := util:eval($schemaFromPrefixQuery) let $complexType := if($rootElement/xs:complexType) then $rootElement/xs:complexType else if($namespacePrefix = 'xs' or $namespacePrefix = 'xsd') then () else if($rootElementTypeSchema//xs:complexType[@name = $rootElementType]) then $rootElementTypeSchema//xs:complexType[@name = $rootElementType] else() (: Query to recursively drill down to find the appropriate elements. If the complex type is a choice, include only the first sub-element. If the complex type is a group, include the group sub-elements. If the complex type is an extension, include the base sub-elements :) let $subElementsQuery := string(" for $xsElement in $complexType/* return if(name($xsElement)='xs:all') then let $complexType := $complexType/xs:all return util:eval($subElementsQuery) else if(name($xsElement)='xs:sequence') then let $complexType := $complexType/xs:sequence



return util:eval($subElementsQuery) else if(name($xsElement)='xs:choice') then let $choice := if($choiceStrategy = 'random') then let $choiceCount := count($xsElement/*)


return $choiceCount - util:random($choiceCount) else 1 return

if(name($xsElement/*[$choice])='xs:element') then let $subElementName := if($xsElement/*[$choice]/@name)

then data($xsElement/*[$choice]/@name)

else data(substring-after($xsElement/*[$choice]/@ref,':')) let $namespace := namespace-uri-for-prefix($namespacePrefix,$xsdFile/*[1]) let $schemaLocation := if($namespace = $xsdFile/xs:schema/@targetNamespace or $namespace = '')

then $schemaURI

else $xsdFile//xs:import[@namespace = $namespace]/@schemaLocation let $minOccurs := $xsElement/*[$choice]/@minOccurs let $maxOccurs := $xsElement/*[$choice]/@maxOccurs return <SubElement>

<Name>{$subElementName}</Name>

<NamespacePrefix>{$namespacePrefix}</NamespacePrefix> <Namespace>{$namespace}</Namespace>

<SchemaLocation>{$schemaLocation}</SchemaLocation> <MinOccurs>{$minOccurs}</MinOccurs>



<MaxOccurs>{$maxOccurs}</MaxOccurs> </SubElement> else if(name($xsElement/*[$choice])='xs:group') then let $groupName := substring-after($xsElement/*[$choice]/@ref,':') let $namespacePrefix := substring-before($xsElement/*[$choice]/@ref,':') let $groupSchema := util:eval($schemaFromPrefixQuery) let $complexType := $groupSchema//xs:group[@name = $groupName] return util:eval($subElementsQuery) else let $complexType := $xsElement/*[$choice] return util:eval($subElementsQuery) else if(name($xsElement)='xs:group') then let $groupName := substring-after($xsElement/@ref,':') let $namespacePrefix := substring-before($xsElement/@ref,':') let $groupSchema := util:eval($schemaFromPrefixQuery) let $complexType := $groupSchema//xs:group[@name = $groupName] return util:eval($subElementsQuery) else if(name($xsElement)='xs:complexContent') then let $complexType := $complexType/xs:complexContent return util:eval($subElementsQuery) else if(name($xsElement)='xs:extension') then let $extension := let $complexType := $complexType/xs:extension


return util:eval($subElementsQuery) let $base := let $baseName := substring-after($xsElement/@base,':')

XML Schema to Instance


let $namespacePrefix := substring-before($xsElement/@base,':')

416

let $baseSchema := util:eval($schemaFromPrefixQuery)

let $complexType := $baseSchema//xs:complexType[@name = $baseName]

return util:eval($subElementsQuery) return $base union $extension else if(name($xsElement)='xs:element') then let $subElementName := if($xsElement/@name)

then data($xsElement/@name)

else data(substring-after($xsElement/@ref,':')) let $namespace := namespace-uri-for-prefix($namespacePrefix,$xsdFile/*[1]) let $schemaLocation := if($namespace = $xsdFile/xs:schema/@targetNamespace or $namespace = '')

then $schemaURI

else $xsdFile//xs:import[@namespace = $namespace]/@schemaLocation let $minOccurs := $xsElement/@minOccurs let $maxOccurs := $xsElement/@maxOccurs

return <SubElement>

<Name>{$subElementName}</Name>

<NamespacePrefix>{$namespacePrefix}</NamespacePrefix>

<Namespace>{$namespace}</Namespace>

XML Schema to Instance


<SchemaLocation>{$schemaLocation}</SchemaLocation>

417

<MinOccurs>{$minOccurs}</MinOccurs>

<MaxOccurs>{$maxOccurs}</MaxOccurs> </SubElement> else() ") (: Employ the sub-elements query to gather the sub-elements :) let $subElements := util:eval($subElementsQuery) (: Query to recursively drill down to find the appropriate attributes :) let $attributesQuery := string(" for $xsElement in $complexType/* return

if(name($xsElement)='xs:attributeGroup') then let $attributeGroupName := substring-after($xsElement/@ref,':') let $namespacePrefix := substring-before($xsElement/@ref,':') let $attributeGroupSchema := util:eval($schemaFromPrefixQuery) let $complexType := $attributeGroupSchema//xs:attributeGroup[@name = $attributeGroupName] return util:eval($attributesQuery) else if(name($xsElement)='xs:complexContent') then let $complexType := $complexType/xs:complexContent return util:eval($attributesQuery) else if(name($xsElement)='xs:extension') then let $extension := let $complexType := $complexType/xs:extension return util:eval($attributesQuery) let $base := let $baseName := substring-after($xsElement/@base,':') let $namespacePrefix := substring-before($xsElement/@base,':') let $baseSchema := util:eval($schemaFromPrefixQuery) let $complexType := $baseSchema//xs:complexType[@name = $baseName] return util:eval($attributesQuery)

XML Schema to Instance


return $base union $extension else if(name($xsElement)='xs:attribute') then $xsElement else() ") (: Employ the attributes query to gather the attributes :) let $attributes := util:eval($attributesQuery)

418

return

(: Create the root element :)

element{if($rootElementNamespacePrefix) then concat($rootElementNamespacePrefix,':',$rootElementName) else $rootElementName } { (: for the time being, namespace attributes must be hard coded :) namespace gc {'http://www.test.com'} (: The following should dynamically insert namespace attributes with prefixes but does not work. It would be great id someone could help figure this out. for $namespace in $namespaces return namespace {$namespace/Namespace/@prefix} {$namespace/Namespace/@URI}, :)

,(: Comma is important, seperates the namespaces section from the attribute section in the element constructor :)

(: Create the element's attributes if any :) for $attribute in $attributes let $attributeName := if($attribute/@name) then data($attribute/@name) else data($attribute/@ref) return (: Make sure there is an attribute before calling the attribute constructor :) if($attributeName) then if($attribute/@use = 'optional') then if($optionalAttributes eq 'true') then attribute{$attributeName} (: Insert default attribute value if any :) {if($attribute/@default) then

XML Schema to Instance


data($attribute/@default) else if($attribute/@fixed) then data($attribute/@fixed) else ()} else() else if($attribute/@use = 'prohibited') then () else attribute{$attributeName} (: Insert default attribute value if any :) {if($attribute/@default) then data($attribute/@default) else if($attribute/@fixed) then data($attribute/@fixed) else ()} else()

419

,(: Comma is important, seperates the attribute section from the element content section in the element constructor :)

(: Insert default element value if any :) if($rootElement/@default) then data($rootElement/@default) else if($rootElement/@fixed) then data($rootElement/@fixed) else

(: Recursively create any sub-elements :) for $subElement in $subElements let $subElementName := $subElement/Name let $namespacePrefix := $subElement/NamespacePrefix let $schemaURI := $subElement/SchemaLocation

(: Set the number of element occurances based on the minOccurances and maxOccurances values if any :) let $occurances := if(xs:integer($subElement/@minOccurs) gt 0 and xs:integer($subElement/@minOccurs) gt xs:integer($maxOccurances)) then xs:integer($subElement/@minOccurs) else if(xs:integer($subElement/@minOccurs) eq 0 and $optionalElements eq 'false') then 0 else if($subElement/@maxOccurs eq 'unbounded') then if($maxOccurances) then xs:integer($maxOccurances) else 2 else if(xs:integer($subElement/@maxOccurs) gt 1) then

XML Schema to Instance


if(xs:integer($maxOccurances) lt xs:integer($subElement/@maxOccurs)) then xs:integer($maxOccurances) else xs:integer($subElement/@maxOccurs) else 1 return for $i in (1 to $occurances) return

420

content:xsd-to-instance($schemaURI,$subElementName,$maxOccurances,$optionalElements,$optionalAttributes,$choiceStrategy) } };

XML Schema to SVG


Motivation
You would like a graphical way to view your XML Schemas. These diagrams should allow you to view the structure of a XML Schema file in a graphical way. These graphical views of XML Schemas make XML Schema development and review easier for non-programmers to view "business rules" in a graphical way. For example required elements may have solid boarders and optional elements have a dashed line around an XML Schema.

Challenges
There are no functions in SVG to automatically estimate the size of text. We will need a small function to estimate the width of a text string based on the count of letters and the type of letters. Although this is only an estimation it usually good enough for non-publishing viewers. A sample of these utilities is given here: SVG Utilities to estimate text width [1]

Approach
We will use an XQuery typeswitch function to dispatch XML Schema elements to various functions.

Sample Models
The following is a SVG file that can be used to display the models for XML Schema. Sample Models in SVG [2]

References
[1] http:/ / xrx. googlecode. com/ svn-history/ r121/ trunk/ 18-xml-schema-to-svg/ svg-utilities. xqm [2] http:/ / xrx. googlecode. com/ svn-history/ r121/ trunk/ 18-xml-schema-to-svg/ sample-models. svg

XML Schema to XForms

421

XML Schema to XForms


Motivation
You want to transform an XML Schema into an XForms application.

Method
We will write an XQuery transform that will transform the XML Schema directly to an XForms file. The following will be automatically generated: 1. 2. 3. 4. A sample Instance will be place into the model. All "boolean" data types will have a bind statement to the xs:booleantype. All "date" data types will have a bind statement to the xs:date type. Each element in the XML Schema will have an input field unless it has the words "text, description, or note" in the element name. 5. All enumerated types with use an xs:select1 with a series of items in the enumeration.

Sample Source Code


http://code.google.com/p/xrx/source/browse/#svn/trunk/14-xml-schema-to-xforms

XMP data
Motivation
Adobe have introduced an XML format for image metadata called XMP. You want to display a photograph and some of the metadata.

Background
Matt Turner has an example of using MarkLogic to extract XMP data from a JPEG image [1].

eXist/ XQuery implementation


Here is the code using eXist XQuery extensions. Matt's " photograph Database as a binary file.
[2]

of a boat has been stored in the XML

Show the full XMP XML


The binary image is retrieved from the database as a BASE64 encoded string, converted to a UTF-8 string, the metadata text extracted and turned back to XML using util:parse() declare function local:extract-xmp ($jpeg as xs:string) let $binary := util:binary-doc($jpeg) let $text := util:binary-to-string($binary) let $xmp := substring-after($text,"<x:xmpmeta") let $xmp := substring-before($xmp,"</x:xmpmeta>") as item() {

XMP data let $xmp := concat("<x:xmpmeta",$xmp,"</x:xmpmeta>") return util:parse($xmp) }; let $photo := request:get-parameter("photo",()) let $xmp := local:extract-xmp(concat("/db/Wiki/eXist/",$photo)) return $xmp XMP XML [3]

422

Some basic Dublin Core elements


Here we extract a few of the Dublin Core elements and create an HTML fragment containing the image and the meta-data. declare namespace dc="http://purl.org/dc/elements/1.1/"; declare function local:extract-xmp ($jpeg as xs:string) ..... }; as item() {

declare option exist:serialize "method=xhtml media-type=text/html"; let $photo := request:get-parameter("photo",()) let $xmp := local:extract-xmp(concat("/db/Wiki/eXist/",$photo)) return <div> <img src="../{$photo}"/> <ul> <li> Format : {string($xmp//dc:format)}</li> <li>Title: {string($xmp//dc:title)}</li> <li>Creator: {string($xmp//dc:creator)}</li> </ul> </div> Basic Dublin Core elements [4]

References
[1] [2] [3] [4] http:/ / xquery. typepad. com/ xquery/ xquery_tricks/ index. html http:/ / www. cems. uwe. ac. uk/ xmlwiki/ eXist/ maineboat. jpg http:/ / www. cems. uwe. ac. uk/ xmlwiki/ eXist/ util/ parse-xmp-xml. xq?photo=maineboat. jpg http:/ / www. cems. uwe. ac. uk/ xmlwiki/ eXist/ util/ parse-xmp-basic. xq?photo=maineboat. jpg

XQuery SQL Module

423

XQuery SQL Module


Motivation
You would like to perform SQL queries from within your XQuery code.

Method
The eXist system provides a standards module for executing SQL queries.

Configuration Steps
1. Enable the module 2. Configure your connection string 3. Execute a test query

Enable the SQL Module


Your fist step is to enable the SQL Module. To do this you must uncomment the following lines from the conf.xml file in your EXIST_HOME directory: <module class="org.exist.xquery.modules.sql.SQLModule" uri="http://exist-db.org/xquery/sql" /> In eXist 1.5 there is also an additional Oracle module that is undocumented. <module class="org.exist.xquery.modules.oracle.OracleModule" uri="http://exist-db.org/xquery/oracle" /> After this is done you must restart your server. You should now see the additional SQLModule documentation in your function list.

Execute the Query


In order to execute the query there are two steps you must take: 1. get a connection to the database 2. execute the query There are five different functions to get a connection to the database but only one function to execute the query. The connection string allows you to connect to the correct server with the appropriate username and password. In its most basic form, the format of the get-connection function is the following: sql:get-connection('JavaClass', 'JDBC-Connection-URL') This format assumes you can put the login and password to the database directly in the JDBC connection URL. If you can not do this, the format of the connection string with a username and password is: sql:get-connection('JavaClass', 'JDBC-Connection-URL', 'username', 'password') Note that some systems also put the username and password in the JDBC connection string. For example in MySQL the string might be:
sql:get-connection("com.mysql.jdbc.Driver", 'jdbc:mysql://localhost/db1', 'mysql-user-name', 'mysql-password')

XQuery SQL Module In Oracle the string might be sql:get-connection('oracle.jdbc.OracleDriver', 'jdbc:oracle:thin:[USER/PASSWORD]@//[HOST][:PORT]/SERVICE", 'jdbc-connection-string', 'mysql-user-name', 'mysql-password') let $connection := sql:get-connection("com.mysql.jdbc.Driver", 'jdbc:mysql://localhost/db1', 'mysql-user-name', 'mysql-password') let $q1 := "select * from table1" return sql:execute( $connection, $q1, fn:true() )

424

Adder
Motivation
We would like to create a simple XQuery that takes two arguments and returns the the sum of the two numbers.

Example Program Using URL Parameters (HTTP GET)


xquery version "1.0"; declare namespace request="http://exist-db.org/xquery/request"; declare namespace xs="http://www.w3.org/2001/XMLSchema"; (: get the parameters from the URL :) let $arg1 := xs:integer(request:get-parameter("arg1", "1")) let $arg2 := xs:integer(request:get-parameter("arg2", "5")) return <results> <sum>{$arg1+$arg2}</sum> </results> Call this like ($hostname)/adder.xq?arg1=123&arg2=456. Execute [1]

Results
<results> <sum>579</sum> </results>

Accumulating Adder
To make this into an interactive application, we can extend the script to create an XHTML document containing a Form. The script computes the new sum from the URL parameters (if any) and returns a minimal XHTML document containing a Form which both reports the sum and prompts for new inputs. Note the embedded XQuery expressions

Adder (in curly braces) which interpolates the computed values into the created XML element. The state of the computation, the value of the accumulator, is retained in a hidden input in the form. xquery version "1.0"; declare namespace request="http://exist-db.org/xquery/request"; declare namespace xs="http://www.w3.org/2001/XMLSchema"; declare option exist:serialize "method=xhtml media-type=text/html indent=yes"; let $sum := xs:integer(request:get-parameter("sum",0)) let $number := xs:integer(request:get-parameter("number","0")) let $newSum := $sum + $number return <html> <head><title>Accumulating Adder</title></head> <body> <h1>Accumulating Adder</h1> <form> {$newSum} + <input type="text" name="number" value="{$number}" /> <input type="hidden" name="sum" value="{$newSum}"/> </form> </body> </html> Execute [2]

425

Clearing the accumulator


To support the operation of clearing the accumulator, we can add a couple of submit buttons to the form. The presence of the 'Clear' action is used to set the inputs to zero. xquery version "1.0"; declare namespace request="http://exist-db.org/xquery/request"; declare namespace xs="http://www.w3.org/2001/XMLSchema"; declare option exist:serialize "method=xhtml media-type=text/html indent=yes"; let $action := request:get-parameter("action","") let $sum := if ($action= "clear") then 0 else xs:integer(request:get-parameter("sum",0)) let $number := if ($action = "clear") then 0 else xs:integer(request:get-parameter("number"," ")) let $newSum := $sum + $number return <html> <head><title>Accumulating Adder</title></head> <body>

Adder <h1>Accumulating Adder</h1> <form> {$newSum} + <input type="text" name="number" value="{$number}" /> <input type="hidden" name="sum" value="{$newSum}"/> <input type="submit" name="action" value="add"/> <input type="submit" name="action" value="clear"/> </form> </body> </html> Execute [3]

426

Example Using Session Variables


An alternative way of holding the state of this computation is in session variables. The session module in eXist provides the necessary functions. xquery version "1.0"; declare declare declare declare let $sum namespace request="http://exist-db.org/xquery/request"; namespace session="http://exist-db.org/xquery/session"; namespace xs="http://www.w3.org/2001/XMLSchema"; option exist:serialize "method=xhtml media-type=text/html indent=yes";

:= if (exists(session:get-attribute("sum"))) then session:get-attribute("sum") else 0 let $action := request:get-parameter("action","") let $sum := if ( $action= "clear") then 0 else $sum let $number := if ($action = "clear") then 0 else xs:integer(request:get-parameter("number","0")) let $newSum := $sum + $number let $s := session:set-attribute("sum",$newSum) return <html> <head><title>Accumulating Adder</title></head> <body> <h1>Accumulating Adder</h1> <form> {$newSum} + <input type="text" name="number" value="{$number}" /> <input type="submit" name="action" value="add"/> <input type="submit" name="action" value="clear"/> </form> </body>

Adder </html> Execute [4]

427

Sample Using HTTP POST


xquery version "1.0"; declare namespace request="http://exist-db.org/xquery/request"; declare namespace xs="http://www.w3.org/2001/XMLSchema"; declare option exist:serialize "method=xml media-type=text/xml indent=yes omit-xml-declaration=no";

(: get the parameters from the URL :) (: call this like ($hostname)/adder.xq?arg1=123&arg2=456 :) let $posted-data := request:get-data() let $arg1 := $posted-data//arg1/text() let $arg2 := $posted-data//arg2/text()

return <results> <sum>{$arg1+$arg2}</sum> </results>

References
[1] [2] [3] [4] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ adder. xq?arg1=123& arg2=456 http:/ / www. cems. uwe. ac. uk/ xmlwiki/ adderForm_1. xq http:/ / www. cems. uwe. ac. uk/ xmlwiki/ adderForm_2. xq http:/ / www. cems. uwe. ac. uk/ xmlwiki/ adderForm_3. xq

Ah-has

428

Ah-has
Redundancy in Expressions
let $r := if ($x = 1) then true() else false(); better as let $r := ($x = 1) and for $p in (0 to string-length($arg1)) return $p better as (0 to string-length($arg1)) and for $i in (1 to 5) return for $j in (11 to 15) return ($i, $j) better as for $i in (1 to 5), $j in (11 to 15) return ($i, $j) OR for $i in (1 to 5) for $j in (11 to 15) return ($i, $j) all three return (1 11 1 12 1 13 1 14 1 15 2 11 2 12 2 13...)

XPath predicates clearer and faster than where clauses


for $x in //Page where $x/heading = 1 return $x better as //Page[heading = 1]

Ah-has

429

Default values
if (exists($a)) then $a else "Default" better as ($a,"Default") [1] OR for a sequence of items ($list1, "Default"[empty($list1)]) OR another possible for a sequence (<test/>,<test/>,<default/>)[not(position() = last() and not(last() = 1))] any number of cascaded defaults can be handled this way. compare with 'COALESCE' in SQL

Checking for Required Parameters


Motivation
You want to check for required parameters and return a useful error message if the required parameters are not present.

Approach
To solve this problem we will use the get-parameter function and only return results if the parameter is present. If it is not present then we will return a useful error message.

Sample Program
xquery version "1.0"; declare namespace request="http://exist-db.org/xquery/request"; let $parameter-1 := request:get-parameter('p1', '') return if (not($parameter-1)) then ( <error> <message>Parameter 1 is missing. Parameter 1 is a required parameter for this XQuery.</message> </error>) else ( <results> <message>Parameter 1={$parameter-1}</message> </results> )

Checking for Required Parameters

430

Output
If you do not supply the required parameter the following will result:
<error> <message>Parameter 1 is missing. </error> Parameter 1 is a required parameter for this XQuery.</message>

Checking for multiple Arguments


The following example checks for multiple arguments. In this case if parameter 1 OR parameter 2 is missing an error will be generated. xquery version "1.0"; declare namespace request="http://exist-db.org/xquery/request"; declare option exist:serialize "method=xhtml media-type=text/xml indent=yes"; let $parameter-1 := request:get-parameter('p1', '') let $parameter-2 := request:get-parameter('p2', '') return if (not($parameter-1) or not($parameter-2)) then ( <error> <message>Parameter 1 or 2 is missing. Both arguments required for this XQuery.</message> <message>Parameter 1={$parameter-1}</message> <message>Parameter 2={$parameter-2}</message> </error>) else ( <results> <message>Parameter 1={$parameter-1}</message> <message>Parameter 2={$parameter-2}</message> </results> ) Note that the following logic is equivalent: if (not($parameter-1) or not($parameter-2)) if (not($parameter-1 and $parameter-2)) Sometimes the second form is easier to read.

Returning HTTP Status Codes


There is considerable discussion if you should also return a HTTP error code such as a 400 error. In general if the checks to the parameters are part of your business logic and not part of the communication protocol you should never return HTTP codes. This tells the calling application that they got the base URL correct and there was no permission problems but the application logic detected an error. The client application should then understand how to parse error documents and display the relevant error messages to the user. For further details see HTTP Status Codes

Checking for Required Parameters

431

Checking for File Availability


Many times a URL parameter is used to open a specific file from a data collection within an application. You can use the doc-available() function and following code sample to check for the existence of a file: $app-data-collection := '/db/apps/my-app/data' let $file := request:get-parameter('file', '') (: check for required parameter :) return if (not($file)) then <error> <message>URL Parameter file is a required parameter'</message> </error> else let $file-path := concat($input-data-collection, '/', $file) (: check that the file is available :) return if (not(doc-available($file-path))) then <error> <message>Document {$file-path} does not exist</message> </error> else (: normal processing here... :)

Dataflow diagrams

432

Dataflow diagrams
This description of the data flow in the Timetable application (another page scraping application) is loosely based on XPL <?xml version="1.0" encoding="UTF-8"?> <Pipeline id="timetable"> <process id="i1"> <title>Input id</title> </process> <process id="i2"> <title>input week number</title> </process> <process id="i3"> <title>input role</title> </process> <process id="s1"> <title>create url</title> <input>i1</input> <input>i2</input> <input>i3</input> </process> <process id="s2"> <title>get html</title> <input>s1</input> <input>x1</input> </process> <process id="x1"> <type>external</type> <input>s2</input> <title>Syllabus Plus</title> </process> <process id="s3"> <title>convert to xhtml</title> <input>s2</input> </process> <process id="s4"> <title>extract xml</title> <input>s3</input> </process> <process id="s5"> <title>transform to vcal</title> <input>s4</input> </process> <process id="s6"> <title>transform to htm</title> <input>s4</input>

Dataflow diagrams </process> </Pipeline> With a map from types to shapes: <ProcessTypes> <type name="input" shape="invtriangle"/> <type name="process" shape="box"/> <type name="external" shape="house"/> </ProcessTypes> Conversion to dot format for onward conversion to a GIF image
declare option exist:serialize "method=text"; declare variable $nl := " "; declare variable $url := request:get-parameter("url","/db/Wiki/DataFlow/timetablexpl.xml"); declare variable $processTypes := /ProcessTypes; let $pipe := doc($url)

433

return

"digraph {" , for $process in $pipe//process let $type := if (exists($process/type)) then $process/type else if (empty($process/input)) then "input" else "process" let $shape := return ( concat ($process/@id, ' [shape=',$shape,',label="',$process/title, '"];',$nl), for $input in $process/input return concat($input, '->', $process/@id,";",$nl) ), "} ",$nl ) string($processTypes/type[@name=$type]/@shape)

Dot file [1] Diagram [2]

References
[1] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ DataFlow/ xpl2dot. xq [2] http:/ / www. cems. uwe. ac. uk/ ~cjwallac/ apps/ services/ dot2media. php?url=http:/ / www. cems. uwe. ac. uk/ xmlwiki/ DataFlow/ xpl2dot. xq

DBpedia with SPARQL - Football teams

434

DBpedia with SPARQL - Football teams


DBpedia [1] is a project to convert the contents of Wikipedia to RDF so that it can be linked into other datasets to add to the Semantic Web. A w:SPARQL endpoint [2] is provided to query this database.

SPARQL to KML
This application uses DBpedia to create a kml file showing the birth places of the members of a selected UK Football team. Data quality is limited by a number of factors: the age of the Wikipedia extract on which DBpedia is based the existence or non-existence of individual pages in Wikipedia for players the consistency of property labeling on Wikipedia infoboxes

SPARQL Query
declare variable $query := " PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> PREFIX p: <http://dbpedia.org/property/> SELECT * WHERE { ?player p:currentclub <http://dbpedia.org/resource/Arsenal_F.C.>. OPTIONAL {?player p:cityofbirth ?city}. OPTIONAL {?player p:dateOfBirth ?dob}. OPTIONAL {?player p:clubnumber ?no}. OPTIONAL {?player p:position ?position}. OPTIONAL {?player p:image ?image}. OPTIONAL { { ?city geo:long ?long. } UNION { ?city p:redirect ?city2. ?city2 geo:long ?long. }. }. OPTIONAL { { ?city geo:lat ?lat.} UNION { ?city p:redirect ?city3. ?city3 geo:lat ?lat. }. }. } "; This query is complicated by the need to handle possible redirection of the city name - (can this be improved - this is a generic problem?). To obtain more complete data, the query should also handle the multiple synonyms used for place and date of birth Changes to dbpedia lead to a short life for queries based om the data-model and vocabulary. As of Jan 2011, the query is being updated. Currently to get locations and birthdates for the current players at Arsenal, the following query seems to work.

DBpedia with SPARQL - Football teams PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> PREFIX p: <http://dbpedia.org/property/> PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> SELECT * WHERE { <http://dbpedia.org/resource/Arsenal_F.C.> p:name ?player. ?player dbpedia-owl:birthPlace ?city; dbpedia-owl:birthDate ?dob. ?city geo:long ?long; geo:lat ?lat. } However this yields multiple geocoded locations, of which it can be assumed that the first is most specific (but not possible ? to filter in SPARQL).

435

DBpedia Query
The prototype SPARQL query is targeted on Arsenal_F.C. This team name needs to be replaced by the supplied team name, the query then URI-encoded and passed to the DBpedia SPARQL endpoint. let $club := request:get-parameter ("club","Arsenal_F.C.") let $queryx := replace($query,"Arsenal_F.C.",$club) Aside: Initially, the query was written with a generic placeholder ($team) rather than a protypical value (Arsenal_F.C.). The prototype idiom has the benefit of providing an executable SPARQL query without editing, is more expressive and less tricky - the $ in $team needs escaping in the replace expression since the second argument is a regular expression.

Execute the SPARQL Query


This query uses the SPARQL endpoint provided by the Virtuoso engine. The format of the result is defined to be XML i.e. the SPARQL Query Result format. A function tidies up the interface:
declare function local:execute-sparql($query as xs:string) { let $sparql := concat("http://dbpedia.org/sparql?format=xml&default-graph-uri=http://dbpedia.org&query=", encode-for-uri($query) ) return doc($sparql) };

DBpedia Result
The result is in SPARQL Query Results XML format. It is more convenient to convert this generic format to tuples with named elements for later processing. declare namespace r = "http://www.w3.org/2005/sparql-results#"; declare function local:sparql-to-tuples($rdfxml ) { for $result in $rdfxml//r:result return <tuple> { for $binding in $result/r:binding

DBpedia with SPARQL - Football teams return if ($binding/r:uri) then element {$binding/@name} { attribute type {"uri"} , string($binding/r:uri) } else element {$binding/@name} { attribute type {$binding/r:literal/@datatype}, string($binding/r:literal) } } </tuple> };

436

Query to Tuples
let $result:= local:execute-sparql($queryx) let $tuples := local:sparql-to-tuples($result)

KML output
Since we are generating kml, we need to set the media type and file name and create a Document node - in the appropriate places in the script:
declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";

let $x := response:set-header('Content-disposition',concat('Content-disposition: inline;filename=',$team,'.kml;'))

return <Document> <name>Birthplaces of players <Style id="player"> <IconStyle> <Icon><href>http://maps.google.com/mapfiles/kml/pal2/icon49.png</href> </Icon> </IconStyle> </Style> ..... in the {$team} squad</name>

</Document>

The icon is a stock GoogleEarth footballer icon.

DBpedia with SPARQL - Football teams

437

Document construction
Due to the multiple values for some of the properties, for example cityofbirth is often expressed as an address path, there are multiple tuples for each player. These need grouping and compressing. Here we use the XQuery idiom which uses distinct-values to get a set of player names, and then accesses groups of rows with the name as the key. This scripts takes a simplistic approach of using only the first of multiple tuples which contains a latitude , pending a better resolution of the multiple cityofbirth values. We are only interested in players whose place of birth has been geo-coded, so we filter for tuples with a latitude element: { for $playername in distinct-values($tuples[lat]/player) let $player := $tuples[player=$playername][lat][1]

Data cleanup
The wikiPedia data needs some clean-up before being usable in the kml. A generic clean function decodes the uri-encoded characters, removes some irrelevant text and replaces underscores with spaces. ( this hack needs improving ) declare function local:clean($text) { let $text:= util:unescape-uri($text,"UTF-8") let $text := replace($text,"http://dbpedia.org/resource/","") let $text := replace($text,"\(.*\)","") let $text := replace($text,"Football__positions#","") let $text := replace($text,"#",",") let $text := replace($text,"_"," ") return $text }; let $name := local:clean($player/player) let $city :=local:clean($player/city) let $position := local:clean($player/position)

Data typing
The date of birth is in the form xs:date, but is optional. If the value is a valid date, it is converted to a more readable form using an eXist function:
let $dob := if ($player/dob castable as xs:date) then datetime:format-date(xs:date($player/dob),"dd MMM, yyyy" ) else ""

Similarly for the position number which should be an xs:integer:


let $no := if ($player/no castable as xs:integer) then concat(" [# ", xs:integer($player/no) ,"] ") else ""

The latitude and longitude should be xs:decimal. Since sometimes several players in a team come from the same place, the mapped positions are dithered a little. let $lat :=xs:decimal($player/lat) + (math:random() - 0.5)* 0.01 let $long :=xs:decimal($player/long) + (math:random() - 0.5)* 0.01

DBpedia with SPARQL - Football teams

438

Placemark Construction
The body of the Placemark description will contain XHTML markup to display an image if there is one and to link to the DBpedia page. The XML needs to be serialised to a string for GoogleMap to render the description in a pop-up:
let $description := <div> {concat ($position, $no, " born ", $dob, " in ", $city)} <div> <a href="{$player/player}">DBpedia</a> <a href="http://images.google.co.uk/images?q={$name}">Google Images</a> </div> {if ($player/image !="") then <div><img src="{$player/image}" else () } </div> order by $name return <Placemark> <name>{$name}</name> <description> {util:serialize($description,"method=xhtml")} </description> <Point> <coordinates>{concat($long, ",",$lat,",0")}</coordinates> </Point> <styleUrl>#player</styleUrl> </Placemark> } height="200"/> </div>

Execute
Map of Arsenal players: generate the kml [3] link to GoogleMaps [4] Note that the q parameter is URI-encoded.

Complete Script
(: generate a sparql query on the dbpedia server This takes a team name and generates a kml file showing the birth place of the players

:) declare namespace r = "http://www.w3.org/2005/sparql-results#"; declare variable $query := "

PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> PREFIX : <http://dbpedia.org/resource/> PREFIX p: <http://dbpedia.org/property/>

DBpedia with SPARQL - Football teams


SELECT * WHERE { ?player p:currentclub <http://dbpedia.org/resource/Arsenal_F.C.>.

439

OPTIONAL {?player p:cityofbirth ?city}. OPTIONAL {?player p:birth ?dob}. OPTIONAL {?player p:clubnumber ?no}. OPTIONAL {?player p:position ?position}. OPTIONAL {?player p:image ?image}. OPTIONAL { { ?city geo:long ?long. } UNION { ?city p:redirect ?city2. ?city2 geo:long ?long. }. }. OPTIONAL { { ?city geo:lat ?lat.} UNION { ?city p:redirect ?city3. ?city3 geo:lat ?lat. }. }. } ";

declare function local:execute-sparql($query as xs:string) { let $sparql := concat("http://dbpedia.org/sparql?format=xml&default-graph-uri=http://dbpedia.org&query=", encode-for-uri($query) ) return }; doc($sparql)

declare function local:sparql-to-tuples($rdfxml ) { for $result in $rdfxml//r:result return <tuple> { for $binding return if ($binding/r:uri) then element {$binding/@name} attribute type { {"uri"} , in $result/r:binding

string($binding/r:uri) } else element {$binding/@name} {

attribute type {$binding/@datatype}, string($binding/r:literal) } }

DBpedia with SPARQL - Football teams


</tuple> };

440

declare function local:clean($text) { let $text:= util:unescape-uri($text,"UTF-8") let $text := replace($text,"http://dbpedia.org/resource/","") let $text := replace($text,"\(.*\)","") let $text := replace($text,"Football__positions#","") let $text := replace($text,"#",",") let $text := replace($text,"_"," ") return $text };

declare option exist:serialize

"method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";

let $club := request:get-parameter ("club","Arsenal_F.C.") let $queryx := replace($query,"Arsenal_F.C.",$club) let $result:= local:execute-sparql($queryx) let $tuples := local:sparql-to-tuples($result)

let $x := response:set-header('Content-disposition',concat('Content-disposition: inline;filename=',$club,'.kml;'))

return

<Document> <name>Birthplaces of <Style id="player"> <IconStyle> <Icon><href>http://maps.google.com/mapfiles/kml/pal2/icon49.png</href> </Icon> </IconStyle> </Style> {$result} { for $playername in distinct-values($tuples[lat]/player) let $player := $tuples[player=$playername][lat][1] let $name := local:clean($player/player) let $city :=local:clean($player/city) let $position := local:clean($player/position) {local:clean($club)} players</name>

let $dob := if ($player/dob castable as xs:date) then datetime:format-date(xs:date($player/dob),"dd MMM, yyyy" ) else "" let $no := if ($player/no castable as xs:integer) then concat(" [# ", xs:integer($player/no),"] ") else ""

let $lat := if ($player/lat castable as xs:decimal) then

xs:decimal($player/lat) + (math:random() - 0.5)*0.01 xs:decimal($player/long) + (math:random()

else "" else ""

let $long := if ($player/long castable as xs:decimal) then let $description :=

-0.5)* 0.01

DBpedia with SPARQL - Football teams


<div> {concat ($position, $no, " born ", $dob, " in ", $city)} <div><a href="{$player/player}">DBpedia</a> <a href="http://images.google.co.uk/images?q={$name}">Google Images</a>

441

</div> {if ($player/image !="") then <div><img src="{$player/image}" </div> order by $name return <Placemark> <name>{$name}</name> <description> {util:serialize($description,"method=xhtml")} </description> <Point> <coordinates>{concat($long, ",",$lat,",0")}</coordinates> </Point> <styleUrl>#player</styleUrl> </Placemark> } </Document> height="200"/> </div> else ()}

Club Index
We also need an index page, selecting all Clubs in the major English and Scottish leagues. This script follows the same lines as the more complex script above, except that due to the simpler data, the raw SPARQL result is used without transformation. The index is sorted alphabetically by club name and provides links to the player map and to the base DBpedia data.

XQuery Script
declare option exist:serialize "method=xhtml media-type=text/html"; declare namespace r = "http://www.w3.org/2005/sparql-results#"; declare variable $query := " PREFIX : <http://dbpedia.org/resource/> PREFIX p: <http://dbpedia.org/property/> SELECT * WHERE { ?club p:league ?league. { ?club p:league :Premier_League.} UNION {?club p:league :Football_League_One.} UNION {?club p:league :Football_League_Two.} UNION {?club p:league :Scottish_Premier_League.} UNION

DBpedia with SPARQL - Football teams {?club p:league } "; :Football_League_Championship.}

442

declare function local:execute-sparql($query as xs:string) { let $sparql := concat("http://dbpedia.org/sparql?format=xml&default-graph-uri=http://dbpedia.org&query=" ) return doc($sparql) }; declare function local:clean($string as xs:string) as xs:string { let $string := util:unescape-uri($string,"UTF-8") let $string := replace($string,"\(.*\)","") let $string := replace($string,"_"," ") return $string };

<html> <body> <h1>England and Scottish Football Clubs</h1> <table border="1"> { for $tuple in local:execute-sparql($query)//r:result let $club := $tuple/r:binding[@name="club"]/r:uri let $club :=substring-after($club,"/resource/") let $clubx := local:clean($club) let $league := $tuple/r:binding[@name="league"]/r:uri let $league := local:clean(substring-after($league,"/resource/")) let $mapurl := concat("http://maps.google.co.uk/maps?q=",escape-uri(concat("http://www.cems.uwe.ac.uk/xm order by $club return <tr> <td>{$clubx}</td> <td>{$league}</td> <td><a href="{$mapurl}">Player Map</a></td> <td><a href="http://dbpedia.org/resource/{$club}">DBpedia</a></td> </tr> } </table> </body> </html>

DBpedia with SPARQL - Football teams

443

Club Index
Club Index [5]

References
[1] [2] [3] [4] http:/ / www. dbpedia. org http:/ / dbpedia. org/ sparql http:/ / www. cems. uwe. ac. uk/ xmlwiki/ RDF/ club2kml. xq?club=Arsenal_F. C. http:/ / maps. google. co. uk/ maps?q=http%3A%2F%2Fwww. cems. uwe. ac. uk%2Fxmlwiki%2FRDF%2Fclub2kml. xq%3Fclub%3DArsenal_F. C. [5] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xRDF/ clubIndex. xq

DBpedia with SPARQL and Simile Timeline Album Chronology


In this example, the SIMILE project's Timeline [1] JavaScript is used to display the chronology of the albums released by a selected artist or group. As with the football teams in the previous example, the data is extracted from DBpedia via a SPARQL query. This example has three components: an XQuery script which creates the HTML page, linking to the Timeline JavaScript an XQuery script to query DBpedia for albums for a selected group, assign a year of release to each album and generate a set of events in the form expected by the TimeLine script. an XQuery script to provide an index of groups in a given category

HTML Page script


The script accepts one parameter, the name of the artist or group. The JavaScript required to customize the Timeline is defined in-line in a CDATA section. The onLoad function accepts two parameters, the name of the artist and a date to provide the initial focus for the timeline. The event stream is provided by the call to the XQuery script group2tl.xq, passing the group name. The JavaScript interface here is minimal.
declare option exist:serialize "method=xhtml media-type=text/html";

let $group:= request:get-parameter("group","Eagles") return <html> <head> <script src="http://simile.mit.edu/timeline/api/timeline-api.js" type="text/javascript"></script> <script <![CDATA[ function onLoad(group,start) { var theme = Timeline.ClassicTheme.create(); theme.event.label.width = 400; // px theme.event.bubble.width = 300; theme.event.bubble.height = 300; type="text/javascript">

DBpedia with SPARQL and Simile Timeline - Album Chronology


var eventSource1 = new Timeline.DefaultEventSource();

444

var bandInfo = [ Timeline.createBandInfo({ eventSource: theme: date: width: intervalUnit: eventSource1, theme, start, "100%", Timeline.DateTime.YEAR,

intervalPixels: 45 }),

]; Timeline.create(document.getElementById("my-timeline"), bandInfo); Timeline.loadXML("group2tl.xq?group="+group, function(xml, url) { eventSource1.loadXML(xml, url); });

} ]]> </script> </head> <body onload="onLoad('{$group}',1980);"> <h1>{$group} Albums</h1> <div id="my-timeline" style="height: 700px; border: 1px solid #aaa"></div> </body> </html>

TimeLine events group2tl.xq


This script has a similar structure to the football team script. A prototype SPARQL query is edited to change the default name to the supplied name, the query sent to the DBpedia SPARQL service and the resultant SPARQL XML result converted to XML tuples. Note - the group name needs an additional encoding because SPARQL requires URIs to be uri encoded and group names may contain characters such as ( and ) which require encoding. Hence Queen_(band) must appear in the URI in SPARQL as Queen_%28band%29. In addition the whole query is uri encoded which allows this encoding to be recovered when the whole query is decoded by the SPARQL service. (Whew!) As with the Football clubs, there may be multiple tuples for the same album and the best (at present just the first non null) set of data needs to be extracted from these tuples. The year of release is hacked out of the multitude of different date formats which may appear and each album represented in the format expected by Timeline, with the pop-up contents serialised. This is limited to the album cover if available and links to DBpedia and Wikipedia.
declare namespace r = "http://www.w3.org/2005/sparql-results#";

declare variable $query := " PREFIX p: <http://dbpedia.org/property/> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> SELECT * WHERE { ?album p:artist <http://dbpedia.org/resource/The_Allman_Brothers_Band>.

?album rdf:type <http://dbpedia.org/class/yago/Album106591815>.

DBpedia with SPARQL and Simile Timeline - Album Chronology


OPTIONAL {?album p:cover ?cover}. OPTIONAL {?album p:name ?name}. OPTIONAL {?album p:released ?dateofrelease}. } ";

445

declare function local:execute-sparql($query as xs:string) { let $sparql := concat("http://dbpedia.org/sparql?format=xml&default-graph-uri=http://dbpedia.org&query=", encode-for-uri($query) ) return }; doc($sparql)

declare function local:sparql-to-tuples($rdfxml ) { for $result in $rdfxml//r:result return <tuple> { for $binding return if ($binding/r:uri) then element {$binding/@name} attribute type { {"uri"} , in $result/r:binding

string($binding/r:uri) } else element {$binding/@name} {

attribute type {$binding/@datatype}, string($binding/r:literal) } } </tuple> };

declare function local:clean($text) { let $text:= util:unescape-uri($text,"UTF-8") let $text := replace($text,"http://dbpedia.org/resource/","") let $text := replace($text,"\(.*\)","") let $text := replace($text,"_"," ") return $text };

declare function local:year-from-date($d) { let $d := replace($d,"[^0-9\-]","") let $dp := tokenize($d,"-") let $year := $dp[1] return if ($year castable as xs:integer and string-length($year)=4) then $year

DBpedia with SPARQL and Simile Timeline - Album Chronology


else () };

446

let $group := request:get-parameter ("group","The_Allman_Brothers_Band") let $groupx := replace($group," ","_") let $queryx := replace($query,"The_Allman_Brothers_Band",encode-for-uri($group)) let $result := local:execute-sparql($queryx)

let $tuples := local:sparql-to-tuples($result) return <data> {for $album in distinct-values($tuples/album) let $rows := $tuples[album=$album] let $name := local:clean($album) let $year := local:year-from-date(($rows/dateofrelease)[1]) let $cover := ($rows/cover)[1] where exists($year) return <event start="{$year}" title="{$name}"> {util:serialize( <div> {if (starts-with($cover,"http://")) then <img src="{$cover}" height="200" alt=""/> else () } <p><a href="{$album}">DBpedia</a> <a href="{replace($album,"dbpedia.org/resource","en.wikipedia.org/wiki")}">Wikipedia</a></p> </div> , "method=xhtml") } </event> } </data>

Execution
Pink Floyd [2] Leonard Cohen [3]

Group Index
This script queries DBpedia for the resources which belong to a specified category, for example Rock_and_Roll_Hall_of_Fame_inductees. A table of group names in alphabetical order provides links to the Timeline view using the script above, and to an HTML table view of the discography.
declare namespace r = "http://www.w3.org/2005/sparql-results#";

declare option exist:serialize "method=xhtml media-type=text/html"; declare variable $query := " PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX p: <http://dbpedia.org/property/> SELECT * WHERE { ?group } skos:subject <http://dbpedia.org/resource/Category:Rock_and_Roll_Hall_of_Fame_inductees>.

DBpedia with SPARQL and Simile Timeline - Album Chronology


";

447

declare function local:execute-sparql($query as xs:string) { let $sparql := concat("http://dbpedia.org/sparql?format=xml&default-graph-uri=http://dbpedia.org&query=", escape-uri($query,true()) ) return }; doc($sparql)

declare function local:clean($text) { let $text:= util:unescape-uri($text,"UTF-8") let $text := replace($text,"\(.*\)","") let $text := replace($text,"_"," ") return $text };

let $category := request:get-parameter("category","Rock_and_Roll_Hall_of_Fame_inductees") let $queryx := replace($query,"Rock_and_Roll_Hall_of_Fame_inductees",$category) let $result return <html> <body> <h1>{local:clean($category)}</h1> <table border="1"> {$result} { for $group in $result//r:result/r:binding[@name="group"]/r:uri let $name := substring-after($group,"resource/") let $namex := local:clean($name) order by $name return <tr> <td>{$namex}</td> <td><a href="group2html.xq?group={$name}">HTML</a></td> <td><a href="groupTimeline.xq?group={$name}">Timeline</a></td> </tr> } </table> </body> </html> := local:execute-sparql($queryx)

Index of Rock and Roll Groups and Artists [4]

DBpedia with SPARQL and Simile Timeline - Album Chronology

448

References
[1] [2] [3] [4] http:/ / simile. mit. edu/ timeline/ http:/ / www. cems. uwe. ac. uk/ xmlwiki/ RDF/ groupTimeline. xq?group=Pink_Floyd http:/ / www. cems. uwe. ac. uk/ xmlwiki/ RDF/ groupTimeline. xq?group=Leonard_Cohen http:/ / www. cems. uwe. ac. uk/ xmlwiki/ RDF/ groupIndex. xq

Displaying data in HTML Tables


Motivation
You would like to display your XML data in an HTML table and display alternate rows using a colored background.

Data File
Assume you have a data file such as the following XML file which is a sample glossary of terms and definitions:

terms.xml
<terms> <term> <term-name>Object</term-name> <definition>A set of ideas, abstractions, or things in the real world that are identified with explicit boundaries and meaning and whose properties and behavior follow the same rules</definition> </term> <term> <term-name>Organization</term-name> <definition>A unit consisting of people and processes established to perform some functions</definition> </term> </terms> <term> <term-name>Organization</term-name> <definition>BankOfAmerica</definition> </term> </terms> The <term> tags will repeat for each term in your glossary. You would like to display these terms in an HTML table.

Displaying data in HTML Tables

449

Screen Image

HTML table screen image

The following XQuery will perform the task.

Sample Code
xquery version "1.0"; declare option exist:serialize "method=xhtml media-type=text/html"; let $my-doc := doc('file://c:/xml/terms.xml') return <html> <head> <title>Terms</title> </head> <body> <table border="1"> <thead> <tr> <th>Term</th> <th>Definition</th> </tr> </thead> <tbody>{ for $term at $count in for $item in $my-doc/terms/term let $term-name := $item/term-name/text() order by upper-case($term-name) return $item return <tr> {if ($count mod 2) then (attribute bgcolor {'Lavender'}) else ()} <td>{$term/term-name/text()}</td> <td>{$term/definition/text()}</td> </tr> }</tbody> </table> </body> </html> Execute [1]

Displaying data in HTML Tables

450

Discussion
Sorting before counting
There are two nested for loops. The outer loop has the additional at count parameter that increments a counter for each result returned. The inner loop has the loop that returns a generic sorted item to the outer loop. Note that the inner loop does the sorting first and the outer loop does the counting of each item so that alternate rows are shaded. Note that if you know the original file is in the correct order the nested for loops are not necessary. A single for loop with the at $count is all that is needed.

Dynamic Element Construction


The following lines: <tr> {if ($count mod 2) then (attribute bgcolor {'Lavender'}) else ()} conditionally creates a light blue background color for odds rows, rows which evaluate true because modulus 2 of their $count is not zero. This is an example of dynamic element construction. Odd rows: <tr bgcolor="Lavender"> <td>...</td> </tr> Even rows: <tr><td>...</td></tr> It does this by conditionally adding an attribute bgcolor="Lavender" for odd rows in the table. If the test ($count mod 2) is zero, i.e. on even rows, an attribute will not be added. It is recommended best practice that the style of shading alternate rows of a table be done in a central cascading style sheet. The most general way to keep the table formats standard throughout your site would be to add semantic class tags to each row to label them even or odd. <tr> {if ($count mod 2) then (attribute else (attribute

class class

{'even'}) {'odd'})}

The CSS file would then contain the following: .odd {background-color: Lavender;}

References
[1] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ eXist/ eg/ stripes. xq

Displaying Lists

451

Displaying Lists
Motivation
You have a list of items in an XML structure and you want to display a comma separated list of the values in an output string.

Method
XQuery provides the string-join() function that will take a sequence of items and a separator string and create and output string with the separator between each of the items. The format of the function is: string-join(nodeset, separator) where nodeset is a list of nodes and separator the string that you would like to separate the values with.

Sample Program
xquery version "1.0"; let $tags := <tags> <tag>x</tag> <tag>y</tag> <tag>z</tag> <tag>d</tag> </tags> return <results> <comma-separated-values>{ string-join($tags/tag, ',') }</comma-separated-values> </results>

Output
<results> <comma-separated-values>a,x,c,z</comma-separated-values> </results> execute [1]

Discussion
The string-join function takes two arguments, the first is the sequence of strings to be joined and the second is the separator.

Displaying Lists

452

References
[1] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ Basics/ string-join. xq

Employee Search
This example shows JavaScript and XQuery combining to provide a directly updated Web page. AJAX is used in a form sometimes referred to as AHAH in which the server-side XQuery script returns an XHTML node (in this case a table containing the information about an employee) which is updated into the DOM using innerHTML. The behavior of this application is explained in this interactive sequence diagram. [1]

The search page


<html xmlns="http://www.w3.org/1999/xhtml" > <head> <title>Emp query using AJAX</title> <script language="javascript" src="ajaxemp.js"/> <style> th {{background-color:yellow}} </style> </head> <body> <h1>Emp query using AJAX</h1> <form action="javascript:getEmp();"> <label for="EmpNo" title="e.g. 7369, 7499 and 7521."> Employee Number</label> <input type="text" size="5" name="empNo" id="empNo" /> <input type="submit" value="Find"/> </form> <div id="emp"/> </body> </html>

View [2]

The JavaScript script


function updateEmp() { if (http.readyState == 4) { var divlist = document.getElementById('emp'); divlist.innerHTML = http.responseText; isWorking = false; } } function getEmp() { if (!isWorking && http) { var empNo = document.getElementById("empNo").value;

Employee Search http.open("GET", "getemp.xq?empNo=" + empNo, true); http.onreadystatechange = updateEmp; isWorking = true; http.send(null); } } function getHTTPObject() { var xmlhttp; /*@cc_on @if (@_jscript_version >= 5) try { xmlhttp = new ActiveXObject("Msxml2.XMLHTTP"); } catch (e) { try { xmlhttp = new ActiveXObject("Microsoft.XMLHTTP"); } catch (E) { xmlhttp = false; } } @else xmlhttp = false; @end @*/ if (!xmlhttp && typeof XMLHttpRequest != 'undefined') { try { xmlhttp = new XMLHttpRequest(); xmlhttp.overrideMimeType("text/xml"); } catch (e) { xmlhttp = false; } } return xmlhttp; } var http = getHTTPObject(); // var isWorking = false; create the HTTP Object

453

The get script


declare function local:element-to-table($element) { <table> {for $node in $element/* return <tr> <th>{name($node)}</th> <td> { $node/text() } </td>

Employee Search </tr> } </table> }; let $empNo := request:get-parameter("empNo",()) let $emp := //Emp[EmpNo=$empNo] return if (exists($emp)) then local:element-to-table($emp) else <p>Employee Number {$empNo} not found.</p> Get the XHTML fragment [3]

454

References
[1] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ SequenceDiagram/ showSequence. xq?uri=/ db/ Wiki/ SequenceDiagram/ sequences/ empajaxsd. xml [2] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ ajax/ ajaxemp. html [3] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ ajax/ getemp. xq?empNo=7521

Example Sequencer
The code examples used in the XQuery /SQL comparison are coded in an XML file. The WikiBook page redundantly has the code pasted into the page but an alternative is to provide an application to generate the whole page together with the executed examples from the XML script. Here is a sample of the XML script:
<Query id="30"> <Task>List the name of each employee together with the name of their manager.</Task> <MySQL>select e.ename, m.ename from emp e, emp m where e.mgr = m.empno ;</MySQL> <XQuery><![CDATA[for $emp in //Emp let $manager := //Emp[EmpNo = $emp/MgrNo] return <Emp> {$emp/Ename} <Manager>{string($manager/Ename)}</Manager> </Emp> ]]></XQuery> <Comment>The SQL Join has missed Employee King who has no manager,</Comment> </Query>

To allow the queries to be executed in a selected order, a lesson defines a sequence of queries: <Lesson id="t1"> <Name>Test Lesson 1</Name>

Example Sequencer <Step <Step <Step <Step <Step </Lesson> queryid="32"/> queryid="33"/> queryid="31"/> queryid="21a"/> queryid="20"/>

455

The user can step through the examples in the lesson : Test Lesson [1]

Implementation
Two scripts form the core of this application, one to list the queries in a lesson, the other to execute the query code, both SQL and XQuery and show the results. ....

References
[1] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ showLesson. xq?lessonid=t1

Excel and XML


Excel spreadsheet to XML
Simple tabular spreadsheet data in Excel 2003 and above can be converted to XML using an additional plugin [1]. When downloaded, to install, go to Tools>Add-ins>>Browse to locate the downloaded file and then install. An additional item 'XML tools' should appear on the tool bar.

Converting the spreadsheet


1. Select the XML tools in the menu bar and 'Convert range to a list'. Guided by the tool, select the required cell range, set option for headings and choose 'Advanced' to enter your own name for the root of the document and for each row. 2. Save the spreadsheet as XML data with an xml extension 3. If you need to change the spreadsheet to add or remove columns, convert the XML back to a data list by going to Data>> List >> Convert to range. Make the changes required and then reconvert back to XML 4. If the spreadsheet contains dates or times, you may have problems in conversion. Set the date format to yyyy-mm-dd. When you convert to XML you will be informed that date formats are incompatible - click "use existing formatting" 5. As a last resort, export the sheet as tab-delimited text, then re-import, ensuring all data is imported as Text rather than General (which will recognize the dates and set a date type).

Excel and XML

456

Creating an XML spreadsheet from scratch


1. To create a new spreadsheet, enter the headings first. Select all columns and set the formatting to text. If you dont you will likely run into problems with dates and times where it is often difficult to wrest control away from Microsoft. 2. Enter dates in the XML format yyyy-mm-dd - you can set the date format to this 3. Enter the data in the table 4. Finally convert to XML as above

Export from XQuery to Spreadsheet


Data is commonly exported as CSV files but Excel will load tabular XML files which are straightforward to generate with XQuery. For example, this table of Employee data [2] displays in a browser as indented XML with folding symbols. Saving the file saves the XML data prior to transformation to this display format. In Firefox, when the page is saved as XML, the filename is the script name with an additional XML suffix. The user will usually need to change this file name. The saved file can then be opened in Excel and after a couple of challenges, the data will be imported. In IE6, the right-button menu provides an option to export directly to Excel. The default file name can be set by the exporting script using the HTTP header Disposition....(need example)

References
[1] http:/ / msdn. microsoft. com/ library/ default. asp?url=/ library/ en-us/ odc_xl2003_ta/ html/ OfficeExcelXMLToolAddin. asp [2] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=1

eXist Crib sheet



XQuery declaration
xquery version "1.0";

import a module
import module namespace geo="http://www.cems.uwe.ac.uk/exist/coord" at "coord.xqm";

Base eXist namespace


declare namespace exist = "http://exist.sourceforge.net/NS/exist";

Standard eXist modules


Optional in an XQuery script, but required in a module in eXist 1.2 (not in 1.3):

import module namespace xmldb = "http://exist-db.org/xquery/xmldb";
import module namespace util = "http://exist-db.org/xquery/util";
import module namespace request = "http://exist-db.org/xquery/request";
import module namespace response = "http://exist-db.org/xquery/response";
import module namespace session = "http://exist-db.org/xquery/session";
import module namespace transform = "http://exist-db.org/xquery/transform";
import module namespace text = "http://exist-db.org/xquery/text";
import module namespace system = "http://exist-db.org/xquery/system";

declare a namespace
declare namespace tx="http://www.transxchange.org.uk/";

declare a default element namespace


declare default element namespace "http://www.w3.org/1999/xhtml";

Note that you do not associate a prefix with the default namespace.
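As a small illustration, an element constructed without a prefix is then serialized in that namespace:

declare default element namespace "http://www.w3.org/1999/xhtml";
(: this p element belongs to the XHTML namespace even though it has no prefix :)
<p>Hello</p>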

declare a default function namespace


declare default function namespace "http://www.transxchange.org.uk/";

declare a java binding namespace


declare namespace math="java:java.lang.Math";

Note that the java binding is disabled by default due to security issues. Math functions can now be invoked via the maths extension module.


output XML document


This is the default, but can be declared explicitly:
declare option exist:serialize "method=xml media-type=text/xml omit-xml-declaration=no indent=yes";

output SVG document


declare option exist:serialize "method=svg media-type=application/svg+xml omit-xml-declaration=no indent=yes";

output XHTML document


declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=no indent=yes doctype-public=-//W3C//DTD&#160;XHTML&#160;1.0&#160;Transitional//EN doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

output HTML4 document


declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=no indent=yes doctype-public=-//W3C//DTD&#160;HTML&#160;4.01&#160;Transitional//EN doctype-system=http://www.w3.org/TR/loose.dtd";

output HTML document with no doctype


declare option exist:serialize "method=html media-type=text/html omit-xml-declaration=yes indent=yes";

output XHTML document with no doctype


declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=yes indent=yes";

output plain text document with no doctype


declare option exist:serialize "method=text media-type=text/plain omit-xml-declaration=yes";

output kml document


declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";

output XForms document


declare option exist:serialize "method=xhtml media-type=text/xml indent=yes";

module header
module namespace c="http://www.cems.uwe.ac.uk/exist/coord";

function declaration
In the default namespace in an XQuery script:

declare function local:times($tt, $dt) {
   if (exists($dt))
   then local:times(($tt, $tt[last()] + $dt[1]), remove($dt, 1))
   else $tt
};

In the namespace of a module:

module namespace time = 'http://www.cems.uwe.ac.uk/fold/time';

declare function time:times($tt, $dt) {
   if (exists($dt))
   then time:times(($tt, $tt[last()] + $dt[1]), remove($dt, 1))
   else $tt
};

declare variable
declare variable $pi as xs:double := 3.14159265;

In a module with namespace prefix fxx:

declare variable $fxx:file := doc('/db/me/file.xml');

embedded CSS stylesheet


<style type="text/css">
<![CDATA[
.good {background-color: green;}
.bad {background-color: red;}
]]>
</style>

Get HTTP POST URL Parameters


To get two URL parameters from an HTTP POST
http://localhost:8080/exist/rest/db/my-collection/my-xquery.xq?color=blue&shape=circle

Add the following:

let $my-color := request:get-parameter('color', 'red')
let $my-shape := request:get-parameter('shape', '')

If no color parameter is supplied, a default color of "red" will be used.

Get HTTP POST Data


To get all the XML data from an HTTP POST:

let $my-data := request:get-data()

Extend the Output Size Limit


By default the output size limit is 10,000 bytes. You can extend this by adding the following:

declare option exist:output-size-limit "new-size";

For example, to triple the size limit of the output use the following line:

declare option exist:output-size-limit "30000";


Filtering Nodes
Motivation
You want to create filters that remove or replace specific nodes in an XML stream. The stream may consist of in-memory XML documents rather than documents stored on disk.

Method
To process all nodes in a tree we start with a recursive function called the identity transform. This function copies the source tree into the output tree without change. We begin with this process and then add some exception processing for each filter.

(: return a deep copy of the element and all sub elements :)
declare function local:copy($element as element()) as element() {
   element {node-name($element)}
      {$element/@*,
       for $child in $element/node()
       return
          if ($child instance of element())
          then local:copy($child)
          else $child
      }
};

This function uses an XQuery construct called a computed element constructor to construct an element. The format of the element constructor is:

element {ELEMENT-NAME} {ELEMENT-VALUE}

In the above case ELEMENT-VALUE is another query that finds all the child elements of the current node. The for loop selects all nodes of the current element and does the following pseudo-code:

if the child is another element (this uses the "instance of" instruction)
then copy the child (recursively)
else return the child (we have a leaf node of the tree)

If you understand the basic structure of this algorithm you can modify it to filter out only the elements you want: start with this template and modify the relevant sections. Note that you can also achieve this function by using the typeswitch operator:

declare function local:copy($n as node()) as node() {
   typeswitch($n)
      case $e as element()
         return
            element {name($e)}
               {$e/@*,
                for $c in $e/(* | text())
                return local:copy($c)
               }
      default return $n
};

Removing all attributes


The following function removes all attributes from elements, since the attributes are simply not copied.

declare function local:copy-no-attributes($element as element()) as element() {
   element {node-name($element)}
      {for $child in $element/node()
       return
          if ($child instance of element())
          then local:copy-no-attributes($child)
          else $child
      }
};

This function can also be written using the typeswitch operator:

declare function local:copy($n as node()) as node() {
   typeswitch($n)
      case $e as element()
         return
            element {name($e)}
               {for $c in $e/(* | text())
                return local:copy($c)
               }
      default return $n
};

The function can be parameterized by adding a second function argument to indicate what attributes should be removed.

Change all the Attribute Names for a Given Element


declare function local:change-attribute-name-for-element(
   $node as node(),
   $element as xs:string,
   $old-attribute as xs:string,
   $new-attribute as xs:string
) as element() {
   element {node-name($node)}
      {if (string(node-name($node)) = $element)
       then
          for $att in $node/@*
          return
             if (name($att) = $old-attribute)
             then attribute {$new-attribute} {$att}
             else attribute {name($att)} {$att}
       else $node/@*
       ,
       for $child in $node/node()
       return
          if ($child instance of element())
          then local:change-attribute-name-for-element($child, $element, $old-attribute, $new-attribute)
          else $child
      }
};

Replacing all attribute values


For all elements with the given name, replace the named attribute's value when it matches the old value.

declare function local:change-attribute-values(
   $node as node(),
   $element as xs:string,
   $attribute as xs:string,
   $old-value as xs:string,
   $new-value as xs:string
) as element() {
   element {node-name($node)}
      {if (string(node-name($node)) = $element)
       then
          for $att in $node/@*
          return
             if (name($att) = $attribute and string($att) = $old-value)
             then attribute {name($att)} {$new-value}
             else attribute {name($att)} {string($att)}
       else $node/@*
       ,
       for $child in $node/node()
       return
          if ($child instance of element())
          then local:change-attribute-values($child, $element, $attribute, $old-value, $new-value)
          else $child
      }
};

Removing named attributes


Attributes are filtered by the predicate expression not(name()=$attribute-name), so that the named attributes are omitted.

declare function local:copy-filter-attributes(
   $element as element(),
   $attribute-name as xs:string*
) as element() {
   element {node-name($element)}
      {$element/@*[not(name()=$attribute-name)],
       for $child in $element/node()
       return
          if ($child instance of element())
          then local:copy-filter-attributes($child, $attribute-name)
          else $child
      }
};

Removing named elements


Likewise, elements can be filtered in a predicate:

declare function local:remove-elements($input as element(), $remove-names as xs:string*) as element() {
   element {node-name($input)}
      {$input/@*,
       for $child in $input/node()[not(name(.)=$remove-names)]
       return
          if ($child instance of element())
          then local:remove-elements($child, $remove-names)
          else $child
      }
};

This adds the node() qualifier and the name of the node in the predicate:

/node()[not(name(.)=$remove-names)]

To use this function, pass the input XML as the first parameter and a sequence of element names as strings as the second parameter. For example:

let $input := doc('my-input.xml')/*
let $remove-list := ('xxx', 'yyy', 'zzz')
return local:remove-elements($input, $remove-list)


Example Illustrating the Above Filters


The following script demonstrates these functions:
let $x :=
   <data>
      <a q="joe">a</a>
      <b p="5" q="fred">bb</b>
      <c>
         <d>dd</d>
         <a q="dave">aa</a>
      </c>
   </data>
return
   <output>
      <original>{$x}</original>
      <fullcopy>{local:copy($x)}</fullcopy>
      <noattributes>{local:copy-no-attributes($x)}</noattributes>
      <filterattributes>{local:copy-filter-attributes($x, "q")}</filterattributes>
      <filterelements>{local:remove-elements($x, "a")}</filterelements>
      <filterelements2>{local:remove-elements($x, ("a", "d"))}</filterelements2>
   </output>

Run [1]

Removing unwanted namespaces


Some systems do not allow precise control of the namespaces in use after an update, despite the use of copy-namespaces [2] declarations. The following XQuery function is an example that will remove unwanted namespace declarations from a TEI node.

declare function local:clean-namespaces($node as node()) {
   typeswitch ($node)
      case element()
         return
            if (namespace-uri($node) eq "http://www.tei-c.org/ns/1.0")
            then
               element { QName("http://www.tei-c.org/ns/1.0", local-name($node)) }
                  { $node/@*,
                    for $child in $node/node()
                    return local:clean-namespaces($child)
                  }
            else $node
      default return $node
};

The two functions below will remove any namespace from a node (nnsc stands for no-namespace-copy). The first performs much faster, apparently because it skips over attributes more quickly; the second is kept for comparison, as something tricky might be hidden there.

(: return a deep copy of the element without namespaces :)
declare function local:nnsc1($element as element()) as element() {
   element { local-name($element) }
      { $element/@*,
        for $child in $element/node()
        return
           if ($child instance of element())
           then local:nnsc1($child)
           else $child
      }
};

(: return a deep copy of the element without namespaces :)
declare function local:nnsc2($element as element()) as element() {
   element { QName((), local-name($element)) }
      { for $child in $element/(@*, *)
        return
           if ($child instance of element())
           then local:nnsc2($child)
           else $child
      }
};

Conversely, if you want to add a namespace to an element, a starting point is given in this blog post: http://fgeorges.blogspot.com/2006/08/add-namespace-node-to-element-in.html

Removing elements with no string value


Elements which contain no string value, or which contain whitespace only, can be removed:

declare function local:remove-empty-elements($nodes as node()*) as node()* {
   for $node in $nodes
   return
      if ($node instance of element())
      then
         if (normalize-space($node) = '')
         then ()
         else
            element { node-name($node) }
               { $node/@*,
                 local:remove-empty-elements($node/node()) }
      else
         if ($node instance of document-node())
         then local:remove-empty-elements($node/node())
         else $node
};


Removing empty attributes


Attributes which contain no text can be stripped:

declare function local:remove-empty-attributes($element as element()) as element() {
   element { node-name($element) }
      { $element/@*[string-length(.) ne 0],
        for $child in $element/node()
        return
           if ($child instance of element())
           then local:remove-empty-attributes($child)
           else $child
      }
};

References
W3C page on computed element constructors [3]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/ex/copy.xq
[2] http://www.w3.org/TR/xquery/#id-copy-namespaces-decl
[3] http://www.w3.org/TR/xquery/#id-computedConstructors


Filtering Words
Motivation
Sometimes you have a text body and you want to filter out words that are on a given list, often called a stoplist.

Screen Image


Sample Program
xquery version "1.0";

(: Test to see if a word is in a list :)

declare namespace exist = "http://exist.sourceforge.net/NS/exist";
declare option exist:serialize "method=xhtml media-type=text/html indent=yes omit-xml-declaration=yes";

(: A list of words :)
let $stopwords :=
   <words>
      <word>a</word>
      <word>and</word>
      <word>in</word>
      <word>the</word>
      <word>or</word>
      <word>over</word>
   </words>

let $input-text := 'a quick brown fox jumps over the lazy dog'
return
<html>
   <head>
      <title>Test of is a word on a list</title>
   </head>
   <body>
      <h1>Test of is a word on a list</h1>

      <h2>WordList</h2>
      <table border="1">
         <thead>
            <tr><th>StopWord</th></tr>
         </thead>
         <tbody>{
            for $word in $stopwords/word
            return
               <tr><td align="center">{$word}</td></tr>
         }</tbody>
      </table>

      <h2>Sample Input Text</h2>
      <p>Input Text: <div style="border:1px solid black">{$input-text}</div></p>
      <table border="1">
         <thead>
            <tr><th>Word</th><th>On List</th></tr>
         </thead>
         <tbody>{
            for $word in tokenize($input-text, '\s+')
            return
               <tr>
                  <td>{$word}</td>
                  <td>{
                     if ($stopwords/word = $word)
                     then (<font color="green">true</font>)
                     else (<font color="red">false</font>)
                  }</td>
               </tr>
         }</tbody>
      </table>
   </body>
</html>

Execute [1]

Discussion
The input string is split into words using the tokenize function, which accepts two parameters: the string to be parsed and a separator expressed as a regular expression. Here words are separated by one or more spaces, and the result is a sequence of words. The program then uses XPath generalized equality to compare the sequence $stopwords/word with the sequence (of one item) $word. This is true if the two sequences have an item in common, that is, if the stoplist contains the word.
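For example, evaluating the two building blocks of this comparison on their own:

tokenize('a quick brown fox', '\s+')                 (: the sequence ("a", "quick", "brown", "fox") :)
('a', 'and', 'in', 'the', 'or', 'over') = 'over'     (: true - the stoplist contains the word :)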

Alternative coding
You can also use a quantified expression (some ... satisfies - see XQuery/Quantified Expressions) to perform the stopword lookup, for example:

some $word in $stopwords satisfies ($word = $thisword)

There are other alternatives: the stop words could be held as a sequence of strings, as one long string used with contains(), or as an element in the database. There are, however, significant differences in performance. A set of unit tests [2] shows the differences between a number of alternatives. What these tests reveal is that, on the eXist-db platform, both of the suggested implementations are far from optimal. Testing against a sequence of strings takes about a fifth of the time of comparing with elements, and generalised equality is similarly superior to the use of a quantified expression.

Recommended Practice
It would appear that the preferable approach is:

let $stopwords := ("a", "and", "in", "the", "or", "over")
let $input-string := 'a quick brown fox jumps over the lazy dog'
let $input-words := tokenize($input-string, '\s+')
return
   for $word in $input-words
   return $stopwords = $word

If the stop words are held as an element, it is better to convert to a sequence of atoms first:

let $stopwords :=
   <words>
      <word>a</word>
      <word>and</word>
      <word>in</word>
      <word>the</word>
      <word>or</word>
      <word>over</word>
   </words>
let $stopwordsx := $stopwords/word/string(.)
let $input-string := 'a quick brown fox jumps over the lazy dog'
let $input-words := tokenize($input-string, '\s+')
return
   for $word in $input-words
   return $stopwordsx = $word

Note that referencing the stop list in the database slightly improved performance.

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/stoplist.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/UnitTest2/runTests.xql?uri=/db/Wiki/UnitTest2/Tests/match.xml

Fizzbuzz
Here's an XQuery solution to the FizzBuzz problem posed in David Patterson's blog [1], who wrote an XSLT solution [2]. I took the liberty of splitting the hyphenated range into two attributes.

let $config :=
   <fizzbuzz>
      <range min="1" max="100"/>
      <test>
         <mod value="3" test="0">Fizz</mod>
         <mod value="5" test="0">Buzz</mod>
      </test>
   </fizzbuzz>
return
   string-join(
      for $i in ($config/range/@min to $config/range/@max)
      let $s :=
         for $mod in $config/test/mod
         return
            if ($i mod $mod/@value = $mod/@test)
            then string($mod)
            else ()
      return
         if (exists($s))
         then string-join($s, ' ')
         else $i,
      " "
   )

Execute [3]

References
[1] http://www.oreillynet.com/xml/blog/2007/03/fizzbuzz_20_adventures_in_beau.html
[2] http://dev.aol.com/blog/mdavidpeterson/2007/03/14/fizz-buzz-in-xslt-1.0
[3] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/fizzbuzz.xq

Getting POST Data


Motivation
You want to create an XQuery that will access data in the HTTP POST.

Method
To do this you use the request:get-data() XQuery function.

Sample echo-post.xq
xquery version "1.0"; (: echo-post.xq: Return all data from an HTTP post to the caller. :) declare namespace exist = "http://exist.sourceforge.net/NS/exist"; declare namespace xmldb="http://exist-db.org/xquery/xmldb"; declare namespace request="http://exist-db.org/xquery/request"; declare option exist:serialize "method=xml media-type=text/xml indent=yes"; let $post-data := request:get-data() return <post-data> {$post-data} </post-data>


Discussion
The program above (called echo-post.xq) is a very useful program for testing your web forms. It just takes the data sent to the XQuery service and returns it wrapped in a <post-data> tag. Sometimes HTTP POST requests put their data in parameters. For example, the rich text editor CKEditor has multiple text areas that might each contain HTML markup in encoded form. In this case you can also use request:get-parameter on HTTP POST data. After your server gets a POST from a CKEditor client, it can handle the data as follows.

Sample XQuery to Echo Post Data


In the _samples folder you will find several samples of how to use CKEditor. Each of these HTML files has an HTML form with the following line:

<form action="sample_posteddata.xq" method="post">

The following program can be used as a substitute for the sample_posteddata.php file.

sample_posteddata.xq

xquery version "1.0";

declare option exist:serialize "method=xml media-type=text/xml omit-xml-declaration=yes indent=yes";

(: Get the content of the editor1 parameter :)
let $editor1 := request:get-parameter('editor1', '')

(: wrap the content in a div to make sure we have well-formed XML :)
let $wrapped-content := concat('&lt;div&gt;', $editor1, '&lt;/div&gt;')

(: parse the escaped text so that we now have true XML markup :)
let $data-to-save := util:parse($wrapped-content)
return
   <results>
      {$data-to-save}
   </results>

Viewing URL Encoded Parameters


Standard HTML forms use a data transmission format called URL encoded form data. URL Encoded data has the following mime-type: Content-Type="application/x-www-form-urlencoded"
xquery version "1.0"; let $title := 'Echo Post' return <results> <title>{$title}</title> <get-data>

Getting POST Data


{request:get-data()} </get-data> <headers> {for $header in request:get-header-names() return <header name="{$header}" value="{request:get-header($header)}"/> } </headers> <parameters> {for $parameter in request:get-parameter-names() return

473

<parameter name="{$parameter}" value="{request:get-parameter($parameter, '')}"/> } </parameters> </results> </results> If you have the following form: <source lang="xml"> <html> <head><title></title></head> <body> <form action="echo-post.xq" method="post"> First name: <input type="text" name="FirstName" value="Mickey" /><br /> Last name: <input type="text" name="LastName" value="Mouse" /><br /> <input type="submit" value="Send HTTP Post to Server" /> </form> </body> </html>

Then it will return the following result from the echo-post.xq


<results>
   <title>Echo Post</title>
   <get-data/>
   <headers>
      <header name="Host" value="demo.danmccreary.com"/>
      <header name="User-Agent" value="Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.23) Gecko/20110920 Firefox/3.6.23"/>
      <header name="Accept" value="text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"/>
      <header name="Accept-Language" value="en-us,en;q=0.5"/>
      <header name="Accept-Encoding" value="gzip,deflate"/>
      <header name="Accept-Charset" value="ISO-8859-1,utf-8;q=0.7,*;q=0.7"/>
      <header name="keep-alive" value="115"/>
      <header name="Connection" value="keep-alive"/>
      <header name="Referer" value="http://demo.danmccreary.com/rest/db/dma/apps/xforms-examples/unit-tests/html-form-post.html"/>
      <header name="Content-Type" value="application/x-www-form-urlencoded"/>
      <header name="Content-Length" value="31"/>
   </headers>
   <parameters>
      <parameter name="FirstName" value="Mickey"/>
      <parameter name="LastName" value="Mouse"/>
   </parameters>
</results>

Getting URL Parameters


Motivation
You want to create an XQuery that takes a parameter from the calling URL.

Format
The format of a calling URL that uses the HTTP GET or POST command is:

<hostname>:<port>/<path>/xquery.xq?param1=abc&param2=xyz

where param1 is the first parameter with a value of abc and param2 is the second parameter with a value of xyz. Note that the question mark is used to start the parameters and the ampersand is used to separate parameters.

xquery version "1.0";

declare namespace request = "http://exist-db.org/xquery/request";
declare namespace xs = "http://www.w3.org/2001/XMLSchema";

let $param1 := request:get-parameter("param1", 0)
let $param2 := request:get-parameter("param2", 0)
return
   <results>
      <message>Got param1: {$param1} and param2: {$param2}</message>
   </results>

Try this out by activating the following link. Change the parameters and see the changes reflected in the output.

getparams.xq?param1=abc&param2=xyz [1]

Checking Data Types


Additionally you can check the data types using the XML Schema data types and the castable as operator.

xquery version "1.0";

declare namespace request = "http://exist-db.org/xquery/request";
declare namespace xs = "http://www.w3.org/2001/XMLSchema";

let $myint := request:get-parameter("myint", 0)
let $myint :=
   if ($myint castable as xs:integer)
   then xs:integer($myint)
   else 0
let $mydecimal := request:get-parameter("mydecimal", 0.0)
let $mydecimal :=
   if ($mydecimal castable as xs:decimal)
   then xs:decimal($mydecimal)
   else 0.0
return
   <results>
      <message>Got myint: {$myint} and mydecimal: {$mydecimal}</message>
   </results>

Try this out by activating the following link. Change the parameters and see the changes reflected in the output. invalid decimal [2]

Script to echo all URL parameters


echo-parameters.xq

xquery version "1.0";
(: echo a list of all the URL parameters :)

let $parameters := request:get-parameter-names()
return
   <results>
      <parameters>{$parameters}</parameters>
      {for $parameter in $parameters
       return
          <parameter>
             <name>{$parameter}</name>
             <value>{request:get-parameter($parameter, '')}</value>
          </parameter>
      }
   </results>

Here are the results of sending the parameters "a=1&b=2" to this XQuery: echo-parameters.xq?a=1&b=2 [3]

<results>
   <parameters>b a</parameters>
   <parameter>
      <name>b</name>
      <value>2</value>
   </parameter>
   <parameter>
      <name>a</name>
      <value>1</value>
   </parameter>
</results>

Change parameters in the URL and see the changes reflected in the output.

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Parameters/getparams.xq?param1=abc&param2=xyz
[2] http://www.cems.uwe.ac.uk/xmlwiki/Parameters/paramtypes2.xq?myint=6&mydecimal=x
[3] http://www.cems.uwe.ac.uk/xmlwiki/Parameters/echo-parameters.xq?a=1&b=2

Google Geocoding
Motivation
You have one or more geographic names and you want to create a map of these locations.

Method
We will use a Google RESTful web service to return geographical data for a list of place names. Google provides an HTTP-based geocoding service [1]. This requires registration of a site for an API key, and there are limitations on the usage of the API.

Querying Google's API


The following script takes a location and returns the xml from the service:
let $key := "ABQIAAAAVehr0_0wqgw_UOdLv0TYtxSGVrvsBPWDlNZ2fWdNTHNT32FpbBR1ygnaHxJdv-8mkOaL2BJb4V_yOQ"

(: get the geo name as a parameter from the incoming URL :)
let $location := request:get-parameter("location", ())

(: adjust the escape codes :)
let $location := escape-uri($location, false())

(: construct a new URL that appends the geo-name and key to the service URL.
   Tell the service that we want XML output format. :)
let $url := concat("http://maps.google.com/maps/geo?q=", $location, "&amp;output=xml&amp;key=", $key)

(: send the URL out and put the result in the $response variable :)
let $response := doc($url)
return $response

Examples

Single city examples:
Minneapolis [2] - example using a single city with no country
Bristol,UK [3] - example using a city and country

Multiple matches may be returned: Utopia [4]
or none: Santa's House [5]

Response as KML
The XML response can be reformatted as a simpler KML file. Note the addition of the relevant media-type for KML and the declaration of the KML namespace required to access the returned XML.
declare option exist:serialize "method=xml media-type=application/vnd.google-earth.kml+xml indent=yes";

declare namespace kml = "http://earth.google.com/kml/2.0";

let $key := "ABQIAAAAVehr0_0wqgw_UOdLv0TYtxSGVrvsBPWDlNZ2fWdNTHNT32FpbBR1ygnaHxJdv-8mkOaL2BJb4V_yOQ"
let $location := request:get-parameter("location", ())
let $location := escape-uri($location, false())
let $url := concat("http://maps.google.com/maps/geo?q=", $location, "&amp;output=xml&amp;key=", $key)
let $response := doc($url)
let $x := response:set-header('Content-disposition', concat('inline;filename="', $location, '.kml";'))
return
   <kml xmlns="http://earth.google.com/kml/2.0">
      <Folder>
         <name>{$location}</name>
         {
         for $place in $response//kml:Placemark
         return
            <Placemark>
               <name>{string($place/kml:address)}</name>
               {$place/kml:Point}
            </Placemark>
         }
      </Folder>
   </kml>

If you have GoogleEarth, this should load an overlay: Utopia KML [6]


GoogleMap
A simple way to view the generated kml is to use GoogleMaps. This script simply constructs the relevant GoogleMap URL and then redirects to that URL:
let $location := request:get-parameter("location", ())
let $location := escape-uri($location, false())
let $wikiurl := escape-uri(concat("http://www.cems.uwe.ac.uk/xmlwiki/geocodekml.xq?location=", $location), false())
let $url := concat("http://maps.google.co.uk/maps?q=", $wikiurl)
return response:redirect-to(xs:anyURI($url))

Map of Utopia Locations [7]

This mimic of Google Maps is useful to check that the scripts are working but, more usefully, the geocoding service can be used within an application.

Simple Location Service


In the UK, the Google service geocodes full postcodes. In the case where only the latitude and longitude of the place is required, the following script may be sufficient:
declare namespace kml = "http://earth.google.com/kml/2.0";
declare option exist:serialize "method=xml media-type=text/xml omit-xml-declaration=no";

let $key := "ABQIAAAAVehr0_0wqgw_UOdLv0TYtxSGVrvsBPWDlNZ2fWdNTHNT32FpbBR1ygnaHxJdv-8mkOaL2BJb4V_yOQ"
let $location := request:get-parameter("location", ())
let $location := escape-uri($location, false())
let $url := concat("http://maps.google.com/maps/geo?q=", $location, "&amp;output=xml&amp;key=", $key)
let $response := doc($url)
let $place := $response//kml:Placemark[1]
let $point := $place/kml:Point/kml:coordinates
let $coords := tokenize($point, ",")
return
   <location>
      <lat>{$coords[2]}</lat>
      <long>{$coords[1]}</long>
   </location>

This is now a service which can be used as a REST service: the postcode for UWE, Bristol [8].

Here's a Yahoo Pipe to take some data on Scotch whiskies:

<?xml version="1.0" encoding="UTF-8"?>
<WhiskyList>
   <Whisky>
      <Brand>Glen Ord</Brand>
      <Address>Glen Ord Distillery, Muir of Ord, Ross-shire</Address>
      <Postcode>IV67UJ</Postcode>
   </Whisky>
   <Whisky>
      <Brand>Dalwhinnie</Brand>
      <Address>Dalwhinnie Distillery, Dalwhinnie, Inverness-shire</Address>
      <Postcode>PH191AB</Postcode>
   </Whisky>
   <Whisky>
      <Brand>Laphroaig</Brand>
      <Address>Laphroaig Distillery, Port Ellen, Isle of Islay</Address>
      <Postcode>PA427DU</Postcode>
   </Whisky>
</WhiskyList>

and generate a geo-coded RSS feed: Whisky Map [9]

RSS feed
Of course this feed could be generated in XQuery alone:
declare namespace geo = "http://www.w3.org/2003/01/geo/wgs84_pos#";
declare namespace kml = "http://earth.google.com/kml/2.0";
declare option exist:serialize "method=xml omit-xml-declaration=no indent=yes encoding=iso-8859-1 media-type=application/rss+xml";

declare variable $key := "ABQIAAAAVehr0_0wqgw_UOdLv0TYtxSGVrvsBPWDlNZ2fWdNTHNT32FpbBR1ygnaHxJdv-8mkOaL2BJb4V_yOQ";

declare function local:geocode-location($location as xs:string) {
   let $url := concat("http://maps.google.com/maps/geo?q=", $location, "&amp;output=xml&amp;key=", $key)
   let $response := doc($url)
   let $place := $response//kml:Placemark[1]
   let $point := $place/kml:Point/kml:coordinates
   let $coords := tokenize($point, ",")
   return
      ( <geo:lat>{$coords[2]}</geo:lat>,
        <geo:long>{$coords[1]}</geo:long>
      )
};

<rss version='2.0' xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
   <channel>
      <title>Whiskies of Scotland</title>
      {
      for $whisky in //Whisky
      let $postcode := $whisky/Postcode
      let $location := local:geocode-location($postcode)
      return
         <item>
            <title>{string($whisky/Brand)}</title>
            <description>{string($whisky/Address)}</description>
            {$location}
         </item>
      }
   </channel>
</rss>

RSS feed [10]

References
[1] http://www.google.com/apis/maps/documentation/services.html
[2] http://www.cems.uwe.ac.uk/xmlwiki/geo/googlegeocode.xq?location=Minneapolis
[3] http://www.cems.uwe.ac.uk/xmlwiki/geo/googlegeocode.xq?location=Bristol,UK
[4] http://www.cems.uwe.ac.uk/xmlwiki/geo/googlegeocode.xq?location=Utopia
[5] http://www.cems.uwe.ac.uk/xmlwiki/geo/googlegeocode.xq?location=Santas+House
[6] http://www.cems.uwe.ac.uk/xmlwiki/geo/geocodekml.xq?location=Utopia
[7] http://www.cems.uwe.ac.uk/xmlwiki/geo/geocodemap.xq?location=Utopia
[8] http://www.cems.uwe.ac.uk/xmlwiki/geo/geocode.xq?location=BS16+1QY
[9] http://pipes.yahoo.com/pipes/pipe.info?_id=OEuSIml73BGq1a0QouNLYQ
[10] http://www.cems.uwe.ac.uk/xmlwiki/whiskyRSS.xq

Gotchas
generalised equals
= is a sequence comparison which is true if the intersection is not empty. Thus

(1, 2, 3) = (3, 4, 5)

and

3 = (3, 4, 5)

are both true, but there are some oddities: for example, () = () is false, as is () != (). The operator eq is used to compare single values only.
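Evaluating the comparisons as a sequence of booleans shows the behaviour directly:

( (1, 2, 3) = (3, 4, 5),  (: true - the sequences share the value 3 :)
  3 = (3, 4, 5),          (: true :)
  () = (),                (: false - any general comparison with an empty sequence is false :)
  () != (),               (: false :)
  3 eq 3                  (: true - eq compares single values :)
)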


Arithmetic operators
The minus sign needs space around it, since $x-3 is a valid variable name and is not the same as $x - 3.
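A tiny example (variable names chosen only for illustration):

let $x := 10
let $x-3 := 99          (: a legal variable name containing a hyphen :)
return ($x - 3, $x-3)   (: returns 7, 99 :)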

Matching brackets etc


Check carefully for matching single and double quotes, round and square brackets and curly braces. The java client will show the matching bracket, but errors are often poorly diagnosed by XQuery compilers.

Binding
:= is the binding operator. The abbreviated form, in which multiple binding statements are separated by commas:

let $x := 3
let $y := 4
let $x := "fred"

abbreviates to

let $x := 3, $y := 4, $x := "fred"

This is convenient but can lead to errors when code is amended. Consider avoiding this syntax.

Conditional expression
The conditional expression must have the else part. Return the empty sequence () if one alternative is not required:

if ($x = 4) then "Four" else ()

Sorting
'order by' sorts numbers as text:

order by $c/population

To get a numeric sort, you have to use the number() function:

order by number($c/population)

or cast to a number type:

order by xs:integer($c/population)

or

order by $c/population cast as xs:integer


XML construction
You can build XML by simply starting a tag. These tags are really XQuery expression operators. However, this puts you in a lexical scope where everything is XML, and curly braces then switch back to normal XQuery (and an open tag will then switch back to XML). Escape literal curly braces by doubling them ({{ and }}). Note the toggling between XQuery and XML construction modes:

let $a := "bob"
let $b := "jane"
let $ab := ($a, $b)
return
   <people>
      {
      for $person in $ab
      return
         <person>
            {$person}
         </person>
      }
   </people>

Comments
Comments in XQuery use (: ... :) whereas comments in XML use <!-- ... -->. It is easy to use the wrong kind in the wrong context, particularly XQuery comments in constructed XML:

<A> (: a comment :) </A>

makes the comment the text content of the XML element.
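Conversely, an XML comment written inside a direct element constructor does create a real comment node, so the safe pattern inside constructed XML is:

<A><!-- this becomes a comment node in the output --></A>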

let
let statements are part of a FLWOR expression and can't appear alone. XQuery is a functional language and let statements are only temporary bindings of names to expressions.

let $x := 5
return $x

function result
return is not required in functions. It is part of a FLWOR expression and not a statement in itself (like let), so it is not required, and not allowed if there is no for or let.

declare function local:sum($a, $b) {
   $a + $b
}

or

declare function local:sum($a, $b) {
   let $c := $a + $b
   return $c
}

but not

declare function local:sum($a, $b) {
   return $a + $b
}

Graph Visualization
Graphviz [1], developed by AT&T, provides a package of code for generating graph images from a text definition. The input file, in 'dot' format, can be generated by an XQuery script with text output.

Motivation
You want to create a graph to visualize complex structures such as taxonomies, object hierarchies or organizational hierarchies.

Database visualization
A graphical representation of the relationships between employee and manager in the empdept example can be generated. This script creates the dot-format file, with employees as (implicit) nodes and arcs from employee to manager to show "managed by" relationships. The output is serialised as text. Serialisation strips out all XML tags, so XML can be used to structure the output and there is no need to serialise each item. The Graphviz dot format uses { } curly brackets as delimiters, so these need to be escaped (doubled) in XQuery.

declare option exist:serialize "method=text";
<graph>
digraph {{
   {
   for $emp in //Emp
   let $mgr := //Emp[EmpNo = $emp/MgrNo]
   where exists($mgr)
   return concat($emp/Ename, " -> ", $mgr/Ename, ";")
   }
}}
</graph>

Generate dot file [2]

If this is now passed through a Graphviz transformer (here a standalone service), we get an image of these relationships:

PNG image [3]
SVG image [4]

This would look more like a typical organisational chart if the graph was reversed. Graphviz provides a wide range of controls over the content and appearance of the graph.

declare option exist:serialize "method=text";
<graph>
digraph {{
   rankdir=BT;
   {
   for $emp in //Emp
   let $mgr := //Emp[EmpNo = $emp/MgrNo]
   where exists($mgr)
   return concat($emp/Ename, " -> ", $mgr/Ename, ";")
   }
}}
</graph>

PNG image [5]

Since Enames are not necessarily unique, it would be better to use the EmpNo as the node identifier and label the node with the name:

declare option exist:serialize "method=text";
<graph>
digraph {{
   {
   for $emp in //Emp
   let $mgr := //Emp[EmpNo = $emp/MgrNo]
   return
      <emp>
         {$emp/EmpNo} [label="{$emp/Ename}"];
         {
         if (exists($mgr))
         then <arc> {$mgr/EmpNo} -> {$emp/EmpNo} ; </arc>
         else ()
         }
      </emp>
   }
}}
</graph>

image [6]

Similarly, the Department/Employee hierarchy can be graphed:

declare option exist:serialize "method=text";
<graph>
digraph {{
   {
   for $dept in //Dept
   return
      <dept>
         Company -> {$dept/DeptNo} ;
         {$dept/DeptNo} [label="{$dept/Dname}"];
         {
         for $emp in //Emp[DeptNo = $dept/DeptNo]
         return
            <emp>
               {$emp/EmpNo} [label="{$emp/Ename}"];
               {$dept/DeptNo} -> {$emp/EmpNo} ;
            </emp>
         }
      </dept>
   }
}}
</graph>

image [7]

References
[1] http://graphviz.org/
[2] http://www.cems.uwe.ac.uk/xmlwiki/empdept/hierarchydot.xq
[3] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2media.php?url=http://www.cems.uwe.ac.uk/xmlwiki/empdept/hierarchydot.xq
[4] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2media.php?output=svg&url=http://www.cems.uwe.ac.uk/xmlwiki/empdept/hierarchydot.xq
[5] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2media.php?url=http://www.cems.uwe.ac.uk/xmlwiki/empdept/hierarchyrevdot.xq
[6] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2media.php?url=http://www.cems.uwe.ac.uk/xmlwiki/empdept/hierarchynodot.xq
[7] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2media.php?url=http://www.cems.uwe.ac.uk/xmlwiki/empdept/companytodot.xq

HelloWorld
Motivation
You want to run a small program that tests to see if your XQuery execution environment is working.

XML Output
xquery version "1.0"; let $message := 'Hello World!' return <results> <message>{$message}</message> </results> Execute [1]


Expected Output
<?xml version="1.0" encoding="UTF-8"?>
<results>
   <message>Hello World!</message>
</results>

Discussion
The program creates a temporary variable called $message and assigns it a string value. The output is an XML element containing a message element which contains the value of the variable.

Suggestions
Try omitting the curly braces from inside of the result message element. What do you get? [Execute [2]] What happens if you omit the results wrappers? [Execute [3]]

Plain Text
You can get XQuery to return plain text using serialization options which define the serialization method and the output media-type. For example, to output the message as text, specify the serialization method as text and the media-type as text/plain.

xquery version "1.0";

declare option exist:serialize "method=text media-type=text/plain";

let $message := 'Hello World!'
return $message

[Execute [4]]

Expected Output
Depending on your browser set-up, this will launch a viewer for text documents and display Hello World!

Execution Methods
If you are using the oXygen IDE this can be done by selecting the "transform" icon on the toolbar. If you are running this program in the eXist database you can upload a file called hello.xq using the "Browse" function in the web administrator and then run the following in the browser:

http://localhost:8080/exist/rest/db/hello.xq

There are three important items to note in this URL:

1. This is the URL that you would use if you used the default eXist configuration.
2. The word "rest" appears in the URL before the "/db", indicating that you are using the REST interface (as opposed to the WebDAV, Atom or SOAP interface).
3. The port is "8080" (the default port for development web sites) and the "context" of the server is "exist". Both of these can be easily changed by editing the $EXIST_HOME/tools/jetty/etc/conf.xml file and restarting your eXist server. The short form on production sites might be:

http://localhost/rest/db/hello.xq

With tools like URL rewriting you can also remove the "/rest" and the "/db" components of the URL.


References
[1] http://www.cems.uwe.ac.uk/xmlwiki/helloWorld.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/helloWorld_1.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/helloWorld_2.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/Basics/helloWorld_3.xq

HTML Table View


Motivation
When we have an XML file with uniform tabular structure, with no repeated or compound children, a generic table view is handy. We can use the XML element names as the column headers.

Sequence as XHTML Table


Here is a function which achieves this. It takes a sequence of elements and creates an HTML table, one row per node in the sequence. It uses a little introspection, with the name() function, to get the names of the children of the first node to form the column headings. For each node, the children are accessed by node name, so there is no requirement that all elements are present or in the same order as in the first node.

declare default element namespace "http://www.w3.org/1999/xhtml";

declare function local:sequence-to-table($seq) {
(: assumes all items in $seq have the same simple element structure,
   determined by the structure of the first item :)
   <table border="1">
      <thead>
         <tr>
            {for $node in $seq[1]/*
             return <th>{name($node)}</th>
            }
         </tr>
      </thead>
      {for $row in $seq
       return
          <tr>
             {for $node in $seq[1]/*
              let $data := string($row/*[name(.)=name($node)])
              return <td>{$data}</td>
             }
          </tr>
      }
   </table>
};

This could then be used to view selected nodes:

local:sequence-to-table(//Emp)

This approach is ideal if you know that the first node in a dataset has all the elements for all the columns in the table. It is used in the later Database example to display computed sequences. The following line must be added if you are using strict XHTML; it puts all the HTML tags (<table>, <thead>, <th>, <tbody>, <tr> and <td>) in the XHTML namespace.

declare default element namespace "http://www.w3.org/1999/xhtml";

Execute [1]

Sequence as CSV
A similar approach can be used to export the sequence as CSV. Here the header Content-Disposition is set so the Browser will allow the generated file to be opened directly in Excel.
declare option exist:serialize "method=text media-type=text/text"; declare variable declare variable $sep := ','; $eol := '&#10;';

declare function local:sequence-to-csv($seq) as xs:string { (: returns a string-join( (string-join($seq[1]/*/name(.),$sep), for $row in $seq return string-join( for $node in $seq[1]/* let $data := string($row/*[name(.)=name($node)]) return if (contains($data,$sep)) then concat('"',$data,'"') else $data , $sep) ),$eol ) }; let $x := response:set-header('Content-Disposition','inline;filename=empdept.csv') return local:sequence-to-csv(//Emp) multi-line string of comma delimited strings :)

Execute [2]


References
[1] http://www.cems.uwe.ac.uk/xmlwiki/empTable.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/empTablecsv.xq

Incremental Search of the Chemical Elements


Introduction
In this example of AJAX in its AHAH form, an incremental search of the chemical elements is implemented. This is also an example of matching with a regular expression. The raw data is taken from a file [1] provided by Elliotte Rusty Harold [2].

Execution
Search the Elements [3]

The main page


The main page is a simple HTML file. The div element with id list is where the generated contents will be placed. The JavaScript function getList() is called when any of several interface events occur.
declare option exist:serialize "method=xhtml media-type=text/html

doctype-public=-//W3C//DTD&#160;HTML&#160;4.01&#160;Transitional//EN doctype-system=http://www.w3.org/TR/loose.dtd";

<html xmlns="http://www.w3.org/1999/xhtml" > <head> <title>Chemical Elements</title> <script language="javascript" src="ajaxelement.js"/> <style type="text/css"> td {{background-color: #efe; font-size:14px;}} th {{background-color: #ded; text-align: right; font-variant:small-caps;padding:3px; font-size:12px;}} </style> </head> <body> <h1>Chemical Elements</h1> <table class="page"> <tr> <td valign="top" width="30%"><form onSubmit="getList(); return false"> <span><label for="name">Element Name </label> <input type="text" size="5" name="name" id="name" title="e.g. Silver" onkeyup="getList();" onfocus="getList();" /> </span> </form> </td>

Incremental Search of the Chemical Elements


<td id="list"/> </tr> </table> </body> </html>

490

The JavaScript
The JavaScript implements the simple functionality of calling the server-side script getElement.xq with the string entered in the search box and, in the callback, pasting the returned XHTML into the div.

function updateList() {
    if (http.readyState == 4) {
        var divlist = document.getElementById('list');
        divlist.innerHTML = http.responseText;
        isWorking = false;
    }
}

function getList() {
    if (!isWorking && http) {
        var name = document.getElementById("name").value;
        http.open("GET", "getElement.xq?name=" + name);
        // this sets the call-back function to be invoked when a response from the HTTP request is returned
        http.onreadystatechange = updateList;
        isWorking = true;
        http.send(null);
    }
}

function getHTTPObject() {
    var xmlhttp;
    /*@cc_on
    @if (@_jscript_version >= 5)
        try {
            xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
        } catch (e) {
            try {
                xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
            } catch (E) {
                xmlhttp = false;
            }
        }
    @else
        xmlhttp = false;
    @end @*/
    if (!xmlhttp && typeof XMLHttpRequest != 'undefined') {
        try {
            xmlhttp = new XMLHttpRequest();
            xmlhttp.overrideMimeType("text/xml");
        } catch (e) {
            xmlhttp = false;
        }
    }
    return xmlhttp;
}

var http = getHTTPObject();   // create the HTTP Object
var isWorking = false;

The server-side search


This script is called by the getList() function when a partial atom name has been entered. The string is converted to a simple regular expression and used in the eXist free-text matching function to retrieve matching atoms. The response depends on the number of matches found:

if only one match, return a table of details
if more than one match, return a list of matches
if none, return "no matches"

A function turns the ATOM node into a table, mapping each child node to a row with the node name as the legend.

declare function local:atom-to-table($element) {
   <table class="element">
      {for $node in $element/*
       let $label := replace(name($node), "_", " ")
       return
          <tr>
             <th>{$label}</th>
             <td>{ $node/text() }</td>
          </tr>
      }
   </table>
};

let $name := request:get-parameter("name", ())
return
if ($name != "")
then
   let $search := concat('^', $name)   (: anchor the term to the start of the string :)
   let $elements := doc("/db/Wiki/ajax/periodicTable.xml")/PERIODIC_TABLE
   let $matches := $elements/ATOM[matches(NAME, $search, "i")]
   return
      if (count($matches) = 0)
      then <span>No matches</span>
      else if (count($matches) = 1)
      then local:atom-to-table($matches)
      else
         (: multiple matches :)
         <table class="list">
            <tr>
               <th>Name</th>
               <th>Symbol</th>
               <th>Atomic Weight</th>
            </tr>
            {for $match in $matches
             order by $match/NAME
             return
                <tr>
                   <th>{string($match/NAME)}</th>
                   <td>{string($match/SYMBOL)}</td>
                   <td>{string($match/ATOMIC_WEIGHT)}</td>
                </tr>
            }
         </table>
else ()

To do
Some naming problems here - needs tidying. Units need to be included

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/ajax/periodicTable.xml
[2] http://www.cafeconleche.org
[3] http://www.cems.uwe.ac.uk/xmlwiki/ajax/periodicTable.xq


Limiting Result Sets


Motivation
Sometimes you have many records or instances in a collection and you want to limit the amount of data returned by a query.

Strategy
Limiting Records in a Document
If you are limiting records in a large XML document you can do this by adding a predicate to your for loop:

for $person in doc($file)/Person[position() lt 10]
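A variation on the same idea (a sketch; it assumes the same path as above, with $file bound to the document) selects an arbitrary window of records with a range predicate:

(: items 11 to 20 in document order :)
for $person in doc($file)/Person[position() = 11 to 20]
return $person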

Using Subsequence to Limit Results


The following query retrieves only the first 10 person documents, in document order, contained within a collection.

for $person in subsequence(collection($my-collection)/person, 1, 10)

where the general form is:

subsequence($sequence, $starting-item, $number-of-items)

Note that the second argument is the position of the item to start at and the third is the total number of items to return; it is NOT the position of the last item to return.
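A small worked example makes the argument order clear:

subsequence((1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 4, 3)
(: returns (4, 5, 6) - start at item 4 and return 3 items, not items 4 through 3 :)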

Sorting Before Limiting Results


Note that usually you will want to get the first N documents based on some sorting criteria, for example people whose last name starts with the letter "A". So the first thing you must do is create a list of nodes in the correct order and then take the first N records from that list. This can usually be done by creating a temporary sequence of sorted items in a separate FLWOR expression.

let $sorted-people :=
   for $person in collection($collection)/person
   order by $person/last-name/text()
   return $person

for $person at $count in subsequence($sorted-people, $start, $records)
return <li>{$person/last-name/text()}</li>


Getting the Next N Rows


After you fetch the first N items from your sequence you frequently want to get the next N rows. To do this you will need to add buttons to your report for "Previous N Records" and "Next N Records". These buttons will pass parameters to your XQuery telling it where to start and how many records to fetch: you call your own script again with different parameters on the URL. To keep track of the URL you can use the request:get-url() function that comes with eXist. For example:

let $query-base := request:get-url()

If your query was run from http://localhost:8080/exist/rest/db/apps/user-manager/views/list-people.xq, this is the string that would be returned. We will also get two parameters from the URL for the record to start at and the number of records to fetch:

let $start := xs:integer(request:get-parameter("start", "1"))
let $num := xs:integer(request:get-parameter("num", "20"))

Now we will create two HTML buttons that allow us to get the next N records or the previous N records.

<input type="button" onClick="parent.location='{$query-base}?start={$start - $num}&amp;num={$num}'" value="&lt; Previous"/>
<input type="button" onClick="parent.location='{$query-base}?start={$start + $num}&amp;num={$num}'" value="Next &gt;"/>

Full Example Program


xquery version "1.0"; declare namespace xmldb="http://exist-db.org/xquery/xmldb"; declare namespace u="http://niem.gov/niem/universal/1.0"; declare option exist:serialize "indent=yes"; let $start := xs:integer(request:get-parameter("start", "1")) let $num := xs:integer(request:get-parameter("num", "5")) let $query-base := request:get-url() return <html> <title>Contacts</title> <body> <h1>Contacts</h1> <table border="1"> <thead> <tr> <th>ID</th> <th>Last Name</th> <th>First</th> <th>Street</th> <th>City</th> <th>State</th> <th>Zip</th> <th>EMail</th> <th>Phone</th> <th colspan="2">Function</th>

Limiting Result Sets </tr> </thead> <tbody> { for $person in subsequence(collection('/db/contacts/data'), $start, $num)/Person let $pid := $person/id let $lname := $person/u:PersonSurName/text() let $fname := $person/u:PersonGivenName/text() let $street := $person/u:StreetFullText/text() let $city := $person/u:LocationCityName/text() let $state := $person/u:LocationStateName/text() let $zip := $person/u:LocationPostalCodeID/text() let $email := $person/u:ContactEmailID/text() let $phone := $person/u:TelephoneNumberFullID/text() order by $lname, $fname return <tr> <td>{$pid}</td> <td>{$lname}</td> <td>{$fname}</td> <td>{$street}</td> <td>{$city}</td> <td>{$state}</td> <td>{$zip}</td> <td>{$email}</td> <td>{$phone}</td> <td><a href="update-person-form.xq?id={$pid}">Edit</a></td> <td><a href="delete-person.xq?id={$pid}">Delete</a></td> </tr> } </tbody> </table> <input type="button" onClick="parent.location='{$query-base}?start={$start $records}&amp;num={$num}'" value="&lt; Previous"/> <input type="button" onClick="parent.location='{$query-base}?start={$start + $records}&amp;num={$num}'" value="Next &gt;"/> <br/> <a href="create-person.xhtml">Create New Person</a> <br/> <a href="index.xhtml">Return to main demo page</a> </body> </html>

495


Manipulating URIs
Motivation
Sometimes you need to be able to manipulate the URI of your own XQuery. This is useful when you need to call your own XQuery with different parameters. For example, if you have an XQuery that returns the first 20 rows in a query and you want to add a "Get Next 20 Records" button, you may want simply to call yourself with additional parameters saying which record to start with - in this case, record 21.

This program demonstrates some XQuery functions that are not part of the original XQuery specification but are required for precise web server XQuery functionality. The functions are:

eXist request:get-uri() - returns the URI of the current request within the web server, for example /exist/rest/db/test/my-query.xq
eXist request:get-url() - returns the full URL including the server and port, for example http://www.example.com:8080/exist/rest/db/test/my-query.xq
eXist request:get-query-string() - returns the full query string passed to the servlet (without the initial question mark)
eXist system:get-module-load-path() - returns the path to the place where a module has been loaded from
eXist system:get-exist-home() - returns the base of the eXist web root

Sample Program
xquery version "1.0"; declare namespace system="http://exist-db.org/xquery/system"; declare namespace request="http://exist-db.org/xquery/request"; declare option exist:serialize "method=html media-type=text/html indent=yes";

let $get-uri := request:get-uri() let $get-url := request:get-url() let $module-load-path := system:get-module-load-path() let $exist-home := system:get-exist-home() let $path := substring-after($module-load-path, 'xmldb:exist://embedded-eXist-server') let $replace := replace($module-load-path, 'xmldb:exist://embedded-eXist-server', '')

return <html> <head> <title>URI Path Example</title> </head> <body> <h1>Sample URI manipulation with XPath</h1> <table border="1"> <thead> <tr> <th>Out</th> <th>In</th>

Manipulating URIs
</tr> </thead> <tr> <td>request:get-url()</td> <td>{$get-url}</td> </tr> <tr> <td>request:get-uri()</td> <td>{$get-uri}</td> </tr> <tr> <td>system:get-module-load-path()</td> <td>{$module-load-path}</td> </tr> <tr> <td>system:get-exist-home()</td> <td>{$exist-home}</td> </tr> <tr>

497

<td>substring-after(system:get-module-load-path(), 'xmldb:exist://embedded-eXist-server')</td> <td>{$path}</td></tr> <tr> <td>replace(system:get-module-load-path(), 'xmldb:exist://embedded-eXist-server', '')</td> <td>{$replace}</td> </tr> </table> </body> </html>

Execute [1]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/uri.xq?a=4&b=5


Nationalgrid and Google Maps


In the UK, the XML standard for the exchange of timetable information is TransXChange [1]. The locations of, for example, bus stops are expressed as Northings and Eastings on the UK National Grid. To plot these on, say, Google Maps requires these coordinates to be transformed into latitude and longitude using the WGS84 datum.

TransXChange
Here is an extract from the beginning of a typical timetable document showing a single StopPoint definition:

<TransXChange xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xmlns:apd="http://www.govtalk.gov.uk/people/AddressAndPersonalDetails"
              xmlns="http://www.transxchange.org.uk/"
              xsi:SchemaLocation="http://www.transxchange.org.uk/ TransXChange_general.xsd"
              CreationDateTime="2006-12-07T14:47:00-00:00"
              ModificationDateTime="2006-12-07T14:47:00-00:00"
              Modification="new"
              RevisionNumber="0"
              FileName="SVRSGAO070-20051210-5580.xml"
              SchemaVersion="2.1"
              RegistrationDocument="false">
   <StopPoints>
      <StopPoint CreationDateTime="2006-12-07T14:47:00-00:00">
         <AtcoCode>0100BRP90340</AtcoCode>
         <NaptanCode>BSTGAJT</NaptanCode>
         <Descriptor>
            <CommonName>Rupert Street (CA)</CommonName>
            <Landmark>NONE</Landmark>
            <Street>Rupert Street</Street>
            <Crossing>Colston Avenue</Crossing>
         </Descriptor>
         <Place>
            <NptgLocalityRef>N0076879</NptgLocalityRef>
            <Location>
               <Easting>358664</Easting>
               <Northing>173160</Northing>
            </Location>
         </Place>
         <StopClassification>
            <StopType>BCT</StopType>
            <OnStreet>
               <Bus>
                  <BusStopType>MKD</BusStopType>
                  <TimingStatus>OTH</TimingStatus>
                  <MarkedPoint>
                     <Bearing>
                        <CompassPoint>N</CompassPoint>
                     </Bearing>
                  </MarkedPoint>
               </Bus>
            </OnStreet>
         </StopClassification>
         <AdministrativeAreaRef>010</AdministrativeAreaRef>
      </StopPoint>

Coordinate transformation
Transformation from OS National Grid coordinates to the WGS84 latitudes and longitudes used in Google Maps requires two kinds of transformation:

between latitudes and longitudes on an ellipsoidal model of the Earth and the Transverse Mercator projection used for the OS grid
between latitude/longitude coordinates based on the different ellipsoids used in the OS coordinates and the global WGS84 coordinates

An XQuery module which contains these functions and other utility functions is available in the XQuery Examples [2] Google Code project.
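A small usage sketch of these functions (not in the original text), assuming the geo.xqm module and the call signature used in the conversion script below; the document path used here is illustrative only:

import module namespace geo="http://www.cems.uwe.ac.uk/xmlwiki/geo" at "../lib/geo.xqm";
declare namespace tx="http://www.transxchange.org.uk/";

(: convert the first StopPoint's Easting/Northing to a rounded WGS84 latitude/longitude :)
let $l := doc("/db/Wiki/geo/SVRSGAO070-20051210-5580.xml")//tx:StopPoint[1]/tx:Place/tx:Location
return geo:round-LatLong(geo:OS-to-LatLong(geo:Mercator($l/tx:Easting, $l/tx:Northing)), 6)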

Conversion from TransXChange


As an example of the use of these functions, the following script converts the StopPoints of a TransXChange file to a simpler format with lat/long coordinates. Here a local correction is required for more accurate local registration.
(: Transforms the Stopcodes in a TransXchange file to a simpler format with National grid references converted to latitude and longitude :)

declare namespace tx="http://www.transxchange.org.uk/";

import module namespace geo="http://www.cems.uwe.ac.uk/xmlwiki/geo" at "../lib/geo.xqm";

declare option exist:serialize

"method=xml media-type=text/xml highlight-matches=none";

declare function local:camelCase($s) {
   string-join(
      for $word in tokenize($s,' ')
      return concat(upper-case(substring($word,1,1)), lower-case(substring($word,2)))
   , ' ')
};

<StopPointSet> {for $stopCode in distinct-values(//tx:StopPoint/tx:AtcoCode) let $stop := (//tx:StopPoint[tx:AtcoCode=$stopCode])[1] let $d := $stop/tx:Descriptor let $l := $stop/tx:Place/tx:Location return <StopPoint> <AtcoCode>{string($stop/tx:AtcoCode)}</AtcoCode> <CommonName>{string($d/tx:CommonName)}</CommonName> {if ($d/tx:Landmark ne 'NONE') then <LandMark>{local:camelCase($d/tx:Landmark)}</LandMark> else () }



<Street>{local:camelCase($d/tx:Street)}</Street> <Crossing>{local:camelCase($d/tx:Crossing)}</Crossing> {geo:round-LatLong(geo:OS-to-LatLong(geo:Mercator($l/tx:Easting, $l/tx:Northing)),6)} </StopPoint> } </StopPointSet>


Convert [3]

Output
The output of this transformation contains StopPoints e.g.
<StopPoint>
   <AtcoCode>0170SGP90690</AtcoCode>
   <CommonName>Coldharbour Lane</CommonName>
   <Street>Coldharbour Lane</Street>
   <Crossing>Filton Road</Crossing>
   <geo:LatLong xmlns:geo="http://www.cems.uwe.ac.uk/xmlwiki/geo"
                latitude="51.503924" longitude="-2.544798"/>
</StopPoint>

Mapping the bus stops


One application of this data would be to plot the stops within a given range of a location. This requires a distance calculation which is good enough for small distances :
declare function geo:plain-distance($f, $s as element(geo:LatLong)) as xs:double {
   let $dlat := ($f/@latitude - $s/@latitude) * 60
   let $longCorr := math:cos(math:radians(($f/@latitude + $s/@latitude) div 2))
   let $dlong := ($f/@longitude - $s/@longitude) * 60 * $longCorr
   return math:sqrt(($dlat * $dlat) + ($dlong * $dlong))
};

To generate the kml file:


(: return the StopPoints within $range of $latitude and $longitude :)

import module namespace geo="http://www.cems.uwe.ac.uk/xmlwiki/geo" at "../lib/geo.xqm"; import module namespace gmap = "http://www.cems.uwe.ac.uk/xmlwiki/gmap" at "../lib/gmap.xqm"; declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";

let $latitude := xs:decimal(request:get-parameter("latitude", 51.4771)) let $longitude := xs:decimal(request:get-parameter ("longitude",-2.5886)) let $range := xs:decimal(request:get-parameter("range",0.5)) let $focus := geo:LatLong($latitude,$longitude) let $x := response:set-header('Content-Disposition','attachment;filename=stops.kml;')

return <Document> <name>Bus Stops within {$range} miles of {geo:LatLong-as-string($focus)}</name>



<Style id="home"> <IconStyle> <Icon><href>http://maps.google.com/mapfiles/kml/pal2/icon2.png</href> </Icon> </IconStyle> </Style> <Style id="stop"> <IconStyle> <Icon><href>http://maps.google.com/mapfiles/kml/pal5/icon13.png</href> </Icon> </IconStyle> </Style>


<Placemark>
   <name>Home</name>
   <Point>
      <coordinates>{gmap:LatLong-as-kml($focus)}</coordinates>
   </Point>
   <styleUrl>#home</styleUrl>
</Placemark>
{
for $stop in doc("/db/Wiki/geo/stopPoints.xml")//StopPoint
let $dist := geo:plain-distance($focus,$stop/geo:LatLong) * 0.868976242  (: distance is in nautical miles :)
where $dist < $range
return
   <Placemark>
      <name>{string($stop/CommonName)}</name>
      <description>
         {concat($stop/CommonName,' ',$stop/Landmark,' on ', $stop/Street, ' near ', $stop/Crossing)}
         is {geo:round($dist,2)} miles away.
      </description>
      <Point>
         <coordinates>{gmap:LatLong-as-kml($stop/geo:LatLong)}</coordinates>
      </Point>
      <styleUrl>#stop</styleUrl>
   </Placemark>
}
</Document>

Stops within half a mile of my home as KML [4] rendered by GoogleMap [5]. On GoogleMaps the stops appear to be closely aligned to the bus stop overlay, presumably generated from the same base locations.

Icons
Selecting Icons for kml is eased if you can easily browse them. Here is a simple browser in XQuery:

declare variable $base := "http://maps.google.com/mapfiles/kml/";
declare option exist:serialize "method=xhtml media-type=text/html";

<html>
   <h2>Google Earth icons</h2>
   <p>Base url {$base}</p>
   {for $pal in (2 to 5)
    return
      <div>
         <h2>Palette pal{$pal}</h2>
         {for $i in (0 to 63)
          let $icon := concat('pal',$pal,'/icon',$i,'.png')
          return <img src="{$base}{$icon}" title="{$icon}"/>
         }
      </div>
   }
</html>

Browse kml icons [6]


References
[1] http://www.transxchange.org.uk/
[2] http://code.google.com/p/xquery-examples/
[3] http://www.cems.uwe.ac.uk/xmlwiki/geo/txc2Stops.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/geo/stopsNearbykml.xq
[5] http://maps.google.com/maps?q=http:%2F%2Fwww.cems.uwe.ac.uk%2Fxmlwiki%2Fgeo%2FstopsNearbykml.xq
[6] http://www.cems.uwe.ac.uk/xmlwiki/geo/showIcons.xq

Net Working Days


Motivation
To calculate the "effective" age of many documents you want to count the number of working days they have been in various stages. This means you count the weekdays but not the weekend days. You can even discard the holidays if you want to have consistent aging reports.

Approach
Since NetWorkingDays is a calculation that is shared by many systems, it makes sense to put the logic into an XQuery module.
module namespace fxx = "http://xquery.wikibooks.org/fxx";

declare function fxx:net-working-days-n($s as xs:date, $f as xs:date, $dates as xs:date*, $total as xs:integer) as xs:integer {
   if ($s = $f)
   then $total
   else if (fxx:weekday($s) and not($s = $dates))
   then fxx:net-working-days-n($s + xs:dayTimeDuration('P1D'), $f, $dates, $total + 1)
   else fxx:net-working-days-n($s + xs:dayTimeDuration('P1D'), $f, $dates, $total)
};

declare function fxx:net-working-days($s as xs:date, $f as xs:date) as xs:integer {



(: this function returns one less than the number returned by the Excel NETWORKDAYS function:
 : networkdays($d,$d) should be 0 but it is 1. networkdays and workday should be inverses,
 : but they are not; common practice seems to be to subtract one anyway. :)
(: assumes $s <= $f :)


fxx:net-working-days-n($s,$f, (), 0)

};

declare function fxx:net-working-days($s as xs:date,$f as xs:date, $dates as xs:date*) as xs:integer { fxx:net-working-days-n($s,$f, $dates, 0)

};

The heart of this calculation is a NetWorkingDays algorithm that is passed two dates.
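A minimal usage sketch (not in the original article), assuming the module above, including its fxx:weekday() helper which is not shown here, is stored as net-working-days.xq:

import module namespace fxx = "http://xquery.wikibooks.org/fxx" at "net-working-days.xq";

(: Monday 2009-06-01 to Monday 2009-06-08: Monday to Friday are counted once, so this should return 5 :)
fxx:net-working-days(xs:date("2009-06-01"), xs:date("2009-06-08"))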

Sample Test Driver


xquery version "1.0";

import module namespace fxx = "http://xquery.wikibooks.org/fxx" at "net-working-days.xq";

(: Test driver for Net Working Days
 : tags to generate documentation using xqdoc.org scripts at http://www.xqdoc.org/qs_exist.html
 :
 : @return XHTML table for the next "caldays" days from now, including working-days calculations from today
 : @input-parameter: caldays - an integer number of calendar days in the future from now
 :)

let $cal-days := xs:integer(request:get-parameter("caldays", "30"))

let $now := xs:date(substring(current-date(),1,10)) return <html> <body> <h1>Days from {$now}</h1> <p>Today is a {fxx:day-of-week-name-en(xs:date(substring(current-date(),1,10)))}</p> <p>Format: net-working-days.xq?cal-days=50</p> <table border="1"> <thead> <tr> <th>Cal Days</th> <th>Furture Date</th> <th>Day of Week</th>



<th>Net Working Days</th> </tr> </thead> { for $i in (0 to $cal-days) let $d := $now + xs:dayTimeDuration(concat('P',$i,'D'))


let $dow := fxx:day-of-week($d) return <tr> <td align="center">{$i}</td> <td align="center">{$d}</td> <td align="center">{fxx:day-of-week-name-en(xs:date(substring($d,1,10)))}</td> <td align="center">{fxx:net-working-days(xs:date(substring(current-date(),1,10)),$d)}</td> </tr> } </table> <br/> <a href="index.xhtml">Back to Unit Testing Main Menu</a> <br/> <a href="../../index.xhtml">Back to CRV Main Menu</a> </body> </html>

Execute [1] - a modified version of this script - under test.

Discussion
The recursive function works but it is slow. It has to call itself once for each date between the two dates. An alternative approach is to count the end days in each fraction of a week, count the weeks and multiply by five. Code??
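One possible sketch of that week-counting approach (it is not part of the original module) is shown below. It assumes the same fxx:weekday() helper and, for brevity, ignores the holiday list; it relies on the fact that any seven consecutive days contain exactly five weekdays:

declare function fxx:net-working-days-fast($s as xs:date, $f as xs:date) as xs:integer {
   let $days := days-from-duration($f - $s)      (: number of days in the interval [$s, $f) :)
   let $whole-weeks := $days idiv 7
   let $leftover :=
       count((0 to ($days mod 7) - 1)
             [fxx:weekday($s + xs:dayTimeDuration(concat('P', ., 'D')))])
   return $whole-weeks * 5 + $leftover
};

Holidays could then be handled by subtracting the count of supplied holiday dates that fall on a weekday within the interval.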

Acknowledgments
An initial version of this was provided by Chris Wallace.

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/workCalendar.xq

Page scraping and Yahoo Weather


Background
Yahoo provide a world weather forecast service via a REST API, delivering RSS. It is described in the API documentation [1]. However the key to each feed for UK towns is a Yahoo Location ID such as UKXX0953 and there is no service available to convert from location names to Yahoo codes. Yahoo does provide alphabetical index pages of locations which contain links to the feeds themselves.

Yahoo Pipe
This task can be accomplished (up to the extraction of the location ID) by the Yahoo Pipe [2] written by Paul Daniel. However, the inherent instability of HTML markup has led to the current failure of this pipeline.

XQuery
This script takes a location parameter, extracts the first letter of the location, constructs the URL of the Yahoo weather index page for that letter (for example, the index page for the letter B [3]) and fetches the page via the httpclient module in eXist. The page is not valid XHTML but the httpclient:get function cleans up the XML so it is well-formed. HTML page [4] The page structure can be seen in the tree view [5].

Next this XML is navigated to locate the li element containing the location, and the code for that location is stripped out. Finally this code is appended to the stem of the URL of the RSS page for this location, creating a URL for the RSS feed at that location. RSS feed [6] The script then redirects to that URL.

This process can be visualized using a data flow diagram: Diagram [7]

declare variable $yahooIndex := "http://weather.yahoo.com/regional/UKXX";
declare variable $yahooWeather := "http://weather.yahooapis.com/forecastrss?u=c&amp;p=";

let $location := request:get-parameter("location","Bristol")
let $letter := upper-case(substring($location,1,1))
let $suffix := if ($letter eq 'A') then '' else concat('_',$letter)
let $index := xs:anyURI(concat($yahooIndex,$suffix,".html"))
let $page := httpclient:get($index,true(),())
let $href := $page//div[@id="yw-regionalloc"]//li/a[.= $location]/@href
let $code := substring-after(substring-before($href,'.'),'forecast/')
let $rss := xs:anyURI(concat($yahooWeather,$code))

return response:redirect-to($rss)

Bristol RSS feed [8] Cardiff RSS feed [9]


Notes
1. Although the index page is not valid XHTML (why not?) and needs tidying, Yahoo have been helpful to the scraper by using ids on the sections. This allows the XPath expression to pick out the relevant section by id, and then select the li containing the location. However such tagging is not stable, and in fact changed recently from an id of browse to the current yw-regionalloc. Note also that there is additional work required because the page for A has a different URL to the remainder of the letters - a feature not easily seen or tested for.

2. eXist is not ideally suited to this task since the page has to be first stored in the database so that XPath expressions can be executed using the structural index. An in-memory XQuery engine such as Saxon would be expected to perform better on this task. At present the performance is a bit slow but the new 1.3 release improves this situation.

3. Extracting the code from the string would be clearer with a regular expression, but XQuery does not provide a simple matching function to extract the matched pattern. An XQuery function which wraps some XSLT to do this is described in analyse-string.

4. The script uses the eXist function response:redirect-to to re-direct the browser to the constructed URL for the RSS feed.
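On point 3, one lightweight alternative (an aside, not from the original article) is to use fn:replace with a capturing group and a back-reference to pull out the matched part:

(: extract the Yahoo location code from an href such as /forecast/UKXX0953.html :)
let $href := "/forecast/UKXX0953.html"
return replace($href, "^.*forecast/(.*)\.html$", "$1")   (: gives UKXX0953 :)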

XSLT
For comparison, here is the equivalent XSLT script, using analyse-string.
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:param name="location"/> <xsl:variable name="html2xml">

<xsl:text>http://www.html2xml.nl/Services/html2xml/version1/Html2Xml.asmx/Url2XmlNode?urlAddress=</xsl:text> </xsl:variable> <xsl:variable name="yahooIndex"> <xsl:text>http://weather.yahoo.com/regional/UKXX_</xsl:text> </xsl:variable> <xsl:variable name="yahooWeather"> <xsl:text>http://weather.yahooapis.com/forecastrss?u=c&amp;p=</xsl:text> </xsl:variable> <xsl:template match="/"> <xsl:variable name="letter" select="upper-case(substring($location,1,1))"/> <xsl:variable name="suffix" select="if($letter eq 'A') then '' else concat('_',$letter)"></xsl:variable> <xsl:variable name="page" select="doc(concat ($html2xml,$yahooIndex,$suffix,'.html'))"/> <xsl:variable name="href" select="$page//div[@id='yw-regionalloc']//li/a[.= $location]/@href"/> <xsl:variable name="code" > <xsl:analyze-string select="$href" regex="forecast(.*)\.html"> <xsl:matching-substring> <xsl:value-of select="regex-group(1)"/> </xsl:matching-substring> </xsl:analyze-string> </xsl:variable> <xsl:variable name="rssurl" select="concat($yahooWeather,$code)"/> <xsl:copy-of select="doc($rssurl)"/> </xsl:template>



</xsl:stylesheet>


Bristol Weather [10] - but currently broken

XPL
Another approach is to use XPL [11] developed by Erik Bruchez and Alessandro Vernet at Orbeon to describe the sequence of transformations as a pipeline. Here the pipeline is extended to create a custom HTML page from the RSS feed.
<?xml version="1.0" encoding="UTF-8"?> <p:pipeline xmlns:p="http://www.cems.uwe.ac.uk/xpl" <p:output id="weatherPage"/> <p:processor name="xslt"> <p:annotation>construct the index page url from the parameter</p:annotation> <p:input name="parameter" id="location"/> <p:input name="xml"> <dummy/> </p:input> <p:input name="xslt"> <xsl:template match="/"> <xsl:text>http://weather.yahoo.com/regional/UKXX_</xsl:text> <xsl:value-of select="upper-case(substring($location,1,1))"/> <xsl:text>.html</xsl:text> </xsl:template> </p:input> <p:output name="result" id="indexUrl"/> </p:processor> <p:processor name="tidy"> <p:annotation>tidy the index page</p:annotation> <p:input name="url" id="indexUrl"/> <p:output name="xhtml" id="indexXhtml"/> </p:processor> <p:processor name="xslt"> <p:annotation>parse the index page and construct the URL for the RSS feed</p:annotation> <p:input name="xml" id="indexXhtml"/> <p:input name="parameter" id="location"/> <p:input name="xslt"> <xsl:template match="/"> <xsl:variable name="href" select="//div[@id='yw-regionalloc']//li/a[.= $location]/@href"/> <xsl:text>http://weather.yahooapis.com/forecastrss?u=c%26p=</xsl:text> <xsl:value-of select="substring-before(substring-after($href,'forecast/'),'.html')" /> </xsl:template> </p:input> <p:output name="result" id="rssUrl"/> </p:processor> <p:processor name="fetch"> <p:annotation>fetch the RSS feed</p:annotation> xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >



<p:input name="url" id="rssUrl"/> <p:output name="result" id="RSSFeed"/> </p:processor> <p:processor name="xslt"> <p:annotation>Convert RSS to an HTML page</p:annotation> <p:input name="xml" id="RSSFeed"/> <p:input name="xslt" href="http://www.cems.uwe.ac.uk/xmlwiki/weather/yahooRSS2HTML.xsl"/> <p:output name="result" id="weatherPage"/> </p:processor> </p:pipeline>


Given implementations for each of the named processor types, this can be executed [12] (albeit rather slowly in this prototype XQuery processor).

This is a work in progress - at present this XPL engine is only a very simple, partial prototype, and even this simple sequential example is not conformant with the XPL schema (hence the local namespace). The pipeline can be visualized [13] using GraphViz. The intention is to generate an additional image map to support linking to the underlying processes as well as support the full XPL language

References
[1] http://developer.yahoo.com/weather/
[2] http://pipes.yahoo.com/pipes/pipe.info?_id=MEY4dst33BGiVWNbOTY80A
[3] http://weather.yahoo.com/regional/UKXX_B.html
[4] http://www.cems.uwe.ac.uk/xmlwiki/util/geturi.xq?uri=http://weather.yahoo.com/regional/UKXX_B.html
[5] http://www.cems.uwe.ac.uk/xmlwiki/util/treeview.xq?path=no&uri=http://weather.yahoo.com/regional/UKXX_B.html
[6] http://xml.weather.yahoo.com/forecastrss?p=UKXX0025
[7] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2image.php?format=gif&url=http://www.cems.uwe.ac.uk/xmlwiki/DataFlow/xpl2dot.xq?url=/db/Wiki/DataFlow/yahooweatherpl.xml
[8] http://www.cems.uwe.ac.uk/xmlwiki/weather/yahoo.xq?location=Bristol
[9] http://www.cems.uwe.ac.uk/xmlwiki/weather/yahoo.xq?location=Cardiff
[10] http://www.cems.uwe.ac.uk/xmlwiki/util/xslt2html.xq?xslt=http://www.cems.uwe.ac.uk/xmlwiki/weather/yahooRSS.xsl&location=Bristol
[11] http://www.orbeon.com/ops/doc/reference-xpl-pipelines
[12] http://www.cems.uwe.ac.uk/xmlwiki/xmlpipes/executeXPL.xq?location=Bristol
[13] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2image.php?url=http://www.cems.uwe.ac.uk/xmlwiki/xmlpipes/xpl2dot.xq&format=gif

Parsing Query Strings


Motivation
Normal HTTP query strings use the ampersand (&) character to differentiate between terms in a query string. However, because ampersands are also used as the start of entities within HTML and XML, this can make it difficult to encode parametric content into XML links, and it moreover makes query strings difficult to decipher visually. This program illustrates how to parse query strings using alternative delimiters (such as the semi-colon).

This program demonstrates some standard XQuery functions that are not part of the original XQuery specification but are required for precise web server XQuery functionality. The functions are:

eXist request:get-method()
eXist util:unescape-uri()
eXist request:get-query-string()
eXist request:get-parameter()
eXist request:get-parameter-names()

Namespace
module namespace common = "http://www.metaphoricalweb.org/xmlns/common";

Platform
eXist



common:get-parameters
This base function retrieves the query string from the URI, parses the string using the given delimiter and creates an XML structure of the form:

<params>
   <param name="param1" value="paramval1"/>
   <param name="param2" value="paramval2"/>
</params>
declare function common:get-parameters($delimiter as xs:string) as node() {
   let $params :=
      if (request:get-method() = "GET")
      then
         let $query-string := util:unescape-uri(request:get-query-string(),"UTF-8")
         let $parsed-query := tokenize($query-string,$delimiter)
         return
            <params>
              {for $parsed-query-term in $parsed-query
               let $parse-query-name := substring-before($parsed-query-term,"=")
               let $parse-query-value := substring-after($parsed-query-term,"=")
               return <param name="{$parse-query-name}" value="{$parse-query-value}"/>
              }
            </params>
      else
         <params>
           {for $name in request:get-parameter-names()
            let $parse-query-name := $name
            let $parse-query-value := request:get-parameter($name,"")
            return <param name="{$parse-query-name}" value="{$parse-query-value}"/>
           }
         </params>
   return $params
};

common:get-parameter
This function retrieves a sequence of string values corresponding to the values for a given parameter key given in the query string. Note that while typically there will be only one string in the sequence, if you have a query string of the form ?a=val1;b=val2;a=val3 then get-parameter("a","",";") will return ("val1","val3")
declare function common:get-parameter($param-name as xs:string, $default-value as xs:string, $delimiter as xs:string) as xs:string* {
   let $params := common:get-parameters($delimiter)
   let $param-nodes := $params/param[@name=$param-name]
   let $param-values :=
      for $param-node in $param-nodes
      return
         if ($param-node/@value)
         then string($param-node/@value)
         else $default-value
   return $param-values
};

common:get-parameter-names
This function retrieves the name of each query string key (once and only once per key).
declare function common:get-parameter-names($delimiter as xs:string) as xs:string* {
   let $params := common:get-parameters($delimiter)
   for $param-name in distinct-values($params/param/@name)
   return $param-name
};


Example Program
Assumes a query string of http://www.metaphoricalweb.org/?a=5;b=test;a=8;c=new+message

let $msg := common:get-parameter("c","",";")
return $msg

returns [Execute [1]]

new message

<data>
{
for $key in common:get-parameter-names(";")
return <seq>{$key}:{common:get-parameter($key,"",";")}</seq>
}
</data>

returns [Execute [2]]

<data>
   <seq>a:5 8</seq>
   <seq>b:test</seq>
   <seq>c:new message</seq>
</data>

let $seq1 := common:get-parameter("a",0,";")
return sum(for $n in $seq1 return number($n))

returns [Execute [3]]

13

References
[1] http://www.cems.uwe.ac.uk/xmldb/rest//db/Wiki/param_1.xq?a=5;b=test;a=8;c=new+message
[2] http://www.cems.uwe.ac.uk/xmldb/rest//db/Wiki/param_2.xq?a=5;b=test;a=8;c=new+message
[3] http://www.cems.uwe.ac.uk/xmldb/rest//db/Wiki/param_3.xq?a=5;b=test;a=8;c=new+message

Project Euler
Project Euler [1] is a collection of mathematical problems. Currently there are 166 problems, so it may take some time to get through them all :-).

Problem 1 [2]
Add all the natural numbers below 1000 that are multiples of 3 or 5.

sum((1 to 999)[. mod 3 = 0 or . mod 5 = 0])

Run [3]

Problem 2 [4]
Find the sum of all the even-valued terms in the Fibonacci sequence which do not exceed one million.

declare function local:fib($fibs, $max) {
   let $next := $fibs[1] + $fibs[2]
   return
      if ($next > $max)
      then $fibs
      else local:fib(($next, $fibs), $max)
};

sum(local:fib((2,1), 1000000)[. mod 2 = 0])

Run [5]

This brute-force approach recursively builds the Fibonacci sequence (in reverse) up to the maximum, then filters and sums the result.

Problem 3 [6]
What is the largest prime factor of the number 317584931803? First we need to get a list of primes. The algorithm known as the Sieve of Eratosthenes is directly expressible in XQuery:
declare function local:sieve($primes as xs:integer*, $nums as xs:integer*) as xs:integer* {
   if (exists($nums))
   then
      let $prime := $nums[1]
      return local:sieve(($primes, $prime), $nums[. mod $prime != 0])
   else $primes
};

<result>
  { local:sieve((), 2 to 1000) }
</result>

The list of primes starts off empty, the list of numbers starts off with the integers. Each recursive call of local:sieve takes the first of the remaining integers as a new prime and reduces the list of integers to those not divisible by the prime. When the list of integers is exhausted, the list of primes is returned.

Primes less than 1000 [7]

Factorization of a number N is also easily expressed as the subset of primes which divide N:

declare function local:factor($n as xs:integer ,$primes as xs:integer*) as xs:integer* { $primes[ $n mod . = 0] };

Hence

let $n := xs:integer(request:get-parameter("n",100))
let $max := xs:integer(round(math:sqrt($n)))
let $primes := local:sieve((), 2 to $max)
return
   <result>
     { local:factor($n, $primes) }
   </result>

Factors of 13195 [8]

And the largest is

max(local:factor($n, $primes))

Largest factor of 13195 [9]

Sadly this elegant method runs out of space and time for integers as large as that in the problem.
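A possible alternative for numbers of this size (a sketch, not from the original article) is repeated trial division, which never needs the full list of primes; whether the recursion depth is acceptable depends on the XQuery engine:

declare function local:largest-factor($n as xs:integer, $d as xs:integer) as xs:integer {
   if ($d * $d > $n) then $n                                   (: the remaining $n is prime :)
   else if ($n mod $d = 0) then local:largest-factor($n idiv $d, $d)
   else local:largest-factor($n, $d + 1)
};

local:largest-factor(317584931803, 2)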

Problem 4 [10]
Find the largest palindrome made from the product of two 3-digit numbers.

declare function local:palindromic($n as xs:integer) as xs:boolean {
   let $s := xs:string($n)
   let $sc := string-to-codepoints($s)
   let $sr := reverse($sc)
   let $r := codepoints-to-string($sr)
   return $s = $r
};

max(
  (for $i in (100 to 999)
   for $j in (100 to 999)
   return $i * $j
  )[local:palindromic(.)]
)

Run [11] (takes 20 seconds)


Problem 5 [12]
What is the difference between the sum of the squares and the square of the sums for integers from 1 to 100?

declare function local:diff-sum($n as xs:integer) as xs:integer {
   sum(1 to $n) * sum(1 to $n) - sum(for $i in 1 to $n return $i * $i)
};

local:diff-sum(100)

Run [13]

This nasty brute-force method can be replaced by an explicit expression using familiar formulae:

declare function local:diff-sum($n as xs:integer) as xs:integer {
   let $sum := $n * ($n + 1) div 2
   let $sumsq := ($n * ($n + 1) * (2 * $n + 1)) div 6
   return $sum * $sum - $sumsq
};

local:diff-sum(100)

Run [14]

References
[1] http://projecteuler.net/index.php?section=about
[2] http://projecteuler.net/index.php?section=problems&id=1
[3] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/euler1.xq
[4] http://projecteuler.net/index.php?section=problems&id=2
[5] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/euler2.xq
[6] http://projecteuler.net/index.php?section=problems&id=3
[7] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/sieve.xq
[8] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/factor.xq?n=13195
[9] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/maxfactor.xq?n=13195
[10] http://projecteuler.net/index.php?section=problems&id=4
[11] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/euler4.xq
[12] http://projecteuler.net/index.php?section=problems&id=5
[13] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/euler5.xq
[14] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/euler5a.xq

Searching,Paging and Sorting


These examples use a simple XML file containing data on Earthquakes around the world. The data [1] come from Swivel [2]. The examples use the generic table-viewer introduced in "Creating Custom Views" for output.

Searching
This example searches for a string in the location of the earthquake.
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

import module namespace wikiutil = "http://www.cems.uwe.ac.uk/xmlwiki" at "util.xqm";

let $search := request:get-parameter("search","")
let $matches := //Earthquake[contains(Location,$search)]
return
   <html>
      <head>
         <title>Search Earthquakes for {$search}</title>
      </head>
      <body>
         <h1>Search Earthquakes</h1>
         <form>Search for
            <input type="text" name="search" value="{$search}"/>
         </form>
         { wikiutil:sequence-to-table($matches) }
      </body>
   </html>

Execute [3]

Paging
This script implements paging of the search results. Here the full search is repeated for each call, with the state of the interaction held in a hidden input.
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

import module namespace wikiutil = "http://www.cems.uwe.ac.uk/xmlwiki" at "util.xqm";

let $search := request:get-parameter("search","") let $start:= xs:integer(request:get-parameter("start", "1")) let $records := xs:integer(request:get-parameter("records", "5")) let $action := request:get-parameter("action","search")

let $allMatches := //Earthquake[contains(Location,$search)]

(: compute the limits for this page :)
let $max := count($allMatches)
let $start :=



if ($action = "Previous") then max(($start - $records, 1)) else if ($action="Next") then if ($max <$start +$records) then $start else $start +$records else if ($action="Search") then 1 else $start let $end := min (($start + $records - 1,$max))


(: restrict the full set of matches to this subsequence :) let $matches := subsequence($allMatches,$start,$records)

return <html> <head> <title>Search Earthquakes </title> </head> <body> <h1>Search Earthquakes</h1> <form > Search Location for <input type="text" name="search" value="{$search}"/> <input type="submit" name="action" value="Search"/> <br/> <input type="hidden" name="start" value="{$start}"/> <input type="submit" name="action" value="Previous"/> <input type="submit" name="action" value="Next"/> <p>Displaying {$start} to {$end} out of {$max} records found.</p> {wikiutil:sequence-to-table($matches) } <p>Records per Page <input type="text" name="records" value="{$records}"/></p> </form> </body> </html>

Execute [4]

Sorting
To get the columns sorted, we add a submit button to each column. This requires extending the generic table viewer to sort the nodes by the selected column.
declare function wikiutil:sequence-to-table($seq,$sort) { <table border="1"> <tr> {for $node in $seq[1]/* return <th><input type="submit" name="Sort" value="{name($node)}"/></th> } </tr>



{for $row in $seq let $sortBy := data($row/*[name(.) = $sort]) order by $sortBy return <tr> {for $node in $seq[1]/* let $data := data($row/*[name(.)=name($node)]) return <td>{$data}</td> } </tr> } </table> };
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

import module namespace wikiutil = "http://www.cems.uwe.ac.uk/xmlwiki" at "util.xqm";

let $search := request:get-parameter("search","")
let $sort := request:get-parameter("Sort","Date")
let $matches := //Earthquake[contains(Location,$search)]
return
   <html>
      <head>
         <title>Search Earthquakes</title>
      </head>
      <body>
         <h1>Search Earthquakes</h1>
         <form>Search Location for
            <input type="text" name="search" value="{$search}"/>
            {wikiutil:sequence-to-table($matches,$sort)}
         </form>
      </body>
   </html>

Note that the sort is by string value: sorting by Magnitude succeeds only by chance, whereas the sort on Fatalities does not.

Execute [5]

An improvement would be to allow successive clicks on a column heading to reverse the sort direction. This requires the addition of two more items to the interaction state, the current sort order and the current direction, and changes to the table generator. One would like to be able to say something like:

for $row ..
let $sortBy := ..
let $direction := if (..) then "ascending" else "descending"
order by $sortBy $direction

but this is not a valid FLWOR expression. Instead we have to have two FLWOR expressions, one for each direction.
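An alternative sketch (not the approach used in the article's code below), using the same variables as the function that follows, is to sort ascending once and then apply fn:reverse to the generated rows when a descending order is wanted, which avoids duplicating the FLWOR:

let $rows :=
   for $row in $seq
   let $sortBy := data($row/*[name(.) = $sort])
   order by $sortBy
   return
      <tr>
        {for $node in $seq[1]/*
         let $data := data($row/*[name(.) = name($node)])
         return <td>{$data}</td>
        }
      </tr>
return if ($direction = 1) then $rows else reverse($rows)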
declare function wikiutil:sequence-to-table($seq,$sort,$direction) { <table border="1">



<tr> {for $node in $seq[1]/* return <th><input type="submit" name="Sort" value="{name($node)}"/></th> } </tr> { if ($direction = 1) then for $row in $seq let $sortBy := data($row/*[name(.) = $sort]) order by $sortBy ascending return <tr> {for $node in $seq[1]/* let $data := data($row/*[name(.)=name($node)]) return <td>{$data}</td> } </tr> else for $row in $seq let $sortBy := data($row/*[name(.) = $sort]) order by $sortBy descending return <tr> {for $node in $seq[1]/* let $data := data($row/*[name(.)=name($node)]) return <td>{$data}</td> } </tr> } </table> };


Then the script becomes:


import module namespace wikiutil = "http://www.cems.uwe.ac.uk/xmlwiki" at "util.xqm";

declare option exist:serialize

"method=xhtml media-type=text/html indent=yes";

let $search := request:get-parameter("search","") let $sort := request:get-parameter("Sort","Date") let $lastSort := request:get-parameter("LastSort","") let $lastDirection := number(request:get-parameter("LastDirection","1")) let $direction := if ($lastSort = $sort) then - $lastDirection else 1 let $matches := //Earthquake[contains(Location,$search)] return <html> <head> <title>Search Earthquakes</title>

   </head>
   <body>
      <h1>Search Earthquakes</h1>
      <form>Search Location for
         <input type="text" name="search" value="{$search}"/>
         <input type="hidden" name="LastSort" value="{$sort}"/>
         <input type="hidden" name="LastDirection" value="{$direction}"/>
         { wikiutil:sequence-to-table($matches,$sort,$direction) }
      </form>
   </body>
</html>

Execute [6]

Using a table schema


Greater control over the output can be obtained by providing a schema for the table. This schema can specify the order of columns and column headings, and potentially conversion instructions as well. We can provide the table schema as a sequence of Column definitions: <Schema> <Column name="Location" heading="Earthquake location"/> <Column name="Magnitude" heading="Magnitude (Richter Scale)"/> <Column name="Date" /> </Schema> The schema-based function looks like: declare function wikiutil:sequence-to-table-with-schema($seq,$schema) { <table border="1"> <tr> {for $column in $schema/Column return <th>{string( ($column/@heading,$column/@name)[1])}</th> } </tr> {for $row in $seq return <tr> {for $column in $schema/Column let $data := data($row/*[name(.)=$column/@name]) return <td>{$data}</td> } </tr> } </table> }; Note the use of a XQuery idiom to compute the column heading as the supplied heading if there is one, otherwise the node name: ($column/@heading,$column/@name)[1]

This computes the first non-null item in the sequence, a cleaner and more generalisable alternative to:

if (exists($column/@heading)) then $column/@heading else $column/@name

Execute [7]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/reports/earthquakes.xml
[2] http://www.swivel.com/data_columns/show/4101545
[3] http://www.cems.uwe.ac.uk/xmlwiki/reports/search.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/reports/pagedSearch.xq
[5] http://www.cems.uwe.ac.uk/xmlwiki/reports/sortedSearch.xq
[6] http://www.cems.uwe.ac.uk/xmlwiki/reports/sortedSearch2.xq
[7] http://www.cems.uwe.ac.uk/xmlwiki/reports/search-with-schema.xq

Sequence Diagrams
Background
Sequence Diagrams are tedious to draw, even with a diagramming tool. They are even worse to edit when the sequence changes. An alternative is to define an XML vocabulary to define the message sequencing and to use XQuery to render this description as XHTML. This textual approach also allows explanations to be revealed at each step and alternative renderings of the XML definition of the Sequence to be generated, such as a printed version. This demonstrator uses a simplified meta-model, with only messages between actors and actions undertaken by actors. (article under re-design - CW)

Models
3-tier architecture
Here is a sample description of interaction in a 3-tier architecture: (badly needs re-writing)
<SequenceDiagram id="3tier"> <name>3-tier architecture</name> <description>An overview of the 3-tier Architecture</description> <cast> <actor> <name>user</name> <label>The User</label> <color>pink</color> <location>client</location> <description>The user of the site</description> </actor> <actor> <name>browser</name> <label>Presentation Layer</label>

<color>lightgreen</color> <location>client</location> <description>A browser such as Firefox, Opera or Internet Explorer</description> </actor> <actor> <name>server</name> <label>Application Layer</label> <color>lightblue</color> <location>server</location> <description>Scripts in languages such as PHP or Java invoked via a web server</description> </actor> <actor> <name>database</name> <label>Persistance Layer</label> <color>grey</color> <location>server</location> <description>A database server such as Oracle or MySQL</description> </actor> </cast> <communication> <connection> <actor>user</actor> <actor>browser</actor> <method/> <prep>on</prep> </connection> <connection> <actor>browser</actor> <actor>server</actor> <method>HTTP</method> <prep>to</prep> </connection> <connection> <actor>server</actor> <actor>database</actor> <method>SQL</method> <prep>to</prep> </connection> </communication> <trace> <message> <from>user</from> <to>browser</to> <action>click</action> <object>link</object> </message> <message>


<from>browser</from> <to>server</to> <action>request</action> <object>URL</object> <url>http://www.cems.uwe.ac.uk/~cjwallac/apps/poll2/tally.php?pollid=2</url> </message> <do> <at>server</at> <action>decode input</action> <object/> </do> <do> <at>server</at> <action>create SQL request</action> <object/> </do> <message> <from>server</from> <to>database</to> <action>request</action> <object>SQL statement</object> </message> <message> <from>database</from> <to>server</to> <action>respond</action> <object>tables</object> </message> <do> <at>server</at> <action>create page with data in table</action> <object/> </do> <message> <from>server</from> <to>browser</to> <action>respond</action> <object>HTML page</object> </message> <message> <from>user</from> <to>browser</to> <action>read</action> <object>page</object> </message> </trace> </SequenceDiagram>



Rendering the diagram


The script 'displayDiagram' renders this model as an XHTML table: declare option exist:serialize "method=xhtml media-type=text/html indent=yes"; declare variable $homesym :='||'; declare variable $leftsym := '>>'; declare variable $rightsym := '<<'; declare function local:makeText($event){ concat($event/action,' ',string-join($event/object,' + ')) };

let $id:= request:get-parameter('id','') let $sd :=//SequenceDiagram[@id=$id] let $trace := $sd/trace let $actors := $sd/cast/actor let $nactors := count($sd/cast/actor) let $width := 100 div $nactors return <html> <head><title>Sequence Diagram {string($sd/@id)}</title> </head> <body> <h1>{string($sd/name)} </h1> <div class="description"> {$sd/description/node() } </div> <table border='1'> <tr> {for $a in $actors return <th width='{$width}%' bgcolor='{$a/color}'>{string($a/label)}</th> } </tr> { if ($actors/description) then <tr> {for $a in $actors return <th width='{$width}%' bgcolor='{$a/color}'>{string($a/description)} </th> }

Sequence Diagrams </tr> else () } {for $event in $trace/* return <tr> {if (name($event)='do') then let $p := index-of($actors/name,$event/at ) let $text:= local:makeText($event) return ( for $i in (1 to $p - 1) return <td/>, <td align='center' bgcolor='{$actors[name=$event/at]/color}'> { if ($event/url) then <a href='{$event/url}' target='demo'>{$text}</a> else $text } </td>, for $i in ($p + 1to $nactors) return <td/> ) else if (name($event)='message') then let $pfrom := index-of($actors/name,$event/from ) let $pto := index-of($actors/name,$event/to) let $pfirst := min (($pfrom,$pto)) let $plast := max(($pfrom,$pto)) let $ltor := $pfrom = $pfirst let $text:= local:makeText($event) let $connection := $sd//connection[actor = $event/from and actor= $event/to] let $text := if ($ltor) then concat($connection/method,$leftsym,$text,$leftsym) else concat($rightsym,$text,$rightsym, $connection/method) return ( for $i in (1 to $pfirst - 1)


Sequence Diagrams return <td/>, <td align='center' colspan='{$plast - $pfirst + 1 }' bgcolor ='{$actors[name=$event/from]/color}' > {$text} { if ($event/url) then <a href='{$event/url}' target='demo'>Link </a> else () } </td>, for $i in ($plast + 1 to $nactors) return <td/> ) else () } </tr> } </table> </body> </html> Display [1]


Further example Diagrams


1. A GoogleEarth application Model [2] Display [3]

Animating the Diagram


Rather than display the complete interaction, the diagram can be simply animated by displaying only the first n steps, together with explanatory text for the last step. The addition of some controls allows the user to step forward and backward in the sequence. 3 Tier [4] GoogleEarth [5] A function to compute the next step: declare function local:next-step($step,$action, $max) { if ($action = "start") then 0 else if ($action = "back") then max (($step -1,0)) else if ($action = "forward") then min(($max,$step+1)) else if ($action="end") then $max else $step };

Sequence Diagrams and call the function let $step := local:next-step( number(request:request-parameter("step",0)), request:request-parameter("action", "start"), count($trace/*)) A form to provide the controls and maintain the interaction state: <h2> <form> <input <input <input <input <input <input </form> </h2> Limit the events displayed to the specified number of steps: for $event in $trace/*[position()<=$step] and display the explanation of the last step: <div class="description"> {if ($step=0) then $sd/description/node() else $trace/*[position()=$step]/description/node() } </div> type="hidden" type="hidden" type="submit" type="submit" type="submit" type="submit" name="id" value="{$id}"/> name="step" value="{$step}"/> name="action" value="start"/> name="action" value="back"/> name="action" value="forward"/> name="action" value="end"/>


Print the diagram


One of the advantages of having the whole diagram in XML is that the diagram can be displayed differently in print, so that each step can be shown with its description. Fully expanded Descriptions [6]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/makeDiagram.xq?id=3tier
[2] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/sequences/track4.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/showSequence.xq?uri=/db/Wiki/SequenceDiagram/sequences/track4.xml
[4] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/makeDiagramMove.xq?id=3tier
[5] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/makeDiagramMove.xq?id=ge4
[6] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/showSequence.xq?uri=/db/Wiki/SequenceDiagram/sequences/3tier.xml&action=expand&steps=9999

Simple RSS reader


The BBC provides a wide range of RSS news feeds, e.g. UK Educational news [1]

News Page
Reformat the RSS feed as HTML:
declare option exist:serialize "method=xhtml media-type=text/html";

let $news := doc("http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/education/rss.xml")
let $dateTime := $news/rss/channel/lastBuildDate
return
   <html>
      <body>
         <h2>Education news from the BBC at {string($dateTime)}</h2>
         {for $newsItem in $news/rss/channel/item[position() < 10]
          return
            <div>
               <h4>{string($newsItem/title)}</h4>
               <p>{string($newsItem/description)} <a href="{$newsItem/link}">more..</a></p>
            </div>
         }
      </body>
   </html>

Execute [2]

Text-to-Speech
The Opera [3] browser with Voice extension supports text-to-speech, allowing this news to be spoken. This uses the XML vocabularies VoiceXML [4] and XML Events [5].
declare option exist:serialize "method=xhtml media-type=application/xv+xml";

let $news := doc("http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/education/rss.xml") let $dateTime := $news/rss/channel/lastBuildDate let $newsItems := return <h:html xmlns:h="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/vxml" xmlns:ev="http://www.w3.org/2001/xml-events" > <h:head> <h:title>BBC Education news</h:title> <vxml:form id="news"> <vxml:block> {for $newsItem in $newsItems return string($newsItem/description) $news/rss/channel/item[position() < 10]



} </vxml:block> </vxml:form> </h:head> <h:body> <h:h1>BBC Education news <h:p> <h:a ev:event="click" ev:handler="#news" > <h:img src="http://www.naturalreaders.com/images/laba.gif"/> </h:a> (Requires the Opera Browser with Voice extension) </h:p> { for $newsItem in return <h:div> <h:h4>{string($newsItem/title)}</h:h4> <h:p>{string($newsItem/description)} <h:a href="{$newsItem/link}">more..</h:a></h:p> </h:div> } </h:body> </h:html> $newsItems at {string($dateTime)}</h:h1>


Execute [6] Note that the html namespace has been given a prefix, so that the default prefix can refer to the RSS feed.

Generic RSS reader


More generally, a reader which could voice any RSS feed would be a useful service, for example UWE news:

Execute [7]
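A minimal sketch of the fetching part of such a generic reader (the rss parameter name matches the link above; the voiced variant would wrap the same items in the VoiceXML markup shown earlier):

declare option exist:serialize "method=xhtml media-type=text/html";

let $feed := request:get-parameter("rss",
      "http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/education/rss.xml")
let $news := doc($feed)
return
   <html>
     <body>
       <h2>{string($news/rss/channel/title)} at {string($news/rss/channel/lastBuildDate)}</h2>
       {for $newsItem in $news/rss/channel/item
        return
          <div>
            <h4>{string($newsItem/title)}</h4>
            <p>{string($newsItem/description)} <a href="{$newsItem/link}">more..</a></p>
          </div>
       }
     </body>
   </html>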

References
[1] http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/education/rss.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/RSS/bbcednews.xq
[3] http://www.opera.com/
[4] http://en.wikipedia.org/wiki/VoiceXML
[5] http://en.wikipedia.org/wiki/XML_Events
[6] http://www.cems.uwe.ac.uk/xmlwiki/RSS/bbcednewsvoiced.xq
[7] http://www.cems.uwe.ac.uk/xmlwiki/RSS/rssvoiced.xq?rss=http://info.uwe.ac.uk/news/uwenews/downloadxml.asp

Simple XForms Examples


Motivation
Although static XForms can be held in the eXist database, XForms can also be generated dynamically using XQuery. In this section, we first show static forms being output and executed by some XForms client-side engines, the FireFox add-in, the Javascript FormFaces and the XSLT/Javascript XSLTForms. Then we look at dynamic form generation. A wide range of XForms examples can be found in the XForms Wikibook [1]

XForms Engines
These examples use:

Firefox add-in [2]
   Requires Firefox with the XForms add-in
   media-type set to application/xhtml+xml

FormFaces [3]
   Cross-browser support - examples tested on Firefox and IE6
   The Javascript source is stored in the eXist database and linked to each form
   media-type set to text/html

XSLTForms [4]
   Uses XSLT to transform to an HTML page and JavaScript to execute. The XSLT transformation may be either server-side (via eXist) or client-side.

All examples use the same css stylesheet [5]

XForm Output
Firefox
declare option exist:serialize "method=xhtml media-type=application/xhtml+xml indent=yes";

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:form="http://www.w3.org/2002/xforms" xml:lang="en"> <head> <title>Output a Model value</title> <form:model> <form:instance> <data xmlns=""> <name>Mozilla XForms add-in</name> </data> </form:instance> </form:model> </head> <body> <h2> <form:output ref="name"></form:output>



</h2> </body> </html>


Execute [6]

FormFaces
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:form="http://www.w3.org/2002/xforms" xml:lang="en"> <head> <title>Output a Model value</title> <script language="javascript" src="../formfaces/formfaces.js"/> <link rel="stylesheet" type="text/css" href="xforms.css" /> <form:model> <form:instance> <data xmlns=""> <name>Formfaces</name> </data> </form:instance> </form:model> </head> <body> <h2> <form:output ref="name"></form:output> </h2> </body> </html>

Execute [7]

XSLTForms
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet href="ajaxforms.xsl" type="text/xsl"?> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:form="http://www.w3.org/2002/xforms" xml:lang="en"> <head> <title>Output a Model value</title> <script language="javascript" src="formfaces.js"/> <link rel="stylesheet" type="text/css" href="xforms.css" /> <form:model> <form:instance> <data xmlns=""> <name>Formfaces</name> </data> </form:instance> </form:model> </head>



<body> <h2> <form:output ref="name">&#160;</form:output> </h2> </body> </html>


Execute [8]

Simple Controls
Firefox [9] Formfaces [10] XSLTForms [11]

Observations
FormFaces: breaks on Firefox
XSLTForms: changes only made when triggered

Multiple instances
A model can contain multiple instances, whose root node is accessed with the instance(id) construct.
<?xml version="1.0" encoding="UTF-8"?> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xf="http://www.w3.org/2002/xforms" xmlns:ev="http://www.w3.org/2001/xml-events"> <head> <title>Test conditional selection lists</title> <xf:model> <xf:instance id="data" xmlns=""> <data> <selected-season>spring</selected-season> <selected-month>March</selected-month> </data> </xf:instance> <xf:instance id="seasons" xmlns=""> <seasons> <item name="winter"/> <item name="spring"/> <item name="summer"/> <item name="autumn"/> </seasons> </xf:instance> <xf:instance id="months" xmlns=""> <months> <item name="January" season="winter"/> <item name="February" season="winter"/> <item name="March" season="spring"/> <item name="April" season="spring"/>



<item name="May" season="spring"/> <item name="June" season="summer"/> <item name="July" season="summer"/> <item name="August" season="summer"/> <item name="September" season="autumn"/> <item name="October" season="autumn"/> <item name="November" season="autumn"/> <item name="December" season="winter"/> </months> </xf:instance> </xf:model> </head> <body> <p>Test conditional selection lists month selector depends on the current season</p> <div> <xf:select1 ref="instance('data')/selected-season"> <xf:label>Season:</xf:label> <xf:itemset nodeset="instance('seasons')/item"> <xf:label ref="@name"/> <xf:value ref="@name"/> </xf:itemset> </xf:select1> </div> <div> <xf:select1 ref="instance('data')/selected-month"> <xf:label>Month:</xf:label>


<xf:itemset nodeset="instance('months')/item[@season=instance('data')/selected-season]"> <xf:label ref="@name"/> <xf:value ref="@name"/> </xf:itemset> </xf:select1> </div> <div> <xf:output ref="instance('data')/selected-season"> <xf:label>selected-season: </xf:label> </xf:output> <xf:output ref="instance('data')/selected-month"> <xf:label>selected-month: </xf:label> </xf:output> </div> </body> </html>

FireFox [12]


Date Entry
Firefox generates a drop-down calendar.

Firefox [13]

Formfaces currently has no Calendar widget.

Formfaces [14]

Server Interaction
Interaction with a server can be via GET or POST. This example is based on Dan McCreary's example in the XForms Wikibook [15]. On Firefox:

<html xmlns:xf="http://www.w3.org/2002/xforms"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <title>XQuery Tester</title>
      <link rel="stylesheet" type="text/css" href="xforms.css" />
      <xf:model>
         <xf:instance>
            <data xmlns="">
               <input>
                  <arg1>123</arg1>
                  <arg2>456</arg2>
               </input>
               <result>
                  <sum>0</sum>
               </result>
            </data>
         </xf:instance>
         <xf:submission id="get-instance" method="get" replace="instance"
                        action="adderGet.xq" separator="&amp;">
         </xf:submission>
         <xf:submission id="post-instance" method="post" replace="instance"
                        action="adderPost.xq">
         </xf:submission>
      </xf:model>
   </head>
   <body>

Simple XForms Examples <h1>XForm interaction with XQuery</h1> <xf:input ref="input/arg1" incremental="true"> <xf:label>Arg1:</xf:label> </xf:input> <br/> <xf:input ref="input/arg2" incremental="true"> <xf:label>Arg2:</xf:label> </xf:input> <br/> <xf:output ref="result/sum"> <xf:label> Sum:</xf:label> </xf:output> <br/> <xf:submit submission="get-instance"> <xf:label>Get</xf:label> </xf:submit> <xf:submit submission="post-instance"> <xf:label>Post</xf:label> </xf:submit> <p id="status"></p> </body> </html> Firefox [16] Formfaces [17] The respective server scripts are GET xquery version "1.0"; declare namespace request="http://exist-db.org/xquery/request"; let $arg1 := number(request:get-parameter("arg1", "0")) let $arg2 := number(request:get-parameter("arg2", "0")) return <data xmlns=""> <input> <arg1>{$arg1}</arg1> <arg2>{$arg2}</arg2> </input> <result> <sum>{$arg1+$arg2}</sum> </result> </data> POST xquery version "1.0"; declare namespace request="http://exist-db.org/xquery/request";




let $data := request:get-data() let $arg1 := number($data/arg1) let $arg2 := number($data/arg2) return <data xmlns=""> <input> <arg1>{$arg1}</arg1> <arg2>{$arg2}</arg2> </input> <result> <sum>{$arg1+$arg2}</sum> </result> </data> In this example, the whole model is updated and returned to the client. Alternatively, part of the model can be updated (tbc)
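A hedged sketch of such a partial update (assuming an XForms 1.1 processor; the id and file names are illustrative): keep the input and result in separate instances, submit only the input instance, and name the instance to be replaced on the submission. The server script would then return just the result element.

<xf:instance id="input" xmlns="">
   <input><arg1>123</arg1><arg2>456</arg2></input>
</xf:instance>
<xf:instance id="result" xmlns="">
   <result><sum>0</sum></result>
</xf:instance>
<xf:submission id="get-sum" method="get" action="adderGet.xq"
               ref="instance('input')"
               replace="instance" instance="result"/>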

Generic XForms
Tabular example
A simple approach to generic XForms is illustrated in this script based on an example in the XForms wikibook declare option exist:serialize "method=xhtml media-type=application/xhtml+xml indent=yes"; let $data := <Data> <GivenName>John</GivenName> <MiddleName>George</MiddleName> <Surname>Doe</Surname> <CityName>Anytown</CityName> <StateCode>MM</StateCode> <PostalID>55123-1234</PostalID> </Data> return <html xmlns:xf="http://www.w3.org/2002/xforms" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Formatting XForms</title> <link rel="stylesheet" type="text/css" href="xforms.css" />

Simple XForms Examples <style type="text/css"> { for $item in $data/* let $width := string-length($item) return concat('._',name($item),' .xf-value {width:', $width,'em} ') } </style> <xf:model> <xf:instance xmlns=""> {$data} </xf:instance> </xf:model> </head> <body> <fieldset> <legend>Name and Address</legend> {for $item in $data/* return ( <xf:input class="_{name($item)}" ref="/Data/{name($item)}"> <xf:label>{name($item)}: </xf:label> </xf:input>, <br/> ) } </fieldset> </body> </html> In this simple tabular example, the XForm and accompanying CSS to define input field widths is generated by reflection on the supplied instance. This example works correctly in Firefox [18] but the styling fails in Formfaces [19]


Simple Forms Schema


More control over the generated XForms can be provided by a simple Schema. This example is still for a simple tabular structure: declare option exist:serialize "method=xhtml media-type=application/xhtml+xml indent=yes"; let $data := <Data> <GivenName>John</GivenName> <MiddleName>George</MiddleName> <Surname>Doe</Surname> <CityName>Anytown</CityName>

Simple XForms Examples <StateCode>MM</StateCode> <PostalID>55123-1234</PostalID> </Data> let $schema := <Schema> <Row name="GivenName" label="First Name" width="20"/> <Row name="Surname" label="Surname" width="15"/> <Row name="CityName" label="City" width="15"/> <Row name="StateCode" label="State" width="3"/> <Row name="PostalID" label="ZipCode" width="8"/> </Schema> return <html xmlns:xf="http://www.w3.org/2002/xforms" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Formatting XForms</title> <link rel="stylesheet" type="text/css" href="xforms.css" /> <style type="text/css"> { for $item in $schema/* let $id := concat("_",$item/@name) let $width := $item/@width return concat('.',$id,' .xf-value {width:', $width,'em} ') } </style> <xf:model> <xf:instance xmlns=""> {$data} </xf:instance> </xf:model> </head> <body> <fieldset> <legend>Name and Address</legend> {for $item in $schema/* let $id := concat("_",$item/@name) let $label := string( $item/@label) return (

537

Simple XForms Examples <xf:input class="{$id}" ref="/Data/{$item/@name}"> <xf:label>{$label}: </xf:label> </xf:input>, <br/> ) } </fieldset> </body> </html> Firefox [20]

538

References
[1] [2] [3] [4] [5] http:/ / en. wikibooks. org/ wiki/ XForms http:/ / www. mozilla. org/ projects/ xforms/ http:/ / www. formfaces. com/ http:/ / www. agencexml. com/ xsltforms/ http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ xforms. css

[6] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ outputMz. xq [7] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ outputFF. xq [8] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ outputXSLT. html [9] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ controlsMz. xq [10] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ controlsFF. xq [11] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ controlsXSLT. html [12] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ selectionwiki. xml [13] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ dateMz. xq [14] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ dateFF. xq [15] http:/ / en. wikibooks. org/ wiki/ XForms/ Adder [16] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ adderFormMz2. xq [17] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ adderFormFF2. xq [18] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ addressFormMz. xq [19] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ addressFormFF. xq [20] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ xforms/ addressFormMzSchema. xq

SPARQL interface
The following script provides, via a Joseki server at UWE, a query interface to RDF. Literal language and datatype are ignored in this representation. URIs link to the browse query and also directly to the resource. A function converts the SPARQL XML Query result to a table, with links.
declare function fr:sparql-to-table($rdfxml, $script-name) {
(: literal language and datatype ignored in this representation. URI
   links to the browse query and directly to the resource are generated :)
   let $vars := $rdfxml//sr:head/sr:variable/@name
   return
     <table border="1">
        <tr>
          {for $var in $vars
           return <th>{string($var)}</th>
          }
        </tr>
        {for $row in $rdfxml//sr:results/sr:result
         return
           <tr>
             {for $var in $vars
              let $binding := $row/sr:binding[@name=$var]/*
              return
                <td>
                  {typeswitch ($binding)
                   case element(sr:uri) return
                      (<a href="{$script-name}?uri={string($binding)}">{string($binding)}</a>,
                       <a href="{string($binding)}"> ^ </a>
                      )
                   case element(sr:literal) return string($binding)
                   case element(sr:bnode) return concat("_:",$binding)
                   default return ()
                  }
                </td>
             }
           </tr>
        }
     </table>
};
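For reference, the function consumes the standard W3C SPARQL Query Results XML Format, with the sr prefix assumed to be bound to the http://www.w3.org/2005/sparql-results# namespace. An illustrative fragment (the values shown are invented) looks like this:

<sparql xmlns="http://www.w3.org/2005/sparql-results#">
   <head>
      <variable name="emp"/>
      <variable name="job"/>
   </head>
   <results>
      <result>
         <binding name="emp"><uri>http://www.cems.uwe.ac.uk/empdept/emp/7369</uri></binding>
         <binding name="job"><literal>CLERK</literal></binding>
      </result>
   </results>
</sparql>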

The SPARQL interface uses the configuration file to declare the namespaces.

import module namespace fr="http://www.cems.uwe.ac.uk/wiki/fr" at "fr.xqm";

declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace rdfs = "http://www.w3.org/2000/01/rdf-schema#";

declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=no indent=yes
   doctype-public=-//W3C//DTD&#160;XHTML&#160;1.0&#160;Transitional//EN
   doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

declare variable $config-file := request:get-parameter("config", "/db/Wiki/RDF/empdeptconfig.xml");
declare variable $config := doc($config-file);
declare variable $graph := concat("http://www.cems.uwe.ac.uk/xmlwiki/RDF/xml2rdf.xq?config=",$config-file);
declare variable $default-engine := "http://www.cems.uwe.ac.uk/joseki/sparql";
declare variable $script-name := tokenize(request:get-uri(),'/')[last()];
declare variable $default-prolog :=
   "PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
    PREFIX afn: <http://jena.hpl.hp.com/ARQ/functions#>
   ";
declare variable $browse :=
   "select ?s ?p ?o where { {<uri> ?p ?o } UNION {?s ?p <uri>} UNION {?s <uri> ?o} }";

let $config-prolog := fr:sparql-prefixes($config)
let $query := request:get-parameter("query",())
let $uri := request:get-parameter("uri",())
let $engine := request:get-parameter("engine",$default-engine)
let $query := if ($uri) then replace($browse,"uri",$uri) else $query
let $queryx := concat($default-prolog,$config-prolog,$query)
let $sparql := concat($engine,
                      "?default-graph-uri=",$graph,
                      "&amp;query=",encode-for-uri($queryx)
                     )
let $result :=
    if ($query != "")
    then fr:sparql-to-table(doc($sparql), $script-name)
    else ()
return
<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <title>Emp-dept Query</title>
   </head>
   <body>
      <h1>Emp-dept Query</h1>
      <form action="{$script-name}">
         <textarea name="query" rows="8" cols="90">
            {$query}
         </textarea>
         <br/>
         <input type="submit"/>
      </form>
      <h2>Result</h2>
      {$result}
   </body>
</html>


Application
Query [1]

The interface expands a query like

select ?name ?job
where {
  ?emp rdf:type f:emp.
  ?emp foaf:surname ?name.
  ?emp f:Job ?job.
}

into:

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix f: <http://www.cems.uwe.ac.uk/xmlwiki/empdept/concept/>
prefix xs: <http://www.w3.org/2001/XMLSchema#>
select ?name ?job
from <http://www.cems.uwe.ac.uk/xmlwiki/RDF/xml2rdf.xq?config=/db/Wiki/RDF/empdeptconfig.xml>
where {
  ?emp rdf:type f:emp.
  ?emp foaf:surname ?name.
  ?emp f:Job ?job.
}

and sends this to the Joseki service. The graph to query is actually passed as the default graph rather than in the from clause.

To do
handle language and datatype
local URIs as local names rather than full URIs
better handling of default graph - should be able to reference the cached RDF defined in the config file

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/RDF/sparqlquery.xq

SPARQL Tutorial
SPARQL interface
The emp-dept RDF can be queried using SPARQL via an XQuery front end [1] to a store [2] provided by Talis [3]. This script supports SPARQL queries and browsing the RDF graph. The interface expands a query like

select ?name ?job
where {
  ?emp rdf:type f:emp.
  ?emp foaf:surname ?name.
  ?emp f:Job ?job.
}

that you can run here [4] into

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix f: <http://www.cems.uwe.ac.uk/empdept/concept/>
prefix xs: <http://www.w3.org/2001/XMLSchema#>
select ?name ?job
where {
  ?emp rdf:type f:emp.
  ?emp foaf:surname ?name.
  ?emp f:Job ?job.
}

and sends this to the Talis service in a form that can be run here [5]. The resultant SPARQL Query Results XML [6] is converted to HTML.



Example Queries
List all employees
select ?emp where { ?emp rdf:type f:emp. }

Run [7]

List the names of all employees in alphabetical order


select ?name where { ?emp rdf:type f:emp. ?emp foaf:surname ?name. } ORDER BY ?name

Run [8]

List the employees' name, salary, department number and job


select ?name ?sal ?dno ?job
where {
  ?emp rdf:type f:emp;
       foaf:surname ?name;
       f:Sal ?sal;
       f:Dept ?dept;
       f:Job ?job.
  ?dept f:DeptNo ?dno.
}

Note that ; in place of . repeats the subject. Try out this and the following queries here [4].

List the first 5 employees


select ?ename where { ?emp rdf:type f:emp; foaf:surname ?ename. } ORDER BY ?ename LIMIT 5

List the top 5 employees by salary


select ?ename ?sal where { ?emp rdf:type f:emp; foaf:surname ?ename; f:Sal ?sal. } ORDER BY DESC(?sal) LIMIT 5



List the departments


select ?dept where { ?dept rdf:type f:dept. }

List all departments and all employees


select ?dept ?emp where { {?dept rdf:type f:dept } UNION {?emp rdf:type f:emp} }

List the employees with salaries over 1000


If the RDF literal is typed, for example as xs:integer as is the case with this generated RDF, then the following query will select employees with a salary greater than 1000:

select ?emp ?sal
where {
  ?emp rdf:type f:emp;
       f:Sal ?sal.
  FILTER (?sal > 1000)
}

If the RDF literal is not typed, then the variable must be cast:

select ?emp ?sal
where {
  ?emp rdf:type f:emp;
       f:Sal ?sal.
  FILTER (xs:integer(?sal) > 1000)
}

List employees and their locations


select ?emp ?loc where { ?emp rdf:type f:emp. ?emp f:Dept ?dept. ?dept f:Location ?loc. }

List the names of employees and their managers


select ?ename ?mname where { ?emp rdf:type f:emp; f:Mgr ?mgr; foaf:surname ?ename. ?mgr foaf:surname ?mname. }



Include employees with no manager


select ?ename ?mname where { ?emp rdf:type f:emp; foaf:surname ?ename. OPTIONAL {?emp f:Mgr ?mgr. ?mgr foaf:surname ?mname. } }

List employees with no manager


select ?ename where { ?emp rdf:type f:emp; foaf:surname ?ename. OPTIONAL {?emp f:Mgr ?mgr} FILTER (!bound(?mgr)) }

List the distinct locations of staff


select distinct ?loc where { ?emp rdf:type f:emp. ?emp f:Dept ?dept. ?dept f:Location ?loc. }

List details of the employees who are ANALYSTs


select * where { ?emp rdf:type f:emp. ?emp f:Dept ?dept. ?dept f:Location ?loc. ?emp f:Job ?job. FILTER (?job = "ANALYST") }

List employees who are either ANALYSTs or MANAGERs


select ?emp
where {
  ?emp rdf:type f:emp;
       f:Job ?job.
  FILTER (?job = "ANALYST" || ?job = "MANAGER")
}



List employees who are neither ANALYSTs nor MANAGERs


select *
where {
  ?emp rdf:type f:emp;
       f:Job ?job.
  FILTER (?job != "ANALYST" && ?job != "MANAGER")
}

List employees whose surname begins with "S"


select * where { ?emp rdf:type f:emp. ?emp foaf:surname ?ename. FILTER (regex(?ename, "^S")) }

List employees whose surname contains "AR"


select * where { ?emp rdf:type f:emp. ?emp foaf:surname ?ename. FILTER (regex(?ename, "AR")) }

List employees whose surname contains M followed by R ignoring case


select * where { ?emp rdf:type f:emp. ?emp foaf:surname ?ename. FILTER (regex(?ename, "m.*r","i")) }

Compute the maximum salary


SPARQL 1.0 lacks min() or max(), although they are added to some implementations. The following recipe, due to Dean Allemang [9], can be used:

select ?maxemp ?maxsal
where {
  ?maxemp rdf:type f:emp.
  ?maxemp f:Sal ?maxsal.
  OPTIONAL {
    ?emp rdf:type f:emp.
    ?emp f:Sal ?sal.
    FILTER (?sal > ?maxsal)
  }.
  FILTER (!bound(?sal))
}

How does this work? We seek the employee with the maximum salary. For such an employee the OPTIONAL clause will not match, since there are no employees with a greater salary, and thus ?sal will not be bound.

In SPARQL 1.1 max() and min() are allowed, so the query to return the maximum salary becomes:

select (max(?sal) as ?maxsal)
where {
  ?maxemp rdf:type f:emp.
  ?maxemp f:Sal ?sal.
}

Compute employees with the same salary


select * where { ?emp1 f:Sal ?sal. ?emp2 f:Sal ?sal. FILTER (?emp1 != ?emp2) }

Get the department which SMITH works for


select ?dname where { ?emp rdf:type f:emp. ?emp f:Dept ?dept. ?emp foaf:surname "SMITH". ?dept f:Dname ?dname. }

List the names of employees in Accounting


select ?ename where { ?emp rdf:type f:emp. ?emp f:Dept ?dept. ?emp foaf:surname ?ename. ?dept f:Dname "Accounting". }

Employees hired in this millennium


select ?ename ?hire
where {
  ?emp rdf:type f:emp.
  ?emp f:HireDate ?hire.
  ?emp foaf:surname ?ename.
  FILTER (?hire > "2000-01-01"^^xs:date)
}

Note that the literal needs to be typed to make this comparison work.

List the names of employees whose manager is in a different department


select ?name ?edname ?mdname
{
  ?emp rdf:type f:emp;
       foaf:surname ?name;
       f:Dept ?dept;
       f:Mgr ?mgr.
  ?mgr f:Dept ?mdept.
  ?dept f:Dname ?edname.
  ?mdept f:Dname ?mdname.
  FILTER (?dept != ?mdept)
}

List the grades of employees


In relational terms, this is a theta-join between the employee and the salgrade tables:

select ?ename ?grade
where {
  ?emp rdf:type f:emp;
       foaf:surname ?ename;
       f:Sal ?sal.
  ?salgrade rdf:type f:salgrade;
            f:LoSal ?low;
            f:HiSal ?high;
            f:Grade ?grade.
  FILTER (?sal >= ?low && ?sal <= ?high)
}

Abbreviated query syntax


A new prefix simplifies referencing individual resources by their URI:

prefix e: <http://www.cems.uwe.ac.uk/empdept/emp/>
select ?sal
where {
  e:7900 f:Sal ?sal.
}

is short for

select ?sal
where {
  <http://www.cems.uwe.ac.uk/empdept/emp/7900> f:Sal ?sal.
}

We could also introduce a default namespace:

prefix : <http://www.cems.uwe.ac.uk/empdept/concept/>
select ?name ?sal ?dno ?job
where {
  ?emp rdf:type :emp;
       foaf:surname ?name;
       :Sal ?sal;
       :Dept ?dept;
       :Job ?job.
  ?dept :DeptNo ?dno.
}

and use the abbreviation a for rdf:type:

prefix : <http://www.cems.uwe.ac.uk/empdept/concept/>
select ?name ?sal ?dno ?job
where {
  ?emp a :emp;
       foaf:surname ?name;
       :Sal ?sal;
       :Dept ?dept;
       :Job ?job.
  ?dept :DeptNo ?dno.
}

and if we don't need to return the resource itself, it can be anonymous:

prefix : <http://www.cems.uwe.ac.uk/empdept/concept/>
select ?name ?sal ?dno ?job
where {
  [ a :emp;
    foaf:surname ?name;
    :Sal ?sal;
    :Dept ?dept;
    :Job ?job ].
  ?dept :DeptNo ?dno.
}

Aggregate features
Aggregation functions like count() and sum() and the GROUP BY clause are not defined in SPARQL 1.0 although they are available on some services (such as the Talis [3] platform) in advance of standardisation in SPARQL 1.1.

Count the number of departments


select (count(?dept) as ?count) where { ?dept rdf:type f:dept. }

Count the number of employees in each department


select distinct ?dept (count(?emp) as ?count) where { ?dept a f:dept. ?emp f:Dept ?dept. } group by ?dept

Generic queries
The uniformity of the triple data model enables us to query the dataset in very general ways, which are useful if we know nothing about the data.



List all data


select * where { ?s ?p ?o }

This would be impracticable on a realistic dataset, but a sample of the triples can be obtained by limiting the number of triples returned.

select * where { ?s ?p ?o } LIMIT 20

List all employee data


select ?prop ?val where { ?emp rdf:type f:emp. ?emp ?prop ?val. }

What types are there?


select distinct ?type where { ?s a ?type }

This shows that triples defining the emp vocabulary are in the same dataset.

What properties are there?


select distinct ?prop where { ?s ?prop ?o }

What is the domain(s) of a property?


select distinct ?type where { ?s f:Sal ?v. ?s a ?type. }

What are the ranges of a property?


select distinct ?type where { ?s f:Sal ?o. ?o a ?type. }

This query only finds ranges which are instances of a type in the dataset. Sal has a range of xs:integer but it is not easy to discover that with a SPARQL query.

select distinct ?type where { ?s f:Mgr ?o. ?o a ?type. }

What properties have a given type as its domain ?


select distinct ?prop where { ?s a f:salgrade. ?s ?prop []. }

Schema queries
The presence of schema data enables SPARQL to be used to query this meta-data. The results could be compared with the results of querying the data directly.

What properties have a domain of a given type?


select ?prop where { ?prop rdfs:domain f:emp. }

Note that this has only returned the properties in the empdept vocab, not the foaf name property used in the raw data.

What integer properties do employees have?


select ?prop where { ?prop rdfs:domain f:emp. ?prop rdfs:range xs:integer. }

What types of resources have salaries?


select ?type where { f:Sal rdfs:domain ?type. }

Queries on both the data and the vocab can be made.

What literal properties do MANAGERS have?


select DISTINCT ?prop where { ?x f:Job "MANAGER". ?x a ?type. ?prop rdfs:domain ?type. ?prop rdfs:range rdfs:literal. }



To do
the example RDF lacks language tags, which are required to illustrate the lang() function
all queries to be moved to the codelist together with the SQL and XQuery equivalents

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdeptquery.xq
[2] http://api.talis.com/stores/cwallace-dev1
[3] http://www.talis.com/
[4] http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdeptquery.xq?query=
[5] http://api.talis.com/stores/cwallace-dev1/services/sparql
[6] http://www.w3.org/TR/rdf-sparql-XMLres/
[7] http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdeptquery.xq?id=1
[8] http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdeptquery.xq?id=4a
[9] http://dallemang.typepad.com/

String Analysis
XQuery analyze-string
XSLT 2.0 includes the analyze-string construct which captures matching groups (in parentheses) in a regular expression. Strangely this is not available in XQuery. It is possible to use the XSLT construct by wrapping an XQuery function round a generated XSLT stylesheet, even though this seems rather painful. In this installation of eXist, the XSLT engine is Saxon 8.
declare function str:analyze-string($string as xs:string, $regex as xs:string, $n as xs:integer) {
   transform:transform(
      <any/>,
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
         <xsl:template match='/'>
            <xsl:analyze-string regex="{$regex}" select="'{$string}'">
               <xsl:matching-substring>
                  <xsl:for-each select="1 to {$n}">
                     <match>
                        <xsl:value-of select="regex-group(.)"/>
                     </match>
                  </xsl:for-each>
               </xsl:matching-substring>
            </xsl:analyze-string>
         </xsl:template>
      </xsl:stylesheet>,
      ()
   )
};
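As a small illustration (an invented call, using the current-format pattern from the next section), the function returns one match element per captured group:

str:analyze-string("WP05LNU", "([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]", 2)
(: expected result:
   <match>WP</match>
   <match>05</match>
:)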



UK Vehicle Registration numbers


To illustrate the use of this function, here is a decoder for UK vehicle license plates. These have undergone a number of changes of format, so the script must first decide which format is used, then analyze the number to find the significant codes for the area and date of registration. The patterns are defined in XML; each defines the regular expression to be used and the meaning of the matched groups.

Problem: Passing repetition modifiers through is failing.
import module namespace str = "http://www.cems.uwe.ac.uk/string" at "../lib/string.xqm";

declare variable $patterns :=
   <patterns>
      <pattern version="01" regexp="([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]">
         <field>Area</field><field>Date</field>
      </pattern>
      <pattern version="83" regexp="([A-Z])\d+[A-Z]([A-Z][A-Z])">
         <field>Date</field><field>Area</field>
      </pattern>
      <pattern version="63" regexp="([A-Z][A-Z])[A-Z]?\d+([A-Z])">
         <field>Area</field><field>Date</field>
      </pattern>
   </patterns>;

declare function local:decode-regno($regno) {
   let $regno := upper-case($regno)
   let $regno := replace($regno, " ","")
   return
      for $pattern in $patterns/pattern
      let $regexp := concat("^",$pattern/@regexp,"$")
      return
         if (matches($regno,$regexp))
         then
            let $analysis := str:analyze-string($regno,$regexp,count($pattern/field))
            return
               <regno version="{$pattern/@version}">
                  {for $field at $i in $pattern/field
                   let $value := string($analysis[position() = $i])
                   let $table := concat($field,$pattern/@version)
                   let $value := /CodeList[@id=$table]/Entry[Code=$value]
                   return element {$field} {$value/*}
                  }
               </regno>
         else ()
};

let $regno := request:get-parameter("regno",())
return local:decode-regno($regno)


Decode tables
Separate tables decode codes to date ranges or areas. These tables are plain XML created from CSV files via Excel. The pre-83 area codes are currently incorrect. e.g.

<CodeList id="Area83">
   <Entry>
      <Code>AA</Code>
      <Location>Bournemouth</Location>
   </Entry>
   <Entry>
      <Code>AB</Code>
      <Location>Worcester</Location>
   </Entry>
   <Entry>
      <Code>AC</Code>
      <Location>Coventry</Location>
   </Entry>
   ...

Examples
1. A current number plate: WP05LNU [1]
2. One from the previous series: L162BAY [2]

Location Mapping
One use of this conversion is to display the locations on a map. Here we take a file of observed registration numbers, decode them all, group by location and generate a KML file with the locations geocoded through the Google API.

<NumberList>
   <Regno>H251GBU</Regno>
   <Regno>WRA870Y</Regno>
   <Regno>ENB427T</Regno>
   <Regno>C406OUY</Regno>
   <Regno>N62VNF</Regno>
   <Regno>R895KCV</Regno>
   <Regno>C758HOV</Regno>
   <Regno>H541HEM</Regno>
   ...
(: this script plots the registration locations of a set of UK vehicle license plates using kml. :)

import module namespace geo="http://www.cems.uwe.ac.uk/exist/geo" at "../lib/geo.xqm";

import module namespace str = "http://www.cems.uwe.ac.uk/string" at "../lib/string.xqm";

declare namespace reg = "http://www.cems.uwe.ac.uk/wiki/reg";



declare option exist:serialize
   "method=xml media-type=application/vnd.google-earth.kml+xml indent=yes omit-xml-declaration=yes";

declare variable $reg:icon := "http://maps.google.com/mapfiles/kml/paddle/ltblu-blank.png";

declare variable $reg:patterns :=
   <patterns>
      <pattern version="01" regexp="([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]">
         <field>Area</field><field>Date</field>
      </pattern>
      <pattern version="83" regexp="([A-Z])\d+[A-Z]([A-Z][A-Z])">
         <field>Date</field><field>Area</field>
      </pattern>
      <pattern version="63" regexp="([A-Z][A-Z])[A-Z]?\d+([A-Z])">
         <field>Area</field><field>Date</field>
      </pattern>
   </patterns>;

declare function reg:decode-regno($regno) {
   let $regno := upper-case($regno)
   let $regno := replace($regno, " ","")
   return
      for $pattern in $reg:patterns/pattern
      let $regexp := concat("^",$pattern/@regexp,"$")
      return
         if (matches($regno,$regexp))
         then
            let $analysis := str:analyze-string($regno,$regexp,count($pattern/field))
            return
               <regno version="{$pattern/@version}">
                  {for $field at $i in $pattern/field
                   let $value := string($analysis[position() = $i])
                   let $table := concat($field,$pattern/@version)
                   let $value := /CodeList[@id=$table]/Entry[Code=$value]
                   return element {$field} {$value/*}
                  }
               </regno>
         else ()
};

declare function reg:regno-locations($regnos) {
   for $regno in $regnos
   let $analysis := reg:decode-regno($regno)
   return
      if (exists($analysis//Location))
      then string($analysis//Location)
      else ()
};


let $url := request:get-parameter("url",())
let $x := response:set-header('Content-Disposition','inline;filename=regnos.kml;')
return
<Document>
   <name>Reg nos</name>
   {for $i in (1 to 10)
    return
      <Style id="size{$i}">
         <IconStyle>
            <scale>{$i}</scale>
            <Icon><href>{$reg:icon}</href></Icon>
         </IconStyle>
      </Style>
   }
   {
   let $locations := reg:regno-locations(doc($url)//Regno)
   let $max := count($locations)
   for $place in distinct-values($locations)
   let $latlong := geo:geocode(concat($place,',UK'))
   let $count := count($locations[. = $place])
   let $scale := max((round($count div $max * 10),1))
   order by $count descending
   return
      <Placemark>
         <name>{$place} ({$count})</name>
         <styleUrl>#size{$scale}</styleUrl>
         <Point><coordinates>{geo:position-as-kml($latlong)}</coordinates></Point>
      </Placemark>
   }
</Document>

Generate Map [3]



SMS service
The Department of Information Science and Digital Media supports an SMS service [4] with facilities to send and receive text messages. The service is paid for by the University of the West of England, Bristol and all traffic is logged. A decoder for UK vehicle license numbers is one of the demonstration services which are supported for mobile-originated (MO) text messages. The format of the text message is

REG <regno>

e.g.

REG L162 BAY

A text message in this format sent to our SMS mobile number 447624803759 passes through a PHP script which allows multiple SMS services to be supported. The script uses the first word of the message to identify the associated service endpoint, and then invokes that endpoint via HTTP, passing the prefix as code, the rest of the message as text and the originating mobile number as from. For the prefix REG, the associated endpoint is an XQuery script:

http://www.cems.uwe.ac.uk/xmlwiki/regno/smsregno.xq

The smsregno.xq script is essentially the parseregno script above.

declare option exist:serialize "method=text media-type=text/text";
...
let $regno := request:get-parameter("text",())
let $data := local:decode-regno($regno)
return concat("Reply: ", $regno,
              " was registered in ", $data/Area/Location,
              " between ", $data/Date/From, " and ", $data/Date/To
             )

The SMS switch then sends the Reply on to the originating mobile phone.
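The To do list below mentions re-implementing the PHP switch in XQuery. A minimal sketch of such a dispatcher is given here; the services.xml lookup document, its location and its structure are illustrative assumptions, not part of the existing application:

declare namespace hc = "http://exist-db.org/xquery/httpclient";
declare option exist:serialize "method=text media-type=text/text";

let $message := request:get-parameter("text",())
let $from    := request:get-parameter("from",())
let $prefix  := upper-case(tokenize(normalize-space($message),' ')[1])
let $rest    := substring-after(normalize-space($message),' ')
(: hypothetical lookup table:
   <services>
      <service code="REG" endpoint="http://www.cems.uwe.ac.uk/xmlwiki/regno/smsregno.xq"/>
   </services>
:)
let $service := doc("/db/Wiki/sms/services.xml")//service[@code = $prefix]
return
   if (exists($service))
   then
      let $url := concat($service/@endpoint,
                         "?code=", encode-for-uri($prefix),
                         "&amp;text=", encode-for-uri($rest),
                         "&amp;from=", encode-for-uri($from))
      (: relay the body of the endpoint's response back to the SMS switch :)
      return string(httpclient:get(xs:anyURI($url), false(), ())//hc:body)
   else concat("Reply: no service registered for prefix ", $prefix)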



To do
solve problem with repetition modifiers (or function support for analyze-string)
Pre-83 area code data
Switch implementation in XQuery to replace the PHP application - awaits switch to eXist v2

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/regno/parseregno.xq?regno=WP05LNU
[2] http://www.cems.uwe.ac.uk/xmlwiki/regno/parseregno.xq?regno=L162BAY
[3] http://www.cems.uwe.ac.uk/xmlwiki/regno/regnoMap.xq?url=/db/Wiki/regno/sample.xml
[4] http://www.cems.uwe.ac.uk/~cjwallac/apps/sms/

Tag Cloud
Counting Words
A tag cloud (or weighted list in visual design) is a visual depiction of user-generated tags, or simply the word content of a site, typically used to describe the content of web sites. One method of creating a tag cloud is to create a list of the words in a document, count the number of occurrences of each word, and depict the more frequently occurring words with a larger font size than the words that occur less frequently.

Counting the total number of words in a text object


To get a feeling for one of the basic techniques, let's examine Jon Robie's code, which takes all of the text nodes in a document, strings them together, splits them into a sequence of "words" (tokenizing by whitespace, punctuation, or the 'nbsp' entity), and counts the number of resulting words:

let $txt := string-join( $doc//text() , " ")
return count(tokenize($txt,'(\s|[,.!:;]|[n][b][s][p][;])+'))

Note that the string-join() function here takes an input sequence and returns a single string whose items are separated by single spaces (the second argument of string-join). If you want to see what this routine treats as a "word" in your document, use the following variation.

let $txt := string-join( $doc//text() , " ")
let $words := tokenize($txt,'(\s|[,.!:;]|[n][b][s][p][;])+')
return
   <words count="{count($words)}">
      {for $word in $words
       return <word>{$word}</word>
      }
   </words>

Another variation is the word-count() function found at xqueryfunctions.com:

declare function local:word-count($arg as xs:string?) as xs:integer {
   count(tokenize($arg, '\W+')[. != ''])
};

This version uses the \W+ regular expression (which matches runs of non-word characters) to split the string into word tokens and counts the non-empty ones.
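A quick check of the behaviour (an illustrative call, not part of the original text):

local:word-count("Hello, brave new world!")
(: returns 4 :)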



Counting Keywords
Kurt Cagle suggested the following XQuery for counting keywords:

declare namespace xqwb="http://xquery.wikibooks.org";

declare function xqwb:word-count($wordlist as element()) as element() {
   <terms>
      {for $term in distinct-values($wordlist/term)
       let $term-count := count($wordlist/term[. = $term])
       return <term count="{$term-count}">{$term}</term>
      }
   </terms>
};

let $keywords :=
   <keywords>
      <term>red</term>
      <term>green</term>
      <term>red</term>
      <term>blue</term>
      <term>violet</term>
      <term>red</term>
      <term>blue</term>
      <term>blue</term>
      <term>red</term>
      <term>orange</term>
      <term>green</term>
      <term>yellow</term>
      <term>indigo</term>
      <term>red</term>
   </keywords>
let $result := xqwb:word-count($keywords)
return $result

[Execute [1]]

This Returns the Following


<terms>
   <term count="5">red</term>
   <term count="2">green</term>
   <term count="3">blue</term>
   <term count="1">violet</term>
   <term count="1">orange</term>
   <term count="1">yellow</term>
   <term count="1">indigo</term>
</terms>



Creating a Tag Cloud


From this you can create a Tag Cloud or word density map such as the "Popular Tags" link on the flickr web site: Flickr Popular Tags [2]

declare namespace xqwb="http://xquery.wikibooks.org";
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

declare function xqwb:word-count($wordlist as element()) as element() {
   <terms>
      {for $term in distinct-values($wordlist/term)
       let $term-count := count($wordlist/term[. = $term])
       return <term count="{$term-count}">{$term}</term>
      }
   </terms>
};

let $keywords :=
   <keywords>
      <term>red</term>
      <term>green</term>
      <term>red</term>
      <term>blue</term>
      <term>violet</term>
      <term>red</term>
      <term>blue</term>
      <term>blue</term>
      <term>red</term>
      <term>orange</term>
      <term>green</term>
      <term>yellow</term>
      <term>indigo</term>
      <term>red</term>
   </keywords>
let $result := xqwb:word-count($keywords)
let $total := count($keywords/term)
let $scale := 20
return
   <div>
      {for $term in $result/term
       let $fontSize := round($term/@count div $total * 100 * $scale)
       order by $term
       return <span style="font-size:{$fontSize}%">{string($term)}</span>
      }
   </div>

Execute [3]


References
[1] http://www.cems.uwe.ac.uk/xmldb/rest//db/Wiki/wordCount.xq
[2] http://www.flickr.com/photos/tags
[3] http://www.cems.uwe.ac.uk/xmldb/rest//db/Wiki/wordCount_1.xq

Topological Sort
Motivation
You have a Directed Acyclic Graph (DAG) to track things such as a dependency graph. You want to sort an input DAG of nodes so that the output reflects the dependency structure. The Topological Sort of a Directed Acyclic Graph puts nodes in a sequence such that every node references only preceding nodes. This ordering is needed for example in scheduling processes in a Pipeline.

For example, given a DAG defined as

<node id="a">
   <ref id="b"/>
   <ref id="c"/>
</node>
<node id="b">
   <ref id="c"/>
</node>
<node id="c"/>

the topological order would be:

<node id="c"/>
<node id="b">
   <ref id="c"/>
</node>
<node id="a">
   <ref id="b"/>
   <ref id="c"/>
</node>

The definition of topological order can be simply expressed in XQuery:

declare function local:topological-sorted($nodes) as xs:boolean {
   every $n in $nodes
   satisfies
      every $id in $n/ref/@id
      satisfies $id = $n/preceding::node/@id
};

A recursive algorithm is also straightforward:


declare function local:topological-sort($unordered, $ordered) {
   if (empty($unordered))
   then $ordered
   else
      let $nodes := $unordered[every $id in ref/@id satisfies $id = $ordered/@id]
      return
         if ($nodes)
         then local:topological-sort($unordered except $nodes, ($ordered, $nodes))
         else ()   (: cycles so no order possible :)
};


which is invoked as

let $graph :=
   <graph>
      <node id="a">
         <ref id="b"/>
         <ref id="c"/>
      </node>
      <node id="b">
         <ref id="c"/>
      </node>
      <node id="c"/>
   </graph>
let $sortedNodes := <graph>{local:topological-sort($graph/node,())}</graph>
return local:topological-sorted($sortedNodes)

Explanation
$unordered is initially the original sequence and $ordered is empty. At each iteration, the set of nodes which depend only on already-ordered nodes is calculated; these are removed from the unordered nodes and appended to the ordered nodes.

References

Tree View


Motivation
You want a general purpose function that creates a tabular view of hierarchical data.

Method
We will write a recursive function to display each node and then to display each child in an HTML table. Some systems call this a "Grid View" of XML data.

element-to-nested-table function
The following function generates an HTML table with nested subtables for the child nodes.

declare function local:element-to-nested-table($element) {
   if (exists($element/(@*|*)))
   then
      <table>
         {if (exists($element/text()))
          then
            <tr class="text">
               <th></th>
               <td>{$element/text()}</td>
            </tr>
          else ()
         }
         {for $attribute in $element/@*
          return
            <tr class="attribute">
               <th>@{name($attribute)}</th>
               <td>{string($attribute)}</td>
            </tr>
         }
         {for $node in $element/*
          return
            <tr class="element">
               <th>{name($node)}</th>
               <td>{local:element-to-nested-table($node)}</td>
            </tr>
         }
      </table>
   else $element/text()
};

Note that the rows displaying different kinds of items (text, attribute, element) are classed so that they may be styled.
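As a small illustration (an invented input, not from the original text), applying the function to the element <book id="1">XQuery</book> produces:

<table>
   <tr class="text"><th/><td>XQuery</td></tr>
   <tr class="attribute"><th>@id</th><td>1</td></tr>
</table>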



Document display
This function can be used in a script to provide a viewer for any XML document.

declare namespace hc = "http://exist-db.org/xquery/httpclient";
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

(: function declaration :)

let $uri := request:get-parameter("uri",())
let $element := httpclient:get(xs:anyURI($uri),true(),())/hc:body/html
return
<html>
   <head>
      <title>Tree view</title>
      <style type="text/css">
         th {{border-style:double}}
         tr {{border-style:dotted}}
         tr .attribute {{font-style:italic}}
         td {{border-style:ridge}}
      </style>
   </head>
   <body>
      <h1>Tree view of {$uri} </h1>
      {local:element-to-nested-table($element)}
   </body>
</html>

e.g.
1. UWE's news feed [1]
2. Whisky data [2]
3. Employee data [3]
4. Met Office shipping Forecast [4] mal-formed XML

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/util/treeview.xq?uri=http://info.uwe.ac.uk/news/uwenews/downloadxml.asp
[2] http://www.cems.uwe.ac.uk/xmlwiki/util/treeview.xq?uri=/db/Wiki/whisky1.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/util/treeview.xq?uri=/db/Wiki/empdept/emp.xml
[4] http://www.cems.uwe.ac.uk/xmlwiki/util/treeview.xq?uri=http://www.metoffice.gov.uk/weather/marine/shipping_forecast.html

Validating a hierarchy


Whilst schema validation can check for some aspects of model validity, business rules are often more complex than is expressible in XML Schema. XQuery is a powerful language for describing more complex rules. One such rule is that a relationship should define a tree structure, for example the relationship between an employee and her manager.

Consider the following set of employees:

<company>
   <emp>
      <name>Fred</name>
      <mgr>Bill</mgr>
   </emp>
   <emp>
      <name>Joe</name>
      <mgr>Bill</mgr>
   </emp>
   <emp>
      <name>Alice</name>
      <mgr>Joe</mgr>
   </emp>
   <emp>
      <name>Bill</name>
   </emp>
</company>

The criteria for a valid hierarchy are:
1. one root (the boss);
2. every employee has at most one manager;
3. every employee reports finally to the boss;
4. there are no cycles

In XQuery we can define the management hierarchy from the boss down to an employee as:

declare function local:management($emp as element(emp), $hierarchy as element(emp)*) as element(emp)* {
   if ($emp = $hierarchy)   (: cycle detected :)
   then ()
   else
      let $mgr := $emp/../emp[name=$emp/mgr]
      return
         if (count($mgr) > 1)
         then ()
         else if (empty($mgr))   (: reached the root :)
         then ($emp,$hierarchy)
         else local:management($mgr, ($emp,$hierarchy))
};

The function is initially called as local:management($emp,()). The hierarchy is built up as a parameter to allow cycles to be detected. Finally, the condition for the management structure to be a tree is:

declare function local:management-is-tree($company) {
   let $boss := $company/emp[empty(mgr)]
   return
      count($boss) = 1
      and (every $emp in $company/emp
           satisfies $boss = local:management($emp,())[1])
};
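A minimal check against the sample company above (an illustrative call, not part of the original text):

let $company :=
   <company>
      <emp><name>Fred</name><mgr>Bill</mgr></emp>
      <emp><name>Joe</name><mgr>Bill</mgr></emp>
      <emp><name>Alice</name><mgr>Joe</mgr></emp>
      <emp><name>Bill</name></emp>
   </company>
return local:management-is-tree($company)
(: returns true() for this data; giving Bill a manager would leave no root and the result would be false() :)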


Wiki weapons page


Over on Matt Turner's blog [1], he uses MarkLogic to get a list of medieval weapons from the wiki page as the first step in the enrichment of the texts of Shakespeare's plays. Here's another attempt at this task, using only standard XQuery functions. Again, we are fortunate that wiki pages are well-formed XML.
declare namespace h= "http://www.w3.org/1999/xhtml" ;

let $url := "http://en.wikipedia.org/wiki/List_of_medieval_weapons"
let $wikipage := doc($url)
return
   string-join($wikipage//h:div[@id="bodyContent"]//h:li[h:a/@title][empty(h:ul)]/h:a,',')

The complex path here is to ensure that only the relevant li tags are included and that only terminals in a hierarchy of terms are included, hence the check that the li has no ul child.

Execute [2]

References
[1] http://xquery.typepad.com/xquery/2007/10/xquery-at-work.html
[2] http://www.cems.uwe.ac.uk/xmlwiki/Scrape/wikiweaponslist.xq

Wikibook index page




Motivation
You want to create an index of pages in a Wikibook.

Method
Fetch the Category page for the book and re-format based on the initial letter of the page, hence skipping the category level.

Simple Index page


This index page is based on the names of pages within the book. Its starting point is the Category with the same name as the book, into which all pages need to be put.

declare namespace h = "http://www.w3.org/1999/xhtml";
declare option exist:serialize "method=xhtml media-type=text/html";

let $book := request:get-parameter("book",())
let $base := "http://en.wikibooks.org"
let $indexPage := doc(concat($base,"/wiki/Category:",$book))
let $pages := $indexPage//h:div[@id="mw-pages"]//h:li
return
<html>
   <h1>Index of {$book}</h1>
   {for $letter in distinct-values($pages/substring(substring-after(.,'/'),1,1))[string-length(.) = 1]
    return
      <div>
         <h3>{$letter}</h3>
         <ul>
            {for $page in $pages[starts-with(substring-after(.,'/'),$letter)]
             let $url := concat($base,$page/h:a/@href)
             return
               <li>
                  <a href="{$url}">{substring-after($page,'/')}</a>
               </li>
            }
         </ul>
      </div>
   }
</html>

XQuery Index [25]

XForms Index [1]
XRX [2]


References
[1] http://www.cems.uwe.ac.uk/xmlwiki/util/wikiindex.xq?book=XForms
[2] http://www.cems.uwe.ac.uk/xmlwiki/util/wikiindex.xq?book=XRX

Wikipedia Lookup
Page scraping is one way to retrieve a specific fact from a page provided its structure is stable. Here the task is to use wikipedia to find the Latin name for a bird, given its common name.
declare namespace h = "http://www.w3.org/1999/xhtml";

let $name := request:get-parameter("name",())
let $url := escape-uri(concat("http://en.wikipedia.org/wiki/",$name),false())
let $page := doc($url)
let $genus := $page//h:tr[h:td[. ='Genus:']]/h:td[2]
let $species := $page//h:tr[h:td[. ='Species:']]/h:td[2]
let $binomial := string($page//h:tr[h:th//h:a[.='Binomial name']]/following-sibling::h:tr//h:b)
return
   <bird name="{$name}" genus="{$genus}" species="{$species}" binomial="{$binomial}"/>

Here, the path to locate the data required, assuming the page is in Bird page format, involves complex XPath expressions. For example, the genus is the second cell in a table row whose first cell is 'Genus'.

Black Swan [1] Wikipedia [2]

The script often fails because:
1. the name is ambiguous: Thrush [3] Wikipedia [4]
2. the name is too broad: Kiwi [5] Wikipedia [6]

It is not hard to see that more semantic markup with ontological relationships would be preferable to these uncertain contortions.

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/birdlinneas.xq?name=Black%20Swan
[2] http://en.wikipedia.org/wiki/Black%20Swan
[3] http://www.cems.uwe.ac.uk/xmlwiki/birdlinneas.xq?name=Thrush
[4] http://en.wikipedia.org/wiki/Thrush
[5] http://www.cems.uwe.ac.uk/xmlwiki/birdlinneas.xq?name=Kiwi
[6] http://en.wikipedia.org/wiki/Kiwi

XML to RDF


For the Emp-DEPT case study, RDF must be generated from underlying XML files. An XQuery script generates the RDF. It uses a configuration file to define how columns of a table should be mapped into RDF and the namespaces to be used. This mapping needs a little more work to allow composite keys and allow user defined transformations. An interactive tool to create this map would be useful.

Issues in mapping to RDF


The main guide to publishing linked data on the web is How to Publish Linked Data on the Web [1]. This work connected with the Wikibook entry consists in progressively applying the principles enunciated there.

This conversion illustrates a few of the differences between local datasets, whether SQL or XML, and a dataset designed to fit into a global database. Some decisions remain unclear.

tables are implicitly within an organisational context. This context has to be added in RDF by creating a namespace for the local properties and identifiers.

the scope of queries is implicitly within organisational boundaries, but in RDF this scope needs to be explicit. In the SQL query select * from emp; emp is ambiguously either the class of employees or the set of employees in the company. In RDF this needs to be explicit, so that two kinds of tuples need to be added: tuples to type employees to a company definition of employee, and tuples to relate the employee to the company (to be added).

linkage to the global database requires two kinds of links: local properties need to be mapped to global predicates - here the employee name is mapped to foaf:surname (but the case probably needs changing); alternatively a local predicate f:name could be defined, which is equated to the foaf predicate with owl:samePropertyAs. Local identifiers of resources need to be replaced by global URIs - here location is mapped to a dbpedia resource URI; alternatively, the local URI f:location/Dallas could be equated to the dbPedia resource with owl:sameAs. (where? and why delay this?)

foreign keys are replaced by full URIs, pointing directly to the linked resource. The name of this property is no longer the name of the foreign key (e.g. MgrNo) but rather the name of the related resource (Manager). However, the foreign key itself might also need to be replaced.

primary keys are also replaced by URIs, but the local primary key value, for example the employee number, will need to be retained as a literal if it is not purely a surrogate key. This perhaps should be mapped to rdf:label.

datatypes are preferably explicit in the data to avoid conversion in queries, although this increases the size of the RDF graph.

namespaces have been expanded in full where they occur in RDF attribute values. An alternative would be to define entities in a DTD prolog as shorthand for these namespaces, but not all processors of the RDF would do the expansion. xml:base can be used to default one namespace.

[The choices made here are those of a novice and review would be welcome.]

Some issues not yet addressed:
meta-data about the dataset as a whole - its origin, when and how converted - these can be DC properties of a document, with each entity tied to that document as a part?
an alternative approach to mapping would be to start with an ontology and add mapping information to it rather than generating it from the ad-hoc configuration file.



Configuration file
To facilitate the conversion from XML to RDF, a separate configuration file is defined. Here is the configuration file for the emp-dept data.
<?xml version="1.0" encoding="UTF-8"?>
<XML-to-RDF>
  <namespaces>
    <namespace prefix="f" uri="http://www.cems.uwe.ac.uk/empdept/concept/"/>
    <namespace prefix="ft" uri="http://www.cems.uwe.ac.uk/empdept/"/>
    <namespace prefix="rdf" uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
    <namespace prefix="rdfs" uri="http://www.w3.org/2000/01/rdf-schema#"/>
    <namespace prefix="foaf" uri="http://xmlns.com/foaf/0.1/"/>
    <namespace prefix="xs" uri="http://www.w3.org/2001/XMLSchema#"/>
  </namespaces>
  <map type="emp" prefix="f">
    <source file="/db/Wiki/empdept/emp.xml" path="//Emp"/>
    <col name="EmpNo" pk="true" uribase="ft:emp" type="xs:string"/>
    <col name="Ename" prefix="rdfs" tag="label"/>
    <col name="Sal" type="xs:integer"/>
    <col name="Comm" type="xs:integer"/>
    <col name="HireDate" type="xs:date"/>
    <col name="MgrNo" tag="Mgr" uribase="ft:emp"/>
    <col name="MgrNo"/>
    <col name="DeptNo" tag="Dept" uribase="ft:dept"/>
    <col name="Ename" prefix="foaf" tag="surname"/>
    <col name="Job"/>
  </map>
  <map type="dept" prefix="f">
    <source file="/db/Wiki/empdept/dept.xml" path="//Dept"/>
    <col name="Dname" prefix="rdfs" tag="label"/>
    <col name="Dname"/>
    <col name="Location" uribase="http://dbpedia.org/resource"/>
    <col name="DeptNo" pk="true" uribase="ft:dept" type="xs:string"/>
  </map>
  <map type="salgrade" prefix="f">
    <source file="/db/Wiki/empdept/salgrade.xml" path="//SalGrade"/>
    <col name="HiSal" type="xs:integer"/>
    <col name="LoSal" type="xs:integer"/>
    <col name="Grade" pk="true" uribase="ft:grade" type="xs:integer"/>
    <col name="Grade" prefix="rdfs" tag="label"/>
  </map>
</XML-to-RDF>



Database conversion functions


One function row-to-rdf generates the RDF for a row of a table, another function map-to-schema generates RDFS descriptions of the predicates used in a table.
module namespace fr= "http://www.cems.uwe.ac.uk/wiki/fr";

import module namespace util = "http://exist-db.org/xquery/util";

declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"; declare namespace rdfs = "http://www.w3.org/2000/01/rdf-schema#";

declare function fr:declare-namespaces($config) { for $ns in $config//namespace[@declare="yes"] return util:declare-namespace($ns/@prefix,xs:anyURI($ns/@uri)) };

declare function fr:expand($qname as xs:string?, $map) as xs:string? {
   let $namespace := $map/..//namespace
   return
      if ($qname)
      then
         if (contains($qname,":"))
         then
            let $qs := tokenize($qname,":")
            let $prefix := $qs[1]
            let $name := $qs[2]
            let $uri := $namespace[@prefix=$prefix]/@uri
            return concat($uri,$name)
         else if ($namespace[@prefix = $qname])
         then $namespace[@prefix = $qname]/@uri
         else $qname
      else ()
};

declare function fr:row-to-rdf($row as element(), $map as element()) as element(rdf:Description)* {
   let $pk := $map/col[@pk="true"]
   let $pkv := string($row/*[name()=$pk/@name])
   let $pkuri := fr:expand($pk/@uribase, $map)
   return
      <rdf:Description>
         {attribute rdf:about {concat($pkuri,"/",$pkv)}}
         {if ($map/@type)
          then
            let $typeuri := fr:expand(concat($map/@prefix,":",$map/@type),$map)
            return <rdf:type rdf:resource="{$typeuri}"/>
          else ()
         }
         {for $col in $map/col
          let $name := $col/@name
          let $data := string($row/*[name(.)=$name])
          return
            if ($data != "")
            then
               element {concat(($col/@prefix,$map/@prefix)[1], ":", ($col/@tag,$name)[1])}
                  {if ($col/@type)
                   then (attribute rdf:datatype {fr:expand($col/@type,$map)}, $data)
                   else if ($col/@uribase)
                   then attribute rdf:resource
                        {concat(fr:expand($col/@uribase,$map),"/",replace($data," ","_"))}
                   else $data
                  }
            else ()
         }
      </rdf:Description>
};

declare function fr:map-to-schema($map as element()) as element(rdf:Description)* {
   let $typeuri := fr:expand(concat($map/@prefix,":",$map/@type),$map)
   for $col in $map/col[@type]
   let $prop := concat(fr:expand(($col/@prefix,$map/@prefix)[1],$map), ($col/@tag,$col/@name)[1])
   let $rangeuri := (fr:expand($col/@type,$map),
                     fr:expand($col/@uribase,$map),
                     "http://www.w3.org/2000/01/rdf-schema#literal")[1]
   return
      <rdf:Description rdf:about="{$prop}">
         <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
         <rdfs:domain rdf:resource="{$typeuri}"/>
         <rdfs:range rdf:resource="{$rangeuri}"/>
         <rdf:label>{string($col/@name)}</rdf:label>
      </rdf:Description>
};



Full database conversion


The script to generate the RDF for the full database:

import module namespace fr="http://www.cems.uwe.ac.uk/wiki/fr" at "fr.xqm";

declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace rdfs = "http://www.w3.org/2000/01/rdf-schema#";

declare variable $config := doc(request:get-parameter("config",()));
declare variable $x := fr:declare-namespaces($config);

<rdf:RDF>
   {for $map in $config//map
    let $xml := doc($map/source/@file)
    let $source := util:eval(concat("$xml",$map/source/@path))
    return
      (for $row in $source
       return fr:row-to-rdf($row,$map),
       fr:map-to-schema($map)
      )
   }
</rdf:RDF>
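For orientation, the RDF generated for a single employee row has roughly this shape (an illustrative fragment: the values are invented, the namespace declarations are omitted, and f:, rdfs: and foaf: abbreviate the namespaces declared in the configuration file):

<rdf:Description rdf:about="http://www.cems.uwe.ac.uk/empdept/emp/7369">
   <rdf:type rdf:resource="http://www.cems.uwe.ac.uk/empdept/concept/emp"/>
   <f:EmpNo rdf:datatype="http://www.w3.org/2001/XMLSchema#string">7369</f:EmpNo>
   <rdfs:label>SMITH</rdfs:label>
   <f:Sal rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">800</f:Sal>
   <f:HireDate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1980-12-17</f:HireDate>
   <f:Mgr rdf:resource="http://www.cems.uwe.ac.uk/empdept/emp/7902"/>
   <f:Dept rdf:resource="http://www.cems.uwe.ac.uk/empdept/dept/20"/>
   <foaf:surname>SMITH</foaf:surname>
   <f:Job>CLERK</f:Job>
</rdf:Description>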

Links
Get RDF [2]
Cached RDF [3]
Validate RDF [4]

Resource RDF
In addition each resource is retrieved as RDF. In this simple example, the request for a resource URI like:

http://www.cems.uwe.ac.uk/empdept/emp/7839

is re-written by Apache to

http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdeptrdf.xq?emp=7839

and the script retrieves the rdf:Description of the selected resource from the RDF file directly. This mechanism does not conform to the recommended practice of distinguishing between information resources (such as the information about employee 7839) and the real world entity being represented. At present, the resource URI de-references directly to the RDF, rather than indirecting via the recommended 303 mechanism.

declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

declare variable $rdf := doc("/db/Wiki/RDF/empdept.rdf");
declare option exist:serialize "media-type=application/rdf+xml";

(: better to just parse the uri itself :)
let $param := request:get-parameter-names()
let $type := $param[1]
return
   if ($type="all")
   then $rdf
   else
      let $key := request:get-parameter($type,())
      let $resourceuri := concat("http://www.cems.uwe.ac.uk/empdept/",$type,"/",$key)
      return
         <rdf:RDF>
            {$rdf//rdf:Description[@rdf:about=$resourceuri]}
         </rdf:RDF>


To Do
compound primary keys
conversion functions, for example to convert the case of strings or reformat dates
added resources and relationships - here a company entity and links from departments to company

References
[1] http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
[2] http://www.cems.uwe.ac.uk/xmlwiki/RDF/xml2rdf.xq?config=/db/Wiki/RDF/empdeptconfig.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdept.rdf
[4] http://www.w3.org/RDF/Validator/ARPServlet?URI=http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdept.rdf&PARSE&TRIPLES_AND_GRAPH=PRINT_BOTH

XML to SQL


Tabular XML can be exported to SQL by generating the create statement:

declare function generic:element-to-SQL-create($element) {
   ("create table ", name($element), " (", $generic:nl,
    string-join(
       for $node in $element/*[1]/*
       return concat("   ", name($node), " varchar(20)"),
       concat(',',$generic:nl)
    ),
    $generic:nl, ");", $generic:nl
   )
};

and the insert statements:

declare function generic:element-to-SQL-insert($element) {
   for $row in $element/*
   return
      concat("insert into ", name($element), " values (",
             string-join(
                for $node in $element/*[1]/*
                return concat('"', data($row/*[name(.)=name($node)]), '"'),
                ","
             ),
             ");", $generic:nl
      )
};

and using these two functions in a script:
declare option exist:serialize "method=text media-type=text/text";

import module namespace generic = "http://www.cems.uwe.ac.uk/generic" at "../lib/generic.xqm";

let $x := response:set-header('Content-Disposition','inline;filename=emp.sql')
return
   (generic:element-to-SQL-create(/EmpTable),
    generic:element-to-SQL-insert(/EmpTable)
   )

Generate SQL [1] (not yet tested)

This SQL is very general, with all fields defined as varchar because of the lack of a schema. With a Schema, appropriate datatypes could be defined in SQL.
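For a small EmpTable whose rows carry EmpNo, Ename and Job children (an invented illustration, not actual output of the script), the generated SQL would look roughly like:

create table EmpTable (
   EmpNo varchar(20),
   Ename varchar(20),
   Job varchar(20)
);
insert into EmpTable values ("7369","SMITH","CLERK");
insert into EmpTable values ("7499","ALLEN","SALESMAN");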



References
[1] http://www.cems.uwe.ac.uk/xmlwiki/empdept/emp2SQL.xq

XPath examples
Motivation
You would like to select specific structures within an XML document. You would like to use a language that is consistent across all W3C XML Standards. This language is XPath.

Sample Input File


Put this file in /db/apps/training/data/books.xml
<books>
  <description>A list of books useful for people first learning how to build web XML web applications.</description>
  <book>
    <title>XQuery</title>
    <author>Priscilla Walmsley</author>
    <description>This book is a highly detailed, through and complete tour of the W3C Query language. It covers all the key aspects of the language as well as</description>
    <format>Trade press</format>
    <license>Commercial</license>
    <list-price>49.95</list-price>
  </book>
  <book>
    <title>XQuery Examples</title>
    <author>Chris Wallace</author>
    <author>Dan McCreary</author>
    <description>This book provides a variety of XQuery example programs and is designed to work with the eXist open-source native XML application server.</description>
    <format>Wiki-books</format>
    <license>Creative Commons Sharealike 3.0 Attribution-Non-commercial</license>
    <list-price>29.95</list-price>
  </book>
  <book>
    <title>XForms Tutorial and Cookbook</title>
    <author>Dan McCreary</author>
    <description>This book is an excellent guide for anyone that is just beginning to learn the XForms standard. The book is focused on providing the reader with simple, but complete examples of how to create XForms web applications.</description>
    <format>Wikibook</format>
    <license>Creative Commons Sharealike 3.0 Attribution-Non-commercial</license>
    <list-price>29.95</list-price>
  </book>
  <book>
    <title>XRX: XForms, Rest and XQuery</title>
    <author>Dan McCreary</author>
    <description>This book is an overview of the key architectural and design patters.</description>
    <format>Wikibook</format>
    <license>Creative Commons Sharealike 3.0 Attribution-Non-commercial</license>
    <list-price>29.95</list-price>
  </book>
</books>


XPath provides a number of functions and axes to move around an XML structure.

Sample Test XQuery Script


Here is a sample XQuery "driver" for these tests. To use it just replace the expression after the return on the last line.

xquery version "1.0";
let $books := doc('/db/apps/training/data/books.xml')
return count($books//book)

Screen Image of XQuery Results in oXygen


There are several ways to test your XPath expressions. One of the most useful is to put your XML test data in a document within eXist and then use a tool such as oXygen to execute the test on the server but display the results in the oXygen results window. This can be done within oXygen by setting up eXist (not the default internal Saxon) as your "transformation scenario". The following is a screen image of how these results look when viewed with the oXygen IDE:



Sample XPath Expressions


Return the entire data file:
$books

Get just books and all the data for each book:
$books//book

Get all the book titles:
$books//title

Get the collection description:
$books/books/description/text()

Get the descriptions for all the books:
$books//book/description/text()

Counting and Math


Count the number of books using an absolute path:
count($books/books/book) (: should return 4 :)

Count the number of books using //. With eXist this executes much faster on larger collections.
count($books//book) (: should return 4 :)

Get a sequence of all the titles in the book collection:
$books//title/text()

Calculate the total and average price of all the books in the collection.

sum($books//list-price/text())   (: Should return a number such as 139.84 :)
avg($books//list-price/text())   (: Should return a number such as 34.96 :)
min($books//list-price/text())   (: Should return a number such as 29.95 :)
max($books//list-price/text())   (: Should return a number such as 44.99 :)


The following scripts show some of these functions and axes in use.

Adding Predicates
A predicate is a qualifier that is added to the end of an XPath expression. It is usually used to filter out nodes from the result set. Predicates are similar to WHERE constructs in SQL.

Get the books in Wikibook format:
$books//book[format='Wikibook']

Get just the titles of the wikibooks:
$books//book[format='Wikibook']/title/text()

Get all the books that contain the word 'XQuery' somewhere in the title:
$books//book[contains(title, 'XQuery')]/title/text()

Complex Predicates
1. node()
2. text()
3. *
4. string(..)
5. data(..)
6. child::
7. parent::
8. following-sibling::
9. preceding-sibling::
10. descendant::
11. descendant-or-self::

A short sketch using some of these axes against the books data follows below.

Navigating around a tree with distinct tags [1]
Navigating around a tree with a single tag [2]
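A brief sketch of a few of these axes and node tests applied to the books.xml file above (illustrative expressions, not from the original page):

let $books := doc('/db/apps/training/data/books.xml')
let $xq := $books//book[title = 'XQuery']
return
(
   $xq/child::author,                          (: same as $xq/author :)
   $xq/author/parent::book/title/text(),       (: back up to the book, then down to its title :)
   $xq/following-sibling::book[1]/title,       (: the next book element in document order :)
   $books//book/descendant::text()[contains(., 'XForms')],   (: text nodes mentioning XForms :)
   count($xq/author/node())                    (: child nodes of the author element :)
)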

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/xpathExamples1.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/xpathExamples2.xq

XQuery and Python




Over at [1] Cameron Laird has an example of Python code to extract and list the a tags on an XHTML page:

import elementtree.ElementTree

def detail_anchor(element):
    if element.tag == "a":
        attributes = element.attrib
        if "href" in attributes:
            print "'%s' is at URL '%s'." % (element.text, attributes['href'])
        if "name" in attributes:
            print "'%s' anchors '%s'." % (element.text, attributes['name'])

for element in elementtree.ElementTree.parse("draft2.xml").findall("//a"):
    detail_anchor(element)

The equivalent to this Python code in XQuery is:

for $a in doc("http://en.wikipedia.org/wiki/XQuery")//*:a
return
   if ($a/@href)
   then concat("'", $a,"' is at URL '",$a/@href,"'&#10;")
   else if ($a/@name)
   then concat("'", $a,"' anchors '",$a/@name,"'&#10;")
   else ()

tags in XQuery on Wikipedia [2] (view source)

Here the namespace prefix is a wild-card since we don't know what the html namespace might be. More succinctly but less readably (and with quotes omitted from the output for clarity), this could be expressed as:

string-join(
   doc("http://en.wikipedia.org/wiki/XQuery")//*:a/
      (if (@href) then concat(.," is at URL ",@href)
       else if (@name) then concat(.," anchors ", @name)
       else ()
      ),
   '&#10;'
)

tags in XQuery on Wikipedia [3] (view source)

More usefully, we might supply the url of any XHTML page as a parameter and generate an HTML page of external links:
declare option exist:serialize "method=xhtml media-type=text/html";



let $url := request:get-parameter("url",())
return
<html>
   <h1>External links in {$url}</h1>
   {for $a in doc($url)//*:a[text()][starts-with(@href,'http://')]
    return
      <div><b>{string($a)}</b> is at <a href="{$a/@href}"><i>{string($a/@href)}</i></a></div>
   }
</html>


XQuery on Wikipedia [4]

References
[1] http://www.ibm.com/developerworks/xml/library/x-simplifyxmlreads.html?S_TACT=105AGX06&S_CMP=EDU#listing2
[2] http://www.cems.uwe.ac.uk/xmlwiki/util/atags.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/util/atags3.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/util/links.xq?url=http://en.wikipedia.org/wiki/XQuery

XQuery and XML Schema


Motivation
In learning any modelling language, it is helpful to see sample instances of any formal model. This is equally true when developing an XML Schema. XML Development tools like Oxygen and XML-Spy have tools to generate a random instance of a supplied XML schema, but it is useful to have a web service to do this. The service can also be used for test data generation. The following is a test XML Schema file:

Sample XML Schema


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="Root">
    <xs:annotation>
      <xs:documentation>Any</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Any" type="xs:string"/>
        <xs:element name="Any">
          <xs:complexType>
            <xs:all>
              <xs:element name="AnyA"/>
              <xs:element name="AnyB"/>
              <xs:element name="AnyC"/>
            </xs:all>
          </xs:complexType>
        </xs:element>
        <xs:element name="Multiple" maxOccurs="5">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Choice">
                <xs:complexType>
                  <xs:choice>
                    <xs:element name="First">
                      <xs:complexType>
                        <xs:attribute name="AttributeInt" type="xs:integer"/>
                        <xs:attribute name="AttributeString" use="required" type="xs:string"/>
                        <xs:attribute name="AttributeBoolean" use="required" type="xs:boolean"/>
                      </xs:complexType>
                    </xs:element>
                    <xs:element name="Second">
                      <xs:complexType>
                        <xs:attribute name="AttributeString" type="xs:string"/>
                        <xs:attribute name="AttributeDate" use="required" type="xs:date"/>
                      </xs:complexType>
                    </xs:element>
                    <xs:element name="Third">
                      <xs:complexType>
                        <xs:attribute name="AttributeDecimal" use="required" type="xs:decimal"/>
                        <xs:attribute name="AttributeTime" use="required" type="xs:time"/>
                        <xs:attribute name="AttributeDateTime" use="required" type="xs:dateTime"/>
                      </xs:complexType>
                    </xs:element>
                  </xs:choice>
                </xs:complexType>
              </xs:element>
              <xs:element name="Unbounded" maxOccurs="unbounded"/>
              <xs:element ref="Ref"/>
              <xs:element name="WithRefType" type="RefType"/>
              <xs:element name="Optional" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="End" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Ref">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Ref1" maxOccurs="3" type="xs:integer"/>
        <xs:element name="Ref2" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="RefType">
    <xs:sequence>
      <xs:element name="RefTypeA"/>
      <xs:element name="RefTypeB"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

XQuery to convert Simple XML Schema into an Instance Document


We can use a recursive function to convert this XML Schema into an instance document. Here is an implementation of this function which covers some of the key XML Schema constructs:
declare namespace xs = "http://www.w3.org/2001/XMLSchema";

declare function local:process-element($element as node()) {
    typeswitch ($element)
        case element(xs:complexType) return
            for $sub-element in $element/*
            return local:process-element($sub-element)
        case element(xs:annotation) return ()
        case element(xs:element) return
            let $min := ($element/@minOccurs, 1)[1]
            let $max := if ($element/@maxOccurs = "unbounded")
                        then 10
                        else ($element/@maxOccurs, 1)[1]
            let $count := $min + round(math:random() * ($max - $min))
            return
                if ($count > 0)
                then
                    for $i in 1 to $count
                    return
                        if ($element/@ref)
                        then local:process-element(root($element)/xs:schema/xs:element[@name = $element/@ref])
                        else if (root($element)/xs:schema/xs:complexType[@name = $element/@type])
                        then
                            element {$element/@name} {
                                local:process-element(root($element)/xs:schema/xs:complexType[@name = $element/@type])
                            }
                        else
                            element {$element/@name} {
                                if ($element/*)
                                then
                                    for $sub-element in $element/*
                                    return local:process-element($sub-element)
                                else local:process-type(($element/@type, "xs:string")[1])
                            }
                else ()
        case element(xs:attribute) return
            if ($element/@use = "required" or math:random() > 0.5)
            then attribute {$element/@name} { local:process-type($element/@type) }
            else ()
        case element(xs:sequence) return
            for $sub-element in $element/*
            return local:process-element($sub-element)
        case element(xs:all) return
            for $sub-element in $element/*
            order by math:random()
            return local:process-element($sub-element)
        case element(xs:choice) return
            let $count := count($element/*)
            let $i := ceiling(math:random() * $count)
            let $sub-element := $element/*[$i]
            return local:process-element($sub-element)
        default return ()
};


This function uses the computed element constructor to create a new XML instance tree based on the information in the XML Schema file, and the typeswitch expression to select the appropriate processing for a given element type.
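As a minimal, self-contained illustration of those two constructs, separate from the generator itself (the element name City and the placeholder text are made up for this sketch):

declare namespace xs = "http://www.w3.org/2001/XMLSchema";

declare function local:demo($n as node()) {
    typeswitch ($n)
        (: dispatch on the schema node type :)
        case element(xs:element) return element {$n/@name} {"sample"}
        case element(xs:annotation) return ()
        default return ()
};

local:demo(<xs:element name="City"/>)
(: returns <City>sample</City> :)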

Type Instance Generation


Some helper functions are required to generate random instances of (some of) the standard types.

declare function local:pad($string as xs:string, $char as xs:string, $max as xs:integer) as xs:string {
    concat(string-pad($char, $max - string-length($string)), $string)
};

declare function local:pad2($string) {
    local:pad($string, "0", 2)
};

declare function local:process-type($name) {
    (: just simple types - xs namespace :)
    if ($name = ("xs:string", "xs:NMTOKEN", "xs:NCName"))
    then string-join(
            for $c in 1 to math:random() * 12
            return ("A", "B", "C", "D")[ceiling(math:random() * 4)]
         , "")
    else if ($name = "xs:integer")
    then round(math:random() * 100)
    else if ($name = "xs:decimal")
    then math:random() * 100 - 50
    else if ($name = "xs:boolean")
    then ("true", "false")[ceiling(math:random() * 2)]
    else if ($name = "xs:date")
    then string-join((1990 + round(math:random() * 20),
                      local:pad2(ceiling(math:random() * 12)),
                      local:pad2(ceiling(math:random() * 30))), "-")
    else if ($name = "xs:time")
    then string-join((local:pad2(floor(math:random() * 24)),
                      local:pad2(floor(math:random() * 60)),
                      local:pad2(floor(math:random() * 60))), ":")
    else if ($name = "xs:dateTime")
    then concat(
            string-join((1990 + round(math:random() * 20),
                         local:pad2(ceiling(math:random() * 12)),
                         local:pad2(ceiling(math:random() * 30))), "-"),
            "T",
            string-join((local:pad2(floor(math:random() * 24)),
                         local:pad2(floor(math:random() * 60)),
                         local:pad2(floor(math:random() * 60))), ":")
         )
    else ()
};


Script
Here is a basic script which applies this function to the example schema. By default, the root of the generated instance is the first element in the schema, or a named element if a root parameter is supplied.

let $file := request:get-parameter("file", ())
let $root := request:get-parameter("root", ())
let $schema := doc($file)
return
    if ($root)
    then local:process-element($schema/xs:schema/xs:element[@name = $root])
    else local:process-element($schema/xs:schema/xs:element[1])

Execute [1]

Sample Output
<Root> <Start>C</Start> <Any> <AnyA>BADADAD</AnyA> <AnyC>BA</AnyC> <AnyB>D</AnyB> </Any> <Multiple> <Choice> <Third AttributeDecimal="8.937041402178778" AttributeTime="20:38:04" AttributeDateTime="1995-08-06T21:08:43"/> </Choice> <Unbounded>BDAACAAAA</Unbounded> <Ref>

<Ref1>52</Ref1> <Ref2>1992-09-02</Ref2> </Ref> <WithRefType> <RefTypeA>BBBDD</RefTypeA> <RefTypeB>DAADCABDC</RefTypeB>



</WithRefType>


</Multiple> <Multiple> <Choice> <Second AttributeString="ADAB" AttributeDate="2005-12-16"/> </Choice> <Unbounded/> <Unbounded>CABBB</Unbounded> <Unbounded/>

<Unbounded>B</Unbounded> <Unbounded>AACBABAABCB</Unbounded> <Ref> <Ref1>56</Ref1> <Ref2>1992-05-22</Ref2> </Ref> <WithRefType>

<RefTypeA>CDAA</RefTypeA> <RefTypeB>CBDBDBDB</RefTypeB> </WithRefType> <Optional>CC</Optional> </Multiple> <Multiple> <Choice>

<First AttributeInt="21" AttributeString="AABC" AttributeBoolean="true"/> </Choice> <Unbounded>ADADDDDDB</Unbounded> <Unbounded>DCDCBDDD</Unbounded> <Unbounded>CBAD</Unbounded> <Unbounded>DAAAD</Unbounded> <Ref>

<Ref1>13</Ref1> <Ref2>1997-04-30</Ref2> </Ref> <WithRefType> <RefTypeA>CB</RefTypeA> <RefTypeB/> </WithRefType>

</Multiple> <Multiple> <Choice> <Second AttributeString="DBDA" AttributeDate="1999-05-16"/>



</Choice> <Unbounded>CCDDDDCD</Unbounded> <Unbounded>DDABACC</Unbounded> <Ref>


<Ref1>44</Ref1> <Ref1>30</Ref1> <Ref1>57</Ref1> <Ref2>1997-01-26</Ref2> </Ref> <WithRefType> <RefTypeA>CBC</RefTypeA>

<RefTypeB>CA</RefTypeB> </WithRefType> </Multiple> <End>95</End> </Root>

To do
full set of xml types
mixed
restriction (only enumeration so far)
Group
AttributeGroup
problem with missing attributes in a complexType
hinting for distributions
namespaces
randomisation configuration

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/XMLSchema/schema2instance.xq?file=/db/Wiki/XMLSchema/eg3.xsd

XQuery and XSLT




Motivation
You want to create a RESTful web service that executes an XSLT transform on an XML document.

Method
XQuery is superior to XSLT in many ways. XQuery is designed to be a brief and concise programming language that interleaves XML and functional language constructs. As a result, XQuery programs are usually much smaller than the equivalent XSLT. XQuery processors are also designed to use indexes, so that XQueries over large data sets can run quickly. But unfortunately there are still times when you must use XSLT; one example is in-browser transforms. The eXist database comes with an XQuery function that allows you to transform an XML file using XSLT.

Creating an XSLT service


eXist includes a function to call an XSLT transform:
transform:transform($input as node()?, $stylesheet as item(), $params as node()?) as node()?

where:
$input is the node tree to be transformed
$stylesheet is either a URI or the stylesheet itself as a node. If it is a URI, it can either point to an external location or to an XSL stored in the db by using the 'xmldb:' scheme.
$params are the optional XSLT name/value parameters, with the following structure:

<parameters>
    <param name="param-name1" value="param-value1"/>
</parameters>

The result is zero or one nodes. The namespace of the transform module is http://exist-db.org/xquery/transform.

The transform:transform() function can be used to provide a service which accepts the URL of an XML file, the URL of an XSLT script and any other parameters, which are passed to the stylesheet. Currently the output is text/html.

declare option exist:serialize "method=html media-type=text/html";

(: look for URL parameters for the XML file and the transform :)
let $xslt := request:get-parameter("xslt", ())
let $xml := request:get-parameter("xml", ())

(: now get a list of all the URL parameters that are not either xml= or xslt= :)
let $params :=
    <parameters>
        {for $p in request:parameter-names()
         let $val := request:get-parameter($p, ())
         where not($p = ("xml", "xslt"))
         return
            <param name="{$p}" value="{$val}"/>
        }
    </parameters>
return
    (: now run the transform :)
    transform:transform(doc($xml), doc($xslt), $params)


Checking XSLT Version


The following XSLT is useful for checking what version of XSLT you are running.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <html>
            <body>
                <p>Version: <xsl:value-of select="system-property('xsl:version')"/>
                    <br/>
                    Vendor: <xsl:value-of select="system-property('xsl:vendor')"/>
                    <br/>
                    Vendor URL: <xsl:value-of select="system-property('xsl:vendor-url')"/>
                </p>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

Form-based search
In this example, an XML file on one host is transformed by an XSLT script on another. The XSLT script defines a form to allow the user to select a subset of the entries in the XML file, followed by the search results, if any.
1. Stylesheet [1]
2. Data [2]
3. Search Whisky data [3]
A sequence diagram describes the interaction involved: Sequence Diagram [4]



Page Scraping Example


see ../Page scraping and Yahoo Weather/

Using XSLT Imports


XSLT allows you to call a stylesheet that imports a common library of other XSLT templates. But not all of the XSLT import path statements will work within eXist. In the following example we will use an XSLT stylesheet that imports another stylesheet. The following assumes you have a collection called /db/test/xslt and all the files are placed in that collection.

XQuery XSLT Test Program

xquery version "1.0";
declare namespace transform="http://exist-db.org/xquery/transform";
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

let $input :=
    <data>
        <element>element 1</element>
        <element>element 2</element>
        <element>element 3</element>
    </data>
return
    <html>
        <head>
            <title>Demonstration of running XSLT within an XQuery</title>
        </head>
        <body>
            <h1>Demonstration of running XSLT within an XQuery</h1>
            { transform:transform($input, doc("/db/test/xslt/style.xsl"), ()) }
        </body>
    </html>

Top-Level Style.xsl

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:import href="common.xsl"/>
    <xsl:template match="/">
        <ol>
            <xsl:apply-templates/>
        </ol>
    </xsl:template>
</xsl:stylesheet>

In the second line of the stylesheet, the following imports do work as expected:
<xsl:import href="common.xsl"/>
<xsl:import href="common.xsl" xml:base="http://localhost/exist/rest/db/test/xslt"/>
<xsl:import href="common.xsl" xml:base="/exist/rest/db/test/xslt"/>

But you will note that using the following does not work:

<xsl:import href="/exist/rest/db/test/xslt/common.xsl"/>

Imported common.xsl

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="element">
        <li>
            <xsl:value-of select="."/>
        </li>
    </xsl:template>
</xsl:stylesheet>

XForms Example
You can also create a simple XForms example that serves as a front end to this script. See the XRX wikibook for an example of this XForms front end.

xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/xml indent=no process-xsl-pi=yes";

(: transform:transform($node-tree as node()?, $stylesheet as item(), $parameters as node()?,
   $serialization-options as xs:string) as node()? :)

let $transform := 'http://localhost:8080/exist/rest/db/xforms/xsltforms/xsltforms.xsl'
let $form :=
    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:xf="http://www.w3.org/2002/xforms">
        <head>
            <title>XForms Template</title>
            <xf:model>
                <xf:instance xmlns="" id="save-data">
                    <data>
                        <name>John Smith</name>
                    </data>
                </xf:instance>
            </xf:model>
        </head>
        <body>
            <h1>XForms Test Program</h1>
            <xf:input ref="name">
                <xf:label>Name: </xf:label>
            </xf:input>
        </body>
    </html>
let $serialization-options := 'method=xml media-type=text/xml omit-xml-declaration=yes indent=no'
let $params :=
    <parameters>
        <param name="output.omit-xml-declaration" value="yes"/>
        <param name="output.indent" value="no"/>
        <param name="output.media-type" value="text/html"/>
        <param name="output.method" value="xhtml"/>
    </parameters>
return
    transform:transform($form, $transform, $params, $serialization-options)

Caching Management
By default, once a document has been transformed it resides in the cache. This is very good for performance if a file needs to be retransformed, but if the source file changes the transform needs to be rerun. You can disable caching by changing the configuration file: in conf.xml, change the @caching value from yes to no:
<transformer class="org.apache.xalan.processor.TransformerFactoryImpl" caching="no"/>

See also: http://demo.exist-db.org/exist/xquery.xml#N10375

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/whisky/t4g.xsl
[2] http://www.cems.uwe.ac.uk/~cjwallac/apps/pipes/whisky.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/util/xslt2html.xq?xml=http://www.cems.uwe.ac.uk/~cjwallac/apps/pipes/whisky.xml&xslt=http://www.cems.uwe.ac.uk/xmlwiki/whisky/t4g.xsl
[4] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/showDiagram.xq?id=whiskyxslt

XQuery from SQL




The Scott/Tiger example
A classic database widely used in teaching relational databases concerns Employees, Departments and SalaryGrades. In Oracle training material it is known by the demo username and password, Scott/Tiger. These three tables converted to XML (via the XML add-in to Excel, described in the Excel and XML chapter) are:
Employees: XML [2] Table [1] MySQL [2]
Departments: XML [3] Table [4] MySQL [5]
Salary Grades: XML [6] Table [7] MySQL [8]
A port of the Oracle SQL file to MySQL can be found here [9].

Execution environments
The eXist demo server is used for the XQuery examples. These are returned either as plain XML or converted to table format. The equivalent SQL queries are executed on a MySQL server, also based at the University of the West of England in Bristol.

Basic Queries
Counting Records
Task: How many Employees? SQL: select count(*) from emp MySQL [10] XQuery: count(//Emp) XML [11] Task: How many Departments? SQL: select count(*) from dept MySQL [12] XQuery: count(//Dept) XML [13]

Selecting records
Task: Show all Employees with a salary greater than 1000 SQL: select * from emp where sal > 1000; MySQL [14] XQuery: //Emp[Sal>1000] XML [15] Table [16] Task: Show all Employees with a salary greater than 1000 and less than 2000 SQL: select * from emp where sal between 1000 and 2000; MySQL [17] XQuery: //Emp[Sal>1000][Sal<2000] XML [18] Table [19] Here, successive filter conditions replace the anded conditions implied by 'between'. Although there is no 'between' function in XQuery, it is a simple matter to write one:
declare function local:between($value as xs:decimal, $min as xs:decimal, $max as xs:decimal) as xs:boolean { $value >= $min and $value <= $max };

which simplifies the query to
//Emp[local:between(Sal,1000,2000)] XML [20] Table [21]
and has the advantage that the conversion of Sal to a number is now implicit in the function signature.
Task: Show all employees with no Commission
SQL: select * from emp where comm is null; MySQL [22]
XQuery: //Emp[empty(Comm/text())] XML [23] Table [24]
Note that empty(Comm) is not enough, since this is true only if the element itself is absent, which in this sample XML it is not.
XQuery: //Emp[empty(Comm)] XML [25]
Task: Select the first 5 employees
SQL: select * from emp limit 5; MySQL [26]
XQuery: //Emp[position() <=5] XML [27] Table [28]


Selecting Columns
List Employee names and salaries SQL: Select ename,sal from emp MySQL [29] Surprisingly, selecting only a subset of children in a node (pruning) is not supported in XPath. //Emp/(Ename,Sal) XML [30] retrieves the required elements, but the parent Emp nodes have been lost. //Emp/(Ename|Sal) XML [31] is better since it keeps the elements in sequence, but it does not return Emp nodes with only the Ename and Sal children as required. //Emp/*[name(.) = ("Ename","Sal")] XML [32] uses reflection on the element names. XQuery: for $emp in //Emp return <Emp> {$emp/(Ename|Sal)} </Emp> XML [33] Table [34] Here an XQuery FLWOR expression is used to create a new EMP element from the original elements.

Computing values
Computing the Annual Salary Task: Compute the Annual Salaries of all employees. The Annual Salary is computed from 12 times the Monthly salary plus Commission. Since commission may be null, it must be replaced by a suitable numeric value: SQL: select 12 * sal + ifnull(comm,0) from emp; MySQL [35] XQuery: //Emp/(12*number(Sal)+(if(exists(Comm/text())) then number(Comm) else 0)) XML [36] The SQL function COALESCE is the same as IFNULL but will accept multiple arguments: SQL: select 12 * sal + coalesce(comm,0) from emp; MySQL [37] XQuery: //Emp/(12*number(Sal)+ number((Comm/text(),0)[1])) XML [38]
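If this idiom is needed in several queries it could be wrapped in a small helper, in the same spirit as local:between above. A minimal sketch; local:coalesce is our own name, not a built-in function:

declare function local:coalesce($values as item()*) as item()? {
    (: return the first item of the sequence; empty operands simply drop out :)
    $values[1]
};

//Emp/(12 * number(Sal) + number(local:coalesce((Comm/text(), 0))))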

The lack of a schema in this simple example to carry information on the type of the items leads to the need for explicit conversion of strings to numbers. Note the XQuery idiom: (Comm/text(),0)[1] computes the first non-null item in the sequence, the counterpart of COALESCE.

Selecting and Creating Columns
Task: List the employee names with their Annual Salary.
SQL: select ename, 12 * sal + ifnull(comm,0) as "Annual Salary" from emp; MySQL [39]
XQuery:
for $emp in //Emp
return
    <Emp>
        {$emp/Ename}
        <AnnualSalary>
            {12*number($emp/Sal) + (if (exists($emp/Comm/text())) then number($emp/Comm) else 0)}
        </AnnualSalary>
    </Emp>
XML [40] Table [41]
Again we have the problem of tree-pruning, but now with added grafting, which again requires the explicit construction of an XML node.


SQL Operators
IN
Task: Show all employees whose Job is either ANALYST or MANAGER SQL: select * from emp where job in ("ANALYST","MANAGER") MySQL [42] XQuery: //Emp[Job = ("ANALYST","MANAGER")] XML [43] Table [44]

NOT IN
Task: Select all employees whose Job is not 'ANALYST' or 'MANAGER'
SQL: select * from emp where job not in ("ANALYST","MANAGER") MySQL [45]
This doesn't work:
XQuery: //Emp[Job !=("ANALYST","MANAGER")] XML [46] Table [47]
The generalised not-equals here is always true, since everyone is either not an ANALYST or not a MANAGER. This works:
XQuery: //Emp[not(Job =("ANALYST","MANAGER"))] XML [48] Table [49]



Distinct values
Task: Show the different Jobs which Employees have MySQL: select distinct job from emp; MySQL [50] XQuery: distinct-values(//Emp/Job) XML [51]

Pattern Matching
Task: List all Employees with names starting with "S" MySQL: select * from emp where ename like "S%"; MySQL [52] XQuery: //Emp[starts-with(Ename,"S")] XML [53] Table [54] See starts-with() [55] Task: List all Employees whose name contains "AR" MySQL: select * from emp where ename like "%AR%"; MySQL [56] XQuery: //Emp[contains(Ename,"AR")] XML [57] Table [58] See contains() [59] Task: List all Employees whose name contains "ar" ignoring the case MySQL: select * from emp where ename like "%ar%"; MySQL [60] LIKE in SQL is case insensitive, but fn:contains() is not, so the case needs to be converted: XQuery: //Emp[contains(upper-case(Ename),upper-case("ar"))] XML [61] Table [62] See upper-case() [63] More complex patterns need regular expressions. MySQL: select * from emp where ename regexp "M.*R"; MySQL [64] XQuery: //Emp[matches(Ename,"M.*R")] XML [65] Table [66] See matches() [67] Similarly, SQL's REGEXP is case-insensitive, whereas additional flags control matching in the XQuery matches() MySQL: select * from emp where ename regexp "m.*r"; MySQL [68] XQuery: //Emp[matches(Ename,"m.*r",'i')] XML [69] Table [70] ('i' makes the regex match case insensitive.)

Table Joins
Simple Inner joins Task: Find the name of the department that employee 'SMITH' works in: SQL : select dept.dname from emp, dept where dept.deptno = emp.deptno and ename='SMITH'; MySQL [71] XPath : //Dept[DeptNo = //Emp[Ename='SMITH']/DeptNo]/Dname XML [72]

Perhaps a FLWOR expression in XQuery would be more readable:
let $dept := //Emp[Ename='SMITH']/DeptNo
return //Dept[DeptNo = $dept]/Dname
XML [73]
Task: To find the names of all employees in Accounting
SQL: select emp.ename from emp, dept where dept.deptno = emp.deptno and dname='Accounting'; MySQL [74]
XPath: //Emp[DeptNo = //Dept[Dname='Accounting']/DeptNo]/Ename XML [75]
XQuery:
let $dept := //Dept[Dname='Accounting']/DeptNo
return //Emp[DeptNo = $dept]/Ename
XML [76]
Note that in this release of eXist, the order of the operands in the equality is significant - to be fixed in a later release.
XQuery: //Emp[//Dept[Dname='Accounting']/DeptNo = //Emp/DeptNo]/Ename XML [77]

More complex Inner Join
Task: List the name of each Employee, together with the name and location of their department.
SQL: select ename, dname, location from emp, dept where emp.deptno = dept.deptno; MySQL [78]
Where elements must be selected from several nodes, XPath is insufficient and XQuery is needed. This join could be written as:
for $emp in //Emp
for $dept in //Dept
where $dept/DeptNo = $emp/DeptNo
return
    <Emp>
        {$emp/Ename}
        {$dept/(Dname|Location)}
    </Emp>
XML [79] Table [80]


But it would be more commonly written in the form of a sub-selection:
for $emp in //Emp
let $dept := //Dept[DeptNo=$emp/DeptNo]
return
    <Emp>
        {$emp/Ename}
        {$dept/(Dname|Location)}
    </Emp>
XML [81] Table [82]

Inner Join with Selection
Task: List the names and department of all Analysts
SQL:
select ename, dname
from emp, dept
where emp.deptno = dept.deptno
and job="ANALYST";
MySQL [83]
XQuery:
for $emp in //Emp[Job='ANALYST']
let $dept := //Dept[DeptNo= $emp/DeptNo]
return
    <Emp>
        {$emp/Ename}
        {$dept/Dname}
    </Emp>
XML [84] Table [85]

1 to Many query
Task: List the departments and the number of employees in each department
SQL: select dname, (select count(*) from emp where deptno = dept.deptno) as headcount from dept; MySQL [86]
XQuery:
for $dept in //Dept
let $headCount := count(//Emp[DeptNo=$dept/DeptNo])
return
    <Dept>
        {$dept/Dname}
        <HeadCount>{$headCount}</HeadCount>
    </Dept>
XML [87] Table [88]

Theta (Inequality) Join
Task: List the names and salary grade of staff in ascending grade order. Grades are defined by a minimum and maximum salary.
SQL: select ename, grade from emp, salgrade where emp.sal between salgrade.losal and salgrade.hisal; MySQL [89]
XQuery:
for $emp in //Emp
let $grade := //SalGrade[number($emp/Sal) > number(LoSal)][number($emp/Sal) < number(HiSal)]/Grade
order by $grade
return
    <Emp>
        {$emp/Ename}
        {$grade}
    </Emp>

XML [90] Table [91] Recursive Relations The relationship between an employee and their manager is a recursive relationship. Task: List the name of each employee together with the name of their manager. SQL: select e.ename, m.ename from emp e join emp m on e.mgr = m.empno MySQL [92] XQuery: for $emp in //Emp let $manager := //Emp[EmpNo = $emp/MgrNo] return <Emp> {$emp/Ename} <Manager>{string($manager/Ename)}</Manager> </Emp>

XML [93] Table [94]
The XQuery result is not quite the same as the SQL result. King, who has no manager, is missing from the SQL inner join. To produce the same result in XQuery, we would filter for employees with Managers:
for $emp in //Emp[MgrNo]
let $manager := //Emp[EmpNo = $emp/MgrNo]
where $emp/MgrNo/text()
return
    <Emp>
        {$emp/Ename}
        <Manager>{string($manager/Ename)}</Manager>
    </Emp>
XML [95] Table [96]
Alternatively, an outer join returns all employees, including King:
SQL: select e.ename, m.ename from emp e left join emp m on e.mgr = m.empno MySQL [97]


Conversion to an organisational tree


The manager relationship defines a tree structure, with King at the root, her direct reports as her children and so on. A recursive function in XQuery solves this task.
XQuery:
declare function local:hierarchy($emp) {
    <Emp name='{$emp/Ename}'>
        <Reports>
            {for $e in //Emp[MgrNo = $emp/EmpNo]
             return local:hierarchy($e)
            }
        </Reports>
    </Emp>
};

local:hierarchy(//Emp[empty(MgrNo/text())])
XML [98]



Conversion to a Department/Employee Hierarchy


For export, a single XML file could be created with Employees nested within Departments. This is possible without introducing redundancy or loss of data because the Dept/Emp relationship is exactly one to many. XQuery: <Company> {for $dept in //Dept return <Department> {$dept/*} {for $emp in //Emp[DeptNo = $dept/DeptNo] return $emp } </Department> } </Company> XML [99] With this simple approach, the foreign key DeptNo in Emp has been included but it is now redundant. The except operator is useful here: <Company> {for $dept in //Dept return <Department> {$dept/*} {for $emp in //Emp[DeptNo = $dept/DeptNo] return <Employee> {$emp/* except $emp/DeptNo} </Employee> } </Department> } </Company> XML [100] Note that this assumes there are no attributes to be copied. If there are, these would be copied with $emp/@*



Working with the hierarchical data


This hierarchical data can be queried directly in XQuery.

Path to Employee
Almost all the queries remain the same (except for the change of element name to Employee). This is because the path used to select Emps in the Emp.xml document is //Emp and is now //Employee in the merged document. If a full path had been used (/EmpList/Emp), this would need to be replaced by /Company/Department/Employee

Simple Navigation
Task: To find the department name of employee 'Smith' XQuery: //Employee[Ename='SMITH']/../Dname XML [101] Task: To find the names of employees in the Accounting department XQuery: //Department[Dname='Accounting']/Employee/Ename XML [102]

Department/Employee Join


The main changes are in queries which require a join between Employee and Departments because they are already nested and thus become navigation up (from Employee to Department ) or down (from Department to Employee) the tree. many - one The query to list the Employees and the location of their Department with separate documents was: for $emp in //Emp for $dept in //Dept where $dept/DeptNo=$emp/DeptNo return <Emp> {$emp/Ename} {$dept/(Dname|Location)} </Emp> XML [79] Table [80] With one nested document, this becomes: for $emp in //Employee return <Employee> {$emp/Ename} {$emp/../Location} </Employee> XML [103] Table [104] using the parent access to move up the tree.

1 - many
To list departments and the number of employees in the separate tables is:
for $dept in //Dept
let $headCount := count(//Emp[DeptNo=$dept/DeptNo])
return
    <Dept>
        {$dept/Dname}
        <HeadCount>{$headCount}</HeadCount>
    </Dept>
XML [87] Table [88]
which becomes:
for $dept in //Department
let $headCount := count($dept/Employee)
return
    <Department>
        {$dept/Dname}
        <HeadCount>{$headCount}</HeadCount>
    </Department>
XML [105] Table [106]


Summarising and Grouping


Summary data
Task: Show the number, average (rounded), min and max salaries for Managers. SQL: SELECT count(*), round(avg(sal)), min(sal), max(sal) FROM emp WHERE job='MANAGER'; MySQL [107] XQuery:
(count(//Emp[Job='MANAGER']),round(avg(//Emp[Job='MANAGER']/Sal)),min(//Emp[Job='MANAGER']/Sal),max( //Emp[Job='MANAGER']/Sal))

XML [108] Better to factor out the XPath expression for the subset of employees:
let $managers := //Emp[Job='MANAGER'] return (count($managers),round(avg($managers/Sal)),min($managers/Sal),max($managers/Sal))

XML [109] It would be better to tag the individual values computed: let $managers := //Emp[Job='MANAGER'] return <Statistics> <Count>{count($managers)}</Count> <Average>{round(avg($managers/Sal))}</Average> <Min>{min($managers/Sal)}</Min> <Max>{max($managers/Sal)}</Max> </Statistics>

XML [110]


Grouping
Task: Show the number, average (rounded), min and max salaries for each Job. SQL: SELECT job, count(*), round(avg(sal)), min(sal), max(sal) FROM emp GROUP BY job; MySQL [111] In XQuery, grouping must be done by iterating over the groups. Each Group is identified by the Job and we can get the set (sequence) of all Jobs using the distinct-values function: for $job in distinct-values(//Emp/Job) let $employees := //Emp[Job=$job] return <Statistics> <Job>{$job}</Job> <Count>{count($employees )}</Count> <Average>{round(avg($employees/Sal))}</Average> <Min>{min($employees/Sal)}</Min> <Max>{max($employees/Sal)}</Max> </Statistics> XML [112] Table [113]
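For comparison, XQuery 3.0 later added an explicit group by clause to the FLWOR expression, so on a processor that supports XQuery 3.0 the same report can be written without the distinct-values pattern. A sketch (the result is expected to match the version above):

for $emp in //Emp
group by $job := string($emp/Job)
(: within the return clause, $emp is now bound to all employees in the group :)
return
    <Statistics>
        <Job>{$job}</Job>
        <Count>{count($emp)}</Count>
        <Average>{round(avg($emp/Sal))}</Average>
        <Min>{min($emp/Sal)}</Min>
        <Max>{max($emp/Sal)}</Max>
    </Statistics>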

Hierarchical report
Task: List the departments , their employee names and salaries and the total salary in each department This must generate a nested table. SQL: ? XQuery: <Report> { for $dept in //Dept let $subtotal := sum(//Emp[DeptNo = $dept/DeptNo]/Sal) return <Department> {$dept/Dname} {for $emp in //Emp[DeptNo = $dept/DeptNo] return <Emp> {$emp/Ename} {$emp/Sal} </Emp> } <SubTotal>{$subtotal}</SubTotal> </Department> } <Total>{sum(//Emp/Sal)}</Total> </Report> XML [114]

Note that the functional nature of the XQuery language means that each total must be calculated explicitly, not rolled up incrementally as might be done in an imperative language. This has the advantage that the formulae are explicit and independent and can thus be placed anywhere in the report, such as at the beginning instead of at the end:
<Report>
    <Total>{sum(//Emp/Sal)}</Total>
    {
    for $dept in //Dept
    let $subtotal := sum(//Emp[DeptNo = $dept/DeptNo]/Sal)
    return
        <Department>
            <SubTotal>{$subtotal}</SubTotal>
            {$dept/Dname}
            {for $emp in //Emp[DeptNo = $dept/DeptNo]
             return
                <Emp>
                    {$emp/Ename}
                    {$emp/Sal}
                </Emp>
            }
        </Department>
    }
</Report>
XML [115]


Restricted Groups
Task: Show the number, average (rounded), min and max salaries for each Job where there are at least 2 employees in the group.
SQL:
SELECT job, count(*), round(avg(sal)), min(sal), max(sal)
FROM emp
GROUP BY job
HAVING count(*) > 1;
MySQL [116]
XQuery:
for $job in distinct-values(//Emp/Job)
let $employees := //Emp[Job=$job]
where count($employees) > 1
return
    <Statistics>
        <Job>{$job}</Job>
        <Count>{count($employees)}</Count>
        <Average>{round(avg($employees/Sal))}</Average>
        <Min>{min($employees/Sal)}</Min>
        <Max>{max($employees/Sal)}</Max>
    </Statistics>
XML [117] Table [118]


Date Handling
Selecting by Date
Task: List all employees hired in the current millennium
SQL: SELECT * from emp where hiredate >= '2000-01-01' MySQL [119]
XQuery: //Emp[HireDate >= '2000-01-01']
Actually this comparison is a string comparison because of the lack of a schema to define HireDate as an xs:date.
XML [120] Table [121]
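If a true date comparison is wanted without a schema, the values can be cast explicitly. A minimal sketch (here the result happens to be the same, since ISO 8601 date strings collate in date order):

//Emp[xs:date(HireDate) >= xs:date("2000-01-01")]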

References
[1] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=1 [2] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=1 [3] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=2 [4] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=2 [5] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=2 [6] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=3 [7] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=3 [8] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=3 [9] http:/ / 2sun. org/ scott-tiger-port-mysql [10] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=4 [11] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=4 [12] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=5 [13] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=5 [14] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=7 [15] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=7 [16] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=7 [17] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=8 [18] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=8 [19] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=8 [20] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=9 [21] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=9 [22] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=10 [23] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=10 [24] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=10 [25] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=10a [26] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=42 [27] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=42 [28] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=42 [29] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=11 [30] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=11 [31] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=12 [32] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=12a [33] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=13 [34] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=13 [35] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=14 [36] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=14 [37] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=14a [38] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=14a [39] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=16 [40] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=16



[41] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=16
[42] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=40
[43] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=40
[44] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=40
[45] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=43
[46] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=44
[47] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=44
[48] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=43
[49] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=43
[50] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=41
[51] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=41
[52] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=45
[53] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=45
[54] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=45
[55] http://www.xqueryfunctions.com/xq/fn_starts-with.html
[56] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=46
[57] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=46
[58] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=46
[59] http://www.xqueryfunctions.com/xq/fn_contains.html
[60] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=47
[61] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=47
[62] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=47
[63] http://www.xqueryfunctions.com/xq/fn_upper-case.html
[64] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=48
[65] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=48
[66] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=48
[67] http://www.xqueryfunctions.com/xq/fn_matches.html
[68] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=49
[69] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=49
[70] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=49
[71] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=56
[72] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=56
[73] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=59
[74] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=57
[75] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=57
[76] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=60
[77] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=58
[78] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=18
[79] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=18
[80] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=18
[81] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=19
[82] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=19
[83] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=20
[84] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=20
[85] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=20
[86] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=52
[87] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=52
[88] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=52
[89] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=21
[90] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=21
[91] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=21
[92] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=30
[93] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=30
[94] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=30
[95] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runQuery.xq?id=31
[96] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runQueryTable.xq?id=31


[97] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=32
[98] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runQuery.xq?id=33
[99] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runQuery.xq?id=50



[100] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runQuery.xq?id=51
[101] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=153
[102] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=154
[103] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=118
[104] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=118
[105] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=152
[106] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=152
[107] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=22
[108] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runQuery.xq?id=21a
[109] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runQuery.xq?id=22
[110] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runQuery.xq?id=23
[111] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=24
[112] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=24
[113] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=24
[114] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=54
[115] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=55
[116] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=25
[117] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=25
[118] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQueryTable.xq?id=25
[119] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runSQL.xq?id=26
[120] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runXQuery.xq?id=26
[121] http://www.cems.uwe.ac.uk/xmlwiki/empdept/runQueryTable.xq?id=26


XQuery IDE
In progress
The eXist database stores binary files as well as XML files. Binary files include XQuery scripts themselves. This allows XQuery scripts to manipulate other XQuery scripts: viewing, searching, analyzing, modifying and creating scripts, all the operations required of a development environment.

Viewing the XQuery script


The eXist util module includes functions to read binary documents and convert to text. Here they are used to view a script. In this public interface, a configuration file is used to control access to those scripts.

<config>
    <base>/db/Wiki/IDE</base>
    <publicScripts>
        <name>listQueryText.xq</name>
    </publicScripts>
</config>

The script checks that a requested script name is present in the list of public scripts before retrieving the script.
View the script reader itself [1]
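A minimal sketch of such a reader is shown below; the config document location (/db/Wiki/IDE/config.xml) and the name request parameter are assumptions for illustration, not necessarily those used by the script linked above.

xquery version "1.0";
declare option exist:serialize "method=text media-type=text/plain";

(: assumed location of the configuration document shown above :)
let $config := doc("/db/Wiki/IDE/config.xml")/config
let $name := request:get-parameter("name", ())
return
    if ($name = $config/publicScripts/name)
    then
        (: read the stored XQuery script as binary and convert it to text :)
        util:binary-to-string(util:binary-doc(concat($config/base, "/", $name)))
    else concat($name, " is not a public script")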

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/IDE/listXQueryText.xq?name=listXQueryText.xq

XSL-FO Images


Motivation
You want to enrich your documents with print quality images and charts etc.

Method
We will use the fo:external-graphic primitive. For example, to add an external image add a block to the XSL-FO:
<fo:block> <fo:external-graphic src="http://www.uwe.ac.uk/includes/branding/better/engine/images/logo.gif"/> </fo:block>

execute [1]

Vector Images
SVG is a standard way to describe graphical artwork as vectors. Recent eXist installations (>1.4) with the Apache FOP processor enabled can embed SVG data in the resulting PDF as vector art: just reference the SVG documents over HTTP (for example via the REST interface), as they are not in the file system. See ../Generating PDF from XSL-FO files/ on how to activate the XSLFO feature.
<fo:block>
    <fo:external-graphic src="http://localhost:8080/exist/rest/db/logo.svg"/>
</fo:block>

PDF image extension


For the daring: there is an extension to Apache FOP that provides pdf-images, a method of placing pages of PDF files in the FOP output. It is the work of Jeremias Märki and is available from his website.
$ wget http://www.jeremias-maerki.ch/download/fop/pdf-images/fop-pdf-images-2.0.0.SNAPSHOT-bin.tar.gz
$ tar xfz fop-pdf-images-2.0.0.SNAPSHOT-bin.tar.gz
$ cp fop-pdf-images-2.0.0.SNAPSHOT/*jar EXIST_HOME/lib/user

I had to restart eXist to activate pdf-images support in FOP. The FO syntax is the same as with SVG; a page number can be specified after a hash sign in the URL.
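For instance, a reference along these lines should place the second page of a PDF stored in the database (the document path is an assumption for illustration; #page=2 is one form of the hash-sign page selector mentioned above):

<fo:block>
    <!-- illustrative only: an assumed PDF stored in the db, selecting its second page -->
    <fo:external-graphic src="http://localhost:8080/exist/rest/db/manual.pdf#page=2"/>
</fo:block>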

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/xsl-fo/helloworld-image.xq

XSL-FO SVG


Motivation
You want to convert XML into PDF and embed SVG graphics in the output.

Method
eXist from the current trunk supports placing SVG in FOP-rendered PDF out of the box; the only step is to activate the XSL-FO module before building the source:

echo "include.module.xslfo = true" > $EXIST_HOME/extensions/local.build.properties

Beware: on headless *nix systems, make sure that no DISPLAY environment variable is set when eXist-db is started, otherwise the Apache Batik SVG renderer may throw an exception.

Sample XQuery
Highlights of the XQuery:
The FO is created from an XSL transform: I know, this could be done with XQuery, but I can reuse the XSL in other places.
The document is found through a search.
The XSL stylesheet resides in "views", as the pdf.xq does.
I create lots of variables, but it's better that way.
The filename is deduced from the request.
There may be errors, as I stripped down working code! It may be overly complicated due to my ignorance of XQuery features. I think you will find your way.
The application directory is laid out as advised in the XRX wiki.

let $host := "http://localhost:8080/exist/rest/"
let $home := "/db/apps/myApp"
let $match := collection(concat($home, "/data"))//produkt[@uuid=$uuid]
let $coll := util:collection-name($match)
let $file := util:document-name($match)
let $xsls := concat($home, "/views/foPDF.xsl")
let $rand := concat("?", util:random())

let $params :=
    <parameters>
        <param name="svgfile" value="{$host}{$home}{$file}.svg"/>
        <param name="rand" value="{$rand}"/>
    </parameters>
let $tmp := transform:transform(doc(concat($coll, "/", $file)), doc($xsls), $params)
let $pdf := xslfo:render($tmp, "application/pdf", (), ())

(: substring-after-last is in functx :)
let $fname := substring-after(request:get-path-info(), "/")
return
    response:stream-binary($pdf, "application/pdf", $fname)



Highlights of the XSL:


<!-- passed in parameter, a fully qualified URI to the SVG file, eg:
     "http://localhost:8080/exist/rest//db/apps/myApp/data/Some.svg" -->
<xsl:param name="svgfile" select="''"/>

<!-- passed in parameter, value eg: "?123", changes with every call;
     used to force fop to reread the svg file each time;
     starts with ? to be discarded in the end...
     can be left blank to have fop cache the svgfile -->
<xsl:param name="rand" select="''"/>

<!-- place SVG in PDF output --> <fo:block-container> <fo:block> <fo:external-graphic src="{$svgfile}{$rand}"/> </fo:block> </fo:block-container>

XSL-FO Tables
Motivation
You want to be able to create high-quality tabular outputs suitable for book-publishing.

Method
To accomplish this we will convert our XML into XSL-FO tables. Unlike HTML, XSL-FO allows you to create flows of text, and you can set up rules on how objects span page boundaries.

Sample Input
Here is a sample XML file that contains a table with two columns.

<table heading="Department Phone Extensions">
    <Person>
        <Name>John Doe</Name>
        <Extension>1234</Extension>
    </Person>
    <Person>
        <Name>Sue Smith</Name>
        <Extension>5678</Extension>
    </Person>
</table>

We would like this XML file to be rendered with two columns, the first containing the person's name and the second their phone extension. It should look like the following.


Department Phone Extensions

Name        Extension
John Doe    1234
Sue Smith   5678

Example FO File
The following is the core of the XSL-FO layout that you will need to create the table (without control of the column widths).
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"> <fo:block font-size="14pt" padding="10px" font-family="Verdana">Department Phone Extensions</fo:block> <fo:block font-size="10pt"> <fo:table border="solid" border-collapse="collapse"> <fo:table-header> <fo:table-row> <fo:table-cell> <fo:block font-weight="bold">Name</fo:block> </fo:table-cell> <fo:table-cell> <fo:block font-weight="bold">Extension</fo:block> </fo:table-cell> </fo:table-row> </fo:table-header> <fo:table-body> <fo:table-row> <fo:table-cell> <fo:block>John Doe</fo:block> </fo:table-cell> <fo:table-cell> <fo:block>1234</fo:block> </fo:table-cell> </fo:table-row> <fo:table-row> <fo:table-cell> <fo:block>Sue Smith</fo:block> </fo:table-cell> <fo:table-cell> <fo:block>5678</fo:block> </fo:table-cell> </fo:table-row> </fo:table-body>

</fo:table>

</fo:block> </fo:block>


Transform with XSLT


NOTE: This example should be moved to a book on XSLT. XQuery typeswitch transforms should be used to do this. We can transform the XML structure to XSL-FO using an XSLT script. This generic script only requires that the root of the XML table is called table with a heading attribute. <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fo="http://www.w3.org/1999/XSL/Format" exclude-result-prefixes="xs" version="2.0"> <xsl:template match="/table"> <fo:block> <fo:block font-size="14pt" padding="10px" font-family="Verdana"> <xsl:value-of select="@heading"/> </fo:block> <fo:block font-size="10pt"> <fo:table border="solid" border-collapse="collapse" > <fo:table-header> <fo:table-row> <xsl:for-each select="*[1]/*"> <fo:table-cell> <fo:block font-weight="bold"> <xsl:value-of select="name(.)"/> </fo:block> </fo:table-cell> </xsl:for-each> </fo:table-row> </fo:table-header> <fo:table-body> <xsl:apply-templates select="*"/> </fo:table-body> </fo:table> </fo:block> </fo:block> </xsl:template> <xsl:template match="*"> <fo:table-row> <xsl:for-each select="*"> <fo:table-cell> <fo:block> <xsl:value-of select="."/> </fo:block> </fo:table-cell> </xsl:for-each> </fo:table-row>

</xsl:template>
</xsl:stylesheet>


XQuery integration
Finally we can generate the full XSL-FO document and render as PDF with an XQuery script. We use the XSLT to transform the table, and then embed that XSL-FO fragment in the XSL-FO master before rendering as PDF and streaming the binary document. There are of course other ways to assemble the full XSLT-FO document.
xquery version "1.0"; import module namespace xslfo="http://exist-db.org/xquery/xslfo"; import module namespace transform="http://exist-db.org/xquery/transform"; declare namespace fo="http://www.w3.org/1999/XSL/Format"; let $table := <table heading="Department Phone Extensions"> <Person> <Name>John Doe</Name> <Extension>1234</Extension> </Person> <Person> <Name>Sue Smith</Name> <Extension>5678</Extension> </Person> </table> let $table-fo := transform:transform($table,doc("/db/Wiki/eXist/xsl-fo/table2fo.xsl"),()) let $fo := <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> <fo:layout-master-set> <fo:simple-page-master master-name="my-page"> <fo:region-body margin="1in"/> </fo:simple-page-master> </fo:layout-master-set> <fo:page-sequence master-reference="my-page"> <fo:flow flow-name="xsl-region-body"> <fo:block> {$table-fo} </fo:block> </fo:flow> </fo:page-sequence> </fo:root> let $pdf := xslfo:render($fo, "application/pdf", ()) return response:stream-binary($pdf, "application/pdf", "output.pdf")

Execute [1]



Database data
As a further example, the following XQuery selects all employees and renders them in a PDF table:
xquery version "1.0"; import module namespace xslfo="http://exist-db.org/xquery/xslfo"; import module namespace transform="http://exist-db.org/xquery/transform"; declare namespace fo="http://www.w3.org/1999/XSL/Format"; let $table := <table heading="Employees"> {//Emp} </table> let $table-fo := transform:transform($table,doc("/db/Wiki/eXist/xsl-fo/table2fo.xsl"),()) let $fo := <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> <fo:layout-master-set> <fo:simple-page-master master-name="my-page"> <fo:region-body margin="1in"/> </fo:simple-page-master> </fo:layout-master-set> <fo:page-sequence master-reference="my-page"> <fo:flow flow-name="xsl-region-body"> <fo:block> {$table-fo} </fo:block> </fo:flow> </fo:page-sequence> </fo:root> let $pdf := xslfo:render($fo, "application/pdf", ()) return response:stream-binary($pdf, "application/pdf", "output.pdf")

Execute [2]

References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/xsl-fo/table2fo.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/eXist/xsl-fo/emp2fo.xq

Article Sources and Contributors




XQuery Source: http://en.wikibooks.org/w/index.php?oldid=2244340 Contributors: Adrignola, ChrisWallace, Dallas1278, Dmccreary, Dominique Rabeuf, Elharo, Fraserhore, Gregburd, Jens stergaard Petersen, Joewiz, Jomegat, Kurt.cagle, Lavinio, Masini, Matt-turner, Mike.lifeguard, Stuartyeates, Tom Morris, Vtoman, Zazpot, 52 anonymous edits Advanced Search Source: http://en.wikibooks.org/w/index.php?oldid=2208641 Contributors: Adrignola, Avicennasis, Dmccreary All Leaf Paths Source: http://en.wikibooks.org/w/index.php?oldid=1942930 Contributors: Adrignola, Dmccreary, 2 anonymous edits All Paths Source: http://en.wikibooks.org/w/index.php?oldid=2175050 Contributors: Adrignola, Dmccreary Alphabet Poster Source: http://en.wikibooks.org/w/index.php?oldid=1582905 Contributors: Adrignola, ChrisWallace Auto-generation of Index Config Files Source: http://en.wikibooks.org/w/index.php?oldid=2091796 Contributors: Dmccreary, J36miles Background Source: http://en.wikibooks.org/w/index.php?oldid=2157431 Contributors: Adrignola, Dmccreary, Jens stergaard Petersen, Qwertyus, Zig-e, 10 anonymous edits Basic Authentication Source: http://en.wikibooks.org/w/index.php?oldid=2005496 Contributors: Adrignola, ChrisWallace, Dmccreary Basic Feedback Form Source: http://en.wikibooks.org/w/index.php?oldid=1755651 Contributors: ChrisWallace, Thenub314 Basic Search Source: http://en.wikibooks.org/w/index.php?oldid=1985424 Contributors: Adrignola, Dmccreary, Fl.schmitt, Jens stergaard Petersen, 2 anonymous edits Basic Session Management Source: http://en.wikibooks.org/w/index.php?oldid=1582906 Contributors: Adrignola, Dmccreary BBC Weather Forecast Source: http://en.wikibooks.org/w/index.php?oldid=1694962 Contributors: Adrignola, ChrisWallace Benefits Source: http://en.wikibooks.org/w/index.php?oldid=2067294 Contributors: Adrignola, Citizencontact, Dmccreary, Jens stergaard Petersen, Joewiz, Nillem, Suchenwi, 3 anonymous edits Caching and indexes Source: http://en.wikibooks.org/w/index.php?oldid=1582909 Contributors: Adrignola, ChrisWallace, Dmccreary Chaining Web Forms Source: http://en.wikibooks.org/w/index.php?oldid=2006595 Contributors: Adrignola, Dmccreary Changing Permissions on Collections and Resources Source: http://en.wikibooks.org/w/index.php?oldid=2268776 Contributors: Dmccreary, J36miles Compare two XML files Source: http://en.wikibooks.org/w/index.php?oldid=2173814 Contributors: Adrignola, Dmccreary, J36miles, 1 anonymous edits Compare with XQuery Source: http://en.wikibooks.org/w/index.php?oldid=1582911 Contributors: Adrignola, ChrisWallace, Dmccreary Creating a Timeline Source: http://en.wikibooks.org/w/index.php?oldid=1942289 Contributors: Adrignola, Dmccreary Creating XQuery Functions Source: http://en.wikibooks.org/w/index.php?oldid=2176667 Contributors: Adamretter, Adrignola, ChrisWallace, Dmccreary, Jens stergaard Petersen, Joewiz, 5 anonymous edits Dates and Time Source: http://en.wikibooks.org/w/index.php?oldid=1868013 Contributors: Dmccreary, J36miles DBpedia with SPARQL - Stadium locations Source: http://en.wikibooks.org/w/index.php?oldid=1582915 Contributors: Adrignola, ChrisWallace Delivery Status Report Source: http://en.wikibooks.org/w/index.php?oldid=2169274 Contributors: ChrisWallace, QuiteUnusual Digest Authentication Source: http://en.wikibooks.org/w/index.php?oldid=1795008 Contributors: Adrignola, ChrisWallace, Dmccreary, 5 anonymous edits Digital Signatures Source: http://en.wikibooks.org/w/index.php?oldid=2017706 Contributors: Dmccreary, J36miles, 5 anonymous edits DocBook to HTML Source: 
http://en.wikibooks.org/w/index.php?oldid=2202863 Contributors: Adrignola, Dmccreary DOJO data Source: http://en.wikibooks.org/w/index.php?oldid=2238171 Contributors: Adrignola, Dmccreary, 4 anonymous edits Dynamic Module Loading Source: http://en.wikibooks.org/w/index.php?oldid=1582916 Contributors: Adrignola, ChrisWallace, Dmccreary Examples Wanted Source: http://en.wikibooks.org/w/index.php?oldid=1981547 Contributors: Adrignola, ChrisWallace, 1 anonymous edits eXist demo server Source: http://en.wikibooks.org/w/index.php?oldid=1810303 Contributors: Adrignola, ChrisWallace Extracting data from XHTML files Source: http://en.wikibooks.org/w/index.php?oldid=1582917 Contributors: Adrignola, Dmccreary Filling Portlets Source: http://en.wikibooks.org/w/index.php?oldid=1582920 Contributors: Adrignola, Dmccreary Flickr GoogleEarth Source: http://en.wikibooks.org/w/index.php?oldid=1596604 Contributors: Adrignola, ChrisWallace FLWOR Expression Source: http://en.wikibooks.org/w/index.php?oldid=2072182 Contributors: Adrignola, Dmccreary, 1 anonymous edits Formatting Numbers Source: http://en.wikibooks.org/w/index.php?oldid=1977763 Contributors: Adrignola, ChrisWallace, Dmccreary, Elharo, Joewiz Generating PDF from XSL-FO files Source: http://en.wikibooks.org/w/index.php?oldid=2249489 Contributors: Adrignola, ChrisWallace, Dmccreary, Joewiz, 9 anonymous edits Generating Skeleton Typeswitch Transformation Modules Source: http://en.wikibooks.org/w/index.php?oldid=2006070 Contributors: Adrignola, ChrisWallace, Joewiz Generating xqDoc-based XQuery Documentation Source: http://en.wikibooks.org/w/index.php?oldid=1746164 Contributors: Adrignola, ChrisWallace, Dmccreary Get zipped XML file Source: http://en.wikibooks.org/w/index.php?oldid=2007085 Contributors: Adrignola, ChrisWallace, Dmccreary, 1 anonymous edits Google Chart Bullet Bar Source: http://en.wikibooks.org/w/index.php?oldid=1993857 Contributors: Adrignola, Dmccreary, Joewiz Google Chart Sparkline Source: http://en.wikibooks.org/w/index.php?oldid=1995286 Contributors: ChrisWallace, Dmccreary, QuiteUnusual Google Charts Source: http://en.wikibooks.org/w/index.php?oldid=1996353 Contributors: Adrignola, Dmccreary, Fraserhore, Joewiz Graphing Triples Source: http://en.wikibooks.org/w/index.php?oldid=1582925 Contributors: Adrignola, ChrisWallace Grouping Items Source: http://en.wikibooks.org/w/index.php?oldid=1683494 Contributors: Adrignola, Dmccreary Guest Registry Source: http://en.wikibooks.org/w/index.php?oldid=2238159 Contributors: Adrignola, Cutlass2009, Dmccreary, Dominique Rabeuf, Esbanarango, Joewiz, 2 anonymous edits Higher Order Functions Source: http://en.wikibooks.org/w/index.php?oldid=1582928 Contributors: Adrignola, ChrisWallace, Dmccreary, 3 anonymous edits Histogram of File Sizes Source: http://en.wikibooks.org/w/index.php?oldid=1582929 Contributors: Adrignola, Dmccreary, Joewiz

Image Library Source: http://en.wikibooks.org/w/index.php?oldid=1582930 Contributors: Adrignola, ChrisWallace, Dmccreary, 1 anonymous edits Incremental Searching Source: http://en.wikibooks.org/w/index.php?oldid=1910067 Contributors: Adrignola, ChrisWallace, Dmccreary, 2 anonymous edits Index of Application Areas Source: http://en.wikibooks.org/w/index.php?oldid=1582933 Contributors: Adrignola, ChrisWallace Index of eXist modules and features Source: http://en.wikibooks.org/w/index.php?oldid=1582936 Contributors: Adrignola, ChrisWallace Index of XQuery features Source: http://en.wikibooks.org/w/index.php?oldid=1582934 Contributors: Adrignola, ChrisWallace Inserting and Updating Attributes Source: http://en.wikibooks.org/w/index.php?oldid=1941883 Contributors: Adrignola, Dmccreary, Joewiz, 1 anonymous edits Installing and Testing Source: http://en.wikibooks.org/w/index.php?oldid=1582937 Contributors: Adrignola, Dmccreary, SunKing2 Installing the XSL-FO module Source: http://en.wikibooks.org/w/index.php?oldid=2249609 Contributors: ChrisWallace, Dmccreary, Jens stergaard Petersen, Thenub314, 9 anonymous edits Introduction to XML Search Source: http://en.wikibooks.org/w/index.php?oldid=2076756 Contributors: Adrignola, Dmccreary, Jens stergaard Petersen Keyword Search Source: http://en.wikibooks.org/w/index.php?oldid=1901571 Contributors: Adrignola, Dmccreary, Jens stergaard Petersen, Joewiz, 3 anonymous edits Latent Semantic Indexing Source: http://en.wikibooks.org/w/index.php?oldid=1582940 Contributors: Adrignola, Dmccreary, 1 anonymous edits Limiting Child Trees Source: http://en.wikibooks.org/w/index.php?oldid=1993396 Contributors: Adrignola, Dmccreary Link gathering Source: http://en.wikibooks.org/w/index.php?oldid=1582942 Contributors: Adrignola, ChrisWallace, Dmccreary List OWL Classes Source: http://en.wikibooks.org/w/index.php?oldid=1898692 Contributors: Adrignola, Dmccreary, 1 anonymous edits Login and Logout Source: http://en.wikibooks.org/w/index.php?oldid=2023529 Contributors: Dmccreary, J36miles Lorum Ipsum text Source: http://en.wikibooks.org/w/index.php?oldid=1667375 Contributors: Adrignola, ChrisWallace Lucene Search Source: http://en.wikibooks.org/w/index.php?oldid=2238480 Contributors: Adrignola, Dmccreary, Jens stergaard Petersen, Joewiz, Ljo, QuiteUnusual, Ron.vandenbranden, 26 anonymous edits Multiple page scraping and Voting behaviour Source: http://en.wikibooks.org/w/index.php?oldid=1582946 Contributors: Adrignola, ChrisWallace, 2 anonymous edits MusicXML to Arduino Source: http://en.wikibooks.org/w/index.php?oldid=1986517 Contributors: Adrignola, ChrisWallace Naming Conventions Source: http://en.wikibooks.org/w/index.php?oldid=2238174 Contributors: Adrignola, ChrisWallace, Dmccreary, 1 anonymous edits Navigating Collections Source: http://en.wikibooks.org/w/index.php?oldid=1694963 Contributors: Adrignola, ChrisWallace, Dmccreary OAuth Source: http://en.wikibooks.org/w/index.php?oldid=1940113 Contributors: Adrignola, Dmccreary Open Search Source: http://en.wikibooks.org/w/index.php?oldid=1991230 Contributors: Adrignola, Dmccreary Overview of eXist search functions and operators Source: http://en.wikibooks.org/w/index.php?oldid=1965963 Contributors: Adrignola, Ron.vandenbranden, 2 anonymous edits Overview of Page Scraping Techniques Source: http://en.wikibooks.org/w/index.php?oldid=1667176 Contributors: Adrignola, Dmccreary, Joewiz Pachube feed Source: http://en.wikibooks.org/w/index.php?oldid=1696871 Contributors: Adrignola, ChrisWallace, Dmccreary Publishing Overview 
Source: http://en.wikibooks.org/w/index.php?oldid=1722604 Contributors: Adrignola, Dmccreary Publishing to Subversion Source: http://en.wikibooks.org/w/index.php?oldid=1795020 Contributors: Adrignola, Dmccreary Quantified Expressions Source: http://en.wikibooks.org/w/index.php?oldid=2202178 Contributors: Adrignola, Dmccreary, Jens stergaard Petersen, 2 anonymous edits Registered Functions Source: http://en.wikibooks.org/w/index.php?oldid=1737526 Contributors: Adrignola, ChrisWallace, Dmccreary Registered Modules Source: http://en.wikibooks.org/w/index.php?oldid=1737524 Contributors: Adrignola, ChrisWallace, Dmccreary Regular Expressions Source: http://en.wikibooks.org/w/index.php?oldid=1665868 Contributors: Adrignola, ChrisWallace, Dmccreary, Jens stergaard Petersen, 3 anonymous edits REST interface definition Source: http://en.wikibooks.org/w/index.php?oldid=1582950 Contributors: Adrignola, ChrisWallace, 1 anonymous edits Returning the Longest String Source: http://en.wikibooks.org/w/index.php?oldid=1694964 Contributors: Adrignola, ChrisWallace, Dmccreary, Jens stergaard Petersen, 3 anonymous edits Saving and Updating Data Source: http://en.wikibooks.org/w/index.php?oldid=1942001 Contributors: ChrisWallace, Dmccreary, Jens stergaard Petersen, Mike.lifeguard, QuiteUnusual, 4 anonymous edits Searching multiple collections Source: http://en.wikibooks.org/w/index.php?oldid=1651866 Contributors: Adrignola, Dmccreary, Jens stergaard Petersen Sending E-mail Source: http://en.wikibooks.org/w/index.php?oldid=1744479 Contributors: Adrignola, Dmccreary, Joewiz, 1 anonymous edits Sequences Source: http://en.wikibooks.org/w/index.php?oldid=2258015 Contributors: Adrignola, Billymac00, ChrisWallace, Dmccreary, Jens stergaard Petersen, Krbeesley, Nsincaglia, Rumplestiltzkin, Schmorgluck, 15 anonymous edits Sequences Module Source: http://en.wikibooks.org/w/index.php?oldid=2082520 Contributors: Dmccreary, Thenub314 Setting HTTP Headers Source: http://en.wikibooks.org/w/index.php?oldid=2095486 Contributors: Adrignola, 1 anonymous edits Simile Exhibit Source: http://en.wikibooks.org/w/index.php?oldid=1582953 Contributors: Adrignola, ChrisWallace, Dmccreary Sitemap for Content Management System Source: http://en.wikibooks.org/w/index.php?oldid=2022030 Contributors: Adrignola, Avicennasis, Dmccreary, Elharo, Hullo Slideshow Source: http://en.wikibooks.org/w/index.php?oldid=1582956 Contributors: Adrignola, ChrisWallace, 1 anonymous edits SMS tracker Source: http://en.wikibooks.org/w/index.php?oldid=1227485 Contributors: ChrisWallace, Ramac Southampton Pubs Source: http://en.wikibooks.org/w/index.php?oldid=2174873 Contributors: Adrignola, ChrisWallace SPARQLing Country Calling Codes Source: http://en.wikibooks.org/w/index.php?oldid=1695745 Contributors: Adrignola, ChrisWallace Special Characters Source: http://en.wikibooks.org/w/index.php?oldid=2012166 Contributors: Adrignola, Dmccreary Splitting Files Source: http://en.wikibooks.org/w/index.php?oldid=2130333 Contributors: Adrignola, Billymac00, Dmccreary, Tomato86, 12 anonymous edits Subversion Source: http://en.wikibooks.org/w/index.php?oldid=2264267 Contributors: Dmccreary, Joewiz, QuiteUnusual, 10 anonymous edits

Sudoku Source: http://en.wikibooks.org/w/index.php?oldid=2173562 Contributors: ChrisWallace, David1981, Frigotoni, Frozen Wind, PotHead, 4 anonymous edits Synchronizing Remote Collections Source: http://en.wikibooks.org/w/index.php?oldid=2011771 Contributors: Adrignola, Dmccreary, Westbaystars TEI Concordance Source: http://en.wikibooks.org/w/index.php?oldid=1983972 Contributors: Adrignola, Stuartyeates TEI Document Timeline Source: http://en.wikibooks.org/w/index.php?oldid=2088507 Contributors: Adrignola, ChrisWallace, Dmccreary, Stuartyeates, 2 anonymous edits The Emp-Dept case study Source: http://en.wikibooks.org/w/index.php?oldid=1650020 Contributors: Adrignola, ChrisWallace Time Based Queries Source: http://en.wikibooks.org/w/index.php?oldid=2174872 Contributors: Adrignola, Dmccreary, 2 anonymous edits Time Comparison with XQuery Source: http://en.wikibooks.org/w/index.php?oldid=1582961 Contributors: Adrignola, ChrisWallace, Dmccreary Timelines of Resource Source: http://en.wikibooks.org/w/index.php?oldid=1582962 Contributors: Adrignola, Dmccreary Timing Fibonacci algorithms Source: http://en.wikibooks.org/w/index.php?oldid=2234426 Contributors: Adrignola, ChrisWallace, Dmccreary, Joewiz Transformation idioms Source: http://en.wikibooks.org/w/index.php?oldid=2006169 Contributors: Adrignola, ChrisWallace Typeswitch Transformations Source: http://en.wikibooks.org/w/index.php?oldid=2006340 Contributors: Adrignola, ChrisWallace, Dmccreary, Joewiz, 6 anonymous edits UK shipping forecast Source: http://en.wikibooks.org/w/index.php?oldid=1694203 Contributors: Adrignola, ChrisWallace, QuiteUnusual Unzipping an Office Open XML docx file Source: http://en.wikibooks.org/w/index.php?oldid=2006822 Contributors: Adrignola, Dmccreary Updates and Namespaces Source: http://en.wikibooks.org/w/index.php?oldid=2095651 Contributors: Adrignola, Dmccreary, Jens stergaard Petersen, 1 anonymous edits Uploading Files Source: http://en.wikibooks.org/w/index.php?oldid=1966608 Contributors: Adrignola, Dmccreary Uptime monitor Source: http://en.wikibooks.org/w/index.php?oldid=1716961 Contributors: Adrignola, ChrisWallace, Dmccreary, Jomegat, Mazonakis, Mike.lifeguard, Str4nd, 5 anonymous edits URL Driven Authorization Source: http://en.wikibooks.org/w/index.php?oldid=2022335 Contributors: Adrignola, Dmccreary URL Rewriting Basics Source: http://en.wikibooks.org/w/index.php?oldid=2203578 Contributors: Adrignola, Dmccreary, Elharo, Jens stergaard Petersen, Joewiz, 11 anonymous edits Using Intermediate Documents Source: http://en.wikibooks.org/w/index.php?oldid=1582969 Contributors: Adrignola, ChrisWallace, Dmccreary, 1 anonymous edits Using Triggers to assign identifiers Source: http://en.wikibooks.org/w/index.php?oldid=1938047 Contributors: Dmccreary, 17 anonymous edits Using Triggers to Log Events Source: http://en.wikibooks.org/w/index.php?oldid=1933649 Contributors: Adrignola, Dmccreary, 2 anonymous edits Using XQuery Functions Source: http://en.wikibooks.org/w/index.php?oldid=1665386 Contributors: Adrignola, Dmccreary, Jens stergaard Petersen UWE StudentsOnline Source: http://en.wikibooks.org/w/index.php?oldid=1582966 Contributors: Adrignola, ChrisWallace Validating a document Source: http://en.wikibooks.org/w/index.php?oldid=1718611 Contributors: Adrignola, Dmccreary, Elharo, 1 anonymous edits Validation using a Catalog Source: http://en.wikibooks.org/w/index.php?oldid=1582663 Contributors: Adrignola, Dmccreary Web XML Viewer Source: http://en.wikibooks.org/w/index.php?oldid=2174378 Contributors: Dmccreary, 
J36miles Wikibook list of code links Source: http://en.wikibooks.org/w/index.php?oldid=1947388 Contributors: Adrignola, ChrisWallace Wikipedia Events RSS Source: http://en.wikibooks.org/w/index.php?oldid=1582973 Contributors: Adrignola, ChrisWallace Wikipedia Page scraping Source: http://en.wikibooks.org/w/index.php?oldid=1582974 Contributors: Adrignola, ChrisWallace, Dmccreary World Temperature records Source: http://en.wikibooks.org/w/index.php?oldid=1694965 Contributors: Adrignola, ChrisWallace XHTML + Voice Source: http://en.wikibooks.org/w/index.php?oldid=1582975 Contributors: Adrignola, ChrisWallace, Dmccreary XML Differences Source: http://en.wikibooks.org/w/index.php?oldid=2238148 Contributors: Adrignola, Avicennasis, Dmccreary, 1 anonymous edits XML Schema to Instance Source: http://en.wikibooks.org/w/index.php?oldid=2000921 Contributors: Adrignola, ChrisWallace, Dmccreary, Fraserhore XML Schema to SVG Source: http://en.wikibooks.org/w/index.php?oldid=2158419 Contributors: Adrignola, Dmccreary, Jens stergaard Petersen XML Schema to XForms Source: http://en.wikibooks.org/w/index.php?oldid=2004984 Contributors: Dmccreary, J36miles XMP data Source: http://en.wikibooks.org/w/index.php?oldid=1755649 Contributors: ChrisWallace, Thenub314 XQuery SQL Module Source: http://en.wikibooks.org/w/index.php?oldid=2082521 Contributors: Dmccreary, Thenub314 Adder Source: http://en.wikibooks.org/w/index.php?oldid=2007036 Contributors: ChrisWallace, Dmccreary, Herbythyme, Jomegat, Jorunn, QuiteUnusual, Ramac, Wutsje, 16 anonymous edits Ah-has Source: http://en.wikibooks.org/w/index.php?oldid=2160954 Contributors: ChrisWallace, Tom Morris, 4 anonymous edits Checking for Required Parameters Source: http://en.wikibooks.org/w/index.php?oldid=2248444 Contributors: ChrisWallace, Dmccreary, Jens stergaard Petersen, 3 anonymous edits Dataflow diagrams Source: http://en.wikibooks.org/w/index.php?oldid=1701967 Contributors: ChrisWallace DBpedia with SPARQL - Football teams Source: http://en.wikibooks.org/w/index.php?oldid=2020662 Contributors: ChrisWallace, 1 anonymous edits DBpedia with SPARQL and Simile Timeline - Album Chronology Source: http://en.wikibooks.org/w/index.php?oldid=1280416 Contributors: ChrisWallace Displaying data in HTML Tables Source: http://en.wikibooks.org/w/index.php?oldid=2225328 Contributors: ChrisWallace, Dmccreary, Jens stergaard Petersen, 7 anonymous edits Displaying Lists Source: http://en.wikibooks.org/w/index.php?oldid=1979890 Contributors: ChrisWallace, Dmccreary, Mike.lifeguard, Trustinme2, 4 anonymous edits Employee Search Source: http://en.wikibooks.org/w/index.php?oldid=1746984 Contributors: ChrisWallace, Jens stergaard Petersen, 3 anonymous edits Example Sequencer Source: http://en.wikibooks.org/w/index.php?oldid=1070555 Contributors: ChrisWallace Excel and XML Source: http://en.wikibooks.org/w/index.php?oldid=1213104 Contributors: ChrisWallace, Kunamvenu eXist Crib sheet Source: http://en.wikibooks.org/w/index.php?oldid=1808485 Contributors: ChrisWallace, Dmccreary, Struts, 4 anonymous edits Filtering Nodes Source: http://en.wikibooks.org/w/index.php?oldid=2259813 Contributors: ChrisWallace, Dmccreary, Jens stergaard Petersen, Joewiz, 8 anonymous edits Filtering Words Source: http://en.wikibooks.org/w/index.php?oldid=1661952 Contributors: ChrisWallace, Dmccreary, Jens stergaard Petersen, Webwurst, 1 anonymous edits

Fizzbuzz Source: http://en.wikibooks.org/w/index.php?oldid=1070556 Contributors: ChrisWallace Getting POST Data Source: http://en.wikibooks.org/w/index.php?oldid=2208350 Contributors: ChrisWallace, Dmccreary, 3 anonymous edits Getting URL Parameters Source: http://en.wikibooks.org/w/index.php?oldid=1661886 Contributors: ChrisWallace, Dmccreary, Jens stergaard Petersen Google Geocoding Source: http://en.wikibooks.org/w/index.php?oldid=1894091 Contributors: ChrisWallace, Dmccreary, J.delanoy, Jomegat, Nikai, Wutsje, 8 anonymous edits Gotchas Source: http://en.wikibooks.org/w/index.php?oldid=1635400 Contributors: ChrisWallace, Dmccreary, 4 anonymous edits Graph Visualization Source: http://en.wikibooks.org/w/index.php?oldid=2153271 Contributors: ChrisWallace, Dmccreary, Olivier speciel HelloWorld Source: http://en.wikibooks.org/w/index.php?oldid=2238156 Contributors: ChrisWallace, Dmccreary, Elharo, Xania, 15 anonymous edits HTML Table View Source: http://en.wikibooks.org/w/index.php?oldid=1635283 Contributors: ChrisWallace, Dmccreary Incremental Search of the Chemical Elements Source: http://en.wikibooks.org/w/index.php?oldid=1665886 Contributors: ChrisWallace, Dmccreary Limiting Result Sets Source: http://en.wikibooks.org/w/index.php?oldid=2213440 Contributors: ChrisWallace, Dmccreary, Fluff, Jens stergaard Petersen, Jomegat, Mike.lifeguard, Theop, YMS, 6 anonymous edits Manipulating URIs Source: http://en.wikibooks.org/w/index.php?oldid=1288320 Contributors: ChrisWallace, Dmccreary Nationalgrid and Google Maps Source: http://en.wikibooks.org/w/index.php?oldid=1471825 Contributors: ChrisWallace Net Working Days Source: http://en.wikibooks.org/w/index.php?oldid=1428654 Contributors: ChrisWallace, Dmccreary Page scraping and Yahoo Weather Source: http://en.wikibooks.org/w/index.php?oldid=1223382 Contributors: ChrisWallace, 1 anonymous edits Parsing Query Strings Source: http://en.wikibooks.org/w/index.php?oldid=1070527 Contributors: ChrisWallace, Kurt.cagle Project Euler Source: http://en.wikibooks.org/w/index.php?oldid=1222274 Contributors: ChrisWallace, Kunamvenu Searching,Paging and Sorting Source: http://en.wikibooks.org/w/index.php?oldid=2174993 Contributors: ChrisWallace, Jens stergaard Petersen, 4 anonymous edits Sequence Diagrams Source: http://en.wikibooks.org/w/index.php?oldid=1747002 Contributors: ChrisWallace Simple RSS reader Source: http://en.wikibooks.org/w/index.php?oldid=1329061 Contributors: ChrisWallace, 1 anonymous edits Simple XForms Examples Source: http://en.wikibooks.org/w/index.php?oldid=2221184 Contributors: ChrisWallace, Dmccreary, QuiteUnusual, 5 anonymous edits SPARQL interface Source: http://en.wikibooks.org/w/index.php?oldid=1130706 Contributors: ChrisWallace SPARQL Tutorial Source: http://en.wikibooks.org/w/index.php?oldid=1790381 Contributors: ChrisWallace, Jens stergaard Petersen, 7 anonymous edits String Analysis Source: http://en.wikibooks.org/w/index.php?oldid=1125880 Contributors: ChrisWallace Tag Cloud Source: http://en.wikibooks.org/w/index.php?oldid=2052467 Contributors: ChrisWallace, Dmccreary, Joewiz Topological Sort Source: http://en.wikibooks.org/w/index.php?oldid=1379026 Contributors: ChrisWallace, Dmccreary, SQL Tree View Source: http://en.wikibooks.org/w/index.php?oldid=1597161 Contributors: ChrisWallace, Dmccreary Validating a hierarchy Source: http://en.wikibooks.org/w/index.php?oldid=1070551 Contributors: ChrisWallace Wiki weapons page Source: http://en.wikibooks.org/w/index.php?oldid=1070539 Contributors: ChrisWallace Wikibook index page 
Source: http://en.wikibooks.org/w/index.php?oldid=1326029 Contributors: ChrisWallace, Dmccreary Wikipedia Lookup Source: http://en.wikibooks.org/w/index.php?oldid=1070537 Contributors: ChrisWallace XML to RDF Source: http://en.wikibooks.org/w/index.php?oldid=1747924 Contributors: ChrisWallace, Jens stergaard Petersen XML to SQL Source: http://en.wikibooks.org/w/index.php?oldid=1070533 Contributors: ChrisWallace XPath examples Source: http://en.wikibooks.org/w/index.php?oldid=2249339 Contributors: ChrisWallace, Dmccreary, 1 anonymous edits XQuery and Python Source: http://en.wikibooks.org/w/index.php?oldid=1705922 Contributors: ChrisWallace, 3 anonymous edits XQuery and XML Schema Source: http://en.wikibooks.org/w/index.php?oldid=2237018 Contributors: Chgans, ChrisWallace, Dmccreary, SQL, 3 anonymous edits XQuery and XSLT Source: http://en.wikibooks.org/w/index.php?oldid=2150029 Contributors: ChrisWallace, Dmccreary, Hullo, Jens stergaard Petersen, Joewiz, 6 anonymous edits XQuery from SQL Source: http://en.wikibooks.org/w/index.php?oldid=1931935 Contributors: ChrisWallace, Jens stergaard Petersen, 12 anonymous edits XQuery IDE Source: http://en.wikibooks.org/w/index.php?oldid=1070550 Contributors: ChrisWallace XSL-FO Images Source: http://en.wikibooks.org/w/index.php?oldid=2059397 Contributors: ChrisWallace, J36miles, 5 anonymous edits XSL-FO SVG Source: http://en.wikibooks.org/w/index.php?oldid=2142582 Contributors: Adrignola, Dmccreary, 7 anonymous edits XSL-FO Tables Source: http://en.wikibooks.org/w/index.php?oldid=2209582 Contributors: Adrignola, ChrisWallace, Dmccreary

Image Sources, Licenses and Contributors

Image:XForms-best-practice.jpg Source: http://en.wikibooks.org/w/index.php?title=File:XForms-best-practice.jpg License: unknown Contributors: Dmccreary image:advanced-search-form-screen-image.png Source: http://en.wikibooks.org/w/index.php?title=File:Advanced-search-form-screen-image.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: Dmccreary Image:XQuery-basic-search-result.png Source: http://en.wikibooks.org/w/index.php?title=File:XQuery-basic-search-result.png License: GNU Free Documentation License Contributors: Dmccreary Image:XQuery-basic-search-form.png Source: http://en.wikibooks.org/w/index.php?title=File:XQuery-basic-search-form.png License: GNU Free Documentation License Contributors: Dmccreary Image:html-diff-report-using-xquery.jpg Source: http://en.wikibooks.org/w/index.php?title=File:Html-diff-report-using-xquery.jpg License: Creative Commons Attribution-Sharealike 3.0 Contributors: Dmccreary Image:timeline-example.jpg Source: http://en.wikibooks.org/w/index.php?title=File:Timeline-example.jpg License: unknown Contributors: Original uploader was Dmccreary at en.wikibooks Image:Bullet-bar-screen-image.png Source: http://en.wikibooks.org/w/index.php?title=File:Bullet-bar-screen-image.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: Dmccreary Image:Bullet-bar-terms-key.png Source: http://en.wikibooks.org/w/index.php?title=File:Bullet-bar-terms-key.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: Dmccreary File:Search-results.png Source: http://en.wikibooks.org/w/index.php?title=File:Search-results.png License: Creative Commons Attribution-Sharealike 3.0,2.5,2.0,1.0 Contributors: Joewiz Image:XQuery-publishing-dmz.png Source: http://en.wikibooks.org/w/index.php?title=File:XQuery-publishing-dmz.png License: Creative Commons Attribution 3.0 Contributors: Dmccreary Image:XQuery-sitemap-screen-image.jpg Source: http://en.wikibooks.org/w/index.php?title=File:XQuery-sitemap-screen-image.jpg License: Creative Commons Attribution-Sharealike 2.0 Contributors: Dmccreary Image:XQuery-newer-items-report.jpg Source: http://en.wikibooks.org/w/index.php?title=File:XQuery-newer-items-report.jpg License: Creative Commons Attribution-Sharealike 3.0 Contributors: Dmccreary Image:Upload-image.png Source: http://en.wikibooks.org/w/index.php?title=File:Upload-image.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: Dmccreary Image:XQueryMonitor3.jpg Source: http://en.wikibooks.org/w/index.php?title=File:XQueryMonitor3.jpg License: Creative Commons Attribution-Sharealike 3.0,2.5,2.0,1.0 Contributors: ChrisWallace File:XML-to-HTML-Encoded.png Source: http://en.wikibooks.org/w/index.php?title=File:XML-to-HTML-Encoded.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: Dmccreary Image:XForms-HTML-Table-screen-image.jpg Source: http://en.wikibooks.org/w/index.php?title=File:XForms-HTML-Table-screen-image.jpg License: GNU Free Documentation License Contributors: Dmccreary Image:XQuery-test-for-string-in-list.jpg Source: http://en.wikibooks.org/w/index.php?title=File:XQuery-test-for-string-in-list.jpg License: Creative Commons Attribution-Sharealike 2.5 Contributors: Dmccreary File:Oxygen-xquery-screen-image.png Source: http://en.wikibooks.org/w/index.php?title=File:Oxygen-xquery-screen-image.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: Dmccreary

License
Creative Commons Attribution-Share Alike 3.0 Unported: http://creativecommons.org/licenses/by-sa/3.0/
