Articles
XQuery
Advanced Search
All Leaf Paths
All Paths
Alphabet Poster
Auto-generation of Index Config Files
Background
Basic Authentication
Basic Feedback Form
Basic Search
Basic Session Management
BBC Weather Forecast
Benefits
Caching and indexes
Chaining Web Forms
Changing Permissions on Collections and Resources
Compare two XML files
Compare with XQuery
Creating a Timeline
Creating XQuery Functions
Dates and Time
DBpedia with SPARQL - Stadium locations
Delivery Status Report
Digest Authentication
Digital Signatures
DocBook to HTML
DOJO data
Dynamic Module Loading
Examples Wanted
eXist demo server
Extracting data from XHTML files
Filling Portlets
Flickr GoogleEarth
FLWOR Expression
Formatting Numbers
Generating PDF from XSL-FO files
Generating Skeleton Typeswitch Transformation Modules
Generating xqDoc-based XQuery Documentation
Get zipped XML file
Google Chart Bullet Bar
Google Chart Sparkline
Google Charts
Graphing Triples
Grouping Items
Guest Registry
Higher Order Functions
Histogram of File Sizes
Image Library
Incremental Searching
Index of Application Areas
Index of eXist modules and features
Index of XQuery features
Inserting and Updating Attributes
Installing and Testing
Installing the XSL-FO module
Introduction to XML Search
Keyword Search
Latent Semantic Indexing
Limiting Child Trees
Link gathering
List OWL Classes
Login and Logout
Lorum Ipsum text
Lucene Search
Multiple page scraping and Voting behaviour
MusicXML to Arduino
Naming Conventions
Navigating Collections
OAuth
Open Search
Overview of eXist search functions and operators
Overview of Page Scraping Techniques
Pachube feed
Publishing Overview
Publishing to Subversion
Quantified Expressions
Registered Functions
Registered Modules
Regular Expressions
REST interface definition
Returning the Longest String
Saving and Updating Data
Searching multiple collections
Sending E-mail
Sequences
Sequences Module
Setting HTTP Headers
Simile Exhibit
Sitemap for Content Management System
Slideshow
SMS tracker
Southampton Pubs
SPARQLing Country Calling Codes
Special Characters
Splitting Files
Subversion
Sudoku
Synchronizing Remote Collections
TEI Concordance
TEI Document Timeline
The Emp-Dept case study
Time Based Queries
Time Comparison with XQuery
Timelines of Resource
Timing Fibonacci algorithms
Transformation idioms
Typeswitch Transformations
UK shipping forecast
Unzipping an Office Open XML docx file
Updates and Namespaces
Uploading Files
Uptime monitor
URL Driven Authorization
URL Rewriting Basics
Using Intermediate Documents
Using Triggers to assign identifiers
Using Triggers to Log Events
Using XQuery Functions
UWE StudentsOnline
Validating a document
Validation using a Catalog
Web XML Viewer
Wikibook list of code links
Wikipedia Events RSS
Wikipedia Page scraping
World Temperature records
XHTML + Voice
XML Differences
XML Schema to Instance
XML Schema to SVG
XML Schema to XForms
XMP data
XQuery SQL Module
Adder
Ah-has
Checking for Required Parameters
Dataflow diagrams
DBpedia with SPARQL - Football teams
DBpedia with SPARQL and Simile Timeline - Album Chronology
Displaying data in HTML Tables
Displaying Lists
Employee Search
Example Sequencer
Excel and XML
eXist Crib sheet
Filtering Nodes
Filtering Words
Fizzbuzz
Getting POST Data
Getting URL Parameters
Google Geocoding
Gotchas
Graph Visualization
HelloWorld
HTML Table View
Incremental Search of the Chemical Elements
Limiting Result Sets
Manipulating URIs
Nationalgrid and Google Maps
Net Working Days
Page scraping and Yahoo Weather
Parsing Query Strings
Project Euler
Searching,Paging and Sorting
Sequence Diagrams
Simple RSS reader
Simple XForms Examples
SPARQL interface
SPARQL Tutorial
String Analysis
Tag Cloud
Topological Sort
Tree View
Validating a hierarchy
Wiki weapons page
Wikibook index page
Wikipedia Lookup
XML to RDF
XML to SQL
XPath examples
XQuery and Python
XQuery and XML Schema
XQuery and XSLT
XQuery from SQL
XQuery IDE
XSL-FO Images
References
Article Sources and Contributors
Image Sources, Licenses and Contributors
Article Licenses
License
XQuery
XQuery Examples Collection
Welcome to the XQuery Examples Collection Wikibook! XQuery is a World Wide Web Consortium recommendation for selecting data from documents and databases.
Current Status
A new release of eXist (1.4) is currently installed and under test. Please note any problems with these examples in the discussion.
Introduction
1. Background - A brief history and motivation for the XQuery standard.
2. Benefits - Why use XQuery?
3. Installing and Testing - How to install an XQuery server.
4. Naming Conventions - Naming standards used throughout this book.
Example Scripts
Beginning Examples
Examples that do not assume knowledge of functions and modules.
1. HelloWorld - A simple test to see if XQuery is installed correctly.
2. FLWOR Expression - A basic example of how XQuery FLWOR statements work.
3. Sequences - Working with sequences is central to XQuery.
4. XPath examples - Sample XPath expressions for people new to XML and XPath.
5. Regular Expressions - Regular expressions make it easy to parse text.
6. Searching multiple collections - How to search multiple collections in a database.
7. Getting URL Parameters - How to get parameters from the URL.
8. Getting POST Data - How to get XML data posted to an XQuery.
9. Checking for Required Parameters - How to check for a required parameter using if/then/else.
10. Displaying Lists - How to take a list of values in an XML structure and return a comma-separated list.
11. Extracting data from XHTML files - How to use the doc() function to get data from XHTML pages.
12. Displaying data in HTML Tables - How to display XML data in an HTML table.
13. Limiting Result Sets - How to limit the number of records returned in an XQuery.
14. Filtering Words - How to test whether a word is on a list.
15. Saving and Updating Data - How to have a single XQuery that saves new records or updates existing records.
16. Quantified Expressions - Testing all the items in a sequence.
17. Dates and Time - Sample expressions that work with date and time values.
18. Chaining Web Forms - Passing data from one web page to another using URL parameters, sessions or cookies.
Intermediate Examples
Assumes knowledge of functions and modules.
1. Using XQuery Functions - How to read XQuery function documents and use XQuery functions
2. Creating XQuery Functions - How to create your own local XQuery functions
3. Returning the Longest String - A function to find the longest string from a list of strings
4. Net Working Days - How to calculate the number of working days between two dates
5. Tag Cloud - Counting and viewing the number of keywords
6. String Analysis - Regular expression string analysis
7. Manipulating URIs - How to get and manage URIs
8. Parsing Query Strings - Parsing query strings using alternate delimiters
9. Splitting Files - Splitting a large XML file into many smaller files
10. Filling Portlets - How to fill regions of a web page with XQuery
11. Filtering Nodes - How to use the identity transform to filter out nodes
12. Limiting Child Trees - You have a tree of information and you want to "prune" only at a specific level
13. Higher Order Functions - Passing functions as arguments to functions
14. Timing Fibonacci algorithms - A couple of Fibonacci algorithms and timing display
15. Using Intermediate Documents - Analysis of a MusicXML file
16. Formatting Numbers - Using picture formats to format numbers
17. Uploading Files - How to upload files using HTML forms
18. TEI Concordance - How to build a TEI-based concordance
Search
1. Introduction to XML Search - An overview of XML search terminology
2. Basic Search - A simple search page
3. Searching,Paging and Sorting - Searching and viewing search results
4. Keyword Search - Full text search with Google-style results
5. Employee Search - An Ajax example
6. Incremental Search of the Chemical Elements - With Ajax
7. Lucene Search - Using eXist's Lucene-based fulltext search
8. Incremental Searching - Working with a JavaScript client to perform incremental search
9. Advanced Search - Creating complex searches using multiple search fields
10. Open Search - Creating an OpenSearch file to describe your search page
11. Auto-generation of Index Config Files - Scripts to automatically generate the index configuration file
Interaction
1. Adder - Creating a web service that adds two numbers.
2. Simple XForms Examples
3. Navigating Collections - An example of an AJAX browser
4. Sending E-mail - How to send an e-mail message from within an XQuery
Paginated Reports
Unlike HTML pages, paginated reports use the concept of text flows between pages. These examples show you how to convert raw XML into high-quality PDF files suitable for printing. The examples use a markup standard called XSL-FO (XSL Formatting Objects).
1. Installing the XSL-FO module - Update your 1.4 configuration to get the current software from the Apache web site
2. Generating PDF from XSL-FO files - Converting XSL-FO to PDF files
3. XSL-FO Tables - Generating XSL-FO tables from XML files
4. XSL-FO Images - Embedding images in generated (PDF) files
Content Publishing
1. Publishing Overview - How to transfer a document from an internal intranet server to a public web site
2. Publishing to Subversion - How to transfer a document from an internal intranet to a public SVN server using SSL and digest authentication
DocBook Documents
1. DocBook to HTML
2. DocBook to PDF
3. DocBook to ePub
OpenOffice
1. OpenOffice to HTML
XML Schemas
1. XML Schema to Instance
2. XML Schema to XForms
3. XML Schema to SVG
Language Comparisons
Python
1. XQuery and Python
SQL
1. XQuery SQL Module - Calling SQL from within your XQuery
2. XQuery from SQL - Using XQuery to access a classic relational database - Employee/Department/Salary
RDF/OWL
1. List OWL Classes - A simple XQuery script that will display all the OWL classes in an OWL file
Language combination
Excel
1. Excel and XML
JavaScript
1. Navigating Collections - basic AJAX
2. Employee Search - basic AJAX
3. Incremental Search of the Chemical Elements - AJAX
4. DOJO data - basic JSON
SQL
1. XML to SQL
XHTML + Voice
1. Simple RSS reader
2. XHTML + Voice Twitter Radio for Opera
XSLT
1. XQuery and XSLT - Executing an XSLT transform from within XQuery
Data Mashups
Authentication
1. Basic Authentication - Logging in to a remote web server using HTTP Basic Authentication
2. Digest Authentication - Logging in to a remote web server using HTTP Digest Authentication
3. OAuth - A standard for protecting a set of user-owned data within a web service
Wikipedia interaction
1. Wikipedia Page scraping
2. Wikipedia Lookup
3. Wikipedia Events RSS
4. Wiki weapons page
Wikibook applications
1. Wikibook index page
2. Wikibook list of code links
Visualization
1. Graph Visualization
2. Dataflow diagrams
3. Sequence Diagrams
4. Example Sequencer - Step-by-step tutorial
Google Charts
Although the Google Charts functions only work when you are connected to the Internet, these examples show that XQuery is an ideal tool for converting XML data into charts.
1. Google Charts - Using XML and XQuery to generate Google Charts using REST
2. Google Chart Sparkline - A demonstration of how to create a chart using the Google Charts API
3. Google Chart Bullet Bar - A demonstration of how to create a dashboard bullet bar using the Google Charts API
4. Histogram of File Sizes - An XQuery report that generates a histogram of file sizes
There are also sample XForms that can be used to create front-ends in the XForms Tutorial and Cookbook [2]
Digital Dashboards
Digital dashboards are single screens that compress a great deal of information into a single web page. This section will leverage many of the Google Charts examples from the prior section.
1. Dashboard Architecture - How to design dashboards that have fast response times
Page Scraping
Page scraping is the process of extracting well-formed XML data from any HTML web page. When creating mashup applications this is also known as the harvesting process.
1. Overview of Page Scraping Techniques
2. Page scraping and Yahoo Weather
3. UK shipping forecast
4. BBC Weather Forecast
5. Page scraping and Mashup
6. Simple RSS reader
7. Multiple page scraping and Voting behaviour
8. Link gathering
Mapping
1. Google Geocoding
2. String Analysis#Location_Mapping - Mapping Car Registrations
3. Flickr GoogleEarth
4. Nationalgrid and Google Maps
5. SMS tracker
Timelines
1. Creating a Timeline - Creating a simple timeline view of events
2. Timelines of Resource - Using creation and modification dates to create timelines
3. TEI Document Timeline - Creating a timeline of all dates within a single TEI document
9. Simile Exhibit - Browser visualizations using the Simile JavaScript libraries
10. Latent Semantic Indexing - Finding the semantic distance between documents
Development Tools
1. Sitemap for Content Management System - XQuery functions can easily perform many common web site content management functions
2. Uptime monitor - Use XQuery to monitor a remote web service
3. XQuery IDE - XQuery integrated development environment
4. Image Library - Using an XQuery to preview your images
5. XML Schema to Instance - XQuery function to generate a sample XML instance from an XML Schema file (.xsd)
6. Lorum Ipsum text - Generating sample text for inserting into test page layouts
7. XQuery and XML Schema - Generating an XML instance document
8. Generating XQDocs - Automating the generation of XQuery documentation for modules and functions
9. XqUSEme [3] - Firefox extension that allows XQueries to be run, including against the loaded document (even against poorly formed, originally non-XML HTML)
Validation
1. Validating a document - Validate a document with an XML Schema
2. Validation using a Catalog - Using a Catalog file to validate documents
3. Validating a hierarchy
Path Analysis
1. All Paths - A report of all paths in a document or collection
2. All Leaf Paths - A report of all leaf paths in a document or collection
Security
1. Login and Logout - How to log users in and log them out
2. URL Driven Authorization - How to use URL rewriting to check for valid users
3. Digital Signatures - How to use a custom module to use the XML Digital Signature standards
4. Changing Permissions on Collections and Resources - How to change permissions on collections and resources
Case Studies
1. Fizzbuzz
2. Project Euler
3. Topological Sort
4. Slideshow
5. Sudoku
6. Pachube feed
7. World Temperature records - Conversion of text data formats to XML, indexing and data presentation
8. UWE StudentsOnline
Modules
compression
Function Reference [4]
1. Get zipped XML file
2. Unzipping an Office Open XML docx file - Uncompressing and storing a docx file
ftp client
This module allows you to interact with an FTP server on a remote system. It includes functions for listing, getting and putting files.
1. FTP Client
httpclient
Function Reference [5]
1. Digest Authentication
2. UK shipping forecast
lucene
Function Reference [6] Help [7]
1. Lucene Search
mail
Function Reference [8]
1. Sending E-mail
2. Basic Feedback Form
math
1. Using the Math Module
request
Function Reference [9] Function examples [10]
1. Getting URL Parameters
2. Getting POST Data
3. Checking for Required Parameters
4. Manipulating URIs
5. Parsing Query Strings
6. Adder - simple client-server interaction
scheduler
Function Reference [11] Help [12]
1. XQuery Batch Jobs
sequences
Function Reference [13]
1. Sequences Module - three additional functions (map, fold and filter)
session
Function Reference [14]
1. Basic Session Management - the basics of session management including getting and setting session variables
subversion
Function Reference [15]
1. Subversion - how to update a subversion repository from within XQuery using the subversion client
transform
Function Reference [16]
1. String Analysis
util
Function Reference [17]
1. Registered Modules : util:registered-modules()
2. Registered Functions : util:registered-functions()
3. Dynamic Module Loading : util:import-module(), util:eval()
4. Higher Order Functions : util:function(), util:call()
5. Timing Fibonacci algorithms : util:function(), util:call(), util:system-time()
6. XMP data : util:binary-doc(), util:binary-to-string(), util:parse()
7. Basic Authentication : util:string-to-binary(), httpclient:get()
validation
Function Reference [18] Help [19]
1. Validating a document
xmldb
Function Reference [20]
1. Saving and Updating Data
2. Splitting Files
xqdoc
Function Reference [21]
1. Generating xqDoc-based XQuery Documentation
xslfo
XSL-FO (XSL Formatting Objects) is a way of converting XML into PDF. Function Reference [22]
1. Installing the XSL-FO module - setting up the XSL-FO module within eXist
2. Generating PDF from XSL-FO files - generating PDF from an FO file
3. XSL-FO Tables - adding tables to your PDF
4. XSL-FO Images - adding images to your PDF
5. XSL-FO SVG - adding SVG images to your PDF
Triggers
1. Using Triggers to Log Events - how to set up a trigger to log store, update and remove events on a collection
2. Using Triggers to assign identifiers - how to use triggers to assign identifiers to new documents or new nodes
3. Sending E-mail - e-mail is one way to notify someone when a trigger has fired
XQuery Updates
1. Inserting and Updating Attributes
2. Updates and Namespaces - How updates can change serialization
URL Rewriting
1. URL Rewriting Basics - How to make your URLs look nice
General guidance
eXist Crib sheet
Appendixes
Systems that Support XQuery
Using native and hybrid XML databases that implement XQuery
1. BaseX - Native open source XML database with visual frontend
2. DataDirect XQuery - Java XQuery engine supporting relational, EDI, flat files and XML input/output
3. eXist - Open source native XML database
4. DB2 pureXML - DB2 9.1 includes the pureXML feature
5. MarkLogic Server - Commercial XML content server
6. Microsoft SQL Server 2005
7. NetKernel
8. Oracle Berkeley DB XML - Open source embedded storage management
9. Oracle XML DB - Oracle Server 11g includes the XML DB (XDB) feature
10. Sedna - Open source native XML database
11. Stylus Studio - XQuery mapping/editing/debugging, ships with Saxon (and SA) and DataDirect XQuery
12. EMC xDB - EMC Documentum xDB commercial native XML database
13. XQilla - Open source XQuery library and command line utility
14. Zorba - Open source XQuery engine, C++ implementation with C, Java, PHP, Python, Ruby library bindings and command line utility
15. Qizx - Open source and pro XQuery engine, Java implementation
Debugging XQuery
1. Gotchas - some pitfalls
2. Ah-has - some ah-ha moments
Other sources
Function Libraries
1. FunctX XQuery Function Library [23] by Priscilla Walmsley
Discussion Groups
1. XQuery General [24]
Indexes
Page index [25] - generated
Index of Application Areas - edited
Key to symbols: indicates an XQuery/Best practice
References
[1] http://creativecommons.org/licenses/by-sa/2.5/
[2] http://en.wikibooks.org/wiki/XForms/Google_Charts
[3] https://addons.mozilla.org/en-US/firefox/addon/5515
[4] http://demo.exist-db.org/exist/functions/compression
[5] http://demo.exist-db.org/exist/functions/httpclient
[6] http://demo.exist-db.org/exist/functions/lucene
[7] http://exist-db.org/lucene.html
[8] http://demo.exist-db.org/exist/functions/mail
[9] http://demo.exist-db.org/exist/functions/request
[10] http://www.cems.uwe.ac.uk/xmlwiki/eXist/request/requestProperties.xq?a=6&b=7#xxxxx
[11] http://demo.exist-db.org/exist/functions/scheduler
[12] http://www.exist-db.org/scheduler.html
[13] http://demo.exist-db.org/exist/functions/sequences
[14] http://demo.exist-db.org/exist/functions/session
[15] http://demo.exist-db.org/exist/functions/svn
[16] http://demo.exist-db.org/exist/functions/transform
[17] http://demo.exist-db.org/exist/functions/util
[18] http://demo.exist-db.org/exist/functions/validation
[19] http://www.exist-db.org/validation.html
[20] http://demo.exist-db.org/exist/functions/xmldb
[21] http://demo.exist-db.org/exist/functions/xqdoc
[22] http://demo.exist-db.org/exist/functions/xslfo
[23] http://www.xqueryfunctions.com/xq/
[24] http://news.gmane.org/gmane.text.xml.xquery.general
[25] http://www.cems.uwe.ac.uk/xmlwiki/util/wikiindex.xq?book=XQuery
Advanced Search
Motivation
You have multiple fields that you would like to search on. You want to allow users to optionally search on specific fields and perform a boolean "AND" when multiple fields are used. For example you may have a database of people. Each person has a first name, last name, e-mail and phone. You want to allow users to search on any single field or multiple fields together. If two fields are entered only records that match both fields will be returned.
Method
We will use a standard HTML form with multiple input and selection fields. For each incoming search request we check every parameter; for each parameter that is not empty we append a predicate to a single query string, and then evaluate the assembled query using the util:eval() function.
Background on Predicates
If you have a single "where clause" (called a predicate), you can always append this predicate to the end of an XPath expression. For example, the following FLWOR expression will return all person records in the system:
for $person in collection('/db/apps/directory')//person
return $person
You can now restrict this to only include faculty by adding a predicate:
for $person in collection('/db/apps/directory')//person[type='faculty']
return $person
You can now search for all faculty with a first name of "mark" by just adding an additional predicate:
for $person in collection('/db/apps/directory')//person[type='faculty'][firstName='mark']
return $person
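As a small self-contained illustration of the idea above, the sketch below applies the same kind of chained predicates to an inline sample document instead of the /db/apps/directory collection. The sample data and the variable names are made up for this illustration only.

```xquery
xquery version "1.0";

(: Hypothetical sample data standing in for the directory collection :)
let $people :=
    <people>
        <person><type>faculty</type><firstName>mark</firstName></person>
        <person><type>faculty</type><firstName>ann</firstName></person>
        <person><type>staff</type><firstName>mark</firstName></person>
    </people>

(: A where clause that combines two tests with "and" ... :)
let $with-where :=
    for $person in $people/person
    where $person/type = 'faculty' and $person/firstName = 'mark'
    return $person

(: ... selects the same nodes as two predicates chained on the path :)
let $with-predicates := $people/person[type='faculty'][firstName='mark']

return
    <result where="{count($with-where)}" predicates="{count($with-predicates)}"/>
```

With this sample data both counts should be 1, since only the first person is both faculty and named "mark".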
<option value="student">Students</option>
</select>
<br/>
<input type="submit" name="Submit"/>
</form>
When the user enters "John" in the first name field, selects a type of "staff" and presses the submit button, the form creates a URL like the following:
advanced-search.xq?firstname=John&lastname=&email=&phone=&type=staff&Submit=Submit+Query
Note that most of the fields are null. Only firstname and type have a value to the right of the equal sign.
Note that each of the incoming parameters is first converted to lowercase before any comparisons are done.
For the firstname, lastname, and email we are comparing the incoming parameter with the lowercase string in the XML file. With the phone number we are using the contains() function to return all records that have the string somewhere in the phone number. The type is using an exact match since both the case of the data and the keyword are known precisely.
The most challenging aspect of this program is learning how to get the order of the quotes correct. In general, use single quotes for enclosing static strings unless the string itself must contain a single quote; in that case use double quotes. The most difficult part is to assemble a string such as [type/text() = 'staff'] and to remember to put the single quotes around the word staff. If you can figure this out the rest will be easy. If you are having trouble you can also break the concat into multiple lines:
concat(
'[type/text()',
" = '",
$type,
"']"
)
where each line must clearly start and end with the same type of quote.
The query with just a firstname and type would then look like this:
collection('/db/apps/directory/data')//person[lower-case(firstname/text()) = 'john'][lower-case(type/text()) = 'faculty']
Note that some advanced systems will reorder the predicates so that the one most likely to narrow the search comes first. Since there are fewer records with the first name John than there are faculty, it is generally more efficient to put the first name before the type. This means that fewer nodes need to be moved from hard disk into RAM and the query can execute much faster.
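The assembly of such a query string can be sketched as follows. This is a minimal illustration rather than the original script: the helper local:predicate and the hard-coded parameter values are assumptions made here, and in the real page the values would come from request:get-parameter() and the finished string would be passed to util:eval().

```xquery
xquery version "1.0";

(: Hypothetical helper: builds one predicate for a field, or returns the
   empty string when the parameter was left blank on the form. :)
declare function local:predicate($field as xs:string, $value as xs:string?) as xs:string {
    if (normalize-space($value) = '')
    then ''
    else concat(
        '[lower-case(', $field, "/text()) = '",
        (: double any apostrophe so the value cannot break out of the literal :)
        replace(lower-case($value), "'", "''"),
        "']"
    )
};

(: Stand-ins for the URL parameters shown earlier :)
let $firstname := 'John'
let $lastname  := ''
let $type      := 'staff'

let $query := concat(
    "collection('/db/apps/directory/data')//person",
    local:predicate('firstname', $firstname),
    local:predicate('lastname', $lastname),
    local:predicate('type', $type)
)
return $query
```

With these values the result should be a string with two predicates, one for firstname and one for type, because the blank lastname contributes nothing. Doubling apostrophes is one simple way to keep a name such as O'Brien from producing an invalid query.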
for $person in $persons
let $id := $person/id/text()
let $lastname := $person/lastname/text()
let $firstname := $person/firstname/text()
order by $lastname, $firstname
return
    <div class="hit">
        <a href="../views/view-item.xq?id={$id}">
            {$lastname}, {$person//firstname/text()} {' '} {$person/type/text()}
        </a>
    </div>
NGram Searching
In your conf.xml file, make sure the following line is uncommented:
<module uri="http://exist-db.org/xquery/ngram" class="org.exist.xquery.modules.ngram.NGramModule" />
The NGram elements in your collection.xconf file are described on this page: NGram Configuration File [1]. After you edit the file, reindex the collection. You can then use any of the following functions: NGram Functions [2]
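A minimal sketch of a substring search with the NGram module is shown below. It assumes that an NGram index on the lastname element has already been configured in collection.xconf and that the collection has been reindexed; the search string 'alm' is only an illustration.

```xquery
xquery version "1.0";

(: Same module URI as in the conf.xml line above :)
import module namespace ngram = "http://exist-db.org/xquery/ngram";

(: Assumption: lastname carries an NGram index in collection.xconf :)
for $person in collection('/db/apps/directory/data')//person[ngram:contains(lastname, 'alm')]
order by $person/lastname
return $person
```

This should return every person whose last name contains the characters "alm" anywhere in the string, which is the kind of partial match the phone number search above performs with contains().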
Acknowledgments
This example has been provided by Eric Palmer and his staff at the University of Richmond, USA.
References
[1] http://exist-db.org/ngram.html
[2] http://demo.exist-db.org/exist/functions/ngram
Method
We will use the functx leaf-elements() function:
functx:leaf-elements($nodes*) xs:string*
This function takes one or more nodes as input and returns a sequence of strings.
Example Output
For the demo play Hamlet that is included in the eXist demo set, the file /db/shakespeare/plays/hamlet.xml will generate the following output:
PLAY
TITLE
FM
P
PERSONAE
PERSONA
PGROUP
GRPDESCR
SCNDESCR
PLAYSUBT
ACT
SCENE
STAGEDIR
SPEECH
SPEAKER
LINE
This query uses the descendant-or-self::* axis step with the predicate [not(*)] to select only elements that do not have child elements.
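The effect of the [not(*)] predicate can be seen in the following small sketch. It uses a made-up inline fragment rather than the Hamlet file, so that it can be run without any stored documents.

```xquery
xquery version "1.0";

(: Hypothetical miniature play used only for this illustration :)
let $doc :=
    <PLAY>
        <TITLE>Hamlet</TITLE>
        <ACT>
            <TITLE>ACT I</TITLE>
            <SCENE>
                <SPEECH>
                    <SPEAKER>BERNARDO</SPEAKER>
                    <LINE>Who's there?</LINE>
                </SPEECH>
            </SCENE>
        </ACT>
    </PLAY>

(: keep only elements without child elements, then list their distinct names :)
return distinct-values($doc/descendant-or-self::*[not(*)]/local-name(.))
```

For this fragment the leaf names should be TITLE, SPEAKER and LINE. PLAY, ACT, SCENE and SPEECH are left out because each of them contains child elements.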
Example XQuery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

declare function functx:distinct-element-names($nodes as node()*) as xs:string* {
    distinct-values($nodes/descendant-or-self::*/local-name(.))
};

let $doc := doc('/db/shakespeare/plays/hamlet.xml')
let $distinct-element-names := functx:distinct-element-names($doc)
let $distinct-element-names-count := count($distinct-element-names)
return
    <ol>{
        for $distinct-element-name in $distinct-element-names
        order by $distinct-element-name
        return <li>{$distinct-element-name}</li>
    }</ol>
Adding Attributes
You can also run a query that will get all the distinct attributes. Attributes are all considered leaf data types since they can never have child elements.
declare function functx:distinct-attribute-names($nodes as node()*) as xs:string* {
    distinct-values($nodes//@*/name(.))
};
This query says in effect "get all the distinct attribute names in the input nodes". For the MODS demo file:
doc('/db/mods/01c73f2b05650de2e6124d9d113f40be.xml')
you will get the following attributes:
1. type
2. encoding
3. authority
References
Documentation [1] on xqueryfunctions.com web site.
References
[1] http://www.xqueryfunctions.com/xq/functx_leaf-elements.html
All Paths
Motivation
You want to generate a list of all unique path expressions in a document. This process is very useful for quickly getting familiar with a new data set. It is also important for making sure that your document-style transforms are accessing all the elements. This process can also be used as a basis for generating index files for a new data set.
Example Output
The list of unique paths for a sample file from the Shakespeare demos on the eXist demo system at /db/shakespeare/plays/hamlet.xml would be:
PLAY
PLAY/TITLE
PLAY/FM
PLAY/FM/P
PLAY/PERSONAE
PLAY/PERSONAE/TITLE
PLAY/PERSONAE/PERSONA
PLAY/PERSONAE/PGROUP
PLAY/PERSONAE/PGROUP/PERSONA
PLAY/PERSONAE/PGROUP/GRPDESCR
PLAY/SCNDESCR
PLAY/PLAYSUBT
PLAY/ACT
PLAY/ACT/TITLE
PLAY/ACT/SCENE
PLAY/ACT/SCENE/TITLE
PLAY/ACT/SCENE/STAGEDIR
PLAY/ACT/SCENE/SPEECH
PLAY/ACT/SCENE/SPEECH/SPEAKER
PLAY/ACT/SCENE/SPEECH/LINE
PLAY/ACT/SCENE/SPEECH/STAGEDIR
PLAY/ACT/SCENE/SPEECH/LINE/STAGEDIR
Note that these path expressions are sorted in document order, that is, the order in which each path first appeared in the document. So you can see that the cast list in PERSONAE appears before the ACT/SCENE elements. The output can also be sorted in alphabetical order.
Method
We will use the functx libraries. In particular the function: functx:distinct-element-paths($nodes) takes as its input a node and returns a sequence of strings of the path expressions. See Documentation on xqueryfunctions.com [1]
distinct-element-paths function
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

declare function functx:path-to-node($nodes as node()*) as xs:string* {
    $nodes/string-join(ancestor-or-self::*/name(.), '/')
};

declare function functx:distinct-element-paths($nodes as node()*) as xs:string* {
    distinct-values(functx:path-to-node($nodes/descendant-or-self::*))
};

declare function functx:sort($seq as item()*) as item()* {
    for $item in $seq
    order by $item
    return $item
};

let $in-xml := collection("NAMEOFCOLLECTION")
return functx:sort(functx:distinct-element-paths($in-xml))
The heart of this query is the single expression:
ancestor-or-self::*/name(.)
which says in effect "get me the names of this element and of all its ancestors, in document order". Joining those names with "/" gives the path to the element. The next step is to reduce the paths of all the elements to a list of distinct element paths. This is done by the function functx:distinct-element-paths().
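To see the path construction in isolation, the following sketch applies the same expression to a small inline fragment. The function is declared in the local namespace here and the sample document is invented for the illustration; only the body of the function is taken from the listing above.

```xquery
xquery version "1.0";

(: Same body as functx:path-to-node above, declared locally for this sketch :)
declare function local:path-to-node($nodes as node()*) as xs:string* {
    $nodes/string-join(ancestor-or-self::*/name(.), '/')
};

(: Hypothetical miniature document :)
let $doc :=
    <PLAY>
        <TITLE>Hamlet</TITLE>
        <ACT>
            <TITLE>ACT I</TITLE>
            <SCENE><TITLE>SCENE I</TITLE></SCENE>
        </ACT>
    </PLAY>

return distinct-values(local:path-to-node($doc/descendant-or-self::*))
```

For this fragment the query should yield six paths: PLAY, PLAY/TITLE, PLAY/ACT, PLAY/ACT/TITLE, PLAY/ACT/SCENE and PLAY/ACT/SCENE/TITLE. The three TITLE elements produce three different paths because each has a different chain of ancestors.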
Acknowledgments
David Elwell posted this suggestion on the open-exist list on July 22 of 2010
References
[1] http://www.xqueryfunctions.com/xq/functx_distinct-element-paths.html
Alphabet Poster
This toy programme creates alphabet posters using images from Wikipedia, located via dbpedia. It is described in a blog entry [1]
Script
(: This script creates a picture alphabet based on a list of words.
   @parameter title - The title string for the poster
   @parameter alphabet - list of comma-separated words, unordered
   @parameter cols - the number of columns in the table layout
   @parameter action : poster - generate the poster, editor - generate the editor for the data
   @author Chris Wallace
   @date 2008-10-22
:)
declare variable $alphabet := request:get-parameter("alphabet","Ant,Bat");
declare variable $words := tokenize(normalize-space($alphabet)," *, *");
declare variable $title := request:get-parameter("title","Charlie's Animal Alphabet");

declare variable $query := "
PREFIX : <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT * WHERE {
  :Hedgehog foaf:depiction ?img.
}
";
as xs:string {
declare function local:cell($animal as xs:string, $picture as xs:string) as element(td) {
    let $letter := substring($animal,1,1)
    return
        <td class="cell" valign="top">
            <span class="letter">{$letter} </span> is for
            <div>
                <a href="http://en.wikipedia.org/wiki/{$animal}">
                    <img src="{$picture}" alt="{$animal}" title="{$animal}" border="0" />
                </a>
            </div>
            <span class="word"> {$animal} </span>
        </td>
};
declare function local:poster() as element(div) {
    <div>
        <h1>{$title} </h1>
        {
        let $letters :=
            for $animal in $words
            let $picture := local:picture($animal)
            order by $animal
            return local:cell($animal,$picture)
        let $nrows := xs:integer(ceiling(count($letters) div $cols))
        return
            <table>
                {for $row in (1 to $nrows)
                 return
                    <tr>
                        {for $col in (1 to $cols)
                         let $letter := $letters[position() = ($row - 1) * $cols + $col]
                         return
                            if ($letter) then $letter else <td> </td>
                        }
                    </tr>
                }
            </table>
        }
    </div>
};
declare function local:editor() as element(form) {
  <form action="alphabet.xq" method="get">
    <input type="hidden" name="action" value="poster"/>
    <div><label for="title">Title of Alphabet</label><input type="text" name="title" value="{$title}" size="50"/></div>
    <div><label for="cols">Number of Columns</label><input type="text" name="cols" value="{$cols}" size="2"/></div>
    <div>
      <label for="alphabet">Alphabet words, unordered, separated by , </label><br/>
      <textarea name="alphabet" cols="80" rows="5"> {$alphabet} </textarea>
    </div>
    <input type="submit" value="Create Alphabet Poster"/>
  </form>
};
<html>
  <head>
    <title>Alphabet Poster - {$action}</title>
    <style>
      <![CDATA[
      body {font-family:Comic Sans MS;}
      div.cell {margin: 0 5px 10px 0; }
      span.letter {font-size:200%;}
      span.word {display:none;}
      ]]>
    </style>
    <style media="print">
      <![CDATA[
      .nav {display:none}
      span.word {display:block; font-size:120%;font-family:Comic Sans MS; }
      ]]>
    </style>
  </head>
  <body>
    {
      if ($action = "poster")
      then (
        <span class="nav"><a href="alphabet.xq?alphabet={string-join($words,", ")}&amp;title={$title}&amp;cols={$cols}&amp;action=edit"> [edit]</a></span>,
        local:poster()
      )
      else if ($action="edit")
References
[1] http://thewallaceline.blogspot.com/2008/10/grandson-charlie-age-nearly-6-rang.html
Index Types
There are several types of indexes you may want to create. Range indexes are very useful when you have identifiers or you want to sort results based on element content. Fulltext indexes are most frequently used for natural-language text that contains full sentences with punctuation.
FullText Indexes
The following is some example code on how one might do this. Lucene fulltext indexes are most useful when they index full sentences. One approach is to scan an instance document for full sentences, looking for longer strings with punctuation. Although a full implementation would involve a "Natural Language Processor" library such as Apache UIMA, we can begin with some very simple rules. Here are some sample steps in the process for non-mixed-text content. Mixed text can also be handled, but the steps are more complex:
1. get a list of all elements in a sample index file
2. classify the elements according to whether they have simple or complex content
3. if they have simple content, look for sentences (spaces and punctuation)
4. for each element that has fulltext, create a Lucene index
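The classification steps above can be sketched in plain XQuery. This is only a minimal illustration: the helper local:looks-like-sentence, its word-count threshold, and the inline sample document are assumptions made for the example, not part of the original script.

```xquery
xquery version "1.0";

(: Hypothetical helper: treat text as a "sentence" when it has at least
   four space-separated words and ends with sentence punctuation.
   Both rules are assumptions chosen for this sketch. :)
declare function local:looks-like-sentence($text as xs:string) as xs:boolean {
  let $t := normalize-space($text)
  return count(tokenize($t, ' ')) ge 4 and matches($t, '[.!?]$')
};

(: Assumed sample instance document :)
let $sample :=
  <doc>
    <id>A-123</id>
    <title>Short title</title>
    <para>This paragraph holds a full sentence with punctuation.</para>
    <para>Here is another sentence, also written in prose.</para>
  </doc>
(: Step 2: elements with simple content have no child elements :)
let $simple := $sample//*[not(*)]
(: Steps 3 and 4: emit one Lucene text entry per element name that holds sentences :)
for $name in distinct-values($simple[local:looks-like-sentence(string(.))]/local-name(.))
return <text qname="{$name}"/>
```

With this sample only the para element qualifies, so a single text entry is produced; identifiers and short titles are left to range indexes.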
Auto-generation of Index Config Files
for $ns in $defaultNamespaces
return concat(' xmlns:ns',index-of($defaultNamespaces,$ns),$eq,$qt,$ns,$qt)
let $index3 := "><fulltext default='none' attribute='no'/><lucene><analyzer class='org.apache.lucene.analysis.standard.StandardAnalyzer'/><analyzer id='ws' class='org.apache.lucene.analysis.WhitespaceAnalyzer'/><text qname='foo'/>"
let $index4 :=
  for $ns in $defaultNamespaces
  let $prefix := concat('ns',index-of($defaultNamespaces,$ns))
  return concat('<text qname=',$qt,$prefix,':foo',$qt,'/>')
let $index5 := "</lucene></index></collection>"
let $index := util:parse(string-join(($index1,$index2,$index3,$index4,$index5),""))
let $status := xmldb:store($indexLocation,"collection.xconf",$index)
let $result := xmldb:reindex($dataLocation)
Background
XQuery and Functional Programming
XQuery is an example of a functional programming language. As in other functional languages, XQuery variables are immutable, meaning that you can set them once but never change them after that. XQuery functions do not have "side effects", meaning that they do not change data that is not specifically passed to them. Functional programming has gained popularity with the rise of the MapReduce algorithms popularized by Google. Google's ability to leverage tens of thousands of CPUs in its data centers has shown that functional languages are in many ways superior to procedural languages. But many of the benefits of functional programming go back to mathematical formalisms of the 1930s, including the lambda calculus and the μ-recursive functions. Although the XQuery 1.0 W3C specification does not allow a function to be passed as an argument to a function, most implementations such as eXist support this, so technically the eXist implementation of XQuery is a true functional language but the W3C standard is not. However, XQuery 1.1 is to allow function items as data [1]. A history of functional programming is available at Functional Programming [2]. This article has an excellent historical background on functional programming and explains why functional programs are ideal for a server environment where reliability is critical.
References
W3C Papers from 1998 on XML Query Languages [4]
[1] http://www.w3.org/TR/xquery-11/#id-inline-func
[2] http://en.wikibooks.org/wiki/Computer_programming/Functional_programming
[3] http://www.w3.org/TandS/QL/QL98/Overview.html
[4] http://www.w3.org/TandS/QL/QL98/pp.html
Basic Authentication
Motivation
You want to use a very basic login process over a secure network such as a secure Intranet or over an SSL connection.
Method
We will use the base64 encoding and decoding tools to generate the right strings.
xquery version "1.0";
let $user := 'Aladdin'
let $password := 'open sesame'
let $credentials := concat($user, ':', $password)
let $encode := util:string-to-binary($credentials)
return
<results>
  <user>{$user}</user>
  <password>{$password}</password>
  <encode>{$encode}</encode>
</results>
Returns the following:
<results>
  <user>Aladdin</user>
  <password>open sesame</password>
  <encode>QWxhZGRpbjpvcGVuIHNlc2FtZQ==</encode>
</results>
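For readers without the eXist util module, the same encoding can be sketched in plain XQuery 1.0 by going through xs:hexBinary. This is an illustrative alternative, not the method used above: local:hex-byte and local:basic-header are hypothetical helpers, and the sketch assumes the credentials contain only ASCII characters.

```xquery
xquery version "1.0";

(: Hypothetical helper: two-digit hex for a code point below 256 :)
declare function local:hex-byte($cp as xs:integer) as xs:string {
  let $digits := '0123456789ABCDEF'
  return concat(substring($digits, $cp idiv 16 + 1, 1),
                substring($digits, $cp mod 16 + 1, 1))
};

(: Hypothetical helper: builds the value of an Authorization header.
   Assumes ASCII-only user name and password. :)
declare function local:basic-header($user as xs:string, $password as xs:string) as xs:string {
  let $credentials := concat($user, ':', $password)
  let $hex := string-join(
                for $cp in string-to-codepoints($credentials)
                return local:hex-byte($cp), '')
  return concat('Basic ', string(xs:base64Binary(xs:hexBinary($hex))))
};

local:basic-header('Aladdin', 'open sesame')
```

For the sample credentials this yields Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==, the same base64 string as in the results above, prefixed with the scheme name as it would appear in the HTTP Authorization header.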
let $user := 'my-login'
let $password := 'my-password'
return local:basic-get-http($uri,$user,$password)
References
http://en.wikipedia.org/wiki/Basic_access_authentication Wikipedia Entry on Basic Authentication
Implementation
A simple HTML form gathers suggested improvements and an email address. The suggestion is emailed to one of the authors and an acknowledgment sent to the submitter. Here the default send-mail client on the eXist implementation at UWE, Bristol is used.
XQuery script
xquery version "1.0";
let $comment := normalize-space(request:get-parameter("comment",""))
let $email := normalize-space(request:get-parameter("email",""))
return
<html>
  <head>
    <title>Feedback on the XQuery Wikibook</title>
  </head>
  <body>
    <h1>Feedback on the XQuery Wikibook</h1>
    <form method="post">
      Please let us know how this Wikibook could be improved.<br/>
      <textarea name="comment" rows="5" cols="80"/><br/>
      Your email address <input type="text" name="email" size="60"/>
      <input type="submit" value="Send"/>
    </form>
    {if ($email ne "" and $comment ne "")
     then
       let $commentMessage :=
         <mail>
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/mail/feedback.xq
Basic Search
Motivation
You want to create a basic HTML search page and search service.
Method
We will create two files. One is an HTML form and the other is a RESTful search service that takes a single parameter from the URL which is the search query. The search service will search a collection of XML files. Here is the base path to our test search collection: /db/test/search The data to be searched will be in the following collection: /db/test/search/data In "Browse Collections" in the Admin interface, create the collection "test"; create the collection "search" under it; lastly, create the collection "data" under "search". Upload the two XML documents listed under "Sample Data" to "data"; upload "search-form.xq" and "search.xq" to "search" (instead of uploading, you can Save to URL, using oXygen, or use the Webstart client).
Search Form
/db/test/search/search-form.xq
We will create a basic HTML form that has just one input field for the query.
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";
let $title := 'Basic Search Form'
return
<html>
  <head>
    <title>{$title}</title>
  </head>
  <body>
    <h1>{$title}</h1>
    <form method="GET" action="search.xq">
      <p>
        <strong>Keyword Search:</strong>
        <input name="q" type="text"/>
      </p>
      <p>
        <input type="submit" value="Search"/>
      </p>
    </form>
  </body>
</html>
Note that the action will pass the value from the form to a RESTful service. The only parameter will be "q", the query string.
Search Service
The following file should be placed in /db/test/search/search.xq
/db/test/search/search.xq
xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";
let $title := 'Simple Search RESTful Service'
let $data-collection := '/db/test/search/data'
(: get the search query string from the URL parameter :)
let $q := request:get-parameter('q', '')
return
<html>
  <head>
    <title>{$title}</title>
  </head>
  <body>
    <h1>Search Results</h1>
    <p><b>Searching for: </b>{$q} in collection: {$data-collection}</p>
    <ol>{
      for $fruit in collection($data-collection)/item[fruit/text() = $q]
      return <li>{data($fruit)}</li>
    }</ol>
  </body>
</html>
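The predicate in the service matches only when the query equals the element text exactly. A possible variation, shown here as a sketch rather than as part of the original service, is a case-insensitive substring match. The inline $items sequence stands in for the collection, and the apple item is an assumed sample.

```xquery
xquery version "1.0";

(: Assumed in-memory stand-in for the documents in /db/test/search/data :)
let $items := (
  <item><fruit>apple</fruit></item>,
  <item><fruit>banana</fruit></item>
)
(: Hypothetical query string, as it might arrive in the "q" parameter :)
let $q := 'Banana'
return
<ol>{
  (: compare lower-cased values so that "Banana" still finds "banana" :)
  for $item in $items[contains(lower-case(fruit), lower-case($q))]
  return <li>{data($item)}</li>
}</ol>
```

Matching on contains() trades precision for recall: a query of "an" would also return banana, which may or may not be what a basic search page wants.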
To drive this service from a form, click the following link or copy it into your browser navigation toolbar:
http://localhost:8080/exist/rest/db/test/search/search-form.xq
/db/test/search/data/2.xml
<item> <fruit>banana</fruit> </item>
Basic Session Management
Method
There are several functions provided by eXist and other web servers to manage information associated with a login session.
xquery version "1.0";
let $session-attributes := session:get-attribute-names()
return
<results>
  {for $session-attribute in $session-attributes
   return <session-attribute>{$session-attribute}</session-attribute>
  }
</results>
Before you add any session attributes this might return only a single variable such as:
<results>
  <session-attribute>_eXist_xmldb_user</session-attribute>
</results>
xquery version "1.0";
(: set the group and role :)
let $set-dba-group := session:set-attribute('group', 'dba')
let $set-role-editor := session:set-attribute('role', 'editor')
let $session-attributes := session:get-attribute-names()
return
<results>
  {for $session-attribute in $session-attributes
   return <session-attribute>{$session-attribute}</session-attribute>
  }
</results>
This will return the following attributes:
<results>
  <session-attribute>group</session-attribute>
  <session-attribute>role</session-attribute>
  <session-attribute>_eXist_xmldb_user</session-attribute>
</results>
These attributes will remain associated with the user until the user logs out or their session times out, typically after 15 minutes of inactivity. One sample use of session attributes is to keep track of user interface preferences. For example, if a user wants to have their data sorted by a person's zip code, you can add that to their session variable.
let $set-sort := session:set-attribute('sort', 'zip-code')
24-hour forecast
This script uses the eXist module httpclient to get the HTML, parses the HTML and generates an XML file. This XML could then be transformed via XSLT to a viewable page.
Interface
This script has two parameters:
region - required - a numeric code unique to the BBC (? code list)
area - optional - a sub-region, typically the beginning of the postcode
declare namespace h ="http://www.w3.org/1999/xhtml";
declare function local:day-of-week($date) { ('Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat') [ xs:integer(($date - xs:date('1901-01-06')) div xs:dayTimeDuration('P1D')) mod 7 +1] };
let $area := request:get-parameter("area",())
let $region := request:get-parameter("region","2")
let $url := concat("http://news.bbc.co.uk/weather/forecast/", $region, "?state=fo:B",
                   if (exists($area)) then concat("&amp;area=",$area) else ())
let $doc := httpclient:get(xs:anyURI($url),false(),())
let $currentDate := current-date()
let $currentTime := current-time()
let $dow := local:day-of-week($currentDate)
return
element forecasts {
  element region {$region},
  if (exists($area)) then element area {$area} else (),
  element source {"BBC"},
let $raw-time := normalize-space($row/h:td[1])
let $time := if (contains($raw-time," ")) then substring-before($raw-time," ") else $raw-time
let $time := xs:time(concat($time,":00"))
let $pdow := if (contains($raw-time,"(")) then substring-before(substring-after($raw-time,"("),")") else $dow
let $date := if ($pdow ne $dow) then $currentDate + xs:dayTimeDuration("P1D") else $currentDate
return
element forecast {
  element date {$date},
  element time {$time},
  element dow {$pdow},
  element summary {string($row/h:td[2]//h:p[@class="sum"])},
  element imageurl {string($row/h:td[2]//h:div[@class="summary"]//h:img/@src)},
  element maxTemp {attribute units {"degc"}, $row/h:td[3]//h:span[@class="cent"]/text()},
  element maxTemp {attribute units {"degf"}, $row/h:td[3]//h:span[contains(@class,"fahr")]/text()},
  element windDirection {string($row/h:td[4]//h:span[contains(@class,"wind")]/@title)},
  element windSpeed {attribute units {"mph"}, substring-before($row/h:td[4]//h:span[contains(@class,"mph")], "mph")},
  element windSpeed {attribute units {"kph"}, substring-before($row/h:td[4]//h:span[contains(@class,"kph")], "km/h")},
  element humidity {attribute units {"%"}, normalize-space(substring-before($row/h:td[5]//h:span[contains(@class,"hum")], "%"))},
  element pressure {attribute units {"mb"}, normalize-space(substring-before($row/h:td[5]//h:span[@class="pres"], "mB"))},
References
[1] http://newsrss.bbc.co.uk/weather/forecast/3/ObservationsRSS.xml
[2] http://newsrss.bbc.co.uk/weather/forecast/3/Next3DaysRSS.xml
[3] http://pipes.yahoo.com/pipes/pipe.edit?_id=1HlcTL8F3hGF7NSlPxJ3AQ
[4] http://www.cems.uwe.ac.uk/xmlwiki/weather/bbc24hforecast.xq?region=3
Benefits
Benefits of XQuery
The principal benefits of XQuery are:
Expressiveness - XQuery can query many different data structures and its recursive nature makes it ideal for querying tree and graph structures
Brevity - XQuery statements are shorter than similar SQL or XSLT programs
Flexibility - XQuery can query both hierarchical and tabular data
Consistency - XQuery has a consistent syntax and can be used with other XML standards such as XML Schema datatypes
XQuery is frequently compared with two other languages, SQL and XSLT, but has a number of advantages over these.
and empowering them to fuse both the requirements and documentation of a transformation routine into a single, modular program. On the other hand, learning XSLT requires overcoming a very substantial learning curve. XSLT's difficulty is due, in part, to one of the key design decisions by its architects: to express the transformation rules using XML itself, rather than creating a brand new syntax and grammar for storing the transformation rules. XSLT's unique approach to transformation rules also contributes to the steepness of the learning curve. The learning curve can be overcome, but it is fair to say that it has created an opening for an alternative approach. XQuery has filled this demand for an alternative among a growing community of users: they find that XQuery has a gentler learning curve, that it meets their needs for transforming XML, and that, together with XQuery's other advantages, it has become a compelling "all-in-one" language. Like XSLT, XQuery was created by the W3C to handle XML. But instead of expressing the language in XML syntax, the architects of XQuery chose a new syntax that would be more familiar to users of server-side scripting languages such as PHP, Perl, or Python. XQuery was designed to be familiar to users of relational database query languages such as SQL, while still remaining true to functional programming practices. Despite its relative youth (XQuery 1.0 was only released in 2007, when XSLT had already reached its version 2.0), XQuery was born remarkably mature. XML servers like eXist-db and MarkLogic were already using XQuery as their language for querying XML and performing web server operations (obviating the need for learning PHP, Perl, or Python). So, in the face of the XSLT community's contention that "XSLT is best for transforming documents and XQuery is best for querying databases", this community of users was surprised to find that XQuery has entirely replaced their need for XSLT.
They have come to argue unabashedly that they prefer XQuery for this purpose. How does XQuery accomplish the task of transforming XML? The primary technique in XQuery for transforming XML is a little-known expression added by the authors of XQuery, called "typeswitch." Although it is quite simple, typeswitch enables XQuery to perform nearly the full set of transformations that XSLT does. A typeswitch expression quickly looks at a node's type, and depending on the node's type, performs the operation you specify for that type of node. What this means is that each distinct element of a document can have its own rule, and these rules can be stored in modular XQuery functions. This humble addition to the XQuery language allows developers to transform documents with complex content and unpredictable order - something commonly believed to be best reserved for the domain of XSLT. Despite the differences in syntax and approach to transformation, a growing community has actually come to see the XQuery typeswitch expression as a valid, even superior, way to store their document transformation logic. By structuring a set of XQuery functions around the typeswitch expression, you can achieve the same result as XSLT-style transforms while retaining the benefits of XQuery: ease of learning and integration with native XML databases. Even more important for those users of native XML databases, the availability of typeswitch means that they only need to learn a single language for their database queries, web server operations, and document transformations. These XQuery typeswitch routines have proved easy to build, test, and maintain - some believe easier than XSLT. XQuery typeswitch has given these users a high degree of agility, allowing them to master XQuery fully rather than splitting their time and attention between XQuery and XSLT. 
That said, there is still a large body of legacy XSLT transforms that work well, and there are XSLT developers who see little benefit from transitioning to a typeswitch-style XQuery. Both are valid approaches to document transformation. A natural tension has arisen between the proponents of XQuery typeswitch and XSLT, each promoting what they are most comfortable with and believe to be superior. In practice you might be best served by trying both techniques and determining what style is right for you and your organization. Without presuming a background or interest in XSLT, this article and its companion article help you to understand the key patterns for using XQuery typeswitch for your XML transformation needs.
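The typeswitch pattern described above can be sketched as a single recursive function. This is a minimal illustration under assumed element names (doc, para, emph); a real transformation would normally dispatch to one function per element and live in its own module.

```xquery
xquery version "1.0";

(: Hypothetical transform: each case is the "rule" for one kind of node. :)
declare function local:transform($nodes as node()*) as item()* {
  for $node in $nodes
  return
    typeswitch ($node)
      (: text is copied through unchanged :)
      case text() return $node
      (: assumed source elements mapped to HTML equivalents :)
      case element(para) return <p>{local:transform($node/node())}</p>
      case element(emph) return <em>{local:transform($node/node())}</em>
      (: any other element: drop the wrapper, keep processing its children :)
      case element() return local:transform($node/node())
      default return ()
};

local:transform(<doc><para>Hello <emph>typeswitch</emph> world.</para></doc>)
```

Because every case recurses on the node's children, mixed content in unpredictable order is handled without any extra control logic, which is the property usually associated with XSLT templates.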
Non-caching approach
The following script generates an index page with links to the HTML view and the timeline view of an artist's albums.
declare option exist:serialize "method=xhtml media-type=text/html";
declare variable $query := "
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
  ?group skos:subject <http://dbpedia.org/resource/Category:Rock_and_Roll_Hall_of_Fame_inductees>.
}
";
declare function local:clean($text) {
  let $text := util:unescape-uri($text,"UTF-8")
  let $text := replace($text,"\(.*\)","")
  let $text := replace($text,"_"," ")
  return $text
};
let $category := request:get-parameter("category","")
let $categoryx := replace($category,"_"," ")
let $queryx := replace($query,"Rock_and_Roll_Hall_of_Fame_inductees",$category)
let $sparql := concat("http://dbpedia.org/sparql?default-graph-uri=",escape-uri("http://dbpedia.org",true()),
                      "&amp;query=",escape-uri($queryx,true()))
let $result := doc($sparql)
return
<html>
  <body>
    <h1>{$categoryx}</h1>
    <table border="1">
      { for $row in $result/table//tr[position()>1]
        let $resource := substring-after($row/td[1],"resource/")
        let $name := local:clean($resource)
        order by $name
        return
          <tr>
            <td>
Index examples
Rock and Roll Groups [1]
Caching Approach
Two scripts are needed - one to generate the data to cache, the other to generate the index page. The approach is illustrated with an index to Rock and Roll groups based on the Wikipedia category Rock and Roll Hall of Fame inductees.
declare function local:clean($text) {
  let $text := util:unescape-uri($text,"UTF-8")
  let $text := replace($text,"\(.*\)","")
  let $text := replace($text,"_"," ")
  return $text
};
declare function local:table-to-seq($table) {
  let $head := $table/tr[1]
  for $row in $table/tr[position()>1]
  return
    <tuple>
let $category := request:get-parameter("category","Rock_and_Roll_Hall_of_Fame_inductees")
let $queryx := replace($query,"Rock_and_Roll_Hall_of_Fame_inductees",$category)
let $sparql := concat("http://dbpedia.org/sparql?default-graph-uri=",escape-uri("http://dbpedia.org",true()),
                      "&amp;query=",escape-uri($queryx,true()))
let $result := doc($sparql)/table
let $groups := local:table-to-seq($result)
return
<ResourceList category="{$category}">
  {for $group in $groups
   let $resource := substring-after($group/group,"resource/")
   let $name := local:clean($resource)
   order by $name
   return <resource id="{$resource}" name="{$name}"/>
  }
</ResourceList>
Note: I guess a better approach would be to use triples here, saved to a local triple store.
Caching and indexes
            </td>
            <td>
              <a href="groupTimeline.xq?group={$resource/@id}">Timeline</a>
            </td>
          </tr>
      }
    </table>
  </body>
</html>
Execute
Rock and Roll groups [2]
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/RDF/groupIndex.xq?category=Rock_and_Roll_Hall_of_Fame_inductees
[2] http://www.cems.uwe.ac.uk/xmlwiki/RDF/groupList.xq
Chaining Web Forms
Methods
We will use three methods to demonstrate this:
on the client using URL parameters and hidden form fields
on the client in cookies
on the server using sessions
    <form action="02-web-form.xq">
      <span class="label">Please enter your first name:</span>
      <input type="text" name="name"/><br/>
      <input type="submit" value="Next Question"/>
    </form>
  </body>
</html>
The URL is passed to the second form and we will use the request:get-parameter() function to get the value from the URL. Here is the XQuery function for question 2.
question-2.xq
xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=yes indent=yes";
let $name := request:get-parameter('name', '')
let $title := 'Question 2: Enter Your Favorite Color'
return
<html>
  <head>
    <title>{$title}</title>
  </head>
  <body>
    <h1>{$title}</h1>
    <form action="03-result.xq">
      <span class="label">Hello {$name}. Please enter your favorite color:</span>
      <input type="hidden" name="name" value="{$name}"/>
      <input type="text" name="color"/><br/>
      <input type="submit" value="Results"/>
    </form>
  </body>
</html>
Note that we are storing the incoming name in a hidden input field in the form. The value of the hidden field must take the value of the incoming {$name} parameter. The last page just gets the two input parameters and displays them in an HTML page. If you look at the URL it will be of the format:
result.xq?name=dan&color=blue
result.xq
xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=yes indent=yes";
let $name := request:get-parameter('name', '')
let $color := request:get-parameter('color', '')
let $title := 'Result'
return
<html>
  <head>
    <title>{$title}</title>
  </head>
  <body>
    <h1>{$title}</h1>
    <p>Hello {$name}. Your favorite color is {$color}</p>
  </body>
</html>
Discussion
This method is the preferred method since it does not require the client browser to support cookies. It also does not require the users to have a login or manage sessions. Sessions have the disadvantage that if the user gets interrupted halfway through the process, their session information will be lost and all the data they entered will need to be re-entered. Note that although the first "name" parameter is not visible in the second form, the value is visible in the URL. So the term "hidden" does not apply to the URL, only the form.
Using Cookies
In this example we will use the following functions for setting and getting cookies:
response:set-cookie($name as xs:string, $value as xs:string) empty()
request:get-cookie-value($cookie-name as xs:string) xs:string?
The first form is identical to the example above. But the
xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=yes indent=yes";
(: get the input and set the name cookie :)
let $name := request:get-parameter('name', '')
let $set-cookie := response:set-cookie('name', $name)
let $title := 'Question 2: Enter Your Favorite Color'
return
<html>
  <head>
    <title>{$title}</title>
  </head>
  <body>
    <h1>{$title}</h1>
    <form action="03-result.xq">
      <span class="label">Hello {$name}. Please enter your favorite color:</span>
Our first form will set the first cookie value and our second form will read the name cookie's value.
xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=yes indent=yes";
let $name := request:get-cookie-value('name')
let $color := request:get-parameter('color', '')
let $title := 'Result From Cookies'
return
<html>
  <head>
    <title>{$title}</title>
  </head>
  <body>
    <h1>{$title}</h1>
    <p>Hello {$name}. Your favorite color is {$color}</p>
  </body>
</html>
Discussion
Using cookies can be complex and you must be very careful that your cookies are not changed by another application from the same domain. Your design must also consider the fact that some browsers and users disable cookies.
Using Sessions
The last method is to use server session values to store the key-value data. This will be very similar to the last example but we will use the eXist Session [1] module functions to set and get the values. Here are the two calls we will need:
session:set-attribute($name as xs:string, $value as item()*) empty()
session:get-attribute($name as xs:string) xs:string*
You only need to change the lines that read the name in the second form. Change them to the following:
(: get the name and set the session :)
let $name := request:get-parameter('name', '')
let $set-session := session:set-attribute('name', $name)
and in the final result script just get the data from the session:
let $name := session:get-attribute('name')
Discussion
Using sessions can also be complex if you are new to session management. There are many rules that govern session timeouts, and both the web server and database server may need to be configured to take your users' needs into account. Session management may also not be appropriate for public web sites that have policies against collecting information on the web server.
References
[1] http://demo.exist-db.org/functions/session
Method
There are two functions we will use: For collections: xmldb:chmod-collection($collection, $perm) and for resources: xmldb:chmod-resource($collection, $resource, $perm) The $perm is a decimal number. As of 1.5 you can use the function xmldb:string-to-permissions("rwurwu---") to get this decimal number.
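As a rough illustration of how a permission string relates to the decimal number, the sketch below converts a nine-character string to an integer in plain XQuery. It assumes the string is read as nine bits (user, group, other; each with read, write, update), laid out like a Unix octal mode. local:pow2 and local:perm-to-int are hypothetical helpers, not part of the xmldb module, and the result should be checked against xmldb:string-to-permissions on your own installation.

```xquery
xquery version "1.0";

(: Hypothetical helper: 2 to the power $n, for small non-negative $n :)
declare function local:pow2($n as xs:integer) as xs:integer {
  if ($n le 0) then 1 else 2 * local:pow2($n - 1)
};

(: Hypothetical helper: assumes a 9-character string in which any
   character other than '-' switches the corresponding bit on. :)
declare function local:perm-to-int($perm as xs:string) as xs:integer {
  xs:integer(sum(
    for $i in 1 to 9
    return
      if (substring($perm, $i, 1) ne '-')
      then local:pow2(9 - $i)
      else 0
  ))
};

local:perm-to-int("rwurwu---")
```

Under that assumption "rwurwu---" corresponds to octal 770, which is 504 in decimal.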
Warning, this breaks several features. You must run many functions as non-guest.
Method
We will use the xdiff:compare() function that comes built in to eXist. To use this you pass two nodes to the compare function: xdiff:compare($node1 as node(), $node2 as node())
Compare with XQuery
let $list2 :=
  <list1>
    <item>a</item>
    <item>c</item>
    <item>e</item>
  </list1>
return
<missing>{
  for $item1 in $list1/item
  let $item-text := $item1/text()
  return
    <test item="{$item-text}">
      {if ($list2/item/text() = $item-text)
       then ($item1)
       else <missing>{$item-text}</missing>
      }
    </test>
}</missing>
Note that the conditional expression:
if ($list2/item/text() = $item-text)
tests to see if the $item-text is anywhere in list2. If it occurs anywhere, this expression will return true().
Sample Results
<missing>
  <test item="a"> <item>a</item> </test>
  <test item="b"> <missing>b</missing> </test>
  <test item="c"> <item>c</item> </test>
  <test item="d"> <missing>d</missing> </test>
</missing>
Note that this will not report any items on the second list that are missing from the first list.
We can rewrite the output function to use this function:
<results>
  <missing-from-2>{local:missing($list1, $list2)}</missing-from-2>
  <missing-from-1>{local:missing($list2, $list1)}</missing-from-1>
</results>
Note that the order of the lists has been reversed in the second call to the missing() function. The second pass looks for items on list2 that are not on list1. Running this query generates the following output:
<results>
  <missing-from-2>
    <item>b</item>
    <item>d</item>
  </missing-from-2>
  <missing-from-1>
    <item>e</item>
  </missing-from-1>
</results>
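One way a missing() function of this kind could be written is sketched below. This is an assumed reconstruction for illustration, not necessarily the author's definition; the list contents are taken from the sample results shown earlier.

```xquery
xquery version "1.0";

(: Sketch: return the items of the first list that have no equal item
   in the second list. The general comparison "=" is true when the
   item's value matches any item on the other list. :)
declare function local:missing($list1 as element()*, $list2 as element()*) as element(item)* {
  for $item1 in $list1/item
  where not($item1 = $list2/item)
  return $item1
};

let $list1 := <list><item>a</item><item>b</item><item>c</item><item>d</item></list>
let $list2 := <list><item>a</item><item>c</item><item>e</item></list>
return
<results>
  <missing-from-2>{local:missing($list1, $list2)}</missing-from-2>
  <missing-from-1>{local:missing($list2, $list1)}</missing-from-1>
</results>
```

Swapping the argument order is all that is needed for the second pass, which is why a single function covers both directions of the comparison.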
Screen Image
Sample Data
This example uses full words of items to show text highlighting:
let $list1 :=
  <list>
    <item>apples</item>
    <item>bananas</item>
    <item>carrots</item>
    <item>kiwi</item>
  </list>
let $list2 :=
  <list>
    <item>apples</item>
    <item>carrots</item>
    <item>grapes</item>
  </list>
The following function uses HTML div and span elements and adds class="missing" to each div that is missing. The CSS file will highlight this background.
declare function local:missing($list1 as node()*, $list2 as node()*) as node()* {
  for $item1 in $list1/item
  return
    if (some $item2 in $list2/item satisfies $item2/text() = $item1/text())
    then <div>{$item1/text()}</div>
    else
      <div>
        {attribute {'class'} {'missing'}}
        {$item1/text()}
      </div>
};
We then use the following CSS file to highlight the differences. Each missing element must have a class="missing" attribute for the missing element to be highlighted in this report.
body {font-family: Arial,Helvetica,sans-serif; font-size: large;}
h2 {padding: 3px; margin: 0px; text-align: center; font-size: large; background-color: silver;}
.left, .right {border: solid black 1px; padding: 5px;}
.missing {background-color: pink;}
.left {float: left; width: 190px}
.right {margin-left: 210px; width: 190px}
<body>
  <h1>Missing Items Report</h1>
  <div class="left">
    <h2>List 1</h2>
    {for $item in $list1/item return <div>{$item/text()}</div>}
  </div>
  <div class="right">
    <h2>List 2</h2>
    {for $item in $list2/item return <div>{$item/text()}</div>}
  </div>
  <br/>
  <div class="left">
    <h2>List 1 Missing from 2</h2>
    {local:missing($list1, $list2)}
  </div>
  <div class="right">
    <h2>List 2 Missing from 1</h2>
    {local:missing($list2, $list1)}
  </div>
</body>
Collation
If the lists are in sorted order, or can be sorted into order, an alternative approach is to recursively collate the two lists. The core algorithm looks like:
declare function local:merge($a, $b as item()* ) as item()* {
  if (empty($a) and empty($b)) then ()
  else if (empty($b) or $a[1] lt $b[1]) then ($a[1], local:merge(subsequence($a, 2), $b))
  else if (empty($a) or $a[1] gt $b[1]) then ($b[1], local:merge($a, subsequence($b,2)))
  else (: a and b matched :) ($a[1], $b[1], local:merge(subsequence($a,2), subsequence($b,2)))
};
let $list2 :=
  <list>
    <item>apples</item>
    <item>carrots</item>
    <item>grapes</item>
  </list>
return
<result>
  {local:merge($list1/item,$list2/item)}
</result>
Execute [1]
The actions on merge will depend on the application, and the algorithm can be modified to output only mismatched items on one or other list and to handle matching items appropriately. For example, to display the merged list as HTML, we might modify the algorithm to:
declare function local:merge($a, $b as item()* ) as item()* {
  if (empty($a) and empty($b)) then ()
  else if (empty($b) or $a[1] lt $b[1]) then (<div class="left">{$a[1]/text()}</div>, local:merge(subsequence($a, 2), $b))
  else if (empty($a) or $a[1] gt $b[1]) then (<div class="right">{$b[1]/text()}</div>, local:merge($a, subsequence($b,2)))
  else (<div class="match">{$a[1]/text()}</div>, local:merge(subsequence($a,2), subsequence($b,2)))
};
Execute [2]
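The variation mentioned earlier, reporting only the mismatched items, might look like the sketch below. The function name local:mismatch and the only-in-a and only-in-b element names are assumptions made for this example; both inputs must already be in sorted order.

```xquery
xquery version "1.0";

(: Sketch: walk two sorted sequences in step and report only the values
   that occur on one side. Matching values are skipped. :)
declare function local:mismatch($a as item()*, $b as item()*) as element()* {
  if (empty($a) and empty($b)) then ()
  else if (empty($b) or $a[1] lt $b[1])
    then (<only-in-a>{string($a[1])}</only-in-a>,
          local:mismatch(subsequence($a, 2), $b))
  else if (empty($a) or $a[1] gt $b[1])
    then (<only-in-b>{string($b[1])}</only-in-b>,
          local:mismatch($a, subsequence($b, 2)))
  else local:mismatch(subsequence($a, 2), subsequence($b, 2))
};

<result>{
  local:mismatch(('apples', 'bananas', 'carrots', 'kiwi'),
                 ('apples', 'carrots', 'grapes'))
}</result>
```

For these two lists the function reports bananas and kiwi as present only in the first sequence and grapes as present only in the second, in collated order.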
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Basics/collate1.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/Basics/collate3.xq
Creating a Timeline
Motivation
You want to create a timeline of event data. Timelines show events in a horizontal scrolling view.
Method
We will use the JavaScript client Timeline widgets provided by the Simile-Widgets project [1]. In this example we will be using the timeline 2.2.0 API calls.
To do this we need to transform a list of event dates into the proper formats and then create an HTML page that includes calls to the Simile JavaScript libraries.

Steps
1. View sample Event XML File format
2. View HTML template that loads XML file
3. Create XQuery Function that generates the HTML template and loads the appropriate XML data file

Our first example will use a list of non-duration events (instant events). We will explore duration events and other events in a future chapter. We will then create a simple XQuery module with a single function that loads a simple timeline.
<![CDATA[
var tl;
function onLoad() {
  var eventSource = new Timeline.DefaultEventSource();
  var bandInfos = [
    Timeline.createBandInfo({
      eventSource:    eventSource,
      date:           "Jan 01 2009 00:00:00 GMT",
      width:          "70%",
      intervalUnit:   Timeline.DateTime.MONTH,
      intervalPixels: 100
    }),
    Timeline.createBandInfo({
      eventSource:    eventSource,
      date:           "Jan 01 2009 00:00:00 GMT",
      width:          "30%",
      intervalUnit:   Timeline.DateTime.YEAR,
var resizeTimerID = null;
function onResize() {
  if (resizeTimerID == null) {
    resizeTimerID = window.setTimeout(function() {
      resizeTimerID = null;
      tl.layout();
    }, 500);
  }
}
]]>
</script>
</head>
<body onload="onLoad();" onresize="onResize();">
  <h1>Timeline Template</h1>
  <div id="my-timeline" style="height: 150px; border: 2px solid blue">
  </div>
  <noscript>
    This page uses Javascript to show you a Timeline. Please enable Javascript in your browser to see the full page. Thank you.
  </noscript>
</body>
</html>
Sample Image
This will produce the following example:
References
[1] http://code.google.com/p/simile-widgets/wiki/Timeline
Method
Use XQuery functions to encapsulate any chunk of XQuery code with a function wrapper. Any time you see a grouping of XQuery or XML code in your XQuery program that you would like to standardize, it is good design to start creating your own XQuery functions.
Static Content
Static content is content that is fixed and is not changed by the use of parameters. XQuery functions are ideal for storage of static content libraries. For example, if all your HTML pages have the same block of code that has your logo and header text, you can create a simple XQuery function that encodes this functionality. Here is the HTML code you want to standardize on:

<div class="web-page-header">
  <img src="images/mylogo.jpg" alt="Our Logo"/>
  <h1>Acme Widgets Inc.</h1>
</div>

You can wrap this block in a function:

declare function local:header() as node() {
  <div class="web-page-header">
    <img src="images/mylogo.jpg" alt="Our Logo"/>
    <h1>Acme Widgets Inc.</h1>
  </div>
};

When you want to reference this you just call the function by placing it in your HTML page and enclosing it in curly braces:

<html>
  <head>
    <title>Sample Web Page</title>
  </head>
  <body>
    {local:header()}
  </body>
</html>

Note that these function names are preceded by "local:". This is the default namespace prefix for functions that are declared and invoked in the same XQuery main module.

If you want to store your functions in a separate file, you can do so. Such a file is called a "library module". To make use of the functions in this module, you need to "import" the module in the prolog of your query. The benefit of storing your code in functions and modules is that if you ever need to make a change to a function, you only have to make the change in one location, rather than in the many locations where you've copied and pasted the same code.
The following file, which we will save as webpage.xqm, is an example of this (note also the addition of a footer function):

module namespace webpage='http://www.example.com/webpage';

declare function webpage:header() as node() {
  <div class="web-page-header">
    <img src="images/mylogo.jpg" alt="Our Logo"/>
    <h1>Acme Widgets Inc.</h1>
  </div>
};

declare function webpage:footer() as node() {
  <div class="web-page-footer">
    <img src="images/mylogo.jpg" alt="Our Logo"/>
    <p>Acme Widgets Inc.</p>
  </div>
};

The module begins with a declaration of the module's namespace. Here we bind the prefix "webpage" to an arbitrary namespace URI.
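A main module can then import the library module in its prolog. The following is a minimal sketch that assumes webpage.xqm is stored in the same collection as the query; the page content is illustrative.

```xquery
xquery version "1.0";

(: The namespace URI must match the one declared in webpage.xqm;
   the relative location "webpage.xqm" is an assumption. :)
import module namespace webpage = "http://www.example.com/webpage" at "webpage.xqm";

<html>
  <head>
    <title>Sample Web Page</title>
  </head>
  <body>
    {webpage:header()}
    <p>Page content goes here.</p>
    {webpage:footer()}
  </body>
</html>
```

Changing the header or footer now only requires editing webpage.xqm; every page that imports the module picks up the change.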
Dynamic Content
Unlike static content, dynamic content can be modified by including parameters into the function. One very common approach is to use a "page-assembler" function that includes parameters such as the document title and content. Here is an example of this function.
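A page-assembler of this kind might look like the following minimal sketch. The function name local:assemble-page and its two parameters are illustrative assumptions, and the webpage module is assumed to be stored as webpage.xqm next to the query.

```xquery
xquery version "1.0";

import module namespace webpage = "http://www.example.com/webpage" at "webpage.xqm";

(: Hypothetical page assembler: wraps a title and some content
   in the standard header and footer. :)
declare function local:assemble-page($title as xs:string, $content as node()*) as element(html) {
  <html>
    <head>
      <title>{$title}</title>
    </head>
    <body>
      {webpage:header()}
      <h1>{$title}</h1>
      {$content}
      {webpage:footer()}
    </body>
  </html>
};

local:assemble-page("Products", <p>Our widgets are the best.</p>)
```

Each page supplies only its own title and content; the shared layout stays in one place.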
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/functions/staticpage.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/eXist/functions/dynamicpage.xq
Method
We will provide a sample list of XQuery expressions and their results.
Current Date
This function returns the current date on the system that is executing the XQuery in W3C XML Schema date format:

current-date()

Result: 2010-05-28-05:00

Note that the "-05:00" is the offset from GMT of the server.
Current Time
current-time()

Result: 07:02:11.616-05:00
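The components of these values, including the timezone offset, can be read with the standard XPath 2.0 date functions. The following is a small sketch; the element names in the result are illustrative.

```xquery
xquery version "1.0";

let $today := current-date()
return
  <date-parts>
    <date>{$today}</date>
    <year>{year-from-date($today)}</year>
    <month>{month-from-date($today)}</month>
    <day>{day-from-date($today)}</day>
    <!-- the server offset as an xs:dayTimeDuration, for example -PT5H -->
    <offset>{timezone-from-date($today)}</offset>
    <!-- the same date adjusted to GMT -->
    <gmt>{adjust-date-to-timezone($today, xs:dayTimeDuration("PT0H"))}</gmt>
  </date-parts>
```

The values returned depend on the clock and timezone of the server that runs the query.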
Examples
Football Venues in England
kml [1] GoogleMap [2]
Script
(: This accepts a category of stadiums and generates a kml map of all stadiums :)

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?ground skos:subject <http://dbpedia.org/resource/Category:Football_venues_in_England>.
  ?ground geo:long ?long.
  ?ground geo:lat ?lat.
  ?ground rdfs:label ?groundname.
  OPTIONAL {?ground foaf:depiction ?image .}.
  OPTIONAL {?club p:ground ?ground. ?club rdfs:label ?clubname . FILTER (lang(?clubname) = 'en')}.
  OPTIONAL {?ground foaf:page ?wiki.}.
  FILTER (lang(?groundname) ='en').
}
";
declare function local:sparql-to-tuples($rdfxml) {
  for $result in $rdfxml//r:result
  return
    <tuple>
      { for $binding in $result/r:binding
        return
          if ($binding/r:uri)
          then element {$binding/@name} { attribute type {"uri"} ,

"method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";

let $category := request:get-parameter("category", "Football_venues_in_England")
let $queryx := replace($query, "Football_venues_in_England", $category)
let $result := local:execute-sparql($queryx)
let $tuples := local:sparql-to-tuples($result)
return
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/RDF/stadium2kml.xq?category=Football_venues_in_England
[2] http://maps.google.co.uk/maps?q=http://www.cems.uwe.ac.uk/xmlwiki/RDF/stadium2kml.xq?category=Football_venues_in_England
[3] http://www.cems.uwe.ac.uk/xmlwiki/RDF/stadium2kml.xq?category=Football_venues_in_Scotland
[4] http://maps.google.co.uk/maps?q=http://www.cems.uwe.ac.uk/xmlwiki/RDF/stadium2kml.xq?category=Football_venues_in_Scotland
Integrated Report
The following script shows how the local delivery data can be combined with the data for each delivery obtained from the delivery company. In this case, the delivery company CityLink provides a page for each consignment reporting its status. The script loops over the relevant deliveries and constructs the appropriate URL to read the page for each delivery. The page is passed through an HTML-to-XML conversion (also used in the Yahoo Weather feed), and then specific elements are retrieved from the HTML to build an XML extract of the page. This XML data is then combined with the local data to create a combined report.
import module namespace fwiki = "http://www.cems.uwe.ac.uk/xmlwiki" at "../reports/util.xqm";

declare function local:get-consignment($consNo) {
  let $citylinkURL := replace($citylinkURL, "ZZZZ", $consNo)
  let $page := fwiki:html-to-xml($citylinkURL)
  return
    <Consignment>
      <CustomerReference>
        {string($page//table[@id="this_table_holds_the_summary_info"]/tr[1]/td[2])}
      </CustomerReference>
      <ScheduledDeliveryDate>
        {string($page//table[@id="this_table_holds_the_summary_info"]/tr[1]/td[4])}
      </ScheduledDeliveryDate>
      <DeliveryStatus>
        {string($page//table[@id="this_table_holds_the_detailed_status_desc"]/tr[1]/td[2])}
      </DeliveryStatus>
    </Consignment>
};

let $report :=
  <Report>
    { for $delivery in //Delivery[Service="CityLink"]
      let $citylink := local:get-consignment($delivery/ConsignmentNo)
      return
        <Delivery>
          {$delivery/*}
          {$citylink/*}
        </Delivery>
    }
  </Report>
return fwiki:element-seq-to-table($report)
Notes
1. In production, a simple script to extract and store the delivery data in the database could be scheduled to run every hour to reduce the demands on the sites used in this application.
2. The script uses a generic function to convert any simple tabular XML to an HTML table.
3. The mapping between HTML elements and XML depends on the stability of this page. The paths are simplified by the presence of ids for the relevant tables.
4. A production system must be able to detect HTTP errors and act accordingly. This would require more control over the HTTP requests and responses. This facility is provided by the HTTP module in later releases of eXist. The simplistic approach taken here to obtain the XML would need to be replaced.
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Scrape/deliveryReport.xq
Digest Authentication
Motivation
The API you are using uses digest authentication, for example the Talis platform [1]. There is no support for this in the eXist httpclient module, but one can be written in XQuery.
XQuery Module
module namespace http = "http://www.cems.uwe.ac.uk/xmlwiki/http";
declare namespace httpclient = "http://exist-db.org/xquery/httpclient";
Two functions transform between a comma-delimited list of name="value" pairs and an XML representation: The first function takes strings in the following format: string="value",string1="value2",string3="value3" Note that the replace function removes all double quotes from the right side of each expression.
Supporting Functions
The following two functions convert key-value encoded strings of this form:

key1="value1",key2="value2",key3="value3"

into XML structures of the form:

<field name="key1" value="value1"/>
<field name="key2" value="value2"/>
<field name="key3" value="value3"/>

Here are the supporting functions:

declare function http:string-to-nvs($string) {
  let $nameValues := tokenize($string, ", ")
  return
    for $f in $nameValues
    (: split on the first "=" only, because a value such as a nonce may itself contain "=" :)
    let $name := substring-before($f, "=")
    let $value := substring-after($f, "=")
    return <field name="{$name}" value="{replace($value, '"', '')}"/>
};

declare function http:nvs-to-string($nvs) {
  string-join(
    for $field in $nvs
    return concat($field/@name, '="', $field/@value, '" '),
    ", ")
};
(: send an HTTP request to the server - called the challenge :)
let $request := httpclient:post($uri, "dummy", false(), $header)
(: The server responds with the 401 response code. In this response the server
   provides the authentication realm and a randomly-generated, single-use value
   called a nonce. We get the realm and the nonce from the WWW-Authenticate
   header of the response :)
let $first-response := substring-after($request//httpclient:header[@name="WWW-Authenticate"]/@value, "Digest ")
(: now we get the nonce, realm and the optional quality of protection out of the first response :)
let $fields := http:string-to-nvs($first-response)
let $nonce := $fields[@name="nonce"]/@value
let $realm := $fields[@name="realm"]/@value
let $qop := $fields[@name="qop"]/@value
(: create a client nonce using a Universally Unique Identifier :)
let $cnonce := util:uuid()
(: this is the nonce count :)
let $nc := "00000001"
let $HA1 := util:md5(concat($username, ":", $realm, ":", $password))
(: TODO if the quality of protection (qop) is "auth-int", then HA2 is
   MD5(method : digestURI : MD5(entityBody)).
   If qop is "auth" or unspecified then it is the following :)
let $HA2 := util:md5(concat("POST:", $path))
let $response := util:md5(concat($HA1, ":", $nonce, ":", $nc, ":", $cnonce, ":", $qop, ":", $HA2))
(: note that if the qop directive is unspecified, then the response should be md5(HA1:nonce:HA2) :)
(: here are the new headers :)
let $newfields := (
  <field name="username" value="{$username}"/>,
  <field name="uri" value="{$path}"/>,
  <field name="cnonce" value="{$cnonce}"/>,
  <field name="nc" value="{$nc}"/>,
  <field name="response" value="{$response}"/>
)
(: combine the fields sent by the server with the new fields :)
let $authorization := concat("Digest ", http:nvs-to-string(($fields, $newfields)))
let $header2 :=
  <headers>
    {$header/header}
    <header name="Authorization" value='{$authorization}'/>
  </headers>
return httpclient:post($uri, $doc, false(), $header2)
};
Note that under eXist 1.4 the util:md5($string) function has been deprecated. You should now use the util:hash($string, 'md5') function, where the second parameter is the type of hash.
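The hash calculation for the qop="auth" case can be checked in isolation. The following minimal sketch assumes an eXist release that provides util:hash (the util prefix is predeclared in eXist); the helper name local:digest-response is illustrative, and the input values are those of the worked example in RFC 2617.

```xquery
xquery version "1.0";

(: Hypothetical helper: computes the digest response for qop="auth".
   util:hash is the eXist function named in the note above. :)
declare function local:digest-response(
    $username as xs:string, $realm as xs:string, $password as xs:string,
    $method as xs:string, $path as xs:string,
    $nonce as xs:string, $nc as xs:string, $cnonce as xs:string) as xs:string {
  let $HA1 := util:hash(concat($username, ":", $realm, ":", $password), "md5")
  let $HA2 := util:hash(concat($method, ":", $path), "md5")
  return
    util:hash(concat($HA1, ":", $nonce, ":", $nc, ":", $cnonce, ":auth:", $HA2), "md5")
};

local:digest-response(
  "Mufasa", "testrealm@host.com", "Circle Of Life",
  "GET", "/dir/index.html",
  "dcd98b7102dd2f0e8b11d0f600bfb0c093", "00000001", "0a4f113b")
```

RFC 2617 lists 6629fae49393a05397450978507c4ef1 as the response value for these inputs, which gives a known value to compare the output against.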
Example
In this example, an RDF file is POSTed to the Talis server.
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
import module namespace http = "http://www.cems.uwe.ac.uk/xmlwiki/http" at "http.xqm";

let $rdf := doc("/db/RDF/dataset.rdf")/rdf:RDF
let $path := "/store/mystore/meta"
let $username := "myusername"
let $password := "mypassword"
let $host := "http://api.talis.com"
let $header :=
  <headers>
    <header name="Content-Type" value="application/rdf+xml"/>
  </headers>
return http:put-with-digest($host, $path, $username, $password, $rdf, $header)
References
http://en.wikipedia.org/wiki/Digest_access_authentication Wikipedia page on Digest Authentication
http://technet.microsoft.com/en-us/library/cc780170%28WS.10%29.aspx Microsoft TechNet article
References
[1] http://www.talis.com/platform/
Digital Signatures
Motivation
You want to verify that a document sent to you has not been modified.
Method
We will use the W3C Digital Signature standard. We will use the standard Java function to sign and verify the signature of a document. Warning: This program is not working yet
After you run this file put the /tmp/keystore.pem file into your file system /db/test/dig-sig/keystore.pem
return
  if ( not(util:binary-doc-available($keystore-file-path)) )
  then <error><message>Keystore File {$keystore-file-path} Not Available</message></error>
  else
    let $doc := <data><a>1</a><b>7</b><c/><c/></data>
    let $certificate-details :=
      <digital-certificate>
        <keystore-type>JKS</keystore-type>
        <keystore-name>{$keystore-file-path}</keystore-name>
        <keystore-password>ab987c</keystore-password>
        <key-alias>eXist</key-alias>
        <private-key-password>kpi135</private-key-password>
      </digital-certificate>
    let $signed-doc := x-crypt:generate-signature($doc, "inclusive", "", "DSA_SHA1", "ds", "enveloped", $certificate-details)
    return
      <results>
        <doc>{$doc}</doc>
        <keystore-file-path>{$keystore-file-path}</keystore-file-path>
      </results>
DocBook to HTML
Motivation
You would like to convert DocBook documents to HTML format.
Method
We will use an XQuery transform that converts sample instance documents into an XQuery typeswitch module. To begin this process you can use any tool that generates an instance document from the XML Schema. You can then edit this document to include only the elements that you want to transform. You can then run this file through the tool to generate the typeswitch XQuery module.
References
Chris Wallace has provided a tool that converts the XML DocBook into a typeswitch here: ../Generating Skeleton Typeswitch Transformation Modules/
DocBook to HTML Typeswitch Transform [1]
References
[1] http://exist.svn.sourceforge.net/svnroot/exist/branches/dmccreary/docs/webapp/docs/docbook5/docbook2xhtml-v2.xqm
DOJO data
Motivation
You want to use XQuery with your DOJO JavaScript library which uses a variation of JSON syntax.
Method
DOJO is a framework for developing rich client-side applets in JavaScript, from the nice-to-have to the core webapp. Some day you may want to deliver your data in a way that you or other people can easily use from DOJO. DOJO specifies its own idiosyncratic way of wrapping data in JSON-formatted objects, so that the data can be consumed by many of its widgets: trees, grids, comboboxes, input fields and so on. The example below (note the use of single quotes, which makes this invalid JSON) is taken from its online documentation:

{ identifier: 'abbr',
  label: 'name',
  items: [
    { abbr:'ec', name:'Ecuador', capital:'Quito' },
    { abbr:'eg', name:'Egypt', capital:'Cairo' }
]}
Now if, for example, you want to feed an incremental user input widget from a server-side search, XQuery (in eXist at least) makes this a piece of cake. Please read the script below as an introduction to the concept; very likely it can be optimized. The search itself uses a Lucene full-text index, which returns very quickly.
xquery version "1.0";
import module namespace json="http://www.json.org";
declare namespace request="http://exist-db.org/xquery/request";
declare option exist:serialize "method=html media-type=text/javascript";

(: where the data lives :)
let $coll := "/db/apps/myapp/data"
(: what we are looking for, sanitize remote input :)
let $tmp := xs:string(request:get-parameter("q", ""))
let $querystring := replace($tmp, "[^0-9a-zA-Z\-,. ]", "")
let $query :=
  <query>
    <near slop="10" ordered="no">{$querystring}</near>
  </query>
return
  (: fetch results, don't forget to create an index in collection.xconf :)
  let $hits := collection($coll)//article[ft:query(., $query)]
  let $count := count($hits)
  let $result :=
    <result>
      <identifier>id</identifier>
      <label>title</label>
      <count>{$count}</count>
      { for $item in $hits
        return
          <items>
            <id>{string($item/@id)}</id>
            {$item/title}
          </items>
      }
    </result>
  return json:xml-to-json($result)

The XQuery extension function json:xml-to-json($node as node()) does all the magic. In the $result variable the data structure is created in the way DOJO wants it (by default), as shown above.

Another thing to note: DOJO expects the identifier to be unique. It is up to you to design your data to satisfy this.

Another note: as of today (eXist trunk of early September 2010) numbers in the output are quoted. It is up to you to convert them on the client for optimal processing.
Method
Module import
We will use the XQuery function util:import-module(). This function has three arguments:

$namespace: the full URI of the module that you are loading, such as http://example.com/my-module
$prefix: the prefix you want to use to reference each function in the module, such as style
$location: the database path that you will be loading the module from, such as an absolute path /db/modules/my-module.xqm or a relative path my-module.xqm

For example, the following will import a module called my-module from the /db/modules collection:

util:import-module(xs:anyURI('http://example.com/my-module'), 'style', xs:anyURI('/db/modules/my-module.xqm'))

The function xs:anyURI is used to cast each string into the xs:anyURI type.
Function invocation
Because the namespace is declared dynamically, the imported functions have to be invoked using util:eval. The input to this function is a string containing an XQuery expression, for example:

util:eval('style:header()')
Example
The following will randomly load one of two style modules.

xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=yes indent=yes";

let $module :=
  if (math:random() < 0.5)
  then util:import-module(
    xs:anyURI('http://example.com/style-a'),
    'style',
    xs:anyURI('style-a.xqm')
  )
  else util:import-module(
    xs:anyURI('http://example.com/style-b'),
    'style',
    xs:anyURI('style-b.xqm')
  )
return
  <html>
    <head>
      <title>Test of Dynamic Module Import</title>
      {util:eval('style:import-css()')}
    </head>
    <body>
      {util:eval('style:header()')}
      {util:eval('style:breadcrumb()')}
      <h1>Test of Dynamic Module Import</h1>
      {util:eval('style:footer()')}
    </body>
  </html>

Run [1]
Style A Module
Here is an example of a style module. It has four functions: one to import the CSS files, one for the header, one for the navigation breadcrumb and one for the footer.

xquery version "1.0";
module namespace style='http://example.com/style-a';

declare function style:import-css() {
  <link type="text/css" rel="stylesheet" href="style-a.css"/>
};

declare function style:header() {
  <div class="header">
    <h1>Header for Style A</h1>
  </div>
};

declare function style:breadcrumb() {
  <div class="breadcrumb">
    <h1>Breadcrumb for Style A</h1>
  </div>
};

declare function style:footer() {
  <div class="footer">
    <h1>Footer for Style A</h1>
  </div>
};
Style A CSS
body { color: blue; }
Style B Module
xquery version "1.0";
module namespace style='http://example.com/style-b';

declare function style:import-css() {
  <link type="text/css" rel="stylesheet" href="style-b.css"/>
};

declare function style:header() {
  <div class="header">
    <h1>Header for Style B</h1>
  </div>
};

declare function style:breadcrumb() {
  <div class="breadcrumb">
    <h1>Breadcrumb for Style B</h1>
  </div>
};

declare function style:footer() {
  <div class="footer">
    <h1>Footer for Style B</h1>
  </div>
};
Style B CSS
body { color: red; }
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/dynamicModule.xq
Examples Wanted
Examples wanted
If you would like examples of XQuery code to be added to the Wikibook, please list your suggestions here.
Suggestions
XUpdate xquery examples for xml data mining
References
[1] http://www.uwe.ac.uk/cems
[2] http://www.uwe.ac.uk
Method
We will start our XQuery by adding the default namespace for XHTML.

declare default element namespace "http://www.w3.org/1999/xhtml";
Filling Portlets
Motivation
You want to be able to create reports that work with industry-standard portals. These systems have standard div tags with standardized class attributes. For example, the searchbox for a page will have the following XHTML:

<div class="portal-searchbox">
</div>
Method
We will create a report that is structured as a set of divs with the appropriate class attributes. We can then take the URL for this report and add it to the portal management system. Our report will automatically be styled according to the central portal style sheet.

These divs need to be filled by XQueries:

portal-wrapper
portal-top
portal-header
portal-breadcrumbs
portal-searchbox
portal-advanced-search
portal-footer
portal-colophon
portal-personaltools
Flickr GoogleEarth
Flickr photos which are geo-coded can be used to generate a GoogleEarth overlay. [** API not functional on this server yet **]

Select Photos [1]

The code for the Flickr API to kml transformation. $flickrKey is my Flickr API key (not shown).

declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml";

let $user := string(local:callFlickr("flickr.people.findByUsername", concat("username=", $username))//user/@id)
return
  <Folder>
    <name>Places for {$username} tagged {$tags}</name>
    { for $photo in local:callFlickr("flickr.photos.search", (concat("user_id=", $user), concat("tags=", $tags)))//photo
      let $photo_id := string($photo/@id)
      let $details := local:callFlickr("flickr.photos.getInfo", concat("photo_id=", $photo_id))//photo
      where exists($details/location)
      return
        <Placemark>
          <name>{string($details/title)}</name>
          <description>
            { let $url := string(local:callFlickr("flickr.photos.getSizes", concat("photo_id=", $photo_id))//size[@label="Small"]/@source)
              return
                util:serialize(
                  <div>
                    <a href="http://www.flickr.com/photos/{string($details/owner/@nsid)}/{$photo_id}"><img src="{$url}"/></a>
                  </div>, ())
            }
            <div>{string($details/description)}</div>
          </description>
          <Point>
            <coordinates>{string($details/location/@longitude)},{string($details/location/@latitude)},0</coordinates>
          </Point>
        </Placemark>
    }
  </Folder>
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/selectFlickr.xq
FLWOR Expression
Motivation
You have a sequence of items and you want to create a report that contains these items.
Method
We will use a basic XQuery FLWOR expression to iterate through each of the items in a sequence. The five parts of a FLWOR expression are:

for - specifies what items in the sequence you want to select (optional)
let - used to create temporary names used in the return (optional)
where - limit items returned (optional)
order by - change the order of the results (optional)
return - specify the structure of the data returned (required)

Here is a simple example of a FLWOR expression:

for $book in doc("catalog.xml")/books/book
let $title := $book/title/text()
let $price := $book/price/text()
where xs:decimal($price) gt 50.00
order by $title
return
  <book>
    <title>{$title}</title>
    <price>{$price}</price>
  </book>

This XQuery FLWOR expression will return all books that have a price over $50.00. Note that we have not just one but two let clauses after the for clause. We also add a where clause to restrict the results to books over $50.00. The results are sorted by title, and the result is a new sequence of book items with both the price and the title in them.
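To try the expression without a stored catalog.xml, the data can be bound inline. The following sketch uses made-up books and prices, and sorts by price instead of title to show a descending order by clause.

```xquery
xquery version "1.0";

(: Illustrative data standing in for catalog.xml :)
let $catalog :=
  <books>
    <book><title>XQuery Basics</title><price>39.95</price></book>
    <book><title>Advanced XML</title><price>75.00</price></book>
    <book><title>Databases in Depth</title><price>62.50</price></book>
  </books>
for $book in $catalog/book
let $title := $book/title/text()
let $price := xs:decimal($book/price)
where $price gt 50.00
order by $price descending
return
  <book>
    <title>{$title}</title>
    <price>{$price}</price>
  </book>
```

This returns the two books priced above 50, most expensive first; the book priced at 39.95 is filtered out by the where clause.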
Formatting Numbers
Motivation
You want an easy way to format numbers by specifying the picture format of the number. So for example if you want to format numbers with a leading dollar sign, commas and two decimal places you would use the following "picture format":

format-number($my-decimal, "$,000.00")

If the input number was 1234 the output would be $1,234.00
Source Code
We will create an XQuery function that takes two arguments. One decimal number and the second a string that specifies the picture format. We will pass these both to a small XSLT stylesheet.
(: the numeric picture format function from XPath 2.0. To work with eXist we must
   enable Saxon as the default XSLT engine. See the conf.xml file in the eXist
   folder for details. :)
declare function local:format-number($n as xs:decimal, $s as xs:string) as xs:string {
  string(transform:transform(
    <any/>,
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match='/'>
        <xsl:value-of select="format-number({$n},'{$s}')"/>
      </xsl:template>
    </xsl:stylesheet>,
    ()
  ))
};
Usage
The XSLT 1.0 format-number() function [2] takes two arguments. The first is a decimal number and the second is a string that represents a picture of the output you desire. The format string is defined in the Java class DecimalFormat [3].

If you want comma-separated values:

local:format-number($my-decimal, ',000')

If you want leading dollar signs:

local:format-number($my-decimal, '$,000')

The format of negative numbers is specified in a second picture format, separated from the first by a semicolon. If you want negative numbers to have a minus sign:

local:format-number($my-decimal, '0,000.00;-0,000.00')
Run tests
Run [4]
Test
Run tests [5]
Discussion
It is our sincere hope that a future version of XQuery includes the functions to allow the developer to easily format both numeric and date formats.
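Later versions of the language did add such a function: XQuery 3.0 includes fn:format-number. The following sketch assumes a processor that supports XQuery 3.0; the element names are illustrative.

```xquery
xquery version "3.0";

<formatted>
  <!-- grouping separator and two decimal places: 1,234.00 -->
  <plain>{format-number(1234, "#,##0.00")}</plain>
  <!-- a second sub-picture after the semicolon formats negative numbers: (1,234.50) -->
  <negative>{format-number(-1234.5, "#,##0.00;(#,##0.00)")}</negative>
</formatted>
```

On such a processor the XSLT round trip shown earlier is not needed.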
Reference
Blog posting on XML Connections blog on format-number() written in XQuery [6]
References
[1] http://www.w3.org/TR/xquery-11-requirements/#numeric-formatting
[2] http://www.w3.org/TR/xslt#format-number
[3] http://java.sun.com/j2se/1.4.2/docs/api/java/text/DecimalFormat.html
[4] http://www.cems.uwe.ac.uk/xmlwiki/Test/formatnumber-xslt.xq
[5] http://www.cems.uwe.ac.uk/xmlwiki/Test/formatnumber-xquery.xq
[6] http://www.xml-connection.com/2007/08/formatting-numbers-in-xquery-10.html
Approach
Typically, the steps required to generate a PDF document are:

1. retrieve or compute the base XML document
2. transform to XSL-FO, perhaps using XSL
3. transform the XSL-FO to PDF using Apache FOP
Method
We will use a built-in function to convert XSL-FO into PDF. (See ../Installing the XSL-FO module/ if this module is not installed and configured.)
This file can be saved directly to the XML file system. It will be stored as a non-searchable binary. You can then view this directly by providing a link to the file or you can send it directly to the browser by using the response:stream-binary() function as follows:
return response:stream-binary($pdf-binary, 'application/pdf', 'myGeneratedPDF.pdf')
</fo:root>
let $pdf := xslfo:render($fo, "application/pdf", ())
return response:stream-binary($pdf, "application/pdf", "output.pdf")

Execute [1]
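A complete script of this shape might look like the following minimal sketch. It assumes the xslfo module is installed and enabled; the page dimensions, the master name "page" and the text are illustrative choices.

```xquery
xquery version "1.0";

declare namespace fo = "http://www.w3.org/1999/XSL/Format";
import module namespace xslfo = "http://exist-db.org/xquery/xslfo";
import module namespace response = "http://exist-db.org/xquery/response";

(: a one-page XSL-FO document with a single block of text :)
let $fo :=
  <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
      <fo:simple-page-master master-name="page"
          page-height="297mm" page-width="210mm" margin="20mm">
        <fo:region-body/>
      </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="page">
      <fo:flow flow-name="xsl-region-body">
        <fo:block font-size="14pt">Hello World</fo:block>
      </fo:flow>
    </fo:page-sequence>
  </fo:root>
(: render the XSL-FO to PDF and stream it to the browser :)
let $pdf := xslfo:render($fo, "application/pdf", ())
return response:stream-binary($pdf, "application/pdf", "output.pdf")
```

The master-reference on the page sequence must match the master-name of the page master, and the flow must target xsl-region-body for the text to appear in the page body.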
Where the two possible values for the processorAdapter parameter are:
org.exist.xquery.modules.xslfo.ApacheFopProcessorAdapter for Apache's FOP
org.exist.xquery.modules.xslfo.RenderHouseXepProcessorAdapter for RenderHouse's XEP

If the module is correctly loaded then you should see it in the function documentation. Make sure that you have correctly edited $EXIST_HOME/extensions/build.properties to set the XSL-FO module to true. Change:

# XSL FO transformations (Uses Apache FOP)
include.module.xslfo = false

to:

include.module.xslfo = true

Make sure that the build file can get access to the correct fop.jar file from the Apache web site.
Note that FOP 1.0 is now available, so you can change this task to be the following:

<target name="prepare-libs-xslfo" unless="libs.available.xslfo" if="include.module.xslfo.config">
  <echo message="Load: ${include.module.xslfo}"/>
  <echo message="------------------------------------------------------"/>
  <echo message="Downloading libraries required by the xsl-fo module"/>
  <echo message="------------------------------------------------------"/>

  <!-- Download the Apache FOP Processor from the Apache Web Site -->
  <get src="${include.module.xslfo.url}" dest="fop-1.0-bin.zip" verbose="true" usetimestamp="true" />
  <unzip src="fop-1.0-bin.zip" dest="${top.dir}/${lib.user}">
    <patternset>
      <include name="fop-1.0/build/fop.jar"/>
      <include name="fop-1.0/lib/batik-all-1.7.jar"/>
      <include name="fop-1.0/lib/xmlgraphics-commons-1.4.jar"/>
    </patternset>
    <mapper type="flatten"/>
  </unzip>
  <delete file="fop-1.0-bin.zip"/>
</target>
Sample Transcript
The following is a sample transcript:

prepare-xslfo:
  [echo] Load: true
  [echo] ------------------------------------------------------
  [echo] Downloading libraries required by the xsl-fo module
  [echo] ------------------------------------------------------
  [fetch] Getting: http://apache.cs.uu.nl/dist/xmlgraphics/fop/binaries/fop-1.0-bin.zip
  [fetch] To: C:\DOCUME~1\DANMCC~1\LOCALS~1\Temp\FetchTask8407348433221748527tmp
  [fetch] ....................................................
  [fetch] ....................................................
  [fetch] ....................................................
  [fetch] ....................................................
  [fetch] ....................................................
  [fetch] ....................................................
  [fetch] ....................................................
  [fetch] ....................................................
At the end of this process you should see the following three jar files in your $EXIST_HOME/lib/extensions folder:
$ cd $EXIST_HOME/lib/extensions
$ ls -l
-rwxrwxrwx+ 1 Dan McCreary None 3318083 2010-12-10 09:23 batik-all-1.7.jar
-rwxrwxrwx+ 1 Dan McCreary None 3079811 2010-12-10 09:23 fop.jar
-rwxrwxrwx+ 1 Dan McCreary None  569113 2010-12-10 09:23 xmlgraphics-commons-1.4.jar

If you do not see these files you can manually copy them from a download of the XSL-FO binaries. Now go to the $EXIST_HOME directory and type "build". You should not see any error messages. If you do, go to the build file and fix or remove the errors. After you restart the server you should be able to see the XSL-FO module convert the file into a PDF file.
If you are using the "wrapper" tool to start your server you will need to add the following lines to $EXIST_HOME/tools/wrapper/conf/wrapper.conf:

# make AWT load the fonts for SVG rendering inside of XSLFO
wrapper.java.additional.6=-Djava.awt.headless=true
Notes
See ../XSL-FO Tables/ and ../XSL-FO Images/ on how to add print-quality tables and charts to your document.

When you follow trunk, conf.xml is sometimes reset to the defaults, and you have to re-enable XSL-FO processing in conf.xml. The error printed if you miss this reads: "cannot compile xquery: err:xpst0017 call to undeclared function: xslfo:render".
Acknowledgments
The user Dmitriy has been helpful in the creation of the procedure for installation on systems that do not have source code.
Discussion
The steps to enable the FOP module should be listed somewhere in the eXist administrative site and removed from this Wikibook.
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/xsl-fo/helloworld.xq
Example
Starting with a simple list of the tags in this document, we can generate a module which performs an identity transform on a document containing these tags.
import module namespace gen = "http://www.cems.uwe.ac.uk/xmlwiki/gen" at "gen.xqm";
let $tags := ("websites","sites","site","uri","name","description")
let $config :=
  <config>
    <modulename>coupland</modulename>
    <namespace>http://www.cems.uwe.ac.uk/xmlwiki/coupland</namespace>
  </config>
return gen:create-module($tags, $config)
Here is the XML output [1] and the text XQuery file [2] created by adding the line
declare option exist:serialize "method=text media-type=text/text";
to the script. If we save the generated module as, say, coupid.xqm, we can use it to generate the transformed document:
import module namespace coupland = "http://www.cems.uwe.ac.uk/xmlwiki/coupland" at "coupid.xqm";
let $doc := doc("/db/Wiki/eXist/transformation/Coupland1.xml")/*
return coupland:convert($doc)
Generate [3]
We can also check whether the identity transformation has retained the full structure of the document:
import module namespace coupland = "http://www.cems.uwe.ac.uk/xmlwiki/coupland" at "coupid.xqm";
let $doc := doc("/db/Wiki/eXist/transformation/Coupland1.xml")/*
Module design
The generated module looks like this:
module namespace coupland = "http://www.cems.uwe.ac.uk/xmlwiki/coupland";
(: conversion module generated from a set of tags :)
declare function coupland:convert($nodes as node()*) as item()* {
  for $node in $nodes
  return
    typeswitch ($node)
      case element(websites) return coupland:websites($node)
      case element(sites) return coupland:sites($node)
      case element(site) return coupland:site($node)
      case element(uri) return coupland:uri($node)
      case element(name) return coupland:name($node)
      case element(description) return coupland:description($node)
      default return coupland:convert-default($node)
};
declare function coupland:convert-default($node as node()) as item()* {
  $node
};
declare function coupland:websites($node as element(websites)) as item()* {
  element websites { $node/@*, coupland:convert($node/node()) }
};
declare function coupland:sites($node as element(sites)) as item()* {
  element sites { $node/@*, coupland:convert($node/node()) }
};
declare function coupland:site($node as element(site)) as item()* {
  element site { $node/@*, coupland:convert($node/node()) }
};
declare function coupland:uri($node as element(uri)) as item()* {
  element uri { $node/@*, coupland:convert($node/node()) }
};
declare function coupland:name($node as element(name)) as item()* {
  element name { $node/@*, coupland:convert($node/node()) }
};
declare function coupland:description($node as element(description)) as item()* {
  element description { $node/@*, coupland:convert($node/node()) }
};
The function convert($nodes) contains the typeswitch statement to dispatch the node to one of the tag functions. Each tag function creates an element of that name, copies the attributes and then recursively calls the convert function passing the child nodes. The default action defined in the function convert-default merely copies the node.
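The value of this skeleton is that any one tag function can be edited without touching the others. The following is a minimal, self-contained sketch of that idea in standard XQuery, not part of the generated module: the function names local:convert, local:copy and local:site and the inline test document are assumptions made for illustration. Only the site element is given special treatment; every other element is copied.

```xquery
xquery version "1.0";

(: dispatch each node to a handler, as the generated convert function does :)
declare function local:convert($nodes as node()*) as item()* {
  for $node in $nodes
  return
    typeswitch ($node)
      case element(site) return local:site($node)
      case element() return local:copy($node)
      default return $node
};

(: identity step: rebuild the element, copy attributes, recurse into children :)
declare function local:copy($node as element()) as item()* {
  element { node-name($node) } { $node/@*, local:convert($node/node()) }
};

(: customised step: a site element becomes a div with a class attribute;
   this assumes the source element has no class attribute of its own :)
declare function local:site($node as element(site)) as item()* {
  <div class="site">{ $node/@*, local:convert($node/node()) }</div>
};

local:convert(
  <sites><site id="s1"><name>eXist</name></site></sites>
)
```

For the inline document this should return a sites element whose site child has been replaced by a div carrying class="site" and the original id attribute, with the name element copied unchanged.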
Generation Function
This function generates the code for an XQuery module which performs an identity transformation. There are two parameters:
tags - a sequence of tags
config - an XML node containing definitions of the module name, module prefix and module namespace.
declare variable $gen:cr := " ";
declare function gen:create-module($tags as xs:string*, $config as element(config)) as element(module) {
let $modulename := $config/modulename/text()
let $prefix := $config/prefix/text()
let $pre := concat($modulename,":",$prefix)
let $namespace := ($config/namespace,"http://mysite/module")[1]/text()
return
<module>
module namespace {$modulename} = "{$namespace}";
(: conversion module generated from a set of tags :)
<function>
declare function {$pre}convert($nodes as node()*) as item()* {{ {$gen:cr}
for $node in $nodes return
typeswitch ($node)
{for $tag in $tags
 return
 <s>case element({$tag}) return {$pre}{replace($tag,":","-")}($node) </s>
}
default return {$pre}convert-default($node)
}};
</function>
<function>
declare function {$pre}convert-default($node as node()) as item()* {{ {$gen:cr}
$node
}};
</function>
{for $tag in $tags
 return
 <function>
declare function {$pre}{replace($tag,":","-")}($node as element({$tag})) as item()* {{ {$gen:cr}
element {$tag} {{ $node/@*, {$pre}convert($node/node()) }}{$gen:cr}
}};
 </function>
}
</module>
};
and we can modify the calling script:
let $doc := doc("/db/Wiki/eXist/transformation/Coupland1.xml")
let $tags := gen:tags($doc)
let $config :=
  <config>
    <modulename>coupland</modulename>
    <namespace>http://www.cems.uwe.ac.uk/xmlwiki/coupland</namespace>
  </config>
return gen:create-module($tags, $config)
Generate [5]
generating each tag function:
<function>
declare function {$pre}{replace($tag,":","-")}($node as element({$tag})) as item()* {{ {$gen:cr}
{util:call($callback,$tag,$pre)}{$gen:cr}
}};
</function>
To generate a basic transformation to HTML, with HTML elements being copied while non-HTML elements are converted to div elements with an additional class attribute, we define the function to create the code body, create the function reference and call the convert function:
import module namespace gen = "http://www.cems.uwe.ac.uk/xmlwiki/gen" at "gen.xqm";
declare namespace fx = "http://www.cems.uwe.ac.uk/xmlwiki/fx";
declare variable $fx:html-tags := ("p","a","em","q");
declare function fx:tag-code ($tag as xs:string, $pre as xs:string) {
if ($tag = $fx:html-tags)
then
<code>
element {$tag} {{ $node/@*, {$pre}convert($node/node()) }}
</code>
else
<code>
element div {{ attribute class {{"{$tag}" }}, $node/(@* except @class), {$pre}convert($node/node()) }}
</code>
};
declare option exist:serialize "method=text media-type=text/text";
let $doc := doc("/db/Wiki/eXist/transformation/Coupland1.xml")
let $tags := gen:tags($doc)
(: the namespace URI must match the one bound to the fx prefix above :)
let $callback := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fx","fx:tag-code"),2)
let $config :=
  <config>
    <modulename>coupland</modulename>
    <namespace>http://www.cems.uwe.ac.uk/xmlwiki/coupland</namespace>
  </config>
return gen:create-module($tags, $callback, $config)
Generate [6]
<xsl:template match="websites/category"> <div> <div class="span-10"> <h3> <xsl:value-of select="name"/> </h3> <h4> <xsl:value-of select="subtitle"/> </h4> <xsl:copy-of select="description/node()"/> </div> <div class="span-14 last"> <xsl:apply-templates select="../sites/site"> <xsl:sort select="(sortkey,name)[1]" order="ascending"/>
</xsl:template>
</xsl:stylesheet>
and XQuery to apply this server-side:
declare option exist:serialize "method=xhtml media-type=text/html";
let $doc := doc("/db/Wiki/eXist/transformation/Coupland1.xml")
let $ss := doc("/db/Wiki/eXist/transformation/tohtml.xsl")
return transform:transform($doc, $ss, ())
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidxml.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidtext.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidtrans.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidcompare.xq
[5] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidtext2.xq
[6] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidtext3.xq
[7] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidtransxsl.xq
Method
xqDoc [1] is a standard for formatting comments in XQuery modules. The eXist system comes with an XQuery module which parses XQuery modules containing comments in this format and generates XML in the xqDoc XML format [2]. This XML can then be transformed into other formats such as HTML, PDF, DocBook or ePub.
Sample Output
Sample XQuery script
xquery version "1.0";
(:~
: This is a simple module which contains a single function
: @author Dan McCreary
: @version 1.0
: @see http://xqdoc.org
:
:)
module namespace simple = "http://simple.example.com";
(:~
: this function accepts two integers and returns the sum
:
: @param $first - the first number
: @param $second - the second number
: @return the sum of $first and $second
: @author Dan McCreary
: @since 1.1
:
:)
declare function simple:add($first as xs:integer, $second as xs:integer) as xs:integer {
$first + $second
};
<xqdoc:comment> <xqdoc:description> This is a simple module which contains a single function</xqdoc:description> <xqdoc:author> Dan McCreary</xqdoc:author> <xqdoc:version> 1.0</xqdoc:version> <xqdoc:see> http://xqdoc.org</xqdoc:see>
(:~ : This is a simple module which contains a single function : @author Dan McCreary : @version 1.0 : @see http://xqdoc.org : :) module namespace simple = "http://simple.example.com";
(:~
: this function accepts two integers and returns the sum
:
: @param $first - the first number
: @param $second - the second number
: @return the sum of $first and $second
: @author Dan McCreary
: @since 1.1
<xqdoc:author> Dan McCreary</xqdoc:author> <xqdoc:param> $first - the first number </xqdoc:param> <xqdoc:param> $second - the second number</xqdoc:param> <xqdoc:return> the sum of $first and $second</xqdoc:return> <xqdoc:since> 1.1 </xqdoc:since>
</xqdoc:comment> <xqdoc:name>add</xqdoc:name> <xqdoc:signature>add($first as xs:integer, $second as xs:integer) as xs:integer</xqdoc:signature> <xqdoc:body xml:space="preserve">declare function simple:add($first as xs:integer, $second as xs:integer) as xs:integer{ $first + $second };</xqdoc:body> </xqdoc:function> </xqdoc:functions> </xqdoc:xqdoc>
Known Problems
The parser used for xqDoc is slightly different from the standard XQuery parser in eXist. In some cases an XQuery that works in eXist will fail under the xqDoc parser.
Old-style variable declarations are still supported in eXist but not by the xqDoc parser. For example, the following variable declaration:
declare variable $foo:bar { 'Hello World' };
is valid in eXist XQuery, but this syntax is not valid in xqDoc, which only supports the XQuery standard declaration, e.g.
declare variable $foo:bar := 'Hello World';
Comments must be valid XML text. This is more restrictive than in XQuery. For example, < and & must be expressed as &lt; and &amp;.
References
[1] http://xqdoc.org/
[2] http://xqdoc.org/xqdoc-1.0.xsd
[3] http://www.cems.uwe.ac.uk/xmlwiki/eXist/xqdoc/test.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/eXist/xqdoc/geodoc.xq
Implementation
This script uses the unzip function in the eXist compression module. This function takes two callback functions: one to filter the required components of the zipped file and one to process each component.
declare function fw:process($path as xs:string,$type as xs:string, $data as item()? , $param as item()*) { (: return the XML :) $data };
let $uri := request:get-parameter("uri","http://www.iso.org/iso/iso_3166-1_list_en.zip")
let $zip := httpclient:get(xs:anyURI($uri), true(), ())/httpclient:body/text()
let $filter := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:filter"),3)
let $process := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:process"),4)
let $xml := compression:unzip($zip,$filter,(),$process,())
return $xml
Execute [1]
};
Running this on an Office Open XML file returns the following:
<item path="[Content_Types].xml" type="resource">Types</item>
<item path="_rels/.rels" type="resource">Relationships</item>
<item path="word/_rels/document.xml.rels" type="resource">Relationships</item>
<item path="word/document.xml" type="resource">w:document</item>
<item path="word/theme/theme1.xml" type="resource">a:theme</item>
<item path="word/settings.xml" type="resource">w:settings</item>
<item path="word/fontTable.xml" type="resource">w:fonts</item>
<item path="word/webSettings.xml" type="resource">w:webSettings</item>
<item path="docProps/app.xml" type="resource">Properties</item>
<item path="docProps/core.xml" type="resource">cp:coreProperties</item>
<item path="word/styles.xml" type="resource">w:styles</item>
declare function fw:filter($path as xs:string, $type as xs:string, $param as item()*) as xs:boolean { (: pass all :) true() };
declare function fw:process($path as xs:string,$type as xs:string, $data as item()? , $param as item()*) { (: store the XML in the nominated directory :)
let $baseCollection := "/db/apps/zip/data/"
let $uri := request:get-parameter("uri","http://www.iso.org/iso/iso_3166-1_list_en.zip")
let $unzipCollection := request:get-parameter("dir","temp")
let $zip := httpclient:get(xs:anyURI($uri), true(), ())/httpclient:body/text()
let $filter := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:filter"),3)
let $process := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:process"),4)
let $login := xmldb:login("/db","admin","password")
declare function fw:filter($path as xs:string, $type as xs:string, $param as item()*) as xs:boolean { (: pass all :) true() };
declare function fw:process($path as xs:string,$type as xs:string, $data as item()? , $param as item()*) { (: store the XML in the nominated directory :)
(: we need to encode the filename to account for filenames with illegal characters like [Content_Types].xml :)
let $path := xmldb:encode($path)
(: ensure mime type is set properly for .rels files which are xml;
   alternatively you could add this mime type to the mime-types.xml configuration file :)
return
  if (ends-with($path, '.rels'))
  then xmldb:store($param/@directory, $path, $data, 'application/xml')
  else xmldb:store($param/@directory, $path, $data)
};
let $baseCollection := "/db/apps/zip/data/" let $uri := request:get-parameter("uri","http://www.iso.org/iso/iso_3166-1_list_en.zip") let $unzipCollection := request:get-parameter("dir","temp") let $zip := httpclient:get(xs:anyURI($uri), true(),
let $login := xmldb:login("/db","admin","password")
let $fullPath := concat($baseCollection, $unzipCollection)
let $mkdir :=
  if (xmldb:collection-available($fullPath)) then ()
  else xmldb:create-collection($baseCollection, $unzipCollection)
let $store := compression:unzip($zip,$filter,(),$process,<param directory="{$fullPath}"/>)
return
<result>
  {for $file in $store return <file>{$file}</file>}
</result>
declare function fw:filter($path as xs:string, $type as xs:string, $param as item()*) as xs:boolean {
(: filter any files which are not required :)
if (ends-with($path,".bin")) then false() else true()
};
declare function fw:process($path as xs:string, $type as xs:string, $data as item()?, $param as item()*) {
(: parse the path and create a collection if necessary :)
let $steps := tokenize($path,"/")
let $nsteps := count($steps)
let $filename := $steps[$nsteps]
let $collection := string-join(subsequence($steps,1,$nsteps - 1),"/")
let $baseCollection := string($param/@collection)
let $fullCollection := concat($baseCollection,"/",$collection)
let $mkdir :=
let $zip := httpclient:get(xs:anyURI($path), true(), ())/httpclient:body/text()
let $login := xmldb:login("/db","admin","password")
let $collection := concat($baseCollection, $unzipCollection)
let $mkdir :=
  if (xmldb:collection-available($collection)) then ()
  else xmldb:create-collection($baseCollection, $unzipCollection)
let $store := compression:unzip($zip,$filter,(),$process,<param collection="{$collection}"/>)
return
<result>
  {for $file in $store return <file>{$file}</file>}
</result>
<mime-type name="application/zip" type="binary">
  <description>ZIP archive and Office Open XML</description>
  <extensions>.zip,.docx,.xlsx,.pptx</extensions>
</mime-type>
You will need to reboot the server for this change to take effect. The basic script remains the same with minor modifications:
let $path := request:get-parameter("path","http://www.iso.org/iso/iso_3166-1_list_en.zip")
let $unzipCollection := request:get-parameter("dir","temp")
let $zip :=
  if (starts-with($path,"http"))
  then httpclient:get(xs:anyURI($path), true(), ())/httpclient:body/text()
  else util:binary-doc($path)
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Codes/getCountries.xq
Bullet Bars
From the article Bullet Bars with Google Charts [1] we can see that it is easy to create bullet bars using the Google Charts API.
Sample URL
http://chart.apis.google.com/chart?cht=bhs&chs=150x30&chd=t:70&chm=r,ff0000,0,0.0,0.5|r,ffff00,0,0.5,0.75|r,00A000,0,0.75,1.0|r,000000,0,0.8,0.81&chco=000000&chbh=10
actual-value - value of the central black line
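The sample URL can be assembled from the actual value with a short function. This is a minimal sketch in standard XQuery: the function name local:bullet-bar-url is an assumption made for illustration, and every parameter other than the data value (chd) is copied unchanged from the sample URL above.

```xquery
xquery version "1.0";

(: build the bullet bar URL; only the actual value varies :)
declare function local:bullet-bar-url($actual as xs:integer) as xs:string {
  concat(
    'http://chart.apis.google.com/chart?cht=bhs',
    '&amp;chs=150x30',
    (: the actual value, drawn as the central black bar :)
    '&amp;chd=t:', $actual,
    (: coloured range markers copied from the sample URL :)
    '&amp;chm=r,ff0000,0,0.0,0.5|r,ffff00,0,0.5,0.75|r,00A000,0,0.75,1.0|r,000000,0,0.8,0.81',
    '&amp;chco=000000',
    '&amp;chbh=10')
};

<img src="{local:bullet-bar-url(70)}" alt="bullet bar, actual value 70"/>
```

Called with 70, the function reproduces the sample URL. Note that the ampersands are written as &amp;amp; inside the XQuery string literals, since a bare ampersand is not allowed there.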
References
[1] http://broadcast.oreilly.com/2008/11/creating-bullet-bars-with-goog.html
Method
The GoogleChart API [1] creates PNG-format charts from data passed in the URL. One use of the service is to generate a Tufte sparkline [2]. This script uses random data to generate a small sparkline-like graphic. With a bit more work, additional features such as minimum, maximum and normal bands could be added. A line chart (cht=lc) includes axes, but these can be removed by using an undocumented feature in which the chart type is specified as lfi [3].
The script uses function overloading in XQuery, which allows two functions to have the same name but different numbers of parameters. The more general function has parameters for the sequence of values and the min and max to be used in scaling the values. The second function (with the same name) accepts only the values and calculates the min and max from the data before calling the more general function to complete the task.
(: This script illustrates the use of the GoogleChart API to generate a sparkline-like graphic :)
declare function local:simple-encode($vals as xs:decimal*, $min as xs:decimal, $max as xs:decimal) as xs:string {
(: encode the sequence of numbers as a string according to the simple encoding scheme.
   the data values are encoded in the characters A-Z,a-z,0-9 giving a range from 0 to 61 :)
(: scale by 61 rather than 62 so that the maximum value maps to index 61, the last character :)
let $scale := 61 div ($max - $min)
let $simpleEncode := "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
return
  string-join(
    for $x in $vals
    let $n := floor(($x - $min) * $scale)
    return substring($simpleEncode, $n + 1, 1),
    "")
};
declare function local:simple-encode($vals as xs:decimal*) as xs:string {
(: compute the minimum and maximum values, then call the more general
declare function local:sparkline($data as xs:decimal*, $fontHeight as xs:integer, $pointSize as xs:integer, $label as xs:string) as element(span) {
(: create a span element containing the line chart of the data, the name of the data set and the last data value.
   fontHeight and pointSize are defined in pixels :)
let $codeString := local:simple-encode($data)
let $width := count($data) * $pointSize
let $last := $data[last()]
let $title := concat("Graph of ", $label, " data: ", count($data), " values, min ", min($data), " max ", max($data))
return
<span>
  <img src="http://chart.apis.google.com/chart?chs={$width}x{$fontHeight}&amp;chd=s:{$codeString}&amp;cht=lfi" alt="{$title}" title="{$title}"/>
  <font style="font-size:{$fontHeight}px"> {$label} {$last}</font>
</span>
};
(: generate some random data :)
let $data := for $i in (1 to 100) return floor(math:random() * 10)
return local:sparkline($data,15,1,"Random")
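The overloading described under Method can be sketched in standard XQuery as follows. This is a self-contained illustration rather than the original script: the three-argument function follows the one shown above (scaled by 61 so that the maximum maps to the last character, with an added guard for a zero range), while the body of the one-argument function is an assumption reconstructed from its description.

```xquery
xquery version "1.0";

(: general form: the caller supplies the scaling range :)
declare function local:simple-encode(
    $vals as xs:decimal*, $min as xs:decimal, $max as xs:decimal) as xs:string {
  let $chars := "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  (: 62 characters cover indexes 0 to 61; a zero range encodes everything as A :)
  let $scale := if ($max gt $min) then 61 div ($max - $min) else 0
  return
    string-join(
      for $x in $vals
      let $n := floor(($x - $min) * $scale)
      return substring($chars, $n + 1, 1),
      "")
};

(: convenience form: same name, one parameter; derives min and max from the data.
   It expects a non-empty sequence, since min() and max() of () are empty. :)
declare function local:simple-encode($vals as xs:decimal*) as xs:string {
  local:simple-encode($vals, min($vals), max($vals))
};

local:simple-encode((0, 5, 10))
```

For the values 0, 5 and 10 the scale is 6.1, so the indexes are 0, 30 and 61 and the encoded string should be Ae9. The processor chooses between the two functions by the number of arguments in the call.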
References
[1] http://code.google.com/apis/chart/
[2] http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR
[3] http://24ways.org/2007/tracking-christmas-cheer-with-google-charts
[4] http://www.cems.uwe.ac.uk/xmlwiki/Graph/randomSparkline.xq
Google Charts
Motivation
You want an XQuery function to create charts using the Google Chart API service [1].
Method
We will create a simple XQuery function that takes the required parameters of a Google Chart (i.e. chart data, size, colors, and labels). It will then construct a URL with the correct values. You can then embed this URL in your XQuery to display the chart.
Source Code
Here is an example function:
declare function utility:graph($type, $colors, $size, $markers, $data, $alt, $title, $barwidthandspacing, $linestyles) {
let $parameters :=
  <Parameters>
    <Parameter label="chco" value="{$colors}"/>
    <Parameter label="chl" value="{$markers}"/>
    <Parameter label="chtt" value="{$title}"/>
    <Parameter label="chbh" value="{$barwidthandspacing}"/>
    <Parameter label="chls" value="{$linestyles}"/>
  </Parameters>
let $src :=
  concat('http://chart.apis.google.com/chart?', 'cht=', $type, '&amp;chs=', $size,
    (: join the non-empty parameters into one string, since concat() takes single values :)
    string-join(
      for $parameter in $parameters//Parameter[@value ne '']
      return concat('&amp;', $parameter/@label, '=', $parameter/@value),
      ''),
    '&amp;chd=t:', $data)
return <img alt="{$alt}" src="{$src}"/>
};
    <Parameter label="chxt" value="{$axes}" required="false"/>
    <Parameter label="chxl" value="{$axislabels}" required="false"/>
    <Parameter label="chxp" value="{$axislabelpositions}" required="false"/>
    <Parameter label="chxr" value="{$axisrange}" required="false"/>
    <Parameter label="chxs" value="{$axisstyles}" required="false"/>
    <Parameter label="chp" value="{$zeroline}" required="false"/>
    <Parameter label="chxtc" value="{$ticklength}" required="false"/>
    <Parameter label="chma" value="{$margin}" required="false"/>
    <Parameter label="chf" value="{$fill}" required="false"/>
    <Parameter label="chg" value="{$grid}" required="false"/>
    <Parameter label="chdl" value="{$legend}" required="false"/>
    <Parameter label="chdlp" value="{$legendplacement}" required="false"/>
  </Parameters>
let $optional-parameters :=
  string-join(
    (for $parameter in $parameters//Parameter[@required = 'false'][@value ne '']
     return concat('&amp;', $parameter/@label, '=', $parameter/@value)),
    '')
let $src := concat('http://chart.apis.google.com/chart?', 'cht=', $type, '&amp;chs=', $size, $optional-parameters, '&amp;chd=t:', $data)
return <img alt="{$alt}" src="{$src}"/>
};
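The URL assembly can be tried on its own, outside eXist. The following is a minimal sketch in standard XQuery: the function name local:chart-url and its signature are assumptions made for illustration, and the sample values are taken from the example chart URL in reference [1]. Parameters with an empty value are skipped, as in the functions above.

```xquery
xquery version "1.0";

(: build a chart URL from required values plus optional label/value pairs :)
declare function local:chart-url(
    $type as xs:string, $size as xs:string, $data as xs:string,
    $optional as element(Parameter)*) as xs:string {
  concat(
    'http://chart.apis.google.com/chart?',
    'cht=', $type,
    '&amp;chs=', $size,
    (: one '&label=value' pair per non-empty optional parameter :)
    string-join(
      for $p in $optional[@value ne '']
      return concat('&amp;', $p/@label, '=', $p/@value),
      ''),
    '&amp;chd=t:', $data)
};

local:chart-url('bvs', '300x150', '10,20,30,40,50,60',
  (<Parameter label="chco" value="0000FF"/>,
   <Parameter label="chtt" value=""/>,
   <Parameter label="chbh" value="a"/>))
```

Here the empty chtt parameter is dropped, so the result should contain cht, chs, chco, chbh and chd only. Values containing spaces or other reserved characters would still need URL encoding, which this sketch does not attempt.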
Acknowledgments
Fraser Hore and Dmitriy Shabanov posted these examples to the eXist mailing list.
Resources
Sample XML Schema for checking Google Chart Parameters [2]
References
[1] http://chart.apis.google.com/chart?chxl=0:|Jan|Feb|Mar|Apl|May|Jun|1:|10|50|100&chxr=0,-5,100&chxt=x,r&chbh=a&chs=300x150&cht=bvs&chco=0000FF&chd=t:10,20,30,40,50,60&chp=0.05&chtt=Downloads+Per+Month
[2] http://code.google.com/p/xrx/source/browse/trunk/20-google-charts/schemas/google-charts.xsd
Graphing Triples
The RDF Validation service [1] can be used to graph RDF, but since it expands prefixed names to full URIs, the graphs can be hard to read as examples. This service draws simple triple graphs, where the triples are defined in a local XML format in which each triple has the attributes subject, property and object. [XQuery function to convert RDF to N3 needed]
Subjects and objects are drawn as nodes, triples as arcs with the property as the label. If the subject or object contains ':' or starts with 'http://', the node is shown as an ellipse. If it starts with '_', it is a blank node and an unnamed circle is shown; otherwise the node is assumed to be a literal and is drawn in a box.
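The node-shape rule can be isolated as a small function. This is a minimal sketch in standard XQuery, with the tests applied in the order the text states them: the function name local:node-shape and the sample identifiers are assumptions made for illustration.

```xquery
xquery version "1.0";

(: classify a subject or object string by the rule described above :)
declare function local:node-shape($node as xs:string) as xs:string {
  if (contains($node, ':') or starts-with($node, 'http://')) then 'ellipse'
  else if (starts-with($node, '_')) then 'circle'
  else 'box'
};

for $n in ('exstaff:85740', '_b1', 'Bedford')
return concat($n, ' is drawn as ', local:node-shape($n))
```

With the tests in this order, an identifier such as _:b1 contains ':' and is therefore classed as an ellipse; only underscore-prefixed names without a colon reach the blank-node branch.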
Endpoint
[2]
Parameters
url : URL of triples in the XML format illustrated above
dir : LR - left to right (default), TB - top to bottom [rankdir in Graphviz]
title : title for the graph (default none)
Example
From page 11 of the RDF primer [3]
<?xml version="1.0" encoding="UTF-8"?> <graph> <triple subject="exstaff:85740" property="exterms:address" object="exaddressid:87540"/> <triple subject="exaddressid:87540" property="exterms:street" object="1501 Grant Avenue"/> <triple subject="exaddressid:87540" property="exterms:city" object="Bedford"/> <triple subject="exaddressid:87540" property="exterms:state" object="Massachusetts"/> <triple subject="exaddressid:87540" property="exterms:postalcode" object="01730"/> </graph>
dot Output [4]
digraph { rankdir='LR'
"exstaff:85740" [label="exstaff:85740" shape=ellipse];
"exaddressid:87540" [label="exaddressid:87540" shape=ellipse];
"1501 Grant Avenue" [label="1501 Grant Avenue" shape=box];
"Bedford" [label="Bedford" shape=box];
"Massachusetts" [label="Massachusetts" shape=box];
"01730" [label="01730" shape=box];
"exstaff:85740" -> "exaddressid:87540" [label="exterms:address"];
"exaddressid:87540" -> "1501 Grant Avenue" [label="exterms:street"];
"exaddressid:87540" -> "Bedford" [label="exterms:city"];
"exaddressid:87540" -> "Massachusetts" [label="exterms:state"];
"exaddressid:87540" -> "01730" [label="exterms:postalcode"];
}
GIF Image
http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2image.php?url=http://www.cems.uwe.ac.uk/xmlwiki/RDF/triple2dot.xq?url%3Dhttp://www.cems.uwe.ac.uk/xmlwiki/RDF/egtriples4.xml
Usage
Either save the generated gif, or use 5clicks [5] or similar to capture the image on the screen. One way to print large GIF images is to save the image, then insert it into an Excel spreadsheet. Excel will print the image over multiple pages. Reduced in size and with page borders removed, even large graphs can be printed and then taped together.
Source
declare option exist:serialize "method=text media-type=text/text";
declare variable $nl := "&#10;";
declare variable $url := request:get-parameter("url",());
declare variable $dir := request:get-parameter("dir","LR");
declare variable $title := request:get-parameter("title","");
let $graph := doc($url)
return (
  "digraph ", $title, " { rankdir='", $dir, "' ", $nl,
  for $node in distinct-values(($graph//triple/@subject, $graph//triple/@object))
  let $nodetype :=
    if (contains($node,":") or starts-with($node,"http://"))
    then concat('label="',$node,'" shape=ellipse')
    else if (starts-with($node,"_"))
    then 'shape=circle'
    else concat('label="',$node,'" shape=box')
  return concat('"',$node,'" [',$nodetype,'];',$nl),
  for $triple in $graph//triple
  return (concat('"', $triple/@subject, '" -> "', $triple/@object, '" [label="', $triple/@property, '"];'), $nl),
  "} ", $nl
)
This script would be improved by the use of an intermediate XML structure and an XSLT script to convert to dot.
References
[1] http://www.w3.org/RDF/Validator/
[2] http://www.cems.uwe.ac.uk/xmlwiki/RDF/triple2dot.xq
[3] http://www.w3.org/TR/REC-rdf-syntax/
[4] http://www.cems.uwe.ac.uk/xmlwiki/RDF/triple2dot.xq?url=http://www.cems.uwe.ac.uk/xmlwiki/RDF/egtriples4.xml
[5] http://www.screen-capture.net/
Grouping Items
Motivation
You have many items in a set of data that have a category associated with them. You want to create a report that sorts the items by a category.
Method
We will perform the query in three steps.
1. Use a FLWOR expression to create a sequence of the distinct categories using the distinct-values() function.
2. For each category in that sequence, select all items that belong to it. This is done by adding a predicate (similar to a where clause) to the end of the XPath selector. It takes the form data/item[x=y]: an item is added to the sequence when x=y is true.
3. For each category, return the category name followed by all the items in that category.
Sample Data
<items>
  <item> <name>item #1</name> <category>red</category> </item>
  <item> <name>item #2</name> <category>green</category> </item>
  <item> <name>item #3</name> <category>red</category> </item>
  <item> <name>item #4</name> <category>blue</category> </item>
  <item> <name>item #5</name> <category>red</category> </item>
  <item> <name>item #6</name> <category>blue</category> </item>
  <item> <name>item #7</name> <category>green</category> </item>
  <item> <name>item #8</name> <category>red</category> </item>
</items>
Sample Query
The following XQuery will demonstrate this technique. Note that the distinct values for all the categories are stored in the $distinct-categories variable.
xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/html";
(: load the items :)
let $data := doc('/db/mdr/apps/training/labs/04-group-by/data.xml')/items
let $distinct-categories := distinct-values($data/item/category/text())
return
<html>
  <body>
    <table border="1">
      <thead>
        <tr> <th>Category</th> <th>Items</th> </tr>
      </thead>
      <tbody>
      {
        for $category in $distinct-categories
        return
          <tr>
            <td>{$category}</td>
            <td>{string-join($data/item[category=$category]/name/text(), ', ')}</td>
          </tr>
      }
      </tbody>
    </table>
  </body>
</html>
In the query above the statement:
$data/item[category=$category]
reads as "get all the items from the data set that have the category element equal to the current category". The string-join() function just puts a comma and a space between the items in the output for readability.
Sample Output
Category   Items
red        item #1, item #3, item #5, item #8
green      item #2, item #7
blue       item #4, item #6
Discussion
Note that you are not restricted to having an item be in a single category. Adding multiple categories to an item will not require any changes to the script. You can also add new categories to this list at any time without changing the program above. As long as there are range indexes for the category element the list of all categories will be created very quickly, even for millions of records.
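The point about multiple categories can be checked with a small, self-contained query. This is a minimal sketch in standard XQuery that uses inline data instead of a stored document; the group element and the two sample items are assumptions made for illustration.

```xquery
xquery version "1.0";

(: item #1 carries two category elements :)
let $data :=
  <items>
    <item><name>item #1</name><category>red</category><category>blue</category></item>
    <item><name>item #2</name><category>blue</category></item>
  </items>
for $category in distinct-values($data/item/category)
return
  <group category="{$category}">{
    (: the predicate is true if any category child equals the current category :)
    string-join($data/item[category = $category]/name, ', ')
  }</group>
```

Because the general comparison category = $category is true when any one of an item's category elements matches, item #1 should appear in both the red and the blue group, and the grouping logic itself is unchanged.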
Guest Registry
Guest Registry
Dan McCreary
I am using this book for teaching XQuery to my students. Most of my work is in the government and financial sectors. I am using XQuery with REST and XForms.
Jim Fuller
I am using this book for teaching some aspects of XQuery to people I mentor professionally and also plan to use for a University course in Prague this September.
Rajamani Marimuthu
I am using this book for creating samples and some real time applications for the users .. and also for teaching to my colleagues and learners.
Dominique Rabeuf
I am using this book for creating XForms/XQuery samples and applications
Chris Cargile
I am using this book for creating XQuery samples for research purposes
Joe Wicentowski
I am using this book to capture community innovations in XQuery and develop best practices. I am using XQuery with REST and XForms.
Method
You would like to use a single function where you pass a series of functions as parameters to that function.
Simple example
In the following example we will declare two functions. We will then process a list of words by applying these functions to each of the items in the sequence. We will do this by passing the function name as an argument to another function. NOTE: This only appears to work in eXist 1.3. eXist versions 1.2.X have the wrong data type associated with the QName() function. The eXist system needs to turn each function into a function identifier. To do this it needs to call util:function(). util:function takes two arguments, the qualified name of the function (the prefix and the function name) as well as the arity of the function. In this case the arity of a function is the number of arguments that the function takes. The data type of the first argument must be of type QName. The data type of the arity, the second parameter, is an integer. util:function($function as xs:QName, $arity as xs:integer) as function declare namespace fw = "http://www.cems.uwe.ac.uk/xmlwiki/fw"; declare function fw:apply($words as xs:string*, $my-function as function) { for $word in $words return util:call($my-function,$word)
Higher Order Functions }; declare function fw:f1($string) { string-length($string) }; declare function fw:f2($string) { substring($string,1,1) }; let $f1 := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:f1"),1) let $mywords := ("red","green","purple") return <hofs> <data>{$mywords}</data> <hof> <task>length of each string</task> <result>{fw:apply($mywords,$f1)}</result> </hof> <hof> <task>Initial letter of each string</task> <result>{ fw:apply($mywords, util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw","fw:f2"),1) ) }</result> </hof> </hofs>
Execute [1]
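The same pattern can be sketched in Python, used here only so the control flow can be run without an eXist instance. The names `apply_to_words`, `length_of` and `initial_of` are illustrative stand-ins for fw:apply, fw:f1 and fw:f2; Python has no counterpart to util:function() because its functions are already first-class values.

```python
from typing import Callable, Iterable, List, Union

Result = Union[int, str]


def apply_to_words(words: Iterable[str], func: Callable[[str], Result]) -> List[Result]:
    """Call func on every word and collect the results (the role of fw:apply)."""
    return [func(word) for word in words]


def length_of(word: str) -> int:
    """Stand-in for fw:f1: the length of the string."""
    return len(word)


def initial_of(word: str) -> str:
    """Stand-in for fw:f2: the first character of the string."""
    return word[:1]


if __name__ == "__main__":
    my_words = ["red", "green", "purple"]
    print(apply_to_words(my_words, length_of))
    print(apply_to_words(my_words, initial_of))
```

The function that does the iteration never needs to know which operation it is applying, which is the point of the higher-order style.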
References
Jim Fuller Article on IBM DeveloperWorks [2] - this has an excellent example of how to use Higher Order Functions using Saxon.
[1] http://www.cems.uwe.ac.uk/xmlwiki/Test/hof.xq
[2] http://www.ibm.com/developerworks/edu/x-dw-x-advxquery.html
Method
We will use the xmldb:size() function to generate a list of all the file sizes in a given collection. We can then transform this list into a series of strings that can be passed to the Google Charts line chart function. The format of the size function is the following:

    xmldb:size($collection, $document-name)

This function returns the number of bytes in a file. Our first step is to create a sequence of numbers that represents all of the sizes of resources in a collection:

    let $sizes :=
        for $file in xmldb:get-child-resources($collection)
        let $size := xmldb:size($collection, $file)
        return $size

This can also be done with a combination of the collection() function and the util:document-name() function:

    let $sizes :=
        for $file in collection($collection)/*
        let $name := util:document-name($file)
        let $size := xmldb:size($collection, $name)
        return $size
Sample Program
    xquery version "1.0";
    declare option exist:serialize "method=xhtml media-type=text/html";

    (: Put the collection you want to analyze here :)
    let $collection := "/db/test"
    (: How many bytes in each section :)
    let $increment := 10000
    (: How many divisions :)
    let $divisions := 20
    (: Color for the lines or the bars in RRGGBB :)
    let $color := '0000FF'
    (: For vertical bar chart use 'bvs', for line chart use 'lc', for spark line (no axis) use 'ls' :)
    let $chart-type := 'bvs'
    (: this is the max size of a google chart - 300,000 pixels.
       The first number is the width, the second is the height. :)
    let $chart-size := '600x500'
    let $uriapi := 'http://chart.apis.google.com/chart?'

    let $sizes :=
        for $file in xmldb:get-child-resources($collection)
        let $size := xmldb:size($collection, $file)
        return $size

    (: the raw data counts for each range. The 't' is just a marker that it is true that we are in this range.
       The lower bound is inclusive so that sizes falling exactly on a boundary are counted once. :)
    let $raw-data :=
        for $range in (0 to $divisions)
        let $min := $range * $increment
        let $max := ($range + 1) * $increment
        return
            count(
                for $number in $sizes
                return
                    if ($number ge $min and $number lt $max)
                    then ('t')
                    else ()
            )

    let $max-value := max($raw-data)

    (: scale to the max height :)
    let $scaled-data :=
        for $num in $raw-data
        return string(floor($num div ($max-value div 500)))

    (: join the strings with commas to get a comma separated list :)
    let $data-csv := string-join($scaled-data, ',')

    (: construct the URL; ampersands inside XQuery string literals are written as &amp; :)
    let $chart-uri := concat($uriapi, 'cht=', $chart-type, '&amp;chs=', $chart-size, '&amp;chco=', $color, '&amp;chd=t:', $data-csv)

    (: return the results in an HTML page :)
    return
        <html>
            <head><title>Google Chart Histogram View of {$collection}</title></head>
            <body>
                <h1>Google Chart Histogram View of {$collection}</h1>
                <p><img src="{request:encode-url(xs:anyURI($chart-uri))}"/></p>
            </body>
        </html>
Sample Result
http://chart.apis.google.com/chart?cht=ls&chd=t:83,6,13,37,85,414,500,87,41,31,11,16,9,12,5,7,4,4,3,1,1&chs=500x500&chco=0000FF
Discussion
To run the query you will need to customize the name of the collection that you are analyzing. After you run the query you can check that the results are what you expect and then copy the resulting URL into a browser.

Note that if there are files larger than the top of the highest range, an additional count of these files should be added:

    let $top-range := $increment * ($divisions + 1)
    let $top-count :=
        count(
            for $num in $sizes
            return
                if ($num ge $top-range)
                then ('t')
                else ()
        )

This query could also be parameterized using the request:get-parameter() function, so that many of the values passed to the Google chart can be set as parameters in the URL of the XQuery.
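The bucketing, overflow count and scaling described above can be sketched as follows. This is a Python illustration of the arithmetic only, not of the eXist or Google Charts APIs; the function names are hypothetical, and the extra final bucket for oversized files is the addition suggested in the discussion rather than part of the sample program.

```python
from typing import List, Sequence


def histogram(sizes: Sequence[int], increment: int, divisions: int) -> List[int]:
    """Count sizes per bucket of width `increment`.

    Buckets 0..divisions mirror the (0 to $divisions) loop; one extra final
    bucket counts every size at or above the top range.
    """
    counts = [0] * (divisions + 2)
    top_range = increment * (divisions + 1)
    for size in sizes:
        if size >= top_range:
            counts[-1] += 1
        else:
            counts[size // increment] += 1
    return counts


def scale(counts: Sequence[int], height: int = 500) -> List[int]:
    """Scale counts so that the largest bucket reaches `height`."""
    peak = max(counts)
    if peak == 0:
        return [0 for _ in counts]
    return [count * height // peak for count in counts]


def chart_data(counts: Sequence[int], height: int = 500) -> str:
    """Build the comma separated value used after 'chd=t:' in the chart URL."""
    return ",".join(str(value) for value in scale(counts, height))


if __name__ == "__main__":
    sample_sizes = [0, 9999, 10000, 25000, 999999]  # made-up sizes in bytes
    print(chart_data(histogram(sample_sizes, increment=10000, divisions=2)))
```

Guarding against a peak of zero avoids the division by zero that an empty collection would otherwise cause.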
Image Library
Motivation
You want a script that will display a small thumbnail of all the images in an image collection. The images may have many file suffixes (jpg, png, gif etc).
Method
We will write an XQuery that finds all the child resources in the collection that have the correct file types.
Source Code
    xquery version "1.0";
    declare option exist:serialize "method=xhtml media-type=text/html";

    (: look for the collection parameter in the incoming URL.
       If not present, assume a default collection like /db/images. :)
    let $collection := request:get-parameter('collection', '/db/images')

    (: you can also change the number of images per row :)
    let $images-per-row := xs:integer(request:get-parameter('images-per-row', 10))

    (: first get all the files in the collection :)
    let $all-files := xmldb:get-child-resources($collection)

    (: now just get the files with known image file type extensions :)
    let $image-files :=
        for $file in $all-files[ends-with(., '.png') or ends-with(., '.jpg') or
                                ends-with(., '.tiff') or ends-with(., '.gif')]
        return $file

    let $image-count := count($image-files)
    let $rows-count := xs:integer(ceiling($image-count div $images-per-row))
    return
        <html>
            <head>
                <title>Images for collection {$collection}</title>
            </head>
            <body>
                Images in collection: {$collection}
                <table>{
                    for $row in (1 to $rows-count)
                    return
                        <tr>{
                            for $col in (1 to $images-per-row)
                            let $n := ($row - 1) * $images-per-row + $col
                            return
                                if ($n <= $image-count)
                                then
                                    let $image := $image-files[position() = $n]
                                    let $path := concat('/exist/rest', $collection, '/', $image)
                                    return
                                        <td>
                                            <a href="{$path}"><img src="{$path}" height="100px" width="100px"/></a>
                                        </td>
                                else
                                    (: blank cells at the end of the last row :)
                                    <td/>
                        }</tr>
                }</table>
            </body>
        </html>
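The row and column arithmetic of the table can be sketched on its own, here in Python so it can be checked without a database. The helper name `image_rows` is hypothetical; `None` marks the blank cells that pad the last row.

```python
import math
from typing import List, Optional, Sequence

# File suffixes treated as images, taken from the listing above.
IMAGE_SUFFIXES = (".png", ".jpg", ".tiff", ".gif")


def image_rows(file_names: Sequence[str], per_row: int) -> List[List[Optional[str]]]:
    """Keep only image files and arrange them in rows of `per_row` cells."""
    images = [name for name in file_names if name.endswith(IMAGE_SUFFIXES)]
    row_count = math.ceil(len(images) / per_row)
    rows: List[List[Optional[str]]] = []
    for row in range(row_count):
        cells: List[Optional[str]] = list(images[row * per_row:(row + 1) * per_row])
        # Pad the final row with blank cells so every row has the same width.
        cells.extend([None] * (per_row - len(cells)))
        rows.append(cells)
    return rows


if __name__ == "__main__":
    for row in image_rows(["a.png", "notes.txt", "c.jpg", "d.gif"], per_row=2):
        print(row)
```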
Incremental Searching
Motivation
You have a large data set and you want to use JavaScript to asynchronously communicate with a server to narrow the scope of the search as a user types.

    <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
            <title>ZIP Code to City and State using XmlHttpRequest</title>
            <script language="javascript" src="ajaxzip.js"/>
        </head>
        <body>
            <h1>US Zipcode decoder</h1>
            <form onSubmit="getList(); return false">
                <p>ZIP code:
                    <input type="text" size="5" name="zip" id="zip" onkeyup="getList();" onfocus="getList();"/>
                    e.g. 95472
                </p>
            </form>
            <div id="list"/>
        </body>
    </html>
Javascript
Uses XMLHttpRequest to request the subset and innerHTML to update the page.
    function updateList() {
        if (http.readyState == 4) {
            var divlist = document.getElementById('list');
            divlist.innerHTML = http.responseText;
            isWorking = false;
        }
    }

    function getList() {
        if (!isWorking && http) {
            var zipcode = document.getElementById("zip").value;
            http.open("GET", "getzip.xq?zipcode=" + escape(zipcode), true);
            // this sets the call-back function to be invoked when a response from the HTTP request is returned
            http.onreadystatechange = updateList;
            isWorking = true;
            http.send(null);
        }
    }

    function getHTTPObject() {
        var xmlhttp;
        /*@cc_on
        @if (@_jscript_version >= 5)
            try {
                xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
            } catch (e) {
                try {
                    xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
                } catch (E) {
                    xmlhttp = false;
                }
            }
        @else
            xmlhttp = false;
        @end @*/
        if (!xmlhttp && typeof XMLHttpRequest != 'undefined') {
            try {
                xmlhttp = new XMLHttpRequest();
                xmlhttp.overrideMimeType("text/xml");
            } catch (e) {
                xmlhttp = false;
            }
        }
        return xmlhttp;
    }

    // shared state used by getList() and updateList(); these declarations are
    // not in the listing as extracted and are assumed here so the script runs
    var http = getHTTPObject();
    var isWorking = false;
XQuery search
The server-side XQuery performs the search in the XML database and generates the XHTML. The description refers to the eXist full text index and the eXist-specific &= operator; the listing shown here filters with matches() and a regular expression anchored to the start of the code.

    let $zipcode := request:get-parameter("zipcode", ())
    return
        <div>
        {
            if (string-length($zipcode) > 1) (: too slow :)
            then
                let $search := concat('^', $zipcode)
                for $zip in //Zipcode[matches(Code, $search)]
                return <div>{string-join(($zip/Code, $zip/Area, $zip/State), ' ')}</div>
            else ()
        }
        </div>
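The narrowing step can be illustrated without a database. This minimal Python sketch assumes the records are already available as (code, area, state) tuples, which is an assumption made for the illustration; `search_zipcodes` is a hypothetical name. It uses a plain prefix test instead of a regular expression, so characters typed by the user are never interpreted as pattern syntax.

```python
from typing import List, Sequence, Tuple

Record = Tuple[str, str, str]  # (code, area, state)


def search_zipcodes(records: Sequence[Record], prefix: str, min_length: int = 2) -> List[str]:
    """Return 'code area state' lines for every code starting with `prefix`.

    Prefixes shorter than `min_length` return nothing, mirroring the
    string-length($zipcode) > 1 guard in the XQuery.
    """
    if len(prefix) < min_length:
        return []
    return [
        " ".join((code, area, state))
        for code, area, state in records
        if code.startswith(prefix)
    ]


if __name__ == "__main__":
    sample = [
        ("213", "Portsmouth", "NH"),      # record visible in the sample data
        ("95472", "Sebastopol", "CA"),    # made-up record for the illustration
    ]
    print(search_zipcodes(sample, "21"))
```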
    ...
        <State>NH</State>
    </Zipcode>
    <Zipcode>
        <Code>213</Code>
        <Area>Portsmouth</Area>
        <State>NH</State>
    </Zipcode>
    ...

Execute [3]
References
[1] http://acg.media.mit.edu/people/fry/zipdecode/
[2] http://processing.org/
[3] http://www.cems.uwe.ac.uk/xmlwiki/ajax/zipcode.xq
Diagrams
1. ../Graph Visualization 2. ../Sequence Diagrams
Geocoding
1. ../Google Geocoding 2. ../Nationalgrid and Google Maps 3. ../String Analysis
Dates
1. ../Net Working Days
Development
1. ../XQuery IDE
E-learning
1. ../Example Sequencer
Graphs
1. ../Graph Visualization 2. ../Topological Sort
Mathematics
1. ../Project Euler
Page Scraping
1. ../Delivery Status Report 2. ../Page scraping and Yahoo Weather 3. ../Wikipedia Page scraping 4. ../Wikipedia Lookup 5. ../Wiki weapons page
Pipelines
1. ../Page_scraping_and_Yahoo_Weather
Puzzles
1. ../Fizzbuzz
Photos
1. ../Flickr GoogleEarth 2. ../XMP data
Python
1. ../XQuery and Python
Regular Expressions
1. ../String Analysis
RSS
1. ../Simple RSS reader
Strings
1. ../String Analysis 2. ../Tag Cloud
SQL
1. ../XML to SQL 2. ../XQuery from SQL
Tables
1. ../Searching,Paging and Sorting 2. ../Table View
Trees
1. ../Tree View 2. ../Validating a hierarchy
URL Parameters
1. ../Getting URL Parameters 2. ../Checking for Required Parameters 3. ../Parsing Query Strings
VoiceXML
1. Simple RSS reader
Weather
1. Page scraping and Yahoo Weather
XForms
1. ../Simple XForms Examples
XSLT
1. ../XQuery and XSLT
Gotchas
1. ../Gotchas
Regular Expressions
1. ../String Analysis/
URL Parameters
1. ../Getting URL Parameters/ 2. ../Checking for Required Parameters/ 3. ../Parsing Query Strings/
XPath Navigation
1. ../XPath examples/
Result Document
<root foo="bar"> <message>Hello World</message> </root>
Result Document
<root foo="new-value"> <message>Hello World</message> </root>
References
[1] http://exist-db.org/quickstart.html
[2] http://localhost:8080/exist
[3] http://localhost:8080/exist/sandbox/
[4] http://exist-db.org/webdav.html
Step 3: Update the path to the new XSL-FO zip file in the configuration file
include.module.xslfo.url = http://apache.cs.uu.nl/dist/xmlgraphics/fop/binaries/fop-0.95-bin.zip
You will find that the Apache project has removed the old binaries (not always best practice), so the URL above no longer resolves. Replace that line with the following one:
include.module.xslfo.url = http://apache.cs.uu.nl/dist/xmlgraphics/fop/binaries/fop-1.0-bin.zip
Additionally, change the references to the deprecated version in $EXIST_HOME/extensions/modules/build.xml to point to the newer version, e.g.:

    <!-- Apache FOP -->
    <get src="${include.module.xslfo.url}" dest="fop-1.0-bin.zip" verbose="true" usetimestamp="true"/>
    <unzip src="fop-1.0-bin.zip" dest="${top.dir}/${lib.user}">
        <patternset>
            <include name="fop-1.0/build/fop.jar"/>
            <include name="fop-1.0/lib/batik-all-*.jar"/>
            <include name="fop-1.0/lib/xmlgraphics-commons-*.jar"/>
            <include name="fop-1.0/lib/avalon-*.jar"/>
        </patternset>
        <mapper type="flatten"/>
    </unzip>
    <delete file="fop-1.0-bin.zip"/>
    [get] last modified = Thu Jul 31 09:47:44 CDT 2008
    [unzip] Expanding: C:\workspace\exist\extensions\modules\fop-1.0-bin.zip into C:\workspace\exist\lib\user
This can be downloaded from the Apache XML Graphics Commons Distribution Mirror [2]
    total 7480
    -rwxrwxrwx 1 root        56290 Nov  3  2009
    -rw-rw-r-- 1 ec2-user  3318083 Jul 12  2010
    -rw-rw-r-- 1 ec2-user  3079811 Jul 12  2010
    -rwxrwxrwx 1 root       434812 Nov  3  2009
    -rwxrwxrwx 1 root       117470 Nov  3  2009
    -rw-r--r-- 1 root       569113 Nov 12 17:28
References
[1] http://exist.sourceforge.net/building.html
[2] http://www.apache.org/dyn/closer.cgi/xmlgraphics/commons
[3] http://demo.exist-db.org/exist/building.xml#build-system
Search Terms
Structured Search - retaining and using the structure of a document to aid in search result ranking
Search Document - a document or document fragment that is returned as the result of a search. Note that in the search examples the word "document" may imply an entire XML document or a fragment or item in such a document.
Search Query - a word or phrase to find
Boolean Search - a search in which each XML document either matches or does not match, with no ordering of search results
Search Hit - a match of a query to a document or document fragment
Hit Scoring - a method of assigning a weight to a search result for sorting. For example, if a term occurs more frequently in a document, it might receive a higher score.
Search Ranking - an ordered list of search results
Global Search - searching one or more item types in a database
Item Viewer - a program used to view a specific item type in a collection
References
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008: online edition [1]
[1] http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html
Keyword Search
Motivation
You want to create a Google-style keyword search interface to an XML database with relevance-ranked, full-text search of selected nodes and search results in which the keyword in context is highlighted, as shown below.
Method
Our search engine will receive keywords from a simple HTML form, assigning them to the variable $q. Then it (1) parses the keywords, (2) constructs the scope of the query, (3) executes the query, (4) scores and sorts the hits according to the score, (5) shows the linked results with a summary containing the keyword highlighted in context, and (6) paginates the results. Note: This tutorial was written against eXist 1.3, which was a development version of eXist; since then eXist 1.4 has been released, which altered several aspects of eXist slightly. This article has not yet been fully updated to account for the changes. The most notable changes are that (1) the kwic.xql file referenced here is now a built-in module and (2) the previous default fulltext search index (whose search operator is below as &=) is disabled by default in favor of the new, Lucene-based fulltext index, which speeds both search and scoring considerably. The changes required to make the code work with 1.4 will be extensive, but nonetheless the article is instructive in its current form. Lastly, this example will not run under versions prior to 1.3.
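Step (4) can be illustrated in isolation. In the full listing the score of a hit is the number of keyword matches divided by the string length of the hit, and hits are sorted by descending score. The Python sketch below reproduces only that arithmetic; the word-splitting rule and the function names are assumptions made for the illustration and are not how eXist's text:match-count() counts matches.

```python
import re
from typing import Iterable, List


def score(text: str, keywords: Iterable[str]) -> float:
    """Keyword matches divided by the length of the hit text."""
    if not text:
        return 0.0
    wanted = {keyword.lower() for keyword in keywords}
    words = re.findall(r"[0-9A-Za-z]+", text.lower())
    matches = sum(1 for word in words if word in wanted)
    return matches / len(text)


def rank(hits: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Return the hits ordered by descending score."""
    wanted = list(keywords)
    return sorted(hits, key=lambda hit: score(hit, wanted), reverse=True)


if __name__ == "__main__":
    hits = [
        "Joe Doe was born in Brooklyn, New York, and he now lives in Boston, Massachusetts.",
        "Brooklyn, then Brooklyn again.",  # made-up text for the illustration
    ]
    for hit in rank(hits, ["brooklyn"]):
        print(round(score(hit, ["brooklyn"]), 4), hit)
```

Dividing by the length favours short passages dense with the keyword over long passages that mention it once.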
Collection B File='/db/test/people/2.xml'

    <person id="2" xmlns="http://en.wikibooks.org/wiki/XQuery/test">
        <name>Joe Doe</name>
        <role type="author"/>
        <contact type="e-mail">joeschmoe@mail.net</contact>
        <biography>Joe Doe was born in Brooklyn, New York, and he now lives in Boston, Massachusetts.</biography>
    </person>
Search Form
File='/db/test/search.xq'

    xquery version "1.0";
    declare namespace test="http://en.wikibooks.org/wiki/XQuery/test";
    declare option exist:serialize "method=xhtml media-type=text/html";

    <html>
        <head><title>Keyword Search</title></head>
        <body>
            <h1>Keyword Search</h1>
            <form method="GET">
                <p>
                    <strong>Keyword Search:</strong>
                    <input name="q" type="text"/>
                </p>
                <p>
                    <input type="submit" value="Search"/>
                </p>
            </form>
        </body>
    </html>

Note that the form element can also contain an action attribute such as action="search.xq" to specify the XQuery script that should process the submission.
    let $q := xs:string(request:get-parameter("q", ""))
    let $filtered-q := replace($q, "[^0-9a-zA-Z\-,. ]", "")

                    <span class="url">{concat($base-uri, $url)}</span>
                </p>
            </div>
        else
            let $title := concat('Unknown result. Collection: ', $collection, '. Document: ', $document, '.')
            let $summary := kwic:summarize($hit, $config)
            let $url := concat($collection, '/', $document)
            return
                <div class="result">
                    <p>
                        <span class="title"><a href="{$url}">{$title}</a></span><br/>
                        {$summary/*}<br/>
                        <span class="url">{concat($base-uri, $url)}</span>
                    </p>
                </div>
We also need to provide links to each page of results. To do so, we will mimic Google's pagination links, which start by displaying 10 results per page, grow up to 20 results per page, and show previous and next results. Our pagination links will only show if there are more than 10 results, and will be a simple HTML list that can be styled with CSS.
    let $perpage := xs:integer(request:get-parameter("perpage", "10"))
    let $start := xs:integer(request:get-parameter("start", "0"))
    let $total-result-count := count($hits)
    (: never let the last result index run past the total number of results :)
    let $end := min(($start + $perpage, $total-result-count))
    let $number-of-pages := xs:integer(ceiling($total-result-count div $perpage))
    let $current-page := xs:integer(($start + $perpage) div $perpage)
    (: ampersands inside XQuery string literals are written as &amp; :)
    let $url-params-without-start := replace(request:get-query-string(), '&amp;start=\d+', '')
    let $pagination-links :=
        if ($total-result-count = 0) then ()
        else
            <div id="search-pagination">
                <ul>
                {
                    (: Show 'Previous' for all but the 1st page of results :)
                    if ($current-page = 1) then ()
                    else
                        <li><a href="{concat('?', $url-params-without-start, '&amp;start=', $perpage * ($current-page - 2))}">Previous</a></li>
                }
                {
                    (: Show links to each page of results :)
                    let $max-pages-to-show := 20
                    let $padding := xs:integer(round($max-pages-to-show div 2))
                    let $start-page :=
                        if ($current-page le ($padding + 1)) then 1
                        else $current-page - $padding
                    let $end-page :=
                        if ($number-of-pages le ($current-page + $padding)) then $number-of-pages
                        else $current-page + $padding - 1
                    for $page in ($start-page to $end-page)
                    let $newstart := $perpage * ($page - 1)
                    return
                        (
                        if ($newstart eq $start) then (<li>{$page}</li>)
                        else
                            <li><a href="{concat('?', $url-params-without-start, '&amp;start=', $newstart)}">{$page}</a></li>
                        )
                }
                {
                    (: Shows 'Next' for all but the last page of results :)
                    if ($start + $perpage ge $total-result-count) then ()
                    else
                        <li><a href="{concat('?', $url-params-without-start, '&amp;start=', $start + $perpage)}">Next</a></li>
                }
                </ul>
            </div>
We should also provide a plain English summary of the search results, in the form "Showing all 5 of 5 results", or "Showing 10 of 1200 results."

    let $how-many-on-this-page :=
        (: provides textual explanation about how many results are on this page,
         : i.e. 'all n results', or '10 of n results' :)
        if ($total-result-count lt $perpage) then
            concat('all ', $total-result-count, ' results')
        else
            concat($start + 1, '-', $end, ' of ', $total-result-count, ' results')
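The pagination arithmetic is easy to get wrong at the edges, so it helps to check it in isolation. The Python sketch below mirrors the variables used in the XQuery ($start, $perpage, $end, $current-page and the page-link window); the function names are hypothetical. It clamps the end index to the total, so a final, partly filled page reports "21-25 of 25 results" rather than "21-30".

```python
import math
from typing import List, Tuple


def page_window(total: int, start: int, per_page: int) -> Tuple[int, int, int]:
    """Return (end, number_of_pages, current_page) for a zero-based start offset."""
    end = min(start + per_page, total)
    number_of_pages = math.ceil(total / per_page)
    current_page = (start + per_page) // per_page
    return end, number_of_pages, current_page


def summary(total: int, start: int, per_page: int) -> str:
    """Plain English description of the results shown on this page."""
    end, _, _ = page_window(total, start, per_page)
    if total < per_page:
        return f"all {total} results"
    return f"{start + 1}-{end} of {total} results"


def page_links(current_page: int, number_of_pages: int, max_pages_to_show: int = 20) -> List[int]:
    """Page numbers to link to, centred on the current page."""
    padding = (max_pages_to_show + 1) // 2  # half the window, rounded half up
    start_page = 1 if current_page <= padding + 1 else current_page - padding
    if number_of_pages <= current_page + padding:
        end_page = number_of_pages
    else:
        end_page = current_page + padding - 1
    return list(range(start_page, end_page + 1))


if __name__ == "__main__":
    print(summary(total=25, start=20, per_page=10))
    print(page_links(current_page=15, number_of_pages=40))
```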
    let $q := xs:string(request:get-parameter("q", ""))
    let $filtered-q := replace($q, "[^0-9a-zA-Z\-,. ]", "")
    let $scope :=
        (
            collection('/db/test/articles')/test:article/test:body,
            collection('/db/test/people')/test:person/test:biography
        )
    (: ampersands inside XQuery string literals are written as &amp; :)
    let $search-string := concat('$scope', '[. &amp;= "', $filtered-q, '"]')
    let $hits := util:eval($search-string)
    let $sorted-hits :=
        for $hit in $hits
        let $keyword-matches := text:match-count($hit)
        let $hit-node-length := string-length($hit)
        let $score := $keyword-matches div $hit-node-length
        order by $score descending
        return $hit
    let $perpage := xs:integer(request:get-parameter("perpage", "10"))
    let $start := xs:integer(request:get-parameter("start", "0"))
    let $total-result-count := count($hits)
    (: never let the last result index run past the total number of results :)
    let $end := min(($start + $perpage, $total-result-count))
    let $results :=
        for $hit in $sorted-hits[position() = ($start + 1 to $end)]
        let $collection := util:collection-name($hit)
        let $document := util:document-name($hit)
        let $config := <config xmlns="" width="60"/>
        let $base-uri := replace(request:get-url(), 'search.xq$', '')
        return
            if ($collection = '/db/test/articles') then
                let $title := doc(concat($collection, '/', $document))//test:title/text()
                let $summary := kwic:summarize($hit, $config)
                let $url := concat('view-article.xq?article=', $document)
                return
                    <div class="result">
                        <p>
                            <span class="title"><a href="{$url}">{$title}</a></span><br/>
                            {$summary/*}<br/>
                            <span class="url">{concat($base-uri, $url)}</span>
                        </p>
                    </div>
            else if ($collection = '/db/test/people') then
                let $title := doc(concat($collection, '/', $document))//test:name/text()
                let $summary := kwic:summarize($hit, $config)
                let $url := concat('view-person.xq?person=', $document)
                return
                    <div class="result">
                        <p>
                            <span class="title"><a href="{$url}">{$title}</a></span><br/>
                            {$summary/*}<br/>
                            <span class="url">{concat($base-uri, $url)}</span>
                        </p>
                    </div>
            else
                let $title := concat('Unknown result. Collection: ', $collection, '. Document: ', $document, '.')
                let $summary := kwic:summarize($hit, $config)
                let $url := concat($collection, '/', $document)
                return
                    <div class="result">
                        <p>
                            <span class="title"><a href="{$url}">{$title}</a></span><br/>
                            {$summary/*}<br/>
                            <span class="url">{concat($base-uri, $url)}</span>
                        </p>
                    </div>
    let $number-of-pages := xs:integer(ceiling($total-result-count div $perpage))
    let $current-page := xs:integer(($start + $perpage) div $perpage)
    let $url-params-without-start := replace(request:get-query-string(), '&amp;start=\d+', '')
    let $pagination-links :=
        if ($number-of-pages le 1) then ()
        else
            <ul>
            {
                (: Show 'Previous' for all but the 1st page of results :)
                if ($current-page = 1) then ()
                else
                    <li><a href="{concat('?', $url-params-without-start, '&amp;start=', $perpage * ($current-page - 2))}">Previous</a></li>
            }
            {
                (: Show links to each page of results :)
                let $max-pages-to-show := 20
                let $padding := xs:integer(round($max-pages-to-show div 2))
                let $start-page :=
                    if ($current-page le ($padding + 1)) then 1
                    else $current-page - $padding
                let $end-page :=
                    if ($number-of-pages le ($current-page + $padding)) then $number-of-pages
                    else $current-page + $padding - 1
                for $page in ($start-page to $end-page)
                let $newstart := $perpage * ($page - 1)
                return
                    (
                    if ($newstart eq $start) then (<li>{$page}</li>)
                    else
                        <li><a href="{concat('?', $url-params-without-start, '&amp;start=', $newstart)}">{$page}</a></li>
                    )
            }
            {
                (: Shows 'Next' for all but the last page of results :)
                if ($start + $perpage ge $total-result-count) then ()
                else
                    <li><a href="{concat('?', $url-params-without-start, '&amp;start=', $start + $perpage)}">Next</a></li>
            }
            </ul>
    let $how-many-on-this-page :=
        (: provides textual explanation about how many results are on this page,
         : i.e. 'all n results', or '10 of n results' :)
        if ($total-result-count lt $perpage) then
            concat('all ', $total-result-count, ' results')
        else
            concat($start + 1, '-', $end, ' of ', $total-result-count, ' results')
    return
        <html>
            <head>
                <title>Keyword Search</title>
                <style>
                    body {{
                        font-family: arial, helvetica, sans-serif;
                        font-size: small
                    }}
                    div.result {{
                        margin-top: 1em;
                        margin-bottom: 1em;
                        border-top: 1px solid #dddde8;
                        border-bottom: 1px solid #dddde8;
                        background-color: #f6f6f8;
                    }}
                    #search-pagination {{
                        display: block;
                        float: left;
                        text-align: center;
                        width: 100%;
                        margin: 0 5px 20px 0;
                        padding: 0;
                        overflow: hidden;
                    }}
                    #search-pagination li {{
                        display: inline-block;
                        float: left;
                        list-style: none;
                        padding: 4px;
                        text-align: center;
                        background-color: #f6f6fa;
                        border: 1px solid #dddde8;
                        color: #181a31;
                    }}
                    span.hi {{
                        font-weight: bold;
                    }}
                    span.title {{
                        font-size: medium;
                    }}
                    span.url {{
                        color: green;
                    }}
                </style>
            </head>
            <body>
                <h1>Keyword Search</h1>
                <div id="searchform">
                    <form method="GET">
                        <p>
                            <strong>Keyword Search:</strong>
                            <input name="q" type="text" value="{$q}"/>
                        </p>
                        <p>
                            <input type="submit" value="Search"/>
                        </p>
                    </form>
                </div>
                {
                    if (empty($hits)) then ()
                    else
                        (
                        <h2>Results for keyword search "{$q}". {$how-many-on-this-page}.</h2>,
                        <div id="searchresults">{$results}</div>,
                        <div id="search-pagination">{$pagination-links}</div>
                        )
                }
            </body>
        </html>
References
[1] http://demo.exist-db.org/exist/xquery.xml#ftidx
[2] http://developer.marklogic.com/svn/lib-search/trunk/docs/XQuery%20Injection%20Audit.txt
[3] http://cwe.mitre.org/data/definitions/652.html
[4] http://searchsecuritychannel.techtarget.com/generic/0,295582,sid97_gci1304701,00.html
[5] http://markmail.org/message/syghikh5yac2pajj
Method
We will use a text-mining technique called "Latent Semantic Indexing". We will first create a matrix of all concept words (terms) by all the documents. Each cell will hold the frequency count of a term in a document. We then send this term-document matrix to a service that performs a standard Singular Value Decomposition, or SVD. SVD is a very compute-intensive algorithm that can take many hours or days of calculation if you have a large number of words and documents. The SVD service then returns a set of "Concept Vectors" that can be used to group related documents.
Sample Data
To keep the example simple, we will just use the document titles, not the full documents. Here are some document titles:

    XQuery Tutorial and Cookbook
    XForms Tutorial and Cookbook
    Auto-generation of XForms with XQuery
    Building RESTful Web Applications with XRX
    XRX Tutorial and Cookbook
    XRX Architectural Overview
    The Return on Investment of XRX

Our first step will be to build a Word-Document Matrix. This matrix has one row for each word that occurs in the documents and one column for each document. We will do this in several steps.

1. Get all the words from all the documents and put them into a single sequence
2. Create a list of the distinct words that are not "stop words"
3. For each word:
   1. For each document count the frequency that this word appears in the document
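The three steps above can be sketched as follows, in Python so the result can be inspected without a database. The titles are the ones listed above; the stop-word list, the whitespace tokenizer and the function names are assumptions made for the illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set

TITLES = [
    "XQuery Tutorial and Cookbook",
    "XForms Tutorial and Cookbook",
    "Auto-generation of XForms with XQuery",
    "Building RESTful Web Applications with XRX",
    "XRX Tutorial and Cookbook",
    "XRX Architectural Overview",
    "The Return on Investment of XRX",
]

# Assumed stop words; the source does not list the ones it uses.
STOP_WORDS = {"and", "of", "on", "the", "with"}


def tokenize(title: str) -> List[str]:
    """Split a title into lower-case words on whitespace."""
    return title.lower().split()


def term_document_matrix(documents: Sequence[str], stop_words: Set[str]) -> Dict[str, List[int]]:
    """Map each distinct non-stop word to its frequency in every document."""
    per_document = [Counter(tokenize(document)) for document in documents]
    terms = sorted({word for counts in per_document for word in counts if word not in stop_words})
    return {term: [counts[term] for counts in per_document] for term in terms}


if __name__ == "__main__":
    for term, row in term_document_matrix(TITLES, STOP_WORDS).items():
        print(f"{term:16} {row}")
```

Each row of the output is one term and each position in the row is one document, which is the layout an SVD routine expects as input.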
Sample Data
Assume we have an org chart that has the following structure:

    <position title="President" name="Peg Prez">
        <position title="Vice President" name="Vic Vicepres">
            <position title="Director" name="Dan Director">
                <position title="Manager" name="Marge Manager">
                    <position title="Supervisor" name="Sue Supervisor">
                        <position title="Project Manager" name="Pete Project"/>
                    </position>
                </position>
            </position>
        </position>
        <position title="CFO" name="Barb Beancounter"/>
        <position title="CIO" name="Tracy Technie"/>
    </position>

When displaying an org chart, you only want to show an individual and their direct reports.
Approach
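We will use computed element and attribute constructors.

    let $positions := doc('/db/my-org/apps/hr/data/positions.xml')/position
    return
        (: the outer wrapper is a reconstruction: it copies the top-level
           position and its attributes, then one level of children :)
        element {name($positions)} {
            $positions/@*,
            for $subelement in $positions/position
            return
                element {name($subelement)} {
                    for $attribute in $subelement/@*
                    return attribute {name($attribute)} {$attribute},
                    $subelement/text()
                }
        }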
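The idea of copying an element with its attributes while dropping everything below its direct children can be sketched with Python's standard xml.etree.ElementTree module. The function name and the shortened sample document are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Shortened org chart used only for this illustration.
ORG_CHART = """
<position title="President" name="Peg Prez">
    <position title="Vice President" name="Vic Vicepres">
        <position title="Director" name="Dan Director"/>
    </position>
    <position title="CFO" name="Barb Beancounter"/>
    <position title="CIO" name="Tracy Technie"/>
</position>
"""


def limit_to_direct_reports(position: ET.Element) -> ET.Element:
    """Copy a position and its direct reports, without any deeper levels."""
    trimmed = ET.Element(position.tag, dict(position.attrib))
    for report in position.findall("position"):
        # Copy the child's tag and attributes only; its own children are left out.
        ET.SubElement(trimmed, report.tag, dict(report.attrib))
    return trimmed


if __name__ == "__main__":
    root = ET.fromstring(ORG_CHART)
    print(ET.tostring(limit_to_direct_reports(root), encoding="unicode"))
```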
Link gathering
Motivation
You want to gather the links on a blog page.
Method
We use the doc() function to perform an HTTP GET on a remote web page. If the page is a well-formed XML file, you can then extract all the unordered list items by appending a ul path step to the doc() call. This script fetches the blog page and selects the URLs in the link section, which reference other blog articles. Each referenced article is fetched and the URLs marked as external are selected. The result is returned as XML.
    declare namespace q = "http://www.w3.org/1999/xhtml";

    <results>
    {
        let $nav := doc("http://www.advocatehope.org/tech-tidbits/theory-of-the-web-as-one-big-database")//q:ul[@class="portletNavigationTree navTreeLevel0"]
        for $href in $nav//@href
        let $page := data($href)
        let $content := doc($page)//q:div[@id="content"]
        for $links in $content//q:a[@title="external-link"]
        return <link>{ data($links/@href) }</link>
    }
    </results>
Execute [1]
Version 2
Dropping the intermediate variables allows the structure to be seen more clearly:

    declare namespace q = "http://www.w3.org/1999/xhtml";

    let $uri := "http://www.advocatehope.org/tech-tidbits/theory-of-the-web-as-one-big-database"
    return
        <results>
        {
            for $page in doc($uri)//q:ul[@class="portletNavigationTree navTreeLevel0"]//@href
            for $link in doc($page)//q:div[@id="content"]//q:a[@title="external-link"]/@href
            return <link>{data($link)}</link>
        }
        </results>

Execute [2]
Repository Schemas
Daniel is proposing a standard for supporting the extraction of data such as this from a site. Such a schema would define a view of a set of documents sufficient to allow the extraction above to be based on the schema. We can go some way towards this with a view schema represented as an ER model, with added implementation-dependent paths.

    <model name="blog-links">
        <type name="url" datatype="string"/>
        <entity name="page">
            <attribute name="inner" max="N" path="//q:ul[@class='portletNavigationTree navTreeLevel0']//@href" type="page"/>
            <attribute name="external" max="N" path="//q:div[@id='content']//q:a[@title='external-link']/@href" type="url"/>
        </entity>
    </model>
This schema can then be used by a generic link gathering script:

    let $start := request:get-parameter("page", ())
    let $view := request:get-parameter("view", ())
    let $schema := doc($view)
    let $inner := $schema//entity[@name='page']/attribute[@name='inner']/@path
    let $external := $schema//entity[@name='page']/attribute[@name='external']/@path
    return
        <results>
        {
            for $page in util:eval(concat('doc($start)', $inner))
            for $link in util:eval(concat('doc($page)', $external))
            return <link>{string($link)}</link>
        }
        </results>

This script now performs the task of link gathering on any site whose page structure can be defined in terms of the schema with appropriate paths.

Execute [3]
    </results>

So with a different schema - same model, different paths:

    <model name="site-links">
        <type name="url" datatype="string"/>
        <entity name="page">
            <attribute name="inner" max="N" path="//div[@class='nav']//a/@href" type="page"/>
            <attribute name="external" max="N" path="//div[@class='content']//a/@href" type="url"/>
        </entity>
    </model>
Virtual Paths
The navigation path is still hard-coded in the script. We would like to write path expressions where the steps are defined in the schema. This path would then be interpreted in the context of the schema.
View Schema
In this example, the test site has been expanded to include a separate index page and some additional components in the view:

    <model name="site-links">
        <entity name="externalPage">
            <attribute name="title" path="/head/title"/>
        </entity>
        <entity name="index">
            <attribute name="link" max="N" path="//div[@class='index']//a/@href" type="page"/>
        </entity>
        <entity name="page">
            <attribute name="title" path="//head/title"/>
            <attribute name="inner" max="N" path="//div[@class='nav']//a/@href" type="page"/>
            <attribute name="external" max="N" path="//div[@class='content']//a/@href" type="externalPage"/>
            <attribute name="author" min="0" path="//div[@class='content']/span[@class='author']"/>
        </entity>
    </model>
Index [6]
Path language

This prototype uses a simple path language. The step -> dereferences a relative or absolute URL. Where a step is recognised as an attribute of the current entity, the associated path expression is used; otherwise the step is executed as XPath. The first step identifies the (entity) type of the initial document. For example:

index/link/->/title

List the titles of the pages in the index.

    import module namespace vp = "http://www.cems.uwe.ac.uk/xmlwiki/vp" at "../Gov/vp.xqm";

    let $uri := "http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/index.html"
    let $schema := "/db/Wiki/Gov/site3.xml"
    return
        <result>
            {vp:process-path($uri, "index/link/->/title", $schema)}
        </result>

Run [7]

index/link/->/author/string(.)

List the authors of the pages referenced in the index.

    import module namespace vp = "http://www.cems.uwe.ac.uk/xmlwiki/vp" at "../Gov/vp.xqm";

    let $uri := "http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/index.html"
    let $schema := "/db/Wiki/Gov/site3.xml"
    return
        <result>
            {vp:process-path($uri, "index/link/->/author/string(.)", $schema)}
        </result>

Run [8]

index/link/->/external

List the URLs of all distinct external links of all pages referenced by the index page.

    import module namespace vp = "http://www.cems.uwe.ac.uk/xmlwiki/vp" at "../Gov/vp.xqm";
    declare option exist:serialize "method=xhtml media-type=text/html";

    let $uri := "http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/index.html"
    let $schema := "/db/Wiki/Gov/site3.xml"
    return
        <ul>
        {
            for $uri in distinct-values(vp:process-path($uri, "index/link/->/external", $schema))
            order by $uri
            return
                <li>
                    <a href="{$uri}">{string($uri)}</a>
                </li>
        }
        </ul>

Run [9]

page/inner/->/inner/->/title

List the titles of pages linked to the initial page.

    import module namespace vp = "http://www.cems.uwe.ac.uk/xmlwiki/vp" at "../Gov/vp.xqm";

    let $uri := "http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/test1.html"
    let $schema := "/db/Wiki/Gov/site3.xml"
    return
        <result>
            {vp:process-path($uri, "page/inner/->/inner/->/title", $schema)}
        </result>

Run [10]
163
Script
The core function processes a virtual path in the context of a schema.

    declare function vp:process-steps($nodes, $context, $steps, $base, $schema) {
      if (empty($steps)) then $nodes
      else
        let $step := $steps[1]
        let $entity := $schema//entity[@name = $context]
        return
          if ($step = "->") then
            (: dereference: fetch the document each node points to :)
            let $newnodes :=
              for $node in $nodes
              return vp:get-doc($node, $base)
            return vp:process-steps($newnodes, $context, subsequence($steps, 2), $base, $schema)
          else if ($entity/attribute[@name = $step]) then
            (: the step names a schema attribute: evaluate its stored path :)
            let $attribute := $entity/attribute[@name = $step]
            let $next := string($schema//entity[@name = $attribute/@type]/@name)
            let $path := string($attribute/@path)
            let $newnodes :=
              for $node in $nodes
              let $newnode := util:eval(concat("$node", $path))
              return $newnode
            return vp:process-steps($newnodes, $next, subsequence($steps, 2), $base, $schema)
          else
            (: otherwise evaluate the step as plain XPath :)
            let $newnodes :=
              for $node in $nodes
              let $newnode := util:eval(concat("$node/", $step))
              return $newnode
            return vp:process-steps($newnodes, $context, subsequence($steps, 2), $base, $schema)
    };
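The way a virtual path breaks down into steps can be illustrated with a small standalone sketch. It assumes the path is split on "/" (the vp module's own splitting code is not shown here), and local:describe-steps() is a hypothetical helper introduced only for illustration. It labels each step with the case that vp:process-steps() would apply to it.

```xquery
xquery version "1.0";

(: Hypothetical helper, not part of the vp module: label each step of a virtual path. :)
declare function local:describe-steps($path as xs:string) as element(step)* {
  for $step at $i in tokenize($path, "/")
  return
    <step n="{$i}" kind="{
      if ($i eq 1) then 'entity type of the initial document'
      else if ($step eq '->') then 'dereference URL'
      else 'schema attribute or XPath'
    }">{$step}</step>
};

local:describe-steps("index/link/->/title")
```

For the path index/link/->/title this yields four step elements: the entity type index, the step link, the dereference step ->, and the step title.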
Acknowledgments
This example is based on an article by Daniel Bennett [11].
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Gov/urlDiscover.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/Gov/urlDiscover3.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/Gov/urlDiscover5.xq?page=http://www.advocatehope.org/tech-tidbits/theory-of-the-web-as-one-big-database&view=/db/Wiki/Gov/blog.xml
[4] http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/test1.html
[5] http://www.cems.uwe.ac.uk/xmlwiki/Gov/urlDiscover6.xq?page=http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/test1.html&view=/db/Wiki/Gov/site.xml
[6] http://www.cems.uwe.ac.uk/xmlwiki/Gov/site/index.html
[7] http://www.cems.uwe.ac.uk/xmlwiki/Gov/test1.xq
[8] http://www.cems.uwe.ac.uk/xmlwiki/Gov/test2.xq
[9] http://www.cems.uwe.ac.uk/xmlwiki/Gov/test3.xq?
[10] http://www.cems.uwe.ac.uk/xmlwiki/Gov/test4.xq
[11] http://www.advocatehope.org/tech-tidbits/theory-of-the-web-as-one-big-database
Method
We will start by selecting all the classes in the file that have a name. In this example the names are stored in the rdf:ID attribute of the class, like the following:

    <owl:Class rdf:ID="Wine">

For our example we will use the wine ontology [1] used in the W3C OWL Guide [2]. Our XQuery selects all owl:Class elements in the file that carry an rdf:ID attribute. Here is a simple XQuery that returns all the classes in the wine ontology. To use this script, load it into a collection such as /db/apps/owl/views/classes.xq; the RDF data files can be loaded into /db/apps/owl/data.

/db/apps/owl/views/classes.xq

    xquery version "1.0";
    declare namespace xsd = "http://www.w3.org/2001/XMLSchema";
    declare namespace rdfs = "http://www.w3.org/2000/01/rdf-schema#";
    declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    declare namespace owl = "http://www.w3.org/2002/07/owl#";

    declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

    let $title := 'List of OWL Classes'
    let $data-collection := '/db/apps/owl/data'
    let $file := request:get-parameter('file', 'wine.rdf')
    let $file-path := concat($data-collection, '/', $file)

    (: we only want classes that have an ID. Other classes are not named classes. :)
    let $classes := doc($file-path)//owl:Class[@rdf:ID]

    (: sort the list :)
    let $ordered-classes :=
      for $class in $classes
      order by $class/@rdf:ID
      return $class

    return
      <html>
        <head>
          <title>{$title}</title>
        </head>
        <body>
          <file>File Path: {$file-path}</file>
          <p>Number of Classes = {count($classes)}</p>
          <ol>
            {
              for $class in $ordered-classes
              let $class-name := string($class/@rdf:ID)
              return <li>{$class-name}</li>
            }
          </ol>
        </body>
      </html>
Sample Results
The result will be an HTML file with an ordered list:

File Path: /db/org/syntactica/apps/owl/data/wine.rdf
Number of Classes = 74
1. AlsatianWine
2. AmericanWine
3. Anjou
4. Beaujolais
5. Bordeaux
6. Burgundy
7. CabernetFranc
8. CabernetSauvignon
9. CaliforniaWine
10. Chardonnay
...
Other Tools
There are several other tools for working with OWL files that are very useful. One is to list all of the properties in an OWL file or list all the properties of a class. These reports can then be used to load the class or property into an XForms application for editing/versioning/workflow and approval.
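A property listing of the kind mentioned above could look roughly like the following. This is a minimal sketch, not an existing tool: it assumes properties are declared as owl:ObjectProperty or owl:DatatypeProperty elements with an rdf:ID, and it uses a small inline sample (with illustrative property names) instead of the stored wine.rdf so that it runs on its own.

```xquery
xquery version "1.0";
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace owl = "http://www.w3.org/2002/07/owl#";

(: Inline sample data (illustrative names) so the sketch runs without a stored file. :)
let $rdf :=
  <rdf:RDF>
    <owl:Class rdf:ID="Wine"/>
    <owl:ObjectProperty rdf:ID="madeFromGrape"/>
    <owl:DatatypeProperty rdf:ID="yearValue"/>
    <owl:ObjectProperty rdf:ID="hasColor"/>
  </rdf:RDF>

(: select only named properties, of either kind :)
let $properties := $rdf//(owl:ObjectProperty | owl:DatatypeProperty)[@rdf:ID]
return
  <ol>
    {
      for $property in $properties
      order by string($property/@rdf:ID)
      return <li>{string($property/@rdf:ID)} ({local-name($property)})</li>
    }
  </ol>
```

To run it against real data, replace the inline $rdf with doc($file-path) as in the class listing script.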
References
[1] http://www.w3.org/TR/owl-guide/wine.rdf
[2] http://www.w3.org/TR/owl-guide/
Method
We will use the following functions to create login and logout forms:

    xmldb:login($collection, $user, $password, true())
    session:create()
    session:invalidate()
Logging In
To log in we first create a new session and then use this session to store our login information:

    session:create()
    xmldb:login($collection, $user, $password, true())

This changes the effective user executing the current query and stores that user information in the HTTP session, so subsequent queries within the same session will also execute with the same user rights. Note that you must use "true()" as the fourth argument to the login function.
Logging Out
To log a user out use:

    session:invalidate()

Both session:invalidate() and session:clear remove the user binding from the session, which means that the next call to the query will run as guest. However, the currently executing query will still use the old non-guest user until it completes.

    (: if we are already logged in, are we logging out, i.e. setting permissions back to guest? :)
    if (request:get-parameter("logout", ())) then
      (
        let $null := xmldb:login("/db", "guest", "guest")
        let $inval := session:invalidate()
        return false()
      )
    else
      (
        (: we are already logged in and we are not the guest user :)
        true()
      )

In this example we call both xmldb:login() as guest and session:invalidate(). We want to do both: clear the session for future queries and reset the current user for the rest of the query.
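The login and logout steps can be combined into one page script, sketched below. This is only an illustration under stated assumptions: the request parameter names "user", "password" and "logout" and the collection "/db" are choices made for this example, and the script must run inside eXist with the request, session and xmldb modules available.

```xquery
xquery version "1.0";

(: Sketch only: the parameter names "user", "password" and "logout" are assumptions. :)
let $user := request:get-parameter("user", ())
let $password := request:get-parameter("password", ())
return
  if (request:get-parameter("logout", ())) then
    (: reset the running query to guest, then drop the session binding :)
    let $guest := xmldb:login("/db", "guest", "guest")
    let $invalidated := session:invalidate()
    return <p>Logged out.</p>
  else if (empty($user)) then
    (: nothing submitted yet: show the login form :)
    <form method="post">
      <input type="text" name="user"/>
      <input type="password" name="password"/>
      <input type="submit" value="Login"/>
    </form>
  else
    (: create the session first, then bind the user to it :)
    let $session := session:create()
    let $ok := xmldb:login("/db", $user, $password, true())
    return
      if ($ok) then <p>Logged in as {$user}.</p>
      else <p>Login failed.</p>
```

xmldb:login() returns a boolean, which is why its result can be tested directly to report a failed login.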
Timeout setting
You can also change the default session timeout by editing the Jetty configuration file:

    $EXIST_HOME/tools/jetty/etc/webdefault.xml

By default the configuration file sets the session timeout to 30 minutes:

    <session-config>
      <session-timeout>30</session-timeout>
    </session-config>

Note: In the future there may be an xmldb:logout function which combines both steps. Another approach could be to handle the login/logout within a controller.xql and thus separate it from the main query.
Concepts used
XML <> string conversion: The script uses a pair of functions from the eXist util module (util:serialize and util:parse) to convert back and forth between XML and a string. This allows the XML text to be operated on as a simple string before being converted back to XML.
recursion: interpolating the random text into the original string requires a recursive function.
regular expressions: regular expressions are used to tokenise the lorum ipsum text and the incomplete XML file containing ellipses.
XQuery
    declare function local:join-random($parts, $words) {
      if (count($parts) > 1) then
        let $randomtext :=
          string-join(subsequence($words, util:random(100), util:random(100)), " ")
        return
          string-join(
            ($parts[1], $randomtext, local:join-random(subsequence($parts, 2), $words)),
            "")
      else $parts
    };

    let $lorumipsum := doc("/db/Wiki/apps/lorumipsum/words.xml")/lorumipsum
    let $words := tokenize($lorumipsum, "\s+")
    let $file := request:get-parameter("file", ())
    let $doc := doc($file)/*
    let $docText := util:serialize($doc, "media-type=text/xml method=xml")
    let $parts := tokenize($docText, "\.\.\.")
    let $completedText := local:join-random($parts, $words)
    return util:parse($completedText)
Example
incomplete XML [1]
XML with ellipsis replaced with lorum ipsum text [2]
Explanation
The lorum ipsum text is split into words by tokenising on whitespace.
The incomplete XML is fetched and the root element accessed.
This element is converted to a string using the util:serialize function, then tokenized with the pattern "\.\.\." (not "..." since . matches any single character in regular expressions).
The recursive function join-random() joins the first of a sequence of strings with a random stretch of the lorum ipsum text, followed by the remainder of the strings similarly joined.
The expanded text is converted back to an XML element using util:parse().
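The tokenise-and-rejoin step can be tried in isolation with standard XQuery only. The sketch below is a simplified, deterministic variant: local:join-filler() is a hypothetical function that inserts a fixed filler string instead of a random stretch of words, and the input is a plain string rather than a serialized document.

```xquery
xquery version "1.0";

(: Hypothetical simplified variant of join-random(): fixed filler, no util:random(). :)
declare function local:join-filler($parts as xs:string*, $filler as xs:string) as xs:string {
  if (count($parts) > 1)
  then concat($parts[1], $filler, local:join-filler(subsequence($parts, 2), $filler))
  else string($parts[1])
};

let $text := "<p>First ...</p><p>Second ...</p>"
(: the dots are escaped so that each one matches a literal full stop :)
let $parts := tokenize($text, "\.\.\.")
return local:join-filler($parts, "lorem ipsum")
(: returns the string <p>First lorem ipsum</p><p>Second lorem ipsum</p> :)
```

Two ellipses produce three parts, so the filler is inserted exactly twice, once between each pair of adjacent parts.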
Improvements
The lorum ipsum text itself could be generated rather than stored.
The script could be parameterized for the lorum ipsum file, allowing different, perhaps more realistic text to be used.
The lorum ipsum words are passed as a parameter to the recursive function. This could be defined in a global variable instead.
It would be better to use the httpclient module to fetch the files and control the caching via headers; here the file is being cached.
Concepts
recursion - to copy an arbitrary XML tree, replacing a given element with random text.
XQuery
    declare variable $lorumipsum := doc("/db/Wiki/apps/lorumipsum/words.xml")/lorumipsum;
    declare variable $words := tokenize($lorumipsum, "\s+");
    declare variable $marker := "ipsum";

    declare function local:copy-with-random($element as element()) as element() {
      element {node-name($element)} {
        $element/@*,
        for $child in $element/node()
        return
          if ($child instance of element()) then
            if (name($child) = $marker)
            then subsequence($words, util:random(100), util:random(100))
            else local:copy-with-random($child)
          else $child
      }
    };

    let $file := request:get-parameter("file", ())
    let $root := doc($file)/*
    return local:copy-with-random($root)
Explanation
The sequence of lorum ipsum words is held in a global variable to avoid passing it as a parameter to the recursive function.
The copy-with-random() function recursively copies the elements and other nodes in a tree to a new tree.
When an element with the name "ipsum" is encountered, a selection of lorum ipsum text is returned instead of the original element.
Example
incomplete XML [3]
XML with ellipsis replaced with lorum ipsum text [4]
Discussion
The second approach is simpler. Performance is about the same.
Acknowledgements
The sample XML is an extract from "Search: The Graphics Web Guide", Ken Coupland, Laurence King Publishing (2002).
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Coupland/ex-flat.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/apps/lorumipsum/complete.xq?file=http://www.cems.uwe.ac.uk/xmlwiki/Coupland/ex-flat.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/Coupland/ex-flat-x.xml
[4] http://www.cems.uwe.ac.uk/xmlwiki/apps/lorumipsum/complete2.xq?file=http://www.cems.uwe.ac.uk/xmlwiki/Coupland/ex-flat-x.xml
Lucene Search
Motivation
You want to perform a full text keyword search on one or more XML documents. This is done using the Lucene index extensions to eXist.
Background
The Apache Lucene full text search framework was added to eXist 1.4 as a full text index, replacing the previous native full text index. The new Lucene full text search framework is faster, more configurable, and more feature-rich than eXist's legacy full text index. It will also be the basis for an implementation of the W3C's full text extensions for XQuery. eXist associates a distinct node-id with each node in an XML document. This node-id is used as the Lucene document ID in the Lucene index files, that is, each XML node becomes a Lucene document. This means that you can customize to a very high degree the search weight of keyword matches to every node in your document. So, for example, a match of a keyword within a title can have a higher score than a match in the body of a document. This means that a search hit retrieving a document title in a large number of documents will have a higher probability of being ranked first in your search results. This means your searches will have higher Precision and Recall than search systems that do not retain document structure.
Notes:
If your test data are saved in db/test, you should save collection.xconf in db/system/config/db/test. Index configuration files are always saved in a directory structure inside system/config/db which is isomorphic to the directory structure of db.
After you create or update this index configuration file, you will need to reindex the data. You can do this either by using the eXist Java-based admin client, selecting the test collection and choosing "Reindex collection", or by calling the xmldb:reindex() [2] function, e.g. xmldb:reindex('/db/test'), in eXide or in the XQuery Sandbox.
Although the legacy full text index is not needed for Lucene-based search, we have explicitly enabled it here for this example configuration in order to point out the expressive similarities between the Lucene and legacy search functions/operators (i.e. Lucene's ft:query() vs. the legacy full text index's &=, |=, near(), text:match-all(), text:match-any()).
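Storing the configuration and reindexing can also be done from a single query, as sketched below. The configuration document here is an assumption: it is reconstructed from the elements this article mentions (the two analyzers, the path index on //test, and the legacy full text index) and may differ from the original file. The query assumes it is run by a DBA user and that /db/system/config/db already exists.

```xquery
xquery version "1.0";

(: ASSUMED configuration, reconstructed from the elements named in this article. :)
let $config :=
  <collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
      <!-- legacy full text index, enabled only for comparison -->
      <fulltext default="all" attributes="no"/>
      <lucene>
        <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
        <analyzer id="ws" class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
        <text match="//test"/>
      </lucene>
    </index>
  </collection>

(: the config collection mirrors the data collection path under /db/system/config :)
let $config-collection := xmldb:create-collection("/db/system/config/db", "test")
let $stored := xmldb:store($config-collection, "collection.xconf", $config)
return xmldb:reindex("/db/test")
```

Keeping the configuration in a query like this makes it easy to repeat the store-and-reindex cycle while experimenting with index definitions.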
Indexing Strategies
You can either define a Lucene index on a single element or attribute name (qname="...") or on a node path (match="..."). If you define an index on a qname, such as <text qname="test"/>, an index is created on <test> alone. What is passed to Lucene is the string value of <test>, which includes the text of all its descendant text nodes. With such an index, one cannot search for the nodes below <test>, e.g. for <p> or <name>, since such nodes have all been collapsed. If you want to be able to query descendant nodes, you should set up additional indexes on these, such as <text qname="p"/> or <text qname="name"/>. If you define an index on a node path, as above with <text match="//test"/>, the node structure below <test> is maintained in the index and you can still query descendant nodes, such as <p> or <name>. This can be seen as a shortcut to establishing an index on all elements below <test>. Be aware that, according to the documentation, this feature is "subject to change" [3]. When deciding which approach to use, you should consider which parts of your document will be of interest as context for full text query. How narrow or broad to make it is best decided when considering concrete search scenarios.
Indexing
Since we have indexed the <test> element as a path, the index includes descendant nodes, and queries for nested elements therefore also return hits:

    collection('/db/test')/test/p/name[ft:query(., 'edward')]
    collection('/db/test')/test/p[ft:query(name, 'edward')]

If we had indexed the qname test with <text qname="test"/>, we would not be able to do so.
Stopwords
The standard Lucene analyser, activated in the above collection.xconf file with <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>, applies the Lucene default list of English stop words and removes the following words from the index: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with. If you wish to make these words searchable, comment out the StandardAnalyzer, remove id="ws" from <analyzer id="ws" class="org.apache.lucene.analysis.WhitespaceAnalyzer"/> so that the whitespace analyzer becomes the default, and reindex the collection. Todo: How can the list of stopwords be customised?
Ranking
Lucene assigns a relevance score or rank to each match. The more frequently a word occurs in a document, the higher the score. This score is preserved by eXist and can be accessed through the ft:score() function, which returns a decimal value.

    for $m in collection('/db/test')//p[ft:query(., 'tests ron')]
    let $score := ft:score($m)
    order by $score descending
    return <hit score="{$score}">{$m}</hit>

The higher the score, the more relevant the hit.
Boosting Values
The configuration file can be set up to apply higher search weights to specific elements within your document. So for example a match of a keyword in the title of a book will rank that search higher than matches in the body of the book.
matching no terms
To express the "match none" (not + |=) legacy full text query using the new Lucene query function:

    collection('/db/test')//p[not(. |= 'issues edward')]

you would use the following:

    collection('/db/test')//p[not(ft:query(.,
      <query>
        <bool>
          <term occur="should">issues</term>
          <term occur="should">edward</term>
        </bool>
      </query>))]

Note that the last one could not be expressed as:

    collection('/db/test')//p[ft:query(.,
      <query>
        <bool>
          <term occur="not">issues</term>
          <term occur="not">edward</term>
        </bool>
      </query>)]

because Lucene's NOT operator can't be used on its own, without the presence of a 'positive' search term.
+fillet +snake
-fillet +snake
+fillet +sn*e
phrase search
XML syntax (proximity search):

    <query>
      <near slop="1" ordered="no">
        <term>snake</term>
        <term>fillet</term>
      </near>
    </query>

Lucene syntax: snake~
XML syntax:

    <query>
      <fuzzy>snake</fuzzy>
    </query>

Lucene syntax: snake~0.3
XML syntax:

    <query>
      <fuzzy min-similarity="0.3">snake</fuzzy>
    </query>
Mind the gaps in the table above! In standard Lucene syntax you can't express:
regular expressions: this is a unique feature of eXist's XML query syntax, by means of the <regex> element
ordering of proximity search terms: this is a unique feature of eXist's XML query syntax, by means of the @ordered attribute on <near>
Finally, a more complex case, in which boolean operators are grouped to override default priority rules:
XML syntax:

    <query>
      <bool>
        <bool occur="must">
          <term occur="should">fillet</term>
          <term occur="should">malice</term>
        </bool>
        <term occur="must">snake</term>
      </bool>
    </query>
Note how:
grouping in standard Lucene syntax can be expressed with nesting in XML syntax
for nested <bool> operators, the @occur attribute can be specified as well
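An XML query such as the grouped example can be bound to a variable and passed to ft:query() in place of a query string. The sketch below assumes the /db/test collection and a Lucene index covering p elements, as used in the earlier examples of this article; it will only return hits inside eXist with that index in place.

```xquery
xquery version "1.0";

(: (fillet OR malice) AND snake, expressed in eXist's XML query syntax :)
let $query :=
  <query>
    <bool>
      <bool occur="must">
        <term occur="should">fillet</term>
        <term occur="should">malice</term>
      </bool>
      <term occur="must">snake</term>
    </bool>
  </query>
for $hit in collection('/db/test')//p[ft:query(., $query)]
let $score := ft:score($hit)
order by $score descending
return <hit score="{$score}">{$hit}</hit>
```

Building the query as XML rather than as a string means search terms supplied by a user can be inserted as text nodes without having to escape Lucene's special characters.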
References
eXist Lucene XML Syntax [6] blog posting by Ron Van den Branden
References
[1] http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
[2] http://demo.exist-db.org/exist/functions/xmldb/reindex
[3] http://exist-db.org/lucene.html#N1018D
[4] http://en.wikipedia.org/wiki/Levenshtein_distance
[5] http://exist-db.org/lucene.html#N102D5
[6] http://rvdb.wordpress.com/2010/08/04/exist-lucene-to-xml-syntax
Multiple page scraping and Voting behaviour

    string-length(xs:string($rollno))))),xs:string($rollno))
    let $path := concat("http://clerk.house.gov/evs/", $year, "/roll", $zeropaddedrollnum, ".xml")
    let $report := doc($path)
    let $bill := $report//vote-metadata
    let $specificvote := $report//recorded-vote[legislator/@name-id = $repid]
    return
      <result>
        <year>{$year}</year>
        {$bill/rollcall-num}
        {$bill/vote-question}
        {$bill/legis-num}
        {$specificvote/legislator}
        {$specificvote/vote}
      </result>
    };

    <report>
      {local:voting("E000215", 2007, 10 to 15)}
    </report>

Execute [4]

Note: It would be preferable to use the asp endpoint, since this does not involve the complication arising here from leading zeros, but that produces malformed XML (??)
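The leading-zero complication mentioned in the note can be isolated in a small helper. This is a sketch in standard XQuery, not the script's own padding code: local:zero-pad() is a hypothetical function, and the width of three digits is an assumption about the roll call file names.

```xquery
xquery version "1.0";

(: Hypothetical helper: left-pad an integer with zeros to a given width. :)
declare function local:zero-pad($n as xs:integer, $width as xs:integer) as xs:string {
  let $digits := xs:string($n)
  let $missing := $width - string-length($digits)
  (: "1 to $missing" is empty when no padding is needed :)
  return concat(string-join(for $i in 1 to $missing return "0", ""), $digits)
};

for $rollno in (7, 10, 123)
return concat("http://clerk.house.gov/evs/2007/roll", local:zero-pad($rollno, 3), ".xml")
(: returns URLs ending in roll007.xml, roll010.xml and roll123.xml :)
```

Numbers that already have three or more digits pass through unchanged, because the range 1 to $missing is then empty.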
References
[1] http://clerk.house.gov/evs/2007/ROLL_000.asp
[2] http://clerk.house.gov/cgi-bin/vote.asp?year=2007&rollnumber=10
[3] http://www.cems.uwe.ac.uk/xmlwiki/Gov/voting.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/Gov/voteAnalysis.xq
MusicXML to Arduino
Motivation
You want to play music available in MusicXML format on an Arduino.
Approach
Fetch the MusicXML file (either plain XML or compressed) and transform one monophonic part into code to be included in an Arduino sketch.
Script
    (:~
     : Convert a monophonic part in a MusicXML score to an Arduino code fragment
     : suitable for inclusion in a sketch.
     :
     : @param uri - the uri of the MusicXML file
     : @param part - the id of the part to be converted to midi notes
     : @return text containing Arduino statements to
     :   set the tempo,
     :   define the array of midi notes, and
     :   define a parallel array of note durations in beats
     : @author Chris Wallace
     :)

    declare namespace fw = "http://www.cems.uwe.ac.uk/xmlwiki/fw";

    (: offsets of the letters ABCDEFG from C :)
    declare variable $fw:step2offset := (9, 11, 0, 2, 4, 5, 7);

    declare function fw:filter($path as xs:string, $type as xs:string, $param as item()*) as xs:boolean {
      (: pass all :)
      true()
    };

    declare function fw:process($path as xs:string, $type as xs:string, $data as item()?, $param as item()*) {
      (: return the XML :)
      $data
    };

    declare function fw:unzip($uri) {
      let $zip := httpclient:get(xs:anyURI($uri), true(), ())/httpclient:body/text()
      let $filter := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw", "fw:filter"), 3)
      let $process := util:function(QName("http://www.cems.uwe.ac.uk/xmlwiki/fw", "fw:process"), 4)
      let $xml := compression:unzip($zip, $filter, (), $process, ())
      return $xml
    };

    declare function fw:MidiNote($thispitch as element()) as xs:integer {
      let $step := $thispitch/step
      let $alter :=
        if (empty($thispitch/alter)) then 0
        else xs:integer($thispitch/alter)
      let $octave := xs:integer($thispitch/octave)
      (: "A" has codepoint 65, so A..G index positions 1..7 of the offset table :)
      let $pitchstep := $fw:step2offset[string-to-codepoints($step) - 64]
      return 12 * ($octave + 1) + $pitchstep + $alter
    };

    declare function fw:mxl-to-midi($part) {
      for $note in $part//note
      return
        element note {
          (: a rest is encoded as midi note 0 :)
          attribute midi { if ($note/rest) then 0 else fw:MidiNote($note/pitch) },
          attribute duration { ($note/duration, 1)[1] }
        }
    };

    declare function fw:notes-to-arduino($notes as element(note)*) as element(code) {
      (: create the two int arrays for inclusion in an Arduino sketch :)
      <code>
    int note_midi[] = {{
      { string-join(
          for $midi at $i in $notes/@midi
          return concat(if ($i mod 10 eq 0) then " " else (), $midi),
          ", ") }
    }};
    int note_duration[] = {{
      { string-join(
          for $duration at $i in $notes/@duration
          return concat(if ($i mod 10 eq 0) then " " else (), $duration),
          ", ") }
    }};
      </code>
    };

    declare option exist:serialize "method=text media-type=text/text";

    let $uri := request:get-parameter("uri", ())
    let $part := request:get-parameter("part", "P1")
    let $format := request:get-parameter("format", "xml")
    let $doc :=
      if ($format = "xml") then doc($uri)
      else if ($format = "zip") then fw:unzip($uri)
      else ()
    (: get the requested part :)
    let $part := $doc//part[@id = $part]
    (: use the data in the first measure to set the tempo :)
    let $measure := $part/measure[1]
    let $tempo := (xs:integer($measure/sound/@tempo), 100)[1]
    (: convert the notes into an internal XML format :)
    let $notes := fw:mxl-to-midi($part)
    return
      (: generate the sketch fragment :)
      <sketch>
    int tempo = {$tempo};
    {fw:notes-to-arduino($notes)}
      </sketch>
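The pitch arithmetic in fw:MidiNote() can be checked on its own with standard XQuery. The sketch below restates the same formula in a hypothetical function local:midi() that takes the step, alteration and octave as plain values instead of a MusicXML pitch element.

```xquery
xquery version "1.0";

(: offsets of the letters ABCDEFG from C, as in the script above :)
declare variable $step2offset := (9, 11, 0, 2, 4, 5, 7);

(: Hypothetical standalone version of the MIDI note calculation. :)
declare function local:midi($step as xs:string, $alter as xs:integer, $octave as xs:integer) as xs:integer {
  (: "A" has codepoint 65, so A..G select positions 1..7 of the offset table :)
  12 * ($octave + 1) + $step2offset[string-to-codepoints($step) - 64] + $alter
};

(
  local:midi("C", 0, 4),   (: middle C :)
  local:midi("A", 0, 4),   (: concert A :)
  local:midi("F", 1, 3)    (: F sharp in octave 3 :)
)
(: returns 60, 69, 54 :)
```

The octave is offset by one because MIDI note 0 is the C of octave -1 in MusicXML's numbering, which places middle C (octave 4) at 60.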
Examples
1. Good King Wenceslas [1]
2. HTML Form interface [2]
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Music/mxl2arduino.xq?uri=http://www.hymnsandcarolsofchristmas.com/Hymns_and_Carols/XML/Good_King_Wenceslas2.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/Music/mxl2arduinoForm.xq?uri=http://www.hymnsandcarolsofchristmas.com/Hymns_and_Carols/XML/Good_King_Wenceslas2.xml
Naming Conventions
Guidelines for wikibook authors
Our goal is to allow many people to contribute examples while giving our readers a consistent user experience. To that end we would like all of our authors to follow these standards.
Make sure you use the source tags to surround your code. If it is XML code use the lang="xml" attribute.

    <source lang="xml"> ...xml code here... </source>

Try to keep the examples as simple as you can to demonstrate the core concepts of your examples.
Complex XQueries should have comments, using the XQuery comment syntax:

    xquery version "1.0";
    (: This is a comment :)
Navigating Collections
Motivation
You want to browse collections using an HTML web page and narrow your choices as you type.
Method
We will first create a server-side script that takes a single parameter. This is the collection path that the user is entering into an input field in a web page. With each character the user types, the list of possible sub-collections is narrowed. There are three parts to this example:
1) the server-side XQuery script
2) the HTML form
3) the JavaScript file that implements the AJAX calls
    declare function local:substring-before-last-slash($arg as xs:string?) as xs:string {
      if (matches($arg, '/'))
      then replace($arg, '^(.*)/.*', '$1')   (: by default matching is eager :)
      else ''
    };

    (: if we don't get any value then use the root collection :)
    let $collection := request:get-parameter("collection", '')
    let $before-last-slash := local:substring-before-last-slash($collection)
    let $after-last-slash := substring-after($collection, concat($before-last-slash, '/'))

    (:
      collection={$collection}<br/>
      before last "/"={$before-last-slash}<br/>
      after last "/"={$after-last-slash}<br/>
    :)
    return
      <div class="results">{
        if (count($sub-collections) = 0) then
          <h1>There are no subcollections of {$collection}</h1>
        else
          <div class="selections">{
            for $child in $sub-collections
            let $child-path := concat($before-last-slash, '/', $child)
            order by $child
            return
              if (starts-with($child, $after-last-slash)) then
                <div class="selection"><a href="browse.xq?collection={$child-path}/">{$child}</a></div>
              else ()
          }</div>
      }</div>
browse.xq
    xquery version "1.0";
    declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=no indent=yes
        doctype-public=-//W3C//DTD XHTML 1.0 Transitional//EN
        doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

    return
      <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
          <title>{$title}</title>
          <script type="text/javascript" src="ajax-collection.js"/>
          <style type="text/css">
            td {{background-color: #efe; font-size:14px;}}
            th {{background-color: #ded; text-align: right; padding:3px; font-size:12px;}}
          </style>
        </head>
        <body onload="getList();">
          <h1>{$title}</h1>
          <p>{$collection}</p>
          <form onsubmit="getList(); return false" method="get">
            <span>
              <label for="collection">Collection:</label>
              <input type="text" size="50" name="collection" id="collection" title="collection"
                     onkeyup="getList();" onfocus="getList();" value="{$collection}"/>
            </span>
          </form>
          <!-- this is where the results are placed -->
          <div id="results"/>
        </body>
      </html>
ajax-collection.js

    function updateList() {
      if (http.readyState == 4) {
        var divlist = document.getElementById('results');
        divlist.innerHTML = http.responseText;
        isWorking = false;
      }
    }

    function getList() {
      if (!isWorking && http) {
        var collectionid = document.getElementById("collection").value;
        // encode the path so that special characters survive the query string
        http.open("GET", "get-child-collections.xq?collection=" + encodeURIComponent(collectionid));
        // set the call-back function to be invoked when a response from the HTTP request is returned
        http.onreadystatechange = updateList;
        isWorking = true;
        http.send(null);
      }
    }

    function getHTTPObject() {
      var xmlhttp;
      /*@cc_on
      @if (@_jscript_version >= 5)
        try {
          xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
        } catch (e) {
          try {
            xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
          } catch (E) {
            xmlhttp = false;
          }
        }
      @else
        xmlhttp = false;
      @end @*/
      if (!xmlhttp && typeof XMLHttpRequest != 'undefined') {
        try {
          xmlhttp = new XMLHttpRequest();
          xmlhttp.overrideMimeType("text/xml");
        } catch (e) {
          xmlhttp = false;
        }
      }
      return xmlhttp;
    }

    var http = getHTTPObject(); // create the HTTP Object
    var isWorking = false;
OAuth
Motivation
You want to log in to a web service that supports the OAuth protocol.
Background
OAuth is an open protocol to allow secure API authorization in a simple and standard method from desktop and web applications. Like OpenID, OAuth allows other web services to use your private data without giving out your passwords.
Terminology
Consumer Key - When you register as a developer with an OAuth service provider, they will send you an API key to use with their service. This is typically a string of about 65 characters composed of digits and letters.
Service Provider - an organization like LinkedIn, Google, or Twitter that has some of your data protected behind a web service.
Token - a fairly long string of computer-generated letters and numbers used in OAuth data exchanges. These strings are hard to guess, and are paired with a secret key to protect the token from being used by unauthorized parties. OAuth defines two different types of tokens: a request token and an access token.
Steps
We will perform this process in the following steps:
1. Request a Token
2. Sign
3. etc.
Here is an example of the structure that contains OAuth information (from the 28msec web site):

    <oa:service-provider realm="example.com/oauth">
      <oa:request-token>
        <oa:url></oa:url>
        <oa:http-method>GET</oa:http-method>
      </oa:request-token>
      <oa:user-authorization>
        <oa:url></oa:url>
      </oa:user-authorization>
      <oa:access-token>
        <oa:url></oa:url>
        <oa:http-method>GET</oa:http-method>
      </oa:access-token>
      <oa:supported-signature-methods>
        <oa:method>HMAC-SHA1</oa:method>
      </oa:supported-signature-methods>
      <oa:oauth-version>1.0</oa:oauth-version>
      <oa:authentication>
        <oa:consumer-key>your consumer key</oa:consumer-key>
        <oa:consumer-key-secret>your consumer secret</oa:consumer-key-secret>
      </oa:authentication>
    </oa:service-provider>
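For the signing step, OAuth 1.0 with HMAC-SHA1 signs a "signature base string" built from the HTTP method, the request URL and the sorted request parameters. The sketch below shows only that construction, in standard XQuery. The function local:base-string() and the param elements are hypothetical names introduced for illustration, the parameter list is abbreviated (a real request also carries a timestamp, among others), and the sort uses the default collation as an approximation of the byte-wise ordering the protocol specifies. The HMAC-SHA1 computation itself needs an extension function and is not shown.

```xquery
xquery version "1.0";

(: Sketch of the OAuth 1.0 signature base string.
   local:base-string() and the <param> elements are hypothetical. :)
declare function local:base-string(
  $method as xs:string,
  $url as xs:string,
  $params as element(param)*
) as xs:string {
  (: percent-encode each name and value, sorted by name and then value :)
  let $pairs :=
    for $p in $params
    order by string($p/@name), string($p)
    return concat(encode-for-uri(string($p/@name)), "=", encode-for-uri(string($p)))
  let $normalized := string-join($pairs, "&amp;")
  return
    string-join(
      (upper-case($method), encode-for-uri($url), encode-for-uri($normalized)),
      "&amp;")
};

local:base-string(
  "get",
  "http://example.com/oauth/request_token",
  (
    <param name="oauth_signature_method">HMAC-SHA1</param>,
    <param name="oauth_consumer_key">your-consumer-key</param>,
    <param name="oauth_nonce">abc123</param>
  )
)
```

The resulting string has three parts separated by ampersands: the upper-cased method, the encoded URL, and the encoded parameter string.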
References
http://oauth.net/
http://hueniverse.com/2007/10/beginners-guide-to-oauth-part-ii-protocol-workflow/
http://sausalito.28msec.com/latest/index.php?id=working_with_oauth
Examples of XML definitions for service Provider Structures [1]
MarkLogic Facebook OAuth module [2]
Norm Walsh on OAuth [3]
References
[1] http://sausalito.28msec.com/latest/index.php?id=service_provider_structures
[2] http://github.com/marklogic/comoms/blob/master/src/oauth.xqy
[3] http://norman.walsh.name/2010/09/25/oauth
Open Search
Motivation
You want to allow users to search your site using a tool such as the search boxes in the upper right corner of many web browsers. You want to publish your search interface using standardized documents. Note that this has not been made to work yet. It is currently in development.
This file tells the search tool to take the search terms from the search text field and perform an HTTP GET on the local XQuery in the default eXist instance running on your local system. If you change your hostname, port or path, you only need to update the URL in the XML configuration file.
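A description document of this kind can be generated by an XQuery, which keeps the URL in one place. The following is a minimal sketch based on the OpenSearch 1.1 description format, not the configuration file this article refers to: the host, port and search.xq path in $search-url, the parameter name q, and the ShortName and Description texts are all placeholders chosen for illustration.

```xquery
xquery version "1.0";

(: ASSUMPTION: host, port and script path are placeholders for your own installation. :)
declare variable $search-url := "http://localhost:8080/exist/rest/db/apps/search/search.xq";

<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>eXist search</ShortName>
  <Description>Keyword search over the local eXist database</Description>
  <InputEncoding>UTF-8</InputEncoding>
  <!-- the doubled braces produce the literal OpenSearch placeholder in the output -->
  <Url type="text/html" template="{$search-url}?q={{searchTerms}}"/>
</OpenSearchDescription>
```

The search tool substitutes the user's input for the searchTerms placeholder in the template before issuing the GET request.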
Method
XQuery is an ideal toolkit for manipulating well-formed HTML; you need only use the doc() function, e.g. doc('http://www.example.org/index.html') or doc('/db/path/to/index.html'). But if a web page is not well-formed XML, you will get errors about the source not being well-formed. Luckily, there are programs that transform HTML files into well-formed XML files, and eXist provides several such tools.

One is the httpclient module's get function, httpclient:get(). To use this function you need to enable the httpclient module by modifying the conf.xml file so that the module is loaded the next time you start eXist. Uncomment the following line:

    <module class="org.exist.xquery.modules.httpclient.HTTPClientModule" uri="http://exist-db.org/xquery/httpclient" />

The following example performs an HTTP GET on the list of all the feeds from the IBM web site:

    let $feeds-url := 'http://www.ibm.com/ibm/syndication/us/en/?cm_re=footer-_-ibmfeeds-_-top_level'
    let $data := httpclient:get(xs:anyURI($feeds-url), true(), <Headers/>)
    return $data

Sometimes the HTML is so malformed that even httpclient:get() will not be able to salvage it. For example, if an element has two @id attributes, you will get the error "Error XQDY0025: element has more than one attribute 'id'". In this case, you may need to download the HTML source and clean it up just enough so that eXist can parse the rest. Then store the file in your database and use the util:parse-html() function (which passes the text through the Neko HTML parser to make it well-formed). The following XQuery cleans up HTML that has been saved as a text file (because it is still malformed):

    let $html-txt := util:binary-to-string(util:binary-doc('/db/html-file-saved-as-text.txt'))
    let $data := util:parse-html($html-txt)
    return $data
Pachube feed
Motivation
You want to create a feed for the Pachube [1] application. A Pachube application allows you to store, share & discover realtime sensor, energy and environment data from objects, devices & buildings around the world. This provides a platform for sensor data integration. History gathered by Pachube can be presented in various formats and used by other applications to mashup feeds.
Tower Bridge
The idea of a feed of the open/closed status of Tower Bridge in London was borrowed from @ni [2]. A Twitter [3] stream provides the base data for a simple status feed. The RSS feed [4] from this stream is read by an XQuery script, the status deduced from the text and an XML file representing the current status updated. This XML file has an attached XSLT stylesheet so that when the file is pulled on schedule from the eXist database, it is first transformed on the server-side into the EEML [5] format required for Pachube feeds. As configured on the UWE server, this uses Saxon XSLT-2.
XQuery script
let $rss := httpclient:get(xs:anyURI("http://twitter.com/statuses/user_timeline/14012942.rss"), false(), ())/httpclient:body
let $lastChange := $rss//item[1]
let $bridgeStatus := doc("/db/Wiki/Pachube/bridge.xml")/data/status
return
  if (exists($lastChange) and exists($bridgeStatus))
  then
    let $open := if (contains($lastChange/description, "opening")) then "1" else "0"
    return
      update replace $bridgeStatus with
        element status {
          attribute bridge {$open},
          attribute lastChange {$lastChange/pubDate},
          attribute lastUpdate {current-dateTime()}
        }
  else ()
1. httpclient is used here because doc() throws an error about duplicate namespace declarations (under investigation).
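The status deduction step can be tried without the database or the network. The following is a minimal sketch, not the original script: local:bridge-flag is a hypothetical helper name, and the sample item text is invented to stand in for the first RSS item.

```xquery
(: Hypothetical helper (not in the original script): maps the text of a
   tweet to the bridge flag stored in bridge.xml. :)
declare function local:bridge-flag($description as xs:string) as xs:string {
  if (contains($description, "opening")) then "1" else "0"
};

(: Invented sample item standing in for $rss//item[1] :)
let $item :=
  <item>
    <description>I am opening for a vessel passing upstream</description>
    <pubDate>Mon, 14 Dec 2009 12:09:02 +0000</pubDate>
  </item>
return
  element status {
    attribute bridge { local:bridge-flag(string($item/description)) },
    attribute lastChange { string($item/pubDate) }
  }
```

Note that contains() is case-sensitive, so a description beginning with "Opening" would not be detected unless the text is first passed through lower-case().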
Bridge status
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="http://www.cems.uwe.ac.uk/xmlwiki/Pachube/bridge.xsl"?>
<data>
  <status bridge="0" lastChange="Mon, 14 Dec 2009 12:09:02 +0000" lastUpdate="2009-12-14T16:57:00.679Z"/>
</data>
XSLT
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="/data">
    <eeml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.eeml.org/xsd/005" xsi:schemaLocation="http://www.eeml.org/xsd/005 http://www.eeml.org/xsd/005/005.xsd" version="5">
      <environment updated="{current-dateTime()}">
        <feed>http://www.cems.uwe.ac.uk/xmlwiki/Pachube/bridge.xml</feed>
        <description>... is closed. </description>
        <email>kit.wallace@gmail.com</email>
        <location>
          <name>Tower Bridge</name>
          <lat>51.5064186</lat>
          <lon>-0.074865818</lon>
        </location>
        <data id="0">
          <tag>bridge open</tag>
          <value>
            <xsl:value-of select="status/@bridge"/>
          </value>
        </data>
      </environment>
    </eeml>
  </xsl:template>
</xsl:stylesheet>
Job scheduling
The XQuery update script is invoked by the eXist job scheduler every minute:

let $login := xmldb:login("/db", "user", "password")
let $del := scheduler:delete-scheduled-job("BRIDGE")
let $job := scheduler:schedule-xquery-cron-job("/db/Wiki/Pachube/pollbridgerss.xq", "0 0/1 * * * ?", "BRIDGE")
return $job
Feed view
There is a public view of the Feed as processed by Pachube : http://www.pachube.com/feeds/3922
Discussion
The Pachube interface refreshes the automatic feeds every 15 minutes (for the free service). Since typical bridge lifts last 10 minutes, there is a likelihood that a lift will be missed. The alternative is to push changes to Pachube when detected.
Mapping File
An XML document defines the origin of the raw data, the Pachube appid and the mapping from data values (1-based) to data streams (numbered from 1 in document order).

<weatherfeed xmlns="http://www.cems.uwe.ac.uk/xmlwiki/wdl">
  <data>http://www.martynhicks.co.uk/weather/clientraw.txt</data>
  <appid>4013</appid>
  <format>
    <field n="2" unit="kts">Average Wind Speed</field>
    <field n="4" unit="degrees">Wind Direction</field>
    <field n="5" unit="Celcius">Temperature</field>
    <field n="7" unit="hPa">Barometer</field>
  </format>
</weatherfeed>
Update script
This script is scheduled to run every minute (as above). The mapping file namespace needs to be declared:

declare namespace wdl = "http://www.cems.uwe.ac.uk/xmlwiki/wdl";

First a function to read the raw data file and tokenize it to a sequence of values:

declare function local:client-data ($rawuri) {
  let $headers :=
    element headers {
      element header {
        attribute name {"Cache-Control"},
        attribute value {"no-cache"}
      }
    }
  let $raw := httpclient:get(xs:anyURI($rawuri), false(), $headers)/httpclient:body
  return tokenize($raw, "\+")
};

Then a function to transform the sequence of values to the Pachube data channels:

declare function local:data-to-eeml ($data, $format) {
  for $field at $id in $format/wdl:field
  let $name := string($field)
  let $index := xs:integer($field/@n)
  return
    element data {
      attribute id {$id},
      element tag {string($field)},
      element value {$data[$index]},
      element unit {string($field/@unit)}
    }
};

The main line fetches the feed definition file (here hard-coded, but it could be passed in as a parameter). The data values are obtained, the EEML is generated and the result is PUT to the Pachube API.

let $feed := doc("/db/Wiki/Pachube/horfieldweather.xml")/wdl:weatherfeed
let $data := local:client-data($feed/wdl:data)
let $appid := $feed/wdl:appid
let $APIKey := "eeda7c27ff8b7c49e8529e4eb4b3f57724c5b609db0d22904df11edd4742e92c"
let $url := xs:anyURI(concat("http://www.pachube.com/api/", $appid))
let $headers :=
  <headers>
    <header name="X-PachubeApiKey" value="{$APIKey}"/>
  </headers>
let $eeml :=
  <eeml xmlns="http://www.eeml.org/xsd/005">
    <environment updated="{current-dateTime()}">
      {local:data-to-eeml($data, $feed/wdl:format)}
    </environment>
  </eeml>
return httpclient:put($url, $eeml, false(), $headers)
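To see how the n attribute of each field selects a position in the raw data, the mapping can be run on in-memory values. This is a sketch under assumptions: the function repeats the logic of local:data-to-eeml with type annotations added, and both the two-field format and the five data values are invented for illustration, not real station output.

```xquery
declare namespace wdl = "http://www.cems.uwe.ac.uk/xmlwiki/wdl";

(: Same logic as the mapping function in the text, repeated here so the
   sketch is self-contained; the type annotations are an addition. :)
declare function local:data-to-eeml($data as xs:string*, $format as element()) as element(data)* {
  for $field at $id in $format/wdl:field
  let $index := xs:integer($field/@n)
  return
    element data {
      attribute id { $id },
      element tag { string($field) },
      element value { $data[$index] },
      element unit { string($field/@unit) }
    }
};

(: Invented, shortened mapping: stream 1 reads value 2, stream 2 reads value 5 :)
let $format :=
  <format xmlns="http://www.cems.uwe.ac.uk/xmlwiki/wdl">
    <field n="2" unit="kts">Average Wind Speed</field>
    <field n="5" unit="Celcius">Temperature</field>
  </format>
(: Invented sample values standing in for the tokenized raw file :)
let $data := ("12345", "7.5", "9.1", "270", "11.2")
return local:data-to-eeml($data, $format)
```

With these inputs the second value (7.5) ends up in data stream 1 and the fifth value (11.2) in data stream 2, because stream ids follow document order of the field elements while n indexes into the value sequence.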
Feed view
There is a public view of the Pachube feed at http://www.pachube.com/feeds/4013
Weather Underground Feed
A typical XML feed for a station in Weather Underground is http://api.wunderground.com/weatherstation/WXCurrentObXML.asp?ID=IBAYOFPL1
XSLT transform
This XML can be transformed to EEML using XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output media-type="application/xml" method="xml" indent="yes"/>
  <xsl:template match="/current_observation">
    <eeml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.eeml.org/xsd/005" xsi:schemaLocation="http://www.eeml.org/xsd/005 http://www.eeml.org/xsd/005/005.xsd" version="5">
      <environment updated="{current-dateTime()}">
        <title>Weather Report</title>
        <location exposure="outdoor" domain="physical" disposition="fixed">
          <name><xsl:value-of select="location/full"/></name>
          <lat><xsl:value-of select="location/latitude"/></lat>
          <lon><xsl:value-of select="location/longitude"/></lon>
        </location>
        <data id="1">
          <tag>Average Wind speed</tag>
          <!-- 1 knot = 1.15077945 mph, so mph is divided by the factor -->
          <value><xsl:value-of select="round-half-to-even(wind_mph div 1.15077945, 1)"/></value>
          <unit>kts</unit>
        </data>
        <data id="2">
          <tag>Wind Direction</tag>
          <value><xsl:value-of select="wind_degrees"/></value>
          <unit>degrees</unit>
        </data>
        <data id="3">
          <tag>Temperature</tag>
          <value><xsl:value-of select="temp_c"/></value>
          <unit>Celcius</unit>
        </data>
        <data id="4">
          <tag>Barometric Pressure</tag>
          <value><xsl:value-of select="pressure_mb"/></value>
          <unit>hPA</unit>
        </data>
      </environment>
    </eeml>
  </xsl:template>
</xsl:stylesheet>
The script can be invoked: http://www.cems.uwe.ac.uk/xmlwiki/Pachube/weatherunderground.xq?id=IBAYOFPL1. Since this script is parameterised, it could be used with any Weather Underground station.
Pachube Feed
An automatic feed can be created - http://www.pachube.com/feeds/4037 which uses this feed.
NOAA Feed
We can adopt a similar approach with the feeds for US ICAO stations [8]. NOAA provides XML feeds such as http://www.weather.gov/xml/current_obs/KEWR.xml. The format is nearly the same as the Weather Underground feed and is documented at http://www.weather.gov/view/current_observation.xsd. The update rate is hourly, but there is currently no way to configure Pachube to update at that frequency.
XSLT
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output media-type="application/xml" method="xml" indent="yes"/>
  <xsl:template match="/current_observation">
    <eeml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.eeml.org/xsd/005" xsi:schemaLocation="http://www.eeml.org/xsd/005 http://www.eeml.org/xsd/005/005.xsd" version="5">
      <environment updated="{current-dateTime()}">
        <title>NOAA Weather Report</title>
        <location exposure="outdoor" domain="physical" disposition="fixed">
          <name><xsl:value-of select="location"/></name>
          <lat><xsl:value-of select="latitude"/></lat>
          <lon><xsl:value-of select="longitude"/></lon>
        </location>
        <data id="1">
          <tag>Average Wind speed</tag>
          <value><xsl:value-of select="wind_kt"/></value>
          <unit>kts</unit>
        </data>
        <data id="2">
          <tag>Wind Direction</tag>
          <value><xsl:value-of select="wind_degrees"/></value>
          <unit>degrees</unit>
        </data>
        <data id="3">
          <tag>Temperature</tag>
          <value><xsl:value-of select="temp_c"/></value>
          <unit>Celcius</unit>
        </data>
        <data id="4">
          <tag>Barometric Pressure</tag>
          <value><xsl:value-of select="pressure_mb"/></value>
          <unit>hPA</unit>
        </data>
      </environment>
    </eeml>
  </xsl:template>
</xsl:stylesheet>
XQuery Script
let $id := request:get-parameter("id", ())
let $ss := doc("/db/Wiki/Pachube/NOAA.xsl")
let $data := doc(concat("http://www.weather.gov/xml/current_obs/", $id, ".xml"))
return transform:transform($data, $ss, ())
Feed
The transformed XML http://www.cems.uwe.ac.uk/xmlwiki/Pachube/NOAA.xq?id=KEWR is the basis for the manual feed http://www.pachube.com/feeds/4047
XSLT only
If Pachube supported XSLT on the server side, the whole task could be handled by a single XSLT script. For the sake of generalisation, it is helpful to provide an interface which allows parameters to be passed to the script, but it is not necessary:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output media-type="application/xml" method="xml" indent="yes"/>
  <xsl:param name="station" select="'KEWR'"/>
  <xsl:template match="/">
    <xsl:variable name="url" select='concat("http://www.weather.gov/xml/current_obs/",$station,".xml")'/>
    <xsl:apply-templates select="doc($url)/current_observation"/>
  </xsl:template>
  <xsl:template match="current_observation">
    <eeml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.eeml.org/xsd/005" xsi:schemaLocation="http://www.eeml.org/xsd/005 http://www.eeml.org/xsd/005/005.xsd" version="5">
      <environment updated="{current-dateTime()}">
        <!-- curly braces are only evaluated in attributes, so the station id needs xsl:value-of here -->
        <title>NOAA Current weather for <xsl:value-of select="station_id"/></title>
        <location exposure="outdoor" domain="physical" disposition="fixed">
          <name><xsl:value-of select="location"/></name>
          <lat><xsl:value-of select="latitude"/></lat>
          <lon><xsl:value-of select="longitude"/></lon>
        </location>
        <data id="1">
          <tag>Average Wind speed</tag>
          <value><xsl:value-of select="wind_kt"/></value>
          <unit>kts</unit>
        </data>
        <data id="2">
          <tag>Wind Direction</tag>
          <value><xsl:value-of select="wind_degrees"/></value>
          <unit>degrees</unit>
        </data>
        <data id="3">
          <tag>Temperature</tag>
          <value><xsl:value-of select="temp_c"/></value>
          <unit>Celcius</unit>
        </data>
        <data id="4">
          <tag>Barometric Pressure</tag>
          <value><xsl:value-of select="pressure_mb"/></value>
          <unit>hPA</unit>
        </data>
      </environment>
    </eeml>
  </xsl:template>
</xsl:stylesheet>
The server can just run this standalone to generate the EEML feed. This small XQuery script uses the Saxon processor on the eXist platform:

transform:transform((), doc("/db/Wiki/Pachube/NOAA3.xsl"), ())

XSLT [9] Execute [10] (currently fails - under investigation)

let $id := request:get-parameter("id", ())
let $feed := doc("/db/Wiki/Pachube/feeds.xml")//PachubeFeed[@id=$id]
return
  transform:transform(
    doc($feed/data),
    doc($feed/xslt),
    $feed/params
  )

Pachube automatic feeds can now be created with a URL like http://www.cems.uwe.ac.uk/xmlwiki/Pachube/getFeed.xq?id=2001, e.g. http://www.pachube.com/feeds/4661

User interface and database to allow users to register, create and edit feeds

There are issues here with loading and with unsafe code in the stored XSLT.
Output
Similarly, output processing of either the current EEML or a specific datastream's csv history could be provided with a bit of code and XSLT. Since this may require authentication, API keys would have to be stored in this database too. Jobs could be generated and scheduled to implement triggers, but this will need a timed pull of the required data. Code is needed to convert the history feeds provided by Pachube to XML, since these are only available in CSV. Once in XML, XSLT can transform to the format required. Of course it would be preferable if Pachube provided XML feeds in addition to the CSV feeds.

Archive
The full archive is provided as a csv file. We can convert that to XML with the following script:

import module namespace csv = "http://www.cems.uwe.ac.uk/xmlwiki/csv" at "../lib/csv.xqm";

let $feed := request:get-parameter("feed", "")
let $stream := request:get-parameter("stream", "")
let $archiveurl := concat("http://www.pachube.com/feeds/", $feed, "/datastreams/", $stream, "/archive.csv")
let $data := csv:get-data($archiveurl)
let $rows := tokenize($data, $csv:newline)
let $now := current-dateTime()
return
  <history feed="{$feed}" stream="{$stream}" dateTime="{$now}" count="{count($rows)}">
    {for $row in $rows
     let $point := tokenize($row, ",")
     return <value dateTime="{$point[1]}">{$point[2]}</value>
    }
  </history>
http://www.cems.uwe.ac.uk/xmlwiki/Pachube/getArchive.xq?feed=4037&stream=2
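The row-splitting step can be checked without the csv module or a network call. The sketch below is an illustration under assumptions: the two-row string is invented, "\n" is used as the row separator in place of $csv:newline (whose value is not shown in the text), and the filter for empty rows is an addition.

```xquery
(: Invented sample in the "timestamp,value" layout that the archive script expects;
   &#10; is a newline, and the trailing one mimics a file ending in a line break :)
let $data := "2009-12-14T12:00:00Z,7.5&#10;2009-12-14T12:15:00Z,8.1&#10;"
(: tokenize() yields an empty string after a trailing separator, so drop blank rows :)
let $rows := tokenize($data, "\n")[normalize-space(.) ne ""]
return
  <history count="{count($rows)}">
    {for $row in $rows
     let $point := tokenize($row, ",")
     return <value dateTime="{$point[1]}">{$point[2]}</value>
    }
  </history>
```

Without the filter the trailing line break would produce a third, empty value element and the count would be one too high.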
24 Hour History
In the csv stream, these values are untimed. The time has to be estimated and calculated using xs:dayTimeDuration:

import module namespace csv = "http://www.cems.uwe.ac.uk/xmlwiki/csv" at "../lib/csv.xqm";

let $feed := request:get-parameter("feed", "")
let $stream := request:get-parameter("stream", "")
let $historyurl := concat("http://www.pachube.com/feeds/", $feed, "/datastreams/", $stream, "/history.csv")
let $data := csv:get-data($historyurl)
let $values := tokenize($data, ",")
let $now := current-dateTime()
let $then := $now - xs:dayTimeDuration("P1D")
return
  <history feed="{$feed}" stream="{$stream}" dateTime="{$now}" count="{count($values)}">
    {for $value at $i in $values
     let $dt := $then + xs:dayTimeDuration(concat("PT", 15*$i, "M"))
     return <value dateTime="{$dt}">{$value}</value>
    }
  </history>

http://www.cems.uwe.ac.uk/xmlwiki/Pachube/getHistory.xq?feed=4037&stream=2
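The timestamp estimate is plain duration arithmetic and can be worked through with a fixed instant. In this sketch the starting dateTime and the three readings are invented so that the result is reproducible; the script itself uses current-dateTime().

```xquery
(: Fixed instant instead of current-dateTime(), chosen for illustration only :)
let $now := xs:dateTime("2009-12-14T12:00:00Z")
let $then := $now - xs:dayTimeDuration("P1D")
(: Invented readings :)
let $values := ("7.5", "8.1", "7.9")
return
  for $value at $i in $values
  (: reading number $i is placed 15 * $i minutes after the start of the 24-hour window :)
  let $dt := $then + xs:dayTimeDuration(concat("PT", 15 * $i, "M"))
  return <value dateTime="{$dt}">{$value}</value>
```

The three readings are stamped 2009-12-13T12:15:00Z, 2009-12-13T12:30:00Z and 2009-12-13T12:45:00Z. A full day at this spacing holds 96 readings, the last of which falls exactly on $now.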
References
[1] http://www.pachube.com/
[2] http://twitter.com/ni
[3] http://twitter.com/towerbridge
[4] http://twitter.com/statuses/user_timeline/14012942.rss
[5] http://www.eeml.org/
[6] http://www.weather-display.com/index.php
[7] http://www.wunderground.com/
[8] http://www.weather.gov/xml/current_obs/
[9] http://www.cems.uwe.ac.uk/xmlwiki/Pachube/NOAA3.xsl
[10] http://www.cems.uwe.ac.uk/xmlwiki/Pachube/NOAA3.xq
Publishing Overview
Motivation
You have a workflow process that allows an internal team to review web content before it is transferred to a public web site. When the documents have been marked "approved for publication" they must be transferred to a public web server in a controlled way.
Methods
There are many ways to transfer XML documents from one server to another. This document describes a set of basic methods that may vary based on your local configuration. In this document the following figure will be used.
The public web server then calls a function to load that resource from the content management system inside the system. This can be done with standard URL parameters. Note that in this case the passwords will be in the web log files. Example of getting URL parameters with: publish-with-callback.xq
(: The user that will execute the login :)
let $user := request:get-parameter('user', '')
(: The pass that will execute the login :)
let $pass := request:get-parameter('pass', '')
(: The full URL of the document we are going to bring over :)
let $url := request:get-parameter('url', '')
(: the /db location we are going to put the new document into :)
let $db-loc := request:get-parameter('db-loc', '')
(: This is the document fetched from the internal CMS server :)
let $get-doc := doc($url)
Note that this style is more secure since only documents that exist on the internal content management system are candidates for publishing.
Using Certificates
It is sometimes not possible to create a secure connection between an internal CMS and the publishing web site. An alternative method is to provide certificates to each system that is authorized to publish documents to the publishing server.
Publishing to Subversion
Motivation
You want to have a single button on a content management system that will copy a file to a remote subversion repository.
Method
We will configure our subversion repository on a standard Apache server that is configured with an SSL certificate. This will encrypt all communication between the intranet system and the remote subversion server. We will also set the authentication to be Basic Authentication.
    <header name="Authorization" value="{$value}"/>
  </headers>
let $response := httpclient:put($url, $content, false(), $new-headers)
return $response
};

To put the file you just need to put the URL to the correct content area, and the content to be inserted, the user name and password and
References
Wikipedia entry on Basic access authentication
Quantified Expressions
Motivation
You have a list of items in a sequence and you want to test to see if any or all of the items match a condition. The result of the test on the sequence will be either true or false.
Method
Quantified expressions have a format very similar to a FLWOR expression, with the following two changes:
1. instead of the word for you use either some or every
2. instead of where/order/return you use the word satisfies

The quantified expression always takes a sequence as its input and returns a boolean true/false as its output. Here is an example of a quantified expression which checks whether there are any books whose title contains the word "cat". Assume you have a collection of books, each book in a single XML file with a title element such as this:

<book>
  ...
  <title>The Cat With Nine Lives</title>
  ...
</book>

some $book in collection($collection)/book
satisfies (contains(lower-case($book/title/text()), 'cat'))

This expression will return true as long as at least one book contains the word "cat" in the title. Note that the quantified expression cannot be used to indicate which book title contains the word "cat", only that the word "cat" occurs in at least one title in your collection. Quantified expressions can often be rewritten as a single XPath expression with a predicate. In the above case the expression would be:

let $has-a-cat-book := exists(collection($collection)/book/title[contains(lower-case(./text()), 'cat')])
The variable $has-a-cat-book will be set to true() if any book title contains the word "cat". Some XQuery processors can optimize quantified expressions better, and some people feel that quantified expressions are more readable than a single XPath expression.
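The difference between some and every is easiest to see side by side. This is a small sketch on an invented in-memory list of two books rather than on collection($collection), so it runs without a database.

```xquery
(: Invented sample data standing in for the book collection :)
let $books :=
  <books>
    <book><title>The Cat With Nine Lives</title></book>
    <book><title>Dogs of War</title></book>
  </books>
return
  <result>
    <some>{ some $book in $books/book satisfies contains(lower-case($book/title), 'cat') }</some>
    <every>{ every $book in $books/book satisfies contains(lower-case($book/title), 'cat') }</every>
    <every-on-empty>{ every $book in () satisfies false() }</every-on-empty>
  </result>
```

The three elements contain true, false and true respectively. The last case is worth remembering: every is always true over an empty sequence, so a test such as "every book is approved" also succeeds when there are no books at all.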
Registered Functions
Motivation
You want a list of all functions or all modules and their functions.
Method
There are two functions that we can use to get a list of functions in the current run-time system:

util:registered-functions()
util:registered-functions($module)

The first function returns all registered functions; the second returns all registered functions for a given module.

<function>compression:tar</function>
<function>compression:zip</function>
<function>datetime:count-day-in-month</function>
<function>datetime:date-for</function>
...

Note that if there is no namespace prefix, the function is an XPath library function (or a function of the math module, which also appears without a prefix ??).
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/util/registered-functions.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/eXist/util/registered-module-functions.xq
Registered Modules
Motivation
You want to check whether a module is loaded in your runtime system.
Method
Some modules that you may need are not loaded into the runtime engine when the server starts. If this is the case you may have to dynamically load a module.
Sample Results
<results>
  <module>http://exist-db.org/xquery/compression</module>
  <module>http://exist-db.org/xquery/datetime</module>
  <module>http://exist-db.org/xquery/examples</module>
  <module>http://exist-db.org/xquery/file</module>
  <module>http://exist-db.org/xquery/httpclient</module>
  <module>http://exist-db.org/xquery/image</module>
  <module>http://exist-db.org/xquery/mail</module>
  <module>http://exist-db.org/xquery/math</module>
  <module>http://exist-db.org/xquery/ngram</module>
  <module>http://exist-db.org/xquery/request</module>
  <module>http://exist-db.org/xquery/response</module>
  <module>http://exist-db.org/xquery/scheduler</module>
  <module>http://exist-db.org/xquery/session</module>
  <module>http://exist-db.org/xquery/sql</module>
  <module>http://exist-db.org/xquery/system</module>
  <module>http://exist-db.org/xquery/text</module>
  <module>http://exist-db.org/xquery/transform</module>
  <module>http://exist-db.org/xquery/util</module>
  <module>http://exist-db.org/xquery/validation</module>
  <module>http://exist-db.org/xquery/xmldb</module>
  <module>http://exist-db.org/xquery/xmldiff</module>
  <module>http://www.w3.org/2005/xpath-functions</module>
</results>
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/util/registered-modules.xq
Regular Expressions
Motivation
You want to test whether a text matches a specific pattern of characters.
You want to replace patterns of text with other patterns.
You have text with repeating patterns and you would like to break the text up into discrete items.
Method
To deal with the above three problems, XQuery has the following functions:

matches($input, $regex) - returns true if the input contains a match for the regular expression
replace($input, $regex, $string) - replaces each part of the input string that matches the regular expression with a new string
tokenize($input, $regex) - splits the input at each match of the regular expression and returns the sequence of substrings between the matches

Through these functions we have access to the powerful syntax of regular expressions.
... times the character should be repeated: "*" for "0, 1 or many times", "?" for "0 or 1 times", and "+" for "1 or many times". The combination "*?" matches the shortest substring that fits the pattern. NB: this only scratches the surface of the subject of regular expressions!

The three functions all accept an optional flags parameter to set matching modes. The following four flags are available:

i makes the regex match case-insensitively.
s enables "single-line mode" or "dot-all" mode. In this mode, the dot matches every character, including newlines, so the string is treated as a single line.
m enables "multi-line mode". In this mode, the anchors "^" and "$" match before and after newlines in the string, in addition to applying to the string as a whole.
x enables "free-spacing mode". In this mode, whitespace in the regex pattern is ignored. This is mainly used when one has divided a complicated regex over several lines but does not intend the newlines to be matched.

If one does not use a flag, one can just leave the slot empty or write "".
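The "s" and "m" flags only make a difference when the input contains a newline. The following sketch uses an invented two-line string to show each flag switching a result from false to true.

```xquery
(: Invented two-line input; &#10; is the newline character :)
let $input := concat("first line", "&#10;", "second line")
return (
  matches($input, "first.*second"),       (: false: "." does not match the newline :)
  matches($input, "first.*second", "s"),  (: true: in dot-all mode "." matches the newline too :)
  matches($input, "^second"),             (: false: "^" anchors only at the start of the string :)
  matches($input, "^second", "m")         (: true: in multi-line mode "^" also matches after a newline :)
)
```

Flags can be combined in one string, for example "si" for a case-insensitive dot-all match.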
Examples of matches()
let $input := 'Hello World'
return (
  matches($input, 'Hello') = true(),
  matches($input, 'Hi') = false(),
  matches($input, 'H.*') = true(),
  matches($input, 'H.*o W.*d') = true(),
  matches($input, 'Hel+o? W.+d') = true(),
  matches($input, 'Hel?o+') = false(),
  matches($input, 'hello', "i") = true(),
  matches($input, 'he l lo', "ix") = true(),
  matches($input, '^Hello$') = false(),
  matches($input, '^Hello') = true()
)
Execute [1]
Examples of tokenize()
(
  let $input := 'red,orange,yellow,green,blue'
  return deep-equal(tokenize($input, ','), ('red','orange','yellow','green','blue'))
  ,
  let $input := 'red, orange, yellow, green,blue'
  return deep-equal(tokenize($input, ',\s*'), ('red','orange','yellow','green','blue'))
  ,
  let $input := 'red , orange , yellow , green , blue'
  return not(deep-equal(tokenize($input, ',\s*'), ('red','orange','yellow','green','blue')))
  ,
  let $input := 'red , orange , yellow , green , blue'
  return deep-equal(tokenize($input, '\s*,\s*'), ('red','orange','yellow','green','blue'))
)
In the second example, "\s" represents one whitespace character and thus matches the newline before "orange" and the tab character before "yellow". It is quantified with "*" so the pattern removes whitespace after the comma, but not before it. To remove all whitespace, use the pattern '\s*,\s*'. Execute [2]
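One further edge case is worth a sketch: when the separator pattern matches at the very start or end of the input, tokenize() returns empty strings at those positions. The input string below is invented for illustration.

```xquery
(: Invented input with leading and trailing spaces :)
let $input := "  red   orange  "
return (
  (: 4 items: an empty string, "red", "orange" and another empty string :)
  count(tokenize($input, "\s+")),
  (: 2 items: normalize-space() trims the ends before the split :)
  count(tokenize(normalize-space($input), "\s+"))
)
```

Trimming with normalize-space() first, or filtering the result with a predicate such as [. ne ""], avoids the empty items.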
Examples of replace()
(
  let $input := 'red,orange,yellow,green,blue'
  return ( replace($input, ',', '-') = 'red-orange-yellow-green-blue' )
  ,
  let $input := 'Hello World'
  return (
    replace($input, 'o', 'O') = "HellO WOrld",
    replace($input, '.', 'X') = "XXXXXXXXXXX",
    replace($input, 'H.*?o', 'Bye') = "Bye World"
  )
  ,
  let $input := 'HellO WOrld'
  return ( replace($input, 'o', 'O', "i") = "HellO WOrld" )
  ,
  let $input := 'Chapter 1 Chapter 2 '
  return ( replace($input, "Chapter (\d)", "Section $1.0") = "Section 1.0 Section 2.0 " )
)

In the last example, "\d" represents any digit; the parentheses around "\d" bind the variable "$1" to whatever digit it matches; in the replacement string, this variable is replaced by the matched digit. Execute [3]
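Several groups can be captured at once and reordered in the replacement string. The two cases below are an additional sketch with invented inputs; the second shows that a literal dollar sign in the replacement has to be written as "\$".

```xquery
(
  (: three groups, emitted in reverse order :)
  replace("2009-12-14", "(\d{4})-(\d{2})-(\d{2})", "$3/$2/$1") = "14/12/2009",
  (: "\$" is a literal dollar sign, "$1" is the captured number :)
  replace("cost: 5", "(\d+)", "\$$1") = "cost: $5"
)
```

An unescaped "$" that is not followed by a digit, or a stray "\" in the replacement string, raises an error, which matters when the replacement text comes from user input.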
Larger examples
XQuery/Incremental Search of the Chemical Elements Uses Ajax and a regular expression to search for a chemical element
References
The Regular Expression Library has more than 2,600 sample regular expressions: Regular Expression Library [4] This page has a very useful summary of the regular expression patterns: Regular Expression Cheat Sheet [5] This page describes how to use Regular Expressions within XQuery and XPath: XQuery and XPath Regular Expressions [6]
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Basics/matches1.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/Basics/tokenize.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/Basics/replace.xq
[4] http://regexlib.com/
[5] http://regexlib.com/CheatSheet.aspx
[6] http://www.regular-expressions.info/xpath.html
<interface>
<name>del.icio.us</name>
del.icio.us social
<endpoint>http://del.icio.us/</endpoint>
<parameters>
<parameter>
<name>user-id</name>
<purpose>User Identifier</purpose>
<default>morelysq</default>
<tag>model</tag>
</parameter>
<parameter>
<name>tag</name>
<default>xml</default>
<tag>model</tag>
</parameter>
<parameter>
<name>url</name>
<purpose>bookmark</purpose>
<default>http://xml.com/</default>
<tag>model</tag>
</parameter>
<parameter>
<name>tagview</name>
<options>
<option>list</option>
<option>cloud</option>
</options>
<default>list</default>
<tag>ui</tag>
</parameter>
<parameter>
<name>tagsort</name>
<options>
<option>alpha</option>
<option>freq</option>
</options>
<default>list</default>
<tag>ui</tag>
</parameter>
<parameter>
<name>minfreq</name>
<options>
<option>1</option>
<option>2</option>
<option>5</option>
</options>
<default>1</default>
<tag>ui</tag>
</parameter>
<parameter>
<name>bundleview</name>
<options>
<option>show</option>
<option>hide</option>
</options>
<default>show</default>
<tag>ui</tag>
</parameter>
<parameter>
<name>pageno</name>
<format>[0-9]+</format>
<default>1</default>
<tag>ui</tag>
</parameter>
<parameter>
<name>count</name>
<options>
<option>10</option>
<option>25</option>
<option>50</option>
<option>100</option>
</options>
<default>10</default>
<tag>ui</tag>
</parameter>
<parameter>
<name>search</name>
<purpose>search string</purpose>
<tag>ui</tag>
</parameter>
<parameter>
<name>scope</name>
<purpose>search scope</purpose>
<options>
<option>user</option>
<option>all</option>
<option>web</option>
</options>
<default>all</default>
<tag>ui</tag>
</parameter>
<parameter>
<name>helptopic</name>
<default>urlhistory</default>
<tag>help</tag>
</parameter>
</parameters>
<services>
<service>
<template/>
<purpose>Home Page</purpose>
<tag>home</tag>
</service>
<service>
<template>{user-id}</template>
<tag>user</tag>
</service>
<service>
<template>{user-id}?settagview={tagview}&settagsort={tagsort}&setminfreq={minfreq}&setbundleview={bundleview}&page={pageno}&setcount={count}</template>
<tag>user</tag>
</service>
<service>
<template>rss/{user-id}</template>
<purpose>latest 20 items</purpose>
<tag>user</tag>
<tag>RSS</tag>
</service>
<service>
<template>{user-id}/{tag}</template>
<tag>user</tag>
<tag>tag</tag>
</service>
<service>
<template>tag/{tag}</template>
<tag>tag</tag>
</service>
<service>
<template>network/{user-id}</template>
<tag>user</tag>
<tag>network</tag>
</service>
<service>
<template>subscriptions/{user-id}</template>
<tag>user</tag>
<tag>subscriptions</tag>
</service>
<service>
<tag>user</tag>
<tag>links</tag>
</service>
<service>
<template>rss/tag/{tag}</template>
<tag>tag</tag>
<tag>RSS</tag>
</service>
<service>
<template>popular/{tag}</template>
<tag>tag</tag>
</service>
<service>
<template>popular/</template>
<tag>current</tag>
</service>
<service>
<template>popular/?new</template>
<tag>current</tag>
</service>
<service>
<template>url?url={url}</template>
<tag>url</tag>
</service>
<service>
<template>help/</template>
<purpose>Help index</purpose>
<tag>help</tag>
</service>
<service>
<template>help/{helptopic}</template>
<tag>help</tag>
</service>
<service>
<template>search/?fr=del_icio_us&p={search}&searchtype={scope}</template>
<tag>search</tag>
</service>
<service>
<template>rss/</template>
<tag>current</tag>
<tag>RSS</tag>
</service>
<service>
<template>rss/tag/{tag}</template>
<tag>tag</tag>
<tag>RSS</tag>
</service>
<service>
<template>html/{user-id}/</template>
<tag>user</tag>
<tag>html</tag>
</service>
</services>
</interface>
The Script
declare namespace rest = "http://ww.cems.uwe.ac.uk/xmlwiki/rest";
declare variable $uri := request:get-parameter("_uri", ());
declare variable $index := request:get-parameter("_index", "tag");
declare variable $interface := doc(concat($uri, "?r=", math:random()))/interface;

declare function rest:template-parameters($template as xs:string) as xs:string* {
  (: parse the template to get the parameters :)
  distinct-values(
    for $p in subsequence(tokenize($template, "\{"), 2)
    return substring-before($p, "}")
  )
};
as
declare function rest:replace-template-parameters($template as xs:string, $names as xs:string*) as xs:string {
  (: recursively replace the template parameters by their current values :)
  if (empty($names))
  then $template
  else
    let $name := $names[1]
    let $value := rest:parameter-value($name)
    let $templatex :=
      if (exists($value))
      then replace($template, concat("\{", $name, "\}"), $value)
      else $template
    return rest:replace-template-parameters($templatex, subsequence($names, 2))
};
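The two template functions can be exercised together outside the request context. The sketch below is an illustration under assumptions: the functions are moved to the local: namespace, local:parameter-value is a hypothetical stand-in that looks values up in a sequence of invented param elements (the lookup used by the script itself is not shown in the text), and the parameter list is passed explicitly.

```xquery
declare function local:template-parameters($template as xs:string) as xs:string* {
  distinct-values(
    for $p in subsequence(tokenize($template, "\{"), 2)
    return substring-before($p, "}")
  )
};

(: Hypothetical stand-in for rest:parameter-value :)
declare function local:parameter-value($name as xs:string, $params as element(param)*) as xs:string? {
  $params[@name = $name][1]/string()
};

declare function local:fill($template as xs:string, $names as xs:string*, $params as element(param)*) as xs:string {
  if (empty($names))
  then $template
  else
    let $name := $names[1]
    let $value := local:parameter-value($name, $params)
    let $next :=
      if (exists($value))
      then replace($template, concat("\{", $name, "\}"), $value)
      else $template
    return local:fill($next, subsequence($names, 2), $params)
};

(: Invented parameter values, using defaults from the interface definition :)
let $template := "{user-id}/{tag}"
let $params := (<param name="user-id">morelysq</param>, <param name="tag">xml</param>)
return local:fill($template, local:template-parameters($template), $params)
```

The result is the string morelysq/xml. A parameter with no value is left in place as {name}. Because the value is used as a replace() replacement string, a value containing "$" or "\" would need escaping first.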
(: interface generation :)
declare function rest:parameter-input-field( $parameter as element(parameter) ) as element(span)? { (: create a parameter field in the parameter input form :)
let $name := $parameter/name let $value := rest:parameter-value($name) return <span class="input"> <label for="{$name}"> {if ($index = "parameter") (: if it the index
is by parameter, generate a link to that part of the index :) then <a href="#{$name}"> else } </label> {if ($parameter/options) then <select name="{$name}" title="{$parameter/purpose}" > {for $option in $parameter/options/option return <option value="{$option}" { if ($option= $value) title="{$option/@label}"> $name {string($name)}</a>
<div class="subhead"> interface <div class="group"> <label for="_uri" > uri </label> <input type="text" name="_uri" value="{$uri}" size="80"/> </div> </div> {for $tag in distinct-values($interface/parameters/parameter/tag) return <div> <div class="subhead">{$tag} </div> <div class="group"> { for $parameter in $interface/parameters/parameter[tag=$tag] return rest:parameter-input-field($parameter) } </div> </div> } <hr/> Index services by <select name="_index">
{for $index in ("parameter","tag")
 return
   if ($index = request:get-parameter("index","tag"))
   then <option value="{$index}" selected="true"> {$index} </option>
   else <option value="{$index}" > {$index} </option>
}
</select>
<hr/><input type="submit" value="refresh"/>
</form>
declare function rest:service-link($service as element(service)) as element(tr) {
    <div>
        <div class="label">{string($service/purpose)}</div>
        {
        let $names := rest:template-parameters($service/template)
        let $filledTemplate := rest:replace-template-parameters($service/template, $names)
        let $uri :=
            if (starts-with($service/template, "http://"))
            then $filledTemplate
declare function rest:parameter-index() {
    <div id="index">
        <h2>Parameter index</h2>
        {for $parameter in $interface/parameters/parameter
         let $name := $parameter/name
         let $match := concat("{", $name, "}")
         order by lower-case($name)
         return
            <div>
                <div class="subhead"><a name="{$name}">{string($name)}</a></div>
                <div class="group">
                    {for $service in $interface//service[contains(template, $match)]
                     return rest:service-link($service)
                    }
                </div>
            </div>
        }
    </div>
};
doctype-public=-//W3C//DTD XHTML 1.0 Transitional//EN
doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";
<html>
    <head>
        <title>Interface for {string($interface/name)}</title>
        <link rel="stylesheet" type="text/css" href="screen.css"/>
    </head>
    <body>
        <h1>{string($interface/name)} interface</h1>
        {$interface/description/(text(),*)}
        <div id="parameters">
            {rest:parameter-form()}
        </div>
        {if (exists($interface))
         then
            <div id="services">
                <h2> Interface properties </h2>
                <div class="group">
                    <div class="label">Interface definition
                        <div class="link"><a href="{$uri}">{$uri}</a></div>
                    </div>
                    <div class="label">Service endpoint
                        <div class="link"><a href="{$interface/endpoint}">{string($interface/endpoint)}</a></div>
                    </div>
                </div>
                {if ($index = "parameter")
                 then rest:parameter-index()
                 else if ($index = "tag")
                 then rest:tag-index()
                 else ()
                }
            </div>
         else ()
        }
    </body>
</html>
Discussion
Architecture
The script uses a common layered architecture in which low-level functions operate on the base data model, and these functions are in turn used by functions which generate the user interface. Finally, class and id hooks in the generated XHTML link with CSS to style the page. Determining how many layers to use and how the layers should interface is a central design decision in an XQuery application, as it is in other technologies. Several alternatives are worth considering: the script generates an intermediate XML structure which is transformed server- or client-side with XSLT; the script generates an XForm in place of the HTML form; the whole task is handled client-side with JavaScript; client-side AJAX interfaces with a base XQuery script. Handling this design space is one of the challenges of web development.
Cache busting
For scripts running behind a proxy server, as these scripts are on the UWE server, repeated access to the same URL in the doc() function will return the cached file. To break the cache, a random number is added to the URL.
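A minimal sketch of the cache-busting idea, assuming eXist's util:random() extension function is available. The URL and the parameter name nocache are hypothetical; any otherwise unused parameter name would do.

```xquery
xquery version "1.0";

(: hypothetical remote document; replace with the real URL :)
let $uri := "http://www.example.com/data/feed.xml"
(: append with an ampersand if the URL already has a query string, otherwise with a question mark :)
let $separator := if (contains($uri, "?")) then "&amp;" else "?"
(: util:random() is an eXist extension that returns a random number :)
let $busted := concat($uri, $separator, "nocache=", util:random())
return doc($busted)
```

Because the URL differs on every call, the proxy has no cached copy to return and fetches the document afresh.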
Global Variables
The script uses variable declarations to define some global variables used in the script functions. Global variables feel like a reversion to Fortran COMMON and similar horrors, except that these are all constant once defined. Nonetheless, the dependence on these variables is not explicit. An alternative would be to explicitly pass this data down through the functions. An alternative script using this style, passing a single node which composed the data into a single 'object', executes several times slower, is more verbose and arguably no more understandable.
Recursion
Replacement of the multiple parameters in a template is handled by a recursive function, which replaces each parameter throughout the template in turn and then calls itself on the remaining parameters.
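The same recursive pattern can be tried in isolation. The sketch below is not the script's own function: local:fill is a hypothetical name, and the names and values are passed as two parallel sequences (an assumption made so the example does not depend on request parameters).

```xquery
xquery version "1.0";

(: Replace each {name} in $template with the value at the same position
   in $values; the two sequences are assumed to have the same length. :)
declare function local:fill(
    $template as xs:string,
    $names as xs:string*,
    $values as xs:string*
) as xs:string {
    if (empty($names))
    then $template
    else
        local:fill(
            replace($template, concat("\{", $names[1], "\}"), $values[1]),
            subsequence($names, 2),
            subsequence($values, 2)
        )
};

local:fill("http://example.com/{user}/{tag}", ("user", "tag"), ("alice", "xquery"))
```

Each call replaces one parameter and passes the shortened sequences on, so the recursion ends when no names remain. With the inputs shown, the result is http://example.com/alice/xquery.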
Other interface
Flickr [2]
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/URLTemplates/interface.xq?_uri=http://www.cems.uwe.ac.uk/xmlwiki/URLTemplates/delicious.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/URLTemplates/interface.xq?_uri=http://www.cems.uwe.ac.uk/xmlwiki/URLTemplates/flickr.xml
Sample Program
xquery version "1.0";

declare function local:max-length($string-seq as xs:string*) as xs:string+ {
    let $max := max(for $s in $string-seq return string-length($s))
    return $string-seq[string-length(.) = $max]
};

let $tags :=
    <tags>
        <tag>Z</tag>
        <tag>Ze</tag>
        <tag>Zee</tag>
        <tag>Zen</tag>
        <tag>Zenith</tag>
        <tag>nith</tag>
        <tag>ith</tag>
        <tag>Zenth</tag>
    </tags>
return
    <results>
        <max-string>{local:max-length(($tags/tag))}</max-string>
    </results>
Results
<results>
    <max-string>Zenith</max-string>
</results>
Execute [1]
Discussion
This XQuery creates a local function that takes zero or more strings:
$string-seq as xs:string*
and returns one or more strings:
as xs:string+
It uses the max() XPath function, which looks at a sequence of values and returns the highest. Note that if several strings in the input share the maximum length, the function returns all of them. If you only want the first one, add "[1]" to the return expression:
return $string-seq[string-length(.) = $max][1]
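A short sketch of the tie behaviour described above. The input words are made up for illustration, and the return type is relaxed to xs:string* here (an assumption, so that an empty input returns an empty result instead of raising a type error).

```xquery
xquery version "1.0";

declare function local:max-length($string-seq as xs:string*) as xs:string* {
    let $max := max(for $s in $string-seq return string-length($s))
    return $string-seq[string-length(.) = $max]
};

(: "Zen" and "Zee" tie for the maximum length of 3 :)
let $words := ("Zen", "Z", "Zee")
return
    <results>
        <all-longest>{local:max-length($words)}</all-longest>
        <first-longest>{local:max-length($words)[1]}</first-longest>
    </results>
```

The first element holds both tied strings (Zen Zee), while the [1] predicate keeps only the first of them (Zen).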
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Basics/longestString.xq
Method
We use HTTP POST data and scan for a specific element like id. If the record does not have an id element, we know that we must create a new record. Note that there is no sequence number generated in this example yet. If there is an id parameter, we will delete the old file and save the new data into the same file. Note that there is no backup or archive.
http://localhost:8080/exist/rest/db/xquery-examples/save-test/new-update-save.xq?new=true
For updates where each record has an id:
http://localhost:8080/exist/rest/db/xquery-examples/save-test/new-update-save.xq?id=123
:)

(: replace this with your document, for example use request:get-data() :)
let $my-doc :=
    <data>
        <id>123</id>
        <message>Hello World</message>
    </data>

let $id := $my-doc/id
let $collection := 'xmldb:exist:///db/xquery-examples/save-test'

(: this logs you in; you can also get these variables from your session variables :)
let $login := xmldb:login($collection, 'mylogin', 'my-password')

(: replace this with a unique file name with a sequence number :)
let $file-name := 'test-save.xml'

return
    if (not($id))
    then (
        let $store-return-status := xmldb:store($collection, $file-name, $my-doc)
        return
            <message>New Document Created {$store-return-status} at {$collection}/{$file-name}</message>
    )
    else (
        let $remove-return-status := xmldb:remove($collection, $file-name)
        let $store-return-status := xmldb:store($collection, $file-name, $my-doc)
        return
            <message>Document {$id} has been successfully updated</message>
    )
References
[1] http://www.w3.org/TR/xquery-update-10
[2] http://www.exist-db.org/update_ext.html
Method
There are several ways to do this. The simplest way is to put both collections in a parent collection and start your search at the parent. Let's assume you have three collections:
/db/test
/db/test/a
/db/test/b
To get all the books in both collection a and collection b, just specify the parent collection, which is /db/test:
for $book in collection('/db/test')//book
Note that the double forward slash // will find the books anywhere in the base collection or any child collections.
If you have two collections that are at different locations in a file system you can simply specify each collection and join them together using the sequence join operation. This is the default operation of enclosing two sequences in parentheses. For example, if you have two sequences, a and b, the concatenation of the two sequences is just (a,b).
Assume you have books in the following two collections:
Collection A
File='/db/test/a/books.xml'
<books>
    <book id="47">
        <title>Moby Dick</title>
        <author>Herman Melville</author>
        <published-date>1851-01-01</published-date>
        <price>$19.95</price>
        <review>The adventures of the wandering sailor in pursuit of a ferocious whale.</review>
    </book>
    <book id="48">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <published-date>1925-05-10</published-date>
        <price>$29.95</price>
        <review>Chronicles of an era during the roaring 1920s when the US economy soared.</review>
    </book>
</books>
Collection B
File='/db/test/b/books.xml'
<books>
    <book id="49">
        <title>Catch-22</title>
        <author>Joseph Heller</author>
        <published-date>1961-01-01</published-date>
        <price>$19.95</price>
        <review>A satirical, historical novel set during the later stages of World War II from 1943 onwards.</review>
    </book>
    <book id="48">
        <title>Lolita</title>
        <author>Vladimir Nabokov</author>
        <published-date>1955-01-01</published-date>
        <price>$19.95</price>
        <review>A man becomes obsessed with a 12-year-old girl.</review>
    </book>
</books>
The following query would operate on both collections.
xquery version "1.0";

let $col-a := '/db/test/a'
let $col-b := '/db/test/b'
return
    <books>{
        for $book in (collection($col-a)//book, collection($col-b)//book)
        return $book
    }</books>
If you wanted to only return the titles you could use the following:
xquery version "1.0";

let $col-a := '/db/test/a'
let $col-b := '/db/test/b'
return
    <books>{
        for $book in (collection($col-a)//book, collection($col-b)//book)
        return $book/title
    }</books>
This would return the following results:
<books>
    <title>Moby Dick</title>
    <title>The Great Gatsby</title>
    <title>Catch-22</title>
    <title>Lolita</title>
</books>
Sending E-mail
Motivation
You want to send an e-mail message from within an XQuery. This is frequently done when a report has finished running or when a key event, such as a task update, has occurred.
Method
eXist provides a simple interface to e-mail through the mail:send-email() function, which takes three arguments, $email, $server and $charset, where
$email - The email message in the following format:
<mail>
    <from/>
    <reply-to/>
    <to/>
    <cc/>
    <bcc/>
    <subject/>
    <message>
        <text/>
        <xhtml/>
    </message>
    <attachment filename="" mimetype="">xs:base64Binary</attachment>
</mail>
$server - The SMTP server. If empty, then it tries to use the local sendmail program.
$charset - The charset value used in the "Content-Type" message header (defaults to UTF-8).
Sample Code
xquery version "1.0";

(: Demonstrates sending an email through Sendmail from eXist :)
declare namespace mail="http://exist-db.org/xquery/mail";

declare variable $message :=
    <mail>
        <from>John Doe &lt;sender@domain.com&gt;</from>
        <to>recipient@otherdomain.com</to>
        <cc>cc@otherdomain.com</cc>
        <bcc>bcc@otherdomain.com</bcc>
        <subject>A new task is waiting your approval</subject>
        <message>
            <text>A plain ASCII text message can be placed inside the text elements.</text>
            <xhtml>
                <html>
                    <head>
                        <title>HTML in an e-mail in the body of the document.</title>
                    </head>
                    <body>
                        <h1>Testing</h1>
                        <p>Test Message 1, 2, 3</p>
                    </body>
                </html>
            </xhtml>
        </message>
    </mail>;

if (mail:send-email($message, 'mail server', ()))
then <h1>Sent Message OK :-)</h1>
else <h1>Could not Send Message :-(</h1>
References
eXist mail module [8]
eXist send-mail function [1]
References
[1] http://demo.exist-db.org/exist/functions/mail/send-email
Sequences
Motivation
You want to manipulate a sequence of items. These items may be very similar to each other or they may be of very different types.
Method
We begin with some simple examples of sequences. We then look at the most common sequence operators. XQuery uses the word sequence as a generic name for an ordered container of items. Understanding how sequences work in XQuery is central to understanding how the language works. The use of generic sequences of items is central to functional programming and stands in sharp contrast to other programming languages such as Java or JavaScript that provide multiple methods and functions to handle key-value pairs, dictionaries, arrays and XML data. The wonderful thing about XQuery is that you only need to learn one set of concepts and a very small list of functions to learn how to quickly manipulate data.
Examples
Creating sequences of characters and strings
You use parentheses to contain a sequence, commas to delimit items and quotes to contain string values:
let $sequence := ('a', 'b', 'c', 'd', 'e', 'f')
Note that you can use single or double quotes, but for most character strings a single quote is used.
let $sequence := ("apple", 'banana', "carrot", 'dog', "egg", 'fig')
You can also intermix data types. For example, the following sequence has three strings and three integers in the same sequence.
let $sequence := ('a', 'b', 'c', 1, 2, 3)
You can then pass the sequence to any XQuery function that works with sequences of items. For example, the count() function takes a sequence as an input and returns the number of items in the sequence.
let $count := count($sequence)
To see the results of these items you can create a simple XQuery that displays the items using a FLWOR expression.
return
    <results>
        <count>{$count}</count>
        <items>
            {for $item in $sequence
             return <item>{$item}</item>
            }
        </items>
    </results>
Execute [1]
<results>
    <count>6</count>
    <items>
        <item>a</item>
        <item>b</item>
        <item>c</item>
        <item>d</item>
        <item>e</item>
        <item>f</item>
    </items>
</results>
<results>
    <count>a c d</count>
    <items>
        <item>a</item>
        <item>b</item>
        <item>c</item>
        <item>d</item>
        <item>e</item>
        <item>f</item>
    </items>
</results>
Although you can use parentheses to create a sequence of XML items, a common practice is to use XML tags to begin and end a sequence and to store all items as XML elements. One suggestion is to use items as the element name to hold generic sequences of items. Here is an example of this:
let $items :=
    <items>
        <banana/>
        <fruit type="carrot"/>
        <animal type='dog'/>
        <vehicle>car</vehicle>
    </items>
The other convention is to put all individual items in their own item element tags and to place each item on a separate line if the list of items gets long.
let $items :=
    <items>
        <item>banana</item>
        <item>
            <fruit type="carrot"/>
        </item>
        <item>
            <animal type='dog'/>
        </item>
        <item>
            <vehicle>car</vehicle>
        </item>
    </items>
The following FLWOR expression can then be used to display each of these items:
xquery version "1.0";

let $sequence :=
    <items>
        <item>banana</item>
        <item>
            <fruit type="carrot"/>
        </item>
        <item>
            <animal type='dog'/>
        </item>
        <item>
            <vehicle>car</vehicle>
        </item>
    </items>
return
    <results>
        {for $item in $sequence/item
         return <item>{$item}</item>
        }
    </results>
This will return the following XML:
<results>
    <item>
        <item>banana</item>
    </item>
    <item>
        <item>
            <fruit type="carrot"/>
        </item>
    </item>
    <item>
        <item>
            <animal type="dog"/>
        </item>
    </item>
    <item>
        <item>
            <vehicle>car</vehicle>
        </item>
    </item>
</results>
Note that when the resulting XML is returned, only double quotes are present in the output.
Counting Items
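When a sequence is stored as the child elements of a wrapper element, you can count the number of items by using the count() function and adding /* to the end of the path to the wrapper, so that the children are counted instead of the wrapper itself.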
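A small sketch of the difference the trailing /* makes; the element names and values are made up for illustration.

```xquery
xquery version "1.0";

let $items :=
    <items>
        <item>banana</item>
        <item>carrot</item>
        <item>dog</item>
    </items>
return
    <results>
        <!-- counts the single wrapper element -->
        <wrapper-count>{count($items)}</wrapper-count>
        <!-- counts the child elements of the wrapper -->
        <item-count>{count($items/*)}</item-count>
    </results>
```

Here count($items) is 1, because $items holds one items element, while count($items/*) is 3.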
All of these functions take an argument of type item()*, which is read "zero or more items". Note that the distinct-values() function and the subsequence() function both take in a sequence and return a sequence. This comes in very handy when you are creating recursive functions.
Along with count() there are also a few sequence functions that calculate the sum, average, minimum and maximum:
sum($seq as item()*) - used to sum the values of numbers in a sequence
avg($seq as item()*) - used to calculate the average (arithmetic mean) of numbers in a sequence
min($seq as item()*) - used to find the minimum value of a sequence of numbers
max($seq as item()*) - used to find the maximum value of a sequence of numbers
These functions are designed to work on numeric values of items and all return numeric values. You may want to use the number() function when working with strings of items.
You may find that you can perform many tasks just by learning these few XQuery functions. You can also create most other sequence operators from these functions.
remove($seq as item()*, $position as xs:integer) - removes the item at the given position from a sequence
reverse($seq as item()*) - reverses the order of items in a sequence
index-of($seq as xs:anyAtomicType*, $target as xs:anyAtomicType) - returns a sequence of integers that indicate where an item is within a sequence (index counting starts at 1)
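The functions listed above can be tried together in one query. This is only an illustrative sketch; the input values are arbitrary.

```xquery
xquery version "1.0";

let $numbers := (4, 8, 15, 16, 23, 42)
let $letters := ('a', 'b', 'c', 'b')
return
    <results>
        <sum>{sum($numbers)}</sum>
        <avg>{avg($numbers)}</avg>
        <min>{min($numbers)}</min>
        <max>{max($numbers)}</max>
        <!-- drop the item at position 2 -->
        <remove>{remove($letters, 2)}</remove>
        <reverse>{reverse($letters)}</reverse>
        <!-- every position at which 'b' occurs -->
        <index-of>{index-of($letters, 'b')}</index-of>
    </results>
```

For these inputs the sum is 108, the average 18, the minimum 4 and the maximum 42. remove() gives a c b, reverse() gives b c b a, and index-of() gives 2 4 because 'b' occurs twice.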
These last two functions can be used in conjunction with bracketed predicate expressions '[]', which operate on an item's position information within a sequence.
last() - when used in a predicate selects the last item in a sequence, so (1,2,3)[last()] returns 3
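A brief sketch of positional predicates using position() and last(); the sequence is arbitrary.

```xquery
xquery version "1.0";

let $seq := ('a', 'b', 'c', 'd', 'e')
return
    <results>
        <last>{$seq[last()]}</last>
        <all-but-last>{$seq[position() lt last()]}</all-but-last>
        <first-two>{$seq[position() le 2]}</first-two>
    </results>
```

This yields e for the last item, a b c d for everything before it, and a b for the first two items.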
Tests on Sequences
You can also test to see if a sequence contains one or all of the items in another set. There are several methods to do this.
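As one possible illustration (not necessarily the methods the original text went on to list), the general comparison operator = is true if any pair of items matches, and a quantified expression can check that every item matches. The sequences here are assumptions chosen for the example.

```xquery
xquery version "1.0";

let $sequence-1 := ('a', 'b', 'c', 'd')
let $sequence-2 := ('c', 'd')
return
    <results>
        <!-- true if any item of $sequence-1 equals 'c' -->
        <contains-c>{$sequence-1 = 'c'}</contains-c>
        <!-- true if the two sequences share at least one item -->
        <contains-any>{$sequence-1 = $sequence-2}</contains-any>
        <!-- true only if every item of $sequence-2 is in $sequence-1 -->
        <contains-all>{every $item in $sequence-2 satisfies $item = $sequence-1}</contains-all>
        <!-- index-of returns an empty sequence when the item is absent -->
        <contains-z>{exists(index-of($sequence-1, 'z'))}</contains-z>
    </results>
```

The first three tests are true for these inputs and the last one is false.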
Sorting Sequence
There is no "sort" function in XQuery. To sort your sequence you create a new sequence using a FLWOR expression over your items with an order by clause in it. For example, if you have a list of items with titles as one of the elements you can use the following to sort the items by title:
let $sorted-items :=
    for $item in $items
    order by $item/title/text()
    return $item
You can return the items sorted by their element name:
let $sorted-items :=
    for $item in $items
    order by name($item)
    return $item
You can also use descending with order by to reverse the order:
for $item in $items
order by name($item) descending
return $item
You can also sort in an order of your own by creating a separate sequence and using the index-of() function to find where each item's name is in that sequence:
for $i in /root/*
let $order := ("b", "a", "c")
let $name := name($i)
order by index-of($order, $name)
return $i
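A self-contained sketch of the custom-order technique, using an inline root element (made up for this example) in place of a stored document.

```xquery
xquery version "1.0";

(: the desired order of element names :)
let $order := ("b", "a", "c")
let $root := <root><a/><b/><c/></root>
return
    <sorted>{
        for $i in $root/*
        (: sort key: the position of this element's name in $order :)
        order by index-of($order, name($i))
        return $i
    }</sorted>
```

The elements come back in the order b, a, c. The sketch assumes each name appears exactly once in $order, since the sort key must be a single value.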
Union
You can also create a "union" set that removes duplicates for all items that are in both sets by using the distinct-values() function:
distinct-values(($sequence-1, $sequence-2))
This will return the following:
a b c d e f
Note that the "c d" pair is not repeated.
Intersection
You can use a variation of this to find the intersection, that is, all items in sequence-1 that are also in sequence-2:
distinct-values($sequence-1[.=$sequence-2])
This will return only items that are in BOTH sequence-1 AND sequence-2:
c d
The way you read this is "for each item in sequence-1, if this item (.) is also in sequence-2 then return it."
Exclusion
The last set operation you might want to do is "exclusion", where we find all items in the first sequence that are NOT in the second sequence.
distinct-values($sequence-1[not(.=$sequence-2)])
This will return
a b
Returning Duplicates
The following example returns a list of all items that occur more than once in a sequence. This process is known as "duplicate detection".
xquery version "1.0";

let $seq := ('a', 'b', 'c', 'd', 'e', 'f', 'b', 'c')
let $distinct-value := distinct-values($seq)
(: for each distinct item if the count is greater than 1 then return it :)
let $duplicates :=
    for $item in $distinct-value
    return
        if (count($seq[.=$item]) > 1)
        then $item
        else ()
return
    <results>
        <sequence>{string-join($seq, ', ')}</sequence>
        <distinct-values>{$distinct-value}</distinct-values>
        <duplicates>{$duplicates}</duplicates>
    </results>
This returns:
<results>
    <sequence>a, b, c, d, e, f, b, c</sequence>
    <distinct-values>a b c d e f</distinct-values>
    <duplicates>b c</duplicates>
</results>
You can also return only the items that occur exactly once by moving $item to the else branch of the if expression and putting () in the then branch:
if (count($seq[.=$item]) > 1) then () else $item
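The union, intersection and exclusion expressions can be run together as one query. The two input sequences below are an assumption, chosen so that they reproduce the results quoted in the text.

```xquery
xquery version "1.0";

(: assumed inputs, chosen to match the results quoted in the text :)
let $sequence-1 := ('a', 'b', 'c', 'd')
let $sequence-2 := ('c', 'd', 'e', 'f')
return
    <results>
        <union>{distinct-values(($sequence-1, $sequence-2))}</union>
        <intersection>{distinct-values($sequence-1[. = $sequence-2])}</intersection>
        <exclusion>{distinct-values($sequence-1[not(. = $sequence-2)])}</exclusion>
    </results>
```

Note that the order in which distinct-values() returns its items is implementation-dependent, although implementations commonly keep the order of first occurrence.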
Execute [3]
This process is a very common way to store related files in subcollections.
Counting Items
It is very common to need to count your items as you go through them. You can do this by adding "at $count" to your FLWOR loop:
for $item at $count in $sequence
return
    <item>
        <count>{$count}</count>
        {if ($count mod 2) then <odd/> else <even/>}
    </item>
Note that the mod operator gives the remainder after integer division, so ($count mod 2) returns 1 for odd numbers, which gets converted to true(), and zero for even numbers, which gets converted to false(). You can use this technique to make alternating rows of tables different colors.
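A minimal sketch of the alternating-row idea. The class names odd and even are assumptions; they would be styled by whatever CSS the page uses.

```xquery
xquery version "1.0";

let $sequence := ('apple', 'banana', 'carrot', 'dog')
return
    <table>{
        for $item at $count in $sequence
        return
            (: odd positions get class "odd", even positions class "even" :)
            <tr class="{if ($count mod 2) then 'odd' else 'even'}">
                <td>{$count}</td>
                <td>{$item}</td>
            </tr>
    }</table>
```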
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/fn/sequence1.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/eXist/fn/sequence2.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/eXist/fn/codepoints1.xq
Sequences Module
Motivation
You want to apply a function to a sequence. You can use one of the following functions: map, fold or filter.
Method
Here is the structure of these three functions.
sequences:map($func as function, $seqA as item()*, $seqB as item()*) as item()*
sequences:fold($func as function, $seq as item()*, $start as item())
sequences:filter($func as function, $seq as item()*) as item()*
Each of them takes an XQuery function as the first argument.
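The sketch below does not call the sequences module. It only illustrates what map, filter and fold compute, written in plain XQuery 1.0; local:double and local:fold-sum are hypothetical helpers introduced for this example.

```xquery
xquery version "1.0";

(: hypothetical function to apply to each item :)
declare function local:double($n as xs:integer) as xs:integer {
    $n * 2
};

(: fold written as explicit recursion: add each item to a running total :)
declare function local:fold-sum($seq as xs:integer*, $start as xs:integer) as xs:integer {
    if (empty($seq))
    then $start
    else local:fold-sum(subsequence($seq, 2), $start + $seq[1])
};

let $seq := (1, 2, 3, 4, 5)
return
    <results>
        <!-- map: apply a function to every item -->
        <map>{for $n in $seq return local:double($n)}</map>
        <!-- filter: keep the items that satisfy a condition -->
        <filter>{$seq[. mod 2 = 0]}</filter>
        <!-- fold: combine all items into a single value -->
        <fold>{local:fold-sum($seq, 0)}</fold>
    </results>
```

For this input the map step gives 2 4 6 8 10, the filter step gives 2 4 and the fold step gives 15. The module functions generalise the same patterns by taking the function to apply as an argument.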
Map
The map function applies the function item $f to every item from the sequence $seq in turn, returning the concatenation of the resulting sequences in order. W3C Page on Map [1]
References
[1] http://www.w3.org/TR/xpath-functions-30/#func-map
Sample Module
The following module was provided by Thomas White.
File tw_stream-binary-cached.xql
xquery version "1.0" encoding "UTF-8";

module namespace cached-binary = "http://www.thomas-white.net/xqm/stream-binary-cached.1.0";

declare default function namespace "http://www.w3.org/2005/xpath-functions";

import module namespace xdb = "http://exist-db.org/xquery/xmldb";
import module namespace cache = "http://exist-db.org/xquery/cache";
import module namespace datetime = "http://exist-db.org/xquery/datetime";
import module namespace util = "http://exist-db.org/xquery/util";

declare option exist:serialize "method=xml media-type=text/xml";

declare function cached-binary:eTag(
    $pathToBinaryResource as xs:string,
    $last-modified as xs:dateTime,
    $domain-tag as xs:string
) as xs:string {
    concat(
        $domain-tag, '-',
        util:document-id($pathToBinaryResource), '-',
        fn:translate(fn:substring($last-modified, 1, 19), ':-T', '')
    )
};

declare function cached-binary:eTag-from-uri(
    $pathToBinaryResource as xs:string,
    $domain-tag as xs:string
) as xs:string {
    cached-binary:eTag(
        $pathToBinaryResource,
        xdb:last-modified(
            util:collection-name($pathToBinaryResource),
            util:document-name($pathToBinaryResource)
        ),
        $domain-tag
    )
};

declare function cached-binary:stream-binary-with-cache-headers(
    $original-path as xs:string?,
    $pathToBinaryResource as xs:string,
    $expiresAfter as xs:dayTimeDuration?,
    $must-revalidate as xs:boolean,
    $doNotCache as xs:string,
    $domain as xs:string?
) {
    if (fn:string-length($pathToBinaryResource) = 0 or
        not(util:binary-doc-available($pathToBinaryResource)))
    then (
        response:set-status-code(404),
        concat($original-path, ' ( ', $pathToBinaryResource, ' ) not found!')
        (: ($original-path, $pathToBinaryResource)[1] :)
    )
    else (
        let $coll := util:collection-name($pathToBinaryResource)
        let $file := util:document-name($pathToBinaryResource)
        let $last-modified := xdb:last-modified($coll, $file)
        let $ETag := cached-binary:eTag($pathToBinaryResource, $last-modified, $domain)
        let $if-modified-since := request:get-header('If-Modified-Since')
        let $expire-after :=
            if (empty($expiresAfter))
            then xs:dayTimeDuration("P365D")
            else $expiresAfter (: 365 Day expiry period :)
        let $content-type := (
            util:declare-option('exist:serialize',
                concat("media-type=", xdb:get-mime-type(xs:anyURI($pathToBinaryResource)))),
            response:set-header("Pragma", 'o')
        )
        return
            if (not($doNotCache = 'true') and (
                    (request:get-header('If-None-Match') = $ETag) or (: ETag :)
                    (fn:string-length($if-modified-since) > 0 and
                     datetime:parse-dateTime($if-modified-since, 'EEE, d MMM yyyy HH:mm:ss Z') <= $last-modified)
                ))
            then (
                response:set-status-code(304),
                response:set-header("Cache-Control",
                    concat('public, max-age=', $expire-after div xs:dayTimeDuration('PT1S')))
                (: 24h=86,400 , must-revalidate :)
            )
            else (
                let $maxAge := $expire-after div xs:dayTimeDuration('PT1S')
                let $headers := (
                    response:set-header("ETag", $ETag),
                    response:set-header("Last-Modified",
                        datetime:format-dateTime($last-modified, 'EEE, d MMM yyyy HH:mm:ss Z')),
                    response:set-header("Expires",
                        datetime:format-dateTime(
                            dateTime(current-date(), util:system-time()) + $expire-after,
                            'EEE, d MMM yyyy HH:mm:ss Z')),
                    if ($doNotCache = 'true')
                    then (
                        response:set-header("Cache-Control", 'no-cache, no-store, max-age=0, must-revalidate'),
                        response:set-header("X-Content-Type-Options", 'nosniff')
                    )
                    else
                        response:set-header("Cache-Control",
                            concat('public, max-age=', $maxAge,
                                if ($must-revalidate) then ', must-revalidate' else ''))
                )
                return
                    response:stream-binary(
                        util:binary-doc(xs:anyURI($pathToBinaryResource)),
                        xdb:get-mime-type(xs:anyURI($pathToBinaryResource)),
                        xs:anyURI(($original-path, $pathToBinaryResource)[1])
                    )
            )
    )
};

(:
HTTP/1.1 200 OK
Date: Fri, 30 Oct 1998 13:19:41 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
Content-Length: 1040
Content-Type: text/html

Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"

Cache-Control: public, max-age=1728000
Expires: Thu, 06 Aug 2009 10:04:13 GMT
Date: Fri, 17 Jul 2009 10:04:13 GMT
Content-Type: text/javascript; charset=UTF-8
ETag: "ih2h6n8r44hc"
Last-Modified: Fri, 05 Sep 2003 02:11:15 GMT
X-Content-Type-Options: nosniff
:)

xquery version "1.0" encoding "UTF-8";

declare default function namespace "http://www.w3.org/2005/xpath-functions";

import module namespace request = "http://exist-db.org/xquery/request";
import module namespace cached-binary = "http://www.thomas-white.net/xqm/stream-binary-cached.1.0" at "tw_stream-binary-cached.xql";

cached-binary:stream-binary-with-cache-headers(
    request:get-parameter("url", ()),
    request:get-parameter("uri", 'no-uri'),
    xs:dayTimeDuration(request:get-parameter("expire", 'P30D')),
    xs:boolean(request:get-parameter("must-revalidate", 'false') = 'true'),
    request:get-parameter("doNotCache", ''),
    request:get-parameter("domain", '')
)
Simile Exhibit
Motivation
You want to create a Simile Exhibit output of an XML file. To do this we will need to convert XML to JSON file format.
Method
You have a file of contributors to a book and you would like to create a map of their locations.
<contributors>
    <contributor>
        <author-name>John Doe</author-name>
        <bio>John is a software developer interested in the semantic web.</bio>
        <location>New York, NY</location>
        <image-url>http://www.example.com/images/john-doe.jpg</image-url>
    </contributor>
    <contributor>
        <author-name>Sue Anderson</author-name>
        <bio>Sue is an XML consultant and is interested in XQuery.</bio>
        <location>San Francisco, CA</location>
        <image-url>http://www.example.com/images/sue-anderson.jpg</image-url>
    </contributor>
</contributors>
:)
let $item-header := concat($nl, ' ', $lcb, ' ')
let $item-footer := concat(' ', $rcb)
return
<results>{$json-header}
{
    string-join(
        for $contributor in doc($document)/contributors/contributor
        return
            <item>{$item-header}label: "{$contributor/author-name/text()}", location: "{$contributor/location/text()}", "image-url": "{$contributor/image-url/text()}" {$item-footer}</item>
        , ', ')
}{$json-footer}</results>
Alternative Approach
An alternative is to use the fact that curly braces can be escaped in XQuery by doubling. Since the output is being serialised as text, all elements will be serialised, so there is no need to serialise items separately.
xquery version "1.0";
declare option exist:serialize "method=text media-type=text/plain";

let $document := '/db/Wiki/JSON/contributors.xml'
return
<result>
{{
    "items" : [
    {
        string-join(
            for $contributor in doc($document)/contributors/contributor
            return
                <item>
                {{
                    label: "{$contributor/author-name}",
                    location: "{$contributor/location}",
                    "image-url": "{$contributor/image-url}"
                }}
                </item>
            , ', '
        )
    }
    ]
}}
</result>
Execute [1]
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/JSON/asJSON4.xq
Method
We will use the eXist get-child-collections() function to get all of the child collections for a root collection. We create a recursive function that traverses the collection tree. From the eXist function library, here is the description of get-child-collections function.
xmldb:get-child-collections($a as xs:string) as xs:string*
Returns a sequence of strings containing all the child collections of the collection specified in $a. The collection parameter can either be a simple collection path or an XMLDB URI.
If we have a collection called /db/webroot we could pass this string as a parameter to this function and all the child collections would be returned as sequence of strings. We can then create a recursive function that works on each of these child collections.
This recursive function takes a single string argument and returns a complex node. The result is an HTML ordered list structure. It first tests whether the collection has any child collections. If there are none, it returns the empty sequence. If there are child collections, it creates a new ordered list and iterates through them, creating a new list item for each child and then calling itself on that child. Note that this could have been written so that the function only calls itself if a child collection has children of its own. This is an example of recursion over a tree structure, a pattern that occurs frequently in XQuery functions.
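The recursion described above can be sketched as follows. This is a reconstruction based on the titled version of local:sitemap later in this article, not the original listing; it assumes an eXist database with a /db/webroot collection and the /exist/rest URL prefix.

```xquery
xquery version "1.0";

declare function local:sitemap($collection as xs:string) as node()* {
    let $children := xmldb:get-child-collections($collection)
    return
        if (empty($children))
        then ()  (: no child collections: return nothing :)
        else
            <ol>{
                for $child in $children
                let $db-path := concat($collection, '/', $child)
                order by $child
                return
                    <li>
                        <a href="{concat('/exist/rest', $db-path)}">{$child}</a>
                        {(: recurse into the child collection :)
                         local:sitemap($db-path)}
                    </li>
            }</ol>
};

local:sitemap('/db/webroot')
```

Storing the result of xmldb:get-child-collections() in a variable avoids calling it twice per collection, which is a small design choice of this sketch.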
Source Code
xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

<html>
    <head>
        <title>Sitemap</title>
    </head>
    <body>
        <h1>Sitemap for collection /db/webroot</h1>
        {local:sitemap('/db/webroot')}
    </body>
</html>
Adding Titles
Sometimes the title for the navigation bar will be different from the name of the collection. By convention, collection names are usually short and lowercase, without spaces. Navigation bars typically have labels that contain spaces and uppercase letters. Here is an example that uses a lookup table to look up the title from an XML file.
xquery version "1.0";

declare function local:sitemap($collection as xs:string) as node()* {
    if (empty(xmldb:get-child-collections($collection)))
    then ()
    else
        <ol>{
            for $child in xmldb:get-child-collections($collection)
            let $db-path := concat($collection, '/', $child)
            let $path := concat('/exist/rest', $collection, '/', $child)
            let $lookup := doc('/db/apps/sitemap/06-collection-titles.xml')/code-table/item[$db-path = path]/title/text()
            order by $child
            return
                <li>
                    <a href="{if (empty($lookup)) then ($path) else (concat($path, "/index.xhtml"))}">
                        {if (empty($lookup)) then ($child) else ($lookup)}
                    </a>
Screen Image
Note that the child collections are all sorted alphabetically. In some cases this may not be the order in which you would like to display your site navigation menus. You can add a sort-order element to each item in the XML file that holds the titles and use that field to sort the child collections.
        <path>/db/webroot/training/xquery</path>
        <title>XQuery</title>
    </item>
    <item>
        <path>/db/webroot/training/tei</path>
        <title>Text Encoding Initiative</title>
    </item>
    <item>
        <path>/db/webroot/training/exist</path>
        <title>eXist</title>
    </item>
    <item>
        <path>/db/webroot/products</path>
        <title>Products</title>
    </item>
    <item>
        <path>/db/webroot/support</path>
        <title>Support</title>
    </item>
</code-table>
Slideshow
Motivation
Despite being a 20-year-old program, albeit enhanced over the years, Microsoft PowerPoint is the ubiquitous presentation software. It provides a wide range of functionality, but most of us use it for simple text slides, perhaps with a bit of animation. However, PowerPoint does not cleanly separate the content of the presentation from its presentation (as slides, in printed form, as an index) and appearance (styles, colors), is an expensive proprietary product, and is over-weight for many tasks. Thus there is value in using simple XML tools to provide similar functionality.
Prior art
There are a number of approaches to using XML technologies to provide lightweight, non-proprietary presentation software. These typically rely on a web browser as the rendering engine (a design choice not open to Robert Gaskins [1] in 1984). The core problems are dividing the presentation into separate slides and supporting navigation through the slide sequence.
Slidy [2] by Dave Raggett - SlidyXML, XSLT, Javascript
S5 [3] by Eric Meyer
DocBook [4] by Norman Walsh - DocBook, XSLT
[5] - DocBook, CSS, Opera presentation mode
Presentation format
Other approaches use a defined vocabulary, but the choice here is to use XHTML with a little additional markup to define slide boundaries and slideshow properties. This provides the wide range of functionality needed, such as formatting, linking, images and embedded video.
<ss:slideshow xmlns="http://www.w3.org/1999/xhtml" xmlns:ss="http://www.cems.uwe.ac.uk/xmlwiki/slideshow">
    <ss:css>
        <ss:slide>slide.css</ss:slide>
        <ss:print/>
    </ss:css>
    <ss:header>DSA 2008 - Lecture 1 - Chris Wallace</ss:header>
    <ss:footer/>
    <ss:slide>
        <h1>Teaching approach</h1>
        <ul>
            <li>1 lecture a week
                <ul>
                    <li>Interaction using SMS whiteboard and Multi-choice cube</li>
                </ul>
            </li>
            <li>1 2-hour workshop every 2 weeks - write down the weeks you have been allocated</li>
            <li>2 hour Research time every 2 weeks (alternating with the workshops) - independent study with tutor support</li>
            <li>Teaching resources in UWEOnline and in the <a href="https://www.cems.uwe.ac.uk/studentwiki/index.php/UFIEKG-20-2/2008">studentWiki</a></li>
        </ul>
    </ss:slide>
    ...
Here I have used two namespaces: the default is the XHTML used in the slide body, the ss namespace is used for the slideshow elements which define slideshow properties, master slide, and the slide boundaries. This is a very minimal format which would be expanded in future.
The Script
The XML document defining the slideshow content needs to be transformed into slides for projection and into a print format. In this implementation, both versions are generated from the same script.
Namespaces
The two namespaces must be declared; an arbitrary prefix is used for the default XHTML namespace.
declare namespace ss = "http://www.cems.uwe.ac.uk/xmlwiki/slideshow";
declare namespace h = "http://www.w3.org/1999/xhtml";
Parameters
The slide parameters are the URI of the slideshow document (whether a database document or an external document), the slide number and the mode (slide or print). The parameters are passed in a semicolon-delimited query string rather than in the more usual key=value form because I was unable to get the & separator to work in JavaScript.
declare variable $params := tokenize(request:get-query-string(),";");
declare variable $uri := $params[1];
declare variable $n := xs:integer(($params[2],1)[1]);
declare variable $mode := ($params[3],"slide")[1];
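The defaulting idiom used for $n and $mode, (value, default)[1], can be tried without a running server. The sketch below replaces request:get-query-string() with a literal sample string, which is an assumption made purely for illustration.

```xquery
xquery version "1.0";

(: Sample query string standing in for request:get-query-string() :)
let $query := "/db/Wiki/SlideShow/DSA1a.xml;3"
let $params := tokenize($query, ";")
let $uri := $params[1]
(: (a, b)[1] yields a when a exists, otherwise the default b :)
let $n := xs:integer(($params[2], 1)[1])
let $mode := ($params[3], "slide")[1]
return <params uri="{$uri}" n="{$n}" mode="{$mode}"/>
```

Here the third token is missing, so $mode falls back to "slide" and the result is a params element with n="3" and mode="slide".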
Contents Slide
The <h1> element in each slide is used to generate a contents slide, numbered 0.
declare function local:show-contents() as element(div) {
    <div class="contents">
        <span class="header">{$slideshow/ss:header/node()}</span>
        <h1>Contents</h1>
        <ul>
        {for $slide at $i in $slides
         return
            <li>{$i}  <a href="slide.xql?{$uri};{$i}">{string($slide/h:h1)}</a></li>
        }
        </ul>
        <span class="footer">0/{$count}   {$slideshow/ss:footer/node()}</span>
    </div>
};
Navigation
Navigation is handled by a JavaScript function which handles keypress events and is attached to the page body. The code differs from slide to slide, so it is generated for each slide. The keypress mapping is based partly on the codes generated by a common wireless presenter, the Labtec Notebook presenter [6], which is designed for use with PowerPoint. Documentation on the device was hard to find, so the key mapping was analysed by capturing the keypresses observed by a simple JavaScript.
left and right buttons: PageUp and PageDown to step backwards and forwards
bottom key: 'b' to blank the screen
top button: toggle between F5 to fullscreen, Esc to edit mode
Other key mappings were added to allow the cursor keys to be used and to go to print mode. Note that in generating this JavaScript code, { } brackets need to be doubled.
declare function local:keypress-script() as element(script) {
    let $prev := if ($n > 0) then $n - 1 else 1
    let $next := if ($n < $count) then $n + 1 else $count
    return
        <script type="text/javascript">
        function keypress(e) {{
            var code = e.keyCode;
            if (code==34 || code==39) document.location = "slide.xql?{$uri};{$next}";        // PageDown or right : next
            if (code==33 || code==37) document.location = "slide.xql?{$uri};{$prev}";        // PageUp or left : previous
            if (code==66 || code==38) document.location = "slide.xql?{$uri};0";              // b or up : index
            if (code==36) document.location = "slide.xql?{$uri};1";                          // Home : first
            if (code==35) document.location = "slide.xql?{$uri};{$count}";                   // End : last
            if (code==80 || code==40) document.location = "slide.xql?{$uri};0;print";        // p or down : print
        }}
        </script>
};
Generate
declare option exist:serialize "method=xhtml media-type=text/html";

if ($mode="slide")
then
    <html>
        <head>
            <title>{string($slideshow/ss:title)} - Slides</title>
            <link rel="stylesheet" type="text/css" href="{$slideshow/ss:css/ss:slide}"/>
            {local:keypress-script()}
        </head>
        <body onkeydown="keypress(event)">
            {
            if ($n=0)
            then local:show-contents()
            else local:show-slide($slides[$n])
            }
        </body>
    </html>
else ...
Print format
Other functions generate a printable version of the slide show. This comprises:
Contents Page
declare function local:print-contents() as element(div) {
    <div class="contents">
        <h2>Contents</h2>
        <ul>
        {for $slide at $i in $slides
         return
            <li>{$i} . <a href="slide.xql?{$uri};{$i}">{string($slide/h:h1)}</a></li>
        }
        </ul>
    </div>
};
Slides
declare function local:print-slides() as element(div)* {
    for $slide at $i in $slides
    return $slide
};
Links
The URIs of links are not visible in the printed slides, so it is useful to add a final page listing all the links which appear in the slides.
declare function local:print-links() as element(div) {
    <div class="links">
        <h1>Links</h1>
        <ul>
        {for $slide at $i in $slides
         for $link in $slide//h:a
         order by upper-case($link)
         return
            <li>{string($link)} : <em>{string($link/@href)}</em></li>
        }
        </ul>
    </div>
};
Generate Print View
If the mode is "print" then generate the print format:
...
else
    <html>
        <head>
            <title>{string($slideshow/ss:title)} - Print</title>
            <link rel="stylesheet" type="text/css" href="{$slideshow/ss:css/ss:print}"/>
        </head>
        <body>
            {local:print-contents()}
            {local:print-slides()}
            {local:print-links()}
        </body>
    </html>
Execute
An introductory lecture [7] - incomplete CW 18/09/08
References
[1] http://www.robertgaskins.com/
[2] http://www.w3.org/Talks/Tools/Slidy/#(1)
[3] http://meyerweb.com/eric/tools/s5/
[4] http://docbook.sourceforge.net/
[5] http://www.thingbag.net/docbook/sig032503/enus/misc/operashow.html
[6] http://www.labtec.com/index.cfm/gear/details/EUR/EN,crid=29,contentid=730
[7] http://www.cems.uwe.ac.uk/xmlwiki/SlideShow/slide.xql?/db/Wiki/SlideShow/DSA1a.xml
SMS tracker
Motivation
BrightKite [1] provides a service to micro-blog your location and message to a service to geocode the address, map it, find other tweeters nearby and forward to other micro-blogs. However for UK users the service lacks the availability of an SMS service. The following scripts provide a basic SMS tracker service, allowing a user to text an address and a message to an SMS service and see that location on a generated map. This simple application does not provide the social aspects of BrightKite, being confined to creating a simple track.
Implementation
Dependencies
eXist-db Modules
xmldb - to update the track
datetime - for date formatting
util - serialize to convert XML to CDATA
Other
an SMS two-way service
Google Geocoding service
KML-based mapping such as GoogleMap or GoogleEarth
In-bound messages
Inbound messages have the structure:
geo {address} ! {message}
SMS messages are sent to the UWE SMS two-way service described here [2]. The router uses the first word to route the message to the associated service, in this case track2sms.xq. This service is invoked via HTTP, passing the prefix (prefix), the originating mobile number (from) and the text of the message following the prefix (text). The script uses the originating mobile number to find the associated track. If there is one, the message is parsed into the address and message text. The address is passed to the Google geocoding service. If the address is recognised, a new event is created and appended to the rest of the events in the track, and a confirmation is returned to the originator (via the SMS two-way service).
declare namespace geo = "http://www.cems.uwe.ac.uk/exist/geo";
declare namespace kml = "http://earth.google.com/kml/2.0";

declare function geo:geocode($address as xs:string) as element(geo:location)* {
    let $address := normalize-space($address)
    let $address := encode-for-uri($address)
    (: the ampersands must be written as entities inside an XQuery string literal :)
    let $url := concat($geo:googleUrl,$address,"&amp;output=xml&amp;key=",$geo:googleKey)
    let $response := doc($url)
    for $placemark in $response//kml:Placemark
    let $point := $placemark/kml:Point/kml:coordinates
    let $latlong := tokenize($point,",")
    return
        <geo:location latitude="{$latlong[2]}" longitude="{$latlong[1]}"/>
};

declare variable $sep := "!";
declare variable $from := request:get-parameter("from",());
declare variable $text := request:get-parameter("text",());
declare variable $track := //geo:track[geo:mobile = $from];
declare variable $now := string(adjust-dateTime-to-timezone(current-dateTime()));

declare option exist:serialize "method=text media-type=text/text indent=yes";

if (exists($track))
then
    let $address :=
        if (contains($text,$sep))
        then normalize-space(substring-before($text,$sep))
        else normalize-space($text)
    let $message := substring-after($text,$sep)
    let $location := geo:geocode($address)
    return
        if (exists($location) and count($location)=1)
        then
            let $update :=
                update insert
                    <entry xmlns="http://www.cems.uwe.ac.uk/exist/geo" date='{$now}'>
                        <address>{$address}</address>
                        {$location}
                        <message>{$message}</message>
                    </entry>
                into $track/geo:entries
            return
                concat("Reply: ",$address, " is at lat: ", $location/@latitude, " long:.", $location/@longitude)
        else
            concat("Reply: ",$track/name," address :", $address, "not geocoded or ambiguous", $text,":",$message)
else ()
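The geocode function takes the second token of the coordinates string as the latitude because KML lists coordinates as longitude, latitude and then an optional altitude. A minimal sketch of that step on a made-up coordinate string (the values are illustrative only):

```xquery
xquery version "1.0";

(: Sample KML coordinates value: longitude,latitude,altitude :)
let $coordinates := "-2.5879,51.4545,0"
let $parts := tokenize($coordinates, ",")
return <location latitude="{$parts[2]}" longitude="{$parts[1]}"/>
```

This yields latitude="51.4545" and longitude="-2.5879", matching the attribute order produced by geo:geocode().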
declare namespace kml = "http://earth.google.com/kml/2.1";

declare function geo:entry-to-kml($entry as element(geo:entry)) as element(Placemark) {
    let $location := $entry/geo:location
    let $latlong := concat($location/@latitude," ",$location/@longitude)
    let $dt := datetime:format-dateTime($entry/@date,"yy/MM/dd HH:mm")
    let $popup :=
        <div xmlns="http://www.w3.org/1999/xhtml">
            <h3>{string($entry/geo:address)}</h3>
            <p>{string($entry/geo:message)}</p>
        </div>
    return
        <Placemark>
            <name>{$dt}  {string($entry/geo:title)}</name>
            <description>{util:serialize($popup,"method=xhtml")}</description>
            <Point>
                <coordinates>{string-join(($location/@longitude,$location/@latitude),",")}</coordinates>
            </Point>
        </Placemark>
};
"method=xml indent=yes
let $dummy := response:set-header('Content-Disposition',concat('inline;filename=',$name,'.kml;'))
return
    <kml xmlns="http://earth.google.com/kml/2.1">
        <Folder>
            <name>{$name}</name>
            <title>{$track/geo:title}</title>
            {
            for $entry in $track//geo:entry
            return geo:entry-to-kml($entry)
            }
        </Folder>
    </kml>
Example Map
Google Map [3] Note that one address has been miscoded but the feedback allowed the address to be changed and resent.
To do
1. edit track to remove or correct bad geo-coding
2. add events from a browser
References
[1] http://brightkite.com/
[2] http://en.wikibooks.org/wiki/XQuery/String_Analysis#SMS_service
[3] http://maps.google.co.uk/maps?q=http://www.cems.uwe.ac.uk/xmlwiki/Tracker/map.xq?name=wiki
Southampton Pubs
Pubs of Southampton
Data on Pubs in Southampton [1] has been collected by a couple of enthusiasts. John Goodwin created an RDF representation [2] of this data and an interface [3] to the data.
Conversion to KML
The RDF is straightforward to convert to KML. The RDF uses a number of namespaces, not all of which are used in this extract.
March 2009: This script was discovered to be broken. The base RDF file has been changed to add a new namespace for addresses [4] in place of the local pub namespace. Since there is of course no notification of such changes, the user of published RDF data sets is not in a much better position than the web scraper, unless the application is written to first check that the vocabulary it assumes is still in use. However, there is no mechanism for expressing the mixture of vocabularies used in an RDF dataset. If there were, the interfaces could at least be checked by comparing it with a similar definition of the parts actually used in this application.
5 March 2008: Sadly John has been forced to take down this data set due to adverse reaction to one of the pub reviews.
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace rdfs = "http://www.w3.org/2000/01/rdf-schema#";
declare namespace pub = "http://www.johngoodwin.me.uk/pubs/";
declare namespace geo = "http://www.w3.org/2003/01/geo/wgs84_pos#";
declare namespace con = "http://www.w3.org/2000/10/swap/pim/contact#";

declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";

(: the header value should not repeat the header name :)
let $x := response:set-header('Content-disposition','inline;filename=sotonpubs.kml;')
let $pubs := doc("http://www.johngoodwin.me.uk/pubs/models/pubs.rdf")/rdf:RDF
return
<Folder>
    {for $pub in $pubs/rdf:Description
     let $description :=
        <div>
            <div style="color:gray">{concat($pub/con:address//con:street," ", $pub/con:address//con:postalCode)}</div>
            <div style="color:blue">{string($pub/pub:description)}</div>
            <hr/>
            <div style="font-size:10pt">{$pub/pub:dateSurveyed}</div>
        </div>
     return
        <Placemark>
            <name>{string($pub/rdfs:label)}</name>
            <description>{util:serialize($description,"method=xhtml")}</description>
            <Point>
                <coordinates>{concat($pub/geo:long,",",$pub/geo:lat,",0")}</coordinates>
            </Point>
        </Placemark>
    }
</Folder>
On GoogleMap [5]
References
[1] http://www.pubsinsouthampton.co.uk/
[2] http://www.johngoodwin.me.uk/pubs/models/pubs.rdf
[3] http://www.johngoodwin.me.uk/pubs/pubindex.html
[4] http://www.w3.org/2000/10/swap/pim/contact#
[5] http://maps.google.co.uk/maps?q=http:%2F%2Fwww.cems.uwe.ac.uk%2Fxmlwiki%2FRDF%2Fmappubs.xql
First attempt
import module namespace fr="http://www.cems.uwe.ac.uk/wiki/fr" at "fr.xqm";

declare variable $query := "
PREFIX : <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
    ?resource p:callingCode ?callingCode.
}
";

declare option exist:serialize "method=xhtml media-type=text/html";

<html>
    <head>
        <title>Country Calling codes</title>
    </head>
    <body>
        <h1>Country Calling codes</h1>
        <table border="1">
        {
        for $country in fr:sparql-to-tuples(fr:execute-sparql($query))
        let $name := fr:clean($country/resource)
        order by $name
        return
            <tr>
                <td><a href="{$country/resource}">{$name}</a></td>
                <td>{$country/callingCode}</td>
            </tr>
        }
        </table>
    </body>
</html>
Run [2]
In this script the fr:clean() function parses the resource URI to extract its local name. The more sound alternative is to filter the multilingual rdfs:label property:
SELECT * WHERE {
    ?resource p:callingCode ?callingCode.
    ?resource rdfs:label ?name.
    FILTER (lang(?name) = 'en')
}
Run [3]
but this query is naturally much slower.
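The fr:clean() function itself is not shown in the text. As a rough idea of what extracting the local name from a resource URI involves, here is a hypothetical stand-in; the function name and its exact behaviour are assumptions, not the module's actual code.

```xquery
xquery version "1.0";

(: Hypothetical stand-in for fr:clean(); the real function is not shown in the text :)
declare function local:local-name-from-uri($uri as xs:string) as xs:string {
  (: keep the last path segment and turn underscores into spaces :)
  replace(tokenize($uri, "/")[last()], "_", " ")
};

local:local-name-from-uri("http://dbpedia.org/resource/United_Kingdom")
```

For the sample URI this returns "United Kingdom". A real implementation would probably also need to decode percent-escaped characters in the URI.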
Discussion
This query returns a set of dbpedia resources which have a callingCode property. However, it includes resources which are not countries, and it proves quite difficult to identify which resources are countries. It might be expected that either the skos:subject or rdf:type predicates would identify countries, but this is not the case. Of course, what entities are classified as countries is a debatable issue, as is currently illustrated by Kosova and by the documentation on ISO 3166.
Perhaps countries are better identified by properties. There is a property countryCode which looks promising. The SPARQL query becomes:
PREFIX : <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
    ?resource p:callingCode ?callingCode.
    ?resource p:countryCode ?countryCode.
}
Run [4]
However, this shows that many countries have incomplete data in dbpedia, or that the coding of this property is inconsistent. This is not surprising because there are a number of types of country codes, which result in different definitions of country:
ISO 3166-1 alpha-3 [5]
ISO 3166-1 alpha-2 [6]
ISO 3166-1 numeric [7]
IOC country codes [8]
License plate numbers [9]
Top-level domain codes [10]
Wikipedia scraping
In fact, international calling codes are listed in a Wikipedia entry [11]. Thus a more direct approach would be to generate the table by scraping Wikipedia directly. However, now we err in the opposite direction: there are calling codes for telecom services as well as countries, and the format of numbers and names is inconsistent (some multiple numbers, some numbers with a leading +, some countries with appended synonyms, etc.). In the original script, the path expression found the anchor "Alphabetical_Listing" and then the table following it; the current version selects the second table with the class "wikitable sortable".
declare namespace h = "http://www.w3.org/1999/xhtml";

let $url := "http://en.wikipedia.org/wiki/International_calling_codes"
let $wikipage := doc($url)
let $section := $wikipage//h:table[@class="wikitable sortable"][2]
return $section
Jan 2010: the page layout changed, so the previous path to this table no longer applies:
let $section := $wikipage//h:a[@name="Alphabetical_Listing"]/../following-sibling::h:table[1]
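Some of the inconsistency in the scraped numbers can be reduced with a small normalising function. The sketch below is only an assumption about what such cleaning might look like; the function name and the sample values are invented for illustration and do not come from the Wikipedia table.

```xquery
xquery version "1.0";

(: Hypothetical helper: trims whitespace, then drops a leading "+" and a trailing "*" :)
declare function local:clean-code($raw as xs:string) as xs:string {
  replace(normalize-space($raw), "^\+|\*$", "")
};

for $raw in ("+44", " 358 ", "1*")
return local:clean-code($raw)
```

The three sample values come back as "44", "358" and "1". Cells holding several numbers or appended synonyms would still need separate handling.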
Export as RDF
An alternative is to export this table as RDF. Here the resource is the dbpedia resource and the property is defined in the dbpedia property namespace.
declare namespace h = "http://www.w3.org/1999/xhtml";
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace p = "http://dbpedia.org/property/";

let $url := "http://en.wikipedia.org/wiki/International_calling_codes"
let $wikipage := doc($url)
let $section := $wikipage//h:table[@class="wikitable sortable"][2]
return
<rdf:RDF xmlns:p = "http://dbpedia.org/property/">
    {for $row in $section/h:tr[h:td]
     let $country := string($row/h:td[1])
     let $code := string($row/h:td[2]/h:a[1])
     let $code := replace($code,"\*","")
     let $resource := concat("http://dbpedia.org/resource/", replace($country," ","_"))
     return
        <rdf:Description rdf:about="{$resource}">
            <p:internationalcallingCode>{$code}</p:internationalcallingCode>
        </rdf:Description>
    }
</rdf:RDF>
Similarly, the structure of this table changed, so this code needed to be updated.
RDF [13]
References
[1] http://blogs.sun.com/bblfish/entry/sparqling_calling_codes
[2] http://www.cems.uwe.ac.uk/xmlwiki/RDF/countryCodes.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/RDF/countryCodes2.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/RDF/countryCodes1.xq
[5] http://en.wikipedia.org/wiki/ISO_3166-1_alpha-3
[6] http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
[7] http://en.wikipedia.org/wiki/ISO_3166-1_numeric
[8] http://en.wikipedia.org/wiki/List_of_IOC_country_codes
[9] http://en.wikipedia.org/wiki/List_of_international_license_plate_codes
[10] http://en.wikipedia.org/wiki/Country_code_top-level_domain
[11] http://en.wikipedia.org/wiki/International_calling_code
[12] http://www.cems.uwe.ac.uk/xmlwiki/Scrape/wikicallingcodes.xq
[13] http://www.cems.uwe.ac.uk/xmlwiki/RDF/wikicallingcodesrdf.xq
Special Characters
Motivation
You want to control where you put newlines and quote characters in your output.
Method
We will create XQuery variables that refer to the decimal-encoded character values using the &#NN; notation, where NN is the decimal number of the character in the character set. We can then add these variables anywhere in the output stream.
Example Program
In this example we will create a variable $nl and have it refer to the newline character. We will then put this in the middle of a string.
xquery version "1.0";
let $nl := "&#10;"
let $quote := "&#34;"
let $string := concat("Hello", $nl, "World")
return $string
Returns:
Hello
World
The following shows how both quote and newline special characters can be created.
let $nl := "&#10;"
let $quote := "&#34;"
let $string := concat($quote, "Hello", $nl, "World", $quote)
return $string
Returns:
"Hello
World"
Note that the string length of each of these variables, string-length($nl) and string-length($quote), is only one character.
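As an alternative to character references, the same one-character strings can be built from their decimal code points with the standard codepoints-to-string() function. This sketch is an addition for comparison, not part of the original example.

```xquery
xquery version "1.0";

let $nl := codepoints-to-string(10)      (: newline :)
let $quote := codepoints-to-string(34)   (: double quote :)
return (
  string-length($nl),
  string-length($quote),
  string-to-codepoints(concat($quote, "Hi", $nl))
)
```

The query returns the sequence 1, 1, 34, 72, 105, 10, confirming that each variable holds a single character.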
Splitting Files
Motivation
You have a single large XML document with many consistent records in it. You want to split it into many smaller documents so that each can be edited by a separate user. There are many good reasons to split large files up. Some have to do with how much data you want to load into an editor at a time or how you want to publish individual files to a remote site. eXist and many other systems do versioning and keep date/time stamps for each file. Using smaller files these functions may be easier to do.
Method
We will create an XQuery that iterates through all the records in the document. For each record we will use the following XQuery function to store a document in a collection:
xmldb:store($collection, $filename, $data)
Where:
$collection is a string that holds the path to the collection in which we will store the data for each record, for example '/db/test/data'
$filename is the name of the file. The name can either be derived from the data or generated by a sequence counter in the split query, for example 'Hello.xml' or '1.xml'.
$data is the data we will store in the file
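The two ways of choosing $filename can be compared on a small in-memory document before anything is stored. The sample rows below are invented, and the query only reports the candidate names; it does not call xmldb:store().

```xquery
xquery version "1.0";

(: Invented sample input with the same root/row/Term shape as the query below :)
let $input :=
  <root>
    <row><Term>Hello</Term></row>
    <row><Term>World</Term></row>
  </root>
for $row at $i in $input/row
return
  <file by-name="{concat($row/Term, '.xml')}" by-counter="{concat($i, '.xml')}"/>
```

The first row gives by-name="Hello.xml" and by-counter="1.xml". Names derived from data may need cleaning if the values can contain spaces or slashes, which is one reason to prefer a counter or an ID.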
Sample XQuery
xquery version "1.0";

let $input-document := '/db/test/input.xml'
let $collection := '/db/test/terms'
(: the login used must have write access to the collection :)
let $output-collection := xmldb:login($collection, 'my-login', 'my-password')
return
<SplitResults>{
    for $term-data in doc($input-document)/root/row
    (: For brevity we will create a file name with the term name. Change this to be an ID function if you want :)
    let $term-name := $term-data/Term/text()
    let $documentname := concat($term-name, '.xml')
    let $store-return := xmldb:store($collection, $documentname, $term-data)
    return
        <store-result>
            <store>{$term-name}</store>
            <documentname>{$documentname}</documentname>
        </store-result>
}</SplitResults>
    <person-name>John Doe</person-name>
    ...
</item>
It is a best practice to make sure that items do not already have an ID element.
for $item at $count in $items[not(id)]
return update insert <id>{$count}</id> preceding $item/person-name
This prevents duplicate ids from being added if the script is run twice. You can also modify this to start the count one higher than the largest id in a collection.
(: get the largest ID in the collection :)
let $largest-id := max( collection($my-collection)/*/id/text() )
let $offset := $largest-id + 1
for $item at $count in $items[not(id)]
return update insert <id>{$count + $offset}</id> preceding $item/person-name
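The numbering logic can be checked on in-memory data without running any updates. In this sketch the items and names are invented, and a fallback of 0 is added as an assumption to cover the case where no item has an id yet, since max() of an empty sequence is the empty sequence.

```xquery
xquery version "1.0";

(: Invented sample data: one item already has an id, two do not :)
let $items :=
  <items>
    <item><id>7</id><person-name>Ann Lee</person-name></item>
    <item><person-name>John Doe</person-name></item>
    <item><person-name>Sue Roe</person-name></item>
  </items>
(: fall back to 0 when there are no ids at all :)
let $largest-id := (max(for $id in $items/item/id return xs:integer($id)), 0)[1]
for $item at $count in $items/item[not(id)]
return <assign name="{$item/person-name}" id="{$largest-id + $count}"/>
```

John Doe is assigned 8 and Sue Roe 9, so the new ids continue directly after the largest existing one.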
References
The split pattern is documented on the Enterprise Integration Patterns [1] web site. Note that the pattern is called "Splitter" despite the fact that the name in the URL is "Sequencer". Also note that the size of the file you load into the client has a large impact on how concurrent edits are performed and on what data needs to be locked for editing. See XRX Locking Grain Design [2] for more information.
References
[1] http://www.eaipatterns.com/Sequencer.html
[2] http://www.oreillynet.com/xml/blog/2008/05/xrx_locking_grain_design.html
Subversion
Motivation
You want to be able to access a Subversion (SVN) repository, including checking out the repository's files directly into the eXist database and committing changed files back to the repository, using XQuery.
Method
A subversion XQuery module has been added to the bleeding edge development version of eXist 1.5. You can use it to query remote subversion servers, and even to check out a remote repository to store the repository's contents in the database. (If you do check out a repository, note that the subversion repository's files, including its many ".svn" files, will be stored directly in your database.) As of May 2011, the subversion module can perform most, but not all, common subversion functions.
Installation Steps
Building the subversion extension
As with all eXist extensions that are not enabled by default, you need to instruct eXist's build process to include the extension. You should first copy the file $EXIST_HOME/extensions/build.properties to a new file called $EXIST_HOME/extensions/local.build.properties. This local file will be used by the build process, but it will be ignored by your subversion client so that you don't accidentally commit it to the eXist repository. You should now locate the following lines:
#SVN extension
include.feature.svn = false
Change false to true:
include.feature.svn = true
Save the local.build.properties file. With these changes you must now rebuild (i.e. recompile) eXist so that the subversion extension is included in eXist's jar files.
Save conf.xml. You can now start eXist, and the subversion module will be ready for you to use. You can build the subversion function documentation at http://localhost:8080/exist/admin/admin.xql?panel=fundocs and then view it at http://localhost:8080/exist/functions/subversion. You should now be able to test the subversion XQuery functions. The documentation should look very similar to the function listings on the eXist demo site: http://demo.exist-db.org/exist/xquery/functions.xql [1]
Current Status
Subversion repositories can be accessed over HTTP and HTTPS, both anonymously and with username/password authentication. The following functions have been tested to work:
subversion:checkout($repository-uri as xs:string, $database-path as xs:string) xs:long
subversion:checkout($repository-uri as xs:string, $database-path as xs:string, $login as xs:string, $password as xs:string) xs:long
subversion:get-latest-revision-number($repository-uri as xs:string, $login as xs:string, $password as xs:string) xs:long
subversion:info($database-path as xs:string) element()
subversion:list($repository-uri as xs:string) element()
subversion:log($repository-uri as xs:string, $login as xs:string, $password as xs:string, $start-revision as xs:integer?, $end-revision as xs:integer?) element()
subversion:status($database-path as xs:string) element()
subversion:update($database-path as xs:string) xs:long
subversion:update($database-path as xs:string, $login as xs:string, $password as xs:string) xs:long
subversion:add($database-path as xs:string) empty()
The following works in some cases but has buffer errors for some sizes of commits:
subversion:commit($database-path as xs:string, $message as xs:string?, $login as xs:string, $password as xs:string) xs:long
The following functions are not yet confirmed to work and are still being tested:
subversion:clean-up($database-path as xs:string) empty()
subversion:lock($database-path as xs:string, $message as xs:string?) empty()
subversion:revert($database-path as xs:string) empty()
subversion:unlock($database-path as xs:string) empty()
Examples
Querying a Remote Repository
subversion:get-latest-revision-number()
The subversion:get-latest-revision-number() function queries the remote SVN repository, returning the latest revision number. For example:
xquery version "1.0";

import module namespace subversion = "http://exist-db.org/xquery/versioning/svn";

let $repository-uri := xs:anyURI('https://exist.svn.sourceforge.net/svnroot/exist/trunk/eXist/webapp/eXide')
let $username := ''
let $password := ''
return
    subversion:get-latest-revision-number($repository-uri, $username, $password)
This query returns the following result:
14458
subversion:info()
Once you have checked out a resource from subversion, you can query that resource locally and find out more about it.
subversion:info('/db/apps/faqs/data')
This will return:
<info uri="/db/cms/apps/faqs/data">
    <info local-path="/db/apps/faqs/data"
        URL="https://www.example.com/repo/trunk/db/apps/faq/data"
        Repository-UUID="db6794ef-7b42-44a9-8912-f63d0efeae0f"
        Revision="10"
        Node-Kind="dir"
        Schedule="normal"
        Last-Changed-Author="dmccreary"
        Last-Changed-Revision="8"
        Last-Changed-Date="Thu Sep 01 15:03:04 CDT 2011"/>
subversion:list()
The subversion:list() function lists the contents of a remote repository, returning the results as an XML node:
xquery version "1.0";

let $repository-uri := xs:anyURI('https://exist.svn.sourceforge.net/svnroot/exist/trunk/eXist/webapp/scripts/')
return
    subversion:list($repository-uri)
This script will return the following result:
<entries>
    <entry type="directory">edit_area</entry>
    <entry type="directory">jquery</entry>
    <entry type="directory">openid-selector</entry>
    <entry type="directory">syntax</entry>
    <entry type="directory">yui</entry>
    <entry type="file">fundocs.js</entry>
    <entry type="file">main.js</entry>
    <entry type="file">prototype.js</entry>
</entries>
subversion:log()
The subversion:log() function queries the remote SVN repository, returning the log of changes as an XML node. For example, this query will show the log of changes between two arbitrary revision numbers (note that substituting the empty sequence () for $start-revision and/or $end-revision will return a more open-ended log of revisions):
xquery version "1.0";

let $repository-uri := xs:anyURI('https://exist.svn.sourceforge.net/svnroot/exist/trunk/eXist/webapp/eXide')
let $username := ''
let $password := ''
let $start-revision := 14300
let $end-revision := 14350
return
    subversion:log($repository-uri, $username, $password, $start-revision, $end-revision)
The results of this query are as follows (note that the @revtype values are 'A' for item added, 'D' for item deleted, 'M' for item modified, and 'R' for item replaced):
<log uri="https://exist.svn.sourceforge.net/svnroot/exist/trunk/eXist/webapp/eXide" start="14300">
    <entry rev="14331" author="wolfgang_m" date="2011-04-29T07:00:54.297-04:00">
        <message>[feature] eXide - a web-based XQuery IDE for eXist. Features: fast syntax highlighting, ability to edit huge XQuery files, code completion for functions and variables, code templates, powerful navigation, on-the-fly compilation, generation of app skeletons, integration with app repository... This is the initial checkin of eXide.</message>
        <paths>
            <path revtype="A">/trunk/eXist/webapp/eXide/templates</path>
            <path revtype="A">/trunk/eXist/webapp/eXide/collections.xql</path>
            <path revtype="A">/trunk/eXist/webapp/eXide/session.xql</path>
            ....etc....
            <path revtype="A">/trunk/eXist/webapp/eXide/scripts/ace/cockpit.js</path>
            <path revtype="A">/trunk/eXist/webapp/eXide/index.html</path>
        </paths>
    </entry>
    <entry rev="14346" author="wolfgang_m" date="2011-04-30T08:35:23.395-04:00">
        <message>[website] eXide: fixed completion popup window (support mouse, extra "close" link if popup looses focus); improved auto-indent in editor after { and (.</message>
        <paths>
            <path revtype="M">/trunk/eXist/webapp/eXide/src/mode-xquery.js</path>
            <path revtype="M">/trunk/eXist/webapp/eXide/src/util.js</path>
            <path revtype="M">/trunk/eXist/webapp/eXide/eXide.css</path>
        </paths>
    </entry>
</log>
Getting the Last 10 Commit Messages
The log function can be combined with the get-latest-revision-number function to get the last 10 commit messages in the system. The variables $repo-url, $svn-account and $svn-password are assumed to be bound earlier in the query.
let $latest-version := subversion:get-latest-revision-number($repo-url, $svn-account, $svn-password)
(: if there are more than 10 revisions, start 9 back from the latest so that 10 revisions are covered; otherwise start at revision 1 :)
let $start := if ($latest-version gt 10) then $latest-version - 9 else 1
return
    <last-10-commit-messages>
        {subversion:log($repo-url, $svn-account, $svn-password, $start, $latest-version)//*:message}
    </last-10-commit-messages>
subversion:checkout()
let $repository-uri := xs:anyURI('https://exist.svn.sourceforge.net/svnroot/exist/trunk/eXist/webapp/functions/')
let $destination-path := '/db/svn'
let $version := subversion:checkout($repository-uri, $destination-path)
return
    concat('Revision ', $version, ' successfully checked out to collection ', $destination-path)
This returns:
Revision 14457 successfully checked out to collection /db/svn
The /db/svn collection will now contain the following files:
.svn (collection)
controller.xql
filter.xql
functions.xql
subversion:add()
After you have run a checkout you are ready to do a subversion:commit() or a subversion:add(). Both of these functions take a single argument: the database collection path you want to send to your Subversion server.
subversion:update()
Assuming we have already checked out a repository to /db/svn, we can update the working copy to the latest revision using the subversion:update() function:
xquery version "1.0";
let $working-copy := '/db/svn'
let $update := subversion:update($working-copy)
return
    concat('Successfully updated to revision ', $update)
This script will return the following result:
Successfully updated to revision 14457
You can also get updates from a secure site by using subversion:update($working-copy, $user, $password).
subversion:status()
The subversion:status() function returns the status of files in the local working copy. For example, assuming you have checked out the repository https://exist.svn.sourceforge.net/svnroot/exist/trunk/eXist/webapp/functions/ to the /db/svn collection, you can get the status of its files with the following query:
xquery version "1.0";
let $destination-path := '/db/svn'
return
    subversion:status($destination-path)
References
[1] http://demo.exist-db.org/exist/xquery/functions.xql
Sudoku
Sudoku solver in XQuery
A Puzzle
A sudoku puzzle can be expressed in matrix form. Here is part of one from a Times book of sudokus.
<?xml version="1.0" encoding="UTF-8"?>
<sudoku name="Times 1 p1">
    <matrix>
        <row> <col/> <col>6</col> <col>1</col> <col/> <col>3</col> <col/> <col/> <col>2</col> <col/> </row>
        <row> <col/> <col>5</col> <col/> <col/> <col/> <col>8</col> <col>1</col> <col/> <col>7</col> </row>
        <row> <col/>
seconds-from-duration($t)) * 1000 )
};

let $url := request:get-parameter('url',())
let $sudoku := doc($url)/sudoku
let $p := $sudoku/matrix
let $pc := su:matrix-to-cells($p)
let $start := util:system-time()
let $ps := su:solve($pc)
let $finish := util:system-time()
let $elapsedms := local:duration-as-ms($finish - $start)
let $s := su:cells-to-matrix($ps)
return
<div>
    <h1>Solving Sudoku problem {string($sudoku/@name)}</h1>
    <table border='1'>
        <tr>
            <td>{su:matrix-to-table($p)}</td>
            <td>{su:matrix-to-table($s)}</td>
        </tr>
    </table>
    <p>Elapsed time in milliseconds : {$elapsedms}</p>
</div>
Functions
This module defines the functions needed to support a brute-force, depth-first search of the solution tree. Two representations of a sudoku puzzle are used here:
* nested columns within rows - element(matrix) - the input format
* list of cells with explicit row and column numbers - a sequence of element(cell)
The algorithm starts with the cell list representation. The number of possible values for every empty square is calculated. If there is a cell with only one possible value, that cell is added to the list of cells and the algorithm continues. If there is more than one possible value for a cell, the algorithm iterates over the possible values, positing that each in turn is the correct value. If there is no possible value, that partial solution is infeasible and that solution path is abandoned, returning the empty sequence, and the next possible cell value is tried.
declare function su:matrix-to-table($s as element(matrix)) as element(table) {
  <table class="sudoku">
  {
    for $r in $s/row
    return
      <tr>
      {
        for $c in $r/col
        return <td>{string($c)}</td>
      }
      </tr>
  }
  </table>
};

declare function su:matrix-to-cells($s as element(matrix)) as element(cell)* {
  for $i in (1 to 9)
  for $j in (1 to 9)
  let $c := $s/row[$i]/col[$j]
  return
    if ($c/text())
    then <cell row='{$i}' col='{$j}'>{string($c)}</cell>
    else ()
};

declare function su:cells-to-matrix($s as element(cell)*) as element(matrix) {
  <matrix>
  {
    for $i in (1 to 9)
    return
      <row>
      {
        for $j in (1 to 9)
        let $c := $s[@row = $i][@col = $j]
        return <col>{string($c)}</col>
      }
      </row>
  }
  </matrix>
};

declare function su:block($s as element(cell)*, $i as xs:integer, $j as xs:integer) as element(cell)* {
  (: return the filled cells in the block of 9 cells containing $i, $j :)
  let $tci := (($i - 1) idiv 3 * 3) + 1
  let $tcj := (($j - 1) idiv 3 * 3) + 1
  return $s[@row = ($tci to $tci + 2)][@col = ($tcj to $tcj + 2)]
};

declare function su:row($s as element(cell)*, $i as xs:integer) as element(cell)* {
  (: return the filled cells in row $i :)
  $s[@row = $i]
};

declare function su:col($s as element(cell)*, $j as xs:integer) as element(cell)* {
  (: return the filled cells in column $j :)
  $s[@col = $j]
};

declare function su:values($s as element(cell)*, $i as xs:integer, $j as xs:integer) as xs:integer* {
  (: return the set (sequence) of values in a cell's row, column and block :)
  distinct-values((su:row($s,$i), su:col($s,$j), su:block($s,$i,$j)))
};

declare function su:missing-values($s as element(cell)*, $i as xs:integer, $j as xs:integer) as xs:integer* {
  (: return the numbers missing from 1 to 9, i.e. the possible values for cell $i, $j :)
  let $vals := su:values($s,$i,$j)
  return (1 to 9)[not(. = $vals)]
};

declare function su:missing-cells($s as element(cell)*) as element(cell)* {
  (: one cell per empty square, holding its possible values and their count in @n :)
  for $i in (1 to 9)
  for $j in (1 to 9)
  where empty($s[@row = $i][@col = $j])
  return
    let $m := su:missing-values($s,$i,$j)
    return <cell row='{$i}' col='{$j}' n='{count($m)}'>{$m}</cell>
};

declare function su:best-cell($s as element(cell)*) as element(cell)* {
  (: return (one of) the cells with the minimum number of possible values :)
  let $empty := su:missing-cells($s)
  let $min := min($empty/@n)
  return ($empty[@n = $min])[1]
};

declare function su:search-for-solution($s as element(cell)*, $cell as element(cell), $posvalues as xs:string*) {
  (: recursive search of a set of possible values for a cell :)
  if (empty($posvalues))
  then ()
  else
    let $pos := $posvalues[1]  (: choose the first :)
    let $posit := <cell row='{$cell/@row}' col='{$cell/@col}'>{$pos}</cell>
    let $sol := su:solve(($s,$posit))  (: try with this posited value for the cell :)
    return
      if ($sol)  (: a solution :)
      then $sol
      else  (: continue with the rest of the possible values :)
        su:search-for-solution($s, $cell, subsequence($posvalues,2))
};

declare function su:solve($s as element(cell)*) as element(cell)* {
  (: solve a sudoku problem - $s is a sequence of cells with values :)
  let $cell := su:best-cell($s)
  return
    if (empty($cell))
    then $s  (: solved :)
    else if ($cell/@n = 0)  (: infeasible :)
    then ()
    else if ($cell/@n = 1)  (: forced move :)
    then su:solve(($s,$cell))
    else  (: multiple possible, so do depth-first search :)
      su:search-for-solution($s, $cell, tokenize($cell, ' '))
};
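The block arithmetic used in su:block can be checked in isolation. The following sketch is not part of the original module; it simply evaluates the same idiv expression for each row number, showing that rows 1 to 3 map to block origin 1, rows 4 to 6 to origin 4 and rows 7 to 9 to origin 7. The element and attribute names in the output are made up for illustration.

```xquery
xquery version "1.0";
(: illustrative only: the first row index of the 3x3 block containing row $i,
   computed with the same expression as in su:block :)
for $i in 1 to 9
return <row n="{$i}" block-start="{(($i - 1) idiv 3 * 3) + 1}"/>
```

The same expression applied to a column number gives the first column of the block, so the pair identifies the top-left cell of the block.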
Execution
With a few problems from the Times book of Sudoku problems:
* solve Puzzle 1 [1]
* solve Puzzle 2 [2]
* solve Puzzle 100 [3] - the last
Discussion
This code requires eXist 1.3 or above to run.
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Sudoku/su6.xql?url=http://www.cems.uwe.ac.uk/xmlwiki/Sudoku/tp1.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/Sudoku/su6.xql?url=http://www.cems.uwe.ac.uk/xmlwiki/Sudoku/tp2.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/Sudoku/su6.xql?url=http://www.cems.uwe.ac.uk/xmlwiki/Sudoku/tp100.xml
Method
Many databases store creation dates and last-modified dates along with resources. These dates can be used to see if a local collection is out of sync with a remote collection. An XQuery script can be written that lists only the new files, or the files that are newer than the corresponding files in your local collection.
For the eXist database, here are the two functions that are used to access the timestamps:
xmldb:last-modified($collection, $resource)
xmldb:created($collection, $resource)
Where:
$collection is the path to the collection (xs:string)
$resource is the name of the resource (xs:string)
For example:
let $my-file-last-modified := xmldb:last-modified('/db/test', 'myfile.xml')
will return the date and time that the file myfile.xml in the collection /db/test was last modified. The format of the timestamp is the XML Schema dateTime format [1]:
"2009-06-04T07:50:04.828-05:00"
For example, this indicates that the time is 7:50 am on June the 4th, 2009 in US Central Daylight Time, which is 5 hours behind Coordinated Universal Time (UTC).
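Because these timestamps are xs:dateTime values, two of them can be compared directly with the value comparison operators, and the time zone offsets are taken into account. A minimal sketch, using made-up timestamps in place of real calls to xmldb:last-modified() on a local and a remote server:

```xquery
xquery version "1.0";
(: hypothetical timestamps; in practice each would come from
   xmldb:last-modified() on the local or the remote database :)
let $local  := xs:dateTime("2009-06-04T07:50:04.828-05:00")
let $remote := xs:dateTime("2009-06-04T13:05:00.000Z")
return
  if ($remote gt $local)
  then "remote copy is newer"
  else "local copy is up to date"
```

Here the local timestamp corresponds to 12:50:04.828 UTC, so the remote timestamp is later and the query reports that the remote copy is newer.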
Synchronizing Remote Collections
      if (exists(xmldb:get-child-collections($collection)))
      then (
        for $child in xmldb:get-child-collections($collection)
        order by $child
        return
          (: note the recursion here :)
          local:collection-last-modified(concat($collection, '/', $child))
      )
      else ()
    }
  </collection>
};
Note that two attributes are added to each resource. One is the resource id, which must be unique within each collection, and the other is the date the resource was last modified.
Sample Driver
You can call this function by simply passing the collection root you wish to start at.
xquery version "1.0";
let $collection := '/db/test'
return
  <last-modified-report>
    {local:collection-last-modified($collection)}
  </last-modified-report>
This returns the following file:
<last-modified-report>
  <collection cid="/db/test">
    <resource id="get-remote-collection.xq" last-modified="2009-04-29T08:16:06.104-05:00"/>
    <collection cid="/db/test/views">
      <resource id="get-site-mod-dates.xq" last-modified="2009-04-30T09:01:58.599-05:00"/>
      <resource id="site-last-modified.xq" last-modified="2009-04-30T09:07:10.016-05:00"/>
    </collection>
  </collection>
</last-modified-report>
References
[1] http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#dateTime
TEI Concordance
Motivation
You want to build a multilingual concordance from parallel texts already in the TEI format (see http://www.tei-c.org/).
Architecture
There are three steps in this example:
1. preprocessing the texts to enable easier indexing, which is done in XSLT 2.0
2. querying the text to return a tei:entry (see http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DI.html), which is done in XQuery
3. processing the tei:entry into HTML, which is done in XSLT 1.0 in the browser
In this particular example the languages in use are English and te reo Māori. It assumes that structural tags have 'n' attributes with URLs pointing to the original source of the data.
<!-- This is a simple stylesheet that inserts word tags around words (and implicitly defines what those words are) -->
    <xsl:element name="w" namespace="http://www.tei-c.org/ns/1.0">
      <xsl:attribute name="xml:lang"><xsl:value-of select="$lang"/></xsl:attribute>
      <xsl:attribute name="lemma"><xsl:value-of select="$normalised"/></xsl:attribute>
      <xsl:value-of select="."/>
    </xsl:element>

    <xsl:if test="string-length($string) > 0">
      <xsl:if test="not(compare(substring($string,1,1),substring($string,2,1))=0)">
        <xsl:value-of select="substring($string,1,1)"/>
      </xsl:if>
      <xsl:call-template name="normal">
        <xsl:with-param name="string" select="substring($string,2)"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
let $target := 'xml-stylesheet',
    $content := 'href="teiresults2htmlresults.xsl" type="text/xsl" '
return
document {
  processing-instruction {$target} {$content},
  <TEI>
    <teiHeader>
      <!-- substantial header information needs to go here to be well formed TEI -->
    </teiHeader>
    <text>
      <body>
        <div>
        {
          let $collection := '/db/kupu/korero',
              $q := request:get-parameter('kupu', 'mohio'),
              $lang := request:get-parameter('reo', 'mi'),
              $first := request:get-parameter('kotahi', 1) cast as xs:decimal,
              $last := 25 + $first
          return
            <entry xml:lang="{$lang}" n="{$last}">
              <form>
                <orth>{$q}</orth>
              </form>{
              for $word at $count in subsequence(collection($collection)//w[@lemma=$q][@xml:lang=$lang], $first, $last)
              let $this := $word/ancestor::*[@n][1]
              let $thisid := $this/@xml:id
              let $url := $this/@n
              let $lang := $word/@xml:lang
              let $that :=
                if ( $this/@corresp )
                then ( $this/../../*/*[concat('#',@xml:id)=$this/@corresp] )
                else ( "no corresp" )
              return
                <cit n="{$url}" corresp="#{$word/@xml:id}">
                  {$this}
                  {$that}
                </cit>
            }</entry>
        }
        </div>
      </body>
    </text>
  </TEI>
}
Transformation to HTML
The TEI is transformed into HTML in the browser following the processing instruction:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:html="http://www.w3.org/1999/xhtml"
xmlns:tei="http://www.tei-c.org/ns/1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<html:head>
<html:title><xsl:value-of select="$title"/></html:title>
</html:head>
<xsl:apply-templates select="/tei:TEI/tei:text/tei:body/tei:div/tei:entry"/>
</html:body>
</html:html>
</xsl:template>
<xsl:template match="tei:entry">
<html:div>
<xsl:apply-templates select="tei:cit"/>
</html:div>
</xsl:template>
<html:div>
<xsl:apply-templates select="node()"/>
</html:div>
<html:hr/>
</xsl:template>
<xsl:template match="tei:p">
<html:div>
<xsl:apply-templates select="node()"/>
style="font-style: italic"></html:a>
</html:div>
</xsl:template>
<xsl:template match="tei:w">
<xsl:choose>
<xsl:when test="concat('#',@xml:id)=../../@corresp">
<xsl:apply-templates select="node()"/>
</html:a></html:span>
</xsl:when>
<xsl:otherwise>
<html:a href="{$url}">
<xsl:apply-templates select="node()"/>
</html:a>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Approach
TEI [1] documents may include date elements in any section of the document: in the metadata, in the document publication details, in the front and back matter, as well as in the body of the text. Let's assume that we want a timeline showing the dates in the text body. We will use the Simile Timeline [2] JavaScript API to create a browsable timeline in an HTML page.
TEI Document Timeline
Note that there are two path expressions in the above query. The first expression, $date/@when, extracts the when attribute of the date element. The second path expression, $date/text(), extracts the body text of the date element, i.e. the text between the begin and end date tags:
<date when="1642-08-13">13 August 1642</date>
Discussion
TEI dates are generally XML dates, which are recognised by the Simile timeline API. However, TEI supports the encoding of relative dates such as <date when="--01-01">New Years Day</date>, so dates really need filtering, for example with a suitable regular expression. Another option is to check the date format with the XQuery "castable as" expression.
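The castable check mentioned above might look like the following sketch. The three date elements are invented sample data and, to keep the example self-contained, are not in the TEI namespace; only the first has a when attribute that is a complete xs:date.

```xquery
xquery version "1.0";
(: sample date elements, invented for illustration :)
let $dates :=
  (<date when="1642-08-13">13 August 1642</date>,
   <date when="--01-01">New Years Day</date>,
   <date when="1769">1769</date>)
for $date in $dates
(: keep only dates whose @when can be cast to a full xs:date :)
where $date/@when castable as xs:date
return $date
```

This returns only the 1642 date. Whether year-only values such as 1769 should also be accepted is a design decision; one possibility is to add a second test such as `$date/@when castable as xs:gYear`.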
We need to access the mixture of elements and text nodes on either side of the target date. For example, preceding this node are a text node ("Cook left.."), a date node and another text node ("He returned .."). Following the target date is the text node ("and remained ..."). We can select these nodes using the preceding-sibling and following-sibling axes:
let $nodesbefore := $date/preceding-sibling::node()
let $nodesafter := $date/following-sibling::node()
A crude approach to constructing a context string is to join the node strings and extract a suitable substring. The text after:
let $after := string-join($nodesafter, ' ')
let $afterString := substring($after, 1, 100)
and the text before:
let $before := string-join($nodesbefore, ' ')
let $beforeString := substring($before, string-length($before) - 101, 100)
We can then create an XML fragment with the target date in bold:
let $context :=
  <div>
    {concat('...', $beforeString, ' ')}
    <b>{$date/text()}</b>
    {concat($afterString, ' ...')}
  </div>
Finally the element needs to be serialized and added to the event:
return
  <event start='{$when}' title='{$when}'>
    {util:serialize($context, ("method=xhtml", "media-type=text/html"))}
  </event>
Execute [6]
Improved Context
The context is extracted from the parent node without regard to word or sentence boundaries. Splitting on word boundaries would be better.
let $nodesafter := $date/following-sibling::node()
(: join the nodes, then split on space :)
let $after := tokenize(string-join($nodesafter, ' '), ' ')
(: get the first $scope words :)
let $afterwords := subsequence($after, 1, $scope)
(: join the subsequence of words, and suffix with ellipsis if the paragraph text has been truncated :)
let $afterString := concat(' ', string-join($afterwords, ' '), if (count($after) > $scope) then '... ' else '')
Similarly, the text before the target date:
let $nodesbefore := $date/preceding-sibling::node()
let $before := tokenize(string-join($nodesbefore, ' '), ' ')
let $beforewords := subsequence($before, count($before) - $scope + 1, $scope)
let $beforeString := concat(if (count($before) > $scope) then '... ' else '', string-join($beforewords, ' '), ' ')
Splitting on sentence boundaries would be even better. We can use the pattern '\. ' as the marker. This may not be entirely accurate, but false positives will merely shorten the context. The ellipsis is no longer needed. $scope is now the number of sentences on either side.
let $nodesafter := $date/following-sibling::node()
(: join the nodes, then split on the pattern fullstop space :)
let $after := tokenize(string-join($nodesafter, ' '), '\. ')
(: get the first $scope sentences :)
let $afterSentences := subsequence($after, 1, $scope)
(: join the subsequence of sentences :)
let $afterString := concat(' ', string-join($afterSentences, '. '))
Similarly for the beforeString:
let $nodesbefore := $date/preceding-sibling::node()
let $before := tokenize(string-join($nodesbefore, ' '), '\. ')
let $beforeSentences := subsequence($before, count($before) - $scope + 1, $scope)
let $beforeString := concat(string-join($beforeSentences, '. '), '. ')
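The word-boundary idea can be tried on its own, away from a TEI document. The sketch below wraps the same tokenize and subsequence steps in a helper function; the name local:first-words and the sample sentence are hypothetical and do not appear in the original script, and normalize-space is added so that repeated spaces do not produce empty words.

```xquery
xquery version "1.0";
(: hypothetical helper, not part of the original script:
   return the first $scope space-separated words of $text,
   followed by an ellipsis when the text was truncated :)
declare function local:first-words($text as xs:string, $scope as xs:integer) as xs:string {
  let $words := tokenize(normalize-space($text), ' ')
  return
    concat(
      string-join(subsequence($words, 1, $scope), ' '),
      if (count($words) > $scope) then ' ...' else '')
};

local:first-words("and remained there until the ship was repaired", 3)
```

With a scope of 3 the call returns "and remained there ...". A matching helper for the text before the date would take the last $scope words instead, as in the $beforewords expression above.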
Execute [7]
Discussion
In addition, each event could link into the full text of the document. (to do)
Simile API
The definition of the timeline layout uses the SIMILE timeline JavaScript API. To define the basic bands:
function onLoad(file,start) {
  var theme = Timeline.ClassicTheme.create();
  theme.event.label.width = 400; // px
  theme.event.bubble.width = 300;
  theme.event.bubble.height = 300;
  var eventSource = new Timeline.DefaultEventSource();
  var bandInfo = [
    Timeline.createBandInfo({
      eventSource:    eventSource,
      theme:          theme,
      trackGap:       0.2,
      trackHeight:    1,
      date:           start,
      width:          "90%",
      intervalUnit:   Timeline.DateTime.YEAR,
      intervalPixels: 45
    }),
    Timeline.createBandInfo({
      date:           start,
      width:          "10%",
      intervalUnit:   Timeline.DateTime.DECADE,
      intervalPixels: 50
    })
  ];
  bandInfo[1].syncWith = 0;
  bandInfo[1].highlight = true;
  Timeline.create(document.getElementById("my-timeline"), bandInfo);
  Timeline.loadXML("dates.xq?file="+file, function(xml, url) { eventSource.loadXML(xml, url); });
}
Note that the bands are set for YEAR and DECADE, which are appropriate for historical texts. The function has two parameters: the source file and the start year. The events are generated by a call to the transformation script in the previous section.
Timeline.loadXML("dates.xq?file="+file, function(xml, url) {
  eventSource.loadXML(xml, url);
});
Full script
xquery version "1.0";
let $file := request:get-parameter('file','')
let $data-collection := '/db/Wiki/TEI/docs'
let $tei-document := concat($data-collection, '/', $file)
let $doc := doc($tei-document)
(: get the title and author from the titleStmt element :)
let $header := $doc//tei:titleStmt
(: there may be several titles, differentiated by the type property; just take the first :)
let $doc-title := string(($header/tei:title)[1])

(: get the start date :)
let $orderedDates :=
  for $date in $doc//tei:body//tei:date/@when
  order by $date
  return $date
let $start := $orderedDates[1]

return
<html>
  <head>
    <title>TimeLine: {$doc-title}</title>
    <script src="http://simile.mit.edu/timeline/api/timeline-api.js" type="text/javascript"></script>
    <script type="text/javascript">
    <![CDATA[
    function onLoad(file,start) {
      var theme = Timeline.ClassicTheme.create();
      theme.event.label.width = 400; // px
      theme.event.bubble.width = 300;
      theme.event.bubble.height = 300;

      var bandInfo = [
        Timeline.createBandInfo({
          eventSource:    eventSource,
          theme:          theme,
          trackGap:       0.2,
          trackHeight:    1,
          date:           start,
          width:          "90%",
          intervalUnit:   Timeline.DateTime.YEAR,
          intervalPixels: 50
        })
Examples
* Beaglehole Timeline [8]
* Buck [9] - dates in this encoding are confined to the Bibliography and are publication rather than subject events.
Discussion
Simile Timeline has a problem displaying many events on closely related dates, so not all events may appear on the timeline.
References
[1] http://www.tei-c.org/
[2] http://www.simile-widgets.org/timeline/
[3] http://simile.mit.edu/wiki/How_to_Create_Event_Source_Files
[4] http://www.nzetc.org/
[5] http://www.cems.uwe.ac.uk/xmlwiki/TEI/dates-ex2.xq?file=BeaDisc.xml
[6] http://www.cems.uwe.ac.uk/xmlwiki/TEI/dates.xq?file=BeaDisc.xml
[7] http://www.cems.uwe.ac.uk/xmlwiki/TEI/dates3.xq?file=BeaDisc.xml
[8] http://www.cems.uwe.ac.uk/xmlwiki/TEI/timeline2.xq?file=BeaDisc.xml
[9] http://www.cems.uwe.ac.uk/xmlwiki/TEI/timeline2.xq?file=BucExpl.xml
Conversion to RDF
Taking as the starting point the XML documents defining the three tables:
* Emp [1]
* Dept [2]
* Salgrade [3]
These documents are converted to RDF using an XQuery script guided by a mapping file. The generated RDF is cached and accessed by an XQuery script to de-reference the resource URIs. Individual resource URIs are rewritten in Apache to calls on an XQuery script which retrieves the fragment of RDF from the cached file. Thus:
* an employee [4]
* a department [5]
* the full RDF [6] (need to change the rewrite rule to fix this strange URI)
This should be replaced by a query on the SPARQL endpoint.
RDF browsing
This RDF can be browsed with an RDF browser such as Disco [7], OpenLink Data Explorer [8] [9], or Tabulator as an add-in to Opera [10] or Firefox [11].
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/empdept/emp.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/empdept/dept.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/empdept/salgrade.xml
[4] http://www.cems.uwe.ac.uk/empdept/emp/7369
[5] http://www.cems.uwe.ac.uk/empdept/dept/30
[6] http://www.cems.uwe.ac.uk/empdept/all/all
[7] http://www4.wiwiss.fu-berlin.de/rdf_browser/
[8] http://demo.openlinksw.com/DAV/JS/rdfbrowser/index.html
[9] http://demo.openlinksw.com/rdfbrowser2/
[10] http://widgets.opera.com/widget/5053/
[11] http://dig.csail.mit.edu/2007/tab/
Method
By default, XML files use standard ISO dateTime structures to store temporal information. There are two main XML data types related to storing time:
* xs:date - for storing just the date, in YYYY-MM-DD format
* xs:dateTime - for storing both the date and the time
There are many other structures for storing just the year, month, day, time and so on, but this example will only cover dates and dateTimes.
Time Based Queries
Source from events-at-time.xq
(: get a URL parameter to this XQuery :)
let $date := xs:date(request:get-parameter('date', ''))
(: create a sequence of all events :)
let $events := collection('/db/apps/timelines/data')//event
return
  (: return all events that start before the date AND end after the date :)
  for $event in $events[
      xs:date(./start-date/text()) lt $date
      and
      xs:date(./end-date/text()) gt $date
  ]
  return $event
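The same interval test can be tried without a database or a request parameter. In this sketch the two events and the query date are invented sample data, and the comparison uses le and ge rather than lt and gt so that an event also matches on its first and last day; that is a design choice for the sketch, not the behaviour of events-at-time.xq.

```xquery
xquery version "1.0";
(: inline sample events, invented for illustration;
   the real query reads them from a collection :)
let $events :=
  <events>
    <event>
      <name>Conference</name>
      <start-date>2009-06-01</start-date>
      <end-date>2009-06-05</end-date>
    </event>
    <event>
      <name>Workshop</name>
      <start-date>2009-07-10</start-date>
      <end-date>2009-07-12</end-date>
    </event>
  </events>
let $date := xs:date('2009-06-03')
for $event in $events/event
(: keep events whose start..end interval contains $date, boundaries included :)
where xs:date($event/start-date) le $date
  and xs:date($event/end-date) ge $date
return string($event/name)
```

For the date 2009-06-03 only the first event is selected, so the query returns "Conference".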
You can also set up very fast searches based on the date-time structures, even for large collections of 100,000 items, once you have configured range indexes on XML xs:dateTime structures. See http://www.w3.org/TR/xmlschema-2/#dateTime for how date-time structures work.
References
The following page has instructions on indexing: http://demo.exist-db.org/exist/indexing.xml
Make sure to read section 2.2 on range indexes, and use the xs:date and xs:dateTime types in your range indexes. Your collection configuration file (see http://demo.exist-db.org/exist/indexing.xml#idxconf) might have the following lines if you are tracking document creation and modification dateTimes:
<create qname="start-date" type="xs:date"/>
<create qname="end-date" type="xs:date"/>
<create qname="created-dateTime" type="xs:dateTime"/>
<create qname="last-modified-dateTime" type="xs:dateTime"/>
Note that all eXist collections and resources also have both these dates in their metadata. You can use the xmldb module to get these timestamps:
http://demo.exist-db.org/exist/functions/xmldb/created
http://demo.exist-db.org/exist/functions/xmldb/last-modified
Other Resources
* Timeline mashups in this Wikibook
* Gantt Chart using JQuery [1]
* Timeline SVG reports
* Gantt Charts from XML data using Anychart [2]
References
[1] http://plugins.jquery.com/project/ganttView
[2] http://anychart.com/products/anygantt/gallery/
Method
We will write a function that compares the timestamps of the items in two lists.
Time Comparison with XQuery
      <div class="left">
        <h2>List 1</h2>
        {for $item in $list1/item
         return <div>{$item/text()} dateTime={string($item/@dateTime)}</div>}
      </div>
      <div class="right">
        <h2>List 2</h2>
        {for $item in $list2/item
         return <div>{$item/text()} dateTime={string($item/@dateTime)}</div>}
      </div>
      <br/>
      <p>The pink items are older items.</p>
      <div class="left">
        <h2>Items on 2 Older Than 1</h2>
        {local:older($list1, $list2)}
      </div>
      <div class="right">
        <h2>Items on 1 Older Than 2</h2>
        {local:older($list2, $list1)}
      </div>
   </body>
</html>
Execute [1]
Collating
Alternatively, two ordered lists can be collated to derive a set of updates. Here the items are wrapped in a div to carry the added information about the merge. Items in list1 but not list2 are flagged as new, items in list 2 but not list 1 as to be deleted and items which are newer in list 1 than list 2 as newer.
declare function local:merge($a as node()*, $b as node()*) as node()* {
  if (empty($a) and empty($b))
  then ()
  else if (empty($b) or $a[1] lt $b[1])
  then (<div class="add">{$a[1]}</div>, local:merge(subsequence($a, 2), $b))
  else if (empty($a) or $a[1] gt $b[1])
  then (<div class="delete">{$b[1]}</div>, local:merge($a, subsequence($b, 2)))
  else (<div class="{ if (xs:dateTime($a[1]/@dateTime) gt xs:dateTime($b[1]/@dateTime))
                      then "newer"
                      else "older" }">
          {$a[1]}
        </div>,
        local:merge(subsequence($a, 2), subsequence($b, 2))
       )
};
declare option exist:serialize "method=xhtml media-type=text/html";

let $list1 :=
<list>
    <item dateTime="2009-06-01T11:59:00.000-05:00">apples</item>
    <item dateTime="2009-02-01T11:59:00.000-05:00">bananas</item>
    <item dateTime="2009-02-01T11:59:00.000-05:00">carrots</item>
    <item dateTime="2009-02-01T11:59:00.000-05:00">cabbage</item>
    <item dateTime="2009-02-01T11:59:00.000-05:00">eggplant</item>
    <item dateTime="2009-02-01T11:59:00.000-05:00">grapes</item>
</list>
let $list2 :=
<list>
    <item dateTime="2009-01-01T11:59:00.000-05:00">apples</item>
    <item dateTime="2009-02-01T11:59:00.000-05:00">bananas</item>
    <item dateTime="2009-03-01T11:59:00.000-05:00">carrots</item>
    <item dateTime="2009-02-01T11:58:00.000-05:00">eggplant</item>
    <item dateTime="2009-02-01T12:00:00.000-05:00">grapes</item>
    <item dateTime="2009-04-01T11:59:00.000-05:00">oranges</item>
</list>
return
<html>
   <head>
      <style type="text/css">
      <![CDATA[
      body {font-family: Arial,Helvetica,sans-serif; font-size: medium;}
      h2 {padding: 3px; margin: 0px; text-align: center; font-size: large; background-color: silver;}
      .left, .right {border: solid black 1px; padding: 5px;}
      .newer {background-color: lightgreen;}
      .older {background-color: pink;}
      .delete {background-color: red;}
      .add {background-color: green;}
      .left {float: left; width: 390px}
      .right {margin-left: 410px; width: 390px}
      ]]>
      </style>
   </head>
   <body>
      <h1>Update Report</h1>
      <div class="left">
        <h2>List 1</h2>
        {for $item in $list1/item
         return <div>{$item/text()} dateTime={string($item/@dateTime)}</div>}
      </div>
      <div class="right">
        <h2>List 2</h2>
        {for $item in $list2/item
         return <div>{$item/text()} dateTime={string($item/@dateTime)}</div>}
      </div>
      <br/>
      <p>Green items are new, light green items are newer, pink items are older and red items are to be removed.</p>
      <div class="left">
        <h2>Merged Lists</h2>
        {local:merge($list1/item, $list2/item)}
      </div>
   </body>
</html>
Execute [2]
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Basics/compareTimes.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/Basics/compareTimes2.xq
Timelines of Resource
Motivation
You want to create a timeline using the creation and modification dates of a collection.
Method
Many file systems and XML databases, such as the eXist database, automatically keep two dates associated with each resource. One is the creation date and the other is the date the resource was last modified. These dates are required by many systems that perform incremental backups of resources. We can use these dates to automatically create a timeline report of one or more collections. These reports can serve as audit trails and can help you find out what changed and when. Here are two of the functions that we will use in these examples:
xmldb:last-modified($collection as item(), $resource as xs:string) as xs:dateTime?
xmldb:created($collection as xs:string, $resource as xs:string) as xs:dateTime
Note that you can also use the created function with a single parameter to see when a collection was created. Our query will take a single parameter, the database path to the collection we wish to create a timeline for. Our timeline will then display the creation and modification dates for the resources in this collection. Here is a sample query fragment that lists all the child resources of a collection and formats the data according to the timeline event XML structure.
let $collection := '/db/test'
return
<data date-time-format="iso8601">{
  for $child in xmldb:get-child-resources($collection)
  return
    (
      <event start="{xmldb:created($collection, $child)}" isDuration="false">{$child} created</event>,
      <event start="{xmldb:last-modified($collection, $child)}" isDuration="false">{$child} last-modified</event>
    )
}</data>
with a 'front-door' function to get started:
declare function myfn:fib-itr($n as xs:integer) as xs:integer? {
  if ($n < 0) then ()
  else if ($n = 0) then 0
  else myfn:fib-itr-x($n, 0, 1)
};
Iterative solutions in which variables are updated look rather messy by comparison with this tail-recursive formulation, a style essential to many algorithms in XQuery.
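To make the tail-recursive style concrete, here is a self-contained sketch that pairs the front-door function with one possible helper. The body of myfn:fib-itr-x, its parameter names and the namespace URI are assumptions made for this illustration; only the call myfn:fib-itr-x($n, 0, 1) is taken from the text above.

```xquery
xquery version "1.0";
(: namespace URI chosen for this sketch only :)
declare namespace myfn = "http://example.com/myfn";

(: assumed helper: $a and $b hold two consecutive Fibonacci numbers;
   each call moves one step along the sequence and counts $n down to 1 :)
declare function myfn:fib-itr-x($n as xs:integer, $a as xs:integer, $b as xs:integer) as xs:integer {
  if ($n = 1)
  then $b
  else myfn:fib-itr-x($n - 1, $b, $a + $b)
};

declare function myfn:fib-itr($n as xs:integer) as xs:integer? {
  if ($n < 0) then ()
  else if ($n = 0) then 0
  else myfn:fib-itr-x($n, 0, 1)
};

for $n in 0 to 10
return myfn:fib-itr($n)
```

The query returns the sequence 0 1 1 2 3 5 8 13 21 34 55. Because the recursive call is the last thing the helper does, the accumulated values travel in the parameters rather than on the call stack.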
Timing
Just how much worse is the recursive formulation? We need to time the calls, and now we really could do with those higher-order functions so that we can pass either fib function to a timer function to execute. Step in eXist's function modules. These raise XQuery from an XML query language to a viable web application platform. The util module provides two functions:
* util:function(qname, arity) to create a function template which can be passed to
* util:call(function, params) to evaluate the function
so we can create the recursive function template with:
let $call-fib-recur := util:function(QName("http:example.com/myfn","myfn:fib-recur"),1)
Timing Fibonacci algorithms The timer function takes a function, a sequence of the parameters to be passed to the function and a repetition number. The timing is based on system time and the time difference converted to seconds and then to milliseconds: declare function myfn:time-call($function as function, $params as item()* ,$reps as xs:integer ) as xs:decimal { let $start := util:system-time() let $result := for $i in 1 to $reps return util:call($function, $params) let $end := util:system-time() let $runtimems := (($end - $start) div xs:dayTimeDuration('PT1S')) * 1000 return $runtimems div $reps };
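The duration arithmetic inside the timer can be checked separately from eXist. In this sketch two fixed timestamps, invented for the example, stand in for the two calls to util:system-time(); the rest is the same expression as in myfn:time-call.

```xquery
xquery version "1.0";
(: fixed timestamps stand in for two calls to util:system-time() :)
let $start := xs:dateTime("2009-06-04T07:50:04.000Z")
let $end   := xs:dateTime("2009-06-04T07:50:05.500Z")
(: subtracting gives a duration; dividing by one second gives a number of seconds :)
return (($end - $start) div xs:dayTimeDuration('PT1S')) * 1000
```

The difference is a duration of 1.5 seconds, so the expression evaluates to 1500 milliseconds.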
Results as a table
This data structure can be transformed to a table, iterating over the tuples.
declare function myfn:dataset-as-table($dataset) as element(table) {
  <table>
    <tr>
      {for $data in $dataset/*[1]/* return <th>{name($data)}</th>}
    </tr>
    {for $tuple in $dataset/*
     return
       <tr>
         {for $data in $tuple/* return <td>{string($data)}</td>}
       </tr>
    }
  </table>
};
Here the XPath name() function is used to convert from the tag names to strings. This reflection allows very generic functions to be written and is a key technique for making the transition from problem-specific structures to generic functions. Note that the dataset has not been typed. This is because the function makes only minimal assumptions about the structure (a root element containing tuples with identically named fields), and expressing those assumptions would require a permissive schema language.
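To see the reflection at work, here is a minimal self-contained sketch of the same idea. The function is renamed local:dataset-as-table so that it runs without the myfn module, and the element names and timings in the sample dataset are invented for illustration:

```xquery
xquery version "1.0";

(: generic table builder: header cells come from the element names of the first tuple :)
declare function local:dataset-as-table($dataset as element()) as element(table) {
  <table>
    <tr>{ for $data in $dataset/*[1]/* return <th>{ name($data) }</th> }</tr>
    { for $tuple in $dataset/*
      return <tr>{ for $data in $tuple/* return <td>{ string($data) }</td> }</tr> }
  </table>
};

(: hypothetical dataset: one tuple per value of n, timings in milliseconds :)
local:dataset-as-table(
  <dataset>
    <tuple><n>10</n><recursive>1.2</recursive><iterative>0.1</iterative></tuple>
    <tuple><n>20</n><recursive>95.0</recursive><iterative>0.1</iterative></tuple>
  </dataset>
)
```

The header row is built from the names n, recursive and iterative, so the same function works unchanged for any dataset whose tuples share a common set of child elements.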
Results as a graph
For graphing, this basic matrix could be imported directly into Excel, or, thanks to the wonderful GoogleCharts, rendered as a simple line graph. Selected columns of the dataset are extracted and joined with commas, then all datasets joined with pipes. Note that an ampersand inside an XQuery string literal must be written as the entity reference &amp;.
declare function myfn:dataset-as-chart($dataset, $vars as xs:string+) as element(img) {
  let $series :=
    for $var in $vars
    return string-join($dataset/*/*[name(.) = $var], ",")
  let $points := string-join($series, "|")
  let $chartType := "lc"
  let $chartSize := "300x200"
  let $uri := concat("http://chart.apis.google.com/chart?", "cht=", $chartType, "&amp;chs=", $chartSize, "&amp;chd=t:", $points)
  return <img src="{$uri}"/>
};
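The construction of the chart data parameter can be tried in isolation. This is a small sketch under stated assumptions: the dataset, its element names (itr, recur) and its values are invented, and only the comma-and-pipe joining from the function above is reproduced:

```xquery
xquery version "1.0";

(: hypothetical dataset with two measured series per tuple :)
let $dataset :=
  <dataset>
    <tuple><n>1</n><itr>2</itr><recur>3</recur></tuple>
    <tuple><n>2</n><itr>5</itr><recur>9</recur></tuple>
  </dataset>
let $vars := ("itr", "recur")
(: one comma-separated string per selected column :)
let $series :=
  for $var in $vars
  return string-join($dataset/*/*[name(.) = $var]/string(), ",")
(: the series are then separated by pipes :)
return string-join($series, "|")
```

For this input the result is the string 2,5|3,9, which is the form expected after chd=t: in the chart URI.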
Execution
execute [1] (with preset limits) on the CEMS server.
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Literate/fibgraph.xq
Transformation idioms
Motivation
Document transformation using the basic typeswitch statement applies the same transformation to an element independent of where it occurs in the document. The transformation also preserves document order since it processes elements in document order. In comparison with XSLT, XQuery lacks some mechanisms such as modes, priority and numbering. This article addresses some of these limitations.
Example
The example uses a custom XML schema to mark up the contents of the book "Search: The Graphics Web Guide" by Ken Coupland, a compendium of websites. The document contains site elements which are tagged with a category, and also category elements which provide a commentary on the category. For comparison, this dataset is also used in a student case study which uses XSLT for transformations. [Sample file [1]]
Identity Transformation
module namespace coupland = "http://www.cems.uwe.ac.uk/xmlwiki/coupland"; (: conversion module generated from a set of tags :) declare function coupland:convert($nodes as node()*) as item()* { for $node in $nodes return typeswitch ($node) case element(websites) return coupland:websites($node) case element(sites) return coupland:sites($node) case element(site) return coupland:site($node) case element(uri) return coupland:uri($node) case element(name) return coupland:name($node) case element(description) return coupland:description($node) default return coupland:convert-default($node) }; declare function coupland:convert-default($node as node()) as item()* { $node }; declare function coupland:websites($node as element(websites)) as item()* { element websites{ $node/@*,
coupland:convert($node/node()) } };
declare function coupland:sites($node as element(sites)) as item()* { element sites{ $node/@*, coupland:convert($node/node()) } };
declare function coupland:site($node as element(site)) as item()* { element site{ $node/@*, coupland:convert($node/node()) } };
declare function coupland:uri($node as element(uri)) as item()* { element uri{ $node/@*, coupland:convert($node/node()) } };
declare function coupland:name($node as element(name)) as item()* { element name{ $node/@*, coupland:convert($node/node()) } };
declare function coupland:description($node as element(description)) as item()* { element description{ $node/@*, coupland:convert($node/node()) } };
Default action
Change the convert-default function to provide a different default action. For example:
declare function coupland:convert-default($node as node()) as item()* {
  if ($node instance of element())
  then coupland:convert($node/node())
  else $node
};
would include the content of the node but remove the tag and its attributes.
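The effect of this default can be seen in a cut-down, self-contained sketch. The functions are declared in the local namespace rather than the coupland module, only one element has an explicit rule, and the sample input is invented:

```xquery
xquery version "1.0";

declare function local:convert($nodes as node()*) as item()* {
  for $node in $nodes
  return
    typeswitch ($node)
      (: the only element with an explicit rule in this sketch :)
      case element(name) return element h1 { local:convert($node/node()) }
      default return local:convert-default($node)
};

(: default action: drop the tag and attributes of unknown elements, keep their content :)
declare function local:convert-default($node as node()) as item()* {
  if ($node instance of element())
  then local:convert($node/node())
  else $node
};

(: hypothetical input: site, b and note have no rule of their own :)
local:convert(
  <site id="s1"><name>Example <b>site</b></name><note>kept text</note></site>
)
```

The site, b and note tags disappear while their text survives, so the result is an h1 element containing "Example site" followed by the text "kept text".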
Ignore element
The 'class' element is not needed: declare function coupland:class($node as element(class)) as item()* { () };
Define transformation
The image element should be transformed to an HTML img element using the uri as the source: declare function coupland:image($node as element(image)) as item()* { element div { element img { attribute src { $node} } } };
Reordering elements
Each site is to be rendered in the order name, uri and then the rest of the sub-elements: declare function coupland:site($node as element(site)) as item()* { element div{ element div { coupland:convert($node/name), coupland:convert($node/uri) } , coupland:convert($node/(node() except (uri,name))) } };
Numbering categories
The xsl:number instruction provides a powerful mechanism to generate hierarchical section numbers. XQuery has no equivalent, but in specific cases we can generate numbers using functions. For example, to number the categories we can use this function, which creates a number for a node from its position among like-named siblings. Note that, as with xsl:number, the number is based on the order of nodes in the original document, not the transformed document.
declare function coupland:number($node) as xs:string {
  concat(count($node/preceding-sibling::node()[name(.) = name($node)]) + 1, ". ")
};
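A short sketch of the numbering function in use, assuming a tiny invented document in which two category elements are separated by another element; the function is declared as local:number so that the example is self-contained:

```xquery
xquery version "1.0";

(: number a node by counting preceding siblings with the same name :)
declare function local:number($node as node()) as xs:string {
  concat(count($node/preceding-sibling::node()[name(.) = name($node)]) + 1, ". ")
};

(: hypothetical source document :)
let $doc :=
  <websites>
    <category><name>Art</name></category>
    <sites/>
    <category><name>Music</name></category>
  </websites>
for $category in $doc/category
return concat(local:number($category), $category/name)
```

This returns the two strings "1. Art" and "2. Music": the intervening sites element is not counted because its name differs.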
Parameterisation
The transformation can clearly be applied to different documents, but often the same transformation is to be used in different contexts. XSLT provides parameters and variables which are global to all templates. In XQuery we can either declare global variables in the module or pass one or more parameters around the functions (module generation is helpful here). ....
Generating an index
XSLT uses the mode mechanism to allow the same element to be processed in multiple ways. A common use case is where the same transformation must generate both an index and the content. Several approaches suggest themselves. We could mimic the XSLT approach by passing an additional mode parameter in the calls and choosing which transformation to apply in each function. Alternatively we could append the mode to the function name. It is more difficult to use context (either global or passed) because the mode will need to be updated. The simplest approach is to use two typeswitch transformations and combine the results at a higher level. This clearly separates the two modes of transformation. The technique of module generation is helpful here.
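The last approach might be sketched as follows. This is an illustration only, not the module used in the article: the two functions local:index and local:content, the sample document and the chosen HTML output are all assumptions.

```xquery
xquery version "1.0";

(: first transformation: produce one index entry per category :)
declare function local:index($nodes as node()*) as item()* {
  for $node in $nodes
  return
    typeswitch ($node)
      case element(category) return <li>{ string($node/name) }</li>
      case element() return local:index($node/*)
      default return ()
};

(: second transformation: produce the content itself :)
declare function local:content($nodes as node()*) as item()* {
  for $node in $nodes
  return
    typeswitch ($node)
      case element(category) return <div>{ local:content($node/node()) }</div>
      case element(name) return <h1>{ string($node) }</h1>
      case element() return local:content($node/node())
      default return $node
};

(: hypothetical source document :)
let $doc :=
  <websites>
    <category><name>Art</name><description>Galleries and portfolios.</description></category>
    <category><name>Music</name><description>Bands and labels.</description></category>
  </websites>
(: the two results are combined at the top level :)
return
  <body>
    <ul>{ local:index($doc) }</ul>
    { local:content($doc) }
  </body>
```

Each function stays small and single-purpose, at the cost of walking the source document twice.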
Complex transformation
The overall HTML document can be structured in the transformer for the root element. The page uses the blueprint stylesheets. Each category of site is rendered, with the sites which are classified in that category.
declare function coupland:websites($node as element(websites)) as item()* { (: the root element so convert to html :) <html> <head> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/> <title>Web Sites by Coupland</title> <link rel="stylesheet" href="../../css/blueprint/screen.css" type="text/css" media="screen, projection"/> <link rel="stylesheet" href="../../css/blueprint/print.css" type="text/css" media="print"/> <!--[if IE ]><link rel="stylesheet" href="../../css/blueprint/ie.css" type="text/css" media="screen, projection" /><![endif]--> <link rel="stylesheet" href="screen.css" type="text/css" media="screen"/> </head> <body> <div class="container"> { for $category in $node/category order by $category/class return <div>
<div class="span-10"> {coupland:convert($category)} </div> <div class="span-14 last"> {for $site in $node/sites/site[category=$category/class] order by ($site/sortkey,$site/name)[1] return coupland:convert($site) } </div> <hr /> </div> } </div> </body> </html> };
Completed transformation
The full XQuery module now looks like this:
module namespace coupland = "http://www.cems.uwe.ac.uk/xmlwiki/coupland"; (: conversion module generated from a set of tags
:)
declare function coupland:convert($nodes as node()*) as item()* {
  for $node in $nodes
  return
    typeswitch ($node)
      case element(category) return coupland:category($node)
      case element(class) return coupland:class($node)
      case element(description) return coupland:description($node)
      case element(em) return coupland:em($node)
      case element(hub) return coupland:hub($node)
      case element(image) return coupland:image($node)
      case element(name) return coupland:name($node)
      case element(p) return coupland:p($node)
      case element(q) return coupland:q($node)
      case element(site) return coupland:site($node)
      case element(sites) return coupland:sites($node)
      case element(sortkey) return coupland:sortkey($node)
      case element(subtitle) return coupland:subtitle($node)
      case element(uri) return coupland:uri($node)
      case element(websites) return coupland:websites($node)
      default return coupland:convert-default($node)
};
declare function coupland:category($node as element(category)) as item()* {
  if ($node/parent::node() instance of element(site))
  then ()
  else element div { $node/@*, coupland:convert($node/node()) }
};
declare function coupland:description($node as element(description)) as item()* {
  element div { $node/@*, coupland:convert($node/node()) }
};
declare function coupland:em($node as element(em)) as item()* {
  element em { $node/@*, coupland:convert($node/node()) }
};
declare function coupland:hub($node as element(hub)) as item()* {
  element hub { $node/@*, coupland:convert($node/node()) }
};
declare function coupland:image($node as element(image)) as item()* {
  element div { element img { attribute src { $node } } }
};
declare function coupland:name($node as element(name)) as item()* {
  if ($node/parent::node() instance of element(site))
  then element span { attribute style {"font-size: 16pt"}, $node/@*, coupland:convert($node/node()) }
  else element h1 { $node/@*, coupland:number($node/parent::node()), coupland:convert($node/node()) }
};
declare function coupland:site($node as element(site)) as item()* {
  element div {
    element div { coupland:convert($node/name), coupland:convert($node/uri) },
    coupland:convert($node/(node() except (uri,name)))
  }
};
declare function coupland:sites($node as element(sites)) as item()* {
  for $site in $node/site
  order by $site/sortkey
  return coupland:convert($site)
};
declare function coupland:subtitle($node as element(subtitle)) as item()* {
  element div { $node/@*, coupland:convert($node/node()) }
};
declare function coupland:uri($node as element(uri)) as item()* {
  <span> {element a { attribute href {$node}, "Link" }} </span>
};
declare function coupland:websites($node as element(websites)) as item()* {
  (: the root element so convert to html :)
  <html>
    <head>
      <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
      <title>Web Sites by Coupland</title>
      <link rel="stylesheet" href="../../css/blueprint/screen.css" type="text/css" media="screen, projection"/>
      <link rel="stylesheet" href="../../css/blueprint/print.css" type="text/css" media="print"/>
      <!--[if IE ]><link rel="stylesheet" href="../../css/blueprint/ie.css" type="text/css" media="screen, projection" /><![endif]-->
      <link rel="stylesheet" href="screen.css" type="text/css" media="screen"/>
    </head>
    <body>
      <div class="container">
        { for $category in $node/category
          order by $category/class
          return
            <div>
              <div class="span-10"> {coupland:convert($category)} </div>
              <div class="span-14 last">
                {for $site in $node/sites/site[category=$category/class]
                 order by ($site/sortkey,$site/name)[1]
                 return coupland:convert($site)
                }
              </div>
              <hr />
            </div>
        }
      </div>
    </body>
  </html>
};
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/Coupland1.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/coupidtrans2.xq
Typeswitch Transformations
Motivation
You have an XML document that you want to transform into a different format of XML. You want to control and customize the transformation process, and you want a modular way to store the transformation rules so that you or others can easily modify and maintain them.
Method
We will use XQuery's typeswitch expression to transform an XML document from one form into another. The basic approach is simple and straightforward: for each XML node in the input document, we will specify what should be created in the output document. The typeswitch expression performs this core function of identifying what happens to each node in the source document. We will write an XQuery function that takes a node, tests it using a typeswitch expression, and dispatches that node to the appropriate handler function, which transforms the node into the new format and sends any child nodes back to the main function using the passthru function. This recursive routine effectively crawls through an entire node and its children, transforming them into the target format. Once the structure has been set up, the transform is easy to modify, even if there is very complex nesting of the tags within the input document. (This recursive technique will be familiar to discerning users of XSLT, but there is absolutely no XSLT prerequisite for this article.)
Example Data
Suppose you have a simple XML document that you would like to transform:
Notice that the typeswitch expression tests the input node against a list of criteria: is the node a text node, a bill element, a btitle element, a section-id element, etc.? If it's a text node (e.g. "This is the Bill title"), we simply return the text, unmodified. (Note that the text() node test comes first since text() is likely to be the single most plentiful node type in a text-rich document, and placing the most common type first improves performance.) If instead the node is a bill element, then we pass the node to the aptly-named local:bill() function for bill-specific handling. The local:bill() function (see below) turns the <bill> element into a <Bill> element. It then passes the contents of the bill element to the local:passthru() function. If our node doesn't match any of the pre-defined rules, then the typeswitch expression resorts to the required final "default" (think: "fallback") clause; this default is used for all nodes that don't match any of the preceding tests. In our example, the default expression sends nodes without matches to the local:passthru() function. (Typeswitch isn't limited to matching text() and element() nodes; it can also match the other node types such as processing-instruction() and comment(), but not typically attribute(). Attributes are conventionally dealt with inside the handler function of the attribute's parent element, rather than in the core typeswitch function.)
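The pattern described above can be sketched as follows. This is a minimal illustration assembled from the function names mentioned in the text (local:dispatch, local:bill, local:passthru); the local:btitle handler, the exact signatures and the sample input are assumptions rather than the article's own listing.

```xquery
xquery version "1.0";

(: core dispatcher: decide what happens to each node :)
declare function local:dispatch($nodes as node()*) as item()* {
  for $node in $nodes
  return
    typeswitch ($node)
      case text() return $node
      case element(bill) return local:bill($node)
      case element(btitle) return local:btitle($node)
      (: anything without a rule: keep the content, drop the tag :)
      default return local:passthru($node)
};

(: handler functions: build the new element, then recurse over the children :)
declare function local:bill($node as element(bill)) as element(Bill) {
  <Bill>{ local:passthru($node) }</Bill>
};

declare function local:btitle($node as element(btitle)) as element(BillTitle) {
  <BillTitle>{ local:passthru($node) }</BillTitle>
};

(: send the children of a node back through the dispatcher :)
declare function local:passthru($node as node()) as item()* {
  local:dispatch($node/node())
};

(: hypothetical input document :)
local:dispatch(
  <bill><btitle>This is the Bill title</btitle><note>unmatched wrapper</note></bill>
)
```

With this input the bill and btitle elements are renamed, while the note element, which has no rule, loses its tag but keeps its text.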
(Note: This is such a simple function that it may appear extraneous. Why not simply replace instances of local:passthru($node) with local:dispatch($node/node())? Its primary benefit is that it simplifies the code, relieving you of the burden of typing an extra "/node()" for each recursion. A secondary benefit is that it introduces the possibility of filtering a node before it is sent to the typeswitch routine.)
Compact approach
While the above approach is recommended as the most modular, extensible approach, it is perfectly acceptable to express the same transformation using a more compact, self-contained function:
declare function local:transform($nodes as node()*) as item()* {
  for $node in $nodes
  return
    typeswitch($node)
      case text() return $node
      case element(bill) return element Bill {local:transform($node/node())}
      case element(btitle) return element BillTitle {local:transform($node/node())}
      case element(section-id) return element BillSectonID {local:transform($node/node())}
      case element(strike) return element del {local:transform($node/node())}
      case element(bill-text) return element BillText {local:transform($node/node())}
      default return local:transform($node/node())
};
Besides the fact that this function is entirely self-contained (beginning with a FLWOR expression and using $node/node() to recurse through child nodes), notice that the function uses computed element constructors to accomplish the transformation.
Conclusion
This is the heart of the XQuery typeswitch approach to XML document transformation. On the basis of this simple pattern, entire libraries have been written to transform source formats like TEI, DocBook, and Office OpenXML documents into other formats like XHTML, XSL-FO, and each other. While we can create typeswitch modules by hand, building them up element by element, we can also use XQuery to generate a skeleton typeswitch module; see this article's companion article, XQuery/Generating_Skeleton_Typeswitch_Transformation_Modules. In addition to the "skeleton generator", that article also provides examples of more complex transformation patterns with XQuery typeswitch: changing an element's name, ignoring an element, transforming differently based on the context of the element, and reordering elements. It also provides a detailed comparison of XQuery's and XSLT's approaches to the same example transformation, so it is useful for readers coming from the world of XSLT.
References
* DocBook to XHTML [2]: link to sample code that converts DocBook to XHTML in Dan McCreary's eXist branch
* W3C XQuery Typeswitch definition [3]
* Comparison of typeswitch and XSLT apply-templates [4]
* i18n example by Ryan Semerau [5]
* typeswitch in BEA/Oracle mapper [6]
* Dec 2002 article by Per Bothner about using typeswitch to transform XML to HTML in xml.com [7]
* Transforming XML Structures With a Recursive typeswitch Expression [8] (from MarkLogic "Application Developer's Guide")
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/transformation/eg1.xq
[2] https://exist.svn.sourceforge.net/svnroot/exist/branches/dmccreary/docs/webapp/docs/docbook5/docbook2xhtml-v2.xqm
[3] http://www.w3.org/TR/xquery/#id-typeswitch
[4] http://developer.marklogic.com/blog/tired-of-typeswitch
[5] http://xquerywebappdev.wordpress.com/2010/05/05/non-obtrusive-i18n
[6] http://download.oracle.com/docs/cd/E14981-01/wli/docs1031/dtguide/dtguideMapper.html#wp1399341
[7] http://www.xml.com/pub/a/2002/12/23/xquery.html?page=1
[8] http://developer.marklogic.com:8040/4.2doc/docapp.xqy#display.xqy?fname=http://pubs/4.2doc/xml/dev_guide/typeswitch.xml
UK shipping forecast
Motivation
The UK shipping forecast is prepared by the UK Met Office four times a day and published on the radio, the Met Office web site [1] and the BBC web site [2]. However, it is not available in a machine-readable form. Tim Duckett recently blogged about creating a Twitter stream [3]. He uses Ruby to parse the text forecast. The textual form of the forecast is included on both the Met Office and BBC sites. However, as Tim points out, the format is designed for speech, merges similar areas to reduce the time slot, and is hard to parse. The approach taken here is to scrape a JavaScript file containing the raw area forecast data.
Implementation
Dependencies
eXist-db Modules
The following scripts use these eXist modules:
* request - to get HTTP request parameters
* httpclient - to GET and POST
* scheduler - to schedule scraping tasks
* datetime - to format dateTimes
* util - base64 conversions
* xmldb - for database access
weather[28] = "Showers.";
visibility[28] = "Moderate or good.";
seastate[28] = "Moderate or rough.";
area[28] = "Bailey";
area_presentation[28] = "Bailey";
key[28] = "Bailey";
// Faeroes
...
Area Forecast
JavaScript conversion
This function fetches the current JavaScript data using the eXist httpclient module, converts the base64 data to a string, picks out the required area data and parses the code to generate an XML structure using the JavaScript array names.
declare namespace httpclient = "http://exist-db.org/xquery/httpclient";
declare function met:get-forecast($area as xs:string) as element(forecast)? {
  let $jsuri := "http://www.metoffice.gov.uk/lib/includes/marine/gale_and_shipping_table.js"
  (: fetch the javascript source and locate the text of the body of the response :)
  let $base64 := httpclient:get(xs:anyURI($jsuri),true(),())/httpclient:body/text()
  (: this is base64 encoded, so decode it back to text :)
  let $js := util:binary-to-string($base64)
  (: isolate the section for the required area, prefixed with a comment :)
  let $areajs := normalize-space(substring-before(substring-after($js,concat("// ",$area)),"//"))
  return
    if ($areajs = "") (: area not found :)
    then ()
    else
      (: build an XML element containing elements for each of the data items, using the array names as the element names :)
      <forecast>
        { for $d in tokenize($areajs,";")[position() < last()] (: JavaScript statements terminated by ";" - ignore the last empty :)
          let $ds := tokenize(normalize-space($d)," *= *") (: separate the LHS and RHS of the assignment statement :)
          return
            element {replace(substring-before($ds[1],"["),"_","")} (: element name is the array name, converted to a legal name :)
              {replace($ds[2],'"','')} (: element text is the RHS minus quotes :)
        }
      </forecast>
};
For example, the output for one selected area is: <forecast> <galeinforce>0</galeinforce> <gale>0</gale> <galeIssueTime/> <shipIssueTime>0505 Mon 07 Jul</shipIssueTime> <wind>Northwest backing west 5 to 7.</wind> <weather>Squally showers.</weather> <visibility>Moderate or good.</visibility> <seastate>Moderate or rough.</seastate> <area>Fastnet</area> <areapresentation>Fastnet</areapresentation> <key>Fastnet</key> </forecast>
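The parsing step can be tried without fetching anything. The sketch below is a hedged, self-contained illustration: local:js-to-xml is a hypothetical helper that takes the JavaScript text for one area as a string, and the sample statements are modelled on the excerpt shown earlier. It filters out empty tokens rather than dropping the last one, which is a small variation on the function above.

```xquery
xquery version "1.0";

(: convert a run of JavaScript array assignments into one element per assignment :)
declare function local:js-to-xml($areajs as xs:string) as element(forecast) {
  <forecast>{
    (: statements are terminated by ";" so split there and skip empty pieces :)
    for $d in tokenize($areajs, ";")[normalize-space(.) ne ""]
    (: split each statement into its left and right hand sides :)
    let $ds := tokenize(normalize-space($d), " *= *")
    return
      (: element name: array name without the index and without underscores :)
      element { replace(substring-before($ds[1], "["), "_", "") }
              { replace($ds[2], '"', '') }
  }</forecast>
};

local:js-to-xml('weather[28] = "Showers."; area_presentation[28] = "Bailey";')
```

This yields a forecast element containing <weather>Showers.</weather> and <areapresentation>Bailey</areapresentation>. The simple tokenizing would break on values that themselves contain ";" or "=", which the forecast text is assumed not to do.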
Area Forecast
Finally these functions can be used in a script which accepts a shipping area name and returns an XML message: import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm"; let $area := request:get-parameter("area","Lundy") let $forecast := met:get-forecast($area) return <message area="{$area}" dateTime="{$forecast/shipIssueTime}"> {met:forecast-as-text($forecast)} </message> Lundy [5]
Message abbreviation
To create a message suitable for texting (160 character limit) or tweeting (140 character limit), the message can be compressed by abbreviating common words.
Abbreviation dictionary
A dictionary of words and abbreviations is created and stored locally. The dictionary has been developed using some of the abbreviations in Tim Duckett's Ruby implementation. <dictionary> <entry full="west" abbrev="W"/> <entry full="westerly" abbrev="Wly"/> .. <entry full="variable" abbrev="vbl"/> <entry full="visibility" abbrev="viz"/> <entry full="occasionally" abbrev="occ"/> <entry full="showers" abbrev="shwrs"/> </dictionary> The full dictionary [7]
Abbreviation function
The abbreviation function breaks down the text into words, replaces words with abbreviations and builds the text up again: declare function met:abbreviate($forecast as xs:string) as xs:string { string-join( (: lowercase the string, append a space (to ensure a final . is matched) and tokenise :) for $word in tokenize(concat(lower-case($forecast)," "),"\.? +") return (: if there is an entry for the word , use its abbreviation, otherwise use the unabbreviated word :) ( /dictionary/entry[@full=$word]/@abbrev,$word) [1] , " ") (: join the words back up with space separator :) };
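The lookup idiom, ($abbreviation, $word)[1], picks the abbreviation when one exists and falls back to the word itself otherwise. Here is a minimal runnable sketch, assuming a three-entry dictionary held in a variable instead of a stored document; the entries are taken from the sample above, and empty tokens are filtered out so that no trailing space is produced:

```xquery
xquery version "1.0";

(: a small inline dictionary, standing in for the stored dictionary document :)
declare variable $dictionary :=
  <dictionary>
    <entry full="west" abbrev="W"/>
    <entry full="showers" abbrev="shwrs"/>
    <entry full="occasionally" abbrev="occ"/>
  </dictionary>;

declare function local:abbreviate($text as xs:string) as xs:string {
  string-join(
    (: lowercase, append a space so a final "." is matched, split on optional "." plus spaces :)
    for $word in tokenize(concat(lower-case($text), " "), "\.? +")[. ne ""]
    (: first item of the pair: the abbreviation if present, otherwise the word :)
    return string(($dictionary/entry[@full = $word]/@abbrev, $word)[1]),
    " ")
};

local:abbreviate("West 5 to 7. Showers.")
```

The call returns "W 5 to 7 shwrs". Words are lowercased before lookup, so any word without a dictionary entry comes back in lower case.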
Abbreviated Message
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
let $area := request:get-parameter("area","Lundy")
let $forecast := met:get-forecast($area)
return
  <message area="{$area}" dateTime="{$forecast/shipIssueTime}">
    {met:abbreviate(met:forecast-as-text($forecast))}
  </message>
Execute [10]
SMS service
One possible use of this data would be to provide an SMS on-request service, taking an area name and returning the abbreviated forecast. The complete set of forecasts is created, and the one for the area supplied in the message is selected and returned as an abbreviated message.
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
let $area := lower-case(request:get-parameter("text",()))
let $forecast := met:get-forecast()[lower-case(area) = $area]
return
  if (exists($forecast))
  then concat("Reply: ", met:abbreviate(met:forecast-as-text($forecast)))
  else concat("Reply: Area ",$area," not recognised")
The calling protocol is determined here by the SMS service installed at UWE and described here [2] Execute [11]
Caching
Fetching the JavaScript on demand is neither efficient nor acceptable net behaviour, and since the forecast times are known, it is preferable to fetch the data on a schedule, convert to the XML form and save in the eXist database and then use the cached XML for later requests.
... "shippingForecast.xml", (: ... here as we only want the latest :)
(: ... be stored :)
<ShippingForecast at="{$forecastDateTime}"> {$forecast} </ShippingForecast> )
return <result> Shipping forecast for {string($forecastDateTime)} stored in {$store} </result>
else ()
The timestamp used on the source data is converted to an xs:dateTime for ease of later processing.
declare function met:timestamp-to-xs-date($dt as xs:string) as xs:dateTime {
  (: convert timestamps in the form 0505 Tue 08 Jul to xs:dateTime :)
  let $year := year-from-date(current-date()) (: assume the current year since none provided :)
  let $dtp := tokenize($dt," ")
  let $mon := index-of(("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),$dtp[4])
  let $monno := if ($mon < 10) then concat("0",$mon) else $mon
  return xs:dateTime(concat($year,"-",$monno,"-",$dtp[3],"T",substring($dtp[1],1,2),":",substring($dtp[1],3,4),":00"))
};
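As a worked example of this conversion, the following sketch uses a hypothetical variant, local:timestamp-to-dateTime, which takes the year as an explicit second parameter so that the result does not depend on the date on which it is run:

```xquery
xquery version "1.0";

(: convert a timestamp such as "0505 Tue 08 Jul" to an xs:dateTime in the given year :)
declare function local:timestamp-to-dateTime($dt as xs:string, $year as xs:integer) as xs:dateTime {
  (: parts: 1 = hhmm, 2 = day name, 3 = day of month, 4 = month name :)
  let $dtp := tokenize($dt, " ")
  let $months := ("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
  let $mon := index-of($months, $dtp[4])
  (: pad the month number to two digits :)
  let $monno := if ($mon < 10) then concat("0", $mon) else string($mon)
  return
    xs:dateTime(concat($year, "-", $monno, "-", $dtp[3], "T",
                       substring($dtp[1], 1, 2), ":", substring($dtp[1], 3, 2), ":00"))
};

local:timestamp-to-dateTime("0505 Tue 08 Jul", 2008)
```

The result is the xs:dateTime 2008-07-08T05:05:00. The day name is ignored, and the day of the month is assumed to arrive already zero-padded, as it does in the source data.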
</forecast> };
There would be a case to make for using XSLT for this transformation. The caching script applies this transformation to the forecast before saving.
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
let $area := lower-case(normalize-space(request:get-parameter("text",())))
let $forecast := met:get-stored-forecast($area)
return
  if (exists($forecast))
  then concat("Reply: ", datetime:format-dateTime($forecast/../@at,"HH:mm")," ",met:abbreviate(met:forecast-as-text($forecast)))
  else concat("Reply: Area ",$area," not recognised")
In this script, the selected forecast for the input area extracted by the met function call is a reference to the database element, not a copy. Thus it is still possible to navigate back to the parent element containing the timestamp. The eXist datetime functions are wrappers for the Java class java.text.SimpleDateFormat [12] which defines the date formatting syntax. Lundy [13]
Job scheduling
eXist includes a scheduler module which is a wrapper for the Quartz scheduler [14]. Jobs can only be created by a DBA user. For example, to set a job to fetch the shipping forecast on the hour:
let $login := xmldb:login( "/db", "admin", "admin password" )
let $job := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq", "0 0 * * * ?")
return $job
where "0 0 * * * ?" means to run at 0 seconds, 0 minutes past every hour of every day of every month, ignoring the day of the week. To check on the set of scheduled jobs, including system schedule jobs: let $login := xmldb:login( "/db", "admin", "admin password" ) return scheduler:get-scheduled-jobs()
It would be better to schedule jobs on the basis of the update schedule for the forecast. These times are 0015, 0505, 1130 and 1725. These times cannot be fitted into a single cron pattern, so multiple jobs are required. Because jobs are identified by their path, the same URL cannot be used for all instances, so a dummy parameter is added.
let $login := xmldb:login( "/db", "admin", "admin password" )
let $job1 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=1", "0 16 0 * * ?")
let $job2 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=2", "0 6 5 * * ?")
let $job3 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=3", "0 31 11 * * ?")
let $job4 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=4", "0 26 17 * * ?")
return ($job1, $job2, $job3, $job4)
Discussion
The scheduled times are one minute later than the published times. This may not be enough slack to account for discrepancies in timing on both sides. Clearly a push from the UK Met Office would be better than the pull scraping. The scheduler clock runs in local time (BST), as do the publication times.
Forecast as kml
Sea area coordinates
The UK Met Office provides a clickable map [1] of forecasts but a KML map would be nice. The coordinates [15] of the sea areas can be captured and manually converted to XML. <?xml version="1.0" encoding="UTF-8"?> <boundaries> <boundary area="viking"> <point latitude="61" longitude="0"/> <point latitude="61" longitude="4"/> <point latitude="58.5" longitude="4"/> <point latitude="58.5" longitude="0"/> </boundary> ...
The boundary for an area is accessed by two functions. In this idiom one function hides the document location and returns the root of the document. Subsequent functions use this base function to get the document and then apply further predicates to filter as required.
declare function met:area-boundaries() as element(boundaries) {
  doc("/db/Wiki/Met/shippingareas.xml")/boundaries
};
declare function met:area-boundary($area as xs:string) as element(boundary) {
  met:area-boundaries()/boundary[@area=$area]
};
The centre of an area can be roughly computed by averaging the latitudes and longitudes: declare function met:area-centre($boundary as element(boundary)) as element(point) { <point latitude="{round(sum($boundary/point/@latitude) div count($boundary/point) * 100) div 100}" longitude="{round(sum($boundary/point/@longitude) div count($boundary/point) * 100) div 100}" /> };
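As a worked example, the sketch below applies the same averaging to the Viking boundary listed earlier. The function is declared as local:area-centre and the boundary is passed inline, so no stored document is needed:

```xquery
xquery version "1.0";

(: average the corner coordinates, rounded to two decimal places :)
declare function local:area-centre($boundary as element(boundary)) as element(point) {
  <point
    latitude="{round(sum($boundary/point/@latitude) div count($boundary/point) * 100) div 100}"
    longitude="{round(sum($boundary/point/@longitude) div count($boundary/point) * 100) div 100}"/>
};

local:area-centre(
  <boundary area="viking">
    <point latitude="61" longitude="0"/>
    <point latitude="61" longitude="4"/>
    <point latitude="58.5" longitude="4"/>
    <point latitude="58.5" longitude="0"/>
  </boundary>
)
```

The latitudes sum to 239 and the longitudes to 8, so the result is <point latitude="59.75" longitude="2"/>. A plain average of the corners is only a rough centre, and it would misbehave for an area that straddles the 180 degree meridian, which is not a concern for the UK sea areas.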
kml Placemark
We can generate a kml PlaceMark from a forecast: declare function met:forecast-to-kml($forecast as element(forecast)) as element(Placemark) { let $area := $forecast/@area let $boundary := met:area-boundary($area) let $centre := met:area-centre($boundary) return <Placemark > <name>{string($forecast/areapresentation)}</name> <description> {met:forecast-as-text($forecast)} </description> <Point> <coordinates> {string-join(($centre/@longitude,$centre/@latitude),",")} </coordinates> </Point> </Placemark> };
(: set the media type for a kml file :)
declare option exist:serialize "method=xml indent=yes media-type=application/vnd.google-earth.kml+xml";
(: set the file name and extension when saved to allow GoogleEarth to be invoked :)
let $dummy := response:set-header('Content-Disposition','inline;filename=shipping.kml;')
(: get the latest forecast :)
let $shippingForecast := met:get-stored-forecast()
return
  <kml>
    <Folder>
      <name>{datetime:format-dateTime($shippingForecast/@at,"EEEE HH:mm")} UK Met Office Shipping forecast</name>
      {for $forecast in $shippingForecast/forecast
       return (met:forecast-to-kml($forecast), met:sea-area-to-kml($forecast/@area,false()))
      }
    </Folder>
  </kml>
raw kml [16] on GoogleMap [17]
Push messages
An alternative use of this data is to provide a channel to push the forecasts through as soon as they are received. The channel could be an SMS alert to subscribers or a dedicated Twitter stream which users could follow.
Subscription SMS
This service should allow a user to request an alert for a specific area or areas. The application requires:
- a data structure to record subscribers and their areas
- a web service to register a user, their mobile phone number and initial area [to do]
- an SMS service to change the required area and turn messaging on or off
- a scheduled task to push the SMS messages when the new forecast has been obtained
Document Structure
<subscriptions>
    <subscription>
        <username>Fred Bloggs</username>
        <password>hafjahfjafa</password>
        <mobilenumber>447777777</mobilenumber>
        <area>lundy</area>
        <status>off</status>
    </subscription>
    ...
</subscriptions>
XML Schema (to be completed)
Access control
Access to this document needs to be controlled. The first level of access control is to place the file in a collection which is not accessible via the web. In the UWE server, the root (via mod-rewrite) is the collection /db/Wiki, so resources in this directory and its subdirectories are accessible, subject to the access settings on the file, but files in parent or sibling directories are not. So this document is stored in the directory /db/Wiki2. The URL of this file, relative to the external root, is http://www.cems.uwe.ac.uk/xmlwiki/../Wiki2/shippingsubscriptions.xml [18] but access fails.
The second level of control is to set the owner and permissions on the file. This is needed because a user on a client behind the firewall, using the internal server address, will gain access to this file. By default, world permissions are set to read and update. Removing this access requires the script to login to read as group or owner. Ownership and permissions can be set either via the web client or by functions in the eXist xmldb module.
SMS push
This function takes a subscription, formulates a text message and calls a general sms:send function to send it. This function interfaces with our SMS service provider.
declare function met:push-sms($subscription as element(subscription)) as element(result) {
    let $area := $subscription/area
    let $forecast := met:get-stored-forecast($area)
    let $time := datetime:format-dateTime($forecast/../@at,"EE HH:mm")
    let $text := encode-for-uri(concat($area, " ",$time," ",met:abbreviate(met:forecast-as-text($forecast))))
    let $number := $subscription/mobilenumber
    let $sent := sms:send($number,$text)
    return
        <result number="{$number}" area="{$area}" sent="{$sent}"/>
};
A second function iterates through the currently active subscriptions, calls the push-SMS function for each one and reports the results:
declare function met:push-subscriptions() as element(results) {
    <results>
        {
        let $dummy := xmldb:login("/db","webuser","password")
        for $subscription in met:active-subscriptions()
        return met:push-sms($subscription)
        }
    </results>
};
This task could be scheduled to run after the caching task has run or the caching script modified to invoke the subscription task when it has completed. However eXist also supports triggers so the task could also be triggered by the database event raised when the forecast file store has been completed.
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

let $login := xmldb:login("/db","user","password")
let $text := normalize-space(request:get-parameter("text",()))
let $number := request:get-parameter("from",())
let $subscription := met:get-subscription($number)
return
    if (exists($subscription))
    then
        let $update :=
            if ($text = "on")
            then update replace $subscription/status with <status>on</status>
            else if ($text = "off")
            then update replace $subscription/status with <status>off</status>
            else if (lower-case($text) = met:area-names())
            then (
                update replace $subscription/area with <area>{$text}</area>,
                update replace $subscription/status with <status>on</status>
            )
            else ()
        return
            let $subscription := met:get-subscription($number) (: get the subscription post update :)
            return concat("Reply: forecast is ",$subscription/status," for area ",$subscription/area)
    else ()
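The command handling in this script can be separated from the database updates and exercised on its own. The following is a minimal sketch, assuming a hypothetical local:parse-command function and a fixed list of area names in place of met:area-names(). Unlike the script, it lower-cases the text before every comparison; that is a design choice of this sketch, not the original behaviour.

```xquery
xquery version "1.0";

(: Hypothetical helper: classify an incoming SMS text as a command :)
declare function local:parse-command($text as xs:string, $areas as xs:string*)
  as element(command) {
  let $word := lower-case(normalize-space($text))
  return
    if ($word = ("on", "off")) then
      <command status="{$word}"/>
    else if ($word = $areas) then
      (: naming an area selects it and switches messaging on :)
      <command status="on" area="{$word}"/>
    else
      <command error="unrecognised"/>
};

for $text in ("ON", " Lundy ", "hello")
return local:parse-command($text, ("lundy", "fastnet"))
```

The three sample texts produce a status command, an area command for lundy, and an error element respectively. Returning a small command element keeps the decision logic free of side effects, so the update statements can be applied in one place afterwards.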
Twitter
Twitter [19] has a simple REST API to update the status. We can use this to tweet the forecasts to a Twitter account. Twitter uses Basic Access Authentication, and a suitable XQuery function to send a message to a username/password, using the eXist httpclient module, is:
declare function met:send-tweet($username as xs:string, $password as xs:string, $tweet as xs:string) as xs:boolean {
    let $uri := xs:anyURI("http://twitter.com/statuses/update.xml")
    let $content := concat("status=", encode-for-uri($tweet))
    let $headers :=
        <headers>
            <header name="Authorization" value="Basic {util:string-to-binary(concat($username,":",$password))}"/>
            <header name="Content-Type" value="application/x-www-form-urlencoded"/>
        </headers>
    let $response := httpclient:post($uri, $content, false(), $headers)
    return
        $response/@statusCode='200'
};
A script is needed to access the stored forecast and tweet the forecast for an area. Different Twitter accounts could be set up for each shipping area. The script will need to be scheduled to run after the full forecast has been acquired. In this example, the forecast for a given area is tweeted to a hard-coded twitterer:
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

declare variable $username := "kitwallace";
declare variable $password := "mypassword";
declare variable $area := request:get-parameter("area","lundy");

let $forecast := met:get-stored-forecast($area)
let $time := datetime:format-dateTime($forecast/../@at,"HH:mm")
let $message := concat($area," at ",$time,":",met:abbreviate(met:forecast-as-text($forecast)))
return
    <result>{met:send-tweet($username,$password,$message)}</result>
Chris Wallace's Twitter [20]
To do
Creating and editing subscriptions
This task is ideal for XForms.
Triggers
Use a trigger to push the SMS messages when update has been done.
References
[1] http://www.metoffice.gov.uk/weather/marine/shipping_forecast.html
[2] http://www.bbc.co.uk/weather/coast/shipping/index.shtml
[3] http://www.adoptioncurve.net/archives/2008/03/twittering-the-shipping-forecast.php
[4] http://www.metoffice.gov.uk/lib/includes/marine/gale_and_shipping_table.js
[5] http://www.cems.uwe.ac.uk/xmlwiki/Met/shipping.xq
[6] http://www.cems.uwe.ac.uk/xmlwiki/Met/shipping.xq?area=Fastnet
[7] http://www.cems.uwe.ac.uk/xmlwiki/Met/shippingdictionary.xml
[8] http://www.cems.uwe.ac.uk/xmlwiki/Met/shippingabbrev.xq
[9] http://www.cems.uwe.ac.uk/xmlwiki/Met/shippingabbrev.xq?area=Fastnet
[10] http://www.cems.uwe.ac.uk/xmlwiki/Met/shippingfull.xq
[11] http://www.cems.uwe.ac.uk/xmlwiki/Met/met2SMS1.xq?text=lundy
[12] http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html
[13] http://www.cems.uwe.ac.uk/xmlwiki/Met/met2SMS.xq?text=Lundy
[14] http://www.opensymphony.com/quartz/
[15] http://www.users.zetnet.co.uk/tempusfugit/marine/area_coord.htm
[16] http://www.cems.uwe.ac.uk/xmlwiki/Met/forecast2kml.xq
[17] http://maps.google.co.uk/maps?q=http://www.cems.uwe.ac.uk/xmlwiki/Met/forecast2kml.xq
[18] http://www.cems.uwe.ac.uk/xmlwiki/../Wiki2/shippingsubscriptions.xml
[19] http://twitter.com
[20] http://twitter.com/kitwallace
Method
We will use the compression:unzip() function used in the prior example and pass it a local version of the function that handles the uncompression.
File Names
Some file names in docx files, such as '[Content_Types].xml', are not valid URIs, so these must be renamed to files with valid URIs. Here is a typical list of the path names in a docx file:
<item path="[Content_Types].xml" type="resource">Types</item>
<item path="_rels/.rels" type="resource">Relationships</item>
<item path="word/_rels/document.xml.rels" type="resource">Relationships</item>
<item path="word/document.xml" type="resource">w:document</item>
<item path="word/theme/theme1.xml" type="resource">a:theme</item>
<item path="word/settings.xml" type="resource">w:settings</item>
<item path="word/fontTable.xml" type="resource">w:fonts</item>
<item path="word/webSettings.xml" type="resource">w:webSettings</item>
<item path="docProps/app.xml" type="resource">Properties</item>
<item path="docProps/core.xml" type="resource">cp:coreProperties</item>
<item path="word/styles.xml" type="resource">w:styles</item>
Note that three subfolders are created (_rels, word and docProps). The XML files are stored in these folders.
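The effect of encoding a file name can be illustrated with the standard encode-for-uri() function. This is only a sketch of the idea: the unzip-docx function in this article uses eXist's xmldb:encode(), whose exact output may differ from the standard function shown here.

```xquery
xquery version "1.0";

(: Compare a problematic docx entry name with an unproblematic one :)
for $name in ("[Content_Types].xml", "document.xml")
return
  <name original="{$name}" encoded="{encode-for-uri($name)}"/>
```

The square brackets are percent-encoded, giving %5BContent_Types%5D.xml, while document.xml is returned unchanged.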
unzip-docx function
The following function is used to unzip a docx file. The function is passed as a parameter to the unzip function to tell it what to do with each entry in the docx file. Note that you must pass in parameters to this function from the calling function. unzip-docx function:
declare function local:unzip-docx($path as xs:string, $data-type as xs:string, $data as item()?, $param as item()*) {
    if ($param[@name eq 'list']/@value eq 'true') then
        <item path="{$path}" data-type="{$data-type}"/>
    else
        let $base-collection := $param[@name="base-collection"]/@value/string()
        let $zip-collection :=
            concat(
                functx:substring-before-last($param[@name="zip-filename"]/@value, '.'),
                functx:substring-after-last($param[@name="zip-filename"]/@value, '.'),
                '_parts/'
            )
        let $inner-collection := functx:substring-before-last($path, '/')
        let $filename := if (contains($path, '/')) then functx:substring-after-last($path, '/') else $path
        (: we need to encode the filename to account for filenames with illegal characters like [Content_Types].xml :)
        let $filename := xmldb:encode($filename)
        let $target-collection := concat($base-collection, $zip-collection, $inner-collection)
        let $mkdir :=
            if (xmldb:collection-available($target-collection)) then ()
            else xmldb:create-collection($base-collection, concat($zip-collection, $inner-collection))
        let $store :=
            (: ensure mimetype is set properly for .docx rels files :)
            if (ends-with($filename, '.rels')) then
                xmldb:store($target-collection, $filename, $data, 'application/xml')
            else
                xmldb:store($target-collection, $filename, $data)
        return
            <result object="{$path}" destination="{concat($target-collection, '/', $filename)}"/>
};
unzip function
declare function local:unzip($base-collection as xs:string, $zip-filename as xs:string, $action as xs:string) {
    if (not($action = ('list', 'unzip'))) then
        <error>Invalid action</error>
    else
        let $file := util:binary-doc(concat($base-collection, $zip-filename))
        let $entry-filter := util:function(QName("local", "local:unzip-entry-filter"), 3)
        let $entry-filter-params := ()
        let $entry-data := util:function(QName("local", "local:unzip-docx"), 4)
        let $entry-data-params :=
            (
            if ($action eq 'list') then <param name="list" value="true"/> else (),
            <param name="base-collection" value="{$base-collection}"/>,
            <param name="zip-filename" value="{$zip-filename}"/>
            )
        let $login := xmldb:login('/db', 'admin', '')
        (: recursion :)
        let $unzip := compression:unzip($file, $entry-filter, $entry-filter-params, $entry-data, $entry-data-params)
        return
            <results action="{$action}">{$unzip}</results>
};
Sample Driver
let $collection := '/db/test/'
let $zip-filename := 'hello-world.docx'
let $action := 'unzip' (: valid actions: 'list', 'unzip' :)
return
    local:unzip($collection, $zip-filename, $action)
Method
We will use the XQuery update statement to change the values of XML documents and note how the default output changes with respect to how the default namespaces and namespace prefixes are rendered. XQuery updates can impact how namespaces are rendered.
Example
Suppose we have a task that includes a default namespace, like the following XML document.
Example XML document using a default namespace:
<task xmlns="http://www.example.com/task">
    <id></id>
    <task-name>Task Name</task-name>
    <task-description>Task Description</task-description>
</task>
In the following example we refer to this XML document as $doc. Most people who want to view XML prefer to use a default namespace and not clutter the entire document with unnecessary prefixes.
Suppose we have just saved this file and we now want to add an ID value to the <id> element. After we update this XML file with the following update statement, we note that the serialization changes.
update replace $doc/task:task/task:id with <task:id>123</task:id>
The next time we view this document we see the following:
<task:task xmlns:task="http://www.example.com/task">
    <task:id>123</task:id>
    <task:task-name>Task Name</task:task-name>
    <task:task-description>Task Description</task:task-description>
</task:task>
This is known as a "fully qualified" document, in which the namespace prefix of every element is shown explicitly. It is technically equivalent to the prior example, but it may not be the rendering you would like.
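The claim that the two serializations are equivalent can be checked with the standard deep-equal() function, which compares expanded element names and content and ignores prefixes. A minimal sketch, using shortened copies of the two documents:

```xquery
xquery version "1.0";

declare namespace task = "http://www.example.com/task";

(: Same content, once with a default namespace and once with prefixes :)
let $default :=
  <task xmlns="http://www.example.com/task">
    <id>123</id>
    <task-name>Task Name</task-name>
  </task>
let $prefixed :=
  <task:task>
    <task:id>123</task:id>
    <task:task-name>Task Name</task:task-name>
  </task:task>
return deep-equal($default, $prefixed)
```

The query returns true: only the rendering differs, so queries that address the elements through a declared prefix work equally well against either form.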
Acknowledgments
Joe Wicentowski was kind enough to make these observations and provide samples.
Uploading Files
Motivation
You want to upload files to your eXist database using simple HTML forms.
Method
We will use the HTML <input> element in the web form and the store function in an XQuery.
HTML Form
We will use a standard HTML form, but we will add an enctype="multipart/form-data" attribute.
<form enctype="multipart/form-data" method="post" action="upload-document.xq">
    <fieldset>
        <legend>Upload Document:</legend>
        <input type="file" name="file"/>
        <input type="submit" value="Upload"/>
    </fieldset>
</form>
Screen Image:
XQuery
On the server side, we will use the request:get-uploaded-file-name() function to get the name of the incoming file and the request:get-uploaded-file-data() function to get the data from the file. We can then use the xmldb:store() function to save the file. File: upload-document.xq
let $collection := '/db/test/upload-test'
let $filename := request:get-uploaded-file-name('file')

(: make sure you log in as a user that has write access to this collection :)
let $login := xmldb:login($collection, 'admin', 'my-admin-password')
let $store := xmldb:store($collection, $filename, request:get-uploaded-file-data('file'))

return
    <results>
        <message>File {$filename} has been stored at collection={$collection}.</message>
    </results>
Acknowledgments
This example was posted on the eXist open mailing list by Rémi Arnaud on Nov. 05, 2010.
Uptime monitor
Motivation
You would like to monitor the service availability of several web sites or web services. You would like to do this all with XQuery and store the results in XML files. You would also like to see "dashboard" graphical displays of uptime. There are several commercial services (Pingdom [1], Host-tracker [2]) that monitor web sites in terms of uptime and response time. Although the production of a reliable service requires a network of servers, the basic functionality can be performed using XQuery in a few scripts.
Method
This approach focuses on the uptime and response time of web pages. The core approach is to use the eXist job scheduler to execute an XQuery script at regular time intervals. This script performs an HTTP GET on a URI and records the statusCode of the site in an XML data file. The operation is timed to gather response times from elapsed time (valid on a lightly used server) and the test results are stored. Reports can then be run from the test results and alerts sent when a site is observed to be down. Even though this is a prototype, the access to fine-grained data has already revealed some response time issues on one of the sites at the University.
Watch list [3]
Conceptual Model
This ER model was created in QSEE, which can also generate SQL or XSD.
In this notation the bar indicates that Test is a weak entity with existence dependence on Watch.
By Inference
This schema has been generated by Trang (in Oxygen) from an example document, created as the system runs.
Compact Relax NG
element Monitor {
    element Watch {
        element uri { xsd:anyURI },
        element name { text },
        element Log {
            element Test {
                attribute at { xsd:dateTime },
                attribute responseTime { xsd:integer },
                attribute statusCode { xsd:integer }
            }+
        }
    }+
}
XML Schema
XML Schema [5]
Designed Schema
Editing the QSEE generated schema results in a schema which includes the restriction on statusCodes.
XML Schema [6]
Test Data
An XQuery script transforms an XML Schema (or a subset thereof) to a random instance of a conforming document.
Random Document [7]
The constraint that Tests are in ascending order of the attribute at is not defined in this schema. The generator needs to be helped to generate useful test data by additional information about the length of strings and the probability distribution of enumerated values, iterations and optional elements.
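Since the ordering constraint on Tests cannot be stated in the schema, it could be checked with a query instead. The following is a minimal sketch using a hypothetical three-entry log in which the last test is out of order; it is an illustration of the constraint, not part of the monitor module.

```xquery
xquery version "1.0";

(: Hypothetical log; the third test is earlier than the second :)
let $log :=
  <Log>
    <Test at="2009-03-01T10:00:00" responseTime="120" statusCode="200"/>
    <Test at="2009-03-01T10:10:00" responseTime="95" statusCode="200"/>
    <Test at="2009-03-01T10:05:00" responseTime="310" statusCode="500"/>
  </Log>
let $tests := $log/Test
return
  (: every test must be no earlier than the one before it :)
  every $i in 2 to count($tests)
  satisfies xs:dateTime($tests[$i]/@at) ge xs:dateTime($tests[$i - 1]/@at)
```

For this sample the query returns false; with the third test removed it returns true.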
ALTER TABLE Test
    ADD INDEX (uri),
    ADD CONSTRAINT fk1_Test_to_Watch FOREIGN KEY(uri) REFERENCES Watch(uri)
        ON DELETE RESTRICT ON UPDATE RESTRICT;
In the relational implementation the primary key uri of Watch is the foreign key of Test. There would be an advantage to adding a system-generated id to use in place of this meaningful URI, both to remove the redundancy created and to reduce the size of the foreign key. However, a mechanism is then needed to allocate unique ids.
Implementation
Dependencies
eXistdb modules:
- xmldb - for database update and login
- datetime - for date formatting
- util - for the system-time function
- httpclient - for HTTP GET
- scheduler - to schedule the monitoring task
- validation - for database validation
Functions
Functions in a single XQuery module. module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor";
Database Access
Access to the Monitor database, which may be a local database document or a remote document.
declare function monitor:get-watch-list($base as xs:string) as element(Watch)* {
    doc($base)/Monitor/Watch
};
Further references to a Watch are by reference. e.g. declare function monitor:get-watch-by-uri($base as xs:string, $uri as xs:string) as element(Watch)* {
Executing Tests
The test does an HTTP GET on the uri. The GET is bracketed by calls to util:system-time() to compute the elapsed wall-clock time in milliseconds. The test report includes the statusCode.
declare function monitor:run-test($watch as element(Watch)) as element(Test) {
    let $uri := $watch/uri
    let $start := util:system-time()
    let $response := httpclient:get(xs:anyURI($uri),false(),())
    let $end := util:system-time()
    let $runtimems := (($end - $start) div xs:dayTimeDuration('PT1S')) * 1000
    let $statusCode := string($response/@statusCode)
    return
        <Test at="{current-dateTime()}" responseTime="{$runtimems}" statusCode="{$statusCode}"/>
};
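The millisecond calculation relies on dividing one duration by another, which yields a plain number. A minimal sketch, with two fixed timestamps standing in for the calls to util:system-time() (the values are invented for illustration):

```xquery
xquery version "1.0";

(: Fixed timestamps stand in for the two clock readings :)
let $start := xs:dateTime("2009-03-01T10:00:00.000Z")
let $end := xs:dateTime("2009-03-01T10:00:01.250Z")
(: duration div duration gives the elapsed time as a number of seconds :)
let $seconds := ($end - $start) div xs:dayTimeDuration("PT1S")
return $seconds * 1000
```

The query evaluates to 1250, that is 1.25 seconds expressed in milliseconds.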
The generated test is appended to the end of the log:
declare function monitor:put-test($watch as element(Watch), $test as element(Test)) {
    update insert $test into $watch/Log
};
To execute the test, a script logs in, iterates through the Watch entities and, for each, executes the test and stores the result:
import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";

let $login := xmldb:login("/db/","user","password")
let $base := "/db/Wiki/Monitor3/Monitor.xml"
for $watch in monitor:get-watch-list($base)
let $test := monitor:run-test($watch)
let $update := monitor:put-test($watch,$test)
return $update
Job scheduling
A job is scheduled to run this script every 5 minutes:
let $login := xmldb:login("/db","user","password")
return scheduler:schedule-xquery-cron-job("/db/Wiki/Monitor/runTests.xq", "0 0/5 * * * ?")
Index page
The index page is based on a supplied Monitor document, by default the production database.
import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";
declare option exist:serialize "method=xhtml media-type=text/html";
declare variable $base := request:get-parameter("base","/db/Wiki/Monitor3/Monitor.xml");

<html>
    <head>
        <title>{$heading}</title>
    </head>
    <body>
        <h1>{$heading}</h1>
        <ul>
            {for $watch in monitor:get-watch-list($base)
             return
                <li>{string($watch/name)}
In this implementation, the URI of the monitor document is passed to dependent scripts in the URI. An alternative would be to pass this data via a session variable.
View [3]
Reporting
Reporting draws on the log of Tests for a Watch:
declare function monitor:get-tests($watch as element(Watch)) as element(Test)* {
    $watch/Log/Test
};
Overview Report
The basic report shows summary data about the watched URI and an embedded chart of response time over time. Up-time is the ratio of tests with a status code of 200 to the total number of tests.
import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";
declare option exist:serialize "method=xhtml media-type=text/html";

let $tests := monitor:get-tests($watch)
let $countAll := count($tests)
let $uptests := $tests[@statusCode="200"]
let $last24hrs := $tests[position() > ($countAll - 24 * 12)]
let $heading := concat("Performance results for ", string($watch/name))
return
<html>
    <head>
        <title>{$heading}</title>
    </head>
    <body>
        <h3><a href="index.xq">Index</a></h3>
        <h1>{$heading}</h1>
        <h2><a href="{$watch/uri}">{string($watch/uri)}</a></h2>
        {if (empty($tests))
         then ()
         else
         <div>
            <table border="1">
                <tr>
                    <th>Monitoring started</th>
                    <td>{datetime:format-dateTime($tests[1]/@at,"EE dd/MM HH:mm")}</td>
                </tr>
                <tr>
                    <th>Latest test</th>
                    <td>{datetime:format-dateTime($tests[last()]/@at,"EE dd/MM HH:mm")}</td>
                </tr>
                <tr>
                    <th>Minimum response time</th>
                    <td>{min($tests/@responseTime)} ms</td>
                </tr>
                <tr>
                    <th>Average response time</th>
                    <td>{round(sum($tests/@responseTime) div count($tests))} ms</td>
                </tr>
                <tr>
                    <th>Maximum response time</th>
                    <td>{max($tests/@responseTime)} ms</td>
                </tr>
                <tr>
                    <th>Uptime</th>
                    <td>{round(count($uptests) div count($tests) * 100)} %</td>
                </tr>
                <tr>
                    <th>Raw Data</th>
                    <td><a href="testData.xq?base={encode-for-uri($base)}&amp;uri={encode-for-uri($uri)}">View</a></td>
                </tr>
                <tr>
                    <th>Response Distribution</th>
                    <td><a href="responseDistribution.xq?base={encode-for-uri($base)}&amp;uri={encode-for-uri($uri)}">View</a></td>
                </tr>
            </table>
            <h2>Last 24 hours</h2>
            {monitor:responseTime-chart($last24hrs)}
            <h2>1 hour averages</h2>
            {monitor:responseTime-chart(monitor:average($tests,12))}
View [8]
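The summary figures in the overview report can be reproduced on a small scale. The following is a minimal sketch over a hypothetical four-entry log, using the same expressions for uptime and average response time as the report; the summary element is invented for illustration.

```xquery
xquery version "1.0";

(: Hypothetical log: three successful tests and one failure :)
let $log :=
  <Log>
    <Test at="2009-03-01T10:00:00" responseTime="120" statusCode="200"/>
    <Test at="2009-03-01T10:05:00" responseTime="80" statusCode="200"/>
    <Test at="2009-03-01T10:10:00" responseTime="300" statusCode="500"/>
    <Test at="2009-03-01T10:15:00" responseTime="100" statusCode="200"/>
  </Log>
let $tests := $log/Test
let $uptests := $tests[@statusCode = "200"]
return
  <summary tests="{count($tests)}"
           uptime="{round(count($uptests) div count($tests) * 100)}"
           average="{round(sum($tests/@responseTime) div count($tests))}"/>
```

For this log the uptime is 75 (percent) and the average response time is 150 (ms). With a test every 5 minutes there are 12 tests per hour, which is why the report selects the last 24 * 12 tests for the 24-hour chart.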
This grouped distribution can then be charted as a bar chart. Scaling is needed in this case.
declare function monitor:distribution-chart($distribution as element(Distribution)) as element(img) {
    let $maxcount := max($distribution/Group/@count)
    let $scale := 100 div $maxcount
    let $points := string-join($distribution/Group/xs:string($scale * @count),",")
    let $chartType := "bvs"
    let $chartSize := "300x200"
    let $uri := concat("http://chart.apis.google.com/chart?",
                       "cht=",$chartType,"&amp;chs=",$chartSize,"&amp;chd=t:",$points)
    return
        <img src="{$uri}"/>
};
Finally, a script to create a page:
import module namespace monitor = "http://www.cems.uwe.ac.uk/xmlwiki/monitor" at "monitor.xqm";
declare option exist:serialize "method=xhtml media-type=text/html";

let $base := request:get-parameter("base",())
let $uri := request:get-parameter("uri",())
let $watch := monitor:get-watch($base,$uri)
let $tests := monitor:get-tests($watch)
let $heading := concat("Distribution for ", string($watch/name))
let $distribution := monitor:response-distribution($tests)
return
<html>
    <head>
        <title>{$heading}</title>
    </head>
    <body>
        <h1>{$heading}</h1>
        {monitor:distribution-chart($distribution)}
        <br/>
        <table border="1">
            <tr>
                <th>I</th>
                <th>Mid</th>
                <th>Count</th>
            </tr>
            {for $group in $distribution/Group
             return
                <tr>
                    <td>{string($group/@i)}</td>
                    <td>{string($group/@mid)}</td>
                    <td>{string($group/@count)}</td>
                </tr>
            }
        </table>
    </body>
</html>
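The scaling step in the chart function maps the largest group count to 100 and the others in proportion. A minimal sketch with a hypothetical three-group distribution (the counts are invented for illustration):

```xquery
xquery version "1.0";

(: Hypothetical distribution with a largest count of 20 :)
let $distribution :=
  <Distribution>
    <Group i="1" mid="50" count="5"/>
    <Group i="2" mid="150" count="20"/>
    <Group i="3" mid="250" count="10"/>
  </Distribution>
let $maxcount := max($distribution/Group/@count)
(: scale so that the tallest bar has the value 100 :)
let $scale := 100 div $maxcount
return
  string-join(
    for $group in $distribution/Group
    return xs:string($scale * $group/@count),
    ",")
```

The result is the string 25,100,50, which is the form of the data series appended to the chart URI after chd=t:.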
Validation
The eXist validation module provides functions for validating a document against a schema. The Monitor document links to a schema:
let $doc := "/db/Wiki/Monitor3/Monitor.xml"
return
    <report>
        <document>{$doc}</document>
        {validation:validate-report(doc($doc))}
    </report>
Execute [9]
Alternatively, a document can be validated against any schema:
let $schema := "http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/trangmonitor.xsd"
let $doc := "/db/Wiki/Monitor3/Monitor.xml"
return
    <report>
        <document>{$doc}</document>
        <schema>{$schema}</schema>
        {validation:validate-report(doc($doc),xs:anyURI($schema))}
    </report>
Execute [10]
This is used to check that the randomly generated instance is valid:
let $schema := request:get-parameter("schema",())
let $file := doc(concat("http://www.cems.uwe.ac.uk/xmlwiki/XMLSchema/schema2instance.xq?file=",$schema))
return
    <result>
        <schema>{$schema}</schema>
        {validation:validate-report($file,xs:anyURI($schema))}
        {$file}
    </result>
Execute [11]
Downtime alerts
The purpose of a monitor is to alert those responsible for a site to its failure. Such an alert might be by SMS, email or some other channel. The Watch entity will need to be augmented with configuration parameters.
Check if failed
First it is necessary to calculate whether the site is down. monitor:failing() returns true() if none of the tests in the past $watch/failMinutes minutes returned a statusCode of 200.
declare function monitor:failing($watch as element(Watch)) as xs:boolean {
    let $now := current-dateTime()
    let $lastTestTime := $now - $watch/failMinutes * xs:dayTimeDuration("PT1M")
    let $recentTests := $watch/Log/Test[@at > $lastTestTime]
    return
        every $t in $recentTests satisfies not($t/@statusCode = "200")
};
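The failure check can be exercised without waiting for real tests by passing the current time in as a parameter. The following is a minimal sketch with a hypothetical local:failing function and an invented Watch element; it follows the same logic but casts the values explicitly.

```xquery
xquery version "1.0";

(: Hypothetical variant of the check that takes "now" as a parameter :)
declare function local:failing($watch as element(Watch), $now as xs:dateTime)
  as xs:boolean {
  let $cutoff := $now - xs:integer($watch/failMinutes) * xs:dayTimeDuration("PT1M")
  let $recent := $watch/Log/Test[xs:dateTime(@at) gt $cutoff]
  return every $t in $recent satisfies not($t/@statusCode = "200")
};

let $watch :=
  <Watch>
    <failMinutes>15</failMinutes>
    <Log>
      <Test at="2009-03-01T09:40:00Z" statusCode="200"/>
      <Test at="2009-03-01T09:50:00Z" statusCode="500"/>
      <Test at="2009-03-01T09:55:00Z" statusCode="0"/>
    </Log>
  </Watch>
return local:failing($watch, xs:dateTime("2009-03-01T10:00:00Z"))
```

The query returns true, because both tests inside the 15-minute window failed; the successful test at 09:40 falls outside it. Note that an every expression over an empty sequence is also true, so a watch with no recent tests at all is reported as failing.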
return
    if (monitor:failing($watch) and not(monitor:alert-sent($watch)))
    then
        let $update := update insert <Alert at="{current-dateTime()}"/> into $watch/Log
        let $alert := monitor:send-alert($watch,$message)
        return true()
    else false()
Discussion
Alert events could be added to a separate AlertLog but it is arguably easier to add a new class of Events than create a separate sequence for each. There may also be cases where the sequential relationship between Tests and Events is useful. [ Re-designed Schema]
To do
- add create/edit Watch
- detect missing tests
- support analysis for date ranges by filtering tests by date prior to analysis
- improve the appearance of the charts
References
[1] http://www.pingdom.com/
[2] http://host-tracker.com/
[3] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/index.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/qseemonitor.xsd
[5] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/trangmonitor.xsd
[6] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/designmonitor.xsd
[7] http://www.cems.uwe.ac.uk/xmlwiki/XMLSchema/schema2instance.xq?file=http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/designmonitor.xsd
[8] http://www.cems.uwe.ac.uk/xmlwiki/Monitor/report.xq?uri=http://www.google.co.uk/
[9] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/validate.xq
[10] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/validateschema.xq
[11] http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/validaterandom.xq?schema=http://www.cems.uwe.ac.uk/xmlwiki/Monitor3/monitor.xsd
Method
This example will use a custom controller.xql file to do this. Sample controller.xql file:
(:
    Protected resource: user is required to log in with valid credentials.
    If the login fails or no credentials were provided, the request is redirected
    to the login.xml page.
:)
else if ($exist:resource eq 'protected.xml') then
    let $login := local:set-user()
    return
        if ($login) then
            <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
                {$login}
                <view>
                    <forward url="style.xql"/>
                </view>
            </dispatch>
        else
            <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
                <forward url="login.xml"/>
                <view>
                    <forward url="style.xql"/>
                </view>
            </dispatch>
else
    (: everything else is passed through :)
    <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
        <cache-control cache="yes"/>
    </dispatch>
Method
A typical URL in eXist has a format similar to the following:
http://www.example.com:8080/exist/rest/db/app/search.xq?q=apple
You want users to access this page through a cooler [1], less platform-dependent URL such as:
http://www.example.com/search?q=apple
In order to transform your URLs into the latter cool form, you need to understand the fundamentals of URLs in eXist.
Parts of a URL
Fundamentally, eXist's URLs consist of 3 parts:
1. The Hostname and Port: In the example above the hostname is www.example.com and the port is 8080
2. The Web Application Context: In the example above the context is /exist
3. The Path: In the example above the path is /rest/db/app/search.xq?q=apple
Customizing an eXist URL can mean targeting one or more of the 3 parts.
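The three parts can be picked out of the example URL with a regular expression. This is a minimal sketch using the standard replace() function; the pattern is an assumption of this illustration (it requires an explicit port and a single-segment context) and is not something eXist itself uses.

```xquery
xquery version "1.0";

let $url := "http://www.example.com:8080/exist/rest/db/app/search.xq?q=apple"
(: groups: 1 = hostname, 2 = port, 3 = context, 4 = remaining path :)
let $pattern := "^https?://([^:/]+):(\d+)(/[^/]+)(/.*)$"
return
  <url>
    <hostname>{replace($url, $pattern, "$1")}</hostname>
    <port>{replace($url, $pattern, "$2")}</port>
    <context>{replace($url, $pattern, "$3")}</context>
    <path>{replace($url, $pattern, "$4")}</path>
  </url>
```

For the example URL the four elements contain www.example.com, 8080, /exist and /rest/db/app/search.xq?q=apple.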
Rewriting Primer
Some methods below make use of eXist's URL-rewriting facility, which conceptually lets your application follow an MVC (model-view-controller) design. eXist 1.5 comes preconfigured with a working setup that embodies these principles:
1. The collection that lives below /db/myapp/, which is exposed through the REST servlet via /exist/rest/db/myapp/, can at the same time be reached through URL-rewriting in the location /exist/apps/myapp/.
2. Placing a controller.xql inside /db/myapp/ will determine how the data (the model) inside this collection is presented in the space created by URL-rewriting. In other words, it controls the view of the model.
Please read further below on how to configure URL-rewriting in version 1.4.1 of eXist to get the same setup.
Customizing URLs
Changing the Port
The port for eXist's default web server (Jetty) is 8080, and it is set in $EXIST_HOME/tools/jetty/etc/jetty.xml (line 51). You can modify this file, or you can set the port by passing the -Djetty.port=80 flag on startup. Note that how you change the port depends on how you start eXist. If you start eXist from bin/startup using a UNIX or DOS shell, you must change the startup.sh or startup.bat file. If you start eXist automatically using the UNIX tools/wrapper/exist.sh script or Windows Services, you need to change the jetty.xml file. Restart eXist. With this change made, your URL will now look like:
http://www.example.com/exist/rest/db/app/search.xq?q=apple
instead of:
http://www.example.com:8080/exist/rest/db/app/search.xq?q=apple
On Unix (including Mac OS X) and Linux, you will need to run eXist as root in order to bind to port 80. Otherwise the server won't start.
A basic controller.xql file that will accomplish this goal is as follows:
    xquery version "1.0";

    (:~
        Default controller XQuery. Forwards '/search' to search.xq in the same directory
        and passes all other requests through.
    :)

    (: Root path: forward to search.xq in the same collection (or directory) as the controller.xql :)
    if (starts-with($exist:path, '/search')) then
        let $query := request:get-parameter("q", ())
        return
            <dispatch xmlns="http://exist.sourceforge.net/NS/exist">
                <forward url="search.xq"/>
                <set-attribute name="q" value="{$query}"/>
            </dispatch>
    (: Let everything else pass through :)
    else
        <ignore xmlns="http://exist.sourceforge.net/NS/exist">
            <cache-control cache="yes"/>
        </ignore>
Note that $exist:path is a variable that eXist makes available to controller.xql files. Its value is always equal to the portion of the requested URL that comes after the controller's root directory. A request to '/search' will cause $exist:path to be '/search'.
Save this query as controller.xql and place it in your /db/app directory. Congratulations! Our URL is now in the very cool form we had envisioned:
    http://www.example.com/search?q=apple
instead of:
    http://www.example.com/search.xq?q=apple
This $exist:path variable is one of 5 such variables available to controller.xql files. (See the full URL Rewriting documentation for more information on each.) These variables give you very fine control over the URLs requested as well as eXist's own internal paths to your app's resources.
Since you may wish to re-route a URL request based on the URL parameters (e.g. q=apple), you may wish to retrieve the URL parameter using the request:get-parameter() function and then explicitly pass its value to the target query. The example controller.xql file does this with the <set-attribute> element; eXist also provides an <add-parameter> element for the same purpose.
Thus, in customizing the "path" section of the URL, we have actually paid attention to 3 items:
1. The root pattern and path to its root controller directory (recall the <root> element inside the controller-config.xml file)
2. The remainder of the path after the controller directory
3. The URL parameters included as part of the URL
This simple example only scratches the surface of what you can do with URL Rewriting. Using URL Rewriting not only gives your apps 'cool URLs', but also makes them much more portable, both within your own server and when moving them onto other servers.
Further considerations
Defining multiple 'roots'
If you want your main app to live in /db/app but you still want to access apps such as the admin app ('/webapp/admin') stored on the filesystem, add a <root> element to controller-config.xml declaring the root pattern you want to associate with the filesystem's /webapp directory. Replace your current root elements with the following:
    <root pattern="/fs" path="/"/>
    <root pattern="/*" path="xmldb:exist:///db/app"/>
This will pass all URL requests beginning with /fs to the filesystem's webapp directory. All other URLs will still go to the /db/app directory.
Variable Standards
The code inside controller.xql is passed some variables in addition to the usual ones. The controller.xql below does not do any forwarding; instead it prints their values and the path to the requested document, if there is one:
    xquery version "1.0";
    declare namespace exist="http://exist.sourceforge.net/NS/exist";
    import module namespace text="http://exist-db.org/xquery/text";
    declare variable $exist:root external;
    declare variable $exist:prefix external;
    declare variable $exist:controller external;
    declare variable $exist:path external;
    declare variable $exist:resource external;
Acknowledgments
Joe Wicentowski contributed the core of this article to the eXist-open mailing list on Mon, 19 Oct 2009. It was subsequently edited by Dan McCreary and Joe Wicentowski into its present form.
References
[1] http://www.w3.org/Provider/Style/URI
[2] http://exist-db.org/urlrewrite.html
MusicXML
MusicXML [1] is an XML application for recording music scores. There is a range of software which produces and consumes MusicXML. There are two styles of MusicXML with two related schemas, one in which measures are within parts (partwise), the other in which parts are within measures (timewise). An example of a MusicXML partwise score is Mozart's Piano Sonata in A Major, K. 331 [2].
Here is a sample definition of a note:
    <note>
        <pitch>
            <step>A</step>
            <octave>3</octave>
        </pitch>
        <duration>2</duration>
        <voice>3</voice>
        <type>eighth</type>
        <stem>down</stem>
        <staff>2</staff>
        <beam number="1">begin</beam>
        <notations>
            <slur type="stop" number="1"/>
        </notations>
    </note>
Notes Range
The Recordare site has some sample code to demonstrate the use of XQuery to process MusicXML [3]. The first script finds the lowest and highest notes in the score. The script shown on the site does not conform to the current XQuery standard, but a few minor changes bring it up to date.
    declare function local:MidiNote($thispitch as element(pitch)) as xs:integer
    {
        let $step := $thispitch/step
        let $alter :=
            if (empty($thispitch/alter)) then 0
            else xs:integer($thispitch/alter)
        let $octave := xs:integer($thispitch/octave)
        let $pitchstep :=
            if ($step = "C") then 0
            else if ($step = "D") then 2
            else if ($step = "E") then 4
            else if ($step = "F") then 5
            else if ($step = "G") then 7
            else if ($step = "A") then 9
            else if ($step = "B") then 11
            else 0
        return 12 * ($octave + 1) + $pitchstep + $alter
    };

    let $doc := doc("/db/Wiki/Music/examples/MozartPianoSonata.xml")
    let $part := $doc//part[./@id = "P1"]
    let $highnote := max(for $pitch in $part//pitch return local:MidiNote($pitch))
    let $lownote := min(for $pitch in $part//pitch return local:MidiNote($pitch))
    let $highpitch := $part//pitch[local:MidiNote(.) = $highnote]
    let $lowpitch := $part//pitch[local:MidiNote(.) = $lownote]
    let $highmeas := string($highpitch[1]/../../@number)
    let $lowmeas := string($lowpitch[1]/../../@number)
    return
        <result>
            <low-note>{$lowpitch[1]}
                <measure>{$lowmeas}</measure>
            </low-note>
            <high-note>{$highpitch[1]}
                <measure>{$highmeas}</measure>
            </high-note>
        </result>
With output:
    <result>
        <low-note>
            <pitch>
                <step>D</step>
                <octave>2</octave>
            </pitch>
            <measure>3</measure>
        </low-note>
        <high-note>
            <pitch>
                <step>E</step>
                <octave>6</octave>
            </pitch>
            <measure>5</measure>
        </high-note>
    </result>
execute [4]
Ancestor access
The path to the measure in which a note is located
    let $highmeas := string($highpitch[1]/../../@number)
uses a fixed set of steps back up the hierarchy. This limits the application of this script to one type of MusicXML schema, because the position of the measure in the hierarchy is different in the two schemas. When the script was written, the ancestor axis was not supported, but it is now, so those lines are more generally expressible as:
    let $highmeas := string($highpitch[1]/ancestor::measure/@number)
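The difference between the two path styles can be seen in a small self-contained sketch. The two miniature scores below are hypothetical stand-ins for partwise and timewise documents, not real MusicXML files, and the result element names are chosen only for illustration.

```xquery
xquery version "1.0";

(: Hypothetical miniature scores, one per MusicXML style :)
let $partwise :=
    <score-partwise>
        <part id="P1">
            <measure number="3">
                <note><pitch><step>D</step><octave>2</octave></pitch></note>
            </measure>
        </part>
    </score-partwise>
let $timewise :=
    <score-timewise>
        <measure number="3">
            <part id="P1">
                <note><pitch><step>D</step><octave>2</octave></pitch></note>
            </part>
        </measure>
    </score-timewise>
for $score in ($partwise, $timewise)
let $pitch := ($score//pitch)[1]
return
    <measure-lookup style="{name($score)}">
        <!-- two fixed parent steps: reaches the measure only in the partwise layout -->
        <fixed-steps>{string($pitch/../../@number)}</fixed-steps>
        <!-- ancestor axis: reaches the measure in both layouts -->
        <ancestor-axis>{string($pitch/ancestor::measure/@number)}</ancestor-axis>
    </measure-lookup>
```

In the timewise case the two parent steps land on the part element, which has no number attribute, so fixed-steps comes out empty while ancestor-axis still yields 3.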
Note-to-midi
The function to convert notes to midi numbers uses nested if-then-else expressions. XQuery 1.0 lacks a switch expression that might otherwise be used here; a clearer approach is to use a lookup table, defined either locally in the script or stored in the database. Here a sequence of notes is created as a lookup table. This is bound to a global variable which is used in a revised note-to-midi function:
    declare variable $NOTESTEP := (
        <note name="C" stepNo="0"/>,
        <note name="D" stepNo="2"/>,
        <note name="E" stepNo="4"/>,
        <note name="F" stepNo="5"/>,
        <note name="G" stepNo="7"/>,
        <note name="A" stepNo="9"/>,
        <note name="B" stepNo="11"/>
    );

    declare function local:MidiNote($thispitch as element(pitch)) as xs:integer
    {
        let $alter := xs:integer(($thispitch/alter,0)[1])
        let $octave := xs:integer($thispitch/octave)
        let $pitchstepNo := xs:integer($NOTESTEP[@name = $thispitch/step]/@stepNo)
        return 12 * ($octave + 1) + $pitchstepNo + $alter
    };
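As a worked check of the formula 12 * (octave + 1) + step + alter, the following minimal sketch applies the lookup-table approach to two inline pitches. The names $local:NOTESTEP and local:midi are local to this sketch, and the two pitch elements are made-up inputs rather than notes taken from the Mozart score.

```xquery
xquery version "1.0";

(: Lookup table from note name to semitone offset within the octave :)
declare variable $local:NOTESTEP := (
    <note name="C" stepNo="0"/>,
    <note name="D" stepNo="2"/>,
    <note name="E" stepNo="4"/>,
    <note name="F" stepNo="5"/>,
    <note name="G" stepNo="7"/>,
    <note name="A" stepNo="9"/>,
    <note name="B" stepNo="11"/>
);

declare function local:midi($pitch as element(pitch)) as xs:integer {
    (: a missing alter element counts as 0 :)
    let $alter := xs:integer(($pitch/alter, 0)[1])
    let $octave := xs:integer($pitch/octave)
    let $stepNo := xs:integer($local:NOTESTEP[@name = $pitch/step]/@stepNo)
    return 12 * ($octave + 1) + $stepNo + $alter
};

(
    (: A in octave 3: 12 * 4 + 9 + 0 = 57 :)
    local:midi(<pitch><step>A</step><octave>3</octave></pitch>),
    (: C sharp in octave 4: 12 * 5 + 0 + 1 = 61 :)
    local:midi(<pitch><step>C</step><alter>1</alter><octave>4</octave></pitch>)
)
```

Keeping the table in a variable means that a change to the mapping touches one place only, and the same table could equally be stored as a document in the database.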
Intermediate XML
The original script required repeated access to the original MusicXML source. An alternative approach would be to create an intermediate structure to hold the midi notes and use this in subsequent analysis. This structure is a computed view of the original notes augmented with derived data - the midi note and the measure.
    let $midiNotes :=
        for $pitch in $part//pitch
        return
            <pitch>
                {$pitch/*}
                <midi>{local:MidiNote($pitch)}</midi>
                <measure>{string($pitch/../../@number)}</measure>
            </pitch>
and this view is then used to locate the high and low notes and their position in the score:
    let $highnote := max($midiNotes/midi)
    let $lownote := min($midiNotes/midi)
    let $highpitch := $midiNotes[midi = $highnote]
    let $lowpitch := $midiNotes[midi = $lownote]
Revised script
    declare variable $NOTESTEP := (
        <note name="C" step="0"/>,
        <note name="D" step="2"/>,
        <note name="E" step="4"/>,
        <note name="F" step="5"/>,
execute [5]
Discussion
Although arguably a cleaner, more direct design, the second script relies on the construction of temporary XML nodes which are then the subject of XPath expressions. These temporary XML nodes are handled differently in different implementations. In older versions of eXist, each is written to a temporary document in the database, which creates a performance overhead and garbage collection problems. In the 1.3 release, intermediate XML nodes remain in memory, resulting in a major performance improvement. There is, however, another problem with this approach: the size of the intermediate node may exceed the pre-set, but configurable, limits on the size of constructed nodes.
References
[1] http://www.recordare.com/xml.html
[2] http://www.cems.uwe.ac.uk/xmlwiki/Music/examples/MozartPianoSonata.xml
[3] http://www.recordare.com/good/max2002%2Dupdate.html
[4] http://www.cems.uwe.ac.uk/xmlwiki/Music/noterange1.xq
[5] http://www.cems.uwe.ac.uk/xmlwiki/Music/noterange3.xq
Method
We will create a trigger that fires on all document store operations. The trigger will modify the document to be stored in the database. Our identifier will be taken from the output of the util:uuid() function. Because each UUID is generated independently, a simple assignment suffices; no central authority, such as an incremental counter, is necessary.
System Configuration
This example assumes that the documents that you want to tag with identifiers live below /db/my-collection. eXist triggers are declared in a configuration file that is placed in the /db/system/config area with the above path added to it. Such a file with the relevant lines will look like this:
/db/system/config/db/my-collection/collection.xconf
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <triggers>
            <trigger event="store" class="org.exist.collections.triggers.XQueryTrigger">
                <parameter name="url" value="xmldb:exist://localhost/db/triggers/assign-id.xq"/>
            </trigger>
        </triggers>
    </collection>
Now every time a store or update event happens to this collection the XQuery script /db/triggers/assign-id.xq gets run.
Beware! Due to limitations of the current (<=1.5dev) design, you cannot attach more than one XQuery script to the same trigger event. Only the trigger declared last for an event will be used.
XQuery Script
The script will add the uuid as an attribute to the root element of the incoming document, overwriting any uuid attribute that is already there.
NOTE: These examples do not work reliably. As soon as your XQuery causes an exception in the thread that it runs in, there is a great chance that it will hang indefinitely, e.g. if you store a binary resource below the path it works on. Further operations on the processed resource will then NOT trigger the script until the whole database is restarted.
    xquery version "1.0";

    (: An XQueryTrigger that adds a uuid to all documents when they are stored in the database. :)

    declare namespace util="http://exist-db.org/xquery/util";

    declare variable $local:triggerEvent external;
    declare variable $local:eventType external;
    declare variable $local:collectionName external;
    declare variable $local:documentName external;
    declare variable $local:document external;

    declare variable $local:coll := "/db/my-collection";
    declare variable $local:uuid := string($local:document/@uuid);
    declare variable $local:match := collection($local:coll)/*[@uuid = $local:uuid];

    (: This is still the xquery prolog: from my experiments, an xquery trigger
       MUST NOT have an xquery body. A severe limit: no conditionals allowed,
       just straight procedural action. :)

    util:log('debug', '### assign-id.xq trigger fired ###'),
    update insert attribute {'uuid'} {util:uuid()} into doc($local:documentName)/*,
    util:log('debug', '### assign-id.xq trigger done ###')
References
Guide to configuring eXist triggers [1]
[1] http://exist-db.org/triggers.html
Method
We will create a trigger that logs these events. The trigger will append a string to a log file. There are six trigger event types:
    store: Fired when a document is created in the collection or sub-collection
    update: Fired when a document is updated in the collection or sub-collection
    remove: Fired when a document is deleted from the collection or sub-collection
    create: Fired when a sub-collection is created
    rename: Fired when a sub-collection is renamed
    delete: Fired when a sub-collection is deleted
Sample Code
NOTE: These examples do not reliably work!
In this example we will be logging all store, update and remove events from the collection /db/my-collection.
Here is a sample trigger configuration file. This file is placed in the /db/system/config area with the same db path added to it that you want to monitor:
    /db/system/config/db/my-collection
Here is what the trigger file looks like:
collection.xconf
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <triggers>
            <trigger event="store, update, remove, create, rename, delete"
                     class="org.exist.collections.triggers.XQueryTrigger">
                <parameter name="url" value="xmldb:exist://localhost/db/triggers/log-changes.xq"/>
                <parameter name="test" value="test-value"/>
            </trigger>
        </triggers>
    </collection>
Note that the trigger operations (here the three document events store, update and remove, together with the three collection events) are listed in the event attribute and separated by commas. When these operations are fired, the XQuery /db/triggers/log-changes.xq gets run. You can pass parameters to this query using the parameter element.
XQuery logger
    xquery version "1.0";

    declare namespace request="http://exist-db.org/xquery/request";
    declare namespace response="http://exist-db.org/xquery/response";
    declare namespace session="http://exist-db.org/xquery/session";
    declare namespace xdb="http://exist-db.org/xquery/xmldb";
    declare namespace util="http://exist-db.org/xquery/util";

    declare variable $local:triggerEvent external;
    declare variable $local:eventType external;
    declare variable $local:collectionName external;
    declare variable $local:documentName external;
    declare variable $local:document external;
    declare variable $local:test external;
    declare variable $local:triggersLogFile := "triggersLog2.xml";
    (: full database path of the log document, so that the same document is created and updated :)
    declare variable $local:triggersLogPath := concat("/db/", $local:triggersLogFile);

    (: create the log file if it does not exist :)
    if (not(doc-available($local:triggersLogPath))) then
        xmldb:store("/db", $local:triggersLogFile, <events/>)
    else (),
    update insert
        <event ts="{current-dateTime()}"
               event="{$local:triggerEvent}"
               eventType="{$local:eventType}"
               test-1="{$local:test}"
               collectionName="{$local:collectionName}"
               documentName="{$local:documentName}">
            {$local:document}
        </event>
    into doc($local:triggersLogPath)/events
References
Guide to configuring eXist triggers [1]
Method
The XQuery 1.0 specification has many built-in functions for handling strings, URIs and other data. To use a built-in function you need to know what its inputs are and their data types and what form of output it creates.
Example: string-length("Hello") returns 5, since the string "Hello" is five characters long.
concat($arg1 as xs:anyAtomicType?, $arg2 as xs:anyAtomicType?, ...) as xs:string - concatenates two or more values into a single string.
The function does not accept a sequence of values, just individual atomic values passed as separate arguments. Example: concat('big', 'red', 'ball') returns "bigredball"
string-join($sequence as xs:string*, $delimiter as xs:string) as xs:string - combines the items in a sequence, separating them with a delimiter
Example: string-join(('big', 'red', 'ball'), '-') returns "big-red-ball"
Note that string-join takes as its first argument a single sequence of items, whereas concat takes two or more atomic values as separate arguments.
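The contrast between the two functions can be tried out in a short query. This is a minimal sketch using only standard XQuery 1.0 functions; the variable name $words is chosen for illustration.

```xquery
xquery version "1.0";

let $words := ('big', 'red', 'ball')
return (
    (: concat needs each value as a separate argument :)
    concat('big', 'red', 'ball'),
    (: string-join takes the whole sequence plus a delimiter :)
    string-join($words, '-'),
    (: an empty delimiter makes string-join behave like concat over a sequence :)
    string-join($words, ''),
    (: the two can be combined :)
    concat(string-join($words, ' '), '!')
)
```

The four results are "bigredball", "big-red-ball", "bigredball" and "big red ball!". Calling concat($words) with the sequence as its only argument is rejected, because concat requires at least two arguments.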
Useful References
For the authoritative documentation of all the functions available in eXist, including those not defined in XQuery, see XQuery Function Documentation [1]
For detailed information about how to specify sequence types, see XQuery Sequence Types [1]
For the standard XQuery and XPath function library, you may also refer to XQuery 1.0 and XPath 2.0 Functions and Operators [2]
See also the detailed documentation of functions with examples from noted XQuery expert Priscilla Walmsley in the FunctX XQuery Function Library [3]
References
[1] http://www.w3.org/TR/xquery/#id-sequencetype-syntax
[2] http://www.w3.org/TR/xpath-functions
[3] http://www.xqueryfunctions.com/xq
UWE StudentsOnline
This site has been developed to support staff, students and prospective students in the Faculty of Computing, Engineering and Mathematical Sciences (CEMS) at the University of the West of England, Bristol, UK. The public face of this site is Students Online [1], with an intranet called FOLD. This site is implemented in XQuery with some XSLT on eXist-db.
References
[1] http://www.cems.uwe.ac.uk/studentsonline
Validating a document
Motivation
You want to validate a document using an XML Schema.
Method
Note: Validation is a very complex topic. eXist comes with default settings that may prevent files from being added that are associated with a namespace once a schema is saved in the registry. Please be aware of these factors, which are documented here [19].
eXist supports a validation module that includes a validate() function to validate an XML file against a grammar file such as an XML Schema.
validation:validate($input-doc as item(), $schema-uri as xs:anyURI) as xs:boolean
where:
    $input-doc is the document you want to validate
    $schema-uri is a URI to the XML Schema you want to use to validate the document. Note that this must be of type xs:anyURI.
This function returns a single true/false value, which is true if the document is valid according to the XML Schema.
Sample Code
    xquery version "1.0";

    let $doc :=
        <root>
            <element>test</element>
        </root>
    (: the schema location must be passed as an xs:anyURI :)
    let $schema := xs:anyURI('/db/test/validate/schema.xsd')
    (: you must run this every time the XML Schema file changes! :)
    let $clear := validation:clear-grammar-cache()
    let $result :=
        if (validation:validate($doc, $schema))
        then "PASS"
        else "FAIL"
    return
        <results>
            {$result}
        </results>
    let $result :=
        if (validation:validate($input-doc, $schema-uri))
        then "The XML File is Valid"
        else (
            "The XML File is Not Valid",
            validation:validate-report($input-doc, $schema-uri)
        )
References
Documentation on validation in eXist [19]
Method
An XML Catalog file contains a list of URIs and the files used to validate them. For example, the following is a catalog file that describes how DocBook files should be validated:
    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
        <public publicId="-//OASIS//DTD XML DocBook V4.1.2//EN" uri="/db/grammar/docbook.dtd"/>
        <uri name="http://www.oasis-open.org/committees/docbook/" uri="/db/grammar/docbook.dtd /"/>
    </catalog>
References
http://atomic.exist-db.org/articles/Validating%20XML%20in%20eXist.pdf
Method
We will use an XQuery function that uses the dispatch pattern and the typeswitch expression.
Sample Input
    <aaa a1="A1" a2="A2" a3="A3">
        <bbb b1="B1" b2="B2" b3="B3">BBB</bbb>
        <ccc c1="C1" c2="C2" c3="C3">
            <ddd d1="D1" d2="D2" d3="D3">DDD</ddd>
            <eee>
                <fff>FFF</fff>
            </eee>
        </ccc>
    </aaa>
                    return xml-to-html:dispatch($c/node(), $depth + 1)
                }
                <span class="t">&lt;/{name($node)}&gt;</span>
            </div>
        (: otherwise pass it through. Used for comments and PIs :)
        default return $node
    };
Sample Driver
    xquery version "1.0";

    import module namespace xml-to-html="http://example.com/xml-to-html" at "xml-to-html.xqm";

    let $title := 'View XML as HTML'
    let $input :=
        <aaa a1="A1" a2="A2" a3="A3">
            <bbb b1="B1" b2="B2" b3="B3">BBB</bbb>
            <ccc c1="C1" c2="C2" c3="C3">
                <ddd d1="D1" d2="D2" d3="D3">DDD</ddd>
                <eee>
                    <fff>FFF</fff>
                </eee>
            </ccc>
        </aaa>
    let $output := xml-to-html:xml-to-html($input, 1)
    return
        <html>
            <head>
                <title>{$title}</title>
                <link type="text/css" rel="stylesheet" href="syntax-colors-oxygen.css"/>
            </head>
            <body>
                <div class="xml">
                    {$output}
                </div>
            </body>
        </html>
Sample Output
    <div class="xml">
        <div class="element" style="margin-left: 5px">
            <span class="t">&lt;aaa</span> <span class="an">a1=</span> <span class="av">"A1"</span> <span class="an">a2=</span> <span class="av">"A2"</span> <span class="an">a3=</span> <span class="av">"A3"</span>&gt;
            <div class="element" style="margin-left: 10px">
                <span class="t">&lt;bbb</span> <span class="an">b1=</span> <span class="av">"B1"</span> <span class="an">b2=</span> <span class="av">"B2"</span> <span class="an">b3=</span> <span class="av">"B3"</span>&gt;BBB<span class="t">&lt;/bbb&gt;</span>
            </div>
            <div class="element" style="margin-left: 10px">
                <span class="t">&lt;ccc</span> <span class="an">c1=</span> <span class="av">"C1"</span> <span class="an">c2=</span> <span class="av">"C2"</span> <span class="an">c3=</span> <span class="av">"C3"</span>&gt;
                <div class="element" style="margin-left: 15px">
                    <span class="t">&lt;ddd</span> <span class="an">d1=</span> <span class="av">"D1"</span> <span class="an">d2=</span> <span class="av">"D2"</span> <span class="an">d3=</span> <span class="av">"D3"</span>&gt;DDD<span class="t">&lt;/ddd&gt;</span>
                </div>
                <div class="element" style="margin-left: 15px">
                    <span class="t">&lt;eee</span>&gt;
                    <div class="element" style="margin-left: 20px">
                        <span class="t">&lt;fff</span>&gt;FFF<span class="t">&lt;/fff&gt;</span>
                    </div>
                    <span class="t">&lt;/eee&gt;</span>
                </div>
                <span class="t">&lt;/ccc&gt;</span>
            </div>
            <span class="t">&lt;/aaa&gt;</span>
        </div>
    </div>
Screen Image
Approach
The script is similar to the index script at the beginning, to get the list of pages in the book. Then it fetches each page and extracts the anchor tags whose href links to the UWE eXist site. The WikiBook page is linked from the page title and the actual URL is listed.
    declare namespace h = "http://www.w3.org/1999/xhtml";
    declare option exist:serialize "method=xhtml media-type=text/html";

    let $book := request:get-parameter("book","XQuery")
    let $base := "http://en.wikibooks.org"
    let $indexPage := doc(concat($base,"/wiki/Category:",$book,"?x"))
    let $pages := $indexPage//h:div[@id="mw-pages"]//h:li
    return
    <html>
        <head>
            <title>Index of {$book} code samples</title>
        </head>
        <body>
            <h1>Index of {$book} code samples</h1>
            <ul>
            {
            for $letter in distinct-values($pages/upper-case(substring(substring-after(.,'/'),1,1)))[string-length(.) = 1]
            for $page in $pages[starts-with(upper-case(substring-after(.,'/')),$letter)]
            let $title := string($page)
            let $url := concat($base,$page/h:a/@href)
            let $refs := doc($url)//h:a[starts-with(@href,"http://www.cems.uwe.ac.uk/xmlwiki")]
            order by $title
            return
                if (exists($refs))
                then
                    <div>
                        <li><a href="{$url}">{$title}</a>
                            <ul>
                            {for $ref in $refs
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/util/wikicode.xq
    ("January","February","March","April","May","June","July","August","September","October","November","December");

    declare function local:wikidate($date as xs:date) as xs:string {
        concat(year-from-date($date), "_",
               $months[month-from-date($date)], "_",
               day-from-date($date))
    };

    declare function local:displaydate($date as xs:date) as xs:string {
        concat(day-from-date($date), " ",
               $months[month-from-date($date)], ", ",
               year-from-date($date))
    };

        let $evtext := util:serialize($element,())
        let $evtext := replace($evtext, concat("href=",$delimiter,"/"), concat("href=",$delimiter,$base,"/"))
        return util:parse($evtext)
    };

    let $date := xs:date(request:get-parameter("date",()))
    let $wikidate := local:wikidate($date)
    let $url := concat("http://en.wikipedia.org/wiki/Portal:Current_events/",$wikidate)
    let $wikipage := doc($url)
    let $desc := $wikipage//h:td[@class="description"]
    let $nextDay := $date + xs:dayTimeDuration("P1D")
    let $previousDay := $date - xs:dayTimeDuration("P1D")
    return
        <html>
            <body>
                <h1>Current events from <a href="{$url}">Wikipedia</a></h1>
                <h2>Wiki Events for
References
[1] http://en.wikipedia.org/wiki/Portal:Current_events/2007_September_24
[2] http://www.cems.uwe.ac.uk/xmlwiki/wikidate.xq?date=2007-09-24
     : @param uri
     : @param binary - true if data is base64 encoded
     : @return the body of the response as text or null
     :)
    let $headers := element headers { element header {attribute name {"Pragma"}, attribute value {"no-cache"}}}
    let $response := httpclient:get(xs:anyURI($uri), true(), $headers)
Parsing Function
We will create an XQuery module containing functions to carry out the parsing:
    module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met";
The csv module needs to be imported:
import module namespace csv = "http://www.cems.uwe.ac.uk/xmlwiki/csv" at "../lib/csv.xqm";
     : @param the station number
     : @return the temperature record as an adhoc XML structure matched closely to the terms used in the original record
     :)
    let $country := substring($station,1,2) (: this is the directory for all temperature records in a country :)
    (: construct the URI for the corresponding record :)
    let $uri := concat("http://www.metoffice.gov.uk/climatechange/science/monitoring/reference/",$country,"/",$station)
    (: GET and convert to plain text :)
    let $data := csv:get-data($uri,false())
    let $headertext := substring-before($data,"Obs:")
    (: the first section contains the meta data in the form of name=value statements :)
    let $headers := tokenize($headertext,$csv:nl)
    return
    let $name := replace(substring-before($header,"=")," ","")
    let $value := normalize-space(substring-after($header,"="))
    where $name ne ""
    return
        (: create an XML element with the name :)
        element {$name} {
            (: these names have values which are a list of temperatures :)
            if ($name = ("Normals","Standarddeviations"))
            then
                for $temp in tokenize($value,"\s+")
                return element temp_C {$temp}
            else if ($name = ("Name","Country"))
            then replace($value,"-","")
            (: the convention for signing longitudes in this data is the reverse of the usual E +, W - convention :)
            else if ($name = "Long")
            then - xs:decimal($value)
            else $value
        },
    for $year in $years
    let $value := tokenize($year,"\s+")
    where $year ne ""
    return
        element monthlyAverages {
            attribute year {$value[1]}, (: the first value in the row is the year :)
            (: the remainder are the temperatures for the months Jan to Dec :)
            for $i in (2 to 13)
            let $temp := $value[$i]
            return
                element temp_C {
                    (: when the value is -99.0 the element will be empty :)
                    if ($temp ne '-99.0') then $temp else ()
                }
        }
    }
    };
Main Script
The main script uses these functions to convert a given station's record:
    (:~
     : convert climate station file to XML
     : @param id of station
     :)
    import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
Stornoway [3]
WMO stations
The station ids are based on those defined by the World Meteorological Organisation. There is a full list of all stations available online as a text file [4] with supporting documentation [5]. A typical record is
00;000;PABL;Buckland, Buckland Airport;AK;United States;4;65-58-56N;161-09-07W;;;7;;
The format of these records is:
1. Block Number: 2 digits representing the WMO-assigned block.
2. Station Number: 3 digits representing the WMO-assigned station.
3. ICAO Location Indicator: 4 alphanumeric characters. Not all stations in this file have an assigned location indicator; the value "----" is used for stations that do not have one.
4. Place Name: Common name of station location.
5. State: 2 character abbreviation (included for stations located in the United States only).
6. Country Name: Country name in ISO short English form.
7. WMO Region: digits 1 through 6 representing the corresponding WMO region; 7 stands for the WMO Antarctic region.
8. Station Latitude: DD-MM-SSH where DD is degrees, MM is minutes, SS is seconds and H is N for northern hemisphere or S for southern hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
9. Station Longitude: DDD-MM-SSH where DDD is degrees, MM is minutes, SS is seconds and H is E for eastern hemisphere or W for western hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
10. Upper Air Latitude: DD-MM-SSH where DD is degrees, MM is minutes, SS is seconds and H is N for northern hemisphere or S for southern hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
11. Upper Air Longitude: DDD-MM-SSH where DDD is degrees, MM is minutes, SS is seconds and H is E for eastern hemisphere or W for western hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
12. Station Elevation (Ha): The station elevation in meters. Value is omitted if unknown.
13. Upper Air Elevation (Hp): The upper air elevation in meters. Value is omitted if unknown.
14. RBSN indicator: P if station is defined by the WMO as belonging to the Regional Basic Synoptic Network, omitted otherwise.
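The field layout above can be checked against the sample record with a few lines of standard XQuery. This is only an illustrative sketch: the output element names (block, number, icao and so on) are chosen here for readability and are not those of the station documents built later.

```xquery
xquery version "1.0";

(: the typical record quoted above :)
let $record := "00;000;PABL;Buckland, Buckland Airport;AK;United States;4;65-58-56N;161-09-07W;;;7;;"
(: tokenize keeps empty fields, so positions match the numbered list :)
let $f := tokenize($record, ";")
return
    <station fields="{count($f)}">
        <block>{$f[1]}</block>
        <number>{$f[2]}</number>
        <icao>{$f[3]}</icao>
        <placeName>{$f[4]}</placeName>
        <state>{$f[5]}</state>
        <country>{$f[6]}</country>
        <region>{$f[7]}</region>
        <latitude>{$f[8]}</latitude>
        <longitude>{$f[9]}</longitude>
        <elevation>{$f[12]}</elevation>
    </station>
```

Because the separator is a semicolon, the comma inside the place name "Buckland, Buckland Airport" causes no trouble, and the two empty upper-air fields still occupy positions 10 and 11, which is why the elevation is found at position 12.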
Conversion to XML
A function is needed to convert from the DD-MM-SSH format of latitudes and longitudes. This is complicated by the variations in this format. These variations all appear in the data: DD-MMH DD-MH DD-MM-SH DD-MM-SSH
Because this format occurs in other data, it has been added to a general module of geographic functions.
    declare function geo:lz($n as xs:string?) as xs:integer {
        xs:integer(concat(string-pad("0", 2 - string-length($n)), $n))
    };

    declare function geo:dms-to-decimal($s as xs:string) as xs:decimal {
    (:~
     : @param $s - input string in the format of DD-MMH, DD-MH, DD-MM-SH, DD-MM-SSH
     :             where H is N, S, E or W
     : @return decimal degrees
     :)
        let $hemi := substring($s, string-length($s), 1)
        let $rest := substring($s, 1, string-length($s) - 1)
        let $f := tokenize($rest, "-")
        let $deg := geo:lz($f[1])
        let $min := geo:lz($f[2])
        let $sec := geo:lz($f[3])
        let $dec := $deg + ($min + $sec div 60) div 60
        let $dec := round-half-to-even($dec, 6)
        return
            if ($hemi = ("S","W"))
            then - $dec
            else $dec
    };
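The conversion can also be sketched without the string-pad function, which is not part of the final XQuery 1.0 function library. The version below is an assumption-laden variant rather than the module's own code: local:to-int and local:dms-to-decimal are names invented for this sketch, and a missing minutes or seconds field is simply treated as 0.

```xquery
xquery version "1.0";

(: Hypothetical helper: cast an optional field to an integer, treating a missing field as 0 :)
declare function local:to-int($n as xs:string?) as xs:integer {
    if (empty($n) or $n = "") then 0 else xs:integer($n)
};

declare function local:dms-to-decimal($s as xs:string) as xs:decimal {
    (: the last character is the hemisphere letter, the rest is D-M or D-M-S :)
    let $hemi := substring($s, string-length($s), 1)
    let $rest := substring($s, 1, string-length($s) - 1)
    let $f := tokenize($rest, "-")
    let $dec := local:to-int($f[1])
                + (local:to-int($f[2]) + local:to-int($f[3]) div 60) div 60
    let $dec := round-half-to-even($dec, 6)
    (: southern and western hemispheres are negative :)
    return if ($hemi = ("S", "W")) then - $dec else $dec
};

for $s in ("65-58-56N", "161-09-07W", "51-30N")
return local:dms-to-decimal($s)
```

For the sample record's latitude, 65 + (58 + 56/60)/60 is approximately 65.982222, and the longitude comes out negative because it is in the western hemisphere. The third input shows the DD-MMH variation, where the absent seconds field contributes nothing.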
    let $f := tokenize(normalize-space($station),";")
    let $cid := concat($f[1],$f[2],"0") (: this constructs the equivalent id used in the temperature records :)
    return
        element station {
            element placeName {$f[4]},
            if ($f[5] ne "") then element state {$f[5]} else (),
            element country {$f[6]},
            element WMORegion {$f[7]},
            element latitude {geo:dms-to-decimal($f[8])},
            element longitude {geo:dms-to-decimal($f[9])},
            if ($f[12] ne "") then element elevation {$f[12]} else (),
            if ($f[14] = "P") then element RBSN {} else ()
        }
    };
Indexing
There are some 11,000 stations in total. These need to be indexed for efficient access. In eXist, indexes are defined in a configuration file, one per collection (directory). For the collection in which the station XML document is to be written, the configuration file is:
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index>
            <create qname="id" type="xs:string"/>
            <create qname="country" type="xs:string"/>
        </index>
    </collection>
This means that all XML documents in the collection will be indexed on the qnames id and country wherever these appear in the XML structure. Indexing will be performed when a document is added to the collection or an existing document is updated. A re-index can be forced if required. If the station data is stored in the collection /db/Wiki/Climate/Stations, this configuration file will be stored in /db/system/config/db/Wiki/Climate/Stations as configuration.xconf
However, there is no location data here, so we will get that from the WMO station list. The approach taken to converting this to XML was:
1. View source on the HTML page
2. Locate the station list
3. Copy the text
4. Save as a text file in the eXist database
5. A script reads this file and parses it to XML
6. The resultant XML is augmented with latitude and longitude from the WMO station data.
7. The final XML document is stored in the database in the same Station directory
    (:~
     : convert the text representation of MET stations from the WMO list to XML
     :)
    <stationList>
    {
    (: get the raw data from a text file stored as base64 in the eXist database :)
    let $text := util:binary-to-string(util:binary-doc("/db/Wiki/Climate/cstations.txt"))
    (: ; separates the stations in each country :)
    for $country in tokenize($text,";")
    (: the station list is the array element content i.e. the string between =[ and ] :)
    let $stationlist := substring-before(substring-after($country,"=["),"]")
    (: The stations in each country are comma-separated, but commas are also used within
       the names of countries and stations. However a comma followed by a double quote
       is the required separator. :)
    let $stations := tokenize($stationlist,',"')
    for $station in $stations
    (: some cleanup of names is needed :)
    let $data := replace(replace($station,'"',"")," ","")
    (: Each station is in the format of Stationid | English name / French name :)
    let $f := tokenize($data,"\|")
    let $id := $f[1]
    let $country := tokenize($f[2],"/")
    let $WMOStation := $met:WMOStations[id=$id]
    (: create a station element containing the id, country and english station name :)
    return
        element station {
            element id {$f[1]},
            element country {normalize-space($country[1])},
            element location {$f[3]},
            $WMOStation/latitude,
            $WMOStation/longitude
        }
    }
    </stationList>
Storing this file in the same Stations collection means that it will be indexed on the same element names, id and country, as the full WMO station data.
<xsl:stylesheet xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
exclude-result-prefixes="msxsl">
<!--
-->
<xsl:output method="html"/>
<xsl:template match="Station">
<html>
<head>
<title>
<xsl:value-of select="station/placeName"/>
<xsl:text> </xsl:text>
<xsl:value-of select="station/country"/>
</title>
</head>
<body>
<xsl:apply-templates select="station"/>
</body>
</html>
</xsl:template>
<!--
<p/>
deviation etc.)</p>
<p/>
<script type="text/javascript">
{'packages':['annotatedtimeline']});
google.setOnLoadCallback(drawChart);
function drawChart() {
data.addColumn('date', 'Date');
data.addColumn('number', 'temp');
data.addRows([
[null,null]
]);
google.visualization.AnnotatedTimeLine(document.getElementById('chart_div'));
</script>
</xsl:template>
<xsl:if test="(node())">
<xsl:text>[new Date(</xsl:text>
<xsl:value-of select="../@year"/>
<xsl:text>,</xsl:text>
<xsl:text>,15),</xsl:text>
<xsl:value-of select="."/>
<xsl:text>],
</xsl:text>
</xsl:if>
</xsl:template>
<!--
-->
<p/>
<p/>
<script type="text/javascript">
google.load('visualization', '1',
{'packages':['annotatedtimeline']});
google.setOnLoadCallback(drawChartSmoothed);
function drawChartSmoothed()
data.addColumn('date', 'Date');
data.addColumn('number', 'temp');
data.addRows([
[null,null]
]);
google.visualization.AnnotatedTimeLine(document.getElementById('smoothed_chart_div'));
</script>
</xsl:template>
<xsl:if test="count(temp_C[node()])=12">
<xsl:text>[new Date(</xsl:text>
<xsl:value-of select="@year"/>
<xsl:text>,5,15),</xsl:text>
<xsl:text>],
</xsl:text>
</xsl:if>
</xsl:template>
<!--
<table border="1">
<tr>
<td>Year</td>
<td>Jan</td>
<td>Feb</td>
<td>Mar</td>
<td>Apr</td>
<td>May</td>
<td>Jun</td>
<td>Jul</td>
<td>Aug</td>
<td>Sep</td>
<td>Oct</td>
<td>Nov</td>
<td>Dec</td>
<tr/>
</tr>
<xsl:apply-templates
select="monthlyAverages[@year][@year >=
mode="table"/>
</table>
</xsl:template>
<tr>
<td>
<xsl:value-of select="@year"/>
</td>
</tr>
</xsl:template>
<td>
<xsl:value-of select="."/>
</td>
</xsl:template>
<xsl:template match="Number">
</p>
</xsl:template>
<xsl:template match="station">
<h1>
<xsl:value-of select="placeName"/>
<xsl:text>, </xsl:text>
<xsl:value-of select="country"/>
<xsl:text> </xsl:text>
</h1>
<a href="http://maps.google.com/maps?q={latitude},{longitude}">
<img
src="http://maps.google.com/maps/api/staticmap?zoom=11&amp;maptype=hybrid&amp;size=400x300&amp;sensor=false&amp;key=ABQIAAAAVehr0_0wqgw_UOdLv0TYtxSGVrvsBPWDlNZ2fWdNTHNT32FpbBR1ygnaHxJdv-8mkOaL2BJb4V_yOQ&amp;markers=color:blue|{latitude},{longitude}"
alt="{placeName}"/>
</a>
</xsl:template>
<xsl:copy>
</xsl:copy>
</xsl:template>
Multiple formats
We would like to present either the original XML or the HTML visualisation page. We could use two scripts, or combine them into one script with a parameter to indicate how the output is to be rendered. eXist functions allow the serialization of the output and the mime-type to be set dynamically.
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

let $id := request:get-parameter("station",())
let $render := request:get-parameter("render",())
let $station := doc("/db/Wiki/Climate/Stations/metstations.xml")//station[id = $id]
let $tempStation := doc("/db/Wiki/Climate/Stations/tempstations.xml")//station[id = $id]
let $temp := if ($tempStation) then met:station-to-xml($id) else ()
let $station :=
   <Station>
      {$station}
      {$temp}
   </Station>
return
   if ($render="HTML")
   then
      let $ss := doc("/db/Wiki/Climate/FullHTMLMet-V2.xsl")
      let $options := util:declare-option("exist:serialize","method=xhtml media-type=text/html")
      let $start-year := request:get-parameter("start","1000")
      let $end-year := request:get-parameter("end","2100")
      let $params :=
         <parameters>
            <param name="start-year" value="{$start-year}"/>
            <param name="end-year" value="{$end-year}"/>
         </parameters>
      return transform:transform($station,$ss,$params)
   else
      let $header := response:set-header("Access-Control-Allow-Origin","*")
      return $station
<html>
   <head>
      <title>Index of Temperature Record Stations</title>
   </head>
   <body>
      <h1>Index of Temperature Record Stations</h1>
      {
      for $country in distinct-values($met:tempStations/country)
      order by $country
      return
         <div>
            <h3>{$country}</h3>
            {for $station in $met:tempStations[country=$country]
             let $id := $station/id
             order by $station/location
             return
                <span><a href="station.xq?station={$id}&amp;render=HTML">{string($station/location)}</a> </span>
            }
         </div>
      }
   </body>
</html>
Station Map
We can also generate a (large) KML overlay, with links to each station's page. We need a function to transform a station into a Placemark with a link to the HTML station page:
declare function met:station-to-placemark ($station) {
   let $description :=
      <div>
         <a href="http://www.cems.uwe.ac.uk/xmlwiki/Climate/station.xq?station={$station/id}&amp;render=HTML">Temperature Record</a>
      </div>
   return
      <Placemark>
         <name>{string($station/location)}, {string($station/country)}</name>
         <description>{util:serialize($description,"method=xhtml")}</description>
         <Point>
            <coordinates>{string($station/longitude)},{string($station/latitude)},0</coordinates>
         </Point>
      </Placemark>
};
Then the main script iterates over all the temperature stations to generate the full KML file.
import module namespace met ="http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
indent=yes
omit-xml-declaration=yes";
return <kml xmlns="http://www.opengis.net/kml/2.2"> <Folder> <name>Stations</name> { for $station in $met:tempStations return met:station-to-placemark($station) } </Folder> </kml>
Work in progress
Resource URIs RDF
References
[1] http://www.metoffice.gov.uk/climatechange/science/monitoring/subsets.html
[2] http://www.metoffice.gov.uk/climatechange/science/monitoring/reference/03/030260
[3] http://www.cems.uwe.ac.uk/xmlwiki/Climate/temp2xml.xq?station=030260
[4] http://weather.noaa.gov/data/nsd_bbsss.txt
[5] http://weather.noaa.gov/tg/site.shtml
[6] http://www.cems.uwe.ac.uk/xmlwiki/Climate/Stations/tempstations.xml
[7] http://www.cems.uwe.ac.uk/xmlwiki/Climate/station.xq?station=030260&render=HTML
[8] http://www.cems.uwe.ac.uk/xmlwiki/Climate/station.xq?station=030260
[9] http://www.cems.uwe.ac.uk/xmlwiki/Climate/tempStations.xq
[10] http://www.cems.uwe.ac.uk/xmlwiki/Climate/stationskml.xq
[11] http://maps.google.com/maps?q=http://www.cems.uwe.ac.uk/xmlwiki/Climate/stationskml.xq
XHTML + Voice
Motivation
You want to have your browser read Twitter updates using a text-to-speech conversion extension that is built into your browser.
Method
XHTML + Voice is supported by the Opera Browser with the Voice extension installed. In this simple application it is used as a browser-based Text-to-Speech engine.
Twitter Radio
This script creates a simple text-to-speech version of Twitter Search. Obama [1]
Limitations
• The window has to be active for the T2S to play on refresh.
• The cleaned text to speak is held in a div which is rendered as white on white text. Initially it was output as a block in the header, but it did not seem possible to apply styles. Applying a style display:none hid the text from the T2S engine as well!
• Transforming the Atom content to a string suitable to speak needs more work.
• Retweets and similar tweets could be removed using the Levenshtein distance.
• Male and female voices are assigned randomly by tweet. I'd like to cache the voice assigned to a tweeter so that tweets are consistently spoken in the same voice.
• The initial load doesn't seem to trigger playing, hence the Play button, but this also re-fetches the page. This is an ideal situation to use AJAX instead of refresh.
• The T2S engine is quite good at rendering the text, but it needs to be helped in places, for example by replacing texting abbreviations with their expanded form.
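The last limitation, expanding texting abbreviations, could be approached with a small recursive function that applies one replacement at a time. This is only a sketch under stated assumptions: the abbreviation table, the variable $abbreviations and the function local:expand are hypothetical and are not part of the Twitter Radio script.

```xquery
xquery version "1.0";

(: Hypothetical abbreviation table; the entries are illustrative only :)
declare variable $abbreviations :=
  <abbreviations>
    <abbr short="gr8" long="great"/>
    <abbr short="thx" long="thanks"/>
  </abbreviations>;

(: Replace each abbreviation in turn, matching it only as a whole word.
   XPath regular expressions have no \b, so non-word characters or the
   string boundaries on each side are captured and put back. :)
declare function local:expand($talk as xs:string, $abbrs as element()*) as xs:string {
  if (empty($abbrs))
  then $talk
  else
    let $pattern := concat("(^|\W)", $abbrs[1]/@short, "($|\W)")
    let $replacement := concat("$1", $abbrs[1]/@long, "$2")
    return
      local:expand(replace($talk, $pattern, $replacement, "i"), subsequence($abbrs, 2))
};

local:expand("Thx for a gr8 gig", $abbreviations/abbr)
```

Keeping the table as XML means it could be stored in the database and edited without touching the script.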
declare namespace atom = "http://www.w3.org/2005/Atom";
declare variable $n := xs:integer(request:get-parameter("n",6));
declare variable $search := request:get-parameter("search","");
declare variable $timestamp := request:get-parameter("timestamp",());
declare variable $seconds := $n * 12;
declare variable $noise := (
   "<b>", "</b>", "<.+?>", "http://[^ ]+", "#\w+", "RT *@\w+", "@\w+",
   "[\[\]\\=:;()_?!~\|]", '"', "\.\.+"
);
declare function local:clean ($talk as xs:string, $noise as xs:string*) as xs:string {
   if (empty($noise))
   then $talk
   else local:clean(replace($talk,string($noise[1])," "),subsequence($noise,2))
};
"method=xhtml
else $entries let $entries := $entries[position() <= $n] let $newtimestamp := if (exists($entries)) then
string($entries[1]/atom:published) else $timestamp
let $entries := reverse($entries)
return
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice"
      xmlns:ev="http://www.w3.org/2001/xml-events" >
   <head>
      <meta http-equiv="refresh" content="{$seconds};url=?search={encode-for-uri($search)}&amp;timestamp={$newtimestamp}&amp;n={$n}"/>
      <title>Tweets matching {$search}</title>
      <vxml:form id="say">
         <vxml:block>
            <vxml:prompt src="#news"/>
         </vxml:block>
      </vxml:form>
      <link rel="stylesheet" type="text/css" href="voice2.css" title="Normal"/>
   </head>
   <body ev:event="load" ev:handler="#say" >
Listen for <input type="text" name="search" value="{$search}" /> Max items <input type="text" name="n" value="{$n}" size="4" /> <input type="submit" value="Tune"/> <button ev:event="click" ev:handler="#say">Play</button> </form> {for $entry in $entries return <div> <a href="{$entry/atom:author/atom:uri}">{substring-before($entry/atom:author/atom:name, "(")}</a>  
{util:parse(concat("<span xmlns='http://www.w3.org/1999/xhtml' >",$entry/atom:content/(text(),*),"</span>"))}
</div>
}
<div id="news">
{for $entry in $entries
 return
   <p class='{if (math:random() < 0.5) then "male" else "female"}'>
      { local:clean($entry/atom:content/text(), $noise) }
With the style sheet:

.male { voice-family: male; pause-after: 1.5s; }
.female { voice-family: female; pause-after: 1s }
#news { color: white; background-color: white }
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Twitter/twitterRadio.xq?search=Obama&n=6
XML Differences
Motivation
You want to find the differences between two XML files and output a "colored diff" file of the differences.
Method
We will create a recursive XQuery function that compares all the nodes of an XML file.
</parameters>
<diff> <change>...</change> </diff>
<diff> <addition>...</addition> </diff>
<diff> <deletion>...</deletion> </diff>
</xml-diffs>
Algorithm
The O(ND) difference algorithm was originally designed to compare text files, using line breaks as the fundamental unit of comparison. We will need to modify it to compare XML elements and attributes recursively. An XML comparison should also not report differences in the order of attributes, because attribute order is not significant in XML. To be continued...
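Until the full algorithm is written up, the two requirements above (recursion over elements, and attribute order being ignored) can be illustrated with a much simpler equality test. This is a minimal sketch and not the O(ND) algorithm: the function local:same is hypothetical, it only answers "same or different", and for elements that have child elements it ignores any interleaved text.

```xquery
xquery version "1.0";

(: Hypothetical helper: true when two elements have the same name, the same
   attributes in any order, and pairwise-equal child elements. Leaf elements
   are compared on their whitespace-normalized text. :)
declare function local:same($a as element(), $b as element()) as xs:boolean {
  node-name($a) eq node-name($b)
  and count($a/@*) eq count($b/@*)
  and (every $att in $a/@* satisfies
         $b/@*[node-name(.) eq node-name($att)] = string($att))
  and count($a/*) eq count($b/*)
  and (if (empty($a/*))
       then normalize-space($a) eq normalize-space($b)
       else every $i in 1 to count($a/*) satisfies local:same($a/*[$i], $b/*[$i]))
};

(: attribute order differs, content does not :)
local:same(<p id="1" class="x">text</p>, <p class="x" id="1">text</p>)
```

A real diff would return change, addition and deletion elements at the point where this function returns false, rather than a single boolean.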
References
[1] S. Chawathe, A. Rajaraman, H. Garcia-Molina and J. Widom (June 1996). "Change Detection in Hierarchically Structured Information". Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal.
"An O(ND) Difference Algorithm and its Variations" by Eugene Myers, Algorithmica Vol. 1 No. 2, 1986, p. 251 (http://xmailserver.org/diff2.pdf)
"X-Diff: An Effective Change Detection Algorithm for XML Documents" by Yuan Wang, David J. DeWitt and Jin-Yi Cai, University of Wisconsin, Madison (http://www.cs.wisc.edu/niagara/papers/xdiff.pdf)
Method
[Note: See also the article XQuery/XQuery and XML Schema, which has the same objective. If the author would like to contact me, I'm trying to get this code working to compare with the code I developed. ChrisWallace (talk) 15:09, 13 May 2009 (UTC)]

Create an XQuery function that reads in a URI to an XML Schema file (.xsd), along with a set of display parameters, and generates a sample XML instance. These parameters are:

1. $schemaURI = the location of the .xsd file (e.g. db/cms/schemas/MySchema.xsd)
2. $rootElementName = the root element for the sample XML file you wish to generate (i.e. it doesn't have to be the root of the whole schema)
3. $maxOccurances = for elements with a maxOccurs attribute greater than one, how many times should the element be repeated in the sample instance?
4. $optionalElements = should optional elements (i.e. minOccurs="0") be included? 'true' or 'false'
5. $optionalAttributes = should optional attributes (i.e. use="optional") be included? 'true' or 'false'
6. $choiceStrategy = where there is a choice between elements or groups of elements, should the sample include a random selection from the choices, or simply use the first choice? 'random' or 'first'

Call the function with the following:
xquery version "1.0"; (: Query which calls the function :)
return content:xsd-to-instance('/db/cms/content_types/schemas/Genericode.xsd','CodeList','1','true','true','random')
Notes
The function currently cannot dynamically set the namespaces in the sample instance. Any assistance in getting this to work would be much appreciated.

The function requires that the xsd file use the xs namespace prefix (i.e. xs:element). Attempts to use a wildcard prefix in the XPath statements (i.e. $xsdFile/*:schema/*:element) did not work for some reason. An alternative approach is to determine the prefix, assign it to a variable and then concatenate it into all the XPath statements (e.g. $xsdFile/concat($prefix,':schema/',$prefix,'element')), but that makes for some pretty ugly code. Another alternative is to use another function to reset whatever the xsd file prefix is to xs. This works fine but adds a bit more code. Any more efficient alternative suggestions would be welcome.
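One possible way around the prefix restriction, offered as a suggestion rather than as a tested change to the function: path steps such as xs:element match on the namespace URI, not on the prefix used in the document, so only the string comparisons of the form name($xsElement)='xs:sequence' depend on the prefix. The sketch below assumes a small inline schema that uses the xsd prefix and shows both a path step and a prefix-independent test built from local-name() and namespace-uri().

```xquery
xquery version "1.0";
declare namespace xs = "http://www.w3.org/2001/XMLSchema";

(: Hypothetical inline schema that uses the xsd prefix instead of xs :)
let $schema :=
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="CodeList"/>
    <xsd:element name="Column"/>
  </xsd:schema>
return (
  (: the xs: step matches xsd: elements because the namespace URIs are equal :)
  data($schema/xs:element/@name),
  (: prefix-independent replacement for name($e) = 'xs:element' :)
  for $e in $schema/*
  where local-name($e) eq 'element'
    and namespace-uri($e) eq 'http://www.w3.org/2001/XMLSchema'
  return string($e/@name)
)
```

Both expressions select the same two element declarations, whichever prefix the schema author chose.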
The function uses two internal queries assigned to variables, $subElementsQuery and $attributesQuery, which are then called using util:eval. This enables the recursive collection of sub-elements and attributes without having to call an external function. These two queries could just as easily have been declared as external functions.
module namespace content ="/db/cms/modules/content";
declare namespace request="http://exist-db.org/xquery/request";
declare namespace util="http://exist-db.org/xquery/util";

(: Function :)
declare function content:xsd-to-instance($schemaURI,$rootElementName,$maxOccurances,$optionalElements,$optionalAttributes,$choiceStrategy)
{
(: TO DO:
   - Handle substitution groups
   - Dynamically include namespaces
   - Handle any xsd file prefix (e.g. xs:element or xsd:element) :)

(: Get the main xsd file :)
let $xsdFile := doc($schemaURI)

(: Determine the namespace prefix for the xsd file (e.g. xs, xsd or none) :)
let $xsdFileNamespacePrefix := substring-before(name($xsdFile/*[1]),':')

(: get the root element based on the root element name given in the function parameters :)
let $rootElement := $xsdFile//xs:element[@name = $rootElementName]

(: Gather the namespace prefixes and namespaces included in the xsd file :)
let $namespaces :=
   let $prefixes := in-scope-prefixes($xsdFile/xs:schema)
   return
      <Namespaces>
         {for $prefix in $prefixes
          return <Namespace prefix="{$prefix}" URI="{namespace-uri-for-prefix($prefix,$xsdFile/xs:schema)}"/>}
      </Namespaces>

(: Determine the namespace prefix and namespace for the root element :)
let $rootElementNamespace := $xsdFile/xs:schema/@targetNamespace
let $rootElementNamespacePrefix := $namespaces/Namespace[@URI = $rootElementNamespace]/@prefix

(: If the root element is a complex type, locate the complex type (not sure why the [1] predicate is required) :)
let $rootElementTypeSchema := util:eval($schemaFromPrefixQuery) let $complexType := if($rootElement/xs:complexType) then $rootElement/xs:complexType else if($namespacePrefix = 'xs' or $namespacePrefix = 'xsd') then () else if($rootElementTypeSchema//xs:complexType[@name = $rootElementType]) then $rootElementTypeSchema//xs:complexType[@name = $rootElementType] else() (: Query to recursively drill down to find the appropriate elements. If the complex type is a choice, include only the first sub-element. If the complex type is a group, include the group sub-elements. If the complex type is an extension, include the base sub-elements :) let $subElementsQuery := string(" for $xsElement in $complexType/* return if(name($xsElement)='xs:all') then let $complexType := $complexType/xs:all return util:eval($subElementsQuery) else if(name($xsElement)='xs:sequence') then let $complexType := $complexType/xs:sequence
then data($xsElement/*[$choice]/@name)
else data(substring-after($xsElement/*[$choice]/@ref,':')) let $namespace := namespace-uri-for-prefix($namespacePrefix,$xsdFile/*[1]) let $schemaLocation := if($namespace = $xsdFile/xs:schema/@targetNamespace or $namespace = '')
then $schemaURI
else $xsdFile//xs:import[@namespace = $namespace]/@schemaLocation let $minOccurs := $xsElement/*[$choice]/@minOccurs let $maxOccurs := $xsElement/*[$choice]/@maxOccurs return <SubElement>
<Name>{$subElementName}</Name>
<NamespacePrefix>{$namespacePrefix}</NamespacePrefix> <Namespace>{$namespace}</Namespace>
<SchemaLocation>{$schemaLocation}</SchemaLocation> <MinOccurs>{$minOccurs}</MinOccurs>
return util:eval($subElementsQuery) return $base union $extension else if(name($xsElement)='xs:element') then let $subElementName := if($xsElement/@name)
then data($xsElement/@name)
else data(substring-after($xsElement/@ref,':')) let $namespace := namespace-uri-for-prefix($namespacePrefix,$xsdFile/*[1]) let $schemaLocation := if($namespace = $xsdFile/xs:schema/@targetNamespace or $namespace = '')
then $schemaURI
return <SubElement>
<Name>{$subElementName}</Name>
<NamespacePrefix>{$namespacePrefix}</NamespacePrefix>
<Namespace>{$namespace}</Namespace>
<MinOccurs>{$minOccurs}</MinOccurs>
<MaxOccurs>{$maxOccurs}</MaxOccurs> </SubElement> else() ") (: Employ the sub-elements query to gather the sub-elements :) let $subElements := util:eval($subElementsQuery) (: Query to recursively drill down to find the appropriate attributes :) let $attributesQuery := string(" for $xsElement in $complexType/* return
if(name($xsElement)='xs:attributeGroup') then let $attributeGroupName := substring-after($xsElement/@ref,':') let $namespacePrefix := substring-before($xsElement/@ref,':') let $attributeGroupSchema := util:eval($schemaFromPrefixQuery) let $complexType := $attributeGroupSchema//xs:attributeGroup[@name = $attributeGroupName] return util:eval($attributesQuery) else if(name($xsElement)='xs:complexContent') then let $complexType := $complexType/xs:complexContent return util:eval($attributesQuery) else if(name($xsElement)='xs:extension') then let $extension := let $complexType := $complexType/xs:extension return util:eval($attributesQuery) let $base := let $baseName := substring-after($xsElement/@base,':') let $namespacePrefix := substring-before($xsElement/@base,':') let $baseSchema := util:eval($schemaFromPrefixQuery) let $complexType := $baseSchema//xs:complexType[@name = $baseName] return util:eval($attributesQuery)
return
element {if($rootElementNamespacePrefix) then concat($rootElementNamespacePrefix,':',$rootElementName) else $rootElementName }
{
   (: for the time being, namespace attributes must be hard coded :)
   namespace gc {'http://www.test.com'}
   (: The following should dynamically insert namespace attributes with prefixes but does not work.
      It would be great if someone could help figure this out.
      for $namespace in $namespaces return namespace {$namespace/Namespace/@prefix} {$namespace/Namespace/@URI}, :)
   ,(: Comma is important, separates the namespaces section from the attribute section in the element constructor :)
(: Create the element's attributes if any :) for $attribute in $attributes let $attributeName := if($attribute/@name) then data($attribute/@name) else data($attribute/@ref) return (: Make sure there is an attribute before calling the attribute constructor :) if($attributeName) then if($attribute/@use = 'optional') then if($optionalAttributes eq 'true') then attribute{$attributeName} (: Insert default attribute value if any :) {if($attribute/@default) then
,(: Comma is important, separates the attribute section from the element content section in the element constructor :)
(: Insert default element value if any :) if($rootElement/@default) then data($rootElement/@default) else if($rootElement/@fixed) then data($rootElement/@fixed) else
(: Recursively create any sub-elements :) for $subElement in $subElements let $subElementName := $subElement/Name let $namespacePrefix := $subElement/NamespacePrefix let $schemaURI := $subElement/SchemaLocation
(: Set the number of element occurances based on the minOccurances and maxOccurances values if any :) let $occurances := if(xs:integer($subElement/@minOccurs) gt 0 and xs:integer($subElement/@minOccurs) gt xs:integer($maxOccurances)) then xs:integer($subElement/@minOccurs) else if(xs:integer($subElement/@minOccurs) eq 0 and $optionalElements eq 'false') then 0 else if($subElement/@maxOccurs eq 'unbounded') then if($maxOccurances) then xs:integer($maxOccurances) else 2 else if(xs:integer($subElement/@maxOccurs) gt 1) then
content:xsd-to-instance($schemaURI,$subElementName,$maxOccurances,$optionalElements,$optionalAttributes,$choiceStrategy) } };
Challenges
There are no functions in SVG to automatically estimate the size of text. We will need a small function to estimate the width of a text string based on the count of letters and the type of letters. Although this is only an estimate, it is usually good enough for non-publishing viewers. A sample of these utilities is given here: SVG Utilities to estimate text width [1]
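To make the idea concrete, here is a minimal sketch of such an estimator. It is not taken from the linked utilities: the function name local:text-width, the three character classes and the width factors (0.3, 0.6 and 0.9 of the font size) are all assumptions chosen for illustration, and would need tuning for a particular font.

```xquery
xquery version "1.0";

(: Hypothetical estimator: counts narrow, wide and ordinary characters
   and weights each class by an assumed fraction of the font size. :)
declare function local:text-width($text as xs:string, $font-size as xs:double) as xs:double {
  (: delete everything except the characters of interest, then count what is left :)
  let $narrow := string-length(replace($text, "[^iIjlt.,;:!' ]", ""))
  let $wide   := string-length(replace($text, "[^mwMW@]", ""))
  let $other  := string-length($text) - $narrow - $wide
  return $font-size * (0.3 * $narrow + 0.9 * $wide + 0.6 * $other)
};

local:text-width("Customer Name", 12)
```

The result can be used to size the rectangle drawn around an element name before the text itself is placed.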
Approach
We will use an XQuery typeswitch function to dispatch XML Schema elements to various functions.
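The dispatch pattern might look like the following sketch. The function names (local:dispatch, local:children, local:element) and the placeholder box element are hypothetical; a real transform would emit SVG shapes where this one emits box elements.

```xquery
xquery version "1.0";
declare namespace xs = "http://www.w3.org/2001/XMLSchema";

(: Hypothetical dispatcher: route each schema construct to its own function :)
declare function local:dispatch($node as node()) as item()* {
  typeswitch ($node)
    case $e as element(xs:element)     return local:element($e)
    case $c as element(xs:complexType) return local:children($c)
    case $s as element(xs:sequence)    return local:children($s)
    default return ()
};

(: Recurse into the child elements of a container construct :)
declare function local:children($node as element()) as item()* {
  for $child in $node/* return local:dispatch($child)
};

(: Placeholder output: one box per element declaration, nested like the schema :)
declare function local:element($node as element(xs:element)) as element(box) {
  <box label="{$node/@name}">{ local:children($node) }</box>
};

local:dispatch(
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name"/>
        <xs:element name="age"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>)
```

Adding support for another schema construct then means adding one case clause and one function.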
Sample Models
The following is an SVG file that can be used to display the models for XML Schema. Sample Models in SVG [2]
References
[1] http://xrx.googlecode.com/svn-history/r121/trunk/18-xml-schema-to-svg/svg-utilities.xqm
[2] http://xrx.googlecode.com/svn-history/r121/trunk/18-xml-schema-to-svg/sample-models.svg
Method
We will write an XQuery transform that will transform the XML Schema directly to an XForms file. The following will be automatically generated:

1. A sample instance will be placed into the model.
2. All "boolean" data types will have a bind statement to the xs:boolean type.
3. All "date" data types will have a bind statement to the xs:date type.
4. Each element in the XML Schema will have an input field unless it has the words "text", "description" or "note" in the element name.
5. All enumerated types will use a select1 control with a series of items in the enumeration.
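Rules 4 and 5 amount to choosing a control from the element declaration. A minimal sketch of that decision follows. The function local:control is hypothetical, and the use of "textarea" for elements whose names contain "text", "description" or "note" is an assumption, since the rules above only say that such elements do not get an input field.

```xquery
xquery version "1.0";
declare namespace xs = "http://www.w3.org/2001/XMLSchema";

(: Hypothetical rule table. "textarea" for long-text names is an assumption. :)
declare function local:control($element as element(xs:element)) as xs:string {
  if ($element/xs:simpleType/xs:restriction/xs:enumeration) then "select1"
  else if (matches($element/@name, "text|description|note", "i")) then "textarea"
  else "input"
};

(: Two made-up element declarations to exercise the rules :)
for $e in (
  <xs:element name="StartDate" type="xs:date"/>,
  <xs:element name="ProjectDescription" type="xs:string"/>
)
return concat($e/@name, " -> ", local:control($e))
```

The returned name would then be used to construct the corresponding XForms control bound to the element.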
XMP data
Motivation
Adobe have introduced an XML format for image metadata called XMP. You want to display a photograph and some of the metadata.
Background
Matt Turner has an example of using MarkLogic to extract XMP data from a JPEG image [1].
   let $xmp := concat("<x:xmpmeta",$xmp,"</x:xmpmeta>")
   return util:parse($xmp)
};

let $photo := request:get-parameter("photo",())
let $xmp := local:extract-xmp(concat("/db/Wiki/eXist/",$photo))
return $xmp

XMP XML [3]
declare option exist:serialize "method=xhtml media-type=text/html";

let $photo := request:get-parameter("photo",())
let $xmp := local:extract-xmp(concat("/db/Wiki/eXist/",$photo))
return
   <div>
      <img src="../{$photo}"/>
      <ul>
         <li>Format: {string($xmp//dc:format)}</li>
         <li>Title: {string($xmp//dc:title)}</li>
         <li>Creator: {string($xmp//dc:creator)}</li>
      </ul>
   </div>

Basic Dublin Core elements [4]
References
[1] http://xquery.typepad.com/xquery/xquery_tricks/index.html
[2] http://www.cems.uwe.ac.uk/xmlwiki/eXist/maineboat.jpg
[3] http://www.cems.uwe.ac.uk/xmlwiki/eXist/util/parse-xmp-xml.xq?photo=maineboat.jpg
[4] http://www.cems.uwe.ac.uk/xmlwiki/eXist/util/parse-xmp-basic.xq?photo=maineboat.jpg
Method
The eXist system provides a standard module for executing SQL queries.
Configuration Steps
1. Enable the module
2. Configure your connection string
3. Execute a test query
In Oracle the string might be

sql:get-connection('oracle.jdbc.OracleDriver', 'jdbc:oracle:thin:[USER/PASSWORD]@//[HOST][:PORT]/SERVICE', 'jdbc-connection-string', 'mysql-user-name', 'mysql-password')

let $connection := sql:get-connection("com.mysql.jdbc.Driver", 'jdbc:mysql://localhost/db1', 'mysql-user-name', 'mysql-password')
let $q1 := "select * from table1"
return sql:execute( $connection, $q1, fn:true() )
Adder
Motivation
We would like to create a simple XQuery that takes two arguments and returns the sum of the two numbers.
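A minimal sketch of such a script, assuming it runs in eXist (it relies on the request module used throughout this chapter) and that the two numbers arrive as the URL parameters arg1 and arg2. The defaults of "0" are an assumption made so that the script still returns a result when a parameter is missing.

```xquery
xquery version "1.0";
declare namespace request="http://exist-db.org/xquery/request";

(: read both URL parameters, defaulting to 0, and convert them to integers :)
let $arg1 := xs:integer(request:get-parameter("arg1", "0"))
let $arg2 := xs:integer(request:get-parameter("arg2", "0"))
return
  <results>
    <sum>{$arg1 + $arg2}</sum>
  </results>
```

Called as adder.xq?arg1=123&arg2=456, a script of this shape returns a results element containing the sum of the two values.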
Results
<results> <sum>579</sum> </results>
Accumulating Adder
To make this into an interactive application, we can extend the script to create an XHTML document containing a form. The script computes the new sum from the URL parameters (if any) and returns a minimal XHTML document containing a form which both reports the sum and prompts for new inputs. Note the embedded XQuery expressions (in curly braces) which interpolate the computed values into the created XML element. The state of the computation, the value of the accumulator, is retained in a hidden input in the form.

xquery version "1.0";
declare namespace request="http://exist-db.org/xquery/request";
declare namespace xs="http://www.w3.org/2001/XMLSchema";
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

let $sum := xs:integer(request:get-parameter("sum",0))
let $number := xs:integer(request:get-parameter("number","0"))
let $newSum := $sum + $number
return
<html>
   <head><title>Accumulating Adder</title></head>
   <body>
      <h1>Accumulating Adder</h1>
      <form>
         {$newSum} + <input type="text" name="number" value="{$number}" />
         <input type="hidden" name="sum" value="{$newSum}"/>
      </form>
   </body>
</html>

Execute [2]
      <h1>Accumulating Adder</h1>
      <form>
         {$newSum} + <input type="text" name="number" value="{$number}" />
         <input type="hidden" name="sum" value="{$newSum}"/>
         <input type="submit" name="action" value="add"/>
         <input type="submit" name="action" value="clear"/>
      </form>
   </body>
</html>

Execute [3]
let $sum := if (exists(session:get-attribute("sum"))) then session:get-attribute("sum") else 0
let $action := request:get-parameter("action","")
let $sum := if ($action = "clear") then 0 else $sum
let $number := if ($action = "clear") then 0 else xs:integer(request:get-parameter("number","0"))
let $newSum := $sum + $number
let $s := session:set-attribute("sum",$newSum)
return
<html>
   <head><title>Accumulating Adder</title></head>
   <body>
      <h1>Accumulating Adder</h1>
      <form>
         {$newSum} + <input type="text" name="number" value="{$number}" />
         <input type="submit" name="action" value="add"/>
         <input type="submit" name="action" value="clear"/>
      </form>
   </body>
</html>

(: get the parameters from the URL :)
(: call this like ($hostname)/adder.xq?arg1=123&arg2=456 :)
let $posted-data := request:get-data()
let $arg1 := $posted-data//arg1/text()
let $arg2 := $posted-data//arg2/text()
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/adder.xq?arg1=123&arg2=456
[2] http://www.cems.uwe.ac.uk/xmlwiki/adderForm_1.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/adderForm_2.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/adderForm_3.xq
Ah-has
Redundancy in Expressions
let $r := if ($x = 1) then true() else false()
better as
let $r := ($x = 1)

and

for $p in (0 to string-length($arg1)) return $p
better as
(0 to string-length($arg1))

and

for $i in (1 to 5) return for $j in (11 to 15) return ($i, $j)
better as
for $i in (1 to 5), $j in (11 to 15) return ($i, $j)
OR
for $i in (1 to 5) for $j in (11 to 15) return ($i, $j)

All three return (1 11 1 12 1 13 1 14 1 15 2 11 2 12 2 13...)
Default values
if (exists($a)) then $a else "Default"
better as
($a,"Default")[1]

OR, for a sequence of items
($list1, "Default"[empty($list1)])

OR, another possibility for a sequence
(<test/>,<test/>,<default/>)[not(position() = last() and not(last() = 1))]

Any number of cascaded defaults can be handled this way. Compare with COALESCE in SQL.
Approach
To solve this problem we will use the get-parameter function and only return results if the parameter is present. If it is not present then we will return a useful error message.
Sample Program
xquery version "1.0";
declare namespace request="http://exist-db.org/xquery/request";

let $parameter-1 := request:get-parameter('p1', '')
return
   if (not($parameter-1))
   then (
      <error>
         <message>Parameter 1 is missing. Parameter 1 is a required parameter for this XQuery.</message>
      </error>)
   else (
      <results>
         <message>Parameter 1={$parameter-1}</message>
      </results>
   )
Output
If you do not supply the required parameter the following will result:
<error>
   <message>Parameter 1 is missing. Parameter 1 is a required parameter for this XQuery.</message>
</error>
Dataflow diagrams
This description of the data flow in the Timetable application (another page scraping application) is loosely based on XPL:

<?xml version="1.0" encoding="UTF-8"?>
<Pipeline id="timetable">
   <process id="i1"> <title>Input id</title> </process>
   <process id="i2"> <title>input week number</title> </process>
   <process id="i3"> <title>input role</title> </process>
   <process id="s1"> <title>create url</title> <input>i1</input> <input>i2</input> <input>i3</input> </process>
   <process id="s2"> <title>get html</title> <input>s1</input> <input>x1</input> </process>
   <process id="x1"> <type>external</type> <input>s2</input> <title>Syllabus Plus</title> </process>
   <process id="s3"> <title>convert to xhtml</title> <input>s2</input> </process>
   <process id="s4"> <title>extract xml</title> <input>s3</input> </process>
   <process id="s5"> <title>transform to vcal</title> <input>s4</input> </process>
   <process id="s6"> <title>transform to htm</title> <input>s4</input> </process>
</Pipeline>

With a map from types to shapes:

<ProcessTypes>
   <type name="input" shape="invtriangle"/>
   <type name="process" shape="box"/>
   <type name="external" shape="house"/>
</ProcessTypes>

Conversion to dot format for onward conversion to a GIF image:
declare option exist:serialize "method=text";
declare variable $nl := "&#10;";
declare variable $url := request:get-parameter("url","/db/Wiki/DataFlow/timetablexpl.xml");
declare variable $processTypes := /ProcessTypes;

let $pipe := doc($url)
return
  (
   "digraph {" ,
   for $process in $pipe//process
   let $type :=
      if (exists($process/type)) then $process/type
      else if (empty($process/input)) then "input"
      else "process"
   let $shape := string($processTypes/type[@name=$type]/@shape)
   return
     (
      concat ($process/@id, ' [shape=',$shape,',label="',$process/title, '"];',$nl),
      for $input in $process/input
      return concat($input, '->', $process/@id,";",$nl)
     ),
   "} ",$nl
  )
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/DataFlow/xpl2dot.xq
[2] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2media.php?url=http://www.cems.uwe.ac.uk/xmlwiki/DataFlow/xpl2dot.xq
SPARQL to KML
This application uses DBpedia to create a KML file showing the birth places of the members of a selected UK football team. Data quality is limited by a number of factors:

• the age of the Wikipedia extract on which DBpedia is based
• the existence or non-existence of individual pages in Wikipedia for players
• the consistency of property labeling on Wikipedia infoboxes
SPARQL Query
declare variable $query := " PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> PREFIX p: <http://dbpedia.org/property/> SELECT * WHERE { ?player p:currentclub <http://dbpedia.org/resource/Arsenal_F.C.>. OPTIONAL {?player p:cityofbirth ?city}. OPTIONAL {?player p:dateOfBirth ?dob}. OPTIONAL {?player p:clubnumber ?no}. OPTIONAL {?player p:position ?position}. OPTIONAL {?player p:image ?image}. OPTIONAL { { ?city geo:long ?long. } UNION { ?city p:redirect ?city2. ?city2 geo:long ?long. }. }. OPTIONAL { { ?city geo:lat ?lat.} UNION { ?city p:redirect ?city3. ?city3 geo:lat ?lat. }. }. } "; This query is complicated by the need to handle possible redirection of the city name - (can this be improved - this is a generic problem?). To obtain more complete data, the query should also handle the multiple synonyms used for place and date of birth Changes to dbpedia lead to a short life for queries based om the data-model and vocabulary. As of Jan 2011, the query is being updated. Currently to get locations and birthdates for the current players at Arsenal, the following query seems to work.
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX p: <http://dbpedia.org/property/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  <http://dbpedia.org/resource/Arsenal_F.C.> p:name ?player.
  ?player dbpedia-owl:birthPlace ?city;
          dbpedia-owl:birthDate ?dob.
  ?city geo:long ?long;
        geo:lat ?lat.
}
However, this yields multiple geocoded locations for each player. It can be assumed that the first is the most specific, but it does not seem possible to filter for it in SPARQL.
DBpedia Query
The prototype SPARQL query targets Arsenal_F.C. This team name is replaced by the supplied team name; the query is then URI-encoded and passed to the DBpedia SPARQL endpoint.
let $club := request:get-parameter ("club","Arsenal_F.C.")
let $queryx := replace($query,"Arsenal_F.C.",$club)
Aside: Initially, the query was written with a generic placeholder ($team) rather than a prototypical value (Arsenal_F.C.). The prototype idiom provides an executable SPARQL query without editing, and it is more expressive and less tricky: the $ in $team would need escaping in the replace expression, since the second argument of replace() is a regular expression.
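The escaping issue mentioned in the aside can be illustrated with a small sketch. The template string and the club name Chelsea_F.C. are made up for illustration; only the standard replace() function is used.

```xquery
xquery version "1.0";

(: Hypothetical query template using a $team placeholder instead of a prototype value. :)
let $template := "SELECT * WHERE { ?player p:currentclub <http://dbpedia.org/resource/$team>. }"
let $club := "Chelsea_F.C."
return (
  (: "\$team" is the regular expression that matches the literal text $team :)
  replace($template, "\$team", $club),
  (: prototype idiom: the pattern needs no escaping in order to match itself :)
  replace("<http://dbpedia.org/resource/Arsenal_F.C.>", "Arsenal_F.C.", $club)
)
```

Both calls produce a string containing Chelsea_F.C. in place of the placeholder or prototype. Note that a supplied value containing $ or \ would also need care, since those characters are special in the replacement string.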
DBpedia Result
The result is in SPARQL Query Results XML format. It is more convenient to convert this generic format to tuples with named elements for later processing.
declare namespace r = "http://www.w3.org/2005/sparql-results#";

declare function local:sparql-to-tuples($rdfxml) {
  for $result in $rdfxml//r:result
  return
    <tuple>
      { for $binding in $result/r:binding
        return
          if ($binding/r:uri)
          then element {$binding/@name} {
                 attribute type {"uri"},
                 string($binding/r:uri)
               }
          else element {$binding/@name} {
                 attribute type {$binding/r:literal/@datatype},
                 string($binding/r:literal)
               }
      }
    </tuple>
};
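To see what the conversion produces, the function can be run against a small hand-written result document. This is a sketch: the sample bindings (Example_Player, the latitude value) are invented and do not come from DBpedia.

```xquery
xquery version "1.0";
declare namespace r = "http://www.w3.org/2005/sparql-results#";

declare function local:sparql-to-tuples($rdfxml) {
  for $result in $rdfxml//r:result
  return
    <tuple>
      { for $binding in $result/r:binding
        return
          if ($binding/r:uri)
          then element {$binding/@name} { attribute type {"uri"}, string($binding/r:uri) }
          else element {$binding/@name} { attribute type {$binding/r:literal/@datatype}, string($binding/r:literal) }
      }
    </tuple>
};

(: invented sample in SPARQL Query Results XML format :)
let $sample :=
  <r:sparql>
    <r:results>
      <r:result>
        <r:binding name="player"><r:uri>http://dbpedia.org/resource/Example_Player</r:uri></r:binding>
        <r:binding name="lat"><r:literal datatype="http://www.w3.org/2001/XMLSchema#float">51.5</r:literal></r:binding>
      </r:result>
    </r:results>
  </r:sparql>
return local:sparql-to-tuples($sample)
```

Each r:result becomes one tuple element whose children (here player and lat) are named after the SPARQL variables, which is what allows later path expressions such as $tuples[lat]/player.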
Query to Tuples
let $result:= local:execute-sparql($queryx) let $tuples := local:sparql-to-tuples($result)
KML output
Since we are generating kml, we need to set the media type and file name and create a Document node - in the appropriate places in the script:
declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";
return
<Document>
  <name>Birthplaces of players in the {$team} squad</name>
  <Style id="player">
    <IconStyle>
      <Icon><href>http://maps.google.com/mapfiles/kml/pal2/icon49.png</href></Icon>
    </IconStyle>
  </Style>
  .....
</Document>
Document construction
Because some properties have multiple values (for example, cityofbirth is often expressed as an address path), there are multiple tuples for each player. These need grouping and compressing. Here we use the XQuery idiom which uses distinct-values to get a set of player names, and then accesses groups of rows with the name as the key. This script takes the simplistic approach of using only the first of the tuples for a player that contains a latitude, pending a better resolution of the multiple cityofbirth values. We are only interested in players whose place of birth has been geo-coded, so we filter for tuples with a latitude element:
{
for $playername in distinct-values($tuples[lat]/player)
let $player := $tuples[player=$playername][lat][1]
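The grouping idiom can be tried in isolation on a few hand-made tuples. This is a sketch with invented data (players A, B and C); it only demonstrates the distinct-values pattern described above.

```xquery
xquery version "1.0";

(: invented tuples: player A appears twice, player C has no latitude :)
let $tuples := (
  <tuple><player>A</player><city>London, England</city></tuple>,
  <tuple><player>A</player><city>London</city><lat>51.5</lat></tuple>,
  <tuple><player>B</player><city>Paris</city><lat>48.9</lat></tuple>,
  <tuple><player>C</player><city>Unknown</city></tuple>
)
return
  <players>{
    (: one iteration per distinct player that has at least one geo-coded tuple :)
    for $playername in distinct-values($tuples[lat]/player)
    (: the first geo-coded tuple for that player :)
    let $player := $tuples[player = $playername][lat][1]
    order by $playername
    return <player name="{$playername}" city="{$player/city}" lat="{$player/lat}"/>
  }</players>
```

Player C is dropped because none of its tuples has a lat element, and player A is represented by its second tuple, the first one that is geo-coded.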
Data cleanup
The Wikipedia data needs some clean-up before it is usable in the kml. A generic clean function decodes the URI-encoded characters, removes some irrelevant text and replaces underscores with spaces (this hack needs improving).
declare function local:clean($text) {
  let $text := util:unescape-uri($text,"UTF-8")
  let $text := replace($text,"http://dbpedia.org/resource/","")
  let $text := replace($text,"\(.*\)","")
  let $text := replace($text,"Football__positions#","")
  let $text := replace($text,"#",",")
  let $text := replace($text,"_"," ")
  return $text
};

let $name := local:clean($player/player)
let $city := local:clean($player/city)
let $position := local:clean($player/position)
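A minimal sketch of the same clean-up using only standard functions, so that it can be tried outside eXist. The function name local:clean-portable, the sample URI, and the final normalize-space() call are additions for illustration; the URI-decoding step (util:unescape-uri) is left out because it is eXist-specific.

```xquery
xquery version "1.0";

(: hypothetical portable variant of local:clean, without URI decoding :)
declare function local:clean-portable($text as xs:string) as xs:string {
  let $text := replace($text, "http://dbpedia.org/resource/", "")
  let $text := replace($text, "\(.*\)", "")   (: drop a parenthesised qualifier :)
  let $text := replace($text, "_", " ")       (: underscores to spaces :)
  return normalize-space($text)               (: added here: trim leftover spaces :)
};

local:clean-portable("http://dbpedia.org/resource/Example_Player_(footballer)")
```

For the invented URI above this yields the string Example Player.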
Data typing
The date of birth is in the form xs:date, but is optional. If the value is a valid date, it is converted to a more readable form using an eXist function:
let $dob := if ($player/dob castable as xs:date) then datetime:format-date(xs:date($player/dob),"dd MMM, yyyy" ) else ""
The latitude and longitude should be xs:decimal. Since sometimes several players in a team come from the same place, the mapped positions are dithered a little.
let $lat := xs:decimal($player/lat) + (math:random() - 0.5) * 0.01
let $long := xs:decimal($player/long) + (math:random() - 0.5) * 0.01
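The castable test used for the optional date can be explored on its own. This sketch uses invented input strings and only standard XQuery, so it does not depend on the eXist datetime or math modules.

```xquery
xquery version "1.0";

(: invented raw values, as they might arrive from a loosely typed source :)
for $raw in ("1989-03-16", "16 March 1989", "")
return
  if ($raw castable as xs:date)
  then concat("date, year ", string(year-from-date(xs:date($raw))))
  else "not a date"
```

Only the first value is in the lexical form of xs:date; the other two fall through to the else branch instead of raising a cast error.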
Placemark Construction
The body of the Placemark description will contain XHTML markup to display an image if there is one and to link to the DBpedia page. The XML needs to be serialised to a string for GoogleMap to render the description in a pop-up:
let $description :=
  <div>
    {concat ($position, $no, " born ", $dob, " in ", $city)}
    <div>
      <a href="{$player/player}">DBpedia</a>
      <a href="http://images.google.co.uk/images?q={$name}">Google Images</a>
    </div>
    {if ($player/image != "")
     then <div><img src="{$player/image}" height="200"/></div>
     else ()
    }
  </div>
order by $name
return
  <Placemark>
    <name>{$name}</name>
    <description>
      {util:serialize($description,"method=xhtml")}
    </description>
    <Point>
      <coordinates>{concat($long, ",",$lat,",0")}</coordinates>
    </Point>
    <styleUrl>#player</styleUrl>
  </Placemark>
}
Execute
Map of Arsenal players: generate the kml [3] link to GoogleMaps [4] Note that the q parameter is URI-encoded.
Complete Script
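(: generate a sparql query on the dbpedia server
   This takes a team name and generates a kml file showing the birth place of the players
:)
declare namespace r = "http://www.w3.org/2005/sparql-results#";
declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";

declare variable $query := "
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
  ?player p:currentclub <http://dbpedia.org/resource/Arsenal_F.C.>.
  OPTIONAL {?player p:cityofbirth ?city}.
  OPTIONAL {?player p:birth ?dob}.
  OPTIONAL {?player p:clubnumber ?no}.
  OPTIONAL {?player p:position ?position}.
  OPTIONAL {?player p:image ?image}.
  OPTIONAL {
    { ?city geo:long ?long. }
    UNION
    { ?city p:redirect ?city2. ?city2 geo:long ?long. }.
  }.
  OPTIONAL {
    { ?city geo:lat ?lat. }
    UNION
    { ?city p:redirect ?city3. ?city3 geo:lat ?lat. }.
  }.
}
";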
declare function local:execute-sparql($query as xs:string) {
  let $sparql := concat("http://dbpedia.org/sparql?format=xml&amp;default-graph-uri=http://dbpedia.org&amp;query=",
                        encode-for-uri($query))
  return doc($sparql)
};
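declare function local:sparql-to-tuples($rdfxml) {
  for $result in $rdfxml//r:result
  return
    <tuple>
      { for $binding in $result/r:binding
        return
          if ($binding/r:uri)
          then element {$binding/@name} {
                 attribute type {"uri"},
                 string($binding/r:uri)
               }
          else element {$binding/@name} {
                 attribute type {$binding/r:literal/@datatype},
                 string($binding/r:literal)
               }
      }
    </tuple>
};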
declare function local:clean($text) {
  let $text := util:unescape-uri($text,"UTF-8")
  let $text := replace($text,"http://dbpedia.org/resource/","")
  let $text := replace($text,"\(.*\)","")
  let $text := replace($text,"Football__positions#","")
  let $text := replace($text,"#",",")
  let $text := replace($text,"_"," ")
  return $text
};

let $club := request:get-parameter ("club","Arsenal_F.C.")
let $queryx := replace($query,"Arsenal_F.C.",$club)
let $result := local:execute-sparql($queryx)
let $tuples := local:sparql-to-tuples($result)
return
<Document>
  <name>Birthplaces of {local:clean($club)} players</name>
  <Style id="player">
    <IconStyle>
      <Icon><href>http://maps.google.com/mapfiles/kml/pal2/icon49.png</href></Icon>
    </IconStyle>
  </Style>
  {$result}
  {
    for $playername in distinct-values($tuples[lat]/player)
    let $player := $tuples[player=$playername][lat][1]
    let $name := local:clean($player/player)
    let $city := local:clean($player/city)
    let $position := local:clean($player/position)
    let $dob := if ($player/dob castable as xs:date)
                then datetime:format-date(xs:date($player/dob),"dd MMM, yyyy")
                else ""
    let $no := if ($player/no castable as xs:integer)
               then concat(" [# ", xs:integer($player/no),"] ")
               else ""
    let $lat := xs:decimal($player/lat) + (math:random() - 0.5) * 0.01
    let $long := xs:decimal($player/long) + (math:random() - 0.5) * 0.01
    let $description :=
      <div>
        {concat ($position, $no, " born ", $dob, " in ", $city)}
        <div>
          <a href="{$player/player}">DBpedia</a>
          <a href="http://images.google.co.uk/images?q={$name}">Google Images</a>
        </div>
        {if ($player/image != "")
         then <div><img src="{$player/image}" height="200"/></div>
         else ()}
      </div>
    order by $name
    return
      <Placemark>
        <name>{$name}</name>
        <description>
          {util:serialize($description,"method=xhtml")}
        </description>
        <Point>
          <coordinates>{concat($long, ",",$lat,",0")}</coordinates>
        </Point>
        <styleUrl>#player</styleUrl>
      </Placemark>
  }
</Document>
Club Index
We also need an index page, selecting all Clubs in the major English and Scottish leagues. This script follows the same lines as the more complex script above, except that due to the simpler data, the raw SPARQL result is used without transformation. The index is sorted alphabetically by club name and provides links to the player map and to the base DBpedia data.
XQuery Script
declare option exist:serialize "method=xhtml media-type=text/html";
declare namespace r = "http://www.w3.org/2005/sparql-results#";

declare variable $query := "
PREFIX : <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
  ?club p:league ?league.
  { ?club p:league :Premier_League.}
  UNION
  {?club p:league :Football_League_One.}
  UNION
  {?club p:league :Football_League_Two.}
  UNION
  {?club p:league :Scottish_Premier_League.}
  UNION
declare function local:execute-sparql($query as xs:string) {
  let $sparql := concat("http://dbpedia.org/sparql?format=xml&amp;default-graph-uri=http://dbpedia.org&amp;query=",
                        encode-for-uri($query))
  return doc($sparql)
};

declare function local:clean($string as xs:string) as xs:string {
  let $string := util:unescape-uri($string,"UTF-8")
  let $string := replace($string,"\(.*\)","")
  let $string := replace($string,"_"," ")
  return $string
};
<html>
  <body>
    <h1>England and Scottish Football Clubs</h1>
    <table border="1">
      {
        for $tuple in local:execute-sparql($query)//r:result
        let $club := $tuple/r:binding[@name="club"]/r:uri
        let $club := substring-after($club,"/resource/")
        let $clubx := local:clean($club)
        let $league := $tuple/r:binding[@name="league"]/r:uri
        let $league := local:clean(substring-after($league,"/resource/"))
        let $mapurl := concat("http://maps.google.co.uk/maps?q=",escape-uri(concat("http://www.cems.uwe.ac.uk/xm
        order by $club
        return
          <tr>
            <td>{$clubx}</td>
            <td>{$league}</td>
            <td><a href="{$mapurl}">Player Map</a></td>
            <td><a href="http://dbpedia.org/resource/{$club}">DBpedia</a></td>
          </tr>
      }
    </table>
  </body>
</html>
Club Index
Club Index [5]
References
[1] http://www.dbpedia.org
[2] http://dbpedia.org/sparql
[3] http://www.cems.uwe.ac.uk/xmlwiki/RDF/club2kml.xq?club=Arsenal_F.C.
[4] http://maps.google.co.uk/maps?q=http%3A%2F%2Fwww.cems.uwe.ac.uk%2Fxmlwiki%2FRDF%2Fclub2kml.xq%3Fclub%3DArsenal_F.C.
[5] http://www.cems.uwe.ac.uk/xmlwiki/xRDF/clubIndex.xq
let $group := request:get-parameter("group","Eagles")
return
<html>
  <head>
    <script src="http://simile.mit.edu/timeline/api/timeline-api.js" type="text/javascript"></script>
    <script type="text/javascript">
    <![CDATA[
    function onLoad(group,start) {
      var theme = Timeline.ClassicTheme.create();
      theme.event.label.width = 400; // px
      theme.event.bubble.width = 300;
      theme.event.bubble.height = 300;
      var bandInfo = [
        Timeline.createBandInfo({
          eventSource:    eventSource1,
          theme:          theme,
          date:           start,
          width:          "100%",
          intervalUnit:   Timeline.DateTime.YEAR,
          intervalPixels: 45
        }),
} ]]> </script> </head> <body onload="onLoad('{$group}',1980);"> <h1>{$group} Albums</h1> <div id="my-timeline" style="height: 700px; border: 1px solid #aaa"></div> </body> </html>
declare variable $query := " PREFIX p: <http://dbpedia.org/property/> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> SELECT * WHERE { ?album p:artist <http://dbpedia.org/resource/The_Allman_Brothers_Band>.
declare function local:execute-sparql($query as xs:string) {
  let $sparql := concat("http://dbpedia.org/sparql?format=xml&amp;default-graph-uri=http://dbpedia.org&amp;query=",
                        encode-for-uri($query))
  return doc($sparql)
};
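declare function local:sparql-to-tuples($rdfxml) {
  for $result in $rdfxml//r:result
  return
    <tuple>
      { for $binding in $result/r:binding
        return
          if ($binding/r:uri)
          then element {$binding/@name} {
                 attribute type {"uri"},
                 string($binding/r:uri)
               }
          else element {$binding/@name} {
                 attribute type {$binding/r:literal/@datatype},
                 string($binding/r:literal)
               }
      }
    </tuple>
};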
declare function local:clean($text) { let $text:= util:unescape-uri($text,"UTF-8") let $text := replace($text,"http://dbpedia.org/resource/","") let $text := replace($text,"\(.*\)","") let $text := replace($text,"_"," ") return $text };
declare function local:year-from-date($d) {
  let $d := replace($d,"[^0-9\-]","")
  let $dp := tokenize($d,"-")
  let $year := $dp[1]
  return
    if ($year castable as xs:integer and string-length($year)=4)
    then $year
    else ()
};
let $group := request:get-parameter ("group","The_Allman_Brothers_Band") let $groupx := replace($group," ","_") let $queryx := replace($query,"The_Allman_Brothers_Band",encode-for-uri($group)) let $result := local:execute-sparql($queryx)
let $tuples := local:sparql-to-tuples($result)
return
<data>
  {
    for $album in distinct-values($tuples/album)
    let $rows := $tuples[album=$album]
    let $name := local:clean($album)
    let $year := local:year-from-date(($rows/dateofrelease)[1])
    let $cover := ($rows/cover)[1]
    where exists($year)
    return
      <event start="{$year}" title="{$name}">
        {util:serialize(
           <div>
             {if (starts-with($cover,"http://"))
              then <img src="{$cover}" height="200" alt=""/>
              else ()
             }
             <p><a href="{$album}">DBpedia</a> <a href="{replace($album,"dbpedia.org/resource","en.wikipedia.org/wiki")}">Wikipedia</a></p>
           </div>,
           "method=xhtml")
        }
      </event>
  }
</data>
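The year extraction that feeds the where exists($year) filter above can be tried on its own. This is a sketch: the function body follows the fragment shown in the script, with the assumption that it returns the empty sequence when no four-digit year is found; the sample values are invented.

```xquery
xquery version "1.0";

declare function local:year-from-date($d as xs:string?) as xs:string? {
  (: keep only digits and hyphens, then take the part before the first hyphen :)
  let $digits := replace($d, "[^0-9\-]", "")
  let $year := tokenize($digits, "-")[1]
  return
    if ($year castable as xs:integer and string-length($year) = 4)
    then $year
    else ()   (: assumption: no usable year yields the empty sequence :)
};

for $raw in ("1973-03-01", "March 1973", "n/a")
return <date raw="{$raw}" year="{local:year-from-date($raw)}"/>
```

The first two values give the year 1973; the last gives an empty year attribute, which is the case the where clause removes from the timeline.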
Execution
Pink Floyd [2] Leonard Cohen [3]
Group Index
This script queries DBpedia for the resources which belong to a specified category, for example Rock_and_Roll_Hall_of_Fame_inductees. A table of group names in alphabetical order provides links to the Timeline view using the script above, and to an HTML table view of the discography.
declare namespace r = "http://www.w3.org/2005/sparql-results#";
declare option exist:serialize "method=xhtml media-type=text/html";

declare variable $query := "
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
  ?group skos:subject <http://dbpedia.org/resource/Category:Rock_and_Roll_Hall_of_Fame_inductees>.
}
";
declare function local:execute-sparql($query as xs:string) {
  let $sparql := concat("http://dbpedia.org/sparql?format=xml&amp;default-graph-uri=http://dbpedia.org&amp;query=",
                        escape-uri($query,true()))
  return doc($sparql)
};
declare function local:clean($text) { let $text:= util:unescape-uri($text,"UTF-8") let $text := replace($text,"\(.*\)","") let $text := replace($text,"_"," ") return $text };
let $category := request:get-parameter("category","Rock_and_Roll_Hall_of_Fame_inductees")
let $queryx := replace($query,"Rock_and_Roll_Hall_of_Fame_inductees",$category)
let $result := local:execute-sparql($queryx)
return
<html>
  <body>
    <h1>{local:clean($category)}</h1>
    <table border="1">
      {$result}
      {
        for $group in $result//r:result/r:binding[@name="group"]/r:uri
        let $name := substring-after($group,"resource/")
        let $namex := local:clean($name)
        order by $name
        return
          <tr>
            <td>{$namex}</td>
            <td><a href="group2html.xq?group={$name}">HTML</a></td>
            <td><a href="groupTimeline.xq?group={$name}">Timeline</a></td>
          </tr>
      }
    </table>
  </body>
</html>
References
[1] http://simile.mit.edu/timeline/
[2] http://www.cems.uwe.ac.uk/xmlwiki/RDF/groupTimeline.xq?group=Pink_Floyd
[3] http://www.cems.uwe.ac.uk/xmlwiki/RDF/groupTimeline.xq?group=Leonard_Cohen
[4] http://www.cems.uwe.ac.uk/xmlwiki/RDF/groupIndex.xq
Data File
Assume you have a data file such as the following XML file which is a sample glossary of terms and definitions:
terms.xml
<terms>
  <term>
    <term-name>Object</term-name>
    <definition>A set of ideas, abstractions, or things in the real world that are identified with explicit boundaries and meaning and whose properties and behavior follow the same rules</definition>
  </term>
  <term>
    <term-name>Organization</term-name>
    <definition>A unit consisting of people and processes established to perform some functions</definition>
  </term>
  <term>
    <term-name>Organization</term-name>
    <definition>BankOfAmerica</definition>
  </term>
</terms>
The <term> tags will repeat for each term in your glossary. You would like to display these terms in an HTML table.
Screen Image
Sample Code
xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/html";

let $my-doc := doc('file://c:/xml/terms.xml')
return
<html>
  <head>
    <title>Terms</title>
  </head>
  <body>
    <table border="1">
      <thead>
        <tr>
          <th>Term</th>
          <th>Definition</th>
        </tr>
      </thead>
      <tbody>{
        for $term at $count in
          for $item in $my-doc/terms/term
          let $term-name := $item/term-name/text()
          order by upper-case($term-name)
          return $item
        return
          <tr>
            {if ($count mod 2) then (attribute bgcolor {'Lavender'}) else ()}
            <td>{$term/term-name/text()}</td>
            <td>{$term/definition/text()}</td>
          </tr>
      }</tbody>
    </table>
  </body>
</html>
Execute [1]
Discussion
Sorting before counting
There are two nested for loops. The outer loop has the additional at $count clause, which binds a counter that increments for each item processed. The inner loop returns the items in sorted order to the outer loop. The inner loop therefore does the sorting first, and the outer loop then counts each item so that alternate rows are shaded. If you know that the original file is already in the correct order, the nested for loops are not necessary: a single for loop with at $count is all that is needed.
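The single-loop case mentioned above can be sketched as follows. The inline glossary (Apple, Banana, Cherry) is invented and is assumed to be in the required order already.

```xquery
xquery version "1.0";

(: invented data, already sorted by term-name :)
let $terms :=
  <terms>
    <term><term-name>Apple</term-name></term>
    <term><term-name>Banana</term-name></term>
    <term><term-name>Cherry</term-name></term>
  </terms>
return
  <table>{
    (: no inner sorting loop: "at $count" numbers the items in document order :)
    for $term at $count in $terms/term
    return
      <tr>
        {if ($count mod 2) then attribute bgcolor {'Lavender'} else ()}
        <td>{string($term/term-name)}</td>
      </tr>
  }</table>
```

Rows 1 and 3 receive the bgcolor attribute because $count mod 2 is non-zero for odd positions.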
Alternatively, each row can be given a class attribute with the value 'even' or 'odd' instead of a bgcolor attribute.
The CSS file would then contain the following: .odd {background-color: Lavender;}
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/eg/stripes.xq
Displaying Lists
Motivation
You have a list of items in an XML structure and you want to display a comma separated list of the values in an output string.
Method
XQuery provides the string-join() function, which takes a sequence of items and a separator string and creates an output string with the separator between each of the items. The format of the function is:
string-join(sequence, separator)
where sequence is the sequence of values to join and separator is the string that you would like to place between the values.
Sample Program
xquery version "1.0";

let $tags :=
  <tags>
    <tag>x</tag>
    <tag>y</tag>
    <tag>z</tag>
    <tag>d</tag>
  </tags>
return
  <results>
    <comma-separated-values>{
      string-join($tags/tag, ',')
    }</comma-separated-values>
  </results>
Output
<results>
  <comma-separated-values>x,y,z,d</comma-separated-values>
</results>
Execute [1]
Discussion
The string-join function takes two arguments, the first is the sequence of strings to be joined and the second is the separator.
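A few further cases may help, sketched here with invented colour names: joining plain strings with a multi-character separator, joining an empty sequence, and transforming the items before joining.

```xquery
xquery version "1.0";

let $names := ("red", "green", "blue")
return (
  string-join($names, ", "),                                   (: separator may be longer than one character :)
  string-join((), ","),                                        (: an empty sequence gives the empty string :)
  string-join(for $n in $names return upper-case($n), " | ")   (: items can be computed first :)
)
```

The first and third expressions give "red, green, blue" and "RED | GREEN | BLUE"; the separator is only placed between items, never at the ends.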
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Basics/string-join.xq
Employee Search
This example shows JavaScript and XQuery combining to provide a directly updated Web page. AJAX is used in a form sometimes referred to as AHAH in which the server-side XQuery script returns an XHTML node (in this case a table containing the information about an employee) which is updated into the DOM using innerHTML. The behavior of this application is explained in this interactive sequence diagram. [1]
View [2]
http.open("GET", "getemp.xq?empNo=" + empNo, true);
http.onreadystatechange = updateEmp;
isWorking = true;
http.send(null);
}
}

function getHTTPObject() {
  var xmlhttp;
  /*@cc_on
  @if (@_jscript_version >= 5)
    try {
      xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
    } catch (e) {
      try {
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
      } catch (E) {
        xmlhttp = false;
      }
    }
  @else
    xmlhttp = false;
  @end @*/
  if (!xmlhttp && typeof XMLHttpRequest != 'undefined') {
    try {
      xmlhttp = new XMLHttpRequest();
      xmlhttp.overrideMimeType("text/xml");
    } catch (e) {
      xmlhttp = false;
    }
  }
  return xmlhttp;
}

var http = getHTTPObject(); // create the HTTP Object
var isWorking = false;
</tr>
}
</table>
};

let $empNo := request:get-parameter("empNo",())
let $emp := //Emp[EmpNo=$empNo]
return
  if (exists($emp))
  then local:element-to-table($emp)
  else <p>Employee Number {$empNo} not found.</p>

Get the XHTML fragment [3]
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/showSequence.xq?uri=/db/Wiki/SequenceDiagram/sequences/empajaxsd.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/ajax/ajaxemp.html
[3] http://www.cems.uwe.ac.uk/xmlwiki/ajax/getemp.xq?empNo=7521
Example Sequencer
The code examples used in the XQuery /SQL comparison are coded in an XML file. The WikiBook page redundantly has the code pasted into the page but an alternative is to provide an application to generate the whole page together with the executed examples from the XML script. Here is a sample of the XML script:
<Query id="30"> <Task>List the name of each employee together with the name of their manager.</Task> <MySQL>select e.ename, m.ename from emp e, emp m where e.mgr = m.empno ;</MySQL> <XQuery><![CDATA[for $emp in //Emp let $manager := //Emp[EmpNo = $emp/MgrNo] return <Emp> {$emp/Ename} <Manager>{string($manager/Ename)}</Manager> </Emp> ]]></XQuery> <Comment>The SQL Join has missed Employee King who has no manager,</Comment> </Query>
To allow the queries to be executed in a selected order, a lesson defines a sequence of queries: <Lesson id="t1"> <Name>Test Lesson 1</Name>
  <Step queryid="32"/>
  <Step queryid="33"/>
  <Step queryid="31"/>
  <Step queryid="21a"/>
  <Step queryid="20"/>
</Lesson>
The user can step through the examples in the lesson : Test Lesson [1]
Implementation
Two scripts form the core of this application, one to list the queries in a lesson, the other to execute the query code, both SQL and XQuery and show the results. ....
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/empdept/showLesson.xq?lessonid=t1
import a module
import module namespace geo="http://www.cems.uwe.ac.uk/exist/coord" at "coord.xqm";
declare a namespace
declare namespace tx="http://www.transxchange.org.uk/";
module header
module namespace c="http://www.cems.uwe.ac.uk/exist/coord";
function declaration
In the local namespace of an XQuery script:
declare function local:times($tt, $dt) {
  if (exists($dt))
  then local:times(($tt, $tt[last()] + $dt[1]), remove($dt,1))
  else $tt
};
In the namespace of a module:
module namespace time = 'http://www.cems.uwe.ac.uk/fold/time';
declare function time:times($tt, $dt) {
  if (exists($dt))
  then time:times(($tt, $tt[last()] + $dt[1]), remove($dt,1))
  else $tt
};
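As a quick check of what the recursive times function computes, here is a sketch that calls the local: version with invented values: a start time of 0 and three intervals.

```xquery
xquery version "1.0";

declare function local:times($tt, $dt) {
  if (exists($dt))
  then local:times(($tt, $tt[last()] + $dt[1]), remove($dt, 1))   (: append last + next interval, drop that interval :)
  else $tt
};

(: start at 0 and accumulate the intervals 5, 10 and 3 :)
local:times((0), (5, 10, 3))
```

The result is the sequence (0, 5, 15, 18): each step appends the running total and recurses on the remaining intervals until none are left.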
declare variable
declare variable $pi as xs:double := 3.14159265;
In a module with namespace prefix fxx:
declare variable $fxx:file := doc('/db/me/file.xml');
Add the following: let $my-color := request:get-parameter('color', 'red') let $my-shape := request:get-parameter('shape', '') If no color parameter is supplied a default color of "red" will be used.
Filtering Nodes
Motivation
You want to create filters that remove or replace specific nodes in an XML stream. This stream may be in-memory XML documents and may not be on-disk.
Method
To process all nodes in a tree we will start with a recursive function called the identity transform. This function copies the source tree into the output tree without change. We begin with this process and then add some exception processing for each filter.
(: return a deep copy of the element and all sub elements :)
declare function local:copy($element as element()) as element() {
  element {node-name($element)}
    {$element/@*,
     for $child in $element/node()
     return
       if ($child instance of element())
       then local:copy($child)
       else $child
    }
};
This function uses an XQuery construct called a computed element constructor to construct an element. The format of the element constructor is the following:
element {ELEMENT-NAME} {ELEMENT-VALUE}
In the above case ELEMENT-VALUE is another query that copies the attributes and processes all the child nodes of the current element. The for loop selects each child node of the current element and does the following (in pseudo-code):
if the child is another element (this uses the "instance of" instruction)
then copy the child (recursively)
else return the child (it is a leaf node of the tree, such as a text node)
If you understand the basic structure of this algorithm, you can modify it to filter out only the elements you want. Start with this template and modify the relevant sections. Note that you can also achieve the same result by using the typeswitch operator:
declare function local:copy($n as node()) as node() {
  typeswitch($n)
    case $e as element()
      return
        element {name($e)}
          {$e/@*,
           for $c in $e/(* | text())
           return local:copy($c) }
    default return $n
};
A variant of the typeswitch version that omits $e/@* copies the elements and text but drops all attributes:
declare function local:copy($n as node()) as node() {
  typeswitch($n)
    case $e as element()
      return
        element {name($e)}
          {for $c in $e/(* | text())
           return local:copy($c) }
    default return $n
};
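A short sketch can confirm that the attribute-preserving identity transform really produces an equal but distinct tree. The book element used as input is invented; deep-equal() and the is operator are standard XQuery.

```xquery
xquery version "1.0";

(: the recursive identity transform, copying attributes and all child nodes :)
declare function local:copy($element as element()) as element() {
  element {node-name($element)} {
    $element/@*,
    for $child in $element/node()
    return
      if ($child instance of element())
      then local:copy($child)
      else $child
  }
};

let $input := <book id="b1"><title>XQuery</title><note>plain <em>mixed</em> content</note></book>
let $copy := local:copy($input)
return (
  deep-equal($input, $copy),   (: same names, attributes and content :)
  $copy is $input              (: but not the same node :)
)
```

The first value is true and the second is false, which is the property a filter built on this template relies on: it always returns a new tree and leaves the source untouched.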
The function can be parameterized by adding a second function argument to indicate what attributes should be removed.
if (name($att)=$old-attribute)
then attribute {$new-attribute} {$att}
else attribute {name($att)} {$att}
else $node/@*
, for $child in $node/node()
  return
    if ($child instance of element())
    then local:change-attribute-name-for-element($child, $element, $old-attribute, $new-attribute)
    else $child
}
};
};
This adds the node() qualifier and the name of the node in the predicate:
/node()[not(name(.)=$element-name)]
To use this function, pass the input XML as the first parameter and a sequence of element names, as strings, as the second parameter. For example:
let $input := doc('my-input.xml')
let $remove-list := ('xxx', 'yyy', 'zzz')
return local:remove-elements($input, $remove-list)
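A minimal sketch of a function with this behavior, built on the identity transform and the predicate described above. This is an illustrative reconstruction under assumptions, not necessarily the original definition: it takes an element rather than a document node, and the sample input is invented.

```xquery
xquery version "1.0";

(: sketch: copy the tree, skipping any child whose name is in $remove-names :)
declare function local:remove-elements($input as element(), $remove-names as xs:string*) as element() {
  element {node-name($input)} {
    $input/@*,
    for $child in $input/node()[not(name(.) = $remove-names)]
    return
      if ($child instance of element())
      then local:remove-elements($child, $remove-names)
      else $child
  }
};

let $input := <root><keep>1</keep><xxx>drop</xxx><nested><yyy/><keep>2</keep></nested></root>
let $remove-list := ('xxx', 'yyy', 'zzz')
return local:remove-elements($input, $remove-list)
```

With this sketch the xxx and yyy elements disappear at every depth, while text nodes survive because name(.) of a text node is the empty string. For a stored document one would pass its root element, for example doc('my-input.xml')/*.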
Run [1]
The two functions below remove any namespace from a node; nnsc stands for no-namespace-copy. The first one performs much faster, apparently because it handles attributes more quickly. The second is kept here for reference, since it may handle some subtle case differently.
(: return a deep copy of the element without namespaces :)
declare function local:nnsc1($element as element()) as element() {
  element { local-name($element) } {
    $element/@*,
    for $child in $element/node()
    return
      if ($child instance of element())
      then local:nnsc1($child)
      else $child
  }
};
(: return a deep copy of the element without namespaces :)
declare function local:nnsc2($element as element()) as element() {
  element { QName((), local-name($element)) } {
    for $child in $element/(@*,*)
    return
      if ($child instance of element())
      then local:nnsc2($child)
      else $child
  }
};
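The effect of the first function can be checked with a small sketch. The ex prefix, its namespace URI and the order element are invented for the test; no default element namespace is assumed to be declared.

```xquery
xquery version "1.0";
declare namespace ex = "http://example.org/ns";

declare function local:nnsc1($element as element()) as element() {
  element { local-name($element) } {
    $element/@*,
    for $child in $element/node()
    return
      if ($child instance of element())
      then local:nnsc1($child)
      else $child
  }
};

let $input := <ex:order id="42"><ex:item>widget</ex:item></ex:order>
let $output := local:nnsc1($input)
return (
  namespace-uri($input),        (: the invented namespace URI :)
  namespace-uri($output),       (: empty: the copy is in no namespace :)
  local-name($output/*[1])      (: child name is kept, without its namespace :)
)
```

Note that the second function, local:nnsc2, selects only attributes and child elements with (@*,*), so it would drop the text node "widget" in this input; that is one visible difference between the two.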
Conversely, if you want to add a namespace to an element, a starting point is this blog post: http://fgeorges.blogspot.com/2006/08/add-namespace-node-to-element-in.html
References
W3C page on computed element constructors [3]
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/ex/copy.xq
[2] http://www.w3.org/TR/xquery/#id-copy-namespaces-decl
[3] http://www.w3.org/TR/xquery/#id-computedConstructors
Filtering Words
Motivation
Sometimes you have a text body and you want to filter out words that are on a given list, often called a stoplist.
Screen Image
Sample Program
xquery version "1.0";
declare namespace exist = "http://exist.sourceforge.net/NS/exist"; declare option exist:serialize "method=xhtml media-type=text/html indent=yes omit-xml-declaration=yes";
let $stopwords :=
<words>
  <word>a</word>
  <word>and</word>
  <word>in</word>
  <word>the</word>
  <word>or</word>
  <word>over</word>
</words>
let $input-text := 'a quick brown fox jumps over the lazy dog'
return
<html>
  <head>
    <title>Test of is a word on a list</title>
  </head>
  <body>
    <h1>Test of is a word on a list</h1>
    <h2>WordList</h2>
    <table border="1">
      <thead>
        <tr>
          <th>StopWord</th>
        </tr>
      </thead>
      <tbody>{
        for $word in $stopwords/word
        return
          <tr>
            <td align="center">{$word}</td>
          </tr>
      }</tbody>
    </table>
    <h2>Sample Input Text</h2>
    <p>Input Text: <div style="border:1px solid black">{$input-text}</div></p>
    <table border="1">
      <thead>
        <tr>
          <th>Word</th>
          <th>On List</th>
        </tr>
      </thead>
      <tbody>{
        for $word in tokenize($input-text, '\s+')
        return
          <tr>
            <td>{$word}</td>
            <td>{
              if ($stopwords/word = $word)
              then (<font color="green">true</font>)
              else (<font color="red">false</font>)
            }</td>
          </tr>
      }</tbody>
    </table>
  </body>
</html>
Execute [1]
Discussion
The input string is split into words using the tokenize function which accepts two parameters, the string to be parsed and a separator expressed as a regular expression. Here words are separated by one or more spaces. The result is a sequence of words. This program uses XPath generalized equality to compare the sequence $stopwords/word with the sequence (of one item) $word. This is true if the two sequences have items in common, that is if the stoplist contains the word.
Alternative coding
You can also perform the stopword lookup with a quantified expression, using some...satisfies (see XQuery/Quantified Expressions), such as:
some $word in $stopwords satisfies ($word = $thisword)
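Placed in a complete query, the quantified form might look like the following sketch. The shortened stop list and the on-list attribute are choices made for this illustration.

```xquery
xquery version "1.0";

let $stopwords :=
  <words>
    <word>a</word>
    <word>the</word>
    <word>over</word>
  </words>
for $thisword in tokenize('a quick brown fox jumps over the lazy dog', '\s+')
return
  (: true if at least one stop word equals the current word :)
  <word on-list="{some $word in $stopwords/word satisfies ($word = $thisword)}">{$thisword}</word>
```

Each input word is emitted with on-list set to true or false; the words a, over and the are the ones flagged as true.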
There are other alternatives: holding the stop words as a sequence of strings, holding them as one long string searched with contains(), or storing them as an element in the database. There are, however, significant differences in performance. A set of tests shows the differences between a number of alternatives: Unit Tests [2]. What these tests reveal is that, on the eXist db platform, both of the suggested implementations are far from optimal. Testing against a sequence of strings takes about a fifth of the time needed to compare with elements. Generalised equality is similarly superior to the use of a quantified expression.
Recommended Practice
It would appear that the preferable approach is:
let $stopwords := ("a","and","in","the","or","over")
let $input-string := 'a quick brown fox jumps over the lazy dog'
let $input-words := tokenize($input-string, '\s+')
return
  for $word in $input-words
  return $stopwords = $word
If the stop words are held as an element, it is better to convert to a sequence of atoms first:
let $stopwords :=
<words>
  <word>a</word>
  <word>and</word>
  <word>in</word>
  <word>the</word>
  <word>or</word>
  <word>over</word>
</words>
let $stopwordsx := $stopwords/word/string(.)
let $input-string := 'a quick brown fox jumps over the lazy dog'
let $input-words := tokenize($input-string, '\s+')
return
  for $word in $input-words
  return $stopwordsx = $word
Note that referencing the stop list in the database slightly improved performance.
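The queries above report whether each word is on the list. To actually remove the stop words from the text, as the motivation describes, the same general comparison can be used in a predicate. A minimal sketch:

```xquery
xquery version "1.0";

let $stopwords := ("a", "and", "in", "the", "or", "over")
let $input-string := 'a quick brown fox jumps over the lazy dog'
let $input-words := tokenize($input-string, '\s+')
return
  (: keep only the words that equal none of the stop words :)
  string-join($input-words[not(. = $stopwords)], ' ')
```

This returns the string "quick brown fox jumps lazy dog".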
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/stoplist.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/UnitTest2/runTests.xql?uri=/db/Wiki/UnitTest2/Tests/match.xml
Fizzbuzz
Here's an XQuery solution to the FizzBuzz problem posed in David Patterson's blog [1] [2]. I took the liberty of splitting the hyphenated range into two attributes.
let $config :=
<fizzbuzz>
  <range min="1" max="100"/>
  <test>
    <mod value="3" test="0">Fizz</mod>
    <mod value="5" test="0">Buzz</mod>
  </test>
</fizzbuzz>
return
  string-join(
    for $i in ($config/range/@min to $config/range/@max)
    let $s :=
      for $mod in $config/test/mod
      return
        if ($i mod $mod/@value = $mod/@test)
        then string($mod)
        else ()
    return
      if (exists($s))
      then string-join($s,' ')
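The listing above is cut off before its final else branch. A complete, runnable sketch in the same spirit follows; the else branch (output the number itself), the newline separator, the explicit xs:integer casts and the shortened range of 1 to 15 are assumptions made for this illustration.

```xquery
xquery version "1.0";

let $config :=
  <fizzbuzz>
    <range min="1" max="15"/>
    <test>
      <mod value="3" test="0">Fizz</mod>
      <mod value="5" test="0">Buzz</mod>
    </test>
  </fizzbuzz>
return
  string-join(
    for $i in (xs:integer($config/range/@min) to xs:integer($config/range/@max))
    let $s :=
      for $mod in $config/test/mod
      return
        if ($i mod xs:integer($mod/@value) = xs:integer($mod/@test))
        then string($mod)
        else ()
    return
      if (exists($s))
      then string-join($s, ' ')
      else string($i),      (: assumed: plain numbers are printed as themselves :)
    "&#10;")                (: assumed separator: one result per line :)
```

Because both tests are evaluated for every number, a multiple of 15 collects both strings and prints "Fizz Buzz" without needing a separate rule.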
References
[1] http://www.oreillynet.com/xml/blog/2007/03/fizzbuzz_20_adventures_in_beau.html
[2] http://dev.aol.com/blog/mdavidpeterson/2007/03/14/fizz-buzz-in-xslt-1.0
[3] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/fizzbuzz.xq
Method
To do this you use the request:get-data() XQuery function.
Sample echo-post.xq
xquery version "1.0";
(: echo-post.xq: Return all data from an HTTP post to the caller. :)
declare namespace exist = "http://exist.sourceforge.net/NS/exist";
declare namespace xmldb="http://exist-db.org/xquery/xmldb";
declare namespace request="http://exist-db.org/xquery/request";
declare option exist:serialize "method=xml media-type=text/xml indent=yes";

let $post-data := request:get-data()
return
<post-data>
  {$post-data}
</post-data>
Discussion
The program above (called echo-post.xq) is very useful for testing your web forms. It takes the data sent to the XQuery service and returns it wrapped in a <post-data> tag. Sometimes an HTTP POST carries its data as form parameters. For example, the rich text editor CKEditor has multiple text areas, each of which might contain HTML markup in encoded form. In this case you can also use request:get-parameter() on the HTTP POST data. After your server receives a POST from a CKEditor client, it can use the following:
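A minimal sketch of a script that echoes every posted parameter, assuming it runs inside eXist, where the request module is available. The element names results, parameters and parameter are chosen for illustration; request:get-parameter-names() and request:get-parameter() are functions of the eXist request module.

```xquery
xquery version "1.0";
(: sketch: list each parameter of the current HTTP request with its value; eXist only :)
declare namespace exist = "http://exist.sourceforge.net/NS/exist";
declare namespace request = "http://exist-db.org/xquery/request";
declare option exist:serialize "method=xml media-type=text/xml indent=yes";

<results>
  <parameters>{
    for $parameter in request:get-parameter-names()
    return
      <parameter name="{$parameter}" value="{request:get-parameter($parameter, '')}"/>
  }</parameters>
</results>
```

Posting a form with fields FirstName and LastName to such a script would be expected to return one parameter element per field.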
473
      <parameter name="{$parameter}" value="{request:get-parameter($parameter, '')}"/>
    }
  </parameters>
</results>
</results>

If you have the following form:

<html>
  <head><title></title></head>
  <body>
    <form action="echo-post.xq" method="post">
      First name: <input type="text" name="FirstName" value="Mickey" /><br />
      Last name: <input type="text" name="LastName" value="Mouse" /><br />
      <input type="submit" value="Send HTTP Post to Server" />
    </form>
  </body>
</html>
<header name="Accept-Charset" value="ISO-8859-1,utf-8;q=0.7,*;q=0.7"/>
<header name="keep-alive" value="115"/>
<header name="Connection" value="keep-alive"/>
<header name="Referer" value="http://demo.danmccreary.com/rest/db/dma/apps/xforms-examples/unit-tests/html-form-post.html"/>
<header name="Content-Type" value="application/x-www-form-urlencoded"/>
Format
The format of a calling URL that uses the HTTP GET or POST command is:

<hostname>:<port>/<path>/xquery.xq?param1=abc&param2=xyz

where param1 is the first parameter with a value of abc and param2 is the second parameter with a value of xyz. Note that a question mark starts the parameters and an ampersand separates them.

xquery version "1.0";

declare namespace request="http://exist-db.org/xquery/request";
declare namespace xs="http://www.w3.org/2001/XMLSchema";

let $param1 := request:get-parameter("param1", 0)
let $param2 := request:get-parameter("param2", 0)
return
  <results>
    <message>Got param1: {$param1} and param2: {$param2}</message>
  </results>

Try this out by activating the following link. Change the parameters and see the changes reflected in the output.

getparams.xq?param1=abc&param2=xyz [1]
Getting URL Parameters

  then xs:integer($myint)
  else 0
let $mydecimal := request:get-parameter("mydecimal", 0.0)
let $mydecimal :=
  if ($mydecimal castable as xs:decimal)
  then xs:decimal($mydecimal)
  else 0.0
return
  <results>
    <message>Got
  </results>
Try this out by activating the following link. Change the parameters and see the changes reflected in the output. invalid decimal [2]
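The castable test used above can be wrapped in a small helper so that every numeric parameter is handled the same way. The following is a minimal sketch in plain XQuery with no eXist-specific calls; the function name local:to-integer and the sample values are assumptions made for illustration only.

```xquery
xquery version "1.0";

(: Hypothetical helper: cast a raw parameter value to an integer,
   falling back to a default when the value is missing or malformed. :)
declare function local:to-integer($value as xs:string?, $default as xs:integer) as xs:integer {
  if ($value castable as xs:integer)
  then xs:integer($value)
  else $default
};

(
  local:to-integer("6", 0),   (: a well-formed integer is cast :)
  local:to-integer("x", 0),   (: not castable, so the default is used :)
  local:to-integer((), 0)     (: a missing value also yields the default :)
)
```

In an eXist script the first argument would typically be the string returned by request:get-parameter().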
</results>

Change parameters in the URL and see the changes reflected in the output.
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/Parameters/getparams.xq?param1=abc&param2=xyz
[2] http://www.cems.uwe.ac.uk/xmlwiki/Parameters/paramtypes2.xq?myint=6&mydecimal=x
[3] http://www.cems.uwe.ac.uk/xmlwiki/Parameters/echo-parameters.xq?a=1&b=2
Google Geocoding
Motivation
You have one or more geographic names and you want to create a map of these locations.
Method
We will use a Google RESTful web service to return geographical data from a list of place names. Google provides an HTTP-based Geocoding service [1]. This requires registering a site for an API key, and there are limits on the usage of this API.

Examples

Single city examples:
Minneapolis [2] - example using a single city with no country
Bristol,UK [3] - example using a city and country

Multiple matches may be returned: Utopia [4], or none: Santa's House [5]
Response as KML
The XML response can be reformatted as a simpler KML file. Note the addition of the relevant media-type for KML and the declaration of the KML namespace required to access the returned XML.

declare namespace kml = "http://earth.google.com/kml/2.0";

declare option exist:serialize "method=xml media-type=application/vnd.google-earth.kml+xml indent=yes";

let $key := "ABQIAAAAVehr0_0wqgw_UOdLv0TYtxSGVrvsBPWDlNZ2fWdNTHNT32FpbBR1ygnaHxJdv-8mkOaL2BJb4V_yOQ"
let $location := request:get-parameter("location", ())
let $location := escape-uri($location, false())
(: ampersands inside an XQuery string literal are written as &amp; :)
let $url := concat("http://maps.google.com/maps/geo?q=", $location, "&amp;output=xml&amp;key=", $key)
let $response := doc($url)
let $x := response:set-header('Content-disposition', concat('inline;filename="', $location, '.kml";'))
return
  <kml xmlns="http://earth.google.com/kml/2.0">
    <Folder>
      <name>{$location}</name>
      {
        for $place in $response//kml:Placemark
        return
          <Placemark>
            <name>{string($place/kml:address)}</name>
            {$place/kml:Point}
          </Placemark>
      }
    </Folder>
  </kml>
If you have GoogleEarth, this should load an overlay: Utopia KML [6]
GoogleMap
A simple way to view the generated KML is to use GoogleMaps. This script simply constructs the relevant GoogleMap URL and then redirects to that URL:

let $location := request:get-parameter("location", ())
let $location := escape-uri($location, false())
let $wikiurl := escape-uri(concat("http://www.cems.uwe.ac.uk/xmlwiki/geocodekml.xq?location=", $location), false())
let $url := concat("http://maps.google.co.uk/maps?q=", $wikiurl)
return response:redirect-to(xs:anyURI($url))

Map of Utopia Locations [7]

This mimic of the GoogleMap is useful to check that the scripts are working, but more usefully, the geocoding service could be used within an application.
This is now a service which can be used as a REST service: The Postcode for UWE, Bristol [8]

Here's a Yahoo Pipe to take some data on Scotch Whiskies:

<?xml version="1.0" encoding="UTF-8"?>
<WhiskyList>
  <Whisky>
    <Brand>Glen Ord</Brand>
    <Address>Glen Ord Distillery, Muir of Ord, Ross-shire</Address>
    <Postcode>IV67UJ</Postcode>
  </Whisky>
  <Whisky>
    <Brand>Dalwhinnie</Brand>
    <Address>Dalwhinnie Distillery, Dalwhinnie, Inverness-shire</Address>
    <Postcode>PH191AB</Postcode>
  </Whisky>
  <Whisky>
    <Brand>Laphroaig</Brand>
    <Address>Laphroaig Distillery, Port Ellen, Isle of Islay</Address>
    <Postcode>PA427DU</Postcode>
  </Whisky>
</WhiskyList>

and generate a geo-coded RSS feed: Whisky Map [9]
RSS feed
Of course this feed could be generated in XQuery alone:
declare namespace geo = "http://www.w3.org/2003/01/geo/wgs84_pos#";
declare namespace kml = "http://earth.google.com/kml/2.0";

declare option exist:serialize "method=xml omit-xml-declaration=no indent=yes encoding=iso-8859-1 media-type=application/rss+xml";

declare variable $key := "ABQIAAAAVehr0_0wqgw_UOdLv0TYtxSGVrvsBPWDlNZ2fWdNTHNT32FpbBR1ygnaHxJdv-8mkOaL2BJb4V_yOQ";

declare function local:geocode-location($location as xs:string) {
  (: ampersands inside an XQuery string literal are written as &amp; :)
  let $url := concat("http://maps.google.com/maps/geo?q=", $location, "&amp;output=xml&amp;key=", $key)
  let $response := doc($url)
  let $place := $response//kml:Placemark[1]
  let $point := $place/kml:Point/kml:coordinates
  (: KML coordinates are ordered longitude,latitude :)
  let $coords := tokenize($point, ",")
  return (
    <geo:lat>{$coords[2]}</geo:lat>,
    <geo:long>{$coords[1]}</geo:long>
  )
};

<rss version='2.0' xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <channel>
    <title>Whiskies of Scotland</title>
    {
      for $whisky in //Whisky
      let $postcode := $whisky/Postcode
      let $location := local:geocode-location($postcode)
      return
        <item>
          <title>{string($whisky/Brand)}</title>
          <description>{string($whisky/Address)}</description>
          {$location}
        </item>
    }
  </channel>
</rss>
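The coordinate handling in local:geocode-location can be tried without calling the web service. This is a small sketch under the assumption that the coordinates string has the usual KML form longitude,latitude,altitude; the sample value and the element names position, lat and long are made up for illustration.

```xquery
xquery version "1.0";

(: Sample value only: KML writes coordinates as longitude,latitude,altitude. :)
let $point := "-2.5886,51.4771,0"
let $coords := tokenize($point, ",")
return
  <position>
    <lat>{$coords[2]}</lat>
    <long>{$coords[1]}</long>
  </position>
```

Because longitude comes first in KML, the second token is the latitude, which is why the function above swaps the two.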
References
[1] http://www.google.com/apis/maps/documentation/services.html
[2] http://www.cems.uwe.ac.uk/xmlwiki/geo/googlegeocode.xq?location=Minneapolis
[3] http://www.cems.uwe.ac.uk/xmlwiki/geo/googlegeocode.xq?location=Bristol,UK
[4] http://www.cems.uwe.ac.uk/xmlwiki/geo/googlegeocode.xq?location=Utopia
[5] http://www.cems.uwe.ac.uk/xmlwiki/geo/googlegeocode.xq?location=Santas+House
[6] http://www.cems.uwe.ac.uk/xmlwiki/geo/geocodekml.xq?location=Utopia
[7] http://www.cems.uwe.ac.uk/xmlwiki/geo/geocodemap.xq?location=Utopia
[8] http://www.cems.uwe.ac.uk/xmlwiki/geo/geocode.xq?location=BS16+1QY
[9] http://pipes.yahoo.com/pipes/pipe.info?_id=OEuSIml73BGq1a0QouNLYQ
[10] http://www.cems.uwe.ac.uk/xmlwiki/whiskyRSS.xq
Gotchas
generalised equals
= is a sequence comparison which is true if the intersection of the two sequences is not empty. Thus (1, 2, 3) = (3, 4, 5) is true, and so is 3 = (3, 4, 5). But there are some oddities: for example, () = () is false, as is () != (). The operator eq is used to compare single values only.
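The cases above can be checked directly. This short sketch evaluates each comparison in turn; the extra case (1, 2) != (1, 2) is not from the text and is added only to show that != is also existential.

```xquery
xquery version "1.0";

(
  (1, 2, 3) = (3, 4, 5),   (: true: 3 occurs on both sides :)
  3 = (3, 4, 5),           (: true :)
  () = (),                 (: false: there is no pair of items to compare :)
  () != (),                (: false, for the same reason :)
  (1, 2) != (1, 2),        (: true: the pair 1 and 2 differs :)
  3 eq 3                   (: true: eq compares exactly one value with one value :)
)
```

Because both = and != only need one matching pair, not($a = $b) is usually the safer way to express "no item in common".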
Arithmetic operators
The minus sign needs space around it since $x-3 is a valid variable name which is not the same as $x - 3.
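A short sketch of the difference; the variable names and values are chosen only for illustration.

```xquery
xquery version "1.0";

let $x := 10
let $x-3 := "a separate variable"   (: the name of this variable is x-3 :)
return (
  $x - 3,   (: subtraction: 7 :)
  $x-3      (: a reference to the variable named x-3 :)
)
```

Had $x-3 not been declared, the second expression would raise an undefined-variable error rather than performing a subtraction.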
Binding
:= is the binding operator. There is an abbreviated form in which multiple bindings are separated by commas:

let $x := 3
let $y := 4
let $x := "fred"

abbreviates to

let $x := 3, $y := 4, $x := "fred"

This is convenient but can lead to errors when code is amended. Consider avoiding this syntax.
Conditional expression
The conditional expression must have the else part. Return the empty sequence () if one alternative is not required:

if ($x = 4) then "Four" else ()
Sorting
'order by' sorts numbers as text when the data is untyped:

order by $c/population

To sort the values as numbers, use the number() function:

order by number($c/population)

or cast to a numeric type:

order by xs:integer($c/population)

or

order by $c/population cast as xs:integer
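The effect is easy to see on a small untyped sample. In this sketch the cities element, its names and its population figures are invented for illustration.

```xquery
xquery version "1.0";

let $cities :=
  <cities>
    <city><name>A</name><population>900</population></city>
    <city><name>B</name><population>10000</population></city>
    <city><name>C</name><population>85</population></city>
  </cities>
return
  <result>
    <as-text>{
      (: untyped values compare as strings: "10000" < "85" < "900" :)
      for $c in $cities/city
      order by $c/population
      return string($c/name)
    }</as-text>
    <as-number>{
      (: cast to integer: 85 < 900 < 10000 :)
      for $c in $cities/city
      order by xs:integer($c/population)
      return string($c/name)
    }</as-number>
  </result>
```

The first element contains B C A and the second C A B.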
XML construction
You can build XML by simply starting a tag. These tags are really XQuery expression operators. However, this puts you in a lexical scope where everything is XML; curly braces then switch back to normal XQuery (and an open tag will then switch back to XML). Escape literal curly braces by doubling them ({{ and }}). Note the toggling between XQuery and XML construction modes:

let $a := "bob"
let $b := "jane"
let $ab := ($a, $b)
return
  <people>
    {
      for $person in $ab
      return
        <person>
          {$person}
        </person>
    }
  </people>
Comments
Comments in XQuery use (: ... :) whereas comments in XML use <!-- ... -->. It is easy to use the wrong kind in the wrong context, particularly XQuery comments in constructed XML.

<A> (: a comment :) </A>

makes the comment the text part of an XML element.
let
let clauses are part of a FLWOR expression and can't appear alone. XQuery is a functional language and let clauses are only temporary bindings of names to expressions.

let $x := 5
return $x
function result
return is not required in functions. It is part of a FLWOR expression and not a statement in itself (like let). So it is not required, and not allowed, if there is no for or let.

declare function local:sum($a, $b) {
  $a + $b
};

or

declare function local:sum($a, $b) {
  let $c := $a + $b
  return $c
};

but not

declare function local:sum($a, $b) {
  return $a + $b
};
Graph Visualization
Graphviz [1] developed by AT&T provides a package of code for generating graph images from a text definition. The input file in 'dot' format can be generated by an XQuery script with text output.
Motivation
You want to create a graph to visualize complex structures such as taxonomies, object hierarchies or organizational hierarchies.
Database visualization
A graphical representation of the relationships between employee and manager in the empdept example. This script generates the dot format file, with employees as (implicit) nodes and arcs from employee to manager to show "managed by" relationships. The output is serialised as text. Serialisation strips out all XML tags, so XML can be used to structure the output and there is no need to serialise each item. The Graphviz dot format uses { } curly brackets as delimiters, so these need to be escaped (doubled) in XQuery.

declare option exist:serialize "method=text";

<graph>
digraph {{
{
  for $emp in //Emp
  let $mgr := //Emp[EmpNo = $emp/MgrNo]
  where exists($mgr)
  return concat($emp/Ename, " -> ", $mgr/Ename, ";")
}
}}
</graph>

Generate dot file [2]

If this is now passed through a Graphviz transformer (here a standalone service), we get a graph of these relationships as an image: PNG image [3] SVG image [4]

This would look more like a typical organisational chart if the graph was reversed. Graphviz provides a wide range of controls over the content and appearance of the graph.

declare option exist:serialize "method=text";

<graph>
digraph {{
rankdir=BT;
{
  for $emp in //Emp
  let $mgr := //Emp[EmpNo = $emp/MgrNo]
  where exists($mgr)
  return concat($emp/Ename, " -> ", $mgr/Ename, ";")
}
}}
</graph>

PNG image [5]

Since Enames are not necessarily unique, it would be better to use the EmpNo as the node identifier and label the node with the name:

declare option exist:serialize "method=text";

<graph>
digraph {{
{
  for $emp in //Emp
  let $mgr := //Emp[EmpNo = $emp/MgrNo]
  return
    <emp>
      {$emp/EmpNo} [label="{$emp/Ename}"];
      {
        if (exists($mgr))
        then <arc> {$mgr/EmpNo} -> {$emp/EmpNo} ; </arc>
        else ()
      }
    </emp>
}
}}
</graph>

image [6]

Similarly, the Department/Employee hierarchy can be graphed:

declare option exist:serialize "method=text";

<graph>
digraph {{
{
  for $dept in //Dept
  return
    <dept>
      Company -> {$dept/DeptNo} ;
      {$dept/DeptNo} [ label="{$dept/Dname}" ];
      {
        for $emp in //Emp[DeptNo = $dept/DeptNo]
        return
          <emp>
            {$emp/EmpNo} [label="{$emp/Ename}" ];
            {$dept/DeptNo} -> {$emp/EmpNo} ;
          </emp>
      }
    </dept>
}
}}
</graph>

image [7]
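The same dot text can also be assembled as a plain string, which sidesteps the brace doubling because curly brackets have no special meaning inside a string literal. The following is a sketch of that alternative, not the method used by the scripts above; the inline employee data is invented and only loosely modelled on the empdept example.

```xquery
xquery version "1.0";

(: Invented sample data standing in for the empdept collection. :)
let $emps :=
  <emps>
    <Emp><EmpNo>7839</EmpNo><Ename>KING</Ename></Emp>
    <Emp><EmpNo>7698</EmpNo><Ename>BLAKE</Ename><MgrNo>7839</MgrNo></Emp>
    <Emp><EmpNo>7900</EmpNo><Ename>JAMES</Ename><MgrNo>7698</MgrNo></Emp>
  </emps>
return
  string-join(
    (
      "digraph {",
      (
        (: one arc per employee who has a manager :)
        for $emp in $emps/Emp
        let $mgr := $emps/Emp[EmpNo = $emp/MgrNo]
        where exists($mgr)
        return concat("  ", $emp/Ename, " -> ", $mgr/Ename, ";")
      ),
      "}"
    ),
    "&#10;"
  )
```

For the sample data this yields a digraph with the two arcs BLAKE -> KING and JAMES -> BLAKE. The element-based version remains handier when node labels and arcs are mixed, since serialisation as text does the concatenation for you.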
References
[1] http://graphviz.org/
[2] http://www.cems.uwe.ac.uk/xmlwiki/empdept/hierarchydot.xq
[3] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2media.php?url=http://www.cems.uwe.ac.uk/xmlwiki/empdept/hierarchydot.xq
[4] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2media.php?output=svg&url=http://www.cems.uwe.ac.uk/xmlwiki/empdept/hierarchydot.xq
[5] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2media.php?url=http://www.cems.uwe.ac.uk/xmlwiki/empdept/hierarchyrevdot.xq
[6] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2media.php?url=http://www.cems.uwe.ac.uk/xmlwiki/empdept/hierarchynodot.xq
[7] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2media.php?url=http://www.cems.uwe.ac.uk/xmlwiki/empdept/companytodot.xq
HelloWorld
Motivation
You want to run a small program that tests to see if your XQuery execution environment is working.
XML Output
xquery version "1.0";

let $message := 'Hello World!'
return
  <results>
    <message>{$message}</message>
  </results>

Execute [1]
Expected Output
<?xml version="1.0" encoding="UTF-8"?>
<results>
  <message>Hello World!</message>
</results>
Discussion
The program creates a temporary variable called $message and assigns it a string value. The output is an XML element containing a message element which contains the value of the variable.
Suggestions
Try omitting the curly braces from inside of the result message element. What do you get? [Execute [2]] What happens if you omit the results wrappers? [Execute [3]]
Plain Text
You can get XQuery to return plain text using serialization options which define the serialization method and the output media-type. For example, to output the message as text, specify the serialization method as text and the media-type as text/plain.

xquery version "1.0";

declare option exist:serialize "method=text media-type=text/plain";

let $message := 'Hello World!'
return $message
[Execute [4]]
Expected Output
Depending on your browser set-up, this will launch a viewer for text documents and display Hello World!
Execution Methods
If you are using the oXygen IDE, this can be done by selecting the "transform" icon on the toolbar. If you are running this program in the eXist database, you can upload a file called hello.xq using the "Browse" function in the web administrator and then run the following in the browser:

http://localhost:8080/exist/rest/db/hello.xq

There are three important items to note in this URL.

1. This is the URL that you would use if you used the default eXist configuration.
2. Note that the word "rest" is in the URL before the "/db", indicating that you are using the REST interface (as opposed to the WebDAV, Atom or SOAP interface).
3. Note that the port is "8080" (the default port for development web sites) and that the "context" of the server is "exist". Both of these can be easily changed by editing the $EXIST_HOME/tools/jetty/etc/conf.xml file and restarting your eXist server.

The short form on production sites might be:

http://localhost/rest/db/hello.xq

With tools like URL rewriting you can also remove the "/rest" and the "/db" components of the URL.
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/helloWorld.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/helloWorld_1.xq
[3] http://www.cems.uwe.ac.uk/xmlwiki/helloWorld_2.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/Basics/helloWorld_3.xq
HTML Table View

This could then be used to view selected nodes:

local:sequence-to-table(//Emp)

This approach is ideal if you know that the first node in a dataset has all the elements for all the columns in the table. This approach is used in the later Database example to display computed sequences. The following line must be added if you are using strict XHTML. This puts all the HTML tags (<table>, <thead>, <th>, <tbody>, <tr> and <td>) in the correct namespace.

declare base-uri "http://www.w3.org/1999/xhtml";
Execute [1]
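The definition of local:sequence-to-table is not shown in this part of the text. The following is one possible sketch, written by analogy with the CSV function in the next section: it takes the column names from the children of the first node and assumes that each row has at most one child per column. The two sample Emp elements are invented for illustration.

```xquery
xquery version "1.0";

(: A sketch only: column headings come from the first node in the sequence. :)
declare function local:sequence-to-table($seq as element()*) as element(table) {
  <table border="1">
    <thead>
      <tr>{ for $node in $seq[1]/* return <th>{ name($node) }</th> }</tr>
    </thead>
    <tbody>{
      for $row in $seq
      return
        <tr>{
          (: look up each column by name so a missing child gives an empty cell :)
          for $node in $seq[1]/*
          return <td>{ string($row/*[name(.) = name($node)]) }</td>
        }</tr>
    }</tbody>
  </table>
};

let $emps := (
  <Emp><EmpNo>7839</EmpNo><Ename>KING</Ename></Emp>,
  <Emp><EmpNo>7698</EmpNo><Ename>BLAKE</Ename></Emp>
)
return local:sequence-to-table($emps)
```

Looking each cell up by name rather than by position keeps the columns aligned when a later row omits an element.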
Sequence as CSV
A similar approach can be used to export the sequence as CSV. Here the header Content-Disposition is set so the Browser will allow the generated file to be opened directly in Excel.
declare option exist:serialize "method=text media-type=text/text";

declare variable $sep := ',';
declare variable $eol := '&#10;';

declare function local:sequence-to-csv($seq) as xs:string {
  (: returns a multi-line string of comma delimited strings :)
  string-join(
    (
      (: header row: the element names of the first node :)
      string-join($seq[1]/*/name(.), $sep),
      for $row in $seq
      return
        string-join(
          for $node in $seq[1]/*
          let $data := string($row/*[name(.) = name($node)])
          return
            (: quote any value that contains the separator :)
            if (contains($data, $sep))
            then concat('"', $data, '"')
            else $data,
          $sep)
    ),
    $eol)
};

let $x := response:set-header('Content-Disposition', 'inline;filename=empdept.csv')
return local:sequence-to-csv(//Emp)
Execute [2]
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/empTable.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/empTablecsv.xq
Execution
Search the Elements [3]
doctype-public=-//W3C//DTD HTML 4.01 Transitional//EN doctype-system=http://www.w3.org/TR/loose.dtd";
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Chemical Elements</title>
    <script language="javascript" src="ajaxelement.js"/>
    <style type="text/css">
      td {{background-color: #efe; font-size:14px;}}
      th {{background-color: #ded; text-align: right; font-variant:small-caps; padding:3px; font-size:12px;}}
    </style>
  </head>
  <body>
    <h1>Chemical Elements</h1>
    <table class="page">
      <tr>
        <td valign="top" width="30%">
          <form onSubmit="getList(); return false">
            <span>
              <label for="name">Element Name </label>
              <input type="text" size="5" name="name" id="name" title="e.g. Silver" onkeyup="getList();" onfocus="getList();" />
            </span>
          </form>
        </td>
The JavaScript
The JavaScript implements the simple functionality of calling the server-side script getElement.xq with the string entered in the search box, and in the callback, pasting the returned XHTML into the div.

function updateList() {
  if (http.readyState == 4) {
    var divlist = document.getElementById('list');
    divlist.innerHTML = http.responseText;
    isWorking = false;
  }
}

function getList() {
  if (!isWorking && http) {
    var name = document.getElementById("name").value;
    http.open("GET", "getElement.xq?name=" + name);
    // this sets the call-back function to be invoked when a response from the HTTP request is returned
    http.onreadystatechange = updateList;
    isWorking = true;
    http.send(null);
  }
}

function getHTTPObject() {
  var xmlhttp;
  /*@cc_on
  @if (@_jscript_version >= 5)
    try {
      xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
    } catch (e) {
      try {
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
      } catch (E) {
        xmlhttp = false;
      }
    }
  @else
    xmlhttp = false;
  @end @*/
  if (!xmlhttp && typeof XMLHttpRequest != 'undefined') {
    try {
      xmlhttp = new XMLHttpRequest();
      xmlhttp.overrideMimeType("text/xml");
    } catch (e) {
      xmlhttp = false;
    }
  }
  return xmlhttp;
}

var http = getHTTPObject(); // create the HTTP Object
var isWorking = false;
Incremental Search of the Chemical Elements

    if (count($matches) = 0)
    then <span>No matches</span>
    else if (count($matches) = 1)
    then local:atom-to-table($matches)
    else (: multiple matches :)
      <table class="list">
        <tr>
          <th>Name</th>
          <th>Symbol</th>
          <th>Atomic Weight</th>
        </tr>
        {
          for $match in $matches
          order by $match/NAME
          return
            <tr>
              <th>{string($match/NAME)}</th>
              <td>{string($match/SYMBOL)}</td>
              <td>{string($match/ATOMIC_WEIGHT)}</td>
            </tr>
        }
      </table>
  else ()
To do
Some naming problems here - needs tidying. Units need to be included
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/ajax/periodicTable.xml
[2] http://www.cafeconleche.org
[3] http://www.cems.uwe.ac.uk/xmlwiki/ajax/periodicTable.xq
Strategy
Limiting Records in a Document
If you are limiting records in a large XML document, you can do this by adding a predicate to the end of your for loop. The predicate below selects the first nine Person elements:

for $person in doc($file)/Person[position() lt 10]
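For a window that does not start at the first record, the standard subsequence() function takes a start position and a length. This is a minimal sketch with generated sample records; the names $records, $start and $num and the page wrapper element are assumptions for illustration.

```xquery
xquery version "1.0";

(: Fifty generated sample records, numbered 1 to 50. :)
let $records := for $i in 1 to 50 return <Person id="{$i}"/>
let $start := 21   (: position of the first record to return, counting from 1 :)
let $num := 10     (: number of records to return :)
return
  <page>
    { subsequence($records, $start, $num) }
  </page>
```

This returns the Person elements with ids 21 to 30. If $start lies beyond the end of the sequence, the result is simply empty.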
Limiting Result Sets

      </tr>
    </thead>
    <tbody>
    {
      for $person in subsequence(collection('/db/contacts/data'), $start, $num)/Person
      let $pid := $person/id
      let $lname := $person/u:PersonSurName/text()
      let $fname := $person/u:PersonGivenName/text()
      let $street := $person/u:StreetFullText/text()
      let $city := $person/u:LocationCityName/text()
      let $state := $person/u:LocationStateName/text()
      let $zip := $person/u:LocationPostalCodeID/text()
      let $email := $person/u:ContactEmailID/text()
      let $phone := $person/u:TelephoneNumberFullID/text()
      order by $lname, $fname
      return
        <tr>
          <td>{$pid}</td>
          <td>{$lname}</td>
          <td>{$fname}</td>
          <td>{$street}</td>
          <td>{$city}</td>
          <td>{$state}</td>
          <td>{$zip}</td>
          <td>{$email}</td>
          <td>{$phone}</td>
          <td><a href="update-person-form.xq?id={$pid}">Edit</a></td>
          <td><a href="delete-person.xq?id={$pid}">Delete</a></td>
        </tr>
    }
    </tbody>
  </table>
  <input type="button" onClick="parent.location='{$query-base}?start={$start - $records}&amp;num={$num}'" value="&lt; Previous"/>
  <input type="button" onClick="parent.location='{$query-base}?start={$start + $records}&amp;num={$num}'" value="Next &gt;"/>
  <br/>
  <a href="create-person.xhtml">Create New Person</a>
  <br/>
  <a href="index.xhtml">Return to main demo page</a>
</body>
</html>
Manipulating URIs
Motivation
Sometimes you need to be able to manipulate the URI of your own XQuery. This is useful when you need to call your own XQuery with different parameters. For example, if you have an XQuery that returns the first 20 rows of a query and you want to add a "Get Next 20 Records" button, you can simply call yourself with an additional parameter for the record to start with, in this case record 21.

This program demonstrates some functions that are not part of the original XQuery specification but are required for precise web server functionality. The eXist functions are:

request:get-uri() - returns the URI of the current request within the web server, for example /exist/rest/db/test/my-query.xq
request:get-url() - returns the full URL including the server and port, for example http://www.example.com:8080/exist/rest/db/test/my-query.xq
request:get-query-string() - returns the full query string passed to the servlet (without the initial question mark)
system:get-module-load-path() - returns the path to the place where a module has been loaded from
system:get-exist-home() - returns the base of the eXist web root
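The "Get Next 20 Records" link described above amounts to a little string arithmetic on the request URI. The following is a sketch in plain XQuery so that it runs outside eXist; the function name local:next-page-url and the parameter names start and num are assumptions, and in an eXist script the first argument would come from request:get-uri().

```xquery
xquery version "1.0";

(: Hypothetical helper: build the URL of the next page of results. :)
declare function local:next-page-url(
  $uri as xs:string,
  $start as xs:integer,
  $num as xs:integer
) as xs:string {
  (: ampersands inside a string literal are written as &amp; :)
  concat($uri, "?start=", $start + $num, "&amp;num=", $num)
};

local:next-page-url("/exist/rest/db/test/my-query.xq", 1, 20)
```

With a page size of 20 and a current start of 1, the generated link asks for start=21 and num=20.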
Sample Program
xquery version "1.0";

declare namespace system="http://exist-db.org/xquery/system";
declare namespace request="http://exist-db.org/xquery/request";

declare option exist:serialize "method=html media-type=text/html indent=yes";

let $get-uri := request:get-uri()
let $get-url := request:get-url()
let $module-load-path := system:get-module-load-path()
let $exist-home := system:get-exist-home()
let $path := substring-after($module-load-path, 'xmldb:exist://embedded-eXist-server')
let $replace := replace($module-load-path, 'xmldb:exist://embedded-eXist-server', '')
return
  <html>
    <head>
      <title>URI Path Example</title>
    </head>
    <body>
      <h1>Sample URI manipulation with XPath</h1>
      <table border="1">
        <thead>
          <tr>
            <th>Out</th>
            <th>In</th>
          </tr>
        </thead>
        <tr>
          <td>request:get-url()</td>
          <td>{$get-url}</td>
        </tr>
        <tr>
          <td>request:get-uri()</td>
          <td>{$get-uri}</td>
        </tr>
        <tr>
          <td>system:get-module-load-path()</td>
          <td>{$module-load-path}</td>
        </tr>
        <tr>
          <td>system:get-exist-home()</td>
          <td>{$exist-home}</td>
        </tr>
        <tr>
          <td>substring-after(system:get-module-load-path(), 'xmldb:exist://embedded-eXist-server')</td>
          <td>{$path}</td>
        </tr>
        <tr>
          <td>replace(system:get-module-load-path(), 'xmldb:exist://embedded-eXist-server', '')</td>
          <td>{$replace}</td>
        </tr>
      </table>
    </body>
  </html>
Execute [1]
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/uri.xq?a=4&b=5
TransXChange
Here is an extract from the beginning of a typical timetable document showing a single StopPoint definition:

<TransXChange xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:apd="http://www.govtalk.gov.uk/people/AddressAndPersonalDetails"
    xmlns="http://www.transxchange.org.uk/"
    xsi:SchemaLocation="http://www.transxchange.org.uk/ TransXChange_general.xsd"
    CreationDateTime="2006-12-07T14:47:00-00:00"
    ModificationDateTime="2006-12-07T14:47:00-00:00"
    Modification="new"
    RevisionNumber="0"
    FileName="SVRSGAO070-20051210-5580.xml"
    SchemaVersion="2.1"
    RegistrationDocument="false">
  <StopPoints>
    <StopPoint CreationDateTime="2006-12-07T14:47:00-00:00">
      <AtcoCode>0100BRP90340</AtcoCode>
      <NaptanCode>BSTGAJT</NaptanCode>
      <Descriptor>
        <CommonName>Rupert Street (CA)</CommonName>
        <Landmark>NONE</Landmark>
        <Street>Rupert Street</Street>
        <Crossing>Colston Avenue</Crossing>
      </Descriptor>
      <Place>
        <NptgLocalityRef>N0076879</NptgLocalityRef>
        <Location>
          <Easting>358664</Easting>
          <Northing>173160</Northing>
        </Location>
      </Place>
      <StopClassification>
        <StopType>BCT</StopType>
        <OnStreet>
          <Bus>
            <BusStopType>MKD</BusStopType>
            <TimingStatus>OTH</TimingStatus>
            <MarkedPoint>
              <Bearing>
                <CompassPoint>N</CompassPoint>
              </Bearing>
            </MarkedPoint>
          </Bus>
Coordinate transformation
Transformation from OS National Grid coordinates to the WGS84 latitudes and longitudes used in GoogleMaps requires two kinds of transformation:

1. between latitudes and longitudes on an ellipsoidal model of the Earth and the Transverse Mercator projection used for the OS grid
2. between the latitude/longitude coordinates based on the different ellipsoids used in the OS coordinates and the global WGS84 coordinates

An XQuery module which contains these functions and other utility functions is available in the XQuery Examples [2] Google Code project.
declare function local:camelCase($s) {
  (: capitalise the first letter of each space-separated word :)
  string-join(
    for $word in tokenize($s, ' ')
    return concat(upper-case(substring($word, 1, 1)), lower-case(substring($word, 2))),
    ' ')
};

<StopPointSet>
{
  for $stopCode in distinct-values(//tx:StopPoint/tx:AtcoCode)
  let $stop := (//tx:StopPoint[tx:AtcoCode = $stopCode])[1]
  let $d := $stop/tx:Descriptor
  let $l := $stop/tx:Place/tx:Location
  return
    <StopPoint>
      <AtcoCode>{string($stop/tx:AtcoCode)}</AtcoCode>
      <CommonName>{string($d/tx:CommonName)}</CommonName>
      {
        if ($d/tx:Landmark ne 'NONE')
        then <LandMark>{local:camelCase($d/tx:Landmark)}</LandMark>
        else ()
      }
Convert [3]
Output
The output of this transformation contains StopPoints e.g.
<StopPoint>
  <AtcoCode>0170SGP90690</AtcoCode>
  <CommonName>Coldharbour Lane</CommonName>
  <Street>Coldharbour Lane</Street>
  <Crossing>Filton Road</Crossing>
  let $longCorr := math:cos(math:radians(($f/@latitude + $s/@latitude) div 2))
  let $dlong := ($f/@longitude - $s/@longitude) * 60 * $longCorr
  return math:sqrt(($dlat * $dlat) + ($dlong * $dlong))
};
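The fragment above is the tail of a flat-earth distance calculation: one minute of arc is taken as one nautical mile, and the longitude difference is shrunk by the cosine of the mean latitude. The following self-contained sketch uses the standard XQuery 3.0 math functions in place of the eXist math module. The function signature and the definition of $dlat are assumptions, since the start of the original function is not shown.

```xquery
xquery version "3.0";

declare namespace math = "http://www.w3.org/2005/xpath-functions/math";

(: Approximate distance in nautical miles between two points given in
   decimal degrees. $dlat is assumed to mirror the $dlong formula. :)
declare function local:plain-distance(
  $lat1 as xs:double, $long1 as xs:double,
  $lat2 as xs:double, $long2 as xs:double
) as xs:double {
  let $dlat := ($lat1 - $lat2) * 60
  (: mean latitude converted from degrees to radians :)
  let $mid := ($lat1 + $lat2) div 2 * math:pi() div 180
  let $longCorr := math:cos($mid)
  let $dlong := ($long1 - $long2) * 60 * $longCorr
  return math:sqrt($dlat * $dlat + $dlong * $dlong)
};

(: Two points one degree of latitude apart on the same meridian. :)
local:plain-distance(51.0, -2.5, 52.0, -2.5)
```

The sample call evaluates to 60, that is sixty nautical miles. The approximation is only reasonable over short distances such as the half-mile search used here.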
import module namespace geo = "http://www.cems.uwe.ac.uk/xmlwiki/geo" at "../lib/geo.xqm";
import module namespace gmap = "http://www.cems.uwe.ac.uk/xmlwiki/gmap" at "../lib/gmap.xqm";

declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";

let $latitude := xs:decimal(request:get-parameter("latitude", 51.4771))
let $longitude := xs:decimal(request:get-parameter("longitude", -2.5886))
let $range := xs:decimal(request:get-parameter("range", 0.5))
let $focus := geo:LatLong($latitude, $longitude)
let $x := response:set-header('Content-Disposition', 'attachment;filename=stops.kml;')

    <Placemark>
      <name>Home</name>
      <Point>
        <coordinates>{gmap:LatLong-as-kml($focus)}</coordinates>
      </Point>
      <styleUrl>#home</styleUrl>
    </Placemark>
    {
      for $stop in doc("/db/Wiki/geo/stopPoints.xml")//StopPoint
      (: distance is in nautical miles; one statute mile is 0.868976242 nautical miles,
         so dividing by that factor gives statute miles :)
      let $dist := geo:plain-distance($focus, $stop/geo:LatLong) div 0.868976242
      where $dist < $range
      return
        <Placemark>
          <name>{string($stop/CommonName)}</name>
          <description>
            {concat($stop/CommonName, ' ', $stop/Landmark, ' on ', $stop/Street, ' near ', $stop/Crossing)}
            is {geo:round($dist, 2)} miles away.
          </description>
          <Point>
            <coordinates>{gmap:LatLong-as-kml($stop/geo:LatLong)}</coordinates>
          </Point>
          <styleUrl>#stop</styleUrl>
        </Placemark>
    }
  </Document>
Stops within half a mile of my home as KML [4] rendered by GoogleMap [5]. On GoogleMaps the stops appear to be closely aligned to the bus stop overlay, presumably generated from the same base locations.
Icons
Selecting icons for KML is easier if you can browse them. Here is a simple browser in XQuery:

declare variable $base := "http://maps.google.com/mapfiles/kml/";

declare option exist:serialize "method=xhtml media-type=text/html";

<html>
  <h2>Google Earth icons</h2>
  <p>Base url {$base}</p>
  {
    for $pal in (2 to 5)
    return
      <div>
        <h2>Palette pal{$pal}</h2>
        {
          for $i in (0 to 63)
          let $icon := concat('pal', $pal, '/icon', $i, '.png')
          return <img src="{$base}{$icon}" title="{$icon}"/>
        }
      </div>
  }
</html>

Browse kml icons [6]
References
[1] http://www.transxchange.org.uk/
[2] http://code.google.com/p/xquery-examples/
[3] http://www.cems.uwe.ac.uk/xmlwiki/geo/txc2Stops.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/geo/stopsNearbykml.xq
[5] http://maps.google.com/maps?q=http:%2F%2Fwww.cems.uwe.ac.uk%2Fxmlwiki%2Fgeo%2FstopsNearbykml.xq
[6] http://www.cems.uwe.ac.uk/xmlwiki/geo/showIcons.xq
Approach
Since NetWorkingDays is a calculation that is shared by many systems, it makes sense to put the logic into an XQuery module.
module namespace fxx = "http://xquery.wikibooks.org/fxx";
declare function fxx:net-working-days-n($s as xs:date, $f as xs:date, $dates as xs:date*, $total as xs:integer) as xs:integer {
  if ($s = $f)
  then $total
  else if (fxx:weekday($s) and not($s = $dates))
  then fxx:net-working-days-n($s + xs:dayTimeDuration('P1D'), $f, $dates, $total + 1)
  else fxx:net-working-days-n($s + xs:dayTimeDuration('P1D'), $f, $dates, $total)
};

  fxx:net-working-days-n($s, $f, (), 0)
};

declare function fxx:net-working-days($s as xs:date, $f as xs:date, $dates as xs:date*) as xs:integer {
  fxx:net-working-days-n($s, $f, $dates, 0)
};
The heart of this calculation is a NetWorkingDays algorithm that is passed two dates.
(: Test driver for Net Working Days
 : tags to generate documentation using xqdoc.org scripts at http://www.xqdoc.org/qs_exist.html
 :
 : @return XHTML table for next "caldays" days from now including working days calculations from today
 : @input-parameter: caldays - an integer number of calendar days in the future from now
 :
 :)

let $now := xs:date(substring(current-date(), 1, 10))
return
  <html>
    <body>
      <h1>Days from {$now}</h1>
      <p>Today is a {fxx:day-of-week-name-en(xs:date(substring(current-date(), 1, 10)))}</p>
      <p>Format: net-working-days.xq?cal-days=50</p>
      <table border="1">
        <thead>
          <tr>
            <th>Cal Days</th>
            <th>Future Date</th>
            <th>Day of Week</th>

          let $dow := fxx:day-of-week($d)
          return
            <tr>
              <td align="center">{$i}</td>
              <td align="center">{$d}</td>
              <td align="center">{fxx:day-of-week-name-en(xs:date(substring($d, 1, 10)))}</td>
              <td align="center">{fxx:net-working-days(xs:date(substring(current-date(), 1, 10)), $d)}</td>
            </tr>
        }
      </table>
      <br/>
      <a href="index.xhtml">Back to Unit Testing Main Menu</a>
      <br/>
      <a href="../../index.xhtml">Back to CRV Main Menu</a>
    </body>
  </html>
Discussion
The recursive function works but it is slow: it has to call itself once for each date between the two dates. An alternative approach is to count the whole weeks and multiply by five, and then count the working days in the remaining fraction of a week.
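The alternative can be sketched as follows. This is an illustration, not the module's own code: the local: function names are assumptions, the start date is included and the end date excluded (as in the recursive version), $s is assumed not to be later than $f, and dates are assumed to fall on or after the reference Sunday 1901-01-06.

```xquery
xquery version "1.0";

(: 0 = Sunday ... 6 = Saturday, counted from Sunday 1901-01-06. :)
declare function local:day-of-week($d as xs:date) as xs:integer {
  days-from-duration($d - xs:date('1901-01-06')) mod 7
};

declare function local:weekday($d as xs:date) as xs:boolean {
  local:day-of-week($d) = (1 to 5)
};

declare function local:net-working-days(
  $s as xs:date, $f as xs:date, $dates as xs:date*
) as xs:integer {
  let $days := days-from-duration($f - $s)
  let $weeks := $days idiv 7
  let $rest := $days mod 7
  (: every whole week contributes five working days :)
  let $tail-start := $s + xs:dayTimeDuration('P7D') * $weeks
  (: at most six leftover days are inspected one by one :)
  let $tail := count(
    for $i in 0 to $rest - 1
    let $d := $tail-start + xs:dayTimeDuration('P1D') * $i
    where local:weekday($d)
    return $d)
  (: holidays only count if they fall on a weekday inside the range :)
  let $holidays := count(
    for $h in distinct-values($dates)
    where $h ge $s and $h lt $f and local:weekday(xs:date($h))
    return $h)
  return $weeks * 5 + $tail - $holidays
};

(: Monday 2024-01-01 up to, but not including, Monday 2024-01-15,
   with Wednesday 2024-01-10 as a holiday. :)
local:net-working-days(xs:date('2024-01-01'), xs:date('2024-01-15'), xs:date('2024-01-10'))
```

The sample call gives 9: two whole weeks contribute ten working days and the one weekday holiday is subtracted. The work is bounded by six loop iterations plus the length of the holiday list, whatever the distance between the two dates.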
Acknowledgments
An initial version of this was provided by Chris Wallace.
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/workCalendar.xq
Yahoo Pipe
This task can be accomplished by the Yahoo Pipe [2] written by Paul Daniel (up to the extraction of the location ID). However, the inherent instability of HTML markup means that this pipeline currently fails.
XQuery
This script takes a location parameter, extracts the first letter of the location, constructs the URL of the Yahoo weather index page for that letter (for example, the index page for the letter B [3]) and fetches the page via the httpclient module in eXist. The page is not valid XHTML, but the httpclient:get function cleans up the markup so that it is well-formed XML: HTML page [4]. The page structure can be seen in the tree view [5].
Next, this XML is navigated to locate the li element containing the location, and the code for that location is extracted from its link. Finally, this code is appended to the stem of the URL of the RSS page, creating the URL of the RSS feed for that location (RSS feed [6]), and the script then redirects to that URL.
This process can be visualized using a data flow diagram: Diagram [7]
declare variable $yahooIndex := "http://weather.yahoo.com/regional/UKXX";
declare variable $yahooWeather := "http://weather.yahooapis.com/forecastrss?u=c&p=";

let $location := request:get-parameter("location","Bristol")
let $letter := upper-case(substring($location,1,1))
let $suffix := if ($letter eq 'A') then '' else concat('_',$letter)
let $index := xs:anyURI(concat($yahooIndex,$suffix,".html"))
let $page := httpclient:get($index,true(),())
let $href := $page//div[@id="yw-regionalloc"]//li/a[. = $location]/@href
let $code := substring-after(substring-before($href,'.'),'forecast/')
let $rss := xs:anyURI(concat($yahooWeather,$code))
return response:redirect-to($rss)
Bristol RSS feed [8] Cardiff RSS feed [9]
Notes
1. Although the index page is not valid XHTML (why not?) and needs tidying, Yahoo have been helpful to the scraper by using ids on the sections. This allows the XPath expression to pick out the relevant section by id and then select the li containing the location. However, such tagging is not stable, and in fact changed recently from an id of browse to the current yw-regionalloc. Note also that additional work is required because the page for A has a different URL from the remainder of the letters, a feature not easily seen or tested for.
2. eXist is not ideally suited to this task since the page has to be first stored in the database so that XPath expressions can be executed using the structural index. An in-memory XQuery engine such as Saxon would be expected to perform better on this task. At present the performance is a bit slow, but the new 1.3 release improves this situation.
3. Extracting the code from the string would be clearer with a regular expression, but XQuery does not provide a simple matching function to extract the matched pattern. An XQuery function which wraps some XSLT to do this is described in analyse-string.
4. The script uses the eXist function response:redirect-to to redirect the browser to the constructed URL for the RSS feed.
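As a workaround for the limitation in note 3, the standard fn:replace function with a capture group can pull out the matched part, provided fn:matches is used first as a guard (fn:replace returns its input unchanged when the pattern does not match). The following is a minimal sketch: the function name local:location-code and the sample href are assumptions, chosen to mirror the substring-before/substring-after logic of the script.

```xquery
xquery version "1.0";

(: Hypothetical helper: extract the location code from an href such as
   "/forecast/UKXX0025.html"; returns the empty sequence if the href
   does not have that shape. :)
declare function local:location-code($href as xs:string) as xs:string? {
  let $pattern := "^.*forecast/([^./]+)\.html.*$"
  return
    if (matches($href, $pattern))
    then replace($href, $pattern, "$1")
    else ()
};

(: the sample href is an assumption; evaluates to "UKXX0025" :)
local:location-code("/forecast/UKXX0025.html")
```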
XSLT
For comparison, here is the equivalent XSLT script, using analyse-string.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:param name="location"/>
  <xsl:variable name="html2xml">
    <xsl:text>http://www.html2xml.nl/Services/html2xml/version1/Html2Xml.asmx/Url2XmlNode?urlAddress=</xsl:text>
  </xsl:variable>
  <!-- the suffix computed below supplies the underscore, as in the XQuery version -->
  <xsl:variable name="yahooIndex">
    <xsl:text>http://weather.yahoo.com/regional/UKXX</xsl:text>
  </xsl:variable>
  <xsl:variable name="yahooWeather">
    <xsl:text>http://weather.yahooapis.com/forecastrss?u=c&amp;p=</xsl:text>
  </xsl:variable>
  <xsl:template match="/">
    <xsl:variable name="letter" select="upper-case(substring($location,1,1))"/>
    <xsl:variable name="suffix" select="if ($letter eq 'A') then '' else concat('_',$letter)"/>
    <xsl:variable name="page" select="doc(concat($html2xml,$yahooIndex,$suffix,'.html'))"/>
    <xsl:variable name="href" select="$page//div[@id='yw-regionalloc']//li/a[. = $location]/@href"/>
    <xsl:variable name="code">
      <!-- the slash is kept outside the group so that only the code is captured -->
      <xsl:analyze-string select="$href" regex="forecast/(.*)\.html">
        <xsl:matching-substring>
          <xsl:value-of select="regex-group(1)"/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:variable name="rssurl" select="concat($yahooWeather,$code)"/>
    <xsl:copy-of select="doc($rssurl)"/>
  </xsl:template>
</xsl:stylesheet>
XPL
Another approach is to use XPL [11] developed by Erik Bruchez and Alessandro Vernet at Orbeon to describe the sequence of transformations as a pipeline. Here the pipeline is extended to create a custom HTML page from the RSS feed.
<?xml version="1.0" encoding="UTF-8"?>
<p:pipeline xmlns:p="http://www.cems.uwe.ac.uk/xpl" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <p:output id="weatherPage"/>
  <p:processor name="xslt">
    <p:annotation>construct the index page url from the parameter</p:annotation>
    <p:input name="parameter" id="location"/>
    <p:input name="xml">
      <dummy/>
    </p:input>
    <p:input name="xslt">
      <xsl:template match="/">
        <xsl:text>http://weather.yahoo.com/regional/UKXX_</xsl:text>
        <xsl:value-of select="upper-case(substring($location,1,1))"/>
        <xsl:text>.html</xsl:text>
      </xsl:template>
    </p:input>
    <p:output name="result" id="indexUrl"/>
  </p:processor>
  <p:processor name="tidy">
    <p:annotation>tidy the index page</p:annotation>
    <p:input name="url" id="indexUrl"/>
    <p:output name="xhtml" id="indexXhtml"/>
  </p:processor>
  <p:processor name="xslt">
    <p:annotation>parse the index page and construct the URL for the RSS feed</p:annotation>
    <p:input name="xml" id="indexXhtml"/>
    <p:input name="parameter" id="location"/>
    <p:input name="xslt">
      <xsl:template match="/">
        <xsl:variable name="href" select="//div[@id='yw-regionalloc']//li/a[. = $location]/@href"/>
        <xsl:text>http://weather.yahooapis.com/forecastrss?u=c%26p=</xsl:text>
        <xsl:value-of select="substring-before(substring-after($href,'forecast/'),'.html')"/>
      </xsl:template>
    </p:input>
    <p:output name="result" id="rssUrl"/>
  </p:processor>
  <p:processor name="fetch">
    <p:annotation>fetch the RSS feed</p:annotation>
Given implementations for each of the named processor types, this can be executed (prototype XQuery processor [12]).
This is a work in progress: at present this XPL engine is only a very simple, partial prototype, and even this simple sequential example is not conformant with the XPL schema (hence the local namespace). The pipeline can be visualized [13] using GraphViz. The intention is to generate an additional image map to support linking to the underlying processes, as well as to support the full XPL language.
References
[1] http://developer.yahoo.com/weather/
[2] http://pipes.yahoo.com/pipes/pipe.info?_id=MEY4dst33BGiVWNbOTY80A
[3] http://weather.yahoo.com/regional/UKXX_B.html
[4] http://www.cems.uwe.ac.uk/xmlwiki/util/geturi.xq?uri=http://weather.yahoo.com/regional/UKXX_B.html
[5] http://www.cems.uwe.ac.uk/xmlwiki/util/treeview.xq?path=no&uri=http://weather.yahoo.com/regional/UKXX_B.html
[6] http://xml.weather.yahoo.com/forecastrss?p=UKXX0025
[7] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2image.php?format=gif&url=http://www.cems.uwe.ac.uk/xmlwiki/DataFlow/xpl2dot.xq?url=/db/Wiki/DataFlow/yahooweatherpl.xml
[8] http://www.cems.uwe.ac.uk/xmlwiki/weather/yahoo.xq?location=Bristol
[9] http://www.cems.uwe.ac.uk/xmlwiki/weather/yahoo.xq?location=Cardiff
[10] http://www.cems.uwe.ac.uk/xmlwiki/util/xslt2html.xq?xslt=http://www.cems.uwe.ac.uk/xmlwiki/weather/yahooRSS.xsl&location=Bristol
[11] http://www.orbeon.com/ops/doc/reference-xpl-pipelines
[12] http://www.cems.uwe.ac.uk/xmlwiki/xmlpipes/executeXPL.xq?location=Bristol
[13] http://www.cems.uwe.ac.uk/~cjwallac/apps/services/dot2image.php?url=http://www.cems.uwe.ac.uk/xmlwiki/xmlpipes/xpl2dot.xq&format=gif
exist request:get-parameter-names()
Namespace
module namespace common = "http://www.metaphoricalweb.org/xmlns/common";
Platform
eXist
common:get-parameter
This function retrieves a sequence of string values corresponding to the values for a given parameter key given in the query string. Note that while typically there will be only one string in the sequence, if you have a query string of the form ?a=val1;b=val2;a=val3 then get-parameter("a","",";") will return ("val1","val3")
declare function common:get-parameter($param-name as xs:string, $default-value as xs:string, $delimiter as xs:string) as xs:string* {
  let $params := common:get-parameters($delimiter)
  let $param-nodes := $params/param[@name = $param-name]
  let $param-values :=
    for $param-node in $param-nodes
    return
      if ($param-node/@value)
      then string($param-node/@value)
      else $default-value
  return $param-values
};
common:get-parameter-names
This function retrieves the name of each query string key (once and only once per key).
declare function common:get-parameter-names($delimiter as xs:string) as xs:string* { let $params := common:get-parameters($delimiter) for $param-name in distinct-values($params/param/@name) return $param-name };
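Both functions rely on common:get-parameters, which is not shown in this section. The sketch below illustrates what such a helper might produce; it is an assumption, not the module's actual code. The name local:parse-query is hypothetical, the query string is passed in as an argument so that the example is self-contained, and only "+" is decoded (percent-escapes are left as they are).

```xquery
xquery version "1.0";

(: Hypothetical stand-in for common:get-parameters: split a query string
   into <param name="..." value="..."/> elements inside a <params> wrapper.
   Note that fn:tokenize treats $delimiter as a regular expression, so a
   delimiter such as "." or "|" would need escaping. :)
declare function local:parse-query($query as xs:string, $delimiter as xs:string) as element(params) {
  <params>{
    for $pair in tokenize($query, $delimiter)[. ne '']
    let $name := if (contains($pair, '=')) then substring-before($pair, '=') else $pair
    return
      <param name="{$name}">{
        (: a key without "=" gets no value attribute, so the default applies :)
        if (contains($pair, '='))
        then attribute value { translate(substring-after($pair, '='), '+', ' ') }
        else ()
      }</param>
  }</params>
};

local:parse-query("a=5;b=test;a=8;c=new+message", ";")
```

With this shape, the path $params/param[@name=$param-name] used in common:get-parameter selects both param elements for the repeated key a.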
Example Program
Assumes a query string of http://www.metaphoricalweb.org/?a=5;b=test;a=8;c=new+message
let $msg := common:get-parameter("c","",";")
return $msg
returns [Execute [1]]
new message
<data>
{ for $key in common:get-parameter-names(";")
  return <seq>{$key}:{common:get-parameter($key,"",";")}</seq> }
</data>
returns [Execute [2]]
<data>
  <seq>a:5 8</seq>
  <seq>b:test</seq>
  <seq>c:new message</seq>
</data>
let $seq1 := common:get-parameter("a","0",";")
return sum(for $n in $seq1 return number($n))
returns [Execute [3]]
13
References
[1] http://www.cems.uwe.ac.uk/xmldb/rest//db/Wiki/param_1.xq?a=5;b=test;a=8;c=new+message
[2] http://www.cems.uwe.ac.uk/xmldb/rest//db/Wiki/param_2.xq?a=5;b=test;a=8;c=new+message
[3] http://www.cems.uwe.ac.uk/xmldb/rest//db/Wiki/param_3.xq?a=5;b=test;a=8;c=new+message
Project Euler
Project Euler [1] is a collection of mathematical problems. Currently there are 166 so it may take some time to get through them all :-).
Problem 1 [2]
Add all the natural numbers below 1000 that are multiples of 3 or 5.
sum((1 to 999)[. mod 3 = 0 or . mod 5 = 0])
Run [3]
Problem 2 [4]
Find the sum of all the even-valued terms in the Fibonacci sequence which do not exceed one million.
declare function local:fib($fibs, $max) {
  let $next := $fibs[1] + $fibs[2]
  return
    if ($next > $max)
    then $fibs
    else local:fib(($next, $fibs), $max)
};

sum(local:fib((2,1), 1000000)[. mod 2 = 0])
Run [5]
This brute-force approach recursively builds the Fibonacci sequence (in reverse) up to the maximum, then filters and sums the result.
Problem 3 [6]
What is the largest prime factor of the number 317584931803? First we need to get a list of primes. The algorithm known as the Sieve of Eratosthenes is directly expressible in XQuery:
declare function local:sieve($primes as xs:integer*, $nums as xs:integer*) as xs:integer* {
  if (exists($nums))
  then
    let $prime := $nums[1]
    return local:sieve(($primes, $prime), $nums[. mod $prime != 0])
  else $primes
};
The list of primes starts off empty, the list of numbers starts off with the integers. Each recursive call of local:sieve takes the first of the remaining integers as a new prime and reduces the list of integers to those not divisible by the prime. When the list of integers is exhausted, the list of primes is returned.
Primes less than 1000 [7]
Factorization of a number N is also easily expressed as the subset of primes which divide N:
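The definition of local:factor does not appear in this section. A minimal sketch consistent with the description (keep those primes that divide N) might look like this; the body is an assumption, and the short list of primes is written out only to keep the example self-contained.

```xquery
xquery version "1.0";

(: Assumed definition: the primes from $primes that divide $n exactly.
   Repeated factors are reported once, and a prime factor that is not
   in $primes is not found. :)
declare function local:factor($n as xs:integer, $primes as xs:integer*) as xs:integer* {
  $primes[$n mod . = 0]
};

(: 13195 = 5 * 7 * 13 * 29, so this yields 5, 7, 13, 29 :)
local:factor(13195, (2, 3, 5, 7, 11, 13, 17, 19, 23, 29))
```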
Hence
let $n := xs:integer(request:get-parameter("n",100))
let $max := xs:integer(round(math:sqrt($n)))
let $primes := local:sieve((), 2 to $max)
return
  <result>
    { local:factor($n,$primes) }
  </result>
Factors of 13195 [8]
And the largest is
max(local:factor($n,$primes))
Largest factor of 13195 [9]
Sadly this elegant method runs out of space and time for integers as large as that in the problem.
Problem 4 [10]
Find the largest palindrome made from the product of two 3-digit numbers.
declare function local:palindromic($n as xs:integer) as xs:boolean {
  let $s := xs:string($n)
  let $sc := string-to-codepoints($s)
  let $sr := reverse($sc)
  let $r := codepoints-to-string($sr)
  return $s = $r
};

max(
  (for $i in (100 to 999)
   for $j in (100 to 999)
   return $i * $j)[local:palindromic(.)]
)
Run [11] [takes 20 seconds]
Problem 5 [12]
What is the difference between the sum of the squares and the square of the sums for integers from 1 to 100?
declare function local:diff-sum($n as xs:integer) as xs:integer {
  sum(1 to $n) * sum(1 to $n) - sum(for $i in 1 to $n return $i * $i)
};

local:diff-sum(100)
Run [13]
This nasty brute-force method can be replaced by an explicit expression using familiar formulae. Integer division (idiv) is used because div yields an xs:decimal, which does not satisfy the declared xs:integer return type:
declare function local:diff-sum($n as xs:integer) as xs:integer {
  let $sum := $n * ($n + 1) idiv 2
  let $sumsq := ($n * ($n + 1) * (2 * $n + 1)) idiv 6
  return $sum * $sum - $sumsq
};

local:diff-sum(100)
Run [14]
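For reference, the closed forms behind the explicit version, worked through for n = 100:

```latex
\[
\Bigl(\sum_{i=1}^{n} i\Bigr)^{2} - \sum_{i=1}^{n} i^{2}
  = \Bigl(\frac{n(n+1)}{2}\Bigr)^{2} - \frac{n(n+1)(2n+1)}{6}
\]
\[
n = 100:\quad
\Bigl(\frac{100 \cdot 101}{2}\Bigr)^{2} - \frac{100 \cdot 101 \cdot 201}{6}
  = 5050^{2} - 338350
  = 25502500 - 338350
  = 25164150
\]
```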
References
[1] http://projecteuler.net/index.php?section=about
[2] http://projecteuler.net/index.php?section=problems&id=1
[3] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/euler1.xq
[4] http://projecteuler.net/index.php?section=problems&id=2
[5] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/euler2.xq
[6] http://projecteuler.net/index.php?section=problems&id=3
[7] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/sieve.xq
[8] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/factor.xq?n=13195
[9] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/maxfactor.xq?n=13195
[10] http://projecteuler.net/index.php?section=problems&id=4
[11] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/euler4.xq
[12] http://projecteuler.net/index.php?section=problems&id=5
[13] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/euler5.xq
[14] http://www.cems.uwe.ac.uk/xmlwiki/puzzles/euler5a.xq
Searching
This example searches for a string in the location of the earthquake.
import module namespace wikiutil = "http://www.cems.uwe.ac.uk/xmlwiki" at "util.xqm";
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";

let $search := request:get-parameter("search","")
let $matches := //Earthquake[contains(Location,$search)]
return
<html>
  <head>
    <title>Search Earthquakes for {$search}</title>
  </head>
  <body>
    <h1>Search Earthquakes</h1>
    <form>Search for <input type="text" name="search" value="{$search}"/>
    </form>
    { wikiutil:sequence-to-table($matches) }
  </body>
</html>
Execute [3]
Paging
This script implements paging of the search results. Here the full search is repeated for each call, with the state of the interaction held in a hidden input.
import module namespace wikiutil = "http://www.cems.uwe.ac.uk/xmlwiki" at "util.xqm";
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";
let $search := request:get-parameter("search","") let $start:= xs:integer(request:get-parameter("start", "1")) let $records := xs:integer(request:get-parameter("records", "5")) let $action := request:get-parameter("action","search")
(: compute the limits for this page :) let $max := count($result) let $start :=
(: restrict the full set of matches to this subsequence :) let $matches := subsequence($allMatches,$start,$records)
return <html> <head> <title>Search Earthquakes </title> </head> <body> <h1>Search Earthquakes</h1> <form > Search Location for <input type="text" name="search" value="{$search}"/> <input type="submit" name="action" value="Search"/> <br/> <input type="hidden" name="start" value="{$start}"/> <input type="submit" name="action" value="Previous"/> <input type="submit" name="action" value="Next"/> <p>Displaying {$start} to {$end} out of {$max} records found.</p> {wikiutil:sequence-to-table($matches) } <p>Records per Page <input type="text" name="records" value="{$records}"/></p> </form> </body> </html>
Execute [4]
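The computation of the page limits is cut off in the listing above. One way to derive the new start and end positions from the submitted action is sketched below; the function names local:page-start and local:page-end are hypothetical and the logic is an assumption about the intended behaviour, not the original script.

```xquery
xquery version "1.0";

(: Hypothetical helper: first record of the page to show.
   "Previous" never goes below 1, "Next" stays put on the last page,
   and any other action (a fresh "Search") restarts at record 1. :)
declare function local:page-start($start as xs:integer, $records as xs:integer,
                                  $max as xs:integer, $action as xs:string) as xs:integer {
  if ($action = "Previous") then max((1, $start - $records))
  else if ($action = "Next" and $start + $records le $max) then $start + $records
  else if ($action = "Next") then $start
  else 1
};

(: Hypothetical helper: last record of the page, capped at the total. :)
declare function local:page-end($start as xs:integer, $records as xs:integer,
                                $max as xs:integer) as xs:integer {
  min(($start + $records - 1, $max))
};

(: with 12 matches and 5 per page, "Next" from record 1 moves to records 6 to 10 :)
let $start := local:page-start(1, 5, 12, "Next")
return ($start, local:page-end($start, 5, 12))
```

The subsequence($allMatches, $start, $records) call in the script then picks out exactly the records between these two positions.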
Sorting
To get the columns sorted, we add a submit button to each column. This requires extending the generic table viewer to sort the nodes by the selected column.
declare function wikiutil:sequence-to-table($seq,$sort) { <table border="1"> <tr> {for $node in $seq[1]/* return <th><input type="submit" name="Sort" value="{name($node)}"/></th> } </tr>
wikiutil = "http://www.cems.uwe.ac.uk/xmlwiki" at
let $search := request:get-parameter("search","")
let $sort := request:get-parameter("Sort","Date")
let $matches := //Earthquake[contains(Location,$search)]
return
<html>
  <head>
    <title>Search Earthquakes</title>
  </head>
  <body>
    <h1>Search Earthquakes</h1>
    <form>Search Location for <input type="text" name="search" value="{$search}"/>
      {wikiutil:sequence-to-table($matches,$sort)}
    </form>
  </body>
</html>
Note that the sort is by string value: sorting by magnitude succeeds only by chance, whereas the sort on Fatalities does not.
Execute [5]
An improvement would be to allow successive clicks to a column heading to reverse the sort direction. This requires the addition of two more items into the interaction state, the current sort order and current direction, and changes to the table generator. One would like to be able to say something like:
for $row ..
let $sortBy := ..
let $direction := if (..) then "ascending" else "descending"
order by $sortBy $direction
but this is not a valid FLWOR expression. Instead we have to have two FLWOR expressions, one for each direction.
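The two-expression workaround can be sketched as follows. This is a minimal sketch rather than the actual wikiutil code: the function name local:sort-rows and the sample rows are hypothetical, and the direction is encoded as 1 or -1 to match the LastDirection parameter used by the script.

```xquery
xquery version "1.0";

(: Hypothetical helper: sort rows by the string value of the child element
   named $sort. The direction cannot be a variable in "order by", so each
   direction gets its own FLWOR expression. :)
declare function local:sort-rows($rows as element()*, $sort as xs:string,
                                 $direction as xs:integer) as element()* {
  if ($direction = 1)
  then
    for $row in $rows
    order by string($row/*[name() = $sort][1]) ascending
    return $row
  else
    for $row in $rows
    order by string($row/*[name() = $sort][1]) descending
    return $row
};

(: sample data is illustrative only :)
let $rows := (
  <Earthquake><Location>Chile</Location></Earthquake>,
  <Earthquake><Location>Alaska</Location></Earthquake>,
  <Earthquake><Location>Japan</Location></Earthquake>
)
return local:sort-rows($rows, "Location", -1)
```

As in the original table generator, the comparison is on string values, so numeric columns would still need a cast to sort correctly.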
declare function wikiutil:sequence-to-table($seq,$sort,$direction) { <table border="1">
let $search := request:get-parameter("search","") let $sort := request:get-parameter("Sort","Date") let $lastSort := request:get-parameter("LastSort","") let $lastDirection := number(request:get-parameter("LastDirection","1")) let $direction := if ($lastSort = $sort) then - $lastDirection else 1 let $matches := //Earthquake[contains(Location,$search)] return <html> <head> <title>Search Earthquakes</title>
Execute [6]
Searching, Paging and Sorting
This computes the first non-null item in the sequence, a cleaner and more generalisable alternative to:
if (exists($column/@heading)) then $column/@heading else $column/@name
Execute [7]
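The expression being described is missing from this section. The usual XQuery idiom for "first non-null item" is to build a sequence of the candidates and take item 1; the sketch below assumes that this is the idiom meant, and the two column elements are made-up sample data.

```xquery
xquery version "1.0";

(: Sample schema columns (illustrative): one with a heading, one without. :)
let $with-heading := <column name="Mag" heading="Magnitude"/>
let $name-only := <column name="Date"/>
return (
  (: a missing attribute contributes nothing to the sequence,
     so [1] picks the first candidate that exists :)
  string(($with-heading/@heading, $with-heading/@name)[1]),   (: "Magnitude" :)
  string(($name-only/@heading, $name-only/@name)[1])          (: "Date" :)
)
```

Further fallbacks can be added simply by appending more candidates to the sequence, which is what makes the idiom more generalisable than nested if expressions.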
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/reports/earthquakes.xml
[2] http://www.swivel.com/data_columns/show/4101545
[3] http://www.cems.uwe.ac.uk/xmlwiki/reports/search.xq
[4] http://www.cems.uwe.ac.uk/xmlwiki/reports/pagedSearch.xq
[5] http://www.cems.uwe.ac.uk/xmlwiki/reports/sortedSearch.xq
[6] http://www.cems.uwe.ac.uk/xmlwiki/reports/sortedSearch2.xq
[7] http://www.cems.uwe.ac.uk/xmlwiki/reports/search-with-schema.xq
Sequence Diagrams
Background
Sequence Diagrams are tedious to draw, even with a diagramming tool. They are even worse to edit when the sequence changes. An alternative is to define an XML vocabulary to define the message sequencing and to use XQuery to render this description as XHTML. This textual approach also allows explanations to be revealed at each step and alternative renderings of the XML definition of the Sequence to be generated, such as a printed version. This demonstrator uses a simplified meta-model, with only messages between actors and actions undertaken by actors. (article under re-design - CW)
Models
3-tier architecture
Here is a sample description of interaction in a 3-tier architecture: (badly needs re-writing)
<SequenceDiagram id="3tier"> <name>3-tier architecture</name> <description>An overview of the 3-tier Architecture</description> <cast> <actor> <name>user</name> <label>The User</label> <color>pink</color> <location>client</location> <description>The user of the site</description> </actor> <actor> <name>browser</name> <label>Presentation Layer</label>
<color>lightgreen</color> <location>client</location> <description>A browser such as Firefox, Opera or Internet Explorer</description> </actor> <actor> <name>server</name> <label>Application Layer</label> <color>lightblue</color> <location>server</location> <description>Scripts in languages such as PHP or Java invoked via a web server</description> </actor> <actor> <name>database</name> <label>Persistance Layer</label> <color>grey</color> <location>server</location> <description>A database server such as Oracle or MySQL</description> </actor> </cast> <communication> <connection> <actor>user</actor> <actor>browser</actor> <method/> <prep>on</prep> </connection> <connection> <actor>browser</actor> <actor>server</actor> <method>HTTP</method> <prep>to</prep> </connection> <connection> <actor>server</actor> <actor>database</actor> <method>SQL</method> <prep>to</prep> </connection> </communication> <trace> <message> <from>user</from> <to>browser</to> <action>click</action> <object>link</object> </message> <message>
<from>browser</from> <to>server</to> <action>request</action> <object>URL</object> <url>http://www.cems.uwe.ac.uk/~cjwallac/apps/poll2/tally.php?pollid=2</url> </message> <do> <at>server</at> <action>decode input</action> <object/> </do> <do> <at>server</at> <action>create SQL request</action> <object/> </do> <message> <from>server</from> <to>database</to> <action>request</action> <object>SQL statement</object> </message> <message> <from>database</from> <to>server</to> <action>respond</action> <object>tables</object> </message> <do> <at>server</at> <action>create page with data in table</action> <object/> </do> <message> <from>server</from> <to>browser</to> <action>respond</action> <object>HTML page</object> </message> <message> <from>user</from> <to>browser</to> <action>read</action> <object>page</object> </message> </trace> </SequenceDiagram>
let $id:= request:get-parameter('id','') let $sd :=//SequenceDiagram[@id=$id] let $trace := $sd/trace let $actors := $sd/cast/actor let $nactors := count($sd/cast/actor) let $width := 100 div $nactors return <html> <head><title>Sequence Diagram {string($sd/@id)}</title> </head> <body> <h1>{string($sd/name)} </h1> <div class="description"> {$sd/description/node() } </div> <table border='1'> <tr> {for $a in $actors return <th width='{$width}%' bgcolor='{$a/color}'>{string($a/label)}</th> } </tr> { if ($actors/description) then <tr> {for $a in $actors return <th width='{$width}%' bgcolor='{$a/color}'>{string($a/description)} </th> }
      </tr>
    else ()
  }
  {for $event in $trace/*
   return
     <tr>
       {if (name($event)='do')
        then
          let $p := index-of($actors/name,$event/at)
          let $text := local:makeText($event)
          return (
            for $i in (1 to $p - 1) return <td/>,
            <td align='center' bgcolor='{$actors[name=$event/at]/color}'>
              {if ($event/url)
               then <a href='{$event/url}' target='demo'>{$text}</a>
               else $text}
            </td>,
            for $i in ($p + 1 to $nactors) return <td/>
          )
        else if (name($event)='message')
        then
          let $pfrom := index-of($actors/name,$event/from)
          let $pto := index-of($actors/name,$event/to)
          let $pfirst := min(($pfrom,$pto))
          let $plast := max(($pfrom,$pto))
          let $ltor := $pfrom = $pfirst
          let $text := local:makeText($event)
          let $connection := $sd//connection[actor = $event/from and actor = $event/to]
          let $text :=
            if ($ltor)
            then concat($connection/method,$leftsym,$text,$leftsym)
            else concat($rightsym,$text,$rightsym,$connection/method)
          return (
            for $i in (1 to $pfirst - 1) return <td/>,
            <td align='center' colspan='{$plast - $pfirst + 1}' bgcolor='{$actors[name=$event/from]/color}'>
              {$text}
              {if ($event/url)
               then <a href='{$event/url}' target='demo'>Link </a>
               else ()}
            </td>,
            for $i in ($plast + 1 to $nactors) return <td/>
          )
        else ()
       }
     </tr>
  }
  </table>
  </body>
</html>
Display [1]
and call the function
let $step := local:next-step(
    number(request:get-parameter("step",0)),
    request:get-parameter("action","start"),
    count($trace/*))
A form to provide the controls and maintain the interaction state:
<h2>
  <form>
    <input type="hidden" name="id" value="{$id}"/>
    <input type="hidden" name="step" value="{$step}"/>
    <input type="submit" name="action" value="start"/>
    <input type="submit" name="action" value="back"/>
    <input type="submit" name="action" value="forward"/>
    <input type="submit" name="action" value="end"/>
  </form>
</h2>
Limit the events displayed to the specified number of steps:
for $event in $trace/*[position()<=$step]
and display the explanation of the last step:
<div class="description">
  {if ($step=0)
   then $sd/description/node()
   else $trace/*[position()=$step]/description/node()
  }
</div>
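The definition of local:next-step does not appear in this section. A minimal sketch of a function with that role is given below; the body is an assumption based on the four button values in the form. It is typed on xs:integer, so a caller would cast the step parameter with xs:integer() rather than number().

```xquery
xquery version "1.0";

(: Assumed behaviour: move the current step according to the button pressed,
   keeping the result between 0 (only the overview shown) and $max
   (the whole trace shown). :)
declare function local:next-step($step as xs:integer, $action as xs:string,
                                 $max as xs:integer) as xs:integer {
  if ($action = "start") then 0
  else if ($action = "end") then $max
  else if ($action = "back") then max((0, $step - 1))
  else if ($action = "forward") then min(($max, $step + 1))
  else $step
};

(: stepping forward from step 3 of a 9-event trace gives 4 :)
local:next-step(3, "forward", 9)
```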
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/makeDiagram.xq?id=3tier
[2] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/sequences/track4.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/showSequence.xq?uri=/db/Wiki/SequenceDiagram/sequences/track4.xml
[4] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/makeDiagramMove.xq?id=3tier
[5] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/makeDiagramMove.xq?id=ge4
[6] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/showSequence.xq?uri=/db/Wiki/SequenceDiagram/sequences/3tier.xml&action=expand&amp;steps=9999
News Page
Reformat the RSS feed as HTML:
declare option exist:serialize "method=xhtml media-type=text/html";
let $news := doc("http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/education/rss.xml")
let $dateTime := $news/rss/channel/lastBuildDate
return
<html>
  <body>
    <h2>Education news from the BBC at {string($dateTime)}</h2>
    { for $newsItem in $news/rss/channel/item[position() < 10]
      return
        <div>
          <h4>{string($newsItem/title)}</h4>
          <p>{string($newsItem/description)} <a href="{$newsItem/link}">more..</a></p>
        </div>
    }
  </body>
</html>
Execute [2]
Text-to-Speech
The Opera [3] browser with Voice extension supports text-to-speech, allowing this news to be spoken. This uses the XML vocabularies VoiceXML [4] and XML Events [5].
declare option exist:serialize "method=xhtml media-type=application/xv+xml";
let $news := doc("http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/education/rss.xml")
let $dateTime := $news/rss/channel/lastBuildDate
let $newsItems := $news/rss/channel/item[position() < 10]
return
<h:html xmlns:h="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/vxml" xmlns:ev="http://www.w3.org/2001/xml-events">
  <h:head>
    <h:title>BBC Education news</h:title>
    <vxml:form id="news">
      <vxml:block>
        {for $newsItem in $newsItems
         return string($newsItem/description)
Execute [6] Note that the html namespace has been given a prefix, so that the default prefix can refer to the RSS feed.
References
[1] http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/education/rss.xml
[2] http://www.cems.uwe.ac.uk/xmlwiki/RSS/bbcednews.xq
[3] http://www.opera.com/
[4] http://en.wikipedia.org/wiki/VoiceXML
[5] http://en.wikipedia.org/wiki/XML_Events
[6] http://www.cems.uwe.ac.uk/xmlwiki/RSS/bbcednewsvoiced.xq
[7] http://www.cems.uwe.ac.uk/xmlwiki/RSS/rssvoiced.xq?rss=http://info.uwe.ac.uk/news/uwenews/downloadxml.asp
XForms Engines
These examples use:
Firefox add-in [2]
  Requires Firefox with the XForms add-in
  media-type set to application/xhtml+xml
FormFaces [3]
  Cross-browser support - examples tested on Firefox and IE6
  The JavaScript source is stored in the eXist database and linked to each form
  media-type set to text/html
XSLTForms [4]
  Uses XSLT to transform to an HTML page and JavaScript to execute. The XSLT transformation may be either server-side (via eXist) or client-side.
All examples use the same CSS stylesheet [5]
XForm Output
Firefox
declare option exist:serialize "method=xhtml media-type=application/xhtml+xml indent=yes";
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:form="http://www.w3.org/2002/xforms" xml:lang="en"> <head> <title>Output a Model value</title> <form:model> <form:instance> <data xmlns=""> <name>Mozilla XForms add-in</name> </data> </form:instance> </form:model> </head> <body> <h2> <form:output ref="name"></form:output>
Execute [6]
FormFaces
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:form="http://www.w3.org/2002/xforms" xml:lang="en"> <head> <title>Output a Model value</title> <script language="javascript" src="../formfaces/formfaces.js"/> <link rel="stylesheet" type="text/css" href="xforms.css" /> <form:model> <form:instance> <data xmlns=""> <name>Formfaces</name> </data> </form:instance> </form:model> </head> <body> <h2> <form:output ref="name"></form:output> </h2> </body> </html>
Execute [7]
XSLTForms
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet href="ajaxforms.xsl" type="text/xsl"?> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:form="http://www.w3.org/2002/xforms" xml:lang="en"> <head> <title>Output a Model value</title> <script language="javascript" src="formfaces.js"/> <link rel="stylesheet" type="text/css" href="xforms.css" /> <form:model> <form:instance> <data xmlns=""> <name>Formfaces</name> </data> </form:instance> </form:model> </head>
Execute [8]
Simple Controls
Firefox [9] Formfaces [10] XSLTForms [11]
Observations
FormFaces: breaks on Firefox
XSLTForms: changes only made when triggered
Multiple instances
A model can contain multiple instances, whose root node is accessed with the instance(id) construct.
<?xml version="1.0" encoding="UTF-8"?> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xf="http://www.w3.org/2002/xforms" xmlns:ev="http://www.w3.org/2001/xml-events"> <head> <title>Test conditional selection lists</title> <xf:model> <xf:instance id="data" xmlns=""> <data> <selected-season>spring</selected-season> <selected-month>March</selected-month> </data> </xf:instance> <xf:instance id="seasons" xmlns=""> <seasons> <item name="winter"/> <item name="spring"/> <item name="summer"/> <item name="autumn"/> </seasons> </xf:instance> <xf:instance id="months" xmlns=""> <months> <item name="January" season="winter"/> <item name="February" season="winter"/> <item name="March" season="spring"/> <item name="April" season="spring"/>
<xf:itemset nodeset="instance('months')/item[@season=instance('data')/selected-season]"> <xf:label ref="@name"/> <xf:value ref="@name"/> </xf:itemset> </xf:select1> </div> <div> <xf:output ref="instance('data')/selected-season"> <xf:label>selected-season: </xf:label> </xf:output> <xf:output ref="instance('data')/selected-month"> <xf:label>selected-month: </xf:label> </xf:output> </div> </body> </html>
FireFox [12]
Date Entry
Firefox generates a drop-down calendar. Firefox [13] Formfaces currently has no Calendar widget Formfaces [14]
Server Interaction
Interaction with a server can be via GET or POST. This example is based on Dan McCreary's example in the XForms Wiki book [15].
On Firefox:
<html xmlns:xf="http://www.w3.org/2002/xforms" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>XQuery Tester</title>
    <link rel="stylesheet" type="text/css" href="xforms.css" />
    <xf:model>
      <xf:instance>
        <data xmlns="">
          <input>
            <arg1>123</arg1>
            <arg2>456</arg2>
          </input>
          <result>
            <sum>0</sum>
          </result>
        </data>
      </xf:instance>
      <xf:submission id="get-instance" method="get" replace="instance" action="adderGet.xq" separator="&amp;">
      </xf:submission>
    <h1>XForm interaction with XQuery</h1>
    <xf:input ref="input/arg1" incremental="true">
      <xf:label>Arg1:</xf:label>
    </xf:input>
    <br/>
    <xf:input ref="input/arg2" incremental="true">
      <xf:label>Arg2:</xf:label>
    </xf:input>
    <br/>
    <xf:output ref="result/sum">
      <xf:label> Sum:</xf:label>
    </xf:output>
    <br/>
    <xf:submit submission="get-instance">
      <xf:label>Get</xf:label>
    </xf:submit>
    <xf:submit submission="post-instance">
      <xf:label>Post</xf:label>
    </xf:submit>
    <p id="status"></p>
  </body>
</html>
Firefox [16] Formfaces [17]
The respective server scripts are:
GET
xquery version "1.0";
declare namespace request="http://exist-db.org/xquery/request";
let $arg1 := number(request:get-parameter("arg1", "0"))
let $arg2 := number(request:get-parameter("arg2", "0"))
return
<data xmlns="">
  <input>
    <arg1>{$arg1}</arg1>
    <arg2>{$arg2}</arg2>
  </input>
  <result>
    <sum>{$arg1+$arg2}</sum>
  </result>
</data>
POST
xquery version "1.0";
declare namespace request="http://exist-db.org/xquery/request";
let $data := request:get-data()
(: arg1 and arg2 are nested inside input, so search below the posted root :)
let $arg1 := number($data//arg1)
let $arg2 := number($data//arg2)
return
<data xmlns="">
  <input>
    <arg1>{$arg1}</arg1>
    <arg2>{$arg2}</arg2>
  </input>
  <result>
    <sum>{$arg1+$arg2}</sum>
  </result>
</data>
In this example, the whole model is updated and returned to the client. Alternatively, part of the model can be updated (tbc)
Generic XForms
Tabular example
A simple approach to generic XForms is illustrated in this script based on an example in the XForms wikibook declare option exist:serialize "method=xhtml media-type=application/xhtml+xml indent=yes"; let $data := <Data> <GivenName>John</GivenName> <MiddleName>George</MiddleName> <Surname>Doe</Surname> <CityName>Anytown</CityName> <StateCode>MM</StateCode> <PostalID>55123-1234</PostalID> </Data> return <html xmlns:xf="http://www.w3.org/2002/xforms" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Formatting XForms</title> <link rel="stylesheet" type="text/css" href="xforms.css" />
    <style type="text/css">
      {
        for $item in $data/*
        let $width := string-length($item)
        return concat('._', name($item), ' .xf-value {width:', $width, 'em} ')
      }
    </style>
    <xf:model>
      <xf:instance xmlns="">
        {$data}
      </xf:instance>
    </xf:model>
  </head>
  <body>
    <fieldset>
      <legend>Name and Address</legend>
      {for $item in $data/*
       return
         ( <xf:input class="_{name($item)}" ref="/Data/{name($item)}">
             <xf:label>{name($item)}: </xf:label>
           </xf:input>,
           <br/>
         )
      }
    </fieldset>
  </body>
</html>
In this simple tabular example, the XForm and the accompanying CSS that defines the input field widths are generated by reflection on the supplied instance. This example works correctly in Firefox [18] but the styling fails in Formfaces [19].
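The reflection step can be tried outside eXist. The following is a minimal Python sketch, not the author's code: it derives one CSS width rule per child element of an instance, using the length of the element's text as the width, as the XQuery above does. The names `width_rules` and `SAMPLE` are hypothetical and the sample instance is a cut-down version of the address data.

```python
import xml.etree.ElementTree as ET

# Cut-down instance document (illustrative data only).
SAMPLE = "<Data><GivenName>John</GivenName><StateCode>MM</StateCode></Data>"


def width_rules(instance_xml):
    """Return one CSS rule per child element, sized to the length of its text."""
    root = ET.fromstring(instance_xml)
    rules = []
    for item in root:
        width = len(item.text or "")
        # Class names are prefixed with "_" to match the generated xf:input classes.
        rules.append(f"._{item.tag} .xf-value {{width:{width}em}}")
    return rules


if __name__ == "__main__":
    print("\n".join(width_rules(SAMPLE)))
```

Sizing fields from the current data is convenient for a demonstration, but a field that starts empty gets a zero width, which is one reason a separate schema with explicit widths may be preferable.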
      <StateCode>MM</StateCode>
      <PostalID>55123-1234</PostalID>
   </Data>
let $schema :=
   <Schema>
      <Row name="GivenName" label="First Name" width="20"/>
      <Row name="Surname" label="Surname" width="15"/>
      <Row name="CityName" label="City" width="15"/>
      <Row name="StateCode" label="State" width="3"/>
      <Row name="PostalID" label="ZipCode" width="8"/>
   </Schema>
return
<html xmlns:xf="http://www.w3.org/2002/xforms" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Formatting XForms</title>
    <link rel="stylesheet" type="text/css" href="xforms.css" />
    <style type="text/css">
      {
        for $item in $schema/*
        let $id := concat("_", $item/@name)
        let $width := $item/@width
        return concat('.', $id, ' .xf-value {width:', $width, 'em} ')
      }
    </style>
    <xf:model>
      <xf:instance xmlns="">
        {$data}
      </xf:instance>
    </xf:model>
  </head>
  <body>
    <fieldset>
      <legend>Name and Address</legend>
      {for $item in $schema/*
       let $id := concat("_", $item/@name)
       let $label := string($item/@label)
       return
         ( <xf:input class="{$id}" ref="/Data/{$item/@name}">
             <xf:label>{$label}: </xf:label>
           </xf:input>,
           <br/>
         )
      }
    </fieldset>
  </body>
</html>
Firefox [20]
References
[1] http://en.wikibooks.org/wiki/XForms
[2] http://www.mozilla.org/projects/xforms/
[3] http://www.formfaces.com/
[4] http://www.agencexml.com/xsltforms/
[5] http://www.cems.uwe.ac.uk/xmlwiki/xforms/xforms.css
[6] http://www.cems.uwe.ac.uk/xmlwiki/xforms/outputMz.xq
[7] http://www.cems.uwe.ac.uk/xmlwiki/xforms/outputFF.xq
[8] http://www.cems.uwe.ac.uk/xmlwiki/xforms/outputXSLT.html
[9] http://www.cems.uwe.ac.uk/xmlwiki/xforms/controlsMz.xq
[10] http://www.cems.uwe.ac.uk/xmlwiki/xforms/controlsFF.xq
[11] http://www.cems.uwe.ac.uk/xmlwiki/xforms/controlsXSLT.html
[12] http://www.cems.uwe.ac.uk/xmlwiki/xforms/selectionwiki.xml
[13] http://www.cems.uwe.ac.uk/xmlwiki/xforms/dateMz.xq
[14] http://www.cems.uwe.ac.uk/xmlwiki/xforms/dateFF.xq
[15] http://en.wikibooks.org/wiki/XForms/Adder
[16] http://www.cems.uwe.ac.uk/xmlwiki/xforms/adderFormMz2.xq
[17] http://www.cems.uwe.ac.uk/xmlwiki/xforms/adderFormFF2.xq
[18] http://www.cems.uwe.ac.uk/xmlwiki/xforms/addressFormMz.xq
[19] http://www.cems.uwe.ac.uk/xmlwiki/xforms/addressFormFF.xq
[20] http://www.cems.uwe.ac.uk/xmlwiki/xforms/addressFormMzSchema.xq
SPARQL interface
The following script provides, via a Joseki server at UWE, a query interface to RDF. Literal language and datatype are ignored in this representation. URIs link to the browse query and also directly to the resource. A function converts the SPARQL XML Query result to a table, with links.
declare function fr:sparql-to-table($rdfxml, $script-name) {
  (: literal language and datatype ignored in this representation.
     URI links to the browse query and directly to the resource are generated :)
  let $vars := $rdfxml//sr:head/sr:variable/@name
  return
    <table border="1">
      <tr>
        {for $var in $vars
         return <th>{string($var)}</th>
        }
      </tr>
      {for $row in $rdfxml//sr:results/sr:result
       return
         <tr>
           {for $var in $vars
            let $binding := $row/sr:binding[@name=$var]/*
            return
              <td>
                {typeswitch ($binding)
                   case element(sr:uri) return
                     ( <a href="{$script-name}?uri={string($binding)}">{string($binding)}</a>,
                       <a href="{string($binding)}"> ^ </a>
                     )
                   case element(sr:literal) return string($binding)
                   case element(sr:bnode) return concat("_:", $binding)
                   default return ()
                }
              </td>
           }
         </tr>
      }
    </table>
};
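The same conversion can be sketched outside eXist. The Python below is an illustration under stated assumptions, not the author's code: it reads a document in the SPARQL Query Results XML format and returns the variable names and one row of cell values per result, prefixing blank nodes with `_:` and leaving unbound variables empty. The function name `sparql_to_rows` and the sample data are hypothetical; link generation is omitted.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML format, in ElementTree's {uri} notation.
SR = "{http://www.w3.org/2005/sparql-results#}"

# Invented sample result: one row with a URI and a literal, one row with a blank node.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="emp"/><variable name="name"/></head>
  <results>
    <result>
      <binding name="emp"><uri>http://example.org/emp/1</uri></binding>
      <binding name="name"><literal>KING</literal></binding>
    </result>
    <result>
      <binding name="emp"><bnode>b0</bnode></binding>
    </result>
  </results>
</sparql>"""


def sparql_to_rows(results_xml):
    """Return (variable names, rows); each row has one string per variable."""
    root = ET.fromstring(results_xml)
    names = [v.get("name") for v in root.iter(SR + "variable")]
    rows = []
    for result in root.iter(SR + "result"):
        row = []
        for name in names:
            binding = result.find(f"{SR}binding[@name='{name}']")
            value = binding[0] if binding is not None and len(binding) else None
            if value is None:
                row.append("")  # variable unbound in this result
            elif value.tag == SR + "bnode":
                row.append("_:" + (value.text or ""))
            else:
                row.append(value.text or "")  # uri or literal
        rows.append(row)
    return names, rows


if __name__ == "__main__":
    print(sparql_to_rows(SAMPLE))
```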
The SPARQL interface uses the configuration file to declare the namespaces.
import module namespace fr="http://www.cems.uwe.ac.uk/wiki/fr" at "fr.xqm";
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace rdfs = "http://www.w3.org/2000/01/rdf-schema#";
declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=no indent=yes doctype-public=-//W3C//DTD XHTML 1.0 Transitional//EN doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";
declare variable $config-file := request:get-parameter("config", "/db/Wiki/RDF/empdeptconfig.xml");
declare variable $config := doc($config-file);
declare variable $graph := concat("http://www.cems.uwe.ac.uk/xmlwiki/RDF/xml2rdf.xq?config=",$config-file);
declare variable $default-engine := "http://www.cems.uwe.ac.uk/joseki/sparql";
declare variable $script-name := tokenize(request:get-uri(),'/')[last()];
declare variable $default-prolog := "PREFIX fn: <http://www.w3.org/2005/xpath-functions#> PREFIX afn: <http://jena.hpl.hp.com/ARQ/functions#> ";
declare variable $browse := "select ?s ?p ?o where { {<uri> ?p ?o } UNION {?s ?p <uri>} UNION {?s <uri> ?o} }";

let $config-prolog := fr:sparql-prefixes($config)
let $query := request:get-parameter("query",())
let $uri := request:get-parameter("uri",())
let $engine := request:get-parameter("engine",$default-engine)
let $query := if ($uri) then replace($browse,"uri",$uri) else $query
"&amp;query=",encode-for-uri($queryx) )
let $result := if ($query !="") then fr:sparql-to-table(doc($sparql), $script-name) else ()
return
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Emp-dept Query</title>
  </head>
  <body>
    <h1>Emp-dept Query</h1>
    <form action="{$script-name}">
      <textarea name="query" rows="8" cols="90">
        {$query}
      </textarea>
      <br/>
      <input type="submit"/>
    </form>
    <h2>Result</h2>
    {$result}
  </body>
</html>
Application
Query [1] The interface expands a query like select ?name ?job where { ?emp rdf:type f:emp. ?emp foaf:surname ?name. ?emp f:Job ?job. } into:
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix f: <http://www.cems.uwe.ac.uk/xmlwiki/empdept/concept/>
prefix xs: <http://www.w3.org/2001/XMLSchema#>
select ?name ?job
from <http://www.cems.uwe.ac.uk/xmlwiki/RDF/xml2rdf.xq?config=/db/Wiki/RDF/empdeptconfig.xml>
where {
  ?emp rdf:type f:emp.
  ?emp foaf:surname ?name.
  ?emp f:Job ?job.
}
and sends this to the Joseki service. The graph to query is actually passed as the default graph rather than in the from clause.
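The expansion and dispatch steps can be sketched as follows. This Python is an illustration under assumptions, not the script's actual code: `expand_query` prepends one prefix declaration per configured namespace, `browse_query` fills in the browse template from the script, and `request_url` passes the graph using the `default-graph-uri` parameter of the SPARQL protocol. All three function names are hypothetical.

```python
from urllib.parse import urlencode

# Browse template copied from the XQuery script; <uri> is the placeholder.
BROWSE = "select ?s ?p ?o where { {<uri> ?p ?o } UNION {?s ?p <uri>} UNION {?s <uri> ?o} }"


def expand_query(query, prefixes):
    """Prepend a prefix declaration for every (prefix, namespace URI) pair."""
    prolog = "\n".join(f"prefix {p}: <{uri}>" for p, uri in prefixes.items())
    return prolog + "\n" + query


def browse_query(uri):
    """Build the query that lists every triple mentioning the given URI."""
    return BROWSE.replace("<uri>", f"<{uri}>")


def request_url(engine, graph, query):
    """Build the GET URL, passing the graph as the default graph."""
    return engine + "?" + urlencode({"default-graph-uri": graph, "query": query})


if __name__ == "__main__":
    q = expand_query("select ?s where {?s a f:emp}", {"f": "http://example.org/concept/"})
    print(request_url("http://example.org/sparql", "http://example.org/graph", q))
```

Replacing the whole `<uri>` token, rather than the bare string `uri` as the XQuery `replace` call does, avoids accidental matches if the template ever contains that substring elsewhere.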
To do
- handle language and datatype
- local URIs as local names rather than full URIs
- better handling of default graph: should be able to reference the cached rdf defined in the config file
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/RDF/sparqlquery.xq
SPARQL Tutorial
SPARQL interface
The emp-dept RDF can be queried using SPARQL via an XQuery front end [1] to a store [2] provided by Talis [3]. This script supports SPARQL queries and browsing the RDF graph. The interface expands a query like
select ?name ?job
where {
  ?emp rdf:type f:emp.
  ?emp foaf:surname ?name.
  ?emp f:Job ?job.
}
that you can run here [4] into
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix f: <http://www.cems.uwe.ac.uk/empdept/concept/>
prefix xs: <http://www.w3.org/2001/XMLSchema#>
select ?name ?job
where {
  ?emp rdf:type f:emp.
  ?emp foaf:surname ?name.
  ?emp f:Job ?job.
}
and sends this to the Talis service in a form that you can run here [5]. The results XML [6] is converted to HTML.
Example Queries
List all employees
select ?emp where { ?emp rdf:type f:emp. } Run [7]
|| ?job = "MANAGER")
select (max(?sal) as ?maxsal)
where {
  ?maxemp rdf:type f:emp.
  ?maxemp f:Sal ?sal.
}
  ?dept f:Dname ?edname.
  ?mdept f:Dname ?mdname.
  FILTER (?dept != ?mdept)
}
    foaf:surname ?name;
    :Sal ?sal;
    :Dept ?dept;
    :Job ?job.
  ?dept :DeptNo ?dno.
}
and if we don't need to return the resource itself, it can be anonymous:
prefix : <http://www.cems.uwe.ac.uk/empdept/concept/>
select ?name ?sal ?dno ?job
where {
  [ a :emp;
    foaf:surname ?name;
    :Sal ?sal;
    :Dept ?dept;
    :Job ?job
  ].
  ?dept :DeptNo ?dno.
}
Aggregate features
Aggregation functions like count() and sum() and the GROUP BY clause are not defined in SPARQL 1.0 although they are available on some services (such as the Talis [3] platform) in advance of standardisation in SPARQL 1.1.
Generic queries
The uniformity of the triple data model enables us to query the dataset in very general ways, which is useful if we know nothing about the data.
}
Schema queries
The presence of schema data enables SPARQL to be used to query this meta-data. The results could be compared with the results of querying the data directly.
To do
- the example RDF lacks language tags, which are required to illustrate the lang() function
- all queries to be moved to the codelist together with the SQL and XQuery equivalents
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdeptquery.xq
[2] http://api.talis.com/stores/cwallace-dev1
[3] http://www.talis.com/
[4] http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdeptquery.xq?query=
[5] http://api.talis.com/stores/cwallace-dev1/services/sparql
[6] http://www.w3.org/TR/rdf-sparql-XMLres/
[7] http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdeptquery.xq?id=1
[8] http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdeptquery.xq?id=4a
[9] http://dallemang.typepad.com/
String Analysis
XQuery analyze-string
XSLT 2.0 includes the analyze-string construct, which captures matching groups (in parentheses) in a regular expression. Strangely, this is not available in XQuery. It is possible to use the XSLT construct by wrapping an XQuery function round a generated XSLT stylesheet, even though this seems rather painful. In this installation of eXist, the XSLT engine is Saxon 8.
declare function str:analyze-string($string as xs:string, $regex as xs:string,$n as xs:integer ) { transform:transform (<any/>, <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:template match='/' > <xsl:analyze-string regex="{$regex}" select="'{$string}'" > <xsl:matching-substring> <xsl:for-each select="1 to {$n}"> <match> <xsl:value-of select="regex-group(.)"/> </match> </xsl:for-each> </xsl:matching-substring> </xsl:analyze-string> </xsl:template> </xsl:stylesheet>, () ) };
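For comparison, capturing groups are directly available in many other languages. The following Python sketch is an analogue rather than a translation: it returns the first `n` captured groups of the first match, whereas xsl:analyze-string processes every match in the string. With an anchored pattern, as used later for registration numbers, the two behave alike. The function name `analyze_string` is hypothetical.

```python
import re


def analyze_string(string, regex, n):
    """Return the first n captured groups of the first match, or [] if no match.

    Groups that did not participate in the match, or that do not exist in the
    pattern, are returned as empty strings (as regex-group() does in XSLT).
    """
    match = re.search(regex, string)
    if match is None:
        return []
    groups = match.groups(default="")
    return [groups[i] if i < len(groups) else "" for i in range(n)]


if __name__ == "__main__":
    print(analyze_string("WP05LNU", r"^([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]$", 2))
```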
let $regno := request:get-parameter("regno",()) return local:decode-regno($regno)
Decode tables
Separate tables decode codes to date ranges or areas. These tables are plain XML created from CSV files via Excel. The pre-83 area codes are currently incorrect. e.g. <CodeList id="Area83"> <Entry> <Code>AA</Code> <Location>Bournemouth</Location> </Entry> <Entry> <Code>AB</Code> <Location>Worcester</Location> </Entry> <Entry> <Code>AC</Code> <Location>Coventry</Location> </Entry> ...
Examples
1. A current number plate: WP05LNU [1]
2. One from the previous series: L162BAY [2]
Location Mapping
One use of this conversion is to display the locations on a map. Here we take a file of observed registration numbers, decode them all, group by location and generate a KML file with the locations geocoded through the Google API. <NumberList> <Regno>H251GBU</Regno> <Regno>WRA870Y</Regno> <Regno>ENB427T</Regno> <Regno>C406OUY</Regno> <Regno>N62VNF</Regno> <Regno>R895KCV</Regno> <Regno>C758HOV</Regno> <Regno>H541HEM</Regno> ...
(: this script plots the registration locations of a set of UK vehicle license plates using kml. :)
import module namespace str = "http://www.cems.uwe.ac.uk/string" at "../lib/string.xqm"; declare namespace reg = "http://www.cems.uwe.ac.uk/wiki/reg";
declare option exist:serialize "method=xml media-type=application/vnd.google-earth.kml+xml indent=yes omit-xml-declaration=yes";
declare variable $reg:icon := "http://maps.google.com/mapfiles/kml/paddle/ltblu-blank.png";
declare variable $reg:patterns :=
<patterns>
  <pattern version="01" regexp="([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]">
    <field>Area</field><field>Date</field>
  </pattern>
  <pattern version="83" regexp="([A-Z])\d+[A-Z]([A-Z][A-Z])">
    <field>Date</field><field>Area</field>
  </pattern>
  <pattern version="63" regexp="([A-Z][A-Z])[A-Z]?\d+([A-Z])">
    <field>Area</field><field>Date</field>
  </pattern>
</patterns>;
declare function reg:decode-regno($regno) {
  let $regno := upper-case($regno)
  let $regno := replace($regno, " ", "")
  return
    for $pattern in $reg:patterns/pattern
    let $regexp := concat("^", $pattern/@regexp, "$")
    return
      if (matches($regno, $regexp))
      then
        let $analysis := str:analyze-string($regno, $regexp, count($pattern/field))
        return
          <regno version="{$pattern/@version}">
            {for $field at $i in $pattern/field
             let $value := string($analysis[position() = $i])
             let $table := concat($field, $pattern/@version)
             let $value := /CodeList[@id=$table]/Entry[Code=$value]
             return element {$field} {$value/*}
            }
          </regno>
      else ()
};
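The pattern-matching part of the decoder can be exercised on its own. The Python sketch below reuses the three patterns and field orders from `$reg:patterns` and returns the raw codes; it stops short of the lookup in the decode tables, which are not reproduced here. The function name `parse_regno` is hypothetical, and unlike the XQuery (which returns a result for every matching pattern) it returns only the first match.

```python
import re

# (version, regular expression, field names in group order), as in $reg:patterns.
PATTERNS = [
    ("01", r"([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]", ("Area", "Date")),
    ("83", r"([A-Z])\d+[A-Z]([A-Z][A-Z])", ("Date", "Area")),
    ("63", r"([A-Z][A-Z])[A-Z]?\d+([A-Z])", ("Area", "Date")),
]


def parse_regno(regno):
    """Return the version and raw field codes of a registration number, or None."""
    regno = regno.upper().replace(" ", "")
    for version, regexp, fields in PATTERNS:
        # fullmatch plays the role of the "^" and "$" anchors added in the XQuery.
        match = re.fullmatch(regexp, regno)
        if match:
            return {"version": version, **dict(zip(fields, match.groups()))}
    return None


if __name__ == "__main__":
    for plate in ("WP05LNU", "L162 BAY", "WRA870Y"):
        print(plate, parse_regno(plate))
```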
else () };
return
<Document>
  <name>Reg nos</name>
  {for $i in (1 to 10)
   return
     <Style id="size{$i}">
       <IconStyle>
         <scale>{$i}</scale>
         <Icon><href>{$reg:icon}</href></Icon>
       </IconStyle>
     </Style>
  }
  {
   let $locations := reg:regno-locations(doc($url)//Regno)
   let $max := count($locations)
   for $place in distinct-values($locations)
   let $latlong := geo:geocode(concat($place,',UK'))
   let $count := count($locations[. = $place])
   let $scale := max((round($count div $max * 10),1))
   order by $count descending
   return
     <Placemark>
       <name>{$place} ({$count})</name>
       <styleUrl>#size{$scale}</styleUrl>
       <Point><coordinates>{geo:position-as-kml($latlong)}</coordinates></Point>
     </Placemark>
  }
</Document>
SMS service
The Department of Information Science and Digital Media supports an SMS service [4] with facilities to send and receive text messages. The service is paid for by the University of the West of England, Bristol and all traffic is logged.
A decoder for UK vehicle license numbers is one of the demonstration services which are supported for mobile-originated (MO) text messages. The format of the text message is
REG <regno>
e.g.
REG L162 BAY
A text message in this format sent to our SMS mobile number 447624803759 passes through a PHP script which allows multiple SMS services to be supported. The script uses the first word of the message to identify the associated service endpoint, and then invokes that endpoint via HTTP, passing the prefix as code, the rest of the message as text and the originating mobile number as from. For the prefix REG, the associated endpoint is an XQuery script:
http://www.cems.uwe.ac.uk/xmlwiki/regno/smsregno.xq
The smsregno.xq script is essentially the parseregno script above.
declare option exist:serialize "method=text media-type=text/text";
...
let $regno := request:get-parameter("text",())
let $data := local:decode-regno($regno)
return
  concat("Reply: ", $regno, " was registered in ", $data/Area/Location, " between ", $data/Date/From, " and ", $data/Date/To)
The SMS switch then sends the Reply on to the originating mobile phone.
To do
- solve problem with repetition modifiers (or function support for analyze-string)
- Pre-83 area code data
- Switch implementation in XQuery to replace the PHP application: awaits switch to eXist v2
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/regno/parseregno.xq?regno=WP05LNU
[2] http://www.cems.uwe.ac.uk/xmlwiki/regno/parseregno.xq?regno=L162BAY
[3] http://www.cems.uwe.ac.uk/xmlwiki/regno/regnoMap.xq?url=/db/Wiki/regno/sample.xml
[4] http://www.cems.uwe.ac.uk/~cjwallac/apps/sms/
Tag Cloud
Counting Words
A tag cloud (or weighted list in visual design) is a visual depiction of user-generated tags, or simply the word content of a site, typically used to describe the content of web sites. One method of creating a tag cloud is to create a list of the words in a document, count the number of occurrences of each word, and depict the more frequently occurring words with a larger font size than the words that occur less frequently.
This version uses the \W+ regular expression (which matches runs of non-word characters, that is, anything other than letters, digits and the underscore) to split the text into word tokens.
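The tokenise-and-count step can be sketched in a few lines. This Python is illustrative only: it splits on `\W+` as described above, drops the empty tokens that arise when the text starts or ends with punctuation, and folds words to lower case, which is an assumption not stated in the text. The function name `word_counts` is hypothetical.

```python
import re
from collections import Counter


def word_counts(text):
    """Split text on runs of non-word characters and count each distinct word."""
    # re.split yields "" at the edges when the text starts or ends with a separator.
    words = [w.lower() for w in re.split(r"\W+", text) if w]
    return Counter(words)


if __name__ == "__main__":
    print(word_counts("Red, green; red blue!").most_common())
```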
Counting Keywords
Kurt Cagle suggested the following XQuery for counting keywords: declare namespace xqwb="http://xquery.wikibooks.org"; declare function xqwb:word-count($wordlist as element() ) as element() { <terms> {for $term in distinct-values($wordlist/term) let $term-count := count($wordlist/term[. = $term]) return <term count="{$term-count}">{$term}</term> } </terms> }; let $keywords := <keywords> <term>red</term> <term>green</term> <term>red</term> <term>blue</term> <term>violet</term> <term>red</term> <term>blue</term> <term>blue</term> <term>red</term> <term>orange</term> <term>green</term> <term>yellow</term> <term>indigo</term> <term>red</term> </keywords> let $result := xqwb:word-count($keywords) return $result [Execute [1]]
let $keywords :=
  <keywords>
    <term>red</term> <term>green</term> <term>red</term> <term>blue</term> <term>violet</term>
    <term>red</term> <term>blue</term> <term>blue</term> <term>red</term> <term>orange</term>
    <term>green</term> <term>yellow</term> <term>indigo</term> <term>red</term>
  </keywords>
let $result := xqwb:word-count($keywords)
let $total := count($keywords/term)
let $scale := 20
return
  <div>
    {
      for $term in $result/term
      let $fontSize := round($term/@count div $total * 100 * $scale)
      order by $term
      return <span style="font-size:{$fontSize}%">{string($term)}</span>
    }
  </div>
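The font-size formula is easy to check by hand or in code. The following Python sketch applies the same calculation (the share of each term, as a percentage, times the scale factor) to the keyword list above. The names `font_sizes` and `TERMS` are hypothetical; rounding is done half-up to mirror XQuery's round(), since Python's built-in round() rounds halves to even.

```python
import math
from collections import Counter

# The keyword list from the XQuery example.
TERMS = ["red", "green", "red", "blue", "violet", "red", "blue", "blue",
         "red", "orange", "green", "yellow", "indigo", "red"]


def font_sizes(terms, scale=20):
    """Map each distinct term to a font size in percent, ordered by term."""
    total = len(terms)
    counts = Counter(terms)
    return {
        term: math.floor(count / total * 100 * scale + 0.5)
        for term, count in sorted(counts.items())
    }


if __name__ == "__main__":
    for term, size in font_sizes(TERMS).items():
        print(f'<span style="font-size:{size}%">{term}</span>')
```

With 14 terms and a scale of 20, a term that occurs once is rendered at 143% and "red", which occurs five times, at 714%, so the scale factor may need tuning for larger vocabularies.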
References
[1] http://www.cems.uwe.ac.uk/xmldb/rest//db/Wiki/wordCount.xq
[2] http://www.flickr.com/photos/tags
[3] http://www.cems.uwe.ac.uk/xmldb/rest//db/Wiki/wordCount_1.xq
Topological Sort
Motivation
You have a Directed Acyclic Graph (DAG) that tracks something such as a dependency graph. You want to sort the nodes of an input DAG so that the output reflects the dependency structure. The Topological Sort of a Directed Acyclic Graph puts nodes in a sequence such that every node references only preceding nodes. This ordering is needed, for example, in scheduling processes in a Pipeline. For example, given a DAG defined as
<node id="a">
  <ref id="b"/>
  <ref id="c"/>
</node>
<node id="b">
  <ref id="c"/>
</node>
<node id="c"/>
the topological order would be:
<node id="c"/>
<node id="b">
  <ref id="c"/>
</node>
<node id="a">
  <ref id="b"/>
  <ref id="c"/>
</node>
The definition of topological order can be simply expressed in XQuery:
declare function local:topological-sorted($nodes) as xs:boolean {
  every $n in $nodes satisfies
    every $id in $n/ref/@id satisfies $id = $n/preceding::node/@id
};
which is invoked as
let $graph :=
  <graph>
    <node id="a">
      <ref id="b"/>
      <ref id="c"/>
    </node>
    <node id="b">
      <ref id="c"/>
    </node>
    <node id="c"/>
  </graph>
let $sortedNodes := <graph>{local:topological-sort($graph/node,())}</graph>
return local:topological-sorted($sortedNodes/node)
Explanation
$unordered is initially the original sequence and $ordered is empty. At each iteration, the set of nodes which depend only on the ordered nodes is calculated; these nodes are removed from the unordered nodes and added to the ordered nodes.
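The iteration described above can be sketched in Python. This is an illustration of the same idea, not a transcription of the XQuery function: the graph is assumed to be a dictionary mapping each node id to the list of ids it references, and the names `topological_sort` and `is_topologically_sorted` are hypothetical. The sketch also stops with an error when no progress can be made, which happens if the input contains a cycle or a reference to a missing node.

```python
def topological_sort(graph):
    """Return node ids ordered so that every node follows the nodes it references."""
    ordered = []
    unordered = dict(graph)
    while unordered:
        # Nodes whose references have all been placed already.
        ready = [n for n, refs in unordered.items() if all(r in ordered for r in refs)]
        if not ready:
            raise ValueError("cycle or dangling reference in graph")
        ordered.extend(ready)
        for n in ready:
            del unordered[n]
    return ordered


def is_topologically_sorted(order, graph):
    """Check that every node references only nodes that precede it."""
    return all(
        all(ref in order[:i] for ref in graph[node])
        for i, node in enumerate(order)
    )


if __name__ == "__main__":
    dag = {"a": ["b", "c"], "b": ["c"], "c": []}
    result = topological_sort(dag)
    print(result, is_topologically_sorted(result, dag))
```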
References
Tree View
Motivation
You want a general purpose function that creates a tabular view of hierarchical data.
Method
We will write a recursive function to display each node and then to display each child in an HTML table. Some systems call this a "Grid View" of XML data.
element-to-nested-table function
The following function generates an HTML table with nested subtables for the child nodes. declare function local:element-to-nested-table($element) { if (exists ($element/(@*|*))) then <table> {if (exists($element/text())) then <tr class="text"> <th></th> <td>{$element/text()}</td> </tr> else () } {for $attribute in $element/@* return <tr class="attribute"> <th>@{name($attribute)}</th> <td>{string($attribute)}</td> </tr> } {for $node in $element/* return <tr class="element"> <th>{name($node)}</th> <td>{local:element-to-nested-table($node)}</td> </tr> } </table> else $element/text() }; Note that the rows displaying different kinds of items (text, attribute,element) are classed so that they may be styled.
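The recursion can also be tried outside eXist. The Python sketch below follows the same shape as the XQuery function, producing an HTML string from an ElementTree element. It is an approximation under stated assumptions: surrounding whitespace is stripped, text that follows child elements is ignored, and the function name `element_to_nested_table` is reused here only for readability.

```python
import xml.etree.ElementTree as ET
from html import escape


def element_to_nested_table(element):
    """Render an element as an HTML table with nested tables for child elements."""
    text = (element.text or "").strip()
    # Leaf element without attributes: just its text, as in the XQuery else-branch.
    if not element.attrib and len(element) == 0:
        return escape(text)
    rows = []
    if text:
        rows.append(f'<tr class="text"><th></th><td>{escape(text)}</td></tr>')
    for name, value in element.attrib.items():
        rows.append(f'<tr class="attribute"><th>@{name}</th><td>{escape(value)}</td></tr>')
    for child in element:
        rows.append(
            f'<tr class="element"><th>{child.tag}</th>'
            f'<td>{element_to_nested_table(child)}</td></tr>'
        )
    return "<table>" + "".join(rows) + "</table>"


if __name__ == "__main__":
    print(element_to_nested_table(ET.fromstring('<emp id="7"><name>King</name></emp>')))
```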
Document display
This function can be used in a script to provide a viewer for any XML document.
declare namespace hc ="http://exist-db.org/xquery/httpclient";
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";
(: function declaration :)
let $uri := request:get-parameter("uri",())
let $element:= httpclient:get(xs:anyURI($uri),true(),())/hc:body/html
return
<html>
  <head>
    <title>Tree view</title>
    <style type="text/css">
      th {{border-style:double}}
      tr {{border-style:dotted}}
      tr.attribute {{font-style:italic}}
      td {{border-style:ridge}}
    </style>
  </head>
  <body>
    <h1>Tree view of {$uri} </h1>
    {local:element-to-nested-table($element)}
  </body>
</html>
e.g.
1. UWE's news feed [1]
2. Whisky data [2]
3. Employee data [3]
4. Met Office shipping Forecast [4] malformed XML
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/util/treeview.xq?uri=http://info.uwe.ac.uk/news/uwenews/downloadxml.asp
[2] http://www.cems.uwe.ac.uk/xmlwiki/util/treeview.xq?uri=/db/Wiki/whisky1.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/util/treeview.xq?uri=/db/Wiki/empdept/emp.xml
[4] http://www.cems.uwe.ac.uk/xmlwiki/util/treeview.xq?uri=http://www.metoffice.gov.uk/weather/marine/shipping_forecast.html
Validating a hierarchy
Whilst schema validation can check for some aspects of model validity, business rules are often more complex than is expressible in XML Schema. XQuery is a powerful language for describing more complex rules. One such rule is that a relationship should define a tree structure, for example, the relationship between an employee and her manager. Consider the following set of employees:
<company>
  <emp> <name>Fred</name> <mgr>Bill</mgr> </emp>
  <emp> <name>Joe</name> <mgr>Bill</mgr> </emp>
  <emp> <name>Alice</name> <mgr>Joe</mgr> </emp>
  <emp> <name>Bill</name> </emp>
</company>
The criteria for a valid hierarchy are:
1. one root (the boss);
2. every employee has at most one manager;
3. every employee reports finally to the boss;
4. there are no cycles
In XQuery we can define the management hierarchy from the boss down to an employee as : declare function local:management($emp as element(emp) , $hierarchy as element(emp)* ) as element(emp)* { if ($emp = $hierarchy ) (: cycle detected :) then () else let $mgr := $emp/../emp[name=$emp/mgr] return if (count($mgr) > 1) then () else if (empty ($mgr)) (: reached the root :) then ($emp,$hierarchy) else local:management($mgr, ($emp,$hierarchy)) };
The function is initially called as
local:management($emp,())
The hierarchy is built up as a parameter to allow cycles to be detected. Finally, the condition for the management structure to be a tree is
declare function local:management-is-tree($company) {
  let $boss := $company/emp[empty(mgr)]
  return
    count($boss) = 1 and
    (every $emp in $company/emp satisfies $boss = local:management($emp,())[1])
};
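The same rule can be checked in Python. This sketch is illustrative and makes one simplifying assumption: the company is a dictionary mapping each employee name to a manager name, or to None for the boss, so the "at most one manager" criterion holds by construction. The function names `management_chain` and `is_tree` are hypothetical.

```python
def management_chain(name, managers, chain=()):
    """Return the chain from the root down to `name`, or () if a cycle is found."""
    if name in chain:  # cycle detected
        return ()
    mgr = managers.get(name)
    if mgr is None:  # reached the root
        return (name,) + chain
    return management_chain(mgr, managers, (name,) + chain)


def is_tree(managers):
    """True if there is exactly one boss and every chain starts at that boss."""
    bosses = [emp for emp, mgr in managers.items() if mgr is None]
    return len(bosses) == 1 and all(
        management_chain(emp, managers)[:1] == (bosses[0],) for emp in managers
    )


if __name__ == "__main__":
    company = {"Fred": "Bill", "Joe": "Bill", "Alice": "Joe", "Bill": None}
    print(management_chain("Alice", company), is_tree(company))
```

As in the XQuery, the chain is accumulated in a parameter so that a repeated name can be recognised and the recursion stopped.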
The complex path here is to ensure that only the relevant li tags are included and that only terminals in a hierarchy of terms are included, hence the check that the li has no ul child. Execute [2]
References
[1] http://xquery.typepad.com/xquery/2007/10/xquery-at-work.html
[2] http://www.cems.uwe.ac.uk/xmlwiki/Scrape/wikiweaponslist.xq
Method
Fetch the Category page for the book and re-format based on the initial letter of the page, hence skipping the category level.
return <html> <h1>Index of {$book}</h1> { for $letter in distinct-values($pages/substring(substring-after(.,'/'),1,1))[string-length(.) = 1] return <div> <h3>{$letter}</h3> <ul> {for $page in $pages[starts-with(substring-after(.,'/'),$letter)] let $url := concat($base,$page/h:a/@href) return <li> <a href="{$url}">{substring-after($page,'/')}</a> </li> } </ul> </div> } </html> XQuery Index [25]
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/util/wikiindex.xq?book=XForms
[2] http://www.cems.uwe.ac.uk/xmlwiki/util/wikiindex.xq?book=XRX
Wikipedia Lookup
Page scraping is one way to retrieve a specific fact from a page provided its structure is stable. Here the task is to use wikipedia to find the Latin name for a bird, given its common name.
declare namespace h = "http://www.w3.org/1999/xhtml";
let $name := request:get-parameter("name",()) let $url := escape-uri(concat("http://en.wikipedia.org/wiki/",$name),false()) let $page := doc($url) let $genus := $page//h:tr[h:td[. ='Genus:']]/h:td[2] let $species := $page//h:tr[h:td[. ='Species:']]/h:td[2] let $binomial := string($page//h:tr[h:th//h:a[.='Binomial name']]/following-sibling::h:tr//h:b) return <bird name="{$name}" genus="{$genus}" species="{$species}" binomial="{$binomial}"/>
Here, the path to locate the data required, assuming the page is in Bird page format, involves complex XPath expressions. For example, the genus is the second cell in a table row whose first cell is 'Genus:'.
Black Swan [1] Wikipedia [2]
The script often fails because:
1. the name is ambiguous: Thrush [3] Wikipedia [4]
2. the name is too broad: Kiwi [5] Wikipedia [6]
It is not hard to see that more semantic markup with ontological relationships would be preferable to these uncertain contortions.
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/birdlinneas.xq?name=Black%20Swan
[2] http://en.wikipedia.org/wiki/Black%20Swan
[3] http://www.cems.uwe.ac.uk/xmlwiki/birdlinneas.xq?name=Thrush
[4] http://en.wikipedia.org/wiki/Thrush
[5] http://www.cems.uwe.ac.uk/xmlwiki/birdlinneas.xq?name=Kiwi
[6] http://en.wikipedia.org/wiki/Kiwi
XML to RDF
For the Emp-DEPT case study, RDF must be generated from underlying XML files. An XQuery script generates the RDF. It uses a configuration file to define how columns of a table should be mapped into RDF and the namespaces to be used. This mapping needs a little more work to allow composite keys and allow user defined transformations. An interactive tool to create this map would be useful.
. This work
This conversion illustrates a few of the differences between local datasets, whether SQL or XML, and a dataset designed to fit into a global database. Some decisions remain unclear.
- tables are implicitly within an organisational context. This context has to be added in RDF by creating a namespace for the local properties and identifiers
- the scope of queries is implicitly within organisational boundaries, but in RDF this scope needs to be explicit. In the SQL query select * from emp; emp is ambiguously either the class of employees or the set of employees in the company. In RDF this needs to be explicit, so that two kinds of tuples need to be added:
  - tuples to type employees to a company definition of employee
  - tuples to relate the employee to the company (to be added)
- linkage to the global database requires two kinds of links:
  - local properties need to be mapped to global predicates. Here the employee name is mapped to foaf:surname (but the case probably needs changing). Alternatively a local predicate f:name could be defined, which is equated to the foaf predicate with owl:samePropertyAs.
  - local identifiers of resources to be replaced by global URIs. Here location is mapped to a dbpedia resource URI. Alternatively, the local URI f:location/Dallas could be equated to the dbPedia resource with owl:sameAs. (where? and why delay this?)
- foreign keys are replaced by full URIs, pointing directly to the linked resource. The name of this property is no longer the name of the foreign key (e.g. MgrNo) but rather the name of the related resource (Manager). However, the foreign key itself might also need to be replaced.
- primary keys are also replaced by URIs, but the local primary key value, for example the employee number, will need to be retained as a literal if it is not purely a surrogate key. This perhaps should be mapped to rdfs:label.
- datatypes are preferably explicit in the data to avoid conversion in queries, although this increases the size of the RDF graph.
- namespaces have been expanded in full where they occur in RDF attribute values. An alternative would be to define entities in a DTD prolog as shorthand for these namespaces, but not all processors of the RDF would do the expansion. xml:base can be used to default one namespace.
[The choices made here are those of a novice and review would be welcome.]
Some issues not yet addressed:
- meta-data about the dataset as a whole (its origin, when and how it was converted): these can be DC properties of a document, with each entity tied to that document as a part?
- an alternative approach to mapping would be to start with an ontology and add mapping information to it rather than generating it from the ad-hoc configuration file.
Configuration file
To facilitate the conversion from XML to RDF, a separate configuration file is defined. Here is the configuration file for the emp-dept data.
<?xml version="1.0" encoding="UTF-8"?> <XML-to-RDF> <namespaces> <namespace prefix="f" uri="http://www.cems.uwe.ac.uk/empdept/concept/" /> <namespace prefix="ft" uri="http://www.cems.uwe.ac.uk/empdept/"/> <namespace prefix="rdf" uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#" /> <namespace prefix="rdfs" uri="http://www.w3.org/2000/01/rdf-schema#" /> <namespace prefix="foaf" uri="http://xmlns.com/foaf/0.1/" /> <namespace prefix="xs" uri="http://www.w3.org/2001/XMLSchema#" /> </namespaces> <map type="emp" prefix="f"> <source file="/db/Wiki/empdept/emp.xml" path="//Emp"/> <col name="EmpNo" pk="true" uribase="ft:emp" type="xs:string"/> <col name="Ename" prefix="rdfs" tag="label"/> <col name="Sal" type="xs:integer"/> <col name="Comm" type="xs:integer"/> <col name="HireDate" type="xs:date"/> <col name="MgrNo" tag="Mgr" uribase="ft:emp"/> <col name="MgrNo"/> <col name="DeptNo" tag="Dept" uribase="ft:dept"/> <col name="Ename" prefix="foaf" tag="surname"/> <col name="Job"/> </map> <map type="dept" prefix="f"> <source file="/db/Wiki/empdept/dept.xml" path="//Dept"/> <col name="Dname" prefix="rdfs" tag="label"/> <col name="Dname"/> <col name="Location" uribase="http://dbpedia.org/resource"/> <col name="DeptNo" pk="true" uribase="ft:dept" type="xs:string"/> </map> <map type="salgrade" prefix="f"> <source file="/db/Wiki/empdept/salgrade.xml" path="//SalGrade"/> <col name="HiSal" type="xs:integer"/> <col name="LoSal" type="xs:integer"/> <col name="Grade" pk="true" uribase="ft:grade" type="xs:integer"/> <col name="Grade" prefix="rdfs" tag="label"/> </map> </XML-to-RDF>
declare function fr:expand($qname as xs:string?, $map) as xs:string? {
  let $namespace := $map/..//namespace
  return
    if ($qname) then
      if (contains($qname,":")) then
        let $qs := tokenize($qname,":")
        let $prefix := $qs[1]
        let $name := $qs[2]
        let $uri := $namespace[@prefix=$prefix]/@uri
        return concat($uri,$name)
      else if ($namespace[@prefix = $qname]) then
        $namespace[@prefix = $qname]/@uri
      else $qname
    else ()
};
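The prefix expansion performed by fr:expand can be tried in isolation. The following is a minimal sketch, assuming a cut-down inline copy of the configuration file; local:expand is a stand-in written for this illustration that mirrors the logic of fr:expand, and the inline $config is not the real file.

```xquery
xquery version "1.0";

(: Stand-in for fr:expand, declared in the local namespace for this sketch. :)
declare function local:expand($qname as xs:string?, $map as element()) as xs:string? {
  let $namespace := $map/..//namespace
  return
    if ($qname) then
      if (contains($qname, ":")) then
        (: prefixed name: look up the prefix and append the local part :)
        let $qs := tokenize($qname, ":")
        return concat($namespace[@prefix = $qs[1]]/@uri, $qs[2])
      else if ($namespace[@prefix = $qname]) then
        (: bare prefix: return the namespace URI itself :)
        string($namespace[@prefix = $qname]/@uri)
      else $qname
    else ()
};

(: Assumed, cut-down configuration used only for this example. :)
let $config :=
  <XML-to-RDF>
    <namespaces>
      <namespace prefix="ft" uri="http://www.cems.uwe.ac.uk/empdept/"/>
      <namespace prefix="xs" uri="http://www.w3.org/2001/XMLSchema#"/>
    </namespaces>
    <map type="emp" prefix="f"/>
  </XML-to-RDF>
let $map := $config/map
return (
  local:expand("ft:emp", $map),      (: http://www.cems.uwe.ac.uk/empdept/emp :)
  local:expand("xs:integer", $map),  (: http://www.w3.org/2001/XMLSchema#integer :)
  local:expand("unmapped", $map)     (: unmapped :)
)
```

One thing this makes visible: a value that is already a full URI, such as http://dbpedia.org/resource, also contains a colon and therefore takes the prefixed-name branch, so a guard (for example, checking that the prefix is declared) would probably be needed before such values are expanded.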
declare function fr:row-to-rdf($row as element(), $map as element()) as element(rdf:Description)* {
  let $pk := $map/col[@pk="true"]
  let $pkv := string($row/*[name()=$pk/@name])
  let $pkuri := fr:expand($pk/@uribase, $map)
  return
    <rdf:Description>
      {attribute rdf:about {concat($pkuri,"/",$pkv)}}
      { if ($map/@type)
        then
          let $typeuri := fr:expand(concat($map/@prefix,":",$map/@type),$map)
          return <rdf:type rdf:resource="{$typeuri}"/>
        else ()
      }
      {for $col in $map/col
       let $name := $col/@name
       let $data := string($row/*[name(.)=$name])
       return
         if ($data != "") then
           element { concat(($col/@prefix,$map/@prefix)[1], ":", ($col/@tag,$name)[1]) } {
             if ($col/@type) then
               (attribute rdf:datatype { fr:expand($col/@type,$map) },
declare function fr:map-to-schema($map as element()) as element(rdf:Description)* {
  let $typeuri := fr:expand(concat($map/@prefix,":",$map/@type),$map)
  for $col in $map/col[@type]
  let $prop := concat(fr:expand(($col/@prefix,$map/@prefix)[1],$map), ($col/@tag,$col/@name)[1])
  let $rangeuri := (fr:expand($col/@type,$map),
                    fr:expand($col/@uribase,$map),
                    "http://www.w3.org/2000/01/rdf-schema#Literal")[1]
  return
    <rdf:Description rdf:about="{$prop}">
      <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
Links
Get RDF [2] Cached RDF [3] Validate RDF [4]
Resource RDF
In addition, each resource can be retrieved as RDF. In this simple example, a request for a resource URI such as http://www.cems.uwe.ac.uk/empdept/emp/7839 is rewritten by Apache to http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdeptrdf.xq?emp=7839, and the script retrieves the rdf:Description of the selected resource directly from the RDF file. This mechanism does not conform to the recommended practice of distinguishing between information resources (such as the information about employee 7839) and the real-world entity being represented. At present, the resource URI dereferences directly to the RDF, rather than redirecting to it indirectly using the recommended 303 mechanism.

declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare variable $rdf := doc("/db/Wiki/RDF/empdept.rdf");
declare option exist:serialize "media-type=application/rdf+xml";

(: better to just parse the uri itself :)
let $param := request:get-parameter-names()
let $type := $param[1]
return
  if ($type="all")
  then $rdf
  else
    let $key := request:get-parameter($type,())
    let $resourceuri := concat("http://www.cems.uwe.ac.uk/empdept/",$type,"/",$key)
    return
      <rdf:RDF>
        {$rdf//rdf:Description[@rdf:about=$resourceuri]}
      </rdf:RDF>
To Do
compound primary keys
conversion functions, for example to convert the case of strings or reformat dates
added resources and relationships - here a company entity and links from departments to the company
References
[1] http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
[2] http://www.cems.uwe.ac.uk/xmlwiki/RDF/xml2rdf.xq?config=/db/Wiki/RDF/empdeptconfig.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdept.rdf
[4] http://www.w3.org/RDF/Validator/ARPServlet?URI=http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdept.rdf&PARSE&TRIPLES_AND_GRAPH=PRINT_BOTH
XML to SQL
Tabular XML can be exported to SQL by generating the create statement, with the column list enclosed in parentheses:

declare function generic:element-to-SQL-create($element) {
  ( "create table ", name($element), " (", $generic:nl,
    string-join(
      for $node in $element/*[1]/*
      return concat("  ", name($node), " varchar(20)"),
      concat(",", $generic:nl)
    ),
    $generic:nl, ");", $generic:nl
  )
};

and the insert statements (the syntax is "insert into <table>", without the keyword "table"):

declare function generic:element-to-SQL-insert($element) {
  for $row in $element/*
  return
    concat(
      "insert into ", name($element), " values (",
      string-join(
        for $node in $element/*[1]/*
        return concat('"', data($row/*[name(.) = name($node)]), '"'),
        ","
      ),
      ");", $generic:nl
    )
};

and using these two functions in a script. The module import must precede the option declaration in the prolog:

import module namespace generic = "http://www.cems.uwe.ac.uk/generic" at "../lib/generic.xqm";
declare option exist:serialize "method=text media-type=text/plain";

let $x := response:set-header('Content-Disposition','inline;filename=emp.sql')
return
  ( generic:element-to-SQL-create(/EmpTable),
    generic:element-to-SQL-insert(/EmpTable)
  )
Generate SQL [1] (not yet tested)
This SQL is very general, with all fields defined as varchar because there is no schema. With a schema, appropriate datatypes could be defined in SQL.
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/empdept/emp2SQL.xq
XPath examples
Motivation
You would like to select specific structures within an XML document. You would like to use a language that is consistent across all W3C XML standards. This language is XPath.
    <list-price>29.95</list-price>
  </book>
  <book>
    <title>XRX: XForms, Rest and XQuery</title>
    <author>Dan McCreary</author>
    <description>This book is an overview of the key architectural and design patterns.</description>
    <format>Wikibook</format>
    <license>Creative Commons Sharealike 3.0 Attribution-Non-commercial</license>
    <list-price>29.95</list-price>
  </book>
</books>
XPath provides a number of functions and axes to move around an XML structure.
Count the number of books using //. With eXist, this form executes much faster on larger collections.

count($books//book) (: should return 4 :)

Get a sequence of all the titles in the book collection.

$books//title/text()

Calculate the total, average, minimum and maximum price of all the books in the collection.

sum($books//list-price/text()) (: Should return a number such as 139.84 :)
avg($books//list-price/text()) (: Should return a number such as 34.96 :)
min($books//list-price/text()) (: Should return a number such as 29.95 :)
max($books//list-price/text()) (: Should return a number such as 44.99 :)
The following scripts show some of these functions and axes in use.
Adding Predicates
A predicate is a qualifier that is added to the end of an XPath expression. It is usually used to filter nodes out of a result set. Predicates are similar to WHERE clauses in SQL.

Get all the books whose format is 'Wikibook':

$books//book[format='Wikibook']

Get just the titles of the wikibooks:

$books//book[format='Wikibook']/title/text()

Get the titles of all the books that contain the word 'XQuery' somewhere in the title:

$books//book[contains(title, 'XQuery')]/title/text()
Complex Predicates
1. node()
2. text()
3. *
4. string(..)
5. data(..)
6. child::
7. parent::
8. following-sibling::
9. preceding-sibling::
10. descendant::
11. descendant-or-self::

Navigating around a tree with distinct tags [1]
Navigating around a tree with a single tag [2]
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/xpathExamples1.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/xpathExamples2.xq
              </xs:complexType>
            </xs:element>
            <xs:element name="Third">
              <xs:complexType>
                <xs:attribute name="AttributeDecimal" use="required" type="xs:decimal"/>
                <xs:attribute name="AttributeTime" use="required" type="xs:time"/>
                <xs:attribute name="AttributeDateTime" use="required" type="xs:dateTime"/>
              </xs:complexType>
            </xs:element>
          </xs:choice>
        </xs:complexType>
      </xs:element>
      <xs:element name="Unbounded" maxOccurs="unbounded"/>
      <xs:element ref="Ref"/>
      <xs:element name="WithRefType" type="RefType"/>
      <xs:element name="Optional" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
   case element(xs:annotation) return ()
   case element(xs:element) return
      let $min := ($element/@minOccurs,1)[1]
      let $max := if ($element/@maxOccurs = "unbounded") then 10 else ($element/@maxOccurs,1)[1]
      let $count := round(math:random() * ($max - $min) + $min)
      return
         if ($count > 0)
         then
            ...
               local:process-element(root($element)/xs:schema/xs:complexType[@name=$element/@type])
            }
            else element {$element/@name} {
               if ($element/*)
               then
                  for $sub-element in $element/*
                  return local:process-element($sub-element)
               else local:process-type(($element/@type,"xs:string")[1])
            }
         else ()
   case element(xs:attribute) return
      if ($element/@use="required" or math:random() > 0.5)
      then attribute {$element/@name} { local:process-type($element/@type) }
      else ()
   case element(xs:sequence) return
      for $sub-element in $element/*
      return local:process-element($sub-element)
   case element(xs:all) return
      for $sub-element in $element/*
      return local:process-element($sub-element)
This function uses the element construct to create a new XML instance tree based on the information in the XML Schema file and the typeswitch construct to select the appropriate processing of a given element type.
Script
Here is a basic script which applies this function to the example schema. By default, the root of the generated instance is the first element in the schema, or a named element if a root parameter is supplied.

let $file := request:get-parameter("file",())
let $root := request:get-parameter("root",())
let $schema := doc($file)
return
  if ($root)
  then local:process-element($schema/xs:schema/xs:element[@name=$root])
  else local:process-element($schema/xs:schema/xs:element[1])
Execute [1]
Sample Output
<Root>
  <Start>C</Start>
  <Any>
    <AnyA>BADADAD</AnyA>
    <AnyC>BA</AnyC>
    <AnyB>D</AnyB>
  </Any>
  <Multiple>
    <Choice>
      <Third AttributeDecimal="8.937041402178778" AttributeTime="20:38:04" AttributeDateTime="1995-08-06T21:08:43"/>
    </Choice>
    <Unbounded>BDAACAAAA</Unbounded>
    <Ref>
  </Multiple>
  <Multiple>
    <Choice>
      <Second AttributeString="ADAB" AttributeDate="2005-12-16"/>
    </Choice>
    <Unbounded/>
    <Unbounded>CABBB</Unbounded>
    <Unbounded/>
      <First AttributeInt="21" AttributeString="AABC" AttributeBoolean="true"/>
    </Choice>
    <Unbounded>ADADDDDDB</Unbounded>
    <Unbounded>DCDCBDDD</Unbounded>
    <Unbounded>CBAD</Unbounded>
    <Unbounded>DAAAD</Unbounded>
    <Ref>
To do
full set of XML types
mixed
restriction (only enumeration so far)
Group
AttributeGroup
problem with missing attributes in a complexType
hinting for distributions
namespaces
randomisation configuration
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/XMLSchema/schema2instance.xq?file=/db/Wiki/XMLSchema/eg3.xsd
Method
XQuery is superior to XSLT in many ways. XQuery is designed to be a brief and concise programming language that interleaves XML and functional language statements, so XQuery programs are usually much smaller than their XSLT equivalents. XQuery processors are also designed to use indexes, so that XQueries over large data sets can run quickly. Unfortunately, there are still times when you must use XSLT; one example is in-browser transforms. The eXist database comes with an XQuery function that allows you to transform an XML file using XSLT:

transform:transform($input as node()?, $stylesheet as item(), $params as node()?) as node()?

where:
$input is the node tree to be transformed
$stylesheet is either a URI or a node containing the stylesheet. If it is a URI, it can either point to an external location or to an XSL stored in the db by using the 'xmldb:' scheme.
$params are the optional XSLT name/value parameters with the following structure:

<parameters>
  <param name="param-name1" value="param-value1"/>
</parameters>
The result is zero or one nodes. The namespace of the transform module is http://exist-db.org/xquery/transform.

The transform:transform() function can be used to provide a service which accepts the URL of an XML file, the URL of an XSLT script and any other parameters, which are passed to the stylesheet. Currently output is text/html.

declare option exist:serialize "method=html media-type=text/html";

(: look for URL parameters for the XML file and the transform :)
let $xslt := request:get-parameter("xslt",())
let $xml := request:get-parameter("xml",())
(: now get a list of all the URL parameters that are not either xml= or xslt= :)
let $params :=
  <parameters>
    {for $p in request:parameter-names()
     let $val := request:get-parameter($p,())
     where not($p = ("xml","xslt"))
     return <param name="{$p}" value="{$val}"/>
    }
  </parameters>
return
  (: now run the transform :)
  transform:transform(doc($xml), doc($xslt), $params)
Form-based search
In this example, an XML file on one host is transformed by an XSLT script on another. The XSLT script defines a form that allows the user to select a subset of the entries in the XML file, followed by the search results, if any.
1. Stylesheet [1]
2. Data [2]
3. Search Whisky data [3]
A sequence diagram describes the interaction involved: Sequence Diagram [4]
But you will note that using the following does not work:

<xsl:import href="/exist/rest/db/test/xslt/common.xsl"/>

Imported common.xsl

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="element">
    <li>
      <xsl:value-of select="."/>
    </li>
  </xsl:template>
</xsl:stylesheet>
XForms Example
You can also create a simple XForms example that serves as a front end to this script. See the XRX wikibook for an example of this XForms front end.

xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/xml indent=no process-xsl-pi=yes";

(: transform:transform($node-tree as node()?, $stylesheet as item(), $parameters as node()?, $serialization-options as xs:string) as node()? :)
let $transform := 'http://localhost:8080/exist/rest/db/xforms/xsltforms/xsltforms.xsl'
let $form :=
  <html xmlns="http://www.w3.org/1999/xhtml"
        xmlns:ev="http://www.w3.org/2001/xml-events"
        xmlns:xf="http://www.w3.org/2002/xforms">
    <head>
      <title>XForms Template</title>
      <xf:model>
        <xf:instance xmlns="" id="save-data">
          <data>
            <name>John Smith</name>
          </data>
        </xf:instance>
      </xf:model>
    </head>
    <body>
      <h1>XForms Test Program</h1>
      <xf:input ref="name">
        <xf:label>Name: </xf:label>
      </xf:input>
    </body>
  </html>
let $serialization-options := 'method=xml media-type=text/xml omit-xml-declaration=yes indent=no'
let $params :=
  <parameters>
    <param name="output.omit-xml-declaration" value="yes"/>
    <param name="output.indent" value="no"/>
    <param name="output.media-type" value="text/html"/>
    <param name="output.method" value="xhtml"/>
  </parameters>
return
  transform:transform($form, $transform, $params, $serialization-options)
Caching Management
By default, once a document has been transformed it resides in the cache. This is good for performance if a file needs to be retransformed, but if the source file changes the transform needs to be rerun. You can disable caching by changing the configuration file: in conf.xml, change the @caching value from yes to no:
<transformer class="org.apache.xalan.processor.TransformerFactoryImpl" caching="no"/>
http://demo.exist-db.org/exist/xquery.xml#N10375
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/whisky/t4g.xsl
[2] http://www.cems.uwe.ac.uk/~cjwallac/apps/pipes/whisky.xml
[3] http://www.cems.uwe.ac.uk/xmlwiki/util/xslt2html.xq?xml=http://www.cems.uwe.ac.uk/~cjwallac/apps/pipes/whisky.xml&xslt=http://www.cems.uwe.ac.uk/xmlwiki/whisky/t4g.xsl
[4] http://www.cems.uwe.ac.uk/xmlwiki/SequenceDiagram/showDiagram.xq?id=whiskyxslt
Execution environments
The eXist demo server is used for the XQuery examples. These are returned either as plain XML or converted to table format. The equivalent SQL queries are executed on a MySQL server, also based at the University of the West of England in Bristol.
Basic Queries
Counting Records
Task: How many Employees?
SQL: select count(*) from emp MySQL [10]
XQuery: count(//Emp) XML [11]

Task: How many Departments?
SQL: select count(*) from dept MySQL [12]
XQuery: count(//Dept) XML [13]
Selecting records
Task: Show all Employees with a salary greater than 1000
SQL: select * from emp where sal > 1000; MySQL [14]
XQuery: //Emp[Sal>1000] XML [15] Table [16]

Task: Show all Employees with a salary greater than 1000 and less than 2000
SQL: select * from emp where sal between 1000 and 2000; MySQL [17]
XQuery: //Emp[Sal>1000][Sal<2000] XML [18] Table [19]
Here, successive filter conditions replace the anded conditions implied by 'between'. Note that SQL's BETWEEN includes both bounds, so the exact equivalent is //Emp[Sal>=1000][Sal<=2000]. Although there is no 'between' function in XQuery, it is a simple matter to write one:

declare function local:between($value as xs:decimal, $min as xs:decimal, $max as xs:decimal) as xs:boolean {
  $value >= $min and $value <= $max
};

which simplifies the query to
//Emp[local:between(Sal,1000,2000)] XML [20] Table [21]
and has the advantage that the conversion of Sal to a number is now implicit in the function signature.

Task: Show all employees with no Commission
SQL: select * from emp where comm is null; MySQL [22]
XQuery: //Emp[empty(Comm/text())] XML [23] Table [24]
Note that empty(Comm) is not enough, since this is true only if the element itself is absent, which in this sample XML it is not.
XQuery: //Emp[empty(Comm)] XML [25]

Task: Select the first 5 employees
SQL: select * from emp limit 5; MySQL [26]
XQuery: //Emp[position() <=5] XML [27] Table [28]
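The local:between function can be tried without the demo server. This is a minimal sketch, assuming a few made-up Emp rows held in a variable rather than the real emp.xml; the element names follow the examples above.

```xquery
xquery version "1.0";

declare function local:between($value as xs:decimal, $min as xs:decimal, $max as xs:decimal) as xs:boolean {
  $value >= $min and $value <= $max
};

(: Made-up sample rows for illustration only. :)
let $emps :=
  <EmpTable>
    <Emp><Ename>SMITH</Ename><Sal>800</Sal></Emp>
    <Emp><Ename>WARD</Ename><Sal>1250</Sal></Emp>
    <Emp><Ename>CLARK</Ename><Sal>2450</Sal></Emp>
  </EmpTable>
(: The untyped Sal element is atomized and cast to xs:decimal by the function signature. :)
return $emps/Emp[local:between(Sal, 1000, 2000)]/Ename/string()
(: WARD :)
```

Because the parameters are declared as xs:decimal, no explicit number() call is needed in the predicate; an Emp with an empty or non-numeric Sal would raise a type error instead of being silently skipped.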
Selecting Columns
Task: List Employee names and salaries
SQL: select ename,sal from emp MySQL [29]
Surprisingly, selecting only a subset of children in a node (pruning) is not supported in XPath.
//Emp/(Ename,Sal) XML [30]
retrieves the required elements, but the parent Emp nodes have been lost.
//Emp/(Ename|Sal) XML [31]
is better since it keeps the elements in sequence, but it does not return Emp nodes with only the Ename and Sal children as required.
//Emp/*[name(.) = ("Ename","Sal")] XML [32]
uses reflection on the element names.
XQuery:
for $emp in //Emp
return
  <Emp>
    {$emp/(Ename|Sal)}
  </Emp>
XML [33] Table [34]
Here an XQuery FLWOR expression is used to create a new Emp element from the original elements.
Computing values
Computing the Annual Salary
Task: Compute the Annual Salaries of all employees. The Annual Salary is computed from 12 times the Monthly salary plus Commission. Since commission may be null, it must be replaced by a suitable numeric value:
SQL: select 12 * sal + ifnull(comm,0) from emp; MySQL [35]
XQuery: //Emp/(12*number(Sal)+(if(exists(Comm/text())) then number(Comm) else 0)) XML [36]
The SQL function COALESCE is the same as IFNULL but will accept multiple arguments:
SQL: select 12 * sal + coalesce(comm,0) from emp; MySQL [37]
XQuery: //Emp/(12*number(Sal)+ number((Comm/text(),0)[1])) XML [38]
The lack of a schema in this simple example to carry information on the type of the items leads to the need for explicit conversion of strings to numbers. Note the XQuery idiom:
(Comm/text(),0)[1]
computes the first non-null item in the sequence, the counterpart of COALESCE.

Selecting and Creating Columns
Task: List the employee names with their Annual Salary.
SQL: select ename, 12 * sal + ifnull(comm,0) as "Annual Salary" from emp; MySQL [39]
XQuery:
for $emp in //Emp
return
  <Emp>
    {$emp/Ename}
    <AnnualSalary>
      {12*number($emp/Sal) + (if (exists($emp/Comm/text())) then number($emp/Comm) else 0)}
    </AnnualSalary>
  </Emp>
XML [40] Table [41]
Again we have the problem of tree-pruning, but now with added grafting, which again requires the explicit construction of an XML node.
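The (Comm/text(),0)[1] idiom can be checked on a small inline sample. The sketch below assumes two made-up rows, one with a commission and one with an empty Comm element, in place of the real emp.xml.

```xquery
xquery version "1.0";

(: Made-up sample rows: ALLEN has a commission, SMITH has an empty Comm element. :)
let $emps :=
  <EmpTable>
    <Emp><Ename>ALLEN</Ename><Sal>1600</Sal><Comm>300</Comm></Emp>
    <Emp><Ename>SMITH</Ename><Sal>800</Sal><Comm/></Emp>
  </EmpTable>
for $emp in $emps/Emp
return
  <Emp>
    {$emp/Ename}
    <AnnualSalary>{
      (: first item of (text node if present, 0): the counterpart of COALESCE :)
      12 * number($emp/Sal) + number(($emp/Comm/text(), 0)[1])
    }</AnnualSalary>
  </Emp>
```

For these rows the annual salaries come out as 19500 for ALLEN (12 * 1600 + 300) and 9600 for SMITH (12 * 800 + 0). The sequence can be extended with further fallbacks, for example ($emp/Comm/text(), $emp/Bonus/text(), 0)[1], where Bonus is a hypothetical element used only to show the multi-argument form.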
SQL Operators
IN
Task: Show all employees whose Job is either ANALYST or MANAGER
SQL: select * from emp where job in ("ANALYST","MANAGER") MySQL [42]
XQuery: //Emp[Job = ("ANALYST","MANAGER")] XML [43] Table [44]
NOT IN
Task: Select all employees whose Job is not 'ANALYST' or 'MANAGER'
SQL: select * from emp where job not in ("ANALYST","MANAGER") MySQL [45]
This doesn't work:
XQuery: //Emp[Job !=("ANALYST","MANAGER")] XML [46] Table [47]
The general comparison != is true if the Job differs from any item in the sequence, so it is always true here, since everyone is either not an ANALYST or not a MANAGER.
This works:
XQuery: //Emp[not(Job =("ANALYST","MANAGER"))] XML [48] Table [49]
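The difference between the two forms of the NOT IN test follows from the existential semantics of general comparisons, and can be seen on plain strings. This is a small sketch; the Check element and its attribute names are invented for the illustration.

```xquery
xquery version "1.0";

let $jobs := ("ANALYST", "MANAGER")
for $job in ("ANALYST", "CLERK")
return
  (: != is true if ANY pair differs; not(=) is true only if NO pair is equal :)
  <Check job="{$job}"
         not-equal="{$job != $jobs}"
         not-in="{not($job = $jobs)}"/>
```

For ANALYST this gives not-equal="true" (it differs from MANAGER) but not-in="false"; for CLERK both are true. Only the not(... = ...) form corresponds to SQL's NOT IN.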
Distinct values
Task: Show the different Jobs which Employees have
MySQL: select distinct job from emp; MySQL [50]
XQuery: distinct-values(//Emp/Job) XML [51]
Pattern Matching
Task: List all Employees with names starting with "S"
MySQL: select * from emp where ename like "S%"; MySQL [52]
XQuery: //Emp[starts-with(Ename,"S")] XML [53] Table [54]
See starts-with() [55]

Task: List all Employees whose name contains "AR"
MySQL: select * from emp where ename like "%AR%"; MySQL [56]
XQuery: //Emp[contains(Ename,"AR")] XML [57] Table [58]
See contains() [59]

Task: List all Employees whose name contains "ar" ignoring the case
MySQL: select * from emp where ename like "%ar%"; MySQL [60]
LIKE in MySQL is case-insensitive by default, but fn:contains() is not, so the case needs to be converted:
XQuery: //Emp[contains(upper-case(Ename),upper-case("ar"))] XML [61] Table [62]
See upper-case() [63]

More complex patterns need regular expressions.
MySQL: select * from emp where ename regexp "M.*R"; MySQL [64]
XQuery: //Emp[matches(Ename,"M.*R")] XML [65] Table [66]
See matches() [67]

Similarly, MySQL's REGEXP is case-insensitive by default, whereas additional flags control matching in the XQuery matches() function:
MySQL: select * from emp where ename regexp "m.*r"; MySQL [68]
XQuery: //Emp[matches(Ename,"m.*r",'i')] XML [69] Table [70]
('i' makes the regex match case-insensitive.)
Table Joins
Simple Inner joins
Task: Find the name of the department that employee 'SMITH' works in:
SQL: select dept.dname from emp, dept where dept.deptno = emp.deptno and ename='SMITH'; MySQL [71]
XPath: //Dept[DeptNo = //Emp[Ename='SMITH']/DeptNo]/Dname XML [72]
Perhaps a FLWOR expression in XQuery would be more readable:
let $dept := //Emp[Ename='SMITH']/DeptNo
return //Dept[DeptNo = $dept]/Dname
XML [73]

Task: To find the names of all employees in Accounting
SQL: select emp.ename from emp,dept where dept.deptno = emp.deptno and dname='Accounting'; MySQL [74]
XPath: //Emp[DeptNo = //Dept[Dname='Accounting']/DeptNo]/Ename XML [75]
XQuery:
let $dept := //Dept[Dname='Accounting']/DeptNo
return //Emp[DeptNo = $dept]/Ename
XML [76]
Note that in this release of eXist, the order of the operands in the equality is significant - to be fixed in a later release.
XQuery: //Emp[//Dept[Dname='Accounting']/DeptNo = //Emp/DeptNo]/Ename XML [77]

More complex Inner Join
Task: List the name of each Employee, together with the name and location of their department.
SQL: select ename, dname, location from emp, dept where emp.deptno = dept.deptno; MySQL [78]
Where elements must be selected from several nodes, XPath is insufficient and XQuery is needed:
XQuery: This join could be written as:
for $emp in //Emp
for $dept in //Dept
where $dept/DeptNo = $emp/DeptNo
return
  <Emp>
    {$emp/Ename}
    {$dept/(Dname|Location)}
  </Emp>
XML [79] Table [80]
But it would be more commonly written in the form of a sub-selection:
for $emp in //Emp
let $dept := //Dept[DeptNo=$emp/DeptNo]
return
  <Emp>
    {$emp/Ename}
    {$dept/(Dname|Location)}
  </Emp>
XML [81] Table [82]

Inner Join with Selection
Task: List the names and department of all Analysts
SQL:
select ename, dname
from emp, dept
where emp.deptno = dept.deptno
and job="ANALYST";
MySQL [83]
XQuery:
for $emp in //Emp[Job='ANALYST']
let $dept := //Dept[DeptNo= $emp/DeptNo]
return
  <Emp>
    {$emp/Ename}
    {$dept/Dname}
  </Emp>
XML [84] Table [85]

1 to Many query
Task: List the departments and the number of employees in each department
SQL: select dname, (select count(*) from emp where deptno = dept.deptno) as headcount from dept; MySQL [86]
XQuery:
for $dept in //Dept
let $headCount := count(//Emp[DeptNo=$dept/DeptNo])
return
  <Dept>
    {$dept/Dname}
    <HeadCount>{$headCount}</HeadCount>
  </Dept>
XML [87] Table [88]

Theta (Inequality) Join
Task: List the names and salary grade of staff in ascending grade order. Grades are defined by a minimum and maximum salary.
SQL: select ename, grade from emp, salgrade where emp.sal between salgrade.losal and salgrade.hisal; MySQL [89]
XQuery:
for $emp in //Emp let $grade :=
XML [90] Table [91]

Recursive Relations
The relationship between an employee and their manager is a recursive relationship.
Task: List the name of each employee together with the name of their manager.
SQL: select e.ename, m.ename from emp e join emp m on e.mgr = m.empno MySQL [92]
XQuery:
for $emp in //Emp
let $manager := //Emp[EmpNo = $emp/MgrNo]
return
  <Emp>
    {$emp/Ename}
    <Manager>{string($manager/Ename)}</Manager>
  </Emp>
XML [93] Table [94]
The XQuery result is not quite the same as the SQL result. King, who has no manager, is missing from the SQL inner join. To produce the same result in XQuery, we would filter for employees with Managers:
for $emp in //Emp[MgrNo]
let $manager := //Emp[EmpNo = $emp/MgrNo]
where $emp/MgrNo/text()
return
  <Emp>
    {$emp/Ename}
    <Manager>{string($manager/Ename)}</Manager>
  </Emp>
XML [95] Table [96]
Alternatively, an outer join returns all employees, including King:
SQL: select e.ename, m.ename from emp e left join emp m on e.mgr = m.empno MySQL [97]
Path to Employee
Almost all the queries remain the same (except for the change of element name to Employee). This is because the path used to select Emps in the Emp.xml document is //Emp and is now //Employee in the merged document. If a full path had been used (/EmpList/Emp), this would need to be replaced by /Company/Department/Employee
Simple Navigation
Task: To find the department name of employee 'Smith'
XQuery: //Employee[Ename='SMITH']/../Dname XML [101]

Task: To find the names of employees in the Accounting department
XQuery: //Department[Dname='Accounting']/Employee/Ename XML [102]

1 - many
The query to list departments and the number of employees, written against the separate tables, is:
for $dept in //Dept
let $headCount := count(//Emp[DeptNo=$dept/DeptNo])
return
  <Dept>
    {$dept/Dname}
    <HeadCount>{$headCount}</HeadCount>
  </Dept>
XML [87] Table [88]
which becomes:
for $dept in //Department
let $headCount := count($dept/Employee)
return
  <Department>
    {$dept/Dname}
    <HeadCount>{$headCount}</HeadCount>
  </Department>
XML [105] Table [106]
XML [108]
It is better to factor out the XPath expression for the subset of employees:

let $managers := //Emp[Job='MANAGER']
return (count($managers), round(avg($managers/Sal)), min($managers/Sal), max($managers/Sal))

XML [109]
It would be better to tag the individual values computed:

let $managers := //Emp[Job='MANAGER']
return
  <Statistics>
    <Count>{count($managers)}</Count>
    <Average>{round(avg($managers/Sal))}</Average>
    <Min>{min($managers/Sal)}</Min>
    <Max>{max($managers/Sal)}</Max>
  </Statistics>
Grouping
Task: Show the number, average (rounded), min and max salaries for each Job.
SQL: SELECT job, count(*), round(avg(sal)), min(sal), max(sal) FROM emp GROUP BY job; MySQL [111]
In XQuery, grouping must be done by iterating over the groups. Each group is identified by the Job, and we can get the set (sequence) of all Jobs using the distinct-values function:
for $job in distinct-values(//Emp/Job)
let $employees := //Emp[Job=$job]
return
  <Statistics>
    <Job>{$job}</Job>
    <Count>{count($employees)}</Count>
    <Average>{round(avg($employees/Sal))}</Average>
    <Min>{min($employees/Sal)}</Min>
    <Max>{max($employees/Sal)}</Max>
  </Statistics>
XML [112] Table [113]
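The distinct-values approach is what XQuery 1.0 offers. Processors that implement XQuery 3.0 or later also provide a group by clause in FLWOR expressions, which reads closer to the SQL. The following is a sketch under that assumption, using made-up rows in a variable instead of emp.xml; whether it runs on a given eXist installation depends on its version.

```xquery
xquery version "3.0";

(: Made-up sample rows for illustration only. :)
let $emps :=
  <EmpTable>
    <Emp><Ename>SMITH</Ename><Job>CLERK</Job><Sal>800</Sal></Emp>
    <Emp><Ename>ADAMS</Ename><Job>CLERK</Job><Sal>1100</Sal></Emp>
    <Emp><Ename>JONES</Ename><Job>MANAGER</Job><Sal>2975</Sal></Emp>
    <Emp><Ename>BLAKE</Ename><Job>MANAGER</Job><Sal>2850</Sal></Emp>
    <Emp><Ename>CLARK</Ename><Job>MANAGER</Job><Sal>2450</Sal></Emp>
  </EmpTable>
for $emp in $emps/Emp
(: after group by, $emp is bound to the whole sequence of Emp elements in the group :)
group by $job := string($emp/Job)
order by $job
return
  <Statistics>
    <Job>{$job}</Job>
    <Count>{count($emp)}</Count>
    <Average>{round(avg($emp/Sal))}</Average>
    <Min>{min($emp/Sal)}</Min>
    <Max>{max($emp/Sal)}</Max>
  </Statistics>
```

With these rows there are two groups: CLERK with a count of 2 and an average of 950, and MANAGER with a count of 3 and a rounded average of 2758. An SQL HAVING condition corresponds to a where clause placed after the group by.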
Hierarchical report
Task: List the departments, their employee names and salaries, and the total salary in each department. This must generate a nested table.
SQL: ?
XQuery:
<Report>
  { for $dept in //Dept
    let $subtotal := sum(//Emp[DeptNo = $dept/DeptNo]/Sal)
    return
      <Department>
        {$dept/Dname}
        {for $emp in //Emp[DeptNo = $dept/DeptNo]
         return
           <Emp>
             {$emp/Ename}
             {$emp/Sal}
           </Emp>
        }
        <SubTotal>{$subtotal}</SubTotal>
      </Department>
  }
  <Total>{sum(//Emp/Sal)}</Total>
</Report>
XML [114]
Note that the functional nature of the XQuery language means that each total must be calculated explicitly, not rolled up incrementally as might be done in an imperative language. This has the advantage that the formulae are explicit and independent and can thus be placed anywhere in the report, such as at the beginning instead of at the end:
<Report>
  <Total>{sum(//Emp/Sal)}</Total>
  { for $dept in //Dept
    let $subtotal := sum(//Emp[DeptNo = $dept/DeptNo]/Sal)
    return
      <Department>
        <SubTotal>{$subtotal}</SubTotal>
        {$dept/Dname}
        {for $emp in //Emp[DeptNo = $dept/DeptNo]
         return
           <Emp>
             {$emp/Ename}
             {$emp/Sal}
           </Emp>
        }
      </Department>
  }
</Report>
XML [115]
Restricted Groups
Task: Show the number, average (rounded), min and max salaries for each Job where there are at least 2 employees in the group.
SQL:
SELECT job, count(*), round(avg(sal)), min(sal), max(sal)
FROM emp
GROUP BY job
HAVING count(*) > 1;
MySQL [116]
XQuery:
for $job in distinct-values(//Emp/Job)
let $employees := //Emp[Job=$job]
where count($employees) > 1
return
  <Statistics>
    <Job>{$job}</Job>
    <Count>{count($employees)}</Count>
    <Average>{round(avg($employees/Sal))}</Average>
    <Min>{min($employees/Sal)}</Min>
    <Max>{max($employees/Sal)}</Max>
  </Statistics>
Date Handling
Selecting by Date
Task: List all employees hired in the current millennium
SQL: SELECT * from emp where hiredate >= '2000-01-01' MySQL [119]
XQuery: //Emp[HireDate >= '2000-01-01'] XML [120] Table [121]
Actually this comparison is a string comparison, because there is no schema to define HireDate as an xs:date. It gives the expected result only because dates in the ISO format YYYY-MM-DD sort in the same order as strings.
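Without a schema, a true date comparison can still be made by casting both operands explicitly. This is a minimal sketch, assuming made-up rows (BINNS is an invented employee) and HireDate values that are all valid ISO dates.

```xquery
xquery version "1.0";

(: Made-up sample rows for illustration only. :)
let $emps :=
  <EmpTable>
    <Emp><Ename>SMITH</Ename><HireDate>1980-12-17</HireDate></Emp>
    <Emp><Ename>BINNS</Ename><HireDate>2001-08-08</HireDate></Emp>
  </EmpTable>
(: xs:date() casts the untyped element content, so the comparison is on dates, not strings :)
return $emps/Emp[xs:date(HireDate) >= xs:date("2000-01-01")]/Ename
```

Only the BINNS row satisfies the predicate. The cast raises an error for an empty or malformed HireDate, so data that may contain such values would need a guard such as HireDate castable as xs:date in the predicate.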
References
[1] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=1 [2] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=1 [3] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=2 [4] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=2 [5] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=2 [6] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=3 [7] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=3 [8] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=3 [9] http:/ / 2sun. org/ scott-tiger-port-mysql [10] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=4 [11] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=4 [12] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=5 [13] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=5 [14] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=7 [15] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=7 [16] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=7 [17] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=8 [18] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=8 [19] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=8 [20] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=9 [21] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=9 [22] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=10 [23] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=10 [24] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=10 [25] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=10a [26] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=42 [27] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. 
xq?id=42 [28] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=42 [29] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=11 [30] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=11 [31] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=12 [32] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=12a [33] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=13 [34] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQueryTable. xq?id=13 [35] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=14 [36] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=14 [37] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=14a [38] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=14a [39] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runSQL. xq?id=16 [40] http:/ / www. cems. uwe. ac. uk/ xmlwiki/ empdept/ runXQuery. xq?id=16
XQuery IDE
This section is in progress.
The eXist database stores binary files as well as XML files, and the XQuery scripts themselves are stored as binary files. This allows XQuery scripts to manipulate other XQuery scripts: viewing, searching, analyzing, modifying and creating them, which are all the operations required of a development environment.
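As a rough sketch of the idea, the query below lists the XQuery scripts in one collection and reports how many lines each contains. It assumes eXist's xmldb and util extension functions (xmldb:get-child-resources, util:binary-doc and util:binary-to-string) are available, and the collection path /db/Wiki/IDE is a hypothetical placeholder rather than something stated above.

```xquery
xquery version "1.0";

(: Hypothetical collection holding the scripts; adjust to your database. :)
let $collection := "/db/Wiki/IDE"
return
    <scripts collection="{$collection}">
    {
        for $name in xmldb:get-child-resources($collection)
        where ends-with($name, ".xq")
        order by $name
        return
            let $path := concat($collection, "/", $name)
            (: XQuery files are stored as binary resources, so read them as text. :)
            let $source := util:binary-to-string(util:binary-doc($path))
            return
                <script name="{$name}" lines="{count(tokenize($source, '\n'))}"/>
    }
    </scripts>
```

The same $source string could be passed to fn:contains or fn:matches to search the scripts, which is the kind of operation a simple IDE needs.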
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/IDE/listXQueryText.xq?name=listXQueryText.xq
XSL-FO Images
Motivation
You want to enrich your documents with print-quality images, charts and similar graphics.
Method
We will use the fo:external-graphic formatting object. For example, to add an external image, add a block like this to the XSL-FO:
    <fo:block>
        <fo:external-graphic src="http://www.uwe.ac.uk/includes/branding/better/engine/images/logo.gif"/>
    </fo:block>
Execute [1]
Vector Images
SVG is a standard way to describe graphical artwork as vectors. Recent eXist installations (>1.4) with the Apache FOP processor enabled can embed SVG data in the resulting PDF as vector art. Because the SVG files are stored in the database rather than in the file system, reference them over HTTP. See "Generating PDF from XSL-FO files" for how to activate the XSL-FO feature.
    <fo:block>
        <fo:external-graphic src="http://localhost:8080/exist/rest/db/logo.svg"/>
    </fo:block>
PDF files can be embedded as images too, once pdf-images support is enabled in FOP; I had to restart eXist to activate it. The FO syntax is the same as for SVG, and a page number can be specified after a hash sign in the URL.
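To illustrate the remark about page numbers after a hash sign, here is a small sketch that builds such a block for one page of a PDF. It assumes the fragment takes the form #page=N, which is how I recall the FOP PDF image plug-in documenting it; check this against your FOP version. The function name and the document URL are hypothetical.

```xquery
xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";

(: Hypothetical helper: wraps one page of a PDF in an fo:block.
   The "#page=N" fragment syntax is an assumption about the FOP plug-in. :)
declare function local:pdf-page($url as xs:string, $page as xs:integer) as element(fo:block) {
    <fo:block>
        <fo:external-graphic src="{$url}#page={$page}"/>
    </fo:block>
};

(: Hypothetical PDF stored in the database and served over REST. :)
local:pdf-page("http://localhost:8080/exist/rest/db/manual.pdf", 2)
```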
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/xsl-fo/helloworld-image.xq
XSL-FO SVG
Motivation
You want to convert XML into PDF and embed SVG graphics in the output.
Method
The current eXist trunk supports placing SVG in FOP-rendered PDF out of the box. The only step required is to activate the XSL-FO module before building from source:
    echo "include.module.xslfo = true" > $EXIST_HOME/extensions/local.build.properties
Beware: on headless *nix systems, make sure that no DISPLAY environment variable is set when eXist-db is started; otherwise the Apache Batik SVG renderer may throw an exception.
Sample XQuery
Highlights of the XQuery:
• The FO is created by an XSL transform. I know this could be done with XQuery, but I can reuse the XSL in other places.
• The document is found through a search.
• The XSL stylesheet resides in "views", as does pdf.xq.
• I create lots of variables, but it is better that way.
• The filename is deduced from the request.
• There may be errors, as I stripped down working code.
• It may be overly complicated due to my ignorance of XQuery features. I think you will find your way.
• The application directory is laid out as advised in the XRX wiki.
    (: $uuid is not defined in this stripped-down excerpt; it must be bound earlier :)
    let $host := "http://localhost:8080/exist/rest"
    let $home := "/db/apps/myApp"
    let $match := collection(concat($home, "/data"))//produkt[@uuid=$uuid]
    let $coll := util:collection-name($match)
    let $file := util:document-name($match)
    let $xsls := concat($home, "/views/foPDF.xsl")
    (: a random query string, passed to the stylesheet and appended to the SVG URL :)
    let $rand := concat("?", util:random())
    (: path separators added; this assumes the SVG is stored as $home/<document name>.svg :)
    let $params :=
        <parameters>
            <param name="svgfile" value="{$host}{$home}/{$file}.svg"/>
            <param name="rand" value="{$rand}"/>
        </parameters>
    let $tmp := transform:transform(doc(concat($coll, "/", $file)), doc($xsls), $params)
    let $pdf := xslfo:render($tmp, "application/pdf", (), ())
    (: substring-after-last is in functx :)
    let $fname := substring-after(request:get-path-info(), "/")
    return response:stream-binary($pdf, "application/pdf", $fname)
The XSL stylesheet uses the svgfile and rand parameters to place the graphic:
    <!-- place SVG in PDF output -->
    <fo:block-container>
        <fo:block>
            <fo:external-graphic src="{$svgfile}{$rand}"/>
        </fo:block>
    </fo:block-container>
XSL-FO Tables
Motivation
You want to create high-quality tabular output suitable for book publishing.
Method
To accomplish this we will convert our XML into XSL-FO tables. Unlike HTML, XSL-FO lets you create flows of text and set rules for how objects break across page boundaries.
Sample Input
Here is a sample XML file that contains a table with two columns.
    <table heading="Department Phone Extensions">
        <Person>
            <Name>John Doe</Name>
            <Extension>1234</Extension>
        </Person>
        <Person>
            <Name>Sue Smith</Name>
            <Extension>5678</Extension>
        </Person>
    </table>
We would like this XML file to be rendered as a table with two columns, the first containing the person's name and the second their phone extension.
Example FO File
The following is the core of the XSL-FO layout that you will need to create the table (with no control over the column widths).
    <fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
        <fo:block font-size="14pt" padding="10px" font-family="Verdana">Department Phone Extensions</fo:block>
        <fo:block font-size="10pt">
            <fo:table border="solid" border-collapse="collapse">
                <fo:table-header>
                    <fo:table-row>
                        <fo:table-cell>
                            <fo:block font-weight="bold">Name</fo:block>
                        </fo:table-cell>
                        <fo:table-cell>
                            <fo:block font-weight="bold">Extension</fo:block>
                        </fo:table-cell>
                    </fo:table-row>
                </fo:table-header>
                <fo:table-body>
                    <fo:table-row>
                        <fo:table-cell>
                            <fo:block>John Doe</fo:block>
                        </fo:table-cell>
                        <fo:table-cell>
                            <fo:block>1234</fo:block>
                        </fo:table-cell>
                    </fo:table-row>
                    <fo:table-row>
                        <fo:table-cell>
                            <fo:block>Sue Smith</fo:block>
                        </fo:table-cell>
                        <fo:table-cell>
                            <fo:block>5678</fo:block>
                        </fo:table-cell>
                    </fo:table-row>
                </fo:table-body>
            </fo:table>
        </fo:block>
    </fo:block>
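The FO layout above leaves the column widths to the formatter. If fixed widths are wanted, XSL-FO provides fo:table-column elements, which go before the header and body. The sketch below builds them in XQuery from a sequence of widths; the widths themselves and the table-layout="fixed" setting are illustrative assumptions, not part of the original example.

```xquery
xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";

(: Illustrative widths, one per column; not taken from the original example. :)
let $widths := ("60mm", "30mm")
return
    <fo:table border="solid" border-collapse="collapse" table-layout="fixed">
    {
        (: one fo:table-column per requested width, in column order :)
        for $width in $widths
        return <fo:table-column column-width="{$width}"/>
    }
        <fo:table-body>
            <fo:table-row>
                <fo:table-cell><fo:block>John Doe</fo:block></fo:table-cell>
                <fo:table-cell><fo:block>1234</fo:block></fo:table-cell>
            </fo:table-row>
        </fo:table-body>
    </fo:table>
```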
XQuery integration
Finally, we can generate the full XSL-FO document and render it as PDF with an XQuery script. An XSLT stylesheet (table2fo.xsl) transforms the table into an XSL-FO fragment, which is then embedded in the XSL-FO page master before the result is rendered as PDF and streamed as a binary document. There are of course other ways to assemble the full XSL-FO document.
    xquery version "1.0";
    import module namespace xslfo="http://exist-db.org/xquery/xslfo";
    import module namespace transform="http://exist-db.org/xquery/transform";
    declare namespace fo="http://www.w3.org/1999/XSL/Format";

    let $table :=
        <table heading="Department Phone Extensions">
            <Person>
                <Name>John Doe</Name>
                <Extension>1234</Extension>
            </Person>
            <Person>
                <Name>Sue Smith</Name>
                <Extension>5678</Extension>
            </Person>
        </table>
    (: transform the table into an XSL-FO fragment :)
    let $table-fo := transform:transform($table, doc("/db/Wiki/eXist/xsl-fo/table2fo.xsl"), ())
    (: embed the fragment in a minimal page master :)
    let $fo :=
        <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
            <fo:layout-master-set>
                <fo:simple-page-master master-name="my-page">
                    <fo:region-body margin="1in"/>
                </fo:simple-page-master>
            </fo:layout-master-set>
            <fo:page-sequence master-reference="my-page">
                <fo:flow flow-name="xsl-region-body">
                    <fo:block>
                        {$table-fo}
                    </fo:block>
                </fo:flow>
            </fo:page-sequence>
        </fo:root>
    (: render as PDF and stream the binary result to the client :)
    let $pdf := xslfo:render($fo, "application/pdf", ())
    return response:stream-binary($pdf, "application/pdf", "output.pdf")
Execute [1]
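The stylesheet table2fo.xsl is not reproduced here. As one of the other ways of assembling the document, the table fragment could be built directly in XQuery instead of XSLT. The following is a minimal sketch under that assumption, not the stylesheet used above: the function name local:table-to-fo is hypothetical, and it takes the column headings from the element names of the first row.

```xquery
xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";

(: Hypothetical helper: turns a <table> of uniform row elements into XSL-FO.
   Column headings are taken from the child element names of the first row. :)
declare function local:table-to-fo($table as element(table)) as element(fo:block) {
    <fo:block>
        <fo:block font-size="14pt" padding="10px" font-family="Verdana">{string($table/@heading)}</fo:block>
        <fo:block font-size="10pt">
            <fo:table border="solid" border-collapse="collapse">
                <fo:table-header>
                    <fo:table-row>
                    {
                        for $column in $table/*[1]/*
                        return
                            <fo:table-cell>
                                <fo:block font-weight="bold">{local-name($column)}</fo:block>
                            </fo:table-cell>
                    }
                    </fo:table-row>
                </fo:table-header>
                <fo:table-body>
                {
                    (: one table row per child element of <table> :)
                    for $row in $table/*
                    return
                        <fo:table-row>
                        {
                            for $cell in $row/*
                            return
                                <fo:table-cell>
                                    <fo:block>{string($cell)}</fo:block>
                                </fo:table-cell>
                        }
                        </fo:table-row>
                }
                </fo:table-body>
            </fo:table>
        </fo:block>
    </fo:block>
};

local:table-to-fo(
    <table heading="Department Phone Extensions">
        <Person><Name>John Doe</Name><Extension>1234</Extension></Person>
        <Person><Name>Sue Smith</Name><Extension>5678</Extension></Person>
    </table>
)
```

The result could take the place of $table-fo in the script above. Note that the sketch does not guard against an empty table, which would produce an fo:table-body with no rows.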
Database data
As a further example, the following XQuery selects all employees and renders them in a PDF table:
    xquery version "1.0";
    import module namespace xslfo="http://exist-db.org/xquery/xslfo";
    import module namespace transform="http://exist-db.org/xquery/transform";
    declare namespace fo="http://www.w3.org/1999/XSL/Format";

    (: wrap every Emp element found in the database in a table :)
    let $table :=
        <table heading="Employees">
            {//Emp}
        </table>
    let $table-fo := transform:transform($table, doc("/db/Wiki/eXist/xsl-fo/table2fo.xsl"), ())
    let $fo :=
        <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
            <fo:layout-master-set>
                <fo:simple-page-master master-name="my-page">
                    <fo:region-body margin="1in"/>
                </fo:simple-page-master>
            </fo:layout-master-set>
            <fo:page-sequence master-reference="my-page">
                <fo:flow flow-name="xsl-region-body">
                    <fo:block>
                        {$table-fo}
                    </fo:block>
                </fo:flow>
            </fo:page-sequence>
        </fo:root>
    let $pdf := xslfo:render($fo, "application/pdf", ())
    return response:stream-binary($pdf, "application/pdf", "output.pdf")
Execute [2]
References
[1] http://www.cems.uwe.ac.uk/xmlwiki/eXist/xsl-fo/table2fo.xq
[2] http://www.cems.uwe.ac.uk/xmlwiki/eXist/xsl-fo/emp2fo.xq
License
Creative Commons Attribution-Share Alike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/