
Discover the Wonders of XSLT

By Benoît Marchal

This is the first article in a new series introducing XSLT. XSLT stands for Extensible Stylesheet Language Transformations, but I believe the W3C should rename it XML Scripting Language. Over the years, I have used XSLT to publish Web sites, to generate PDFs from documentation, to prepare e-commerce transactions, to build Web services, to import documents in databases, to construct UML models, to pre- or post-process articles, to generate Java code, ... you name it. If it involves manipulating an XML document, chances are XSLT is my favorite solution.

Obviously, there's nothing you can do with XSLT that can't be done with straight Java or C#. Why bother learning a new language, then? Because XSLT is highly specialized, you will find that coding is faster and the result is more maintainable.

Getting the Tools


Before going any further, you need to install an XSLT processor. Chances are there's already one on your machine because both Microsoft Windows and Java ship with one. Microsoft's XSLT processor is MSXML. There's a command-line interface that is great for testing, or you can call the processor from your application through the .NET run-time. On Java 1.4 or above, the XSLT processor is available via the javax.xml.transform package.

For this series, I recommend that you install Eclipse and the ananas.org XM plugin. Eclipse is an IDE available on most platforms (Windows, Linux, and Mac OS X). Refer to "Using XML for Web Publishing" for more details.

XSLT Basics
Listing 1 is a very simple stylesheet that shows what XSLT looks like. It takes an XML article and publishes it as an HTML page. Download the listings for a sample XML document.

Listing 1: basic.xsl

<?xml version="1.0"?>
<xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:a="http://psol.com/2004/article"
   version="1.0">

<xsl:output method="html"/>

<xsl:template match="a:article">
   <html><xsl:apply-templates/></html>
</xsl:template>

<xsl:template match="a:body">
   <body>
      <xsl:apply-templates/>
      <p>This page was made with XML and XSLT.</p>
   </body>
</xsl:template>

<xsl:template match="a:para">
   <p><xsl:apply-templates/></p>
</xsl:template>

<xsl:template match="a:section">
   <xsl:apply-templates/><hr/>
</xsl:template>

<xsl:template match="a:info/a:title">
   <head><title><xsl:apply-templates/></title></head>
</xsl:template>

<xsl:template match="a:section/a:title">
   <h1><xsl:apply-templates/></h1>
</xsl:template>

</xsl:stylesheet>

An XSLT stylesheet is an XML document itself (this has several implications, as we will see in a minute). The instructions must appear in the http://www.w3.org/1999/XSL/Transform namespace. If you encounter problems with a stylesheet, make sure the namespace has been declared properly; it's the number one cause of problems that my students have.

The root of the stylesheet is the <xsl:stylesheet> element. It needs a version attribute whose value must be "1.0". Below the root comes the <xsl:output> element that specifies whether the result is an HTML, XML, or text document.

Then come the templates. Each template is a rule that transforms one or more elements from the source document into one or more elements in the result. For example, the template:

<xsl:template match="a:article">
   <html><xsl:apply-templates/></html>
</xsl:template>

specifies that <a:article> in the source becomes the <html> element in the result. In other words, the root of the XML document becomes the HTML root.

The <xsl:apply-templates/> instruction is a placeholder for the content of the element. In the above example, the processor inserts the article content between the <html> tags. The position of <xsl:apply-templates/> in the template is important because it determines where the element content appears in the result. Look at the following template:

<xsl:template match="a:section">
   <xsl:apply-templates/><hr/>
</xsl:template>

It inserts a horizontal line after the section content. If <hr/> were placed before <xsl:apply-templates/>, the line would appear before the section, because <xsl:apply-templates/> stands for the section content.

The match attribute selects the source elements to which the template applies. In most cases, that's an element name. When there's a risk of confusion, you can specify a path (or conditions, as we'll see next month) that tests the element's ancestors. The a:section/a:title path selects the <a:title> elements that are children of <a:section>. Note that it's <a:title> as a child of <a:section>, and not the opposite.
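To see what the templates in Listing 1 match, it helps to have an input document at hand. The sketch below is a hypothetical article, not the sample from the listings download: its structure is inferred from the match attributes in Listing 1, and the text content is invented for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input document; structure inferred from Listing 1 -->
<a:article xmlns:a="http://psol.com/2004/article">
  <a:info>
    <!-- matched by a:info/a:title, becomes <head><title> -->
    <a:title>A Sample Article</a:title>
  </a:info>
  <a:body>
    <a:section>
      <!-- matched by a:section/a:title, becomes <h1> -->
      <a:title>First Section</a:title>
      <!-- each a:para becomes a <p> -->
      <a:para>First paragraph.</a:para>
      <a:para>Second paragraph.</a:para>
    </a:section>
  </a:body>
</a:article>
```

Run through Listing 1, this document should yield an html root, a head holding the article title, and a body with one heading, two paragraphs, a horizontal line, and the closing sentence.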

Finally, I'd like to draw your attention to syntax issues. A stylesheet is an XML document and it must respect the XML syntax, which means two things. First, elements need both a start tag and an end tag (in HTML, you can often dispense with the end tag). Second, an empty element follows the XML convention, so <hr> is written as <hr/>. (Don't worry; with the HTML output method, the processor will remove the trailing slash.)

Testing and Exercise


I encourage you to download the listings and run the example for yourself. The listings also include a small exercise so you can practice what you have learned. As you work with the listings, you will notice that the XML documents start with the following processing instruction:

<?xml-stylesheet href="basic.xsl" type="text/xsl"?>

It tells the processor which stylesheet applies to the document. To avoid any misunderstanding, let me stress that the processing instruction appears in the XML document, not in the stylesheet! So, if you want to apply another stylesheet to a document, you need to modify the document.

Next month, we will cover XPath, attributes, and more XSLT instructions.

Discover the Wonders of XSLT: XPaths


By Benoît Marchal

This is Part 2 of the developer.com introduction to XSLT. The first part was about tools and the basic syntax. I recommend you read it first. Make sure you download the updated listings before reading any further.

XPaths
The style sheet language is made up of two W3C recommendations: XPath, which is a querying language, and XSLT itself, which is a scripting language with an XML syntax.

A style sheet describes how to convert the input document into the output. XPath deals with the input; it allows you to retrieve values from the input document. XSLT deals with generating the output. It offers instructions to create elements, attributes, and other XML markup in the output.

XPaths are not unlike file paths and URLs, but are adapted to the XML syntax. For example, download the listings and open the sample2.xml file. The path to the document title is the following:

/a:article/a:info/a:title

Essentially, an XPath lists all the elements that lead to the one you're interested in, just as a file path lists all the directories leading to the file you're interested in. The separator is the forward slash, /.

An XPath returns a node set, i.e. a list of nodes that match the XPath. A node set may contain zero (which most likely indicates an error in the XPath), one, or more nodes. The node set for the XPath above contains only one node (the article title).

The element names in an XPath must be fully qualified, i.e. they must include both the namespace prefix and the local name. Make sure you declare the namespace prefix in the style sheet as well (see the example below).
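As a minimal sketch of an absolute XPath in use, the following style sheet prints the article title as plain text. It assumes the article namespace from Part 1 and relies on xsl:value-of, an instruction introduced later in this article; the match="/" pattern, which selects the document root, is an addition that the text does not cover.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article">
  <!-- the a prefix is declared above so that the XPath below can use it -->
  <xsl:output method="text"/>

  <!-- start at the document root and print the title via an absolute path -->
  <xsl:template match="/">
    <xsl:value-of select="/a:article/a:info/a:title"/>
  </xsl:template>
</xsl:stylesheet>
```

The prefix itself is arbitrary; what must agree with the input document is the namespace URI bound to it.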

Relative XPaths
The previous example was an absolute path because it starts from the root of the document. XPaths may also be relative to the current node. Again, the concept is very similar to file paths that can either start from the root (or a disk under Windows) or be relative to the current directory.

Absolute XPaths start with the forward slash; relative XPaths start with an element name. Assuming the current node is /a:article, the following XPath points to the article title:

a:info/a:title

You may recognize this XPath from the style sheet in the previous article. Indeed, the template match attribute contains an XPath, in most cases a relative one.

As it interprets the style sheet, the XSLT processor keeps track of the current node. Some instructions, such as xsl:apply-templates and xsl:for-each (see below), change the current node.
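The difference between the two kinds of path can be sketched in a single template. The style sheet below is an illustration only: it assumes the article structure from Part 1 and reaches the same title node twice, once relative to the current node and once from the root.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article">
  <xsl:output method="html"/>

  <!-- inside this template the current node is the a:article element -->
  <xsl:template match="a:article">
    <html>
      <head>
        <!-- relative path, resolved from a:article -->
        <title><xsl:value-of select="a:info/a:title"/></title>
      </head>
      <body>
        <!-- the same node, reached through an absolute path -->
        <h1><xsl:value-of select="/a:article/a:info/a:title"/></h1>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```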

Attributes and other special cases


To include an attribute in an XPath, prefix its name with the @ character. The following (relative) XPath selects the link's URI if the current node is a section:

a:para/a:link/@uri

The @ is not a separator but a prefix identifying attributes. Therefore, you still need the forward slash between the attribute name and its parent.

The single and double dot (. and ..) represent the current element and the parent of the current element respectively. If the current element is a paragraph, ../a:para selects all the paragraphs in the section. The .. selects the paragraph's parent (the section); from there, the XPath selects all the paragraphs in the section. Note that this XPath may return a node set with several nodes, as many nodes as paragraphs in the section, in fact. To select all the paragraphs in the body, use this XPath:

../../a:section/a:para

Using two slashes as a separator, //, selects amongst the descendants, as opposed to the children, of the element. The descendants include the children, the children of the children, the children of the children of the children, and so on. The following absolute XPath selects all the titles (article and section titles):

/a:article//a:title
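The special cases above can be tried together in one template. The following style sheet is a sketch under the assumption that the input follows the article structure from Part 1; it uses the count() function, covered in the Predicates section, only to make the size of each node set visible.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article">
  <xsl:output method="html"/>

  <xsl:template match="a:article">
    <html><xsl:apply-templates/></html>
  </xsl:template>

  <!-- an empty template: keeps the a:info text out of the page -->
  <xsl:template match="a:info"/>

  <!-- the current node is a:body -->
  <xsl:template match="a:body">
    <body>
      <!-- .. climbs to a:article, then the path descends to the article title -->
      <h1><xsl:value-of select="../a:info/a:title"/></h1>
      <!-- . is a:body itself; // then reaches titles at any depth below it -->
      <p>Section titles: <xsl:value-of select="count(.//a:title)"/></p>
      <!-- @ marks an attribute: the uri of every link below the body -->
      <p>Links: <xsl:value-of select="count(.//a:link/@uri)"/></p>
    </body>
  </xsl:template>
</xsl:stylesheet>
```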

Predicates
To conclude this section on XPaths, let's look at predicates. Predicates allow you to specify conditions that must apply to an element. The predicate appears between square brackets, [ and ], immediately after the element on which the condition applies. The following XPath selects links pointing to the XSLT recommendation:

//a:link[@uri='http://www.w3.org/TR/xslt']

Predicates allow you to compare an XPath (@uri in this example) with a literal or another XPath. A whole set of functions is also available (see an XSLT reference for a complete list of functions). For example, this XPath uses the count function to select the paragraph from a section that has only one paragraph:

//a:section[count(a:para) = 1]/a:para

Note that the predicate appears after the element on which it applies, which is not necessarily the last element in the XPath. Be careful not to confuse the separator, /, with the predicate indicators, [ and ].
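Predicates are also allowed in the match attribute of a template, which is where they tend to be most useful. The sketch below assumes the article structure from Part 1; the class value "single" and the choice of bold for the highlighted link are invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article">
  <xsl:output method="html"/>

  <!-- ordinary paragraphs -->
  <xsl:template match="a:para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- the predicate sits on a:section: paragraphs that are alone in their section -->
  <xsl:template match="a:section[count(a:para) = 1]/a:para">
    <p class="single"><xsl:apply-templates/></p>
  </xsl:template>

  <!-- the predicate sits on a:link: only links to the XSLT recommendation -->
  <xsl:template match="a:link[@uri='http://www.w3.org/TR/xslt']">
    <b><xsl:apply-templates/></b>
  </xsl:template>
</xsl:stylesheet>
```

When a paragraph matches both a:para templates, the processor gives priority to the more specific pattern, so the two rules can coexist.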

Attributes
Attributes have a weird syntax in XSLT:

<a href="{@uri}">

The curly brackets, { and }, mark the content of the attribute as an XPath. If the curly brackets are missing, the processor assumes that the content is a literal. The XPath should return one node only. If it returns several nodes, the processor retains only the first one.

Students of XSLT often confuse the curly brackets and the at symbol. Both are related to attributes, but they serve completely different roles. The curly brackets are part of XSLT; they indicate that the content of the attribute is an XPath. The at symbol is part of XPath; it indicates that the path points to an attribute.

A quick debugging tip: If you can't get what you want in an attribute, make sure you have not forgotten the curly brackets.
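The effect of the curly brackets is easiest to see side by side. The following sketch assumes a link element with a uri attribute, as in the sample article; the title attribute and its wording are invented to show that literal text and an XPath can share one attribute.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article">
  <xsl:output method="html"/>

  <xsl:template match="a:link">
    <!-- Without the curly brackets, href="@uri" would be copied literally
         and every link would point to the string "@uri". -->
    <!-- With them, href receives the value of the link's uri attribute.
         In title, the literal text is kept and only the bracketed part
         is evaluated as an XPath. -->
    <a href="{@uri}" title="Link to {@uri}"><xsl:apply-templates/></a>
  </xsl:template>
</xsl:stylesheet>
```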

Regular Structure
When the output is regular and repetitive, it may be easier to use the xsl:for-each and xsl:value-of instructions. xsl:for-each loops over a node set. xsl:value-of prints the content of the first element in a node set. Used together, they allow you to loop over and format the result of an XPath.

For example, to print the paragraphs, you could write:

<xsl:for-each select="/a:article/a:section/a:para">
   <p><xsl:value-of select="."/></p>
</xsl:for-each>

Be warned that xsl:for-each changes the current node, so it is crucial that you use a relative XPath in the loop! An absolute path would select data outside of the loop, which is most likely not what you want.
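The warning about relative paths can be illustrated with a loop over sections. This is a sketch only, assuming that sections sit inside a:body as in the Part 1 style sheet; the "paragraph(s)" wording is invented.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article">
  <xsl:output method="html"/>

  <xsl:template match="a:article">
    <html><body><ul>
      <!-- inside the loop, the current node is one a:section at a time -->
      <xsl:for-each select="a:body/a:section">
        <li>
          <!-- relative: the title of the section being visited -->
          <xsl:value-of select="a:title"/>:
          <!-- relative: the paragraphs of this section only -->
          <xsl:value-of select="count(a:para)"/> paragraph(s)
          <!-- An absolute path such as /a:article/a:body/a:section/a:title
               would print the first section's title on every pass. -->
        </li>
      </xsl:for-each>
    </ul></body></html>
  </xsl:template>
</xsl:stylesheet>
```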

A New Style Sheet


Listing 1 is an updated style sheet that demonstrates the techniques introduced in this article:
The template for the body now prints the article title using an xsl:value-of and a table of contents through an xsl:for-each.
A predicate differentiates the templates for bold and italics.
The style sheet inserts hyperlinks by using the special syntax for attribute contents.

Listing 1: updated style sheet

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:a="http://psol.com/2004/article">

<xsl:output method="html"/>

<xsl:template match="a:article">
   <html><xsl:apply-templates/></html>
</xsl:template>

<xsl:template match="a:body">
   <body>
      <h1><xsl:value-of select="../a:info/a:title"/></h1>
      <h2>Table of Contents</h2>
      <ul>
         <xsl:for-each select="a:section">
            <li><xsl:value-of select="a:title"/></li>
         </xsl:for-each>
      </ul><hr/>
      <xsl:apply-templates/>
      <p>This page was made with XML and XSLT.</p>
   </body>
</xsl:template>

<xsl:template match="a:para">
   <p><xsl:apply-templates/></p>
</xsl:template>

<xsl:template match="a:section">
   <xsl:apply-templates/><hr/>
</xsl:template>

<xsl:template match="a:info/a:title">
   <head><title><xsl:apply-templates/></title></head>
</xsl:template>

<xsl:template match="a:section/a:title">
   <h2><xsl:apply-templates/></h2>
</xsl:template>

<xsl:template match="a:link">
   <a href="{@uri}"><xsl:apply-templates/></a>
</xsl:template>

<xsl:template match="a:em">
   <i><xsl:apply-templates/></i>
</xsl:template>

<xsl:template match="a:em[@role='bold']">
   <b><xsl:apply-templates/></b>
</xsl:template>

</xsl:stylesheet>

Testing and exercise


I encourage you to download the listing and run the example for yourself. The listings also include a small exercise so that you can practice what you have learned. Remember to adapt the processing instruction, as explained in Part 1, if you change the style sheet. Next month, we will cover more XSLT instructions.

Discover the Wonders of XSLT: Advanced Techniques


By Benoît Marchal

Welcome to the third installment of Developer.com's introduction to XSLT. The first two parts (Part One and Part Two) have introduced the most fundamental XSLT instructions: templates and loops, as a means to transform an XML document into either HTML or another XML document, and XPaths and predicates, as a querying language to extract data from an XML document.

Together, the first two parts cover 70% of XSLT coding needs and you can write many fine stylesheets using only these techniques. This month, we will cover more advanced techniques that simplify XSLT coding.

Tests
One could argue that we have covered testing already through predicates. Yet there are cases where a simple if/then/else would do the job faster and more cleanly than predicates. XSLT offers two instructions for tests: xsl:if, which is the standard if statement, and xsl:choose, which is a switch statement that allows you to combine multiple tests and implement if/then/else.

The simplest test looks like the following:

<xsl:if test="count(a:para) > 1">
   <p><xsl:value-of select="count(a:para)"/> paragraphs</p>
</xsl:if>

As the name implies, the test attribute holds... the test. The processor will output the content of the xsl:if element if test evaluates to true. The content can be any combination of text literals, XML elements, and XSLT instructions.

The xsl:choose statement is a more sophisticated test, as follows:

<xsl:choose>
   <xsl:when test="not(a:para)">
      <p>no paragraphs</p>
   </xsl:when>
   <xsl:when test="count(a:para) = 1">
      <p>one paragraph</p>
   </xsl:when>
   <xsl:otherwise>
      <p><xsl:value-of select="count(a:para)"/> paragraphs</p>
   </xsl:otherwise>
</xsl:choose>

The processor will output the content of the first xsl:when whose test attribute evaluates to true or, failing that, the content of xsl:otherwise. Be careful with the order of the xsl:when statements because the processor outputs only the first one that is true. Also, the xsl:otherwise statement is optional.

To implement an if/then/else statement, you could use an xsl:choose with a single xsl:when and a single xsl:otherwise.

A quick tip: The empty node set evaluates to false, so to test for the presence of an element, it suffices to write the appropriate XPath in the test attribute. If the element exists, the XPath will return a non-empty node set; if the element does not exist, the XPath will return an empty node set. In the above example, I use the technique in the first xsl:when.
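The if/then/else idiom and the presence test can be combined in a few lines. The sketch below assumes the article structure used in this series; the "Untitled section" fallback text is invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article">
  <xsl:output method="html"/>

  <xsl:template match="a:article">
    <html><body>
      <xsl:for-each select="a:body/a:section">
        <xsl:choose>
          <!-- "if": a non-empty node set counts as true -->
          <xsl:when test="a:title">
            <h2><xsl:value-of select="a:title"/></h2>
          </xsl:when>
          <!-- "else": the section has no title element -->
          <xsl:otherwise>
            <h2>Untitled section</h2>
          </xsl:otherwise>
        </xsl:choose>
        <!-- plain xsl:if: no "else" branch is needed here -->
        <xsl:if test="count(a:para) > 1">
          <p><xsl:value-of select="count(a:para)"/> paragraphs</p>
        </xsl:if>
      </xsl:for-each>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
```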

Generating Output: Text Literals


So far, in templates, loops, and tests, you learned to write the text literals and XML elements as you want them to appear in the output. A typical template mixes text literals, XML elements, and XSLT statements:

<xsl:template match="a:body">
   <body>
      <h1><xsl:value-of select="../a:info/a:title"/></h1>
      <h2>Table of Contents</h2>
      <ul>
         <xsl:for-each select="a:section">
            <li><xsl:value-of select="a:title"/></li>
         </xsl:for-each>
      </ul><hr/>
      <xsl:apply-templates/>
      <p>This page was made with XML and XSLT.</p>
   </body>
</xsl:template>

There are cases where you need more control over the output. XSLT offers special instructions to generate text literals, elements, and attributes.

xsl:text is an XSLT statement that generates a text literal. It is mostly identical to just typing the text literal with one simple difference: xsl:text preserves the spaces. The XSLT processor discards any text in the stylesheet that consists only of whitespace, which is the sensible behavior in most cases. If you absolutely need the spaces though, use xsl:text.

In practice, the most common application of xsl:text is the following:

<xsl:text> </xsl:text>

to insert one blank space (without the xsl:text instruction, the processor would remove the space as whitespace-only text).
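A typical place where the blank space matters is between two xsl:value-of instructions. The following sketch assumes the article structure used in this series and prints each section title followed by its paragraph count.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article">
  <xsl:output method="html"/>

  <xsl:template match="a:article">
    <html><body><ul>
      <xsl:for-each select="a:body/a:section">
        <li>
          <xsl:value-of select="a:title"/>
          <!-- the line breaks around the instructions are whitespace-only
               text and are dropped; this space survives thanks to xsl:text -->
          <xsl:text> </xsl:text>
          <xsl:value-of select="count(a:para)"/>
        </li>
      </xsl:for-each>
    </ul></body></html>
  </xsl:template>
</xsl:stylesheet>
```

Without the xsl:text line, the title and the count would be expected to run together in the output.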

Generating Output: Attributes

xsl:attribute adds an attribute to the element being generated. In most cases, you just want to write the attribute as a literal, like this:

<a href="{@uri}">

But xsl:attribute is an XSLT statement, so it can appear wherever a statement can appear. It is useful mostly in combination with tests. The following example marks hyperlinks to my Web site in red:

<a href="{@uri}">
   <xsl:if test="starts-with(@uri,'http://www.marchal.com')">
      <xsl:attribute name="style">color: red;</xsl:attribute>
   </xsl:if>
   <xsl:apply-templates/>
</a>

The xsl:attribute statement has a name attribute with the attribute's name and an optional namespace attribute with the attribute's namespace.

xsl:attribute must appear before any other children; it is a mistake to insert a text literal or any instruction that will insert text before xsl:attribute. I like to think that the processor has not yet closed the start tag (>) when it encounters the xsl:attribute statement.
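Because the content of xsl:attribute can hold other instructions, the attribute value itself can depend on a test. The sketch below assumes a link element with a uri attribute; the title attribute and its two wordings are invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article">
  <xsl:output method="html"/>

  <xsl:template match="a:link">
    <a href="{@uri}">
      <!-- attributes first, while the start tag is still "open" -->
      <xsl:attribute name="title">
        <xsl:choose>
          <xsl:when test="starts-with(@uri, 'https://')">Secure link</xsl:when>
          <xsl:otherwise>Link</xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
      <!-- content last: moving this line above xsl:attribute would be a mistake -->
      <xsl:apply-templates/>
    </a>
  </xsl:template>
</xsl:stylesheet>
```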

Generating Output: Elements

For completeness, note that the xsl:element statement exists. It is mostly similar to xsl:attribute, but it is seldom needed in practice. About the only sensible application is to compute an element name:

<xsl:element name="record-{position()}">
   <xsl:apply-templates/>
</xsl:element>
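Another computed name that comes up in HTML publishing is the heading level. The sketch below is an illustration under two assumptions that the series does not state: that sections may be nested inside other sections, and that the ancestor:: axis (part of XPath, but not covered in this series) is acceptable in your style sheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article">
  <xsl:output method="html"/>

  <xsl:template match="a:section/a:title">
    <!-- count the enclosing sections: a title in a top-level section
         becomes h2, a title one level deeper becomes h3, and so on -->
    <xsl:element name="h{count(ancestor::a:section) + 1}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```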

Output
While we're on the matter of generating output, let's return to the very first instruction in any stylesheet: xsl:output.

As you learned in Part 1, xsl:output controls whether the processor generates an HTML, text, or XML document. xsl:output supports more attributes that give you a lot of control over the output document. The most useful attributes are:
encoding, which specifies the encoding. The default is UTF-8, but you can specify any valid encoding, such as UTF-16, ISO-8859-1 (Latin-1), and more.
indent, which, when set to yes, tells the processor to indent the output. This is handy for debugging.
doctype-public and doctype-system, which control the DOCTYPE declaration required by some XML vocabularies.
omit-xml-declaration, which, when set to yes, removes the XML declaration from the output document. It is not used often.
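As an illustration of these attributes working together, the following sketch produces an indented XHTML page in Latin-1 with a DOCTYPE declaration. The choice of XHTML 1.0 Strict is an assumption made for the example, and exclude-result-prefixes is an attribute that the series does not cover; it keeps the article namespace declaration out of the result.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="a">

  <!-- XML output, Latin-1 encoding, indented, with an XHTML DOCTYPE -->
  <xsl:output method="xml"
      encoding="ISO-8859-1"
      indent="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

  <xsl:template match="a:article">
    <html>
      <head><title><xsl:value-of select="a:info/a:title"/></title></head>
      <body>
        <xsl:for-each select="a:body/a:section/a:para">
          <p><xsl:value-of select="."/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```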

Discover the Wonders of XSLT: XSLT Quirks


By Benoît Marchal

You've made it to Part 4 of this XSLT introduction at developer.com. Congratulations! You have learned how to create efficient style sheets to process your XML documents. I trust that, as your experience with XSLT grows, you begin to appreciate how versatile the language is. As easy as it is to hack a style sheet quickly to reformat a document, it is equally easy to integrate style sheets in more serious production environments, such as processing orders in an e-commerce setup. Truly, XSLT is a tool with many uses.

XSLT Quirks
When I teach XSLT in seminars, the most frequent complaint is that XSLT is a verbose language. After a few exercises, students appreciate the power of the language but few like the syntax. My answer is twofold. First, you may want to consider a good XSLT editor. Many students like XML Spy (www.altova.com). On the Eclipse platform, I recommend XML Buddy (www.xmlbuddy.com). Still, I do most of my coding with BBEdit (www.barebones.com) or Boxer (www.texteditors.com). Both offer syntax coloring. Second, keep practicing. The XSLT syntax has one distinct advantage: It makes it almost impossible for your style sheets to produce invalid XML documents. With time, you will come to appreciate this.

Coding Styles
XSLT is a very specialized language with a distinct declarative flavor. The templates, introduced in Part 1, are a declarative set of rules to render and format XML elements. The developer need not explicitly call the templates; instead, each template specifies (in its match attribute) to which element it applies and the XSLT processor automatically calls the relevant templates. The declarative flavor is in sharp contrast with more generic languages (Java, C#, C++, Basic, or Pascal) where the developer has to call methods explicitly. In my experience, XSLT's declarative flavor is the source of some confusion.

Part 2 and Part 3 of this series introduced a more procedural flavor to XSLT with the for-each, if, and choose instructions. Some of my students jump on these instructions and never look back to declarative templates again... until they hit a wall. To avoid problems, you should realize that there are two different coding styles in XSLT: the declarative flavor that uses templates and predicates, and the procedural flavor with loops and tests. XSLT, as a language, is biased towards the first and this bias should be reflected in your style sheets.

Rewriting
Specifically, you might be tempted to write the following code:

<xsl:template match="a:para">
   <xsl:choose>
      <xsl:when test="@type='bold'">
         <p><b><xsl:apply-templates/></b></p>
      </xsl:when>
      <xsl:otherwise>
         <p><xsl:apply-templates/></p>
      </xsl:otherwise>
   </xsl:choose>
</xsl:template>

Don't. The preceding code uses one template and then a test. The more idiomatic expression in XSLT is to use two templates, one for each condition, and let the processor select the more appropriate one:

<xsl:template match="a:para[@type='bold']">
   <p><b><xsl:apply-templates/></b></p>
</xsl:template>

<xsl:template match="a:para">
   <p><xsl:apply-templates/></p>
</xsl:template>

In the long run, it will be more readable and more maintainable.

So, what should you do with the looping and test instructions? Think of them as shorthand for documents with a regular structure. A table would be a good example. A table has a very regular structure (columns and cells within columns), so using the shorthand notation of a loop will be more readable.

If in doubt, stick to the declarative style. You will find that the declarative style is more powerful than the procedural one. Some concepts are trivial in the declarative style and nearly impossible to write with the procedural style.
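The table case can be sketched as follows. The a:table, a:row, and a:cell element names are hypothetical, since the sample article in this series does not define a table; the point is only that the loops sit inside one template while the paragraphs stay with declarative templates.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article">
  <xsl:output method="html"/>

  <!-- declarative: one template per case, chosen by the processor -->
  <xsl:template match="a:para[@type='bold']">
    <p><b><xsl:apply-templates/></b></p>
  </xsl:template>

  <xsl:template match="a:para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- procedural shorthand: a regular structure handled with loops
       (a:table, a:row and a:cell are hypothetical element names) -->
  <xsl:template match="a:table">
    <table>
      <xsl:for-each select="a:row">
        <tr>
          <xsl:for-each select="a:cell">
            <td><xsl:value-of select="."/></td>
          </xsl:for-each>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```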

Discover the Wonders of XSLT: Workflows


This article concludes the introduction to XSLT at developer.com. In the previous four articles, the series has covered the essentials of XSLT coding. The final article moves to more advanced subjects such as working with functions and multiple files.

Functions

Functions are implemented in XPath, so they are valid wherever an XPath is valid. We have already encountered functions, such as count() and not():

count(a:para)

A function takes zero, one, or more arguments and computes a result. The result may be a number, a string, a boolean, or a node set.

Much of the power of functions arises from their integration with XPaths. Functions can appear in predicates or, for those that return node sets, in place of an element:

a:section[count(a:para) = 1]
current()/a:para

Because functions appear in XPaths, to use the result, you turn to the familiar value-of instruction:

<xsl:value-of select="count(xxx)"/>

As always, if the function/XPath returns multiple values, you will need the for-each or apply-templates instructions instead:

<xsl:for-each select="a:section[count(a:para) > 1]">
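A few standard XPath functions can be combined in a small report. The sketch below assumes the article structure used throughout the series and writes plain text; concat() and normalize-space() are core XPath functions that the text has not introduced so far.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article">
  <xsl:output method="text"/>

  <xsl:template match="a:article">
    <!-- string result: the title with surrounding whitespace trimmed -->
    <xsl:value-of select="concat('Title: ', normalize-space(a:info/a:title))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- number result, converted to a string by concat() -->
    <xsl:value-of select="concat('Sections: ', count(a:body/a:section))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- a function inside a predicate selects the node set to loop over -->
    <xsl:for-each select="a:body/a:section[count(a:para) > 1]">
      <xsl:value-of select="concat('Long section: ', a:title)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```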

Predefined Functions and Extensions


XPath and XSLT include functions to cover most common needs: string manipulation (substring, length), number manipulation (sum, conversion), boolean (negation), indexing (key search), and more. The XPath and XSLT recommendations themselves do a good job of documenting the functions. I suggest you bookmark the recommendations.

Still, you will find yourself looking for your favorite "insert name here" function. XSLT offers a half-baked extension mechanism that links with functions written in Java, JavaScript, Python, C#, or other languages. Unfortunately, the W3C has not fully defined the extension mechanism; much is left as implementation details that create serious incompatibilities among XSLT processors. Therefore, to implement a function, you must forgo portability and tie yourself to one specific XSLT implementation.

If this is unacceptable to you, there are two workarounds. First, if at all possible, don't use extensions. As you become more familiar with XSLT, you will find that many algorithms are best implemented through XSLT native (and portable) templates. Second, if you still need a function, check EXSLT. EXSLT defines standards for the most commonly requested extensions. Unless your needs are really exotic, chances are EXSLT covers them. However, because it's a voluntary effort and not part of the official W3C recommendations, not every processor supports EXSLT, although the major ones do. Again, check your processor documentation.
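One way to use an EXSLT function while staying reasonably portable is to guard the call with the standard function-available() function. The sketch below uses date:date-time() from the EXSLT dates-and-times module as an example; whether your processor implements it is something to verify, and the fallback wording is invented.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article"
    xmlns:date="http://exslt.org/dates-and-times"
    exclude-result-prefixes="a date">
  <xsl:output method="html"/>

  <xsl:template match="a:article">
    <html><body>
      <h1><xsl:value-of select="a:info/a:title"/></h1>
      <p>
        <xsl:choose>
          <!-- call the extension only if the processor provides it -->
          <xsl:when test="function-available('date:date-time')">
            Generated on <xsl:value-of select="date:date-time()"/>
          </xsl:when>
          <!-- portable fallback for processors without EXSLT dates -->
          <xsl:otherwise>Generation date unavailable</xsl:otherwise>
        </xsl:choose>
      </p>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
```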

Many Documents
The default workflow with XSLT is to process one file through one style sheet. While this simple workflow is appropriate for basic applications, you may want something more sophisticated.

Figure 1: Four common XSLT workflows

Figure 1 illustrates four common workflow options, clockwise:
The default workflow, in which one XML document is the input for a style sheet that produces one document.
The document() function (see below) opens multiple input documents but it still produces one output only.
XSLT 2.0 (see below) supports multiple outputs. Think of a photo gallery where the style sheet generates as many HTML pages as there are photos in the input document.
Finally, a batch engine extends the XSLT processor to work with directories and file hierarchies instead of isolated files. If you followed the exercises throughout the series, you have been using such a batch engine (XM).

document() Function
The document() function opens a second (or a third, fourth, and so on) input document. The function takes the URI to the file, opens the file, parses it, and returns a node set with the file content.

Because the result is a node set, you can query the result with an XPath, as we saw in the Functions section:

document('params.xml')/p:Parameters/p:Param[@id='1']

The usual combination of for-each and apply-templates instructions offers many options to process the second document:

<xsl:for-each select="document('params.xml')/p:Parameters/p:Param">

Typically, document() accesses parameter files. It is also handy to combine several documents into one output.
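Put together, a style sheet that reads a parameter file might look like the sketch below. The namespace URI bound to the p prefix and the content of params.xml are assumptions made for the example, as the text does not show them.

```xml
<?xml version="1.0"?>
<!-- Assumed shape of params.xml (hypothetical namespace and values):
     <p:Parameters xmlns:p="http://example.com/params">
       <p:Param id="1">Draft</p:Param>
       <p:Param id="2">Internal use only</p:Param>
     </p:Parameters>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article"
    xmlns:p="http://example.com/params"
    exclude-result-prefixes="a p">
  <xsl:output method="html"/>

  <xsl:template match="a:article">
    <html><body>
      <!-- from the main input document -->
      <h1><xsl:value-of select="a:info/a:title"/></h1>
      <!-- one value picked from the second document -->
      <p><xsl:value-of
          select="document('params.xml')/p:Parameters/p:Param[@id='1']"/></p>
      <!-- loop over the second document; inside, the current node is a p:Param -->
      <ul>
        <xsl:for-each select="document('params.xml')/p:Parameters/p:Param">
          <li><xsl:value-of select="@id"/>: <xsl:value-of select="."/></li>
        </xsl:for-each>
      </ul>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
```

A relative URI such as 'params.xml' is resolved against the location of the style sheet, so in this sketch the parameter file is expected to sit next to it.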

result-document
What about the opposite, taking an XML document and splitting it into multiple output documents? There is no solution with XSLT 1.0, but support for multiple output documents will be added in XSLT 2.0. At the time of writing, the draft XSLT 2.0 proposes the result-document element. Basically, anything that appears within a result-document element is written to a separate file.

<xsl:result-document href="photo-{@id}.html">
   <!-- ... -->
</xsl:result-document>

A word of warning: XSLT 2.0 has not been formally approved at the time of writing, so this feature may still change. Furthermore, chances are your XSLT processor does not implement it (most processors have a proprietary alternative, though). Again, consult your processor documentation if you need this feature.

So far, dates are displayed in the ISO format: 2004-02-08. By using the substring-before() function, you can reformat it to the more common 02/08/2004 format.
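One possible way to do the date conversion is sketched below. It assumes a hypothetical a:date element inside a:info that holds the ISO date with no surrounding whitespace, and it pairs substring-before() with its counterpart substring-after(), both core XPath functions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article">
  <xsl:output method="html"/>

  <xsl:template match="a:article">
    <p>
      <!-- a:date is a hypothetical element, e.g. <a:date>2004-02-08</a:date> -->
      <xsl:for-each select="a:info/a:date">
        <!-- month: the text between the two hyphens -->
        <xsl:value-of select="substring-before(substring-after(., '-'), '-')"/>
        <xsl:text>/</xsl:text>
        <!-- day: the text after the second hyphen -->
        <xsl:value-of select="substring-after(substring-after(., '-'), '-')"/>
        <xsl:text>/</xsl:text>
        <!-- year: the text before the first hyphen -->
        <xsl:value-of select="substring-before(., '-')"/>
      </xsl:for-each>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

For the date 2004-02-08, the three expressions evaluate to 02, 08, and 2004, which gives 02/08/2004.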
