Understanding Structured Authoring

One of the most important characteristics of effective technical writing is consistency, whether in form, function, or style of writing. Structured authoring helps maintain consistency in the structure of content. For example, an organization might define a chapter, and the sections within the chapter, as:

Sample Structure for a Chapter

Sample Structure for a Section

Most organizations define a content structure and propagate it through training, mentoring, and style guides. Technical writers master the intricacies of the defined content structure only with practice. Until then, looking up the style guide or talking to more experienced members of the team is the only option. Even after that, only the most disciplined technical writers are able to stay faithful to the defined structure.

Deviations from the defined structure are typically detected in a review. So one of the reviewer's focus areas is ensuring that the defined content structure has been maintained. Given that a reviewer typically handles multiple technical writers, a significant amount of the reviewer's time is spent on activities like checking document structure. The reviewer should ideally be concentrating on the content: relevance, completeness, flow, readability, and more. Anything that takes time away from the focus on content reduces the effectiveness of the reviewer.

NOTE: Structured authoring is especially useful for organizations with large and/or geographically dispersed technical communications teams.

Structured authoring uses technology to define and enforce content structure. It helps the technical writer adhere to the structure by specifying the type of content allowed at a particular place in the document. If the technical writer moves away from the defined structure, the structured authoring system warns the writer of the misdemeanor. (You cannot use a table here!) So the technical writer no longer needs to remember the prescribed structure and the associated dos and don'ts. The structured authoring system does that!

From the reviewer's point of view, structured authoring systems free them from having to check the validity of a document's structure. That is more time that the reviewer can dedicate to the content.

Another benefit is that structured authoring systems often allow presentation (formatting) to be associated with content based on its position in the document structure. For example, a title in a first-level section may use the style Heading 1, while a title within a sub-section may use the style Heading 2. So the technical writer does not have to remember the formats to be applied. It is also one more thing off the table for a reviewer.

All in all, structured authoring improves the efficiency of the technical writing team and helps it become more effective.
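As a rough illustration of formatting driven by position, the Python sketch below assigns a heading style to each section title purely from its nesting depth. The element names chapter, section, and title, and the sample titles, are assumptions made for this example and do not come from any particular authoring tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter with one section and one nested sub-section.
DOC = """<chapter>
  <section><title>Install</title>
    <section><title>Prerequisites</title></section>
  </section>
</chapter>"""


def title_styles(node, depth=0):
    """Yield (title text, style name) pairs based only on nesting depth."""
    for child in node:
        if child.tag == "section":
            title = child.find("title")
            if title is not None:
                yield title.text, f"Heading {depth + 1}"
            # Titles of nested sections get the next heading level.
            yield from title_styles(child, depth + 1)


for text, style in title_styles(ET.fromstring(DOC)):
    print(f"{style}: {text}")
```

The writer never names a style here; the style follows from where the title sits in the structure.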

Structured Authoring: Defining the Content Structure


In an earlier post, we had introduced the concept of structured authoring. From that post, we realise that there are two imperatives for structured authoring:

1. defining a content structure
2. enforcing the defined structure

In this post, we will look at the first and the most important part of implementing structured authoring systems: defining the content structure. Assuming we are creating a content structure for a particular document, we have to analyze:

1. the structure of the document
2. the type of content in each section
3. the relationship between various types of content

NOTE: In the following sections, a * against an information unit indicates that it is optional, but can be used more than once. A + against an information unit indicates that it must be used at least once. If there is no symbol against an information unit, it is mandatory and can be used only once.

Consider an email. We could define its overall structure as:

Thus, an email starts with the Address section. The next section in an email is the Subject, and this is followed by the Body. All three sections are mandatory and must appear in the specified order. Let us take a closer look at the Address section. It could be defined as:

Thus, the Address section must have a To section, while the CC and BCC sections are optional. However, if both CC and BCC are used, then CC must precede BCC. Next, we define the To, CC, and BCC elements as containing one or more email addresses:

Finally, we define Body as starting with a mandatory Salutation (we are sticklers for etiquette!), followed by the Message, which must have at least one Para. This Para can be followed by one or more Para, List, and/or Image elements in any order. Finally, there is the mandatory Signature section.

We define the Para as containing Text with the option of having Hyperlink, Bold, and/or Italic elements. Just as we defined a Para, we can define List and its constituents as:

Thus, you can see that defining the content structure has to be an extremely meticulous exercise resulting in a detailed content definition. This is typically the job of an information architect. Having analyzed the content and defined a content structure, the next step is to encode the content structure using technology. Stay tuned!
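To make the notation concrete, here is a minimal Python sketch that writes the email structure down as data and checks whether a sequence of information units follows it. It treats each definition as a regular expression over unit names. The dictionary layout and the function names are assumptions for illustration, and CC and BCC are marked with * only because the text calls them optional.

```python
import re

# A bare name means "exactly once", "+" means "one or more", and "*" means
# "zero or more", following the NOTE earlier in this post.
# Assumption: CC and BCC carry "*" because the text describes them as optional.
CONTENT_MODEL = {
    "Email": "Address Subject Body",
    "Address": "To CC* BCC*",
    "Body": "Salutation Message Signature",
    "Message": "Para (Para|List|Image)*",
}


def to_regex(model: str) -> re.Pattern:
    """Turn a content model into a regex over comma-terminated unit names."""
    pattern = re.sub(r"[A-Za-z]+", lambda m: f"(?:{m.group(0)},)", model)
    return re.compile(pattern.replace(" ", ""))


def is_valid(parent: str, children: list[str]) -> bool:
    """Check whether the children appear in an order the model allows."""
    sequence = "".join(name + "," for name in children)
    return to_regex(CONTENT_MODEL[parent]).fullmatch(sequence) is not None


print(is_valid("Address", ["To", "BCC"]))              # True: CC may be omitted
print(is_valid("Address", ["To", "BCC", "CC"]))        # False: CC must precede BCC
print(is_valid("Message", ["Para", "Image", "List"]))  # True
print(is_valid("Message", ["List", "Para"]))           # False: must start with a Para
```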

Structured Authoring: Introducing XML


The previous post, Structured Authoring: Defining the Content Structure, described the first step in adopting structured authoring: analyzing and defining the content structure. However, as of now, the content structure is on paper. The next step is to use technology to encode and enforce the defined content structure. So what kind of technology would we need? When choosing a technology for structured authoring, your primary requirement is that the chosen technology should allow you to define the content structure.

So, you should be able to:

- identify the start and end of each piece of content. For example, the start and end of the body of an email.
- describe/label each piece of content using terms relevant to you. For example, give the body of the email a name such as Message.
- specify relationships that exist between pieces of content. For example, every mail starts with an Address. Each Address must contain a To, CC, or BCC section. It can also contain any or all of these sections. But when more than one of the sections is present, they must be in the specified order. That is, if a mail has To and BCC, then To must come before BCC. You get the picture, so we won't bore you further!
- specify which pieces of content are mandatory and which are optional.

These are just some of the key requirements, and one technology that meets all of them is eXtensible Markup Language, known popularly by its acronym, XML.

A First Look at XML

XML is a markup language that helps you to define and use tags to describe content. XML is best explained using an example. Consider an email that could be represented as:

Content enclosed in <> is called a tag. Tags are used to identify and describe the content. Tags are always used in pairs and enclose relevant content within them. <tag_name> is called a start tag, while </tag_name> is called an end tag. Tag names are something you decide to suit the content you are describing. For example, we have used <message> to describe the email body.

You can have tags within tags to create a hierarchy for the content and establish a relationship between various pieces of content. For example, you have an outer tag <email> which contains the tags <address>, <subject>, and <message>. This can be interpreted as "an email contains an address, a subject, and a message".

This is a very basic explanation of XML. As you can see from the example, the XML document describes and structures the content in a human-readable form.
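The hierarchy that nested tags create can also be read by a program. The following Python sketch parses a small email document with the standard library and prints its tag hierarchy. The tag names email, address, subject, and message come from the discussion above; the to tag and all of the sample text are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical email document; the <to> tag and the text content are invented.
EMAIL_XML = """<email>
  <address>
    <to>writer@example.com</to>
  </address>
  <subject>Review comments</subject>
  <message>Please find my comments attached.</message>
</email>"""


def outline(element, depth=0):
    """Return the tag names as indented lines that mirror the nesting."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(EMAIL_XML)
print("\n".join(outline(root)))
print(root.find("subject").text)
```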

Structured Authoring: The Role of the DTD/Schema


In an earlier post, Structured Authoring: Introducing XML, we saw the role XML had to play in structured documentation. Now let us take a look at how the implementation works. In the post titled Structured Authoring: Defining the Content Structure, we discussed how to go about analysing the content and breaking it up into smaller units. While the content analysis and definition is a pen-and-paper exercise, the content structure has to be defined using technology so that it can be enforced. This is where the DTD or Schema comes in.

NOTE: For the purpose of this preliminary discussion, we are going to use DTD and Schema as analogous terms.

What does a DTD do?

A document type definition, or a DTD as it is commonly called, defines the structure of a document. It specifies the document structure in terms of the tags that can be used in a document, where and when these tags can be used, and the attributes that each of the tags may have. Every document type will have its own DTD or Schema. For example, there will be separate DTDs for user manuals, release notes, case studies, and API references. When an author wishes to create a particular type of document, the author has to base the document on the relevant DTD.

NOTE: An XML Schema is itself an XML document and must adhere to all rules of XML. A DTD, by contrast, is written in its own declaration syntax rather than as an XML document.

Validating an XML Document Against the DTD


When a DTD is associated with an XML document, the XML document is compared against the DTD to ensure it is structured as specified by the DTD. This process of comparison is called validation, and it is performed by the validating parser. When the XML document adheres to the structure defined in the DTD, it is said to be valid.

What Happens If an XML Document is Invalid?


When you are in the authoring stage, the XML editor typically warns you that the document is invalid, but you can save the document. However, when you generate the output or process the document in any way, the invalid document will not be processed. This means that, typically, this document is omitted from the output. So, the DTD is the foundation on which the structured authoring system is built.
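The difference between a document that cannot be parsed at all and one that parses but breaks the structure can be sketched in a few lines of Python. This is not a real validating parser: the ALLOWED table is a hand-written stand-in for a DTD, and the element names and file names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Allowed child sequences per element: a stand-in for the DTD declarations
#   <!ELEMENT email (address, subject, message)>
#   <!ELEMENT address (to, cc?, bcc?)>
# Elements not listed here are not checked in this sketch.
ALLOWED = {
    "email": [["address", "subject", "message"]],
    "address": [["to"], ["to", "cc"], ["to", "bcc"], ["to", "cc", "bcc"]],
}


def check(xml_text):
    """Return 'not well-formed', 'invalid', or 'valid' for a document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well-formed"
    if root.tag != "email":
        return "invalid"
    for element in root.iter():
        children = [child.tag for child in element]
        if element.tag in ALLOWED and children not in ALLOWED[element.tag]:
            return "invalid"
    return "valid"


documents = {
    "good.xml": "<email><address><to>a@example.com</to></address>"
                "<subject>Hi</subject><message>Hello</message></email>",
    "bad-order.xml": "<email><subject>Hi</subject><address><to>a@example.com</to>"
                     "</address><message>Hello</message></email>",
    "broken.xml": "<email><subject>Hi</email>",
}

# Only valid documents go on to output generation; the rest are omitted.
publishable = [name for name, text in documents.items() if check(text) == "valid"]
print(publishable)  # ['good.xml']
```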

What is an XML Schema and why should I care?


What is an XML Schema? Some of you may already know this, others don't. So before I share some more technical information about XML Schemas in subsequent blog posts, I had better get some of the basics out of the way first.

When you process and manage information in XML format, you can choose to use an XML Schema with your XML documents. Roughly speaking, an XML Schema can be used to define what you want your XML documents to look like. For example, in an XML Schema you can define:

- Which elements and attributes are allowed to occur in your XML documents
- How the elements can be or must be nested, or the order in which the elements must appear
- Which elements or attributes are mandatory vs. optional
- The number of times a given element can be repeated within a document (e.g. to allow for multiple phone numbers per customer, multiple items per order, etc.)
- The data types of the element and attribute values, such as xs:integer, xs:decimal, xs:string, etc.
- The namespaces that the elements belong to

and so on.

If you choose to create an XML Schema, it may define just some or all of the aspects listed above. The designer of the XML Schema can choose the degree to which the schema constrains the characteristics of the XML documents. For example, an XML Schema can be very loose, defining only a few key features for your XML documents and allowing a lot of flexibility. Or it can be very strict and tightly control the XML data in every aspect. Or anything in between.

The use of an XML Schema is optional, i.e. an XML Schema is not required to store, index, query, or update XML documents. However, an XML Schema can be very useful to ensure that the XML documents that you receive or produce are compliant with certain structural rules that allow applications to process the XML. In other words, XML Schemas help you to enforce data quality.

Validation
If a document complies with a given XML Schema, then the document is said to be valid for this schema. A document might be valid for one schema but invalid for another schema. The process of testing an XML document for compliance with an XML Schema is called validation. When an XML document is parsed by an XML parser, validation can be enabled as an optional part of the parsing process. Full validation of an XML document always requires XML parsing. For many documents and schemas, validation typically incurs only a small additional cost (in terms of CPU usage) on top of the cost of XML parsing.

What does an XML Schema look like?


An XML Schema itself is an XML document! But it is a very special document that needs to comply with very specific rules that are defined by (you guessed it!) another XML Schema, i.e. the schema for schemas. Large XML Schemas can consist of multiple schema documents that reference each other through import and include relationships. This allows you to compose an XML Schema out of smaller building blocks in a modular fashion. I don't want to go into the syntax details of XML Schemas here, but there are some useful resources available:

- The XML Schema Primer: http://www.w3.org/TR/xmlschema-0/
- A tutorial: http://www.w3schools.com/schema/default.asp
- Best practices for XML Schema design: http://www.xfront.com/BestPracticesHomepage.html
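Because a schema is itself XML, an ordinary XML parser can read it. The Python sketch below parses a small schema with the standard library and lists the elements declared inside each sequence, along with their data types and occurrence limits. The customer schema is a hypothetical example written for this illustration, and the code only reads the schema; it does not validate any document against it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: a customer with one name, any number of phone numbers,
# and a required integer id attribute.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="phone" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:integer" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def sequence_elements(schema_text):
    """List (name, type, minOccurs, maxOccurs) for elements inside sequences."""
    root = ET.fromstring(schema_text)
    rows = []
    for sequence in root.iter(f"{{{XS}}}sequence"):
        for decl in sequence.findall(f"{{{XS}}}element"):
            rows.append((
                decl.get("name"),
                decl.get("type"),
                decl.get("minOccurs", "1"),  # XML Schema defaults to 1
                decl.get("maxOccurs", "1"),  # XML Schema defaults to 1
            ))
    return rows


for row in sequence_elements(SCHEMA):
    print(row)
```

Here name is mandatory and may appear once, while phone is optional and repeatable, which covers two of the aspects listed earlier: mandatory vs. optional, and the number of repetitions.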

When and why should I use an XML Schema?


Simply put, if you want to ensure data quality and detect XML documents that do not comply with an expected format, use an XML Schema and validate each document!

However, what if XML documents pass through multiple components of your IT infrastructure, such as a message queue, an application server, an enterprise service bus, and the database system? If these components do not modify the XML but merely read and route it, examine whether all of these components need to validate each document. For example, if the application server has already validated a document before insertion into a DB2 database, does the document need to be validated again in DB2? Maybe not, if you trust the application layer. Maybe yes, if you don't.

An XML Schema is also often used as a contract between two or more parties that exchange XML documents. With this contract, the parties agree on a specific format and structure of the XML messages that they send and receive, to ensure seamless operation. Practically every vertical industry has defined XML Schemas to standardize XML message formats for the data processing in their industry. A good overview is given by the introduction of this article: Getting started with Industry Formats and Services with pureXML: http://www.ibm.com/developerworks/data/library/techarticle/dm-0705malaika/

How can I validate XML documents in DB2?


Simple. First, you register one or multiple XML Schemas in the DB2 database. This can be done with CLP commands, stored procedures, or through API calls in the JDBC or .NET interface to DB2. After a schema is registered in DB2, you can use it to validate XML documents in DB2, typically when you insert, load, or update XML documents. You can enforce a single XML Schema for all XML documents in an XML column, or you can allow multiple XML Schemas per column. A database administrator can force automatic validation upon document insert, or allow applications to choose one of the previously registered schemas for validation whenever a document is inserted.

And validation can also be done in SQL statements?


Yup. The SQL standard defines a function called XMLVALIDATE, which can be used for document validation in INSERT statements, UPDATE statements, triggers, stored procedures, and even in queries. Here is a simple example of an INSERT statement that adds a row to a customer table, which consists of an integer ID column and an XML column called doc:

    INSERT INTO customer(id, doc)
    VALUES (?, XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID db2admin.custxsd));

The id and the document are provided by parameter markers (?), and the XMLVALIDATE function that is wrapped around the second parameter ensures validation against the XML Schema that has been registered under the identifier db2admin.custxsd. If the inserted document is not compliant with the XML Schema, the INSERT statement fails with an appropriate error message. Similarly, the XMLVALIDATE function can also be used on the right-hand side of the SET clause of an UPDATE statement that modifies or replaces an XML document.

OK, so much for now. In my next blog post we'll go into more detail.
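For readers who call the database from application code, here is a hedged Python sketch of how the INSERT above, and a matching UPDATE, might be issued. The table, column, and schema names come from the example; the UPDATE statement is my own illustration of the SET-clause usage, and cursor stands for any Python database cursor whose driver accepts ? parameter markers, which is an assumption rather than something this post specifies.

```python
# Names taken from the INSERT example: table customer, columns id and doc,
# schema identifier db2admin.custxsd. Everything else is illustrative.
SCHEMA_ID = "db2admin.custxsd"

INSERT_SQL = (
    "INSERT INTO customer(id, doc) "
    f"VALUES (?, XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID {SCHEMA_ID}))"
)

# Assumed form of an UPDATE that validates the replacement document.
UPDATE_SQL = (
    "UPDATE customer "
    f"SET doc = XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID {SCHEMA_ID}) "
    "WHERE id = ?"
)


def insert_customer(cursor, cust_id, xml_text):
    """Insert a document; the database rejects it if validation fails."""
    cursor.execute(INSERT_SQL, (cust_id, xml_text))


def replace_customer_doc(cursor, cust_id, xml_text):
    """Replace a document, validating the new version against the schema."""
    cursor.execute(UPDATE_SQL, (xml_text, cust_id))
```

Note that the parameter order differs between the two statements: the document comes second in the INSERT but first in the UPDATE.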
