Spreading the XML paradigm around
2006-03-09
In many real-world projects it is unacceptable to use DocBook as is. You need to remove unused elements and, at the same time, add new specific elements to improve the usability of the whole documentation solution. With DocBook V5.0, RELAX NG and a few clever tricks you can create a custom version of DocBook, together with processing rules, in a few minutes.
If you have ever tried to customize the DocBook DTD, you will agree with me that it is quite a complex and painful task. Although the DocBook DTD makes heavy use of parameter entities to keep it modular and easier to customize, customization is still a complex process that requires deep knowledge of general DTD rules and of the organization of the DocBook DTD.
The new DocBook version 5.0 is based on a very powerful schema language called RELAX NG. In RELAX NG it is very easy both to exclude existing elements and to add new ones.
But changing the schema alone is not the end of the story. If you add new elements to the schema, you must customize the processing tools to support them. The DITA documentation system solves this problem by using specialization: newly added elements use fixed values in the schema to identify a more generic element with a similar processing model. Norm has shown that this specialization can easily be implemented in DocBook as well.
While I can see the advantages of specialization, there is one big problem in the way it is implemented: it relies completely on meta-information in the schema, which is used as a guide for formatting fallback. This means that you are not able to process your document without having the schema. On the other hand, specialization is a very interesting approach to extending a schema with new elements. In the following text I will show you how you can use specialization in DocBook without a run-time dependency on the schema.
Suppose that you want to create your own version of DocBook suited for writing texts about assembly language. You will probably need specific elements to capture things like registers or instructions. It is very easy to create such a schema extension for DocBook:
default namespace = "http://docbook.org/ns/docbook"
namespace db = "http://docbook.org/ns/docbook"

# new element for CPU registers
db.register = element register { text }

# new element for instructions
db.instruction = element instruction { text }

# combined pattern
asm.inlines = db.register | db.instruction

include "docbook.rnc"

# register and instruction are permitted everywhere in inline content;
# this combination stays outside the include block, because a definition
# inside that block would replace DocBook's own db.general.inlines
db.general.inlines |= asm.inlines
You can now use your new elements together with other DocBook elements inside your document.
<?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Assembler guide</title>
  <para><register>AX</register> is a 16-bit register. You can use
  registers <register>AH</register> and <register>AL</register> to
  access its higher and lower parts. <instruction>MOV</instruction> is
  an instruction for moving data between your registers and
  memory.</para>
</article>
With the customized schema you can easily edit and validate your document. So far, so good. But a problem arises once you want to process such a document with the DocBook XSL stylesheets: you will see warnings about the unhandled elements register and instruction. You could manually add the missing templates to the stylesheets, but let's try to automate this step.
Norm has proposed a very clever way of augmenting the schema with DocBook markup that should be used as a replacement for the newly added elements. This allows you to map new elements to existing elements with similar formatting and a more generic meaning. Here is our schema, asmbook.rnc, augmented with a mapping of the new elements:
default namespace = "http://docbook.org/ns/docbook"
namespace db = "http://docbook.org/ns/docbook"
namespace r = "http://nwalsh.com/xmlns/schema-remap/"

# new element for CPU registers
db.register =
  [ r:remap [ db:emphasis [ role = "bold" ] ] ]
  element register { text }

# new element for instructions
db.instruction =
  [ r:remap [ db:code [ ] ] ]
  element instruction { text }

# combined pattern
asm.inlines = db.register | db.instruction

include "file:///e:/src/db/docbook/relaxng/docbook/docbook.rnc"

# register and instruction are permitted everywhere in inline content;
# this combination stays outside the include block, because a definition
# inside that block would replace DocBook's own db.general.inlines
db.general.inlines |= asm.inlines
As you can see, we use the special element r:remap to define that register should be formatted as bold text and instruction as a piece of code. This is sufficient for a start.
Now you can use Trang to convert the RNC schema into RELAX NG XML syntax (asmbook.rng), which can easily be processed by XML tools. Norm implemented special functionality in the DocBook XSL2 stylesheets that reads the schema during the transformation to HTML and replaces elements on the fly. The problem with this approach is that it works only in the XSLT 2.0 based stylesheets, which are not complete yet, and that it depends on the schema being available during transformation.
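Once the schema is in RELAX NG XML syntax, the r:remap annotations are ordinary XML elements that any XML tool can read. The following minimal Python sketch (Python stands in for XSLT here only so the logic is easy to run) collects them into a table from each new element to its replacement. The inline schema fragment is an assumption about Trang's output for asmbook.rnc: it places each annotation as the first child of the annotated rng:element. ASMBOOK_RNG and collect_remaps are hypothetical names.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
REMAP = "http://nwalsh.com/xmlns/schema-remap/"

# Assumed fragment of asmbook.rng (hypothetical rendering of the Trang output).
ASMBOOK_RNG = """\
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:db="http://docbook.org/ns/docbook"
         xmlns:r="http://nwalsh.com/xmlns/schema-remap/"
         ns="http://docbook.org/ns/docbook">
  <define name="db.register">
    <element name="register">
      <r:remap><db:emphasis role="bold"/></r:remap>
      <text/>
    </element>
  </define>
  <define name="db.instruction">
    <element name="instruction">
      <r:remap><db:code/></r:remap>
      <text/>
    </element>
  </define>
</grammar>
"""


def collect_remaps(rng_text):
    """Map each annotated element name to (replacement tag, literal attributes)."""
    root = ET.fromstring(rng_text)
    remaps = {}
    for element in root.iter(f"{{{RNG}}}element"):
        remap = element.find(f"{{{REMAP}}}remap")
        if remap is None or len(remap) == 0:
            continue  # element without a remap annotation: nothing to generate
        target = remap[0]  # the replacement element is the annotation's first child
        remaps[element.get("name")] = (target.tag, dict(target.attrib))
    return remaps


print(collect_remaps(ASMBOOK_RNG))
```

The replacement tags come back in Clark notation (`{namespace}local-name`), which keeps the DocBook namespace explicit for whatever generates the stylesheet code from this table.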
My goal was to use Norm's schema annotations to automatically generate stylesheet customizations. The customization is generated just once, and once you have it you do not need the schema during transformation. Moreover, my solution integrates with the DocBook XSL stylesheets (the XSLT 1.0 based ones), which are the most widely used tool for processing DocBook content.
I created a short but slightly tricky stylesheet, rng2customization.xsl, which is able to generate XSLT code from schema annotations. This code hooks itself deep inside the stylesheets and renames elements before the actual processing.
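The generated code is not listed here, so the following Python sketch only illustrates the idea of turning an annotation table into XSLT source. The template shape (a template in remap mode that matches db:register and emits the replacement element) is an assumption modeled on the remap-mode template in html-customized.xsl later in this article. A real layer would also need a remap-mode identity template and the hook into the DocBook XSL stylesheets, both omitted. REMAPS, remap_template and generate_layer are hypothetical names.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

DB = "http://docbook.org/ns/docbook"
XSL = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical input: new element name -> (replacement in Clark notation,
# literal attributes), i.e. what the r:remap annotations say.
REMAPS = {
    "register": (f"{{{DB}}}emphasis", {"role": "bold"}),
    "instruction": (f"{{{DB}}}code", {}),
}


def local_name(clark):
    """'{uri}name' -> 'name'."""
    return clark.split("}", 1)[-1]


def remap_template(name, target, attrs):
    """One remap-mode template that rewrites db:<name> into its replacement."""
    tag = "db:" + local_name(target)
    attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in sorted(attrs.items()))
    return (
        f'  <xsl:template match="db:{name}" mode="remap">\n'
        f"    <{tag}{attr_text}>\n"
        # copy the original attributes and children, remapping them recursively
        f'      <xsl:apply-templates select="node()|@*" mode="remap"/>\n'
        f"    </{tag}>\n"
        f"  </xsl:template>\n"
    )


def generate_layer(remaps):
    """Emit a (hypothetical) customization layer as XSLT source text."""
    templates = "".join(
        remap_template(name, target, attrs)
        for name, (target, attrs) in sorted(remaps.items())
    )
    return (
        f'<xsl:stylesheet xmlns:xsl="{XSL}" xmlns:db="{DB}" version="1.0">\n'
        f"{templates}"
        "</xsl:stylesheet>\n"
    )


layer = generate_layer(REMAPS)
ET.fromstring(layer)  # raises ParseError if the generated XSLT is not well-formed
print(layer)
```

Generating the layer once and saving it is what removes the schema from the run-time picture: the mapping ends up as plain template rules.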
To generate the customization layer I simply processed my schema with this stylesheet:
saxon -o asmbook.xsl asmbook.rng rng2customization.xsl
The customization asmbook.xsl can be used together with any profiling-enabled stylesheet, e.g. fo/profile-docbook.xsl, html/profile-docbook.xsl or html/profile-chunk.xsl. We just need to create a simple driver file that combines the base stylesheet with the specialization layer. For HTML output we can use the following stylesheet (html.xsl):
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/profile-docbook.xsl"/>
  <xsl:import href="asmbook.xsl"/>
</xsl:stylesheet>
And we are done. This stylesheet doesn't need to access the schema during processing.
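The run-time side can be sketched the same way. In this hypothetical Python stand-in the mapping is fixed in the code, just as it is fixed in asmbook.xsl, so renaming the new elements before formatting needs no access to the schema. Letting attributes already present on the source element win over the literal ones from the annotation is a design choice of the sketch, not something the article specifies.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"

# Mapping fixed at generation time (hypothetical shape); nothing reads the schema.
REMAPS = {
    f"{{{DB}}}register": (f"{{{DB}}}emphasis", {"role": "bold"}),
    f"{{{DB}}}instruction": (f"{{{DB}}}code", {}),
}

DOC = """<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Assembler guide</title>
  <para><register>AX</register> is a 16-bit register.
  <instruction>MOV</instruction> moves data.</para>
</article>"""


def remap(root, remaps):
    """Rename specialized elements in place to their generic DocBook equivalents."""
    for elem in root.iter():
        if elem.tag in remaps:
            new_tag, attrs = remaps[elem.tag]
            elem.tag = new_tag
            for key, value in attrs.items():
                elem.attrib.setdefault(key, value)  # attributes from the source win
    return root


root = remap(ET.fromstring(DOC), REMAPS)
ET.register_namespace("", DB)  # serialize DocBook as the default namespace
print(ET.tostring(root, encoding="unicode"))
```

After this pass the document contains only standard DocBook elements, which is exactly what the unmodified base stylesheet expects.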
For more complex elements it is not always possible to map them to existing DocBook structures. In this situation you need to create appropriate XSLT templates manually. This approach can be combined with our automatically generated specialization layer, but we must disable the layer for the elements that we handle manually. This technique is shown in the following example (html-customized.xsl):
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/profile-docbook.xsl"/>
  <xsl:import href="asmbook.xsl"/>

  <!-- List of elements for which we want to use custom processing -->
  <xsl:template match="db:instruction" mode="remap"
                xmlns:db="http://docbook.org/ns/docbook">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" mode="remap"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="instruction">
    <tt style="color: red; font-weight: bold;"><xsl:apply-templates/></tt>
  </xsl:template>
</xsl:stylesheet>
The template in remap mode disables the default schema-driven remapping for the instruction element by simply copying it. The last template defines custom handling of the instruction element. Please note that this template references the element in no namespace: the stylesheets were written primarily for older versions of DocBook, which are not in a namespace, so they remove the DocBook namespace from source documents prior to processing.
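Both points, switching remapping off for manually handled elements and stripping the DocBook namespace before matching, can be illustrated with a small Python sketch. It is not how the XSLT stylesheets are implemented; MANUAL, remap, strip_db_namespace and render_instruction are hypothetical names, and the handler assumes text-only content.

```python
import xml.etree.ElementTree as ET
from html import escape

DB = "http://docbook.org/ns/docbook"
PREFIX = f"{{{DB}}}"

# Generated mapping (hypothetical shape), keyed by namespaced element name.
REMAPS = {
    PREFIX + "register": (PREFIX + "emphasis", {"role": "bold"}),
    PREFIX + "instruction": (PREFIX + "code", {}),
}

# Elements with hand-written templates: remapping is switched off for them.
MANUAL = {PREFIX + "instruction"}

DOC = """<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Assembler guide</title>
  <para><register>AX</register> is a 16-bit register.
  <instruction>MOV</instruction> moves data.</para>
</article>"""


def remap(root, remaps, manual):
    """Rename specialized elements, except those listed in `manual`."""
    for elem in root.iter():
        if elem.tag in remaps and elem.tag not in manual:
            new_tag, attrs = remaps[elem.tag]
            elem.tag = new_tag
            for key, value in attrs.items():
                elem.attrib.setdefault(key, value)
    return root


def strip_db_namespace(root):
    """Move DocBook elements into no namespace before template matching."""
    for elem in root.iter():
        if elem.tag.startswith(PREFIX):
            elem.tag = elem.tag[len(PREFIX):]
    return root


def render_instruction(elem):
    """Hand-written handler for <instruction> (text-only content assumed)."""
    return f'<tt style="color: red; font-weight: bold;">{escape(elem.text or "")}</tt>'


doc = strip_db_namespace(remap(ET.fromstring(DOC), REMAPS, MANUAL))
for instruction in doc.iter("instruction"):  # matched in no namespace
    print(render_instruction(instruction))
```

Because instruction is excluded from remapping, it survives the namespace-stripping step under its own name, and the handler can match it as plain instruction, just like the last template in html-customized.xsl.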
As you can see, customizing DocBook V5.0 can be very easy. Dumb easy!