xmlguru.cz

Spreading the XML paradigm around

DocBook specialization made easy

2006-03-09

In many real-world projects it is unacceptable to use DocBook as is. You need to remove unused elements and at the same time add new, specific elements to improve the usability of the whole documentation solution. With DocBook V5.0, RELAX NG and a few clever tricks you can create a custom version of DocBook, together with processing rules, in a few minutes.


Table of Contents

Sample schema
Schema augmentation
From schema to stylesheet customization
Customizing customization

If you have ever tried to customize the DocBook DTD, you will agree with me that it is quite a complex and painful task. Although the DocBook DTD makes heavy use of parameter entities to keep it modular and easier to customize, the process is still complex and requires deep knowledge of general DTD rules and of the organization of the DocBook DTD.

The new DocBook version 5.0 is based on a very powerful schema language called RELAX NG. In RELAX NG it is very easy both to exclude existing elements and to add new ones.

But changing the schema alone is not the end of the story. If you add new elements to the schema, you must customize the processing tools to support those new elements. The DITA documentation system solves this problem by using specialization. Newly added elements use fixed values in the schema to identify a more generic element that has a similar processing model. Norman Walsh ("Norm" below) has shown that this specialization can also be implemented easily in DocBook.

While I can see the advantages of specialization, there is one big problem in the way it is implemented: it relies completely on meta-information in the schema, which is used as a guide for formatting fallback. This means that you cannot process your document without having the schema. On the other hand, the concept of specialization is a very interesting approach to extending a schema with new elements. In the following text I will show you how you can use specialization in DocBook without a run-time dependency on the schema.

Sample schema

Suppose that you want to create your own version of DocBook suited for writing texts about assembly language. You will probably need specific elements to capture things like registers or instructions. It is very easy to create such a schema extension for DocBook:

default namespace = "http://docbook.org/ns/docbook"
namespace db = "http://docbook.org/ns/docbook"

# new element for CPU registers
db.register =
  element register { text }

# new element for instructions
db.instruction =
  element instruction { text }
  
# combined pattern
asm.inlines = db.register | db.instruction
              
include "docbook.rnc"
  {
    # register and instruction are permitted everywhere in inline content
    db.general.inlines |= asm.inlines
  }

You can now use your new elements together with other DocBook elements inside your document.

<?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
<title>Assembler guide</title>
  
<para><register>AX</register> is 16bit register. You can use registers
<register>AH</register> and <register>AL</register> to access its
higher and lower parts.  <instruction>MOV</instruction> is instruction
for moving data between your registers and memory.</para>
  
</article>

With the customized schema you can easily edit and validate your document. So far, so good. But a problem arises once you want to process such a document with the DocBook XSL stylesheets. You will see warnings about the unhandled elements register and instruction. You can manually add the missing templates to the stylesheets, but let's try to automate this step.

Schema augmentation

Norm proposes a very clever way of augmenting the schema with the DocBook markup that should be used as a replacement for newly added elements. This allows you to map new elements to existing elements with similar formatting and a more generic meaning. Here is our schema asmbook.rnc, augmented with the mapping of the new elements:

default namespace = "http://docbook.org/ns/docbook"
namespace db = "http://docbook.org/ns/docbook"
namespace r  = "http://nwalsh.com/xmlns/schema-remap/"

# new element for CPU registers
db.register =
  [ r:remap [ db:emphasis [ role = "bold" ] ] ]
  element register { text }

# new element for instructions
db.instruction =
  [ r:remap [ db:code [ ] ] ]
  element instruction { text }
  
# combined pattern
asm.inlines = db.register | db.instruction
              
include "file:///e:/src/db/docbook/relaxng/docbook/docbook.rnc"
  {
    # register and instruction are permitted everywhere in inline content
    db.general.inlines |= asm.inlines
  }

As you can see, we use the special annotation element r:remap to declare that register should be formatted as bold text and instruction as a piece of code. This is sufficient for a start.

Now you can use Trang to convert the RNC schema into the RELAX NG XML syntax, which can be easily processed by XML tools (asmbook.rng). Norm implemented special functionality in the DocBook XSL2 stylesheets that reads the schema during the transformation to HTML and replaces elements on the fly. The problem with this approach is that it works only in the XSLT 2.0 based stylesheets, which are not complete yet, and it also depends on the schema being available during transformation.
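To make the annotated schema more tangible, here is a minimal Python sketch that pulls the r:remap annotations out of the RELAX NG XML syntax. It is only an illustration and is not part of Norm's stylesheets. The embedded fragment is an assumption about how Trang serializes the two annotated definitions (the annotation as the first child of the element pattern), and the names ASMBOOK_RNG, local_name and read_remaps are invented for this example.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
REMAP = "http://nwalsh.com/xmlns/schema-remap/"

# ASSUMPTION: approximate shape of the Trang output for the two
# annotated definitions in asmbook.rnc; the real file is much larger.
ASMBOOK_RNG = """\
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:db="http://docbook.org/ns/docbook"
         xmlns:r="http://nwalsh.com/xmlns/schema-remap/"
         ns="http://docbook.org/ns/docbook">
  <define name="db.register">
    <element name="register">
      <r:remap><db:emphasis role="bold"/></r:remap>
      <text/>
    </element>
  </define>
  <define name="db.instruction">
    <element name="instruction">
      <r:remap><db:code/></r:remap>
      <text/>
    </element>
  </define>
</grammar>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]


def read_remaps(rng_text):
    """Return {new element name: (replacement name, fixed attributes)}."""
    remaps = {}
    root = ET.fromstring(rng_text)
    for element in root.iter(f"{{{RNG}}}element"):
        remap = element.find(f"{{{REMAP}}}remap")
        if remap is None or len(remap) == 0:
            continue  # element pattern without a remap annotation
        target = remap[0]  # the generic DocBook element to fall back to
        remaps[element.get("name")] = (local_name(target.tag), dict(target.attrib))
    return remaps


if __name__ == "__main__":
    for name, (target, attrs) in read_remaps(ASMBOOK_RNG).items():
        print(name, "->", target, attrs)
```

The point of the sketch is only that the mapping is plain data sitting in the schema, so any XML tool can read it once and does not have to consult the schema again afterwards.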

From schema to stylesheet customization

My goal was to use Norm's schema annotations to generate stylesheet customizations automatically. The customization is generated just once, and once you have it you do not need the schema during transformation. Moreover, my solution integrates with the DocBook XSL stylesheets (the XSLT 1.0 based ones), which are the most widely used tool for processing DocBook content.

I created a short but slightly tricky stylesheet, rng2customization.xsl, which generates XSLT code from the schema annotations. This code hooks itself deep inside the stylesheets and renames elements before the actual processing.
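The renaming idea itself can be sketched outside XSLT. The following Python snippet is a hypothetical stand-in for what such a generated layer does conceptually; it is not the code that rng2customization.xsl produces. The names REMAPS, DOC and remap are invented here, and the mapping is copied by hand from the annotations in asmbook.rnc.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"

# Mapping copied by hand from the r:remap annotations in asmbook.rnc.
REMAPS = {
    "register": ("emphasis", {"role": "bold"}),
    "instruction": ("code", {}),
}

# Shortened version of the sample article.
DOC = """\
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
<title>Assembler guide</title>
<para><register>AX</register> is a register.
<instruction>MOV</instruction> moves data.</para>
</article>
"""


def remap(root, remaps=REMAPS):
    """Rename specialized elements in place to their generic equivalents."""
    prefix = f"{{{DB}}}"
    for node in root.iter():
        if not node.tag.startswith(prefix):
            continue  # leave elements from other namespaces alone
        name = node.tag[len(prefix):]
        if name in remaps:
            target, attrs = remaps[name]
            node.tag = prefix + target  # e.g. register becomes emphasis
            node.attrib.update(attrs)   # e.g. adds role="bold"
    return root


if __name__ == "__main__":
    ET.register_namespace("", DB)  # serialize with a default namespace
    print(ET.tostring(remap(ET.fromstring(DOC)), encoding="unicode"))
```

After this step the document contains only standard DocBook elements, which is why stylesheets that know nothing about register or instruction can format it.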

To generate the customization layer I simply processed my schema with this stylesheet:

saxon -o asmbook.xsl asmbook.rng rng2customization.xsl

The customization asmbook.xsl can be used together with any profiling-enabled stylesheet, e.g. fo/profile-docbook.xsl, html/profile-docbook.xsl or html/profile-chunk.xsl. We just need to create a simple driver file that combines the base stylesheet with the specialization layer. For HTML output we can use the following stylesheet (html.xsl):

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/profile-docbook.xsl"/>

<xsl:import href="asmbook.xsl"/>

</xsl:stylesheet>

And we are done. This stylesheet doesn't need to access the schema during processing.

Customizing customization

For more complex elements it is not always possible to map them to existing DocBook structures. In that situation you need to create the appropriate XSLT templates manually. This approach can be combined with our automatically generated specialization layer, but we must disable the layer for the elements that we handle manually. The technique is shown in the following example (html-customized.xsl):

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/profile-docbook.xsl"/>

<xsl:import href="asmbook.xsl"/>
  
<!-- List of elements for which we want to use custom processing -->  
<xsl:template match="db:instruction" mode="remap" xmlns:db="http://docbook.org/ns/docbook">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*" mode="remap"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="instruction">
  <tt style="color: red; font-weight: bold;"><xsl:apply-templates/></tt>
</xsl:template>  
  
</xsl:stylesheet>

The template in remap mode disables the default schema-driven remapping for the instruction element. The last template defines custom handling of the instruction element. Please note that this template references the element in no namespace, because the stylesheets were written primarily for older versions of DocBook that are not in a namespace. The stylesheets therefore remove the DocBook namespace from source documents prior to processing.
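The effect of that namespace removal, and the reason a plain match="instruction" works, can be mimicked in a few lines of Python. This is only a sketch of the idea under the assumption that every DocBook element simply loses its namespace; it is not how the stylesheets implement the step, and strip_docbook_namespace is a name invented for this example.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"


def strip_docbook_namespace(root):
    """Move every DocBook element into no namespace, in place."""
    prefix = f"{{{DB}}}"
    for node in root.iter():
        if node.tag.startswith(prefix):
            node.tag = node.tag[len(prefix):]
    return root


if __name__ == "__main__":
    para = ET.fromstring(
        '<para xmlns="http://docbook.org/ns/docbook">'
        "<instruction>MOV</instruction></para>"
    )
    strip_docbook_namespace(para)
    # The unqualified name now matches, as match="instruction" does in XSLT.
    print(para.find("instruction").text)
```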

As you can see, customizing DocBook V5.0 can be very easy. Dumb easy!

Copyright © Jiří Kosek, 2006–2018