Spreading the XML paradigm around
2013-11-25
DITA is probably the most popular XML-based authoring format these days. Content stored in this format is usually translated into other languages, so DITA can profit from integration with the new ITS 2.0 standard. Let's look at the ways to integrate these two technologies.
ITS provides a set of metadata that can be injected into your document in order to make localization and translation of your content easier. ITS is expressed using several attributes and elements in a dedicated ITS namespace. Namespaces are the common XML approach to extensibility, so one would think that adding ITS support to DITA would be easy. Unfortunately, the situation is more complicated.
Extensions to DITA are not made as a simple schema customization (e.g. see how ITS has been integrated into DocBook); instead one has to use so-called specialization. Specialization makes schema customization more complex, but in exchange DITA tools are able to provide generic processing of newly added elements and attributes without additional effort.
The trouble is that the current version of DITA prohibits the use of namespaces inside specializations (see this email thread). The reason for this prohibition is the legacy of DITA specialization: it still relies on DTDs, and DTDs do not support namespaces properly.
There are two ways to use ITS in DITA. The first uses namespaces but does not rely on proper specialization. This is fine as long as you are not exchanging documents with other parties. The second possibility is to store ITS markup in prefixed attributes, the same way as in HTML5. This means using an attribute named its-term instead of its:term, etc.
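To make the difference between the two spellings concrete, here is a minimal Python sketch that rewrites prefixed its-* attributes into their namespaced its:* form. It assumes the ITS 2.0 HTML5 naming convention, where camelCase names are written as hyphenated lowercase (its-loc-note stands for its:locNote). The function names are hypothetical and not part of any ITS or DITA tool.

```python
import re
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"


def its_html_name_to_xml(name):
    """Map a prefixed name such as 'its-loc-note' to the namespaced
    attribute name '{ITS_NS}locNote' (ElementTree's Clark notation)."""
    local = name[len("its-"):]
    # Turn hyphen-separated words back into camelCase: loc-note -> locNote
    local = re.sub(r"-([a-z])", lambda m: m.group(1).upper(), local)
    return "{%s}%s" % (ITS_NS, local)


def namespace_its_attributes(root):
    """Rewrite every its-* attribute in the tree to its namespaced form."""
    for elem in root.iter():
        # Copy the names first, because the attribute dict is modified below
        for name in [n for n in elem.attrib if n.startswith("its-")]:
            elem.set(its_html_name_to_xml(name), elem.attrib.pop(name))
    return root


if __name__ == "__main__":
    # Serialize with the conventional "its" prefix instead of ns0
    ET.register_namespace("its", ITS_NS)
    topic = ET.fromstring(
        '<topic id="demo"><body><p its-term="yes">Topic paragraph</p></body></topic>'
    )
    namespace_its_attributes(topic)
    print(ET.tostring(topic, encoding="unicode"))
```

A conversion like this would be needed before feeding content that uses prefixed attributes to a tool that only understands namespaced ITS markup.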
The best schema language for hacking is RELAX NG, so I have decided to use it for this DITA+ITS integration. RELAX NG schemas for DITA are available as part of the DITA-NG project.
The customization is pretty straightforward. We need to allow a few ITS elements, such as rules and stand-off markup, inside a metadata container. The only suitable container is the unknown element. Although the foreign element has a friendlier name, it is meant for content that is going to be rendered, which is not the case here. We also want to allow ITS local attributes almost anywhere. DITA already has a translate attribute, so we do not have to include the its:translate attribute. The following example shows what such markup can look like inside a simple DITA topic.
Example 1. DITA document with ITS markup
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="topic-its.rnc"?>
<topic id="demo" xmlns:its="http://www.w3.org/2005/11/its">
  <title>Topic title</title>
  <prolog>
    <metadata>
      <unknown>
        <its:rules version="2.0">
          <its:translateRule selector="//*" translate="no"/>
        </its:rules>
      </unknown>
    </metadata>
  </prolog>
  <body>
    <p its:term="yes">Topic paragraph</p>
    <p translate="yes">Foo bar</p>
  </body>
</topic>
Schema customization is quite easy, as both the DITA and ITS RELAX NG schemas are written in a modular way. The only drawback is that there is no single DITA schema but many schemas, one for each topic type, map, etc., so a similar customization must be made for each of them.
Example 2. Schema for DITA topic combined with ITS markup
namespace its = "http://www.w3.org/2005/11/its"

# Include DITA schema for particular module
include "urn:dita-ng:dita:rnc:topic.rnc" {
  # Add ITS rules and standoff inside unknown element
  unknown.element = element unknown {
    unknown.attlist,
    (any & its-rules? & its-standoff?)
  }

  # Prevent ambiguity inside unknown element
  any = (topic.element
         | element * - (topic | its:* ) { attribute * { text }*, any }
         | text)*
}

# Include ITS schema
include "http://www.w3.org/TR/its20/schemas/its20.rnc" {
  # Disable ITS directionality
  its-attribute.dir = empty

  # Prevent conflicting ID problem
  its-foreign-element = element * - its:* {
    (its-foreign-attribute* | text | its-foreign-element)*
  }
}

# Add local ITS attributes almost everywhere
base-attribute-extensions &= its-local.attributes
This schema allows you to validate or comfortably edit DITA+ITS content in tools like the oXygen XML Editor.
Because DITA content is XML, you can use any ITS 2.0 aware tool with it, such as CAT tools or XLIFF extractors.
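As a rough illustration of why namespaced markup is convenient for generic XML tooling, the following Python sketch reads local ITS markup from a topic using only the standard library. It is not a real ITS processor: it ignores global rules (its:rules) and inheritance, and it only looks at local attributes. The sample is adapted from Example 1 (with translate="no" on the second paragraph so there is something to find), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Adapted from Example 1; the prolog with global rules is left out on purpose
SAMPLE = """<topic id="demo" xmlns:its="http://www.w3.org/2005/11/its">
  <title>Topic title</title>
  <body>
    <p its:term="yes">Topic paragraph</p>
    <p translate="no">Foo bar</p>
  </body>
</topic>"""


def element_text(elem):
    """Return all text inside an element, trimmed of surrounding whitespace."""
    return "".join(elem.itertext()).strip()


def find_terms(root):
    """Return the text of every element locally marked with its:term="yes"."""
    attr = "{%s}term" % ITS_NS
    return [element_text(e) for e in root.iter() if e.get(attr) == "yes"]


def find_untranslatable(root):
    """Return the text of every element carrying DITA's own translate="no"."""
    return [element_text(e) for e in root.iter() if e.get("translate") == "no"]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print("terms:", find_terms(root))
    print("do not translate:", find_untranslatable(root))
```

The same lookup would not work for its-term attributes without an extra mapping step, which is one practical argument for the namespace-based approach.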
Another possibility is to propagate ITS markup into the HTML code generated from DITA content. The most common tool for processing DITA content is DITA-OT. I have created a simple DITA-OT plugin which adds ITS support to the HTML-based output formats. Just install it like any other plugin.
ITS support in the DITA-OT stylesheets makes some very nice things easy to achieve. Because ITS markup is propagated into the generated HTML code, it can be used by various translation widgets. This matters these days: companies are trying to save costs, and providing only machine-translated documents for languages of marginal markets is one way to go. ITS markup can greatly improve the precision of machine translation.
I intentionally haven't provided a proper DITA specialization that would use its-* attributes. Such an approach would render DITA content incompatible with XML-based ITS tools. I think the correct way to use ITS inside DITA content is to rely on XML namespaces.
If you need DITA+ITS and have to use DTD or W3C XML Schema based schemas, let me know and I can try to provide such schemas as well. And of course any other comment related to ITS usage in DITA is more than welcome.