Using ITS 2.0 in DITA

2013-11-25

DITA is probably the most popular XML-based authoring format these days. Content stored in this format is usually translated into other languages, so DITA can profit from integration with the new ITS 2.0 standard. Let's see what ways there are to integrate the two technologies.

Table of Contents

Extending DITA schema with ITS
Processing ITS markup inside DITA
Next steps

ITS provides a set of metadata that can be injected into your document to make localization and translation of your content easier. ITS is expressed using several attributes and elements in a dedicated ITS namespace. Namespaces are a common XML approach to extensibility, so one would think that adding ITS support to DITA would be easy. Unfortunately, the situation is more complicated.

Extensions to DITA are not made as a simple schema customization (e.g. see how ITS has been integrated into DocBook); instead, one has to use so-called specialization. Specialization makes schema customization more complex, but in exchange DITA tools are able to provide generic processing of newly added elements and attributes without additional effort.

The trouble is that the current version of DITA prohibits the use of namespaces inside specializations (see this email thread). The reason for this prohibition is the legacy of DITA specialization: it still relies on DTDs, and DTDs do not support namespaces properly.

There are two ways to use ITS in DITA. The first uses namespaces but does not rely on proper specialization. This is fine as long as you are not exchanging documents with other parties. The second is to store ITS markup in prefixed attributes, the same way as in HTML5. This means using an attribute named its-term instead of its:term, etc.
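
To make the difference between the two notations concrete, here is a minimal Python sketch that rewrites namespaced ITS attributes into the prefixed, HTML5-like form. It is only an illustration, not part of any tooling mentioned in this article. It assumes the ITS 2.0 naming rule for HTML attributes (the prefix its-, plus a hyphen before every uppercase letter of the local name, all lowercased). The function names and the sample document are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"


def its_to_prefixed(local_name):
    """Map an ITS local name such as 'locNote' to 'its-loc-note' (assumed rule)."""
    hyphenated = re.sub(r"(?<!^)([A-Z])", r"-\1", local_name)
    return "its-" + hyphenated.lower()


def flatten_its_attributes(root):
    """Replace every attribute in the ITS namespace with its prefixed twin."""
    ns_prefix = "{" + ITS_NS + "}"
    for elem in root.iter():
        # Copy the keys first, because the attribute dict is modified in the loop.
        for qname in list(elem.attrib):
            if qname.startswith(ns_prefix):
                value = elem.attrib.pop(qname)
                elem.set(its_to_prefixed(qname[len(ns_prefix):]), value)
    return root


# Illustrative sample only; the locNote value is invented.
SAMPLE = """<topic id="demo" xmlns:its="http://www.w3.org/2005/11/its">
  <title>Topic title</title>
  <body>
    <p its:term="yes" its:locNote="Product name">Topic paragraph</p>
  </body>
</topic>"""

root = flatten_its_attributes(ET.fromstring(SAMPLE))
p = root.find("body/p")
print(sorted(p.attrib.items()))
```

After the conversion the paragraph carries its-term and its-loc-note instead of its:term and its:locNote, which is exactly the form that namespace-aware XML ITS tools no longer recognize.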

Extending DITA schema with ITS

The best schema language for hacking is RELAX NG, so I have decided to use it for this DITA+ITS integration. RELAX NG schemas for DITA are available as part of the DITA-NG project.

Customization is pretty straightforward. We need to allow a few ITS elements, such as rules and stand-off markup, inside a metadata container. The only suitable container is the unknown element. Although the foreign element has a friendlier name, it is meant for content that is going to be rendered, which is not the case here. We also want to allow ITS local attributes almost anywhere. DITA already has a translate attribute, so we do not have to include the its:translate attribute. The following example shows what such markup can look like inside a simple DITA topic.

Example 1. DITA document with ITS markup

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="topic-its.rnc"?>
<topic id="demo" xmlns:its="http://www.w3.org/2005/11/its">
  <title>Topic title</title>
  <prolog>
    <metadata>
      <unknown>
        <its:rules version="2.0">
          <its:translateRule selector="//*" translate="no"/>
        </its:rules>
      </unknown>
    </metadata>
  </prolog>  
  <body>
    <p its:term="yes">Topic paragraph</p>
    <p translate="yes">Foo bar</p>
  </body>
</topic>

Schema customization is quite easy, as both the DITA and the ITS RELAX NG schemas are written in a modular way. The only drawback is that there is no single DITA schema, but a separate schema for each topic type, map, etc., so a similar customization must be made for each of them.

Example 2. Schema for DITA topic combined with ITS markup

namespace its = "http://www.w3.org/2005/11/its"

# Include DITA schema for particular module
include "urn:dita-ng:dita:rnc:topic.rnc"
{

# Add ITS rules and standoff inside unknown element
unknown.element =
  element unknown { unknown.attlist, (any & its-rules? & its-standoff?)  }
  
# Prevent ambiguity inside unknown element
any =
  (topic.element
   | element * - (topic | its:* ) {
       attribute * { text }*,
       any
     }
   | text)*  
}

# Include ITS schema
include "http://www.w3.org/TR/its20/schemas/its20.rnc" 
{
   # Disable ITS directionality
   its-attribute.dir = empty
   
   # Prevent conflicting ID problem
   its-foreign-element = element * - its:* { (its-foreign-attribute* | text | its-foreign-element)* }
}

# Add local ITS attributes almost everywhere
base-attribute-extensions &= its-local.attributes

This schema allows you to validate or comfortably edit DITA+ITS content in tools like the oXygen XML Editor.

Processing ITS markup inside DITA

As DITA content is XML, you can use any ITS 2.0-aware tool with it, such as various CAT tools, XLIFF extractors, …
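
As a rough illustration of why namespaced markup works with generic XML tooling, the following Python sketch finds the ITS markup in the topic from Example 1 purely by its namespace URI, without knowing anything about DITA. It is a toy under stated assumptions, not a real ITS processor: it neither evaluates the selectors nor applies ITS precedence rules, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
NS = {"its": ITS_NS}

# Topic from Example 1, without the XML declaration and the xml-model PI.
TOPIC = """<topic id="demo" xmlns:its="http://www.w3.org/2005/11/its">
  <title>Topic title</title>
  <prolog>
    <metadata>
      <unknown>
        <its:rules version="2.0">
          <its:translateRule selector="//*" translate="no"/>
        </its:rules>
      </unknown>
    </metadata>
  </prolog>
  <body>
    <p its:term="yes">Topic paragraph</p>
    <p translate="yes">Foo bar</p>
  </body>
</topic>"""

root = ET.fromstring(TOPIC)

# Global rules: its:translateRule elements inside any its:rules element.
rules = [
    (rule.get("selector"), rule.get("translate"))
    for rule in root.iterfind(".//its:rules/its:translateRule", NS)
]

# Local markup: every attribute that lives in the ITS namespace.
ns_prefix = "{" + ITS_NS + "}"
local = [
    (elem.tag, qname[len(ns_prefix):], value)
    for elem in root.iter()
    for qname, value in elem.attrib.items()
    if qname.startswith(ns_prefix)
]

print(rules)
print(local)
```

The same lookup would find nothing if the document used its-term style attributes, because those are ordinary attributes in no namespace.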

Another possibility is to propagate ITS markup into the HTML code generated from DITA content. The most common tool for processing DITA content is DITA-OT. I have created a simple DITA-OT plugin which adds ITS support to HTML-based output formats. Just install it like any other plugin.

ITS support in the DITA-OT stylesheets makes it easy to achieve very nice things. As ITS markup is propagated into the generated HTML code, it can be used by various translation widgets. This is really important these days: companies are trying to cut costs, and providing only machine-translated documents for the languages of marginal markets is one way to go. ITS markup can greatly improve the precision of machine translation.

Next steps

I intentionally haven't provided a proper DITA specialization that would use its-* attributes. Such an approach would render DITA content incompatible with XML-based ITS tools. I think the correct approach to using ITS inside DITA content is to rely on XML namespaces.

If you need DITA+ITS and have to use DTD- or W3C XML Schema-based schemas, let me know; I can try to provide such schemas as well. And of course any other comment related to ITS usage in DITA is more than welcome.

xmlguru.cz