xmlguru.cz

Spreading the XML paradigm around

Using GNU Gettext with Saxon

2006-10-21

There are many existing solutions for creating localized XSLT stylesheets that can output content in different languages. But none of them offers the robustness, performance, flexibility and elegance of GNU Gettext, a popular localization library available for almost any programming language. Because I needed a good XSLT localization library for one project, I decided to create an extension that allows GNU Gettext to be used in XSLT 2.0 stylesheets processed with Saxon 8.x/9.x.


Table of Contents

Getting saxon-gettext
Marking text to localize in stylesheets
Extracting text to localize
Compiling translations
Testing translations
Overview of extension elements
Overview of extension functions
Updating translations
Conclusion

The following instructions assume that you have at least a basic knowledge of Gettext principles. If not, please read a Gettext tutorial first.

Getting saxon-gettext

UPDATE: A new distribution adjusted for Saxon 9.1 is available from http://www.kosek.cz/sw/saxon-gettext/saxon-gettext-20101124.zip.

The complete distribution of saxon-gettext can be downloaded from http://www.kosek.cz/sw/saxon-gettext/saxon-gettext-20070205.zip. The distribution contains both sources and binary packages.

Marking text to localize in stylesheets

The saxon-gettext XSLT extension defines several XSLT extension instructions and XPath functions which can be used to mark translatable text. These instructions and functions belong to the http://kosek.cz/cz.kosek.saxon.gettext.Gettext namespace. The following examples assume that the prefix t was declared for this namespace. For example:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:t="http://kosek.cz/cz.kosek.saxon.gettext.Gettext"
                extension-element-prefixes="t"
                version="2.0">
  …
</xsl:stylesheet>

Now, in order to localize your stylesheets, you must wrap every translatable text in a saxon-gettext instruction or function. So

<h1>Hello world!</h1>

has to be changed to

<h1><t:gettext>Hello world!</t:gettext></h1>

or you can use the abbreviated form, which is handier:

<h1><t:_>Hello world!</t:_></h1>

If you have some text to localize inside an XPath expression, you similarly have to use the functions t:gettext or t:_ to mark it. Note that an XPath 2.0 conditional expression always needs an else branch. For example, you would change

<xsl:value-of select="if (@sale) then 'Discounted item' else ()"/>

to

<xsl:value-of select="if (@sale) then t:_('Discounted item') else ()"/>

Extracting text to localize

GNU Gettext comes with a great utility called xgettext, which can extract all translatable texts from your source code. It supports many languages, but XSLT is of course missing. However, it is quite easy to write an XSLT transformation that scans your stylesheets and extracts all texts to translate. The only difficult part is parsing XPath expressions to find uses of the t:gettext and t:_ functions. Fortunately, an XPath parsing library is available as part of an XQuery parser, and David Carlisle wrote a set of very useful wrappers around it called XQ2XML.

There is a simple stylesheet, extract.xsl, in the samples subdirectory of the saxon-gettext distribution. You can use it to extract translatable content from an XSLT stylesheet into a PO file. The stylesheet is able to process xsl:import and xsl:include instructions as well. You will need the XPath parsing library on the classpath in order to use this stylesheet.[1] The command below uses ; as the classpath separator, as on Windows; on Unix-like systems use : instead.

java -cp saxon8.jar;xpath.jar net.sf.saxon.Transform -o messages.po test1.xsl extract.xsl
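To make the idea of the extraction step more concrete, here is a minimal Java sketch of the easy half of the job: collecting the content of t:gettext and t:_ elements and printing it as empty PO entries. It is not the code of extract.xsl. The class and method names are made up for illustration, and the sketch deliberately ignores the t:gettext() and t:_() function calls inside XPath expressions as well as xsl:import and xsl:include, which the real stylesheet handles.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of saxon-gettext.
final class ExtractSketch {
    static final String NS = "http://kosek.cz/cz.kosek.saxon.gettext.Gettext";

    /** Collects the text of every t:gettext and t:_ element, in document order, without duplicates. */
    static Set<String> collectMsgids(String stylesheetXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match elements by namespace URI
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(stylesheetXml)));
            Set<String> msgids = new LinkedHashSet<>();
            NodeList candidates = doc.getElementsByTagNameNS(NS, "*");
            for (int i = 0; i < candidates.getLength(); i++) {
                Element element = (Element) candidates.item(i);
                String name = element.getLocalName();
                if (name.equals("gettext") || name.equals("_")) {
                    msgids.add(element.getTextContent());
                }
            }
            return msgids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse stylesheet", e);
        }
    }

    /** Formats one untranslated PO entry, escaping backslashes, quotes and newlines. */
    static String toPoEntry(String msgid) {
        String escaped = msgid.replace("\\", "\\\\")
                              .replace("\"", "\\\"")
                              .replace("\n", "\\n");
        return "msgid \"" + escaped + "\"\nmsgstr \"\"\n";
    }

    public static void main(String[] args) {
        String stylesheet =
            "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:t='" + NS + "' version='2.0'>"
            + "<xsl:template match='/'><h1><t:_>Hello world!</t:_></h1></xsl:template>"
            + "</xsl:stylesheet>";
        for (String msgid : collectMsgids(stylesheet)) {
            System.out.println(toPoEntry(msgid));
        }
    }
}
```

A real PO file also needs the header entry with the charset declaration, which this sketch does not write.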

The resulting file messages.po should now be translated into a different language and stored under a different name. Let's have some fun with displaying non-Latin characters and create a Russian translation. The content of ru.po should look like this:

msgid ""
msgstr ""
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "Hello world!"
msgstr "Здравствуй, мир!"

There are many specialized editors for PO files; one of them is poEdit. Using such a tool is highly recommended when you have a larger amount of text to translate.

Compiling translations

During program runtime (which in our case means during the XSLT transformation), Gettext uses a binary format for looking up translations. This is more efficient than using the text format, but it also means that the PO file must be converted to binary form. saxon-gettext calls Gettext from Java, where translations are not stored in MO files as usual but must be compiled into byte-code. For the conversion you will need a complete Gettext installation for your operating system. On Linux there is the gettext package; on Windows it is easiest to install Cygwin and select the gettext package during installation.

Now we are ready to compile the PO files (ru.po and messages.po) into byte-code for later use. Type the following commands (on Windows you must be running the Cygwin shell):

msgfmt --java2 -d . -l ru ru.po
msgfmt --java2 -d . -l en messages.po

Several class files (.class) should be created. If you get an error message like:

msgfmt: Java compiler not found, try installing gcj or set $JAVAC
msgfmt: compilation of Java class failed, please try --verbose or set $JAVAC

then you are facing a known issue with invoking Windows Java from the Cygwin environment. The workaround is to create a simple shell script, javac.sh, which points to your copy of the JDK:

#!/bin/sh
/cygdrive/c/j2sdk1.4.2/bin/javac $1 $2 $3 `cygpath -w -p $4`

Then set the JAVAC environment variable to point to this script:

export JAVAC=./javac.sh

After those changes you should be able to compile PO files to byte-code.
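The classes produced by msgfmt --java2 are subclasses of java.util.ResourceBundle, named after the domain and the locale. The following Java sketch illustrates that naming and the usual Gettext lookup rule (return the original text when no translation exists). The RussianCatalog class is a hand-written stand-in for a generated class, and the helper names are hypothetical; this is not the code that msgfmt or saxon-gettext actually uses.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical illustration, not part of saxon-gettext.
final class CatalogSketch {

    /** Hand-written stand-in for a generated catalog class such as Messages_ru. */
    static final class RussianCatalog extends ResourceBundle {
        private final Map<String, String> table = new HashMap<>();

        RussianCatalog() {
            table.put("Hello world!", "Здравствуй, мир!");
        }

        @Override
        protected Object handleGetObject(String key) {
            return table.get(key); // null means "no translation stored"
        }

        @Override
        public Enumeration<String> getKeys() {
            return Collections.enumeration(table.keySet());
        }
    }

    /** Class name under which Java's default bundle lookup expects a catalog. */
    static String catalogClassName(String domain, String language) {
        return ResourceBundle.Control
                .getControl(ResourceBundle.Control.FORMAT_DEFAULT)
                .toBundleName(domain, Locale.forLanguageTag(language));
    }

    /** Gettext-style lookup: fall back to the original text if it is not translated. */
    static String gettext(ResourceBundle catalog, String msgid) {
        try {
            return catalog.getString(msgid);
        } catch (MissingResourceException e) {
            return msgid;
        }
    }

    public static void main(String[] args) {
        ResourceBundle ru = new RussianCatalog();
        System.out.println(catalogClassName("Messages", "ru"));
        System.out.println(gettext(ru, "Hello world!"));
        System.out.println(gettext(ru, "Not translated yet"));
    }
}
```

This is also why the directory that holds the compiled classes has to be on the classpath when the transformation runs.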

Testing translations

At this point you can test whether saxon-gettext is working for you. There is a simple stylesheet called test1.xsl:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
     version="2.0"
     xmlns:t="http://kosek.cz/cz.kosek.saxon.gettext.Gettext"
     extension-element-prefixes="t">

<xsl:param name="lang">en</xsl:param>

<xsl:template match="/">
  <t:init locale="{$lang}" domain="Messages"/>

  <h1><t:_>Hello world!</t:_></h1>
</xsl:template>

</xsl:stylesheet>

By default English is used and you will get English output when you run this transformation against any XML document:

java -cp saxon8.jar;saxon-gettext.jar net.sf.saxon.Transform test1.xsl test1.xsl
<h1>Hello world!</h1>

Now we can change the language simply by setting the lang parameter:

java -cp saxon8.jar;saxon-gettext.jar;. net.sf.saxon.Transform test1.xsl test1.xsl "lang=ru"
<h1>Здравствуй, мир!</h1>

As you can see, the text is translated on the fly; there is no need to change the stylesheet. Do not forget that the classes with the translated texts must be on the classpath.

Overview of extension elements

<t:init locale="language" domain="domain"/>

This instruction switches all subsequent messages to the given language. The domain attribute specifies the name prefix of the classes that contain the translations.

<t:gettext>text</t:gettext>, <t:_>text</t:_>

This instruction translates text into the language chosen by t:init.

Overview of extension functions

t:gettext(text), t:_(text)

This function translates text into the language chosen by t:init.

t:ngettext(singular, plural, n)

This function translates text and selects the correct singular or plural form depending on the number of items (n) specified.

t:format(text, sequence of values)

This function replaces {n} marks in text with the n-th item of the supplied sequence.
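The following Java sketch illustrates the two ideas behind t:ngettext and t:format: choosing a plural form from a count, and filling numbered placeholders. It is an illustration under assumptions, not the implementation of saxon-gettext. The class and method names are hypothetical, the Russian rule is the one commonly used in the Plural-Forms header of Russian PO files, and the placeholders follow java.text.MessageFormat, where numbering starts at {0}. Whether t:format counts from 0 or 1 is not stated above, so check it against the distribution.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical illustration, not part of saxon-gettext.
final class PluralFormatSketch {

    /** English rule: form 0 (singular) for exactly one item, form 1 (plural) otherwise. */
    static int englishPluralIndex(long n) {
        return n == 1 ? 0 : 1;
    }

    /** Russian rule with three forms, as commonly given in Plural-Forms headers. */
    static int russianPluralIndex(long n) {
        if (n % 10 == 1 && n % 100 != 11) {
            return 0; // 1, 21, 31, ...
        }
        if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) {
            return 1; // 2-4, 22-24, ...
        }
        return 2;     // 0, 5-20, 25-30, ...
    }

    /** What an ngettext call returns when no translation is available. */
    static String ngettextFallback(String singular, String plural, long n) {
        return englishPluralIndex(n) == 0 ? singular : plural;
    }

    /** Replaces {0}, {1}, ... with the supplied values (MessageFormat semantics). */
    static String format(String pattern, Object... values) {
        // Locale.ROOT keeps number formatting independent of the machine's locale.
        return new MessageFormat(pattern, Locale.ROOT).format(values);
    }

    public static void main(String[] args) {
        int count = 3;
        String pattern = ngettextFallback("{0} item in {1}", "{0} items in {1}", count);
        System.out.println(format(pattern, count, "cart"));
        System.out.println(russianPluralIndex(21));
    }
}
```

With a real catalog, the forms for each language come from the translated PO entry (msgstr[0], msgstr[1], ...) rather than from the two source strings.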

Updating translations

If you add new texts in primary language into stylesheets, it is very easy to update existing translations.

  1. Extract all text for translation from stylesheets, for example:

    java -cp saxon8.jar;xpath.jar net.sf.saxon.Transform -o messages.po test1.xsl extract.xsl
  2. Merge new texts with existing translations:

    msgmerge -U ru.po messages.po
  3. Fill in the missing translations in ru.po.

  4. Compile translations into byte-code:

    msgfmt --java2 -d . -l ru ru.po
    msgfmt --java2 -d . -l en messages.po

Conclusion

I planned to release saxon-gettext much earlier, but I had a hard time finding spare time to write even this very basic documentation. If something is unclear or if you find a bug, just let me know. I will try to update the software and the instructions.



[1] You can get xpath.jar from http://www.w3.org/2005/qt-applets/xgrammar.zip. It is inside the xgrammar/grammar/parser directory.

Copyright © Jiří Kosek, 2006–2018