Spreading the XML paradigm around
2006-10-21
There are many existing solutions for creating localized XSLT stylesheets that can output content in different languages. But none of them offers the robustness, performance, flexibility and elegance of GNU Gettext, a popular localization library available for almost any programming language. Because I needed a good XSLT localization library for one project, I decided to create an extension that allows GNU Gettext to be used in XSLT 2.0 stylesheets processed with Saxon 8.x/9.x.
The following instructions assume that you have at least a basic knowledge of Gettext principles. If not, please read a Gettext tutorial first.
UPDATE: A new distribution adjusted for Saxon 9.1 is available from http://www.kosek.cz/sw/saxon-gettext/saxon-gettext-20101124.zip.
The complete distribution of saxon-gettext can be downloaded from http://www.kosek.cz/sw/saxon-gettext/saxon-gettext-20070205.zip. The distribution contains both sources and binary packages.
The saxon-gettext XSLT extension defines several XSLT extension instructions and XPath functions which can be used to mark translatable text. These instructions and functions belong to the http://kosek.cz/cz.kosek.saxon.gettext.Gettext namespace. The following examples assume that the prefix t was declared for this namespace. For example:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:t="http://kosek.cz/cz.kosek.saxon.gettext.Gettext"
                extension-element-prefixes="t"
                version="2.0">
  …
</xsl:stylesheet>
Now, in order to localize your stylesheets, you must wrap every translatable text in a saxon-gettext instruction or function. So
<h1>Hello world!</h1>
has to be changed to
<h1><t:gettext>Hello world!</t:gettext></h1>
or you can use the abbreviated form, which is handier:
<h1><t:_>Hello world!</t:_></h1>
If you have some text to localize inside an XPath expression, you similarly have to use the functions t:gettext or t:_ to mark those texts. For example, you would change
<xsl:value-of select="if (@sale) then 'Discounted item' else ()"/>
to
<xsl:value-of select="if (@sale) then t:_('Discounted item') else ()"/>
GNU Gettext comes with a great utility called xgettext, which is able to extract all texts to translate from your source code. It supports many languages, but XSLT is not among them. However, it is quite easy to write an XSLT transformation which will scan your stylesheets and extract all texts to translate. The only difficult part is parsing XPath expressions to find uses of the t:gettext function. Fortunately, there is an XPath parsing library available as a part of an XQuery parser, and David Carlisle wrote a set of very useful wrappers called XQ2XML around it.
There is a simple stylesheet, extract.xsl, in the samples subdirectory of the saxon-gettext distribution. You can use it to extract translatable content from an XSLT stylesheet into a PO file. The stylesheet is able to process xsl:import and xsl:include instructions as well. You will need the XPath parsing library on the classpath in order to use this stylesheet.[1]
java -cp saxon8.jar;xpath.jar net.sf.saxon.Transform -o messages.po test1.xsl extract.xsl
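To make the idea of extraction more concrete, here is a minimal Java sketch of the easy half of the job: collecting the text of t:gettext and t:_ elements and printing empty PO entries for them. It is not the code of extract.xsl; the class PoExtractSketch and its methods are hypothetical names made up for this illustration, and the sketch ignores XPath function calls, xsl:import and xsl:include, all of which the real stylesheet handles.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of saxon-gettext: finds t:gettext and t:_
// elements in a stylesheet and emits one untranslated PO entry per text.
class PoExtractSketch {
    static final String NS = "http://kosek.cz/cz.kosek.saxon.gettext.Gettext";

    // Escape backslashes, double quotes and newlines for a PO string literal.
    static String poEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    static String extract(String stylesheetXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(stylesheetXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse stylesheet", e);
        }

        // LinkedHashSet drops duplicate texts and keeps insertion order.
        Set<String> msgids = new LinkedHashSet<>();
        for (String name : new String[] {"gettext", "_"}) {
            NodeList nodes = doc.getElementsByTagNameNS(NS, name);
            for (int i = 0; i < nodes.getLength(); i++) {
                msgids.add(nodes.item(i).getTextContent());
            }
        }

        StringBuilder po = new StringBuilder();
        for (String id : msgids) {
            po.append("msgid \"").append(poEscape(id)).append("\"\n");
            po.append("msgstr \"\"\n\n");
        }
        return po.toString();
    }

    public static void main(String[] args) {
        String xsl = "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
                + " xmlns:t='" + NS + "' version='2.0'>"
                + "<xsl:template match='/'><h1><t:_>Hello world!</t:_></h1></xsl:template>"
                + "</xsl:stylesheet>";
        System.out.print(extract(xsl));
    }
}
```

Texts marked inside XPath expressions are exactly what this sketch cannot see, which is why extract.xsl relies on the XPath parsing library.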
The resulting file messages.po should now be translated into another language and stored under a different name. Let's have some fun with displaying non-Latin characters and create a Russian translation. The content of ru.po should look like:
msgid ""
msgstr ""
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "Hello world!"
msgstr "Здравствуй, мир!"
There are many specialized editors for PO files. One of them is poEdit, and using such a tool is highly recommended when there is a larger amount of text to translate.
At runtime (which in our case means during the XSLT transformation), Gettext uses a binary format for looking up translations. This is more efficient than using the text format, but it also means that the PO file must be converted to binary form. saxon-gettext calls Gettext from Java, where translations are not stored in MO files as usual but must be compiled into byte-code. For the conversion you will need a complete Gettext installation for your operating system. On Linux there is the gettext package; on Windows it is easiest to install Cygwin and select the gettext package during installation.
Now we are ready to compile the PO file (ru.po) into byte-code for later use. Type the following commands (on Windows you must be running the Cygwin shell):
msgfmt --java2 -d . -l ru ru.po
msgfmt --java2 -d . -l en messages.po
Several classes (.class) should be created. If you get an error message like:
msgfmt: Java compiler not found, try installing gcj or set $JAVAC
msgfmt: compilation of Java class failed, please try --verbose or set $JAVAC
then you are facing a known issue with invoking Windows Java from the Cygwin environment. The workaround is to create a simple shell script, javac.sh, which points to your copy of the JDK:
#!/bin/sh
/cygdrive/c/j2sdk1.4.2/bin/javac $1 $2 $3 `cygpath -w -p $4`
Then set the JAVAC environment variable to point to this script:
export JAVAC=./javac.sh
After those changes you should be able to compile PO files to byte-code.
At this point you can test whether saxon-gettext is working for you. There is a simple stylesheet called test1.xsl:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0"
                xmlns:t="http://kosek.cz/cz.kosek.saxon.gettext.Gettext"
                extension-element-prefixes="t">

  <xsl:param name="lang">en</xsl:param>

  <xsl:template match="/">
    <t:init locale="{$lang}" domain="Messages"/>
    <h1><t:_>Hello world!</t:_></h1>
  </xsl:template>

</xsl:stylesheet>
By default English is used and you will get English output when you run this transformation against any XML document:
java -cp saxon8.jar;saxon-gettext.jar net.sf.saxon.Transform test1.xsl test1.xsl
<h1>Hello world!</h1>
Now we can simply change the language by using the lang parameter:
java -cp saxon8.jar;saxon-gettext.jar;. net.sf.saxon.Transform test1.xsl test1.xsl "lang=ru"
<h1>Здравствуй, мир!</h1>
As you can see, the text is translated on the fly; there is no need to change the stylesheet. Do not forget that the classes with translated texts must be on the classpath.
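If the output stays in English, the compiled classes are most likely missing from the classpath. The following Java sketch is one way to check that independently of Saxon. It assumes that the classes produced by msgfmt --java2 are ordinary java.util.ResourceBundle subclasses named after the domain and locale (for example Messages_ru), so the standard ResourceBundle.getBundle call can find them. The class CatalogCheck and its lookup method are hypothetical names and not part of saxon-gettext.

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical diagnostic, not part of saxon-gettext.
class CatalogCheck {
    // Looks up msgid in the compiled catalog for the given domain and language.
    // Returns msgid unchanged when no catalog or no matching entry is found.
    // Note: getBundle may fall back to a catalog for the JVM's default locale.
    static String lookup(String domain, String lang, String msgid) {
        try {
            ResourceBundle bundle =
                    ResourceBundle.getBundle(domain, Locale.forLanguageTag(lang));
            return bundle.getString(msgid);
        } catch (MissingResourceException | ClassCastException e) {
            // No class on the classpath, no entry for msgid, or a plural entry.
            return msgid;
        }
    }

    public static void main(String[] args) {
        // "Messages" is the domain used by test1.xsl.
        System.out.println(lookup("Messages", "ru", "Hello world!"));
    }
}
```

Run it with the same classpath entry for the compiled classes as in the Saxon command (the trailing "." in the example above). If it prints the untranslated text, the Russian catalog was not found.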
<t:init locale="language" domain="domain"/>
This instruction switches further messages to language. The prefix of classes with translations is specified by domain.
<t:gettext>text</t:gettext>, <t:_>text</t:_>
This instruction translates text into the language chosen by t:init.
t:gettext(text), t:_(text)
This function translates text into the language chosen by t:init.
t:ngettext(singular, plural, n)
This function translates text and uses the proper form of the text depending on the number of items (n) specified.
t:format(text, sequence of values)
This function replaces {n} marks in text by the n-th item in the supplied sequence.
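The last two functions are easier to picture with a small example. The Java sketch below is only an illustration of the two ideas, not the saxon-gettext implementation: pluralIndexRu encodes the usual Gettext plural rule for Russian (three forms), and format uses java.text.MessageFormat as a stand-in for placeholder substitution. Whether t:format uses MessageFormat internally, and whether its n is counted from zero as in MessageFormat, is an assumption here. All class and method names are hypothetical.

```java
import java.text.MessageFormat;

// Hypothetical illustration, not the saxon-gettext implementation.
class PluralFormatSketch {
    // Usual Gettext plural rule for Russian: returns 0, 1 or 2.
    static int pluralIndexRu(long n) {
        if (n % 10 == 1 && n % 100 != 11) return 0;
        if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
        return 2;
    }

    // Picks the translated form that matches n.
    static String pickRu(String[] forms, long n) {
        return forms[pluralIndexRu(n)];
    }

    // Replaces {0}, {1}, ... with the supplied values (zero-based in MessageFormat).
    static String format(String pattern, Object... values) {
        return MessageFormat.format(pattern, values);
    }

    public static void main(String[] args) {
        String[] files = {"{0} файл", "{0} файла", "{0} файлов"};
        for (long n : new long[] {1, 2, 5, 11, 21}) {
            // Values are passed as strings to avoid locale-specific number formatting.
            System.out.println(format(pickRu(files, n), String.valueOf(n)));
        }
    }
}
```

English needs only the singular and plural forms that t:ngettext takes as arguments; languages such as Russian need more, which is why the choice of form is left to the translation catalog.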
If you add new texts in the primary language to your stylesheets, it is very easy to update existing translations.
Extract all texts for translation from the stylesheets, for example:
java -cp saxon8.jar;xpath.jar net.sf.saxon.Transform -o messages.po test1.xsl extract.xsl
Merge new texts with existing translations:
msgmerge -U ru.po messages.po
Fill in the missing translations in ru.po.
Compile translations into byte-code:
msgfmt --java2 -d . -l ru ru.po
msgfmt --java2 -d . -l en messages.po
I planned to release saxon-gettext much earlier, but I had a hard time finding spare time to write this very basic documentation. If something is unclear or if you find a bug, just let me know. I will try to update the software and the instructions.
[1] You can get xpath.jar from http://www.w3.org/2005/qt-applets/xgrammar.zip. It is inside the xgrammar/grammar/parser directory.