Spreading the XML paradigm around
2006-03-14
For a long time DocBook supported only the CALS table model. Support for the HTML table model was added in V4.3. Which model should you use? How do you remove “the other” table model from the schema?
To be honest, I was never happy that the HTML table model was added to DocBook. Why? Because it is inconsistent with the rest of DocBook, it does not use namespaces to identify itself, and it has bloated the DocBook DTD with attributes like onmouseover.
Example 1. Sample CALS table and its rendering
<table>
  <title>Test table</title>
  <tgroup cols="2">
    <colspec colname="c1"/>
    <colspec colname="c2"/>
    <thead>
      <row>
        <entry>A</entry>
        <entry>B</entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry>42</entry>
        <entry>123</entry>
      </row>
      <row>
        <entry namest="c1" nameend="c2">spanned</entry>
      </row>
    </tbody>
  </tgroup>
</table>
The CALS table model has a very long history, and some parts of it can be considered archaic. For example, CALS requires you to specify the number of columns, because table rendering used to be a hard and expensive task. Knowing the total number of columns made more efficient rendering routines possible. This is no longer true, but the attribute is still there.
Another small inconvenience is the need to name columns if you want to merge cells. You are right that this is a bit of overkill for simple cases, but it can be an advantage for complex tables with many spans when you have to insert a new column inside a spanned area. Indirection is good, but it usually means more work. And anyway, you are editing DocBook tables in a WYSIWYG editor and you are shielded from seeing the XML code. ;-)
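To make the point about indirection concrete, here is a sketch of the table from Example 1 after a new column has been inserted between c1 and c2. The column name c1a and the new cell contents are made up for illustration. The spanning entry is untouched: because it refers to its start and end columns by name, it now covers all three columns. Only the cols attribute, the colspec list and the ordinary rows had to change.

```xml
<table>
  <title>Test table</title>
  <!-- cols raised from 2 to 3 -->
  <tgroup cols="3">
    <colspec colname="c1"/>
    <!-- hypothetical new column, inserted inside the spanned area -->
    <colspec colname="c1a"/>
    <colspec colname="c2"/>
    <thead>
      <row>
        <entry>A</entry>
        <entry>A2</entry>
        <entry>B</entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry>42</entry>
        <entry>7</entry>
        <entry>123</entry>
      </row>
      <row>
        <!-- unchanged: still spans from c1 to c2, now three columns wide -->
        <entry namest="c1" nameend="c2">spanned</entry>
      </row>
    </tbody>
  </tgroup>
</table>
```

With a numeric span such as the HTML colspan attribute, the spanning cell itself would have to be edited (from 2 to 3) every time a column is added inside it.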
Let's compare the sample table from Example 1 with the same table expressed in HTML.
Example 2. Sample HTML table
<table>
  <caption>Test table</caption>
  <thead>
    <tr>
      <th>A</th>
      <th>B</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>42</td>
      <td>123</td>
    </tr>
    <tr>
      <td colspan="2">spanned</td>
    </tr>
  </tbody>
</table>
The structure of the table is almost identical. You can see that the title of the table is expressed by the caption element. This is inconsistent with the rest of the DocBook formal objects, which use title. You are also not allowed to use the titleabbrev and info elements in an HTML table. I personally consider this the biggest disadvantage. It simply hides some general DocBook functionality from you and makes the schema inconsistent.
There is no need to specify the total number of columns, and cells are merged directly with the colspan attribute. Beyond that, the only real difference between the table models is the simple renaming of row → tr and entry → td/th.
There are also some reasons for using HTML tables in DocBook: more users are familiar with them, and you can copy and paste them from Web pages. But I don't consider this a real argument. Why not allow HTML inline elements as well, then? Why not allow h1 instead of bridgehead? OK, the table model is something a little bit different, but it should have been added in the XHTML namespace.
I will leave the decision about which table model to use to you. But it would be very strange to mix the two models in the same document. Because of this, it is a good idea to create a customized DocBook schema that permits only one table model. I must say that I really like RELAX NG (thanks, James Clark and Murata Makoto) and the modularity of the DocBook V5.0 schemas (thanks, Norm). To disable the HTML table model, the following simple customization is sufficient:
include "docbook.rnc" {
  db.html.table = notAllowed
  db.html.informaltable = notAllowed
}
If you prefer HTML tables and want to disable CALS tables, take it as homework. Given my attitude, I'm not going to help you with removing CALS and using HTML tables in DocBook, but the process is completely analogous. You can even do things like using CALS for tables and HTML for informal tables. Or you can jump straight out of the window.