xmlguru.cz

Spreading the XML paradigm around

Comments for Office Open XML (OOXML)

2007-08-02

If you are following the process of adopting OOXML as an ISO standard, you might be interested in the comments that the Czech Republic will submit.


These comments are a translation of the original comments written in Czech. Some editorial changes might appear in the final version sent to ISO after proofreading, but the substance of the comments will remain the same.

Each comment is classified as general (ge), editorial (ed) or technical (te). You can link directly to any comment by appending #Cnn to the URL of this page.

ID | Clause/Subclause | Location | Type | Comment | Proposed change
C01 | - | - | ge

The proposed standard DIS29500 overlaps substantially in function with the existing standard ISO/IEC 26300:2006 (ODF), which was approved only last year. Nevertheless, we think that users of office applications will benefit from having Office Open XML standardized as DIS29500, provided that the comments below are incorporated into the final version of the standard. The main reason is that DIS29500 can represent common document elements that the ODF standard does not yet support, and it will take several years before those features are incorporated into the standardized ODF format. Another reason is OOXML's ability to represent the large corpus of existing documents (previously stored mostly in proprietary binary formats) in an open and easy-to-process format. Mass adoption is also important for any standard; without it, its benefits are diminished. It seems that the majority of office applications (in terms of market share) will support DIS29500, which is not yet the case for ODF.

The coexistence of two very similar international standards such as ODF and OOXML is undesirable in the long term. We therefore ask JTC1 to start work on a progressive harmonization of both formats in cooperation with OASIS and ECMA, the organizations that originated these document formats.

There are many possible approaches to harmonization. For example, as a first step both formats could adopt the same unified packaging system based on OPC (as described in Part 2 of DIS29500). Moreover, OPC could be extended to support storing alternative representations of a single object, so that a single file could contain one document stored in several variants (e.g. ODF, OOXML and XHTML). Applications would then be free to choose the format that best fits their needs and capabilities.

In the long term, we recommend carefully studying both formats and then creating a unified abstract document model. The ODF and OOXML formats would then serve just as alternative serializations of this data model. If experience reveals weaknesses in both the ODF and OOXML formats, it would be possible to start thinking about creating a completely new serialization of the document data model.

 
C02 | - | - | ge

The standard is very large, and not all applications need to support all document types. Splitting the standard into several smaller, more self-contained parts would make it more usable.

Create separate parts for WordprocessingML, SpreadsheetML, PresentationML and the shared vocabularies.

C03 | - | - | ed

Long attribute descriptions, including usage examples, are repeated for every element that supports the attribute. This lengthens the text of the standard. Moreover, because the attribute descriptions are shared, in several places the examples do not relate to the element being defined.

The list of attributes for a given element should contain only the attribute name, its data type and a very brief description (a single line or sentence). The detailed attribute description should be provided just once and referenced from every occurrence of the attribute.

C04 | Part 4/6 | p. 4343 | ed

The VML language is marked as deprecated and is intended as a temporary solution for maintaining backward compatibility. There is therefore no reason to include the VML description directly in the standard.

The VML specification should be published as a Technical Report only.

C05 | Part 1/Annex A | p. 162/l. 7 | ed

The reference to the ZIP format specification does not identify a particular version of the format.

Include the ZIP format version in the reference, or state that the latest available version should be used.

C06 | Part 4/2.18.51 | p. 1747/l. 18 | te

Only language codes defined in ISO 639, ISO 3166 and ISO 15924 should be used for language identification. If there is no corresponding ISO code for some combination of language, region and script, the newer language identification mechanism defined in RFC 4646 (BCP 47) can be used.

The definition of the ST_Lang type should use language identifiers as defined in BCP 47. The ST_LangCode type should be removed completely, and for languages that cannot be represented using BCP 47, new language and country codes should be added to ISO 639 and ISO 3166, for example by using the space reserved for local codes.
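To illustrate what identifiers built from ISO 639, ISO 15924 and ISO 3166 codes look like, here is a minimal Python sketch. It recognizes only the simplified language[-script][-region] shape written in its conventional casing, not the full RFC 4646 grammar, and the function name split_language_tag is hypothetical.

```python
import re

# Simplified subset of BCP 47: language[-Script][-REGION].
# Variants, extensions, private-use subtags and case-insensitive
# matching are deliberately left out of this sketch.
_TAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"             # ISO 639 language code
    r"(?:-(?P<script>[A-Z][a-z]{3}))?"       # ISO 15924 script code
    r"(?:-(?P<region>[A-Z]{2}|[0-9]{3}))?$"  # ISO 3166 country or UN M.49 area
)


def split_language_tag(tag):
    """Return (language, script, region); missing subtags are None."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"not a simple language-script-region tag: {tag!r}")
    return match.group("language"), match.group("script"), match.group("region")


print(split_language_tag("cs-CZ"))
print(split_language_tag("sr-Latn-RS"))
```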

C07 | Part 4/2.18.51 | p. 1754/l. 4 | ed

It is not clear whether the numbers in the table are decimal or hexadecimal (the text before the table mentions hexadecimal numbers, but the table contains decimal numbers).

The number range requires four hexadecimal digits, not just two as stated in the text.

The example wrongly describes the number 1033 as hexadecimal.
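The decimal/hexadecimal confusion is easy to check. The following Python sketch (the helper name to_hex4 is hypothetical) formats a value as four hexadecimal digits and shows what the digits "1033" would mean if they really were hexadecimal.

```python
def to_hex4(value):
    """Format a 16-bit identifier as exactly four uppercase hexadecimal digits."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit into four hexadecimal digits")
    return f"{value:04X}"


print(to_hex4(1033))    # decimal 1033 written in hexadecimal: 0409
print(int("1033", 16))  # "1033" read as hexadecimal is decimal 4147
```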

 
C08 | Part 4/3.2.28 | p. 1912 | te

The default date system should be “1904” because it does not suffer from the leap-year bug of the “1900” system, in which the year 1900 is wrongly treated as a leap year. All newly created documents should use the “1904” date system; the “1900” system should be allowed only for representing existing documents.

The “date1904” attribute should be mandatory so that it is always explicitly known which date system is used. The text of the standard should recommend the “1904” date system and should allow the “1900” date system only in documents converted from legacy formats.
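The leap-year problem can be made concrete with a small Python sketch. It assumes the conventional spreadsheet mapping in which serial 1 is 1900-01-01 in the “1900” system, serial 60 is the non-existent 1900-02-29, and serial 0 is 1904-01-01 in the “1904” system. The function names are hypothetical and the sketch handles whole days only.

```python
from datetime import date, timedelta


def serial_1900_to_date(serial):
    """Convert a "1900" system serial, compensating for the phantom leap day."""
    if serial == 60:
        raise ValueError("serial 60 stands for 1900-02-29, which never existed")
    if serial < 60:
        # Before the phantom day the serials are not yet shifted.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom day every serial is one too high.
    return date(1899, 12, 30) + timedelta(days=serial)


def serial_1904_to_date(serial):
    """Convert a "1904" system serial; no correction is needed."""
    return date(1904, 1, 1) + timedelta(days=serial)


print(serial_1900_to_date(59), serial_1900_to_date(61))
print(serial_1900_to_date(1462), serial_1904_to_date(0))
```

The special cases in the first function are exactly the burden that the “1904” system avoids.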

C09 | Part 4/3.17.4.1 | p. 2522 | te

The standard should provide facilities for representing dates prior to 1900-01-01/1904-01-01.

Either negative values should be allowed as date serial values, or a completely new date/time data type should be introduced.

C10 | Part 4/3.17 | - | ed

The definition of the spreadsheet formula language should be moved into a separate standard or part so that it can be reused in other standards, for example in ISO/IEC 26300:2006 (ODF).

 
C11 | - | - | te

The standard describes the VML format as deprecated and states that DrawingML should replace it. DrawingML content should therefore be allowed in every place where the vocabularies defined in DIS29500 currently allow only VML content.

Allow DrawingML content wherever VML is allowed.

In particular, allow it inside the “background”, “pict” and “object” elements.

C12 | Part 1/10.1.2 | p. 23/l. 20 | ed

The reference to Part 5, section 12 is not meaningful.

Fix the reference; it should probably point to section 11.

C13 | Part 4/2.15.2.32 | p. 1337/l. 9 | te

Optimizing output for a particular Web browser is generally considered bad practice. If an application must support this feature for whatever reason, the standard should provide more parameters for controlling it, and a normative list of Web browsers should not be included in the standard, because browsers are continuously evolving and adding support for new technologies.

The standard should define the following elements for describing browser capabilities: allowGIF, allowJPEG, allowPNG, allowSVG, doNotRelyOnCSS, doNotRelyOnJavascript, relyOnVML, doNotSaveWebPageAsSingleFile. The table after line 18 should be removed or marked as informative.

C14 | Part 4/2.15.3.6 | p. 1378 | te

The behavior of the “autoSpaceLikeWord95” element is not sufficiently defined.

Add a definition of the behavior of this element. In particular, the definition should list the formatting differences between the cases where the element is used and where it is not.

C15 | Part 4/2.15.3.26 | p. 1416 | te

The behavior of the “footnoteLayoutLikeWW8” element is not sufficiently defined.

Add a definition of the behavior of this element. In particular, the definition should list the formatting differences between the cases where the element is used and where it is not.

C16 | Part 4/2.15.3.31 | p. 1426 | te

The behavior of the “lineWrapLikeWord6” element is not sufficiently defined.

Add a definition of the behavior of this element. In particular, the definition should list the formatting differences between the cases where the element is used and where it is not.

C17 | Part 4/2.15.3.32 | p. 1427 | te

The behavior of the “mwSmallCaps” element is not sufficiently defined.

Add a definition of the behavior of this element. In particular, the definition should list the formatting differences between the cases where the element is used and where it is not.

C18 | Part 4/2.15.3.41 | p. 1442 | te

The behavior of the “shapeLayoutLikeWW8” element is not sufficiently defined.

Add a definition of the behavior of this element. In particular, the definition should list the formatting differences between the cases where the element is used and where it is not.

C19 | Part 4/2.15.3.51 | p. 1462 | te

The behavior of the “suppressTopSpacingWP” element is not sufficiently defined.

Add a definition of the behavior of this element. In particular, the definition should list the formatting differences between the cases where the element is used and where it is not.

C20 | Part 4/2.15.3.53 | p. 1467 | te

The behavior of the “truncateFontHeightsLikeWP6” element is not sufficiently defined.

Add a definition of the behavior of this element. In particular, the definition should list the formatting differences between the cases where the element is used and where it is not.

C21 | Part 4/2.15.3.63 | p. 1481 | te

The behavior of the “useWord2002TableStyleRules” element is not sufficiently defined.

Add a definition of the behavior of this element. In particular, the definition should list the formatting differences between the cases where the element is used and where it is not.

C22 | Part 4/2.15.3.64 | p. 1482 | te

The behavior of the “useWord97LineBreakRules” element is not sufficiently defined.

Add a definition of the behavior of this element. In particular, the definition should list the formatting differences between the cases where the element is used and where it is not.

C23 | Part 4/2.15.3.65 | p. 1483 | te

The behavior of the “wpJustification” element is not sufficiently defined.

Add a definition of the behavior of this element. In particular, the definition should list the formatting differences between the cases where the element is used and where it is not.

C24 | Part 4/2.15.3.66 | p. 1485 | te

The behavior of the “wpSpaceWidth” element is not sufficiently defined.

Add a definition of the behavior of this element. In particular, the definition should list the formatting differences between the cases where the element is used and where it is not.

C25 | Part 4/2.15.3.54 | p. 1469 | te

The “uiCompat97To2003” parameter relates to application behavior, not to a document and its content. As such, it should not be part of the standard. If necessary, applications could store such information in custom elements defined in accordance with the rules of Part 5.

Remove the “uiCompat97To2003” element from the standard.

C26 | Part 4/2.16.5.33 | p. 1537/l. 19 | te

The example uses MS-DOS/Windows file path conventions. To improve interoperability, all paths should be specified as URIs.

Use URIs consistently for specifying paths throughout the standard. If a reference to the local file system is necessary, use the “file” scheme.
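As a rough illustration of the requested change, the following Python sketch turns an absolute Windows drive-letter path into a “file” URI. The function name and the sample path are hypothetical, and the sketch deliberately ignores UNC and relative paths.

```python
from urllib.parse import quote


def windows_path_to_file_uri(path):
    """Convert an absolute drive-letter path such as C:\\dir\\file to a file URI."""
    if len(path) < 3 or path[1] != ":" or path[2] not in "\\/":
        raise ValueError("expected an absolute drive-letter path")
    forward = path.replace("\\", "/")
    # Percent-encode everything except path separators and the drive colon.
    return "file:///" + quote(forward, safe="/:")


print(windows_path_to_file_uri(r"C:\My Documents\report.docx"))
```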

C27 | Part 4/2.16.5.34 | p. 1538/l. 1 | te

The INCLUDETEXT field has no parameter for specifying the type of the included data. The type cannot always be determined reliably without an explicit content type specification. Moreover, a user might sometimes want to load data in a different way, for example loading an XML document as plain text in order to show its source code.

Add two parameters: one for specifying the MIME type of the included data and another for specifying its encoding (to handle situations where the encoding cannot be determined from the file contents).

C28 | Part 4/2.16.5.41 | p. 1545/l. 21 | te

The MACROBUTTON field does not define an interface for macro invocation.

Extend the description to state that macro invocation is application dependent and is not defined in this version of the standard.

C29 | Part 4/2.18.4 | - | ed

The images shown in the standard cannot be reproduced faithfully.

Attach to the standard an electronic representation of all graphical objects in an open vector format such as SVG, CGM or DrawingML.

C30 | Part 4/2.18.4 | - | te

The standard does not allow custom graphics to be used for artistic borders.

Either allow artistic borders based on any supplied image, or remove artistic borders from the standard completely.

C31 | Part 4/2.18.45 | p. 1738/l. 6 | ed

The length of the xs:hexBinary data type is specified in bytes, not characters.

 
C32 | Part 4/2.18.66 | p. 1772 | ed

The reference to the definition of the “chicago” numbering format is insufficient.

Specify the term under which the numbering format definition can be looked up in the Chicago Manual of Style, or include a more detailed description of this numbering format.

C33 | Part 4/3.3.1.61 | p. 1988 | te

Specifying the allowed page sizes by enumeration is too restrictive.

Add a new value 0 (= custom paper size) for the “paperSize” attribute. When this value is used, the page size is specified manually using attributes such as “pageWidth” and “pageHeight”.

Make the same modification to the corresponding attribute of the “pageSetup” element in section 5.7.2.135 (p. 4063).

C34 | Part 4/5.1.3.4 | p. 3294 | te

The standard does not reference the QuickTime specification. Moreover, the need for a QuickTime-specific element is not justified, because a generic element for embedding video data (videoFile) already exists.

Explain more clearly why a specific QuickTime element is necessary. Add a reference to the definition of the QuickTime format.

C35 | Part 4/6.1.2.19 | p. 4653 | te

Putting an XML fragment into an attribute value is completely unacceptable.

Use a nested element instead of the equationxml attribute. This change will allow a mathematical equation to be represented directly in XML syntax without the need for escaping. We will not insist on this change if VML is moved into a separate Technical Report, as suggested in comment C04.
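The difference between the two approaches can be sketched in Python with the standard xml.etree.ElementTree module. The element names shape, math and mi are made up for the illustration; only the attribute name equationxml comes from the standard.

```python
import xml.etree.ElementTree as ET

fragment = "<math><mi>x</mi></math>"  # hypothetical equation markup

# Variant 1: the fragment is stored as an attribute value and must be escaped.
with_attribute = ET.Element("shape")
with_attribute.set("equationxml", fragment)
attr_xml = ET.tostring(with_attribute, encoding="unicode")

# Variant 2: the fragment is stored as a real child element.
with_child = ET.Element("shape")
with_child.append(ET.fromstring(fragment))
child_xml = ET.tostring(with_child, encoding="unicode")

print(attr_xml)   # markup characters appear as &lt; and &gt;
print(child_xml)  # the equation stays part of the XML tree
```

With the attribute variant a consumer has to run a second XML parser over the attribute value; with the nested element the equation is reachable through ordinary tree navigation.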

C36 | Part 4/6.1.2.19 | p. 4655 | te

Putting an XML fragment into an attribute value is completely unacceptable.

Use a nested element instead of the gfxdata attribute for storing a direct representation of the XML. We will not insist on this change if VML is moved into a separate Technical Report, as suggested in comment C04.

C37 | Part 4/6.4.3.1 | p. 4955/l. 17 | te

It is not clear whether and how other formats such as PNG or EPS can be used for storing clipboard data.

Modify the description so that it is clear that any bitmap format can be used for the “Bitmap” type and any metafile format for the “Pict” type. Change the remaining types in the same fashion. Accompany each clipboard format type with several examples of possible image formats, for example PNG, BMP, GIF and JPEG for the “Bitmap” type and EMF, EPS and SVG for the “Pict” type.

Alternatively, consider using a more general content type identification mechanism based on MIME types (such as image/png).

Add an example showing how to represent a PNG image stored in the clipboard.

C38 | Part 4/7.4.2.4 | p. 5122 | te

The escape mechanism does not define escaping for the “_” character.

Add a definition of escaping for the “_” character.

C39 | Part 4/7.4.2.5 | p. 5122 | te

The purpose of the “cf” element is not clear. Is it used for holding clipboard content, or only for identifying the clipboard data format? The standard does not justify the need for such an element in an interchange format like OOXML.

Clarify the definition of the element and its usage.

C40 | Part 4/3.13.12 | p. 2471 | te

The text encoding specified for the “textPr” element should not use the codePage attribute, which can contain only one of a set of predefined codes. The encoding should instead be specified using character encoding names registered with IANA.

Replace the “codePage” attribute with an “encoding” attribute. The value of this attribute can be any encoding name from the corresponding IANA registry (http://www.iana.org/assignments/character-sets).
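Encoding names are also what most programming environments already understand. The Python sketch below decodes bytes using a name passed in the proposed style. The function name decode_text is hypothetical, and Python's codec registry accepts many, but not necessarily all, names from the IANA registry.

```python
import codecs


def decode_text(data, encoding_name):
    """Decode bytes using a charset name such as "windows-1250" or "ISO-8859-2"."""
    codec = codecs.lookup(encoding_name)  # raises LookupError for unknown names
    return data.decode(codec.name)


raw = b"Ji\xf8\xed"  # the same bytes in both encodings below
print(decode_text(raw, "windows-1250"))
print(decode_text(raw, "ISO-8859-2"))
```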

C41 | Part 2/8.1.1.2 | - | te

Part names are compared case-insensitively, but only for ASCII characters. Why is the comparison not case-insensitive for all Unicode/ISO 10646 characters that exist in both lowercase and uppercase variants?

Clarify this conflict or define the comparison as case-sensitive.
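The inconsistency is easy to demonstrate. The Python sketch below contrasts an ASCII-only case-insensitive comparison with one based on full Unicode case folding. Both function names and the sample part names are hypothetical, and the sketch does not reproduce the exact equivalence rules of Part 2.

```python
import string

# Translation table that lowercases only the 26 ASCII letters.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def equal_ascii_only(a, b):
    """Case-insensitive comparison restricted to ASCII letters."""
    return a.translate(_ASCII_FOLD) == b.translate(_ASCII_FOLD)


def equal_unicode(a, b):
    """Case-insensitive comparison using full Unicode case folding."""
    return a.casefold() == b.casefold()


print(equal_ascii_only("/Word/Document.xml", "/word/document.xml"))
print(equal_ascii_only("/\u0158e\u0161en\u00ed.xml", "/\u0159e\u0161en\u00ed.xml"))
print(equal_unicode("/\u0158e\u0161en\u00ed.xml", "/\u0159e\u0161en\u00ed.xml"))
```

Two names that differ only in the case of a non-ASCII letter are distinct under the first rule and equal under the second.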

C42 | Part 2 | p. 37 | te

It should be possible to attach additional metadata, such as language, to each keyword stored in the “keywords” element.

Change the content model of the “keywords” element to mixed content in which subelements can be used to mark up individual keywords and to attach additional text properties to each keyword.

C43 | Part 3/2.6.2 | p. 21 | te

A precise algorithm for extracting custom XML markup from a document is not defined.

Define an algorithm for converting custom XML markup into a standalone XML document.

C44 | Part 4/2.6.13 | p. 631 | te

The “w” and “h” attributes are optional, and it is not defined how to compute their values from the value of the “code” attribute.

Either the “w” and “h” attributes should be required, or it should be defined how to compute the page size from the value of the “code” attribute.

The purpose of the “code” attribute is not clear. Improve its description.

C45 | - | - | te

The standard uses several different length units. For example, font size is specified in half-points (see ST_HpsMeasure, Part 4/2.18.48/p. 1742), while DrawingML uses EMUs (see the ST_Coordinate data type, Part 4/5.1.12.16/p. 3694) and hundredths of a point (see the ST_TextPoint data type, Part 4/5.1.12.75/p. 3861). Elsewhere, twips are used (see ST_TwipsMeasure, Part 4/2.18.105/p. 1836). Although using such different units might have some benefits, such as a suitable scale or the elimination of rounding errors, it would be very useful if any length value could be specified in any common length unit.

Modify all length data types to also support values with an explicit unit of measure. At least the following units should be supported: cm, mm, in, pc and pt. These units must be recognized when a document is loaded, but they do not have to be preserved during an editing session. When saving, the default unit of the given length data type may be used.
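A loader supporting such values could work roughly as in the following Python sketch, which parses a length with a unit and converts it to the units mentioned above. It assumes the usual definitions (1 in = 72 pt = 1440 twips = 914400 EMU, 1 pc = 12 pt); the function names and the accepted syntax are hypothetical, not taken from the standard.

```python
import re
from fractions import Fraction

# Exact number of points in one unit of each supported measure.
POINTS_PER_UNIT = {
    "pt": Fraction(1),
    "pc": Fraction(12),
    "in": Fraction(72),
    "cm": Fraction(7200, 254),
    "mm": Fraction(720, 254),
}

_LENGTH = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*(pt|pc|in|cm|mm)\s*$")


def parse_points(text):
    """Parse a value such as "2.54cm" and return its size in points."""
    match = _LENGTH.match(text)
    if match is None:
        raise ValueError(f"unsupported length: {text!r}")
    return Fraction(match.group(1)) * POINTS_PER_UNIT[match.group(2)]


def to_half_points(points):
    return round(points * 2)


def to_twips(points):
    return round(points * 20)


def to_hundredths_of_point(points):
    return round(points * 100)


def to_emu(points):
    return round(points * 12700)


size = parse_points("2.54cm")
print(size, to_half_points(size), to_twips(size), to_emu(size))
```

Using exact fractions until the final rounding step avoids accumulating floating-point error, which matches the rounding concern raised in the comment.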

C46 | Part 4/2.3.1.16 | - | te

Characters are enumerated only by showing their glyphs, which is not always unambiguous.

Add the corresponding Unicode/ISO 10646 code point for each character.
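To show why a glyph alone is ambiguous, the Python sketch below prints the code point and character name for three characters that look very similar in print. The characters are chosen purely for illustration and are not taken from the table in the standard.

```python
import unicodedata


def describe(char):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char, '<unnamed>')}"


# Three visually similar dots with different code points.
for glyph in "\u2022\u00b7\u2219":
    print(describe(glyph))
```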

C47 | Part 4/2.3.1.21 | p. 97/l. 19–20 | ed

The definition of “hanging punctuation” is meaningless. Punctuation is always on the same line as the related text; the only difference is that hanging punctuation can be shifted outside the normal printing area to achieve a better visual appearance.

 
C48 | Part 4/2.3.1.7 | p. 52/l. 16 | ed

The element description should say “border bottom”, not “border between”.

Correct the text and every occurrence where this erroneous text is referenced.

C49 | Part 4/2.14.26 | p. 1090 | ed

The version of the SQL language that can be used for writing queries is unspecified.

Add a reference to the corresponding SQL standard.

C50 | Part 4/2.15.1.1 | p. 1106 | te

The “dllVersion” attribute, which specifies the version of the grammar checker module, is too platform dependent.

Use a more general mechanism. Change the data type of the attribute to string.

C51 | Part 4/2.15.1.1 | p. 1107 | te

The standard does not define how codes for the “vendorID” attribute are allocated.

Use a more general mechanism. Change the data type of the attribute to string.

C52 | Part 4/2.15.1.6 | p. 1113/l. 10 | te

A platform-dependent path is used.

Specify all paths and addresses using URI syntax.

C53 | Part 4/2.15.1.28 | p. 1158/l. 19 | te

The text assumes that a Unicode string is represented in the UCS-2 encoding, where each character is stored in exactly two bytes. Unicode now contains almost 100000 characters, so encodings with full Unicode coverage, such as UTF-16, have to be used. In UTF-16, some characters are stored in four bytes using surrogate pairs.

Specify which encoding is used for representing Unicode strings. Base the description on octet positions instead of high and low bytes.
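The two-bytes-per-character assumption can be checked with a short Python sketch that lists the UTF-16 code units of a string. The function name is hypothetical; the sample character U+1D11E (musical G clef) lies outside the range that UCS-2 can represent.

```python
def utf16_code_units(text):
    """Return the 16-bit UTF-16 code units of a string as integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


print([hex(unit) for unit in utf16_code_units("A")])           # one code unit
print([hex(unit) for unit in utf16_code_units("\U0001D11E")])  # surrogate pair
```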

C54 | Part 4/2.15.1.88 | p. 1254 | ed

The fact that the “summaryLength” element contains a percentage value is mentioned only in an example.

Improve the description of the corresponding data type so that it is clear that the value is specified as a percentage.

C55 | Part 4/2.15.1.89 | p. 1256 | ed

It is not apparent from the description of the “themeFontLang” element that it can be used together with the “bidi” and “eastAsia” attributes, or what those attributes mean.

 
C56 | Part 4/5.1.12.51 | p. 3763 | ed

Fill patterns are not sufficiently defined by sample images alone.

Provide an electronic representation of the fill patterns in an appendix.

C57 | Part 4/2.3.2.25 | - | ge

There is no text run property for specifying whether a given piece of text should be translated during localization. This functionality is very important in environments where texts are routinely translated into many other languages, for example in the EU.

Add a new property for specifying whether a given run of text should be translated during document localization. The proposed mechanism should be compatible with ITS markup (http://www.w3.org/TR/its/).

C58 | Part 1 | p. 57/l. 29 | ed

Quotes are missing around an attribute value.

 
C59 | Part 1 | p. 139/l. 9 | ed

In a URL, forward slashes (“/”) should be used to separate path segments instead of backslashes (“\”).

 
C60 | Part 1 | p. 149/l. 27 | ed

There is a superfluous comma before the word “core”.

 
C61 | Part 2 | p. 27/l. 18 | ed

There is a superfluous second period at the end of the sentence.

 
C62 | Part 3 | p. 4/l. 1–7 | ed

The XML example provided is not well-formed. Several attribute values are not enclosed in quotes, and the stray text “[3204]” appears in a place where only attributes can occur.

 
C63 | Part 3 | p. 19/l. 36 | ed

The text mentions the “CNTS” ticker, but the example on the following page shows the “MSFT” ticker.

 
C64 | Part 3 | p. 40/l. 31 | ed

There is a stray file path artifact before the “<w:style>” element.

 
C65 | Part 3 | p. 209/l. 26 | ed

In a URL, forward slashes (“/”) should be used to separate path segments instead of backslashes (“\”).

 
C66 | Part 3 | p. 217/l. 4 | ed

The correct spelling of “xpath” is “XPath” (note that the first two letters are uppercase).

 
C67 | Part 3 | p. 217/l. 5 | ed

There is an error in the XPath expression. “@type” should be preceded by “/” to separate it from the start of the location path.

 
C68 | Part 3 | p. 217/l. 8 | ed

There is an error in the XPath expression. “@currency” should be preceded by “/” to separate it from the start of the location path.

 
C69 | Part 4 | p. 17/l. 2 | ed

In the sentence “This element specifies the background information for this document.”, the words “this document” should obviously be replaced by the appropriate object.

 
C70 | Part 4 | p. 82/l. 2 | ed

“all lines for this page” → “all lines of this paragraph”

 
C71 | Part 4 | p. 85/l. 8–9 | ed

The terms “Chinese PRC” and “Chinese Taiwan” are inconsistent with common practice and with the rest of the standard. Use the terms “Simplified Chinese” and “Traditional Chinese” instead.

 
C72 | Part 4 | p. 230/l. 8 | ed

The example shows how to specify a kerning value, but it appears in the description of the font size element. There are more instances of this error because examples for attributes with the same name (e.g. “val”) are shared and reused.

 
C73 | Part 4 | p. 631/l. 2 | ed

There is a superfluous backslash at the end of the sentence.

 
C74 | Part 4 | p. 1965/l. 16 to p. 1966/l. 23 | ed

The ampersand character (“&”) should not be escaped when it is not part of an XML source listing.

 
C75 | Part 5 | p. 9/l. 30 | ed

The paragraph is broken in the middle of the word “docume-nt”.

 
Copyright © Jiří Kosek, 2006–2014