Office Open XML

This is an old revision of this page, as edited by Charles Esson at 20:50, 6 February 2007.

Office Open XML (commonly abbreviated as OOXML) is a file format specification for the storage of electronic documents such as memos, presentations, and spreadsheets. The specification was developed by Microsoft for its Microsoft Office product suite and was standardized by Ecma International as Ecma 376 in December 2006.[1]

The Office Open XML format uses a ZIP container for packaging XML and other data files.[2] Microsoft stated that its primary goal was backward compatibility with existing documents and full support of the feature set of Microsoft Office.[3]

Office Open XML has been the subject of controversy in the computing industry, with criticism of the document format coming from members of the free software movement, some independent software vendors, industry analysts and Microsoft's competitors Sun Microsystems and IBM (most of whom support OpenDocument).

File format and structure

The Office Open XML file is a ZIP package containing the individual files that form the basis of the document. In addition to XML files, the ZIP package can include embedded (binary) files in formats such as PNG, BMP, GIF, WMF, and ISF.

Document markup languages

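Office Open XML is a container format for several specialized XML-based document markup languages, roughly corresponding to individual applications within the Microsoft Office product line:

  • WordprocessingML for word processing documents
  • SpreadsheetML for spreadsheets
  • PresentationML for presentations
  • DrawingML and, for backward compatibility, VML for vector graphics shared between the applications
  • Office Open XML Math for mathematical equations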

Container structure

A basic Office Open XML file contains an XML file called [Content_Types].xml at the root level of the ZIP package, along with three folders: _rels, docProps, and a directory specific to the document type (for example, a word directory in a .docx word processing file). The word directory contains the wordDocument.xml file, which holds the core content of the document.

[Content_Types].xml file
This file describes the content of the ZIP package: it maps file extensions to content types and overrides the content type for specific part URIs.
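The lookup described above can be sketched in Python with the standard library. This is a minimal sketch, assuming the usual layout of the file: Default elements keyed by file extension and Override elements keyed by part name, with an Override taking precedence and names compared case-insensitively. The sample XML is simplified and the function name content_type_for is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Simplified, illustrative [Content_Types].xml.
CONTENT_TYPES = """<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/wordDocument.xml"
            ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""


def content_type_for(part_name: str, types_xml: str) -> Optional[str]:
    """Return the content type of part_name, or None if nothing matches."""
    root = ET.fromstring(types_xml)
    # 1. An Override for this specific part URI takes precedence.
    for override in root.iterfind("{*}Override"):
        if override.get("PartName", "").lower() == part_name.lower():
            return override.get("ContentType")
    # 2. Otherwise use the Default registered for the file extension.
    last_segment = part_name.rsplit("/", 1)[-1]
    extension = last_segment.rsplit(".", 1)[1].lower() if "." in last_segment else ""
    for default in root.iterfind("{*}Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None


print(content_type_for("/word/wordDocument.xml", CONTENT_TYPES))
print(content_type_for("/_rels/.rels", CONTENT_TYPES))
```

The `{*}` prefix in the search paths matches the element name in any namespace, so the sketch does not depend on the exact namespace URI a file declares.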
_rels Folder
The _rels folders hold the relationships of the parts in the package. To find the relationships of a specific part, look in the _rels folder that is a sibling of that part: if the part has relationships, that folder contains a file named after the part with .rels appended. For example, the relationships of word/wordDocument.xml are stored in word/_rels/wordDocument.xml.rels.
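The naming rule can be expressed as a small function. A minimal sketch, assuming part names are written as absolute paths with a leading slash; the function name rels_part_name is hypothetical.

```python
import posixpath


def rels_part_name(part_name: str) -> str:
    """Return the relationships part that belongs to part_name.

    The .rels file lives in a _rels folder beside the part and is named
    after the part with ".rels" appended. Applied to the package root "/",
    the same rule yields /_rels/.rels.
    """
    directory, filename = posixpath.split(part_name)
    return posixpath.join(directory, "_rels", filename + ".rels")


print(rels_part_name("/word/wordDocument.xml"))  # /word/_rels/wordDocument.xml.rels
print(rels_part_name("/"))                        # /_rels/.rels
```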
_rels/.rels
The root-level _rels folder always contains a part called .rels. This URI (/_rels/.rels) and /[Content_Types].xml are the only two reserved URIs for parts in files that adhere to Office Open XML conventions. The /_rels/.rels part is where the "package relationships" are located, and whenever a file using these conventions is opened, processing always starts there. All relationship files are represented in XML; opened in a text editor, a relationship file lists each relationship of its source part. In a minimal word processing document containing only the basic wordDocument.xml, the package relationships point to three top-level parts: two metadata parts and the wordDocument.xml part.
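Opening a package therefore starts with a single read from the ZIP archive. The sketch below, using only Python's standard library, builds a tiny in-memory package and then reads its package relationships. The relationship type URIs (urn:example:…), the part contents and the helper names are placeholders chosen for illustration, not values taken from the specification; the reader matches Relationship elements in any namespace.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Tuple

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def build_demo_package() -> io.BytesIO:
    """Build a minimal ZIP package in memory (placeholder types and parts)."""
    rels = (
        f'<Relationships xmlns="{REL_NS}">'
        '<Relationship Id="rId1" Type="urn:example:core-properties" Target="docProps/core.xml"/>'
        '<Relationship Id="rId2" Type="urn:example:extended-properties" Target="docProps/app.xml"/>'
        '<Relationship Id="rId3" Type="urn:example:main-document" Target="word/wordDocument.xml"/>'
        "</Relationships>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("_rels/.rels", rels)
        zf.writestr("docProps/core.xml", "<core/>")
        zf.writestr("docProps/app.xml", "<app/>")
        zf.writestr("word/wordDocument.xml", "<document/>")
    buffer.seek(0)
    return buffer


def package_relationships(package) -> List[Tuple[str, str, str]]:
    """Return (Id, Type, Target) for each relationship in /_rels/.rels.

    package may be a file path or a binary file object.
    """
    with zipfile.ZipFile(package) as zf:
        # ZIP member names carry no leading slash.
        root = ET.fromstring(zf.read("_rels/.rels"))
    return [(rel.get("Id"), rel.get("Type"), rel.get("Target"))
            for rel in root.iterfind("{*}Relationship")]


for rel_id, rel_type, target in package_relationships(build_demo_package()):
    print(rel_id, rel_type, target)
```

An application would then pick the relationship whose type identifies the main document and open its target, rather than assuming a fixed file name.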
word/wordDocument.xml
This is the main part of any Word document and contains the body of the word processing document. Viewed in an XML editor, it is a fairly simple XML file.

Relationships

Relationship files in Office Open XML

An example relationship file in Office Open XML (such as word/_rels/document.xml.rels):

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
        <Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2006/relationships/image"
                Target="http://en.wikipedia.org//images/wiki-en.png" TargetMode="External" />
        <Relationship Id="rId2" Type="http://schemas.microsoft.com/office/2006/relationships/hyperlink"
                Target="http://www.wikipedia.org" TargetMode="External" />
</Relationships>

Relationship files allow the package to be navigated without opening every part. For example, the images referenced in wordDocument.xml can be found by looking in its relationship file for all relationships of type http://schemas.microsoft.com/office/2006/relationships/image. To point to a different image, only the relationship needs to be edited, which is especially useful for external relationships.
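Finding all relationships of one type needs only the relationships part itself, not the parts it points to. A minimal Python sketch using the example file above; the function name relationships_of_type is hypothetical, and the search ignores the namespace URI so that it does not depend on which namespace a given file declares.

```python
import xml.etree.ElementTree as ET

# The example relationships part shown above.
RELS_XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
  <Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2006/relationships/image"
      Target="http://en.wikipedia.org//images/wiki-en.png" TargetMode="External" />
  <Relationship Id="rId2" Type="http://schemas.microsoft.com/office/2006/relationships/hyperlink"
      Target="http://www.wikipedia.org" TargetMode="External" />
</Relationships>"""

IMAGE_TYPE = "http://schemas.microsoft.com/office/2006/relationships/image"


def relationships_of_type(rels_xml: bytes, rel_type: str):
    """Return the Relationship elements whose Type equals rel_type."""
    root = ET.fromstring(rels_xml)
    # "{*}" matches the element name in any namespace.
    return [rel for rel in root.iterfind("{*}Relationship") if rel.get("Type") == rel_type]


for rel in relationships_of_type(RELS_XML, IMAGE_TYPE):
    # A missing TargetMode means the target is inside the package.
    print(rel.get("Id"), rel.get("Target"), rel.get("TargetMode", "Internal"))
```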

The following code shows an example of inline markup for a hyperlink:

<w:hyperlink w:rel="rId2" w:history="1"><w:r><w:t>Wikipedia</w:t></w:r></w:hyperlink>

In this example the URL is not stored inline: the hyperlink carries only the relationship ID "rId2", and the actual URL is the target of the "rId2" entry in the accompanying relationships file. Linked images, templates, and other items are referenced in the same way, so the locations of referenced items can be updated by editing only the relationships file.
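Resolving and retargeting a relationship ID can be sketched in Python as follows. This is an illustration under stated assumptions: the relationships XML is a trimmed copy of the earlier example, the helper names are hypothetical, and only the relationships part is modified; the document markup that refers to "rId2" stays untouched.

```python
import xml.etree.ElementTree as ET

RELS_XML = b"""<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
  <Relationship Id="rId2" Type="http://schemas.microsoft.com/office/2006/relationships/hyperlink"
      Target="http://www.wikipedia.org" TargetMode="External" />
</Relationships>"""


def _relationship(rels_root: ET.Element, rel_id: str) -> ET.Element:
    """Find the Relationship element with the given Id."""
    for rel in rels_root.iterfind("{*}Relationship"):
        if rel.get("Id") == rel_id:
            return rel
    raise KeyError(rel_id)


def target_of(rels_root: ET.Element, rel_id: str) -> str:
    """Resolve an ID such as the w:rel="rId2" of a hyperlink to its target."""
    return _relationship(rels_root, rel_id).get("Target")


def retarget(rels_root: ET.Element, rel_id: str, new_target: str) -> None:
    """Point an existing relationship at a new target."""
    _relationship(rels_root, rel_id).set("Target", new_target)


rels = ET.fromstring(RELS_XML)
print(target_of(rels, "rId2"))  # http://www.wikipedia.org
retarget(rels, "rId2", "http://www.example.org/")
print(target_of(rels, "rId2"))  # http://www.example.org/
```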

Embedded or linked media file relations

Pictures can be embedded in or linked from the XML parts using a tag such as:

<v:imagedata w:rel="rId1" o:title="example" />

This is the reference to the image file. In Office Open XML, all references between parts are made through relationships: for example, the wordDocument.xml part has a relationship to the image part, and the actual URI is found in the corresponding "rId1" entry of the accompanying relationships file. That file, wordDocument.xml.rels, is inside a _rels folder located in the same directory of the ZIP package as wordDocument.xml. It contains a relationship definition with a type, an ID and a location. The ID is the one referenced in the XML document; the type is a URI identifying the kind of relationship (here, an image); and the location is either an internal location within the ZIP package or an external location given by a URL.
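Turning the location in a relationship definition into something that can be opened can be sketched as below. This assumes, as in the package conventions, that an internal target is a relative reference resolved against the directory of the source part, and that an external target (TargetMode="External") is a URL used unchanged; percent-encoded names are not handled. The function name resolve_target and the media/image1.png path are hypothetical.

```python
import posixpath


def resolve_target(source_part: str, target: str, target_mode: str = "Internal") -> str:
    """Turn a relationship's Target into a part name or an external URL."""
    if target_mode == "External":
        # External locations are URLs outside the package; keep them as-is.
        return target
    # Internal locations are resolved relative to the source part's folder.
    base = posixpath.dirname(source_part)
    return posixpath.normpath(posixpath.join(base, target))


print(resolve_target("/word/wordDocument.xml", "media/image1.png"))  # /word/media/image1.png
print(resolve_target("/word/wordDocument.xml", "http://www.wikipedia.org", "External"))
```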

Standardization

Microsoft stated that Office Open XML would be an open standard and submitted it to the Ecma standardization process. On 2005-12-08 Ecma created Technical Committee 45 (TC45); the press release issued by Ecma the following day stated that TC45 was formed to "produce a formal standard for office productivity applications that is fully compatible with the Office Open XML Formats, submitted by Microsoft". The proposal was co-sponsored by Apple Inc., Barclays Capital, BP, the British Library, Essilor, Intel, Microsoft, NextPage, Statoil ASA and Toshiba.[4]

TC45 was chaired by Mr. J. Paoli and Mrs. I. Valet-Harper (both of Microsoft); the vice chair and secretary were Mr. A. Farquhar (British Library) and Mr. J. van den Beld (Ecma) respectively.[5] The committee included representatives from Apple, Canon, Intel, NextPage, Novell, Pioneer, Statoil ASA, Toshiba and the United States Library of Congress.[1]

At the General Assembly meeting on 2006-12-07, Ecma International approved Office Open XML as an Ecma standard (Ecma 376).[1] The General Assembly also approved submitting the standard for adoption under the ISO/IEC JTC 1 process.

As an ISO external Category A liaison, Ecma has submitted Ecma 376 to the ISO fast-track process, the same process that is available to national standards organisations. To meet the requirements of this process,[6] Ecma submitted the documents "Explanatory report on Office Open XML Standard (Ecma-376) submitted to JTC 1 for fast-track" and "Licensing conditions that Microsoft offers for Office Open XML".

The fast-track process allows national standardizing bodies (NBs) a 30-day review period, during which they may notify the JTC 1 Secretariat of any perceived contradiction with other JTC 1, ISO or IEC standards. If such a contradiction is alleged, the matter must be resolved by the ITTF and the JTC 1 Secretariat before ballot voting can begin. During the 30-day review period, Grokdoc organized a collaborative review to supply additional material for the national standardizing bodies' reviews.[7] The 30-day review closed on 2007-02-05, by which point the United Kingdom national standardizing body, the British Standards Institution (BSI), had submitted a contradiction.[8]


The full text of Ecma 376 can be downloaded from Ecma International, either as a whole or in parts.

Licensing

The Office Open XML format was initially made available under a free and perpetual license.[9] Because of concern that free and open source software (FOSS) could not use the format under the proposed license,[10] Microsoft provided a covenant not to sue.[11] The covenant received a mixed reception, with some in the FOSS community identifying problems[12] and others (such as Lawrence Rosen) endorsing it.[13] In support of the licensing arrangements, Microsoft commissioned an analysis from the London law firm Baker & McKenzie.[14]

The covenant not to sue was included in the documents submitted to ISO in support of the Ecma 376 fast-track submission.[15]

Adoption

Office Open XML is the default Office 2007 format for documents without macros. Microsoft has also released a compatibility pack for older versions.[16] With the compatibility pack, users can create and edit Office Open XML files from within Office 2000, Office XP and Office 2003; the pack can also be used as a standalone converter in combination with Office 97.

There is not yet a converter for the Office Open XML format in Office 2004 for Mac OS. Developers in Microsoft's Macintosh Business Unit (Mac BU) advised users of Office 2007 to save their files in the older binary Office formats[17] until a file converter is released.

  • Corel has indicated its WordPerfect Office suite will support Office Open XML.[18]
  • Gnumeric has limited support for the SpreadsheetML markup language.[19]
  • Novell has announced that it will offer an Office Open XML plug-in for OpenOffice.org, that the plug-in will be released as open source software, and that it will be submitted for inclusion in the OpenOffice.org project.[20]
  • Maarten Balliauw has created a set of PHP classes to create SpreadsheetML markup language documents.[21]
  • Panergy Ltd. has developed docXConverter, a converter from the WordprocessingML markup language to Rich Text Format (RTF). It allows versions of Word that are not supported by Microsoft's compatibility pack, e.g. Word 97, to open OOXML files containing WordprocessingML markup, and it can be used to transfer WordprocessingML data to other applications that read RTF.[22]
  • Wouter van Vugt has developed a package explorer that allows users to edit XML parts and validate them against the Ecma schemas.[23]

Criticism

The Open XML standard has been the subject of wide and varied controversy in the computing industry, particularly from members of the free software movement, independent software vendors,[24] industry analysts and Microsoft's competitors Sun Microsystems and IBM, most of whom favor the OpenDocument format, which is notably present in the freely available OpenOffice.org application suite.

The essential premise behind this criticism, apart from several technical issues, is that Microsoft has standardised its proprietary format in order to prevent the widespread adoption of the OpenDocument format, which could threaten the dominance of Microsoft's own Office suite. Commentators have also argued that while competitors will likely implement support for the new standard in their own applications, Microsoft has announced no plans to support the OpenDocument format in return.[25] Microsoft, the world's largest software company, argues that its royalty-free format, built on strong industry standards such as XML and ZIP, will end persistent incompatibility problems in working environments with diverse software applications.[26]


Voiced criticisms include:

  • The 6,000-page specification is too long to evaluate in the 30-day review (which considers contradictions only) and the five-month ballot period.[27]
  • The format specification references external formats that are not part of the Ecma standard and are therefore not covered by the covenant not to sue: For example, book 4 section 6.4.3.1, Clipboard format types.[28]
  • XML names that are not human-legible and inconsistent naming conventions, contrary to the XML design goal of human-legible documents.[29][7]
  • Relies on application-defined behaviors to support important functionality that should be documented or supported via existing standards: For example, book 4 section 6.1.2.19 defines the "equationxml" attribute of "shape" elements, "used to rehydrate an equation using the Office Open XML Math syntax"; however, the "actual format of the contents of this attribute are application-defined".[7]
  • Use of a two-byte language code instead of the ISO 639 two-letter and three-letter language codes. (The grokdoc article cites book 4 clause 2.18.52, which describes the ST_LangCode simple type, which does use hexadecimal values; however, all references to languages in the Word document format use the ST_Lang simple type, described in book 4 clause 2.18.51, which allows the use of ISO 639 codes.)[30]
  • Spreadsheet cells use a date format that is not ISO 8601 and that incorrectly treats 1900 as a leap year, continuing a problem introduced by the once-dominant spreadsheet package Lotus 1-2-3.[31]
  • Use of DrawingML and VML instead of SVG, and of a new mathematical format instead of MathML. MathML and SVG are W3C standards.
  • Internal inconsistencies and omissions: For example, book 4 section 2.18.4 lists numerous styles such as apples, scaredCat and heebieJeebies, but the specification does not fully define these styles (e.g. height, width, color depth and orientation are missing).[7]
  • Inconsistent notations for percentage units: book 4 section 2.18.85 uses predefined symbols (such as "pct15" for 15%) in 5 or 2.5 percent increments, book 4 section 2.15.1.95 uses a decimal number giving the percentage, book 4 section 2.18.97 uses a number in 50ths of a percent, and book 4 section 5.1.12.41 uses a number in 1000ths of a percent.[7]
  • Inappropriate application settings: For example, book 4 section 2.15.3.16, "doNotLeaveBackslashAlone", is an application setting, not a document setting.[7]
  • Non-XML formatting codes: For example, book 4 section 2.16.5.79, "XE" (full name not defined), defines 'b' and 'i' as bold and italic, which is contrary to XML and CSS. Similarly, book 4 sections 2.16.5.76–2.16.5.78 define "\* Caps", "\* FirstCap", "\* Lower" and "\* Upper" to format the capitalization of preceding text.[7]
  • Mismatched example description: Book 4 section 2.16.5.77 about the field USERINITIALS presents an example that uses the field name USERNAME where it should use USERINITIALS.[7]
  • Inflexible numbering format: For example, book 4 section 2.18.66 describes a numbering format that is fixed to a few countries, contradicting W3C XSLT and Unicode (ISO 10646).[7]
  • Microsoft-specific namespaces are used for VML objects throughout the specification: For example, book 4 section 6.2.3.23 uses the Microsoft namespace "urn:schemas-microsoft-com:office:office".[7] VML objects are defined for backward compatibility; the section begins, "to maintain backward compatibility, all VML namespaces defined in this specification maintain the legacy namespace structure already used by millions of documents".
  • Nonstandard, inflexible paper-size naming: For example, book 4 section 3.3.1.61 defines a "paperSize" attribute whose values 1 through 68 stand for predefined standard paper sizes such as A4.[7]
  • Many element attributes are defined as bitmasks, which are not extensible and create a new data model separate from the XML data model: For example, book 4 section 2.8.2.16, "sig (Supported Unicode Subranges and Code Pages)", describes the <w:sig> element, whose attributes are all bitmasks.[7]
  • Cloning the behavior of proprietary applications: For example, book 4 section 2.15.3.6, autoSpaceLikeWord95, book 4 section 2.15.3.31, lineWrapLikeWord6.[7]
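The 1900 leap-year problem listed above can be illustrated with a conversion from spreadsheet day numbers to ISO 8601 dates. A minimal sketch, assuming the common 1900 date system in which day 1 is 1 January 1900 and day 60 stands for the nonexistent 29 February 1900; the function name serial_to_date is hypothetical.

```python
from datetime import date, timedelta


def serial_to_date(serial: int) -> date:
    """Convert a 1900-system spreadsheet day number to a calendar date.

    Day 1 is 1900-01-01. Day 60 stands for 29 February 1900, a date that
    never existed, so every later serial is one higher than a true day count.
    """
    if serial < 1:
        raise ValueError("serial must be >= 1")
    if serial == 60:
        raise ValueError("serial 60 is the nonexistent date 1900-02-29")
    if serial < 60:
        # Before the phantom leap day the count is exact.
        return date(1899, 12, 31) + timedelta(days=serial)
    # From 1 March 1900 on, count from one day earlier to absorb the phantom day.
    return date(1899, 12, 30) + timedelta(days=serial)


print(serial_to_date(61).isoformat())     # 1900-03-01
print(serial_to_date(39448).isoformat())  # 2008-01-01
```

Because of the phantom day, a converter that simply adds every serial to a single fixed epoch is off by one either for all dates before 1 March 1900 or for all dates after it.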

References

  1. "Ecma International approves Office Open XML standard" (Press release). Ecma International. December 7, 2006. Retrieved 2006-12-08.
  2. Tom Ngo (December 11, 2006). "Office Open XML Overview" (PDF). Ecma International. p. 6. Retrieved 2007-01-23.
  3. "Q&A: Microsoft Co-Sponsors Submission of Office Open XML Document Formats to Ecma International for Standardization". Microsoft PressPass (Press release). Microsoft. November 21, 2005. Retrieved 2007-01-23.
  4. "The new open standard safeguards the continued use of billions of existing documents". Ecma International. Retrieved 2007-01-28.
  5. "TC45 - Office Open XML Formats". Ecma International. Retrieved 2007-01-28.
  6. "ISO/IEC JTC 1 Directives, 5th Edition, Version 2.0". ISO. Retrieved 2007-01-28.
  7. "EOOXML objections". grokdoc. Retrieved 2007-01-02.
  8. "Microsoft standards bid faces failure". VNU Business Publications Ltd. Retrieved 2007-02-06.
  9. Paoli, Jean. "Clarification of License Terms for Office XML Schema". Microsoft. Retrieved 2007-01-23.
  10. "Open XML Incompatible With GPL". eWeek. Retrieved 2007-01-29.
  11. "Microsoft Covenant Regarding Office 2003 XML Reference Schemas". Microsoft. Retrieved 2006-07-11.
  12. "2 Escape Hatches in MS's Covenant Not to Sue". Groklaw. Retrieved 2007-01-29.
  13. Berlind, David (November 28, 2005). "Top open source lawyer blesses new terms on Microsoft's XML file format". ZDNet. Retrieved 2007-01-27.
  14. Baker & McKenzie (2006). "Standardisation and Licensing of Microsoft's Office Open XML Reference Schema" (PDF). Baker & McKenzie. Retrieved 2007-02-01.
  15. "Licensing conditions that Microsoft offers for Office Open XML". ISO. Retrieved 2007-01-28.
  16. http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=941B3470-3AE9-4AEE-8F43-C6BB74CD1466
  17. http://blogs.msdn.com/macmojo/archive/2006/12/05/converters-coming-free-and-fairly-fast.aspx
  18. "Corel WordPerfect Office To Support Open Document Format and Microsoft Office Open XML". Corel. Retrieved 2007-01-30.
  19. "GNOME Office / Gnumeric". GNOME.org. Retrieved 2006-07-28.
  20. "Novell Boosts OpenOffice.org and Microsoft Office Interoperability". Novell. Retrieved 2007-01-30.
  21. "Office 2007 SpreadsheetML classes in PHP". Maarten Balliauw. Retrieved 2007-02-01.
  22. "docXConverter - Features". Panergy. Retrieved 2007-01-31.
  23. "Package Explorer V2.0". Wouter van Vugt. Retrieved 2007-01-31.
  24. Ben Langhinrichs (October 27, 2006). "Self deprecating standards". Genii Software Ltd. p. 1. Retrieved 2007-02-01.
  25. Walt Hucks (January 20, 2007). "Most contrived tech awards". Opportunity Knocks. p. 1. Retrieved 2007-02-06.
  26. Microsoft (January 1, 2007). http://office.microsoft.com/en-us/products/HA102058151033.aspx. Microsoft. p. 1. Retrieved 2007-02-06.
  27. "Six thousand pages, one month, no chance...". Retrieved 2007-02-03.
  28. Andrew Updegrove (January 17, 2007). "The Contradictory Nature of OOXML". The ConsortiumInfo.org. Retrieved 2007-01-23.
  29. "Extensible Markup Language (XML) 1.0 (Fourth Edition)". Retrieved 2007-02-04.
  30. Wouter van Vugt (2007-01-27). "Doug is Evil". Info Support Blog Community. Retrieved 2007-02-02.
  31. Spolsky, Joel (2006-06-16). "My First BillG Review". Joel on Software. Retrieved 2007-01-31.
