Compound File Binary Format

{{Short description|Compound document file format}}
{{Use dmy dates|date=December 2019}}
'''Compound File Binary Format''' (CFBF), also called '''Compound File''', '''Compound Document format''',<ref>{{cite web|url=http://poi.apache.org/poifs/index.html|title=Apache POI – POIFS|publisher=POI Project|accessdate=10 May 2011|archive-url=https://web.archive.org/web/20110426150340/http://poi.apache.org/poifs/index.html|archive-date=26 April 2011|url-status=dead}}</ref> or '''Composite Document File V2'''<ref>{{cite web
|url=https://linuxconfig.org/how-to-convert-documents-between-libreoffice-and-microsoft-office-file-formats-on-linux
|title=How to convert documents between LibreOffice and Microsoft Office file formats on Linux
|accessdate=25 November 2016
|archive-url=https://web.archive.org/web/20190921163547/https://linuxconfig.org/how-to-convert-documents-between-libreoffice-and-microsoft-office-file-formats-on-linux
|archive-date=21 September 2019
|url-status=dead
}}</ref> (CDF), is a [[compound document|compound]] [[document file format]] for storing numerous files and streams within a single file on a disk. CFBF is developed by [[Microsoft]] and is an implementation of Microsoft [[COM Structured Storage]].<ref>{{cite web
|url=http://msdn.microsoft.com/en-us/library/aa378938%28VS.85%29.aspx
|title=Compound Files (Windows)
|work=Microsoft Developers Network (MSDN) library – COM SDK|publisher=Microsoft Corporation|accessdate=23 September 2009|date=20 November 2008}}</ref><ref>{{cite web|url=http://msdn.microsoft.com/en-us/library/ydd3k45e.aspx|title=Containers: Compound Files|work=Microsoft Developers Network (MSDN) library – Visual Studio 2008 documentation|publisher=Microsoft Corporation|accessdate=23 September 2009}}</ref><ref>{{cite web|url=http://msdn.microsoft.com/en-us/library/cc542545%28VS.85%29.aspx|title=Understand Compound Files|work=Microsoft Developers Network (MSDN) library – ActiveDirectory Rights Management|accessdate=23 September 2009|date=25 June 2009}}</ref> The file format is used for storing storage objects and stream objects in a hierarchical structure within a single file.<ref>{{Cite web |date=2020-01-28 |title=Microsoft Compound File Binary File Format, Version 4 |url=https://www.loc.gov/preservation/digital/formats/fdd/fdd000392.shtml |access-date=2024-06-13 |website=www.loc.gov}}</ref>
 
Compound File Binary Format (CFBF) is a format originally developed by Microsoft for on-disk storage of data using the IStorage COM object and OLE32 API calls such as StgCreateStorageEx and StgOpenStorageEx. Microsoft has opened the format for use by others, and it is now used in a variety of programs, from [[Microsoft Word]] and [[Microsoft Access|Access]] to [http://www.businessobjects.com Business Objects].{{Citation needed|date=November 2009}} It also forms the basis of the [[Advanced Authoring Format]].<ref>[http://www.aafassociation.org AMW Association (formerly AAF Association)] {{webarchive|url=https://web.archive.org/web/20000815063147/http://www.aafassociation.org/ |date=15 August 2000 }}</ref>
 
==Overview==
At its simplest, the Compound File Binary Format is a container, with little restriction on what can be stored within it.
 
A CFBF file structure loosely resembles a [[File Allocation Table|FAT]] [[file system]]. The file is partitioned into ''Sectors'', which are chained together with a ''File Allocation Table'' (not to be confused with the file system of the same name) that contains the chain of sectors belonging to each file. A ''Directory'' holds information for the contained files, including a Sector ID (SID) for the starting sector of each chain.
 
==Structure==
The CFBF file consists of a 512-byte header record followed by a number of Sectors whose size is defined in the header. The literature defines Sectors to be either 512 or 4096 bytes in length, although the format is potentially capable of supporting sectors ranging in size from 128 bytes upwards, in powers of two (128, 256, 512, 1024, etc.). The lower limit of 128 is the minimum required to fit a single directory entry in a Directory Sector.{{Relevance inline|paragraph|reason=MS-CFB standard clearly says that a sector MUST be 512 or 4096, this sentence is encouraging to break the standard|date=November 2016}}
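
The relationship between the sector shift stored in the header and the position of a sector in the file can be sketched as follows. This is an illustrative sketch, not code from any specification: the helper names are hypothetical, and the offset calculation assumes that the header occupies exactly one sector-sized slot at the start of the file (so that, with 4096-byte sectors, the 512-byte header is padded out to 4096 bytes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the shift is one of the two values
 * that the text above says the literature defines
 * (9 -> 512 bytes, 12 -> 4096 bytes). */
static bool cfb_sector_shift_is_standard(uint16_t sector_shift)
{
    return sector_shift == 9 || sector_shift == 12;
}

/* Hypothetical helper: sector size in bytes for a given shift.
 * 7 (128 bytes) is the lower limit mentioned in the text. */
static uint64_t cfb_sector_size(uint16_t sector_shift)
{
    assert(sector_shift >= 7 && sector_shift < 32);
    return (uint64_t)1 << sector_shift;
}

/* Hypothetical helper: byte offset of sector number `sect` from the
 * start of the file. ASSUMPTION: the header fills one sector-sized
 * slot, so sector 0 begins one sector size into the file. */
static uint64_t cfb_sector_offset(uint32_t sect, uint16_t sector_shift)
{
    return ((uint64_t)sect + 1) * cfb_sector_size(sector_shift);
}
```

Under these assumptions, sector 0 of a file with 512-byte sectors begins immediately after the 512-byte header.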
 
There are several types of sector that may be present in a CFBF file:
 
* File Allocation Table (FAT) Sector - contains chains of sector indices much as a FAT does in the FAT/FAT32 filesystems
* MiniFAT Sectors - similar to the FAT but storing chains of mini-sectors within the Mini-Stream
* Double-Indirect FAT (DIFAT) Sector - contains chains of FAT sector indices
* Directory Sector - contains directory entries
* Stream Sector - contains arbitrary file data
* Range Lock Sector - contains the byte-range locking area of a large file
 
More detail is given below for the header and each sector type.
 
===CFBF header format===
The CFBF header occupies the first 512 bytes of the file and contains the information required to interpret the rest of the file. The C-style structure declaration below (extracted from the AAFA's Low-Level Container Specification) shows the members of the CFBF header and their purpose:
<syntaxhighlight lang="c">
typedef unsigned long ULONG; // 4 bytes
typedef unsigned short USHORT; // 2 bytes
typedef short OFFSET; // 2 bytes
typedef ULONG SECT; // 4 bytes
typedef ULONG FSINDEX; // 4 bytes
typedef USHORT FSOFFSET; // 2 bytes
typedef USHORT WCHAR; // 2 bytes
typedef ULONG DFSIGNATURE; // 4 bytes
typedef unsigned char BYTE; // 1 byte
typedef unsigned short WORD; // 2 bytes
typedef unsigned long DWORD; // 4 bytes
typedef ULONG SID; // 4 bytes
typedef GUID CLSID; // 16 bytes
 
struct StructuredStorageHeader { // [offset from start (bytes), length (bytes)]
    BYTE        _abSig[8];           // [00H,08] {0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1,
                                     //           0x1a, 0xe1} for current version
    CLSID       _clsid;              // [08H,16] reserved must be zero (WriteClassStg/
                                     //           GetClassFile uses root directory class id)
    USHORT      _uMinorVersion;      // [18H,02] minor version of the format: 33 is
                                     //           written by reference implementation
    USHORT      _uDllVersion;        // [1AH,02] major version of the dll/format: 3 for
                                     //           512-byte sectors, 4 for 4 KB sectors
    USHORT      _uByteOrder;         // [1CH,02] 0xFFFE: indicates Intel byte-ordering
    USHORT      _uSectorShift;       // [1EH,02] size of sectors in power-of-two;
                                     //           typically 9 indicating 512-byte sectors
    USHORT      _uMiniSectorShift;   // [20H,02] size of mini-sectors in power-of-two;
                                     //           typically 6 indicating 64-byte mini-sectors
    USHORT      _usReserved;         // [22H,02] reserved, must be zero
    ULONG       _ulReserved1;        // [24H,04] reserved, must be zero
    FSINDEX     _csectDir;           // [28H,04] must be zero for 512-byte sectors,
                                     //           number of SECTs in directory chain for 4 KB
                                     //           sectors
    FSINDEX     _csectFat;           // [2CH,04] number of SECTs in the FAT chain
    SECT        _sectDirStart;       // [30H,04] first SECT in the directory chain
    DFSIGNATURE _signature;          // [34H,04] signature used for transactions; must
                                     //           be zero. The reference implementation
                                     //           does not support transactions
    ULONG       _ulMiniSectorCutoff; // [38H,04] maximum size for a mini stream;
                                     //           typically 4096 bytes
    SECT        _sectMiniFatStart;   // [3CH,04] first SECT in the MiniFAT chain
    FSINDEX     _csectMiniFat;       // [40H,04] number of SECTs in the MiniFAT chain
    SECT        _sectDifStart;       // [44H,04] first SECT in the DIFAT chain
    FSINDEX     _csectDif;           // [48H,04] number of SECTs in the DIFAT chain
    SECT        _sectFat[109];       // [4CH,436] the SECTs of first 109 FAT sectors
};
</syntaxhighlight>
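
As an illustration of how the offsets in the declaration above might be used, the following sketch decodes a few header fields from a raw 512-byte buffer instead of overlaying the structure on it, which avoids depending on the host's integer sizes, padding and byte order. The type <code>cfb_header_info</code> and the functions <code>rd16</code>, <code>rd32</code> and <code>cfb_parse_header</code> are hypothetical names introduced here; only the offsets, the signature bytes and the 0xFFFE byte-order mark come from the declaration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for a subset of the header fields,
 * decoded into host-order integers. */
struct cfb_header_info {
    uint16_t minor_version;     /* _uMinorVersion,      offset 0x18 */
    uint16_t major_version;     /* _uDllVersion,        offset 0x1A */
    uint16_t sector_shift;      /* _uSectorShift,       offset 0x1E */
    uint16_t mini_sector_shift; /* _uMiniSectorShift,   offset 0x20 */
    uint32_t fat_sector_count;  /* _csectFat,           offset 0x2C */
    uint32_t dir_start;         /* _sectDirStart,       offset 0x30 */
    uint32_t mini_cutoff;       /* _ulMiniSectorCutoff, offset 0x38 */
};

/* Read little-endian ("Intel byte-ordering") integers byte by byte. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the signature or the byte-order mark
 * does not match the values given in the declaration. */
static int cfb_parse_header(const uint8_t buf[512], struct cfb_header_info *out)
{
    static const uint8_t sig[8] = {0xd0, 0xcf, 0x11, 0xe0,
                                   0xa1, 0xb1, 0x1a, 0xe1};

    assert(buf != NULL && out != NULL);
    if (memcmp(buf, sig, sizeof sig) != 0)
        return -1;
    if (rd16(buf + 0x1C) != 0xFFFE)
        return -1;

    out->minor_version     = rd16(buf + 0x18);
    out->major_version     = rd16(buf + 0x1A);
    out->sector_shift      = rd16(buf + 0x1E);
    out->mini_sector_shift = rd16(buf + 0x20);
    out->fat_sector_count  = rd32(buf + 0x2C);
    out->dir_start         = rd32(buf + 0x30);
    out->mini_cutoff       = rd32(buf + 0x38);
    return 0;
}
```

A caller would read the first 512 bytes of the file into a buffer, pass it to <code>cfb_parse_header</code>, and reject the file if the function returns -1.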
 
===File Allocation Table (FAT) sectors===
When taken together as a single stream, the collection of FAT sectors defines the status and linkage of every sector in the file. Each entry in the FAT is 4 bytes long and contains the sector number of the next sector in a FAT chain or one of the following special values:
 
* {{Mono|FREESECT}} ({{Mono|0xFFFFFFFF}}) - denotes an unused sector
* {{Mono|ENDOFCHAIN}} ({{Mono|0xFFFFFFFE}}) - marks the last sector in a FAT chain
* {{Mono|FATSECT}} ({{Mono|0xFFFFFFFD}}) - marks a sector used to store part of the FAT
* {{Mono|DIFSECT}} ({{Mono|0xFFFFFFFC}}) - marks a sector used to store part of the DIFAT
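
Following a FAT chain with these special values can be sketched as below. The sketch assumes that the whole FAT has already been loaded into memory as an array of 32-bit entries in host byte order; the function name <code>cfb_chain_length</code> and the <code>CFB_</code>-prefixed macros are hypothetical names, and the error handling (treating any special value other than the end marker, any out-of-range sector number and any loop as corruption) is one possible policy rather than required behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFB_FREESECT   0xFFFFFFFFu /* unused sector */
#define CFB_ENDOFCHAIN 0xFFFFFFFEu /* last sector in a chain */
#define CFB_FATSECT    0xFFFFFFFDu /* sector holds part of the FAT */
#define CFB_DIFSECT    0xFFFFFFFCu /* sector holds part of the DIFAT */

/* Counts the sectors in the chain that starts at `start`.
 * Returns 0 and stores the count in *out_len on success.
 * Returns -1 if the chain leaves the table, reaches a special value
 * other than ENDOFCHAIN, or visits more sectors than the table has
 * entries (which can only happen if the chain loops). */
static int cfb_chain_length(const uint32_t *fat, size_t fat_entries,
                            uint32_t start, size_t *out_len)
{
    size_t count = 0;
    uint32_t sect = start;

    assert(fat != NULL && out_len != NULL);
    while (sect != CFB_ENDOFCHAIN) {
        if (sect >= CFB_DIFSECT || sect >= fat_entries)
            return -1; /* special marker or out-of-range sector number */
        if (count >= fat_entries)
            return -1; /* more links than entries: the chain loops */
        count++;
        sect = fat[sect];
    }
    *out_len = count;
    return 0;
}
```

The starting sector number would typically come from a directory entry or from a header field such as <code>_sectDirStart</code>.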
 
===MiniFAT Sectors===
''(This section is not yet written. Refer to documentation linked below.)''
 
===Double-Indirect FAT (DIFAT) Sectors===
''(This section is not yet written. Refer to documentation linked below.)''
 
===Directory Sectors===
''(This section is not yet written. Refer to documentation linked below.)''
 
===Stream Sectors===
''(This section is not yet written. Refer to documentation linked below.)''
 
===Range Lock Sector===
{{Expand section|date=November 2009}}
''(This section is not yet complete. Refer to documentation linked below.)''
 
The '''Range Lock Sector''' must exist in files greater than 2&nbsp;GB in size, and must not exist in files smaller than 2&nbsp;GB. The Range Lock Sector must contain the byte range {{Mono|0x7FFFFF00}} to {{Mono|0x7FFFFFFF}} in the file. This area is reserved by Microsoft's COM implementation for storing byte-range locking information for concurrent access.
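
The sector number that covers this byte range can be estimated with simple arithmetic. The sketch below is illustrative only: the function name and macros are hypothetical, and it assumes that sector 0 begins one sector size into the file, so that a file offset maps to sector number (offset / sector size) − 1.

```c
#include <assert.h>
#include <stdint.h>

#define CFB_RANGE_LOCK_FIRST 0x7FFFFF00u /* first byte of the locking area */
#define CFB_RANGE_LOCK_LAST  0x7FFFFFFFu /* last byte of the locking area */

/* Hypothetical helper: number of the sector containing the range lock
 * area. ASSUMPTION: sector 0 starts one sector size into the file.
 * The shift must be at least 8 so that the 256-byte area cannot
 * straddle two sectors. */
static uint32_t cfb_range_lock_sector(uint16_t sector_shift)
{
    assert(sector_shift >= 8 && sector_shift < 32);
    return (uint32_t)(CFB_RANGE_LOCK_FIRST >> sector_shift) - 1u;
}
```

Because the area is 256 bytes long and aligned to a 256-byte boundary, it falls entirely within one sector for both of the sector sizes named in the literature.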
 
===Glossary===
* ''FAT'' – File Allocation Table; also known as ''SAT'' – Sector Allocation Table
* ''DIFAT'' – Double-Indirect File Allocation Table
* ''FAT Chain'' – a group of FAT entries which indicate the Sectors allocated to a Stream in the file
* ''Stream'' – a virtual file which occupies a number of Sectors within the CFBF
* ''Sector'' – the unit of allocation within the CFBF, usually 512 or 4096 bytes in length
 
==See also==
* [[COM Structured Storage]]
* [[Advanced Authoring Format]] (AAF)
* [[Cabinet (file format)]]
* [[SNP File Format]]
 
==References==
{{Reflist}}
 
==External links==
* {{cite web
| accessdate = 6 July 2019
| url = https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-cfb/53989ce4-7b05-4f8d-829b-d08d6148375b
| format = PDF
| title = [MS-CFB]: Compound File Binary File Format
| publisher = Microsoft
}}
* {{cite web
| accessdate = 22 May 2006
 
| url = http://sc.openoffice.org/compdocfileformat.pdf
| title = Microsoft Compound Document File Format
}}
* {{cite web
| accessdate = 22 May 2006
| url = http://www.amwa.tv/downloads/specifications/aafcontainerspec-v1.0.1.pdf
| format = PDF
| title = Advanced Authoring Format Low-Level Container Specification
| work = Microsoft Structured Storage version 3 specification (PDF)
| archive-url = https://web.archive.org/web/20110809045600/http://www.amwa.tv/downloads/specifications/aafcontainerspec-v1.0.1.pdf
| archive-date = 9 August 2011
| url-status = dead
}}
* {{cite web
| accessdate = 6 July 2019
| url = https://www.loc.gov/preservation/digital/formats/fdd/fdd000380.shtml
| title = Microsoft Compound File Binary File Format, Version 3
| publisher = Library of Congress, Digital Formats web site
}}
 
[[Category:Computer file formats]]
[[Category:Digital container formats]]