Data set (IBM mainframe)

This revision was edited by 63.103.206.10 at 09:30, 22 November 2006 (Partitioned Datasets).

The term data set (or dataset) refers to a file on an IBM mainframe computer, typically stored on a direct-access storage device (DASD) or magnetic tape. Data sets are record-oriented files. The term applies to IBM mainframe operating systems starting with OS/360 and has continued in use through its successors, including MVS, OS/390, and z/OS.

Unlike files on UNIX systems, data sets are not unstructured streams of bytes. They are organized into logical records and blocks whose structure is determined by the DSORG (data set organization) and RECFM (record format) parameters of the DCB (Data Control Block), the data structure a program uses to access a data set. These parameters may also be specified on the Job Control Language (JCL) DD statements used to allocate the data set.

Dataset Organization

In OS/360, the DCB's DSORG parameter specifies how the dataset is organized. It may be physically sequential ("PS"), indexed sequential ("IS"), partitioned ("PO"), or direct access ("DA"). Datasets on tape may only be DSORG=PS. The choice of organization depends on how the data is to be accessed and, in particular, on how it might be updated.
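The rules in the preceding paragraph can be summarized in a small Python sketch. The table and the function name `check_dsorg` are purely illustrative; they model only the four organizations and the tape restriction stated above, not any actual system interface.

```python
# Illustrative lookup table of the DSORG codes named in the text.
DSORG_CODES = {
    "PS": "physical sequential",
    "IS": "indexed sequential",
    "PO": "partitioned",
    "DA": "direct access",
}


def check_dsorg(dsorg, on_tape=False):
    """Return a description of a DSORG code, rejecting invalid choices.

    Hypothetical helper: it encodes only the rule that tape data sets
    must be physically sequential (DSORG=PS).
    """
    if dsorg not in DSORG_CODES:
        raise ValueError("unknown DSORG: %r" % (dsorg,))
    if on_tape and dsorg != "PS":
        raise ValueError("data sets on tape may only be DSORG=PS")
    return DSORG_CODES[dsorg]
```

For example, `check_dsorg("PO")` returns the description of a partitioned data set, while `check_dsorg("PO", on_tape=True)` raises an error.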

Record Format (RECFM)

Regardless of organization, the physical structure of each record is essentially the same, and is uniform throughout the dataset. This structure is specified by the DCB RECFM parameter. RECFM=F means that the records are of fixed length, given by the LRECL parameter, and RECFM=V specifies variable-length records. Each variable-length record is prefixed by a Record Descriptor Word (RDW) containing the integer length of the record in bytes.

Records of format FB and VB are fixed-blocked and variable-blocked, respectively. This means that multiple logical records are grouped together into a single physical block on tape or disk. The BLKSIZE parameter specifies the maximum length of a block. RECFM=FBS means fixed-blocked-standard: all blocks except the last one must be full-length. RECFM=VBS means variable-blocked-spanned: a logical record may span two or more blocks, with flags in the descriptor word that prefixes each record segment indicating whether the segment is continued into the next block, was continued from the previous one, or both.
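As an illustration of the record formats described above, the following Python sketch splits a block into its logical records. It is a minimal sketch under stated assumptions, not a description of any system routine: it assumes the conventional layout in which each variable-length record begins with a 4-byte RDW whose first two bytes hold the big-endian record length (counting the RDW itself) and whose last two bytes are zero, and in which a variable-length block begins with a similar 4-byte block descriptor word. Spanned records are not handled, and all function names are hypothetical.

```python
import struct


def deblock_fixed(block, lrecl):
    """Split an FB block into fixed-length logical records of LRECL bytes."""
    if lrecl <= 0 or len(block) % lrecl != 0:
        raise ValueError("block length is not a multiple of LRECL")
    return [block[i:i + lrecl] for i in range(0, len(block), lrecl)]


def deblock_variable(block):
    """Split a VB block into logical records using the length prefixes."""
    # Assumed layout: 4-byte block descriptor word, length in bytes 0-1.
    (block_len,) = struct.unpack(">H", block[:2])
    if block_len != len(block):
        raise ValueError("block descriptor does not match block length")
    records = []
    pos = 4  # skip the block descriptor word
    while pos < block_len:
        # Assumed layout: 4-byte RDW, length (including RDW) in bytes 0-1.
        (rec_len,) = struct.unpack(">H", block[pos:pos + 2])
        if rec_len < 4 or pos + rec_len > block_len:
            raise ValueError("invalid record length in RDW")
        records.append(block[pos + 4:pos + rec_len])
        pos += rec_len
    return records


def build_variable_block(records):
    """Build a VB block from a list of byte strings (inverse of the above)."""
    body = b"".join(struct.pack(">HH", len(r) + 4, 0) + r for r in records)
    return struct.pack(">HH", len(body) + 4, 0) + body
```

Because every record carries its own length, `deblock_variable(build_variable_block(records))` returns the original records whatever bytes they contain, including empty records.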

This mechanism eliminates the need for a "delimiter" byte value to separate records. The file is an abstraction of a collection of records, in contrast to the unstructured "stream" of bytes found on systems such as Unix, Windows, or Mac OS. Because record boundaries do not depend on the content, the data may be of any type, including binary integers, floating point, or characters, without introducing a false end-of-record condition.
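The false end-of-record problem can be demonstrated with a short Python sketch. It writes three binary integers first as a newline-delimited stream and then with a length prefix on each record. The 2-byte prefix used here is a simplified stand-in for a record descriptor, chosen for illustration, and the helper name `read_prefixed` is hypothetical.

```python
import struct

# Three 32-bit big-endian integers whose encodings happen to contain
# the byte 0x0A, which is also the newline delimiter.
values = [10, 266, 2573]
payloads = [struct.pack(">i", v) for v in values]

# Delimiter approach: records separated by a newline byte.
stream = b"\n".join(payloads)
delimited = stream.split(b"\n")

# Length-prefix approach: each record preceded by its 2-byte length.
prefixed = b"".join(struct.pack(">H", len(p)) + p for p in payloads)


def read_prefixed(data):
    """Recover the records from a length-prefixed byte string."""
    records = []
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        records.append(data[pos + 2:pos + 2 + length])
        pos += 2 + length
    return records


recovered = read_prefixed(prefixed)
```

Splitting on the delimiter yields six fragments instead of three records, because the data bytes are mistaken for record boundaries, whereas `recovered` equals the original `payloads`.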