Data set (IBM mainframe)

A '''data set''', or '''dataset''', is a [[computer file]] having a [[record-oriented file|record organization]]. The term pertains to the [[IBM]] [[mainframe computer|mainframe]] operating system line, starting with [[OS/360]], and is still used by its successors, including the current [[z/OS]]. Those systems historically preferred this term over ''file''. Data sets are typically stored on a [[direct access storage device]] (DASD) or on [[magnetic tape]].
 
Datasets are not unstructured streams of [[byte]]s but rather are organized in various logical record and block structures determined by the <code>DSORG</code> (data set organization), <code>RECFM</code> (record format) and similar parameters. These parameters are specified at the time of data set allocation (creation), for example with [[Job Control Language]] DD statements. They are then stored in the [[Data Control Block]] (DCB), a data structure that is later used to access the dataset.
 
== Dataset Organization ==
{{Template:Mainframe I/O access methods}}
In OS/360, the DCB's <code>DSORG</code> parameter specifies how the dataset is organized. It may be physically sequential ("PS"), indexed sequential ("IS"), partitioned ("PO"), or direct access ("DA"). Datasets on tape may only be <code>DSORG=PS</code>. The choice of organization depends on how the data is to be accessed and, in particular, on how it is to be updated.
 
Programmers use various ''access methods'' (such as [[QSAM]], [[ISAM]], and [[VSAM]]) in programs that read and write data sets; the choice of access method depends on the organization of the given data set.
 
== Record Format (RECFM) ==
 
Regardless of organization, the physical structure of each record is essentially the same, and is uniform throughout the dataset. This is specified in the DCB <code>RECFM</code> parameter. <code>RECFM=F</code> means that the records are of fixed length, specified via the <code>LRECL</code> parameter, and <code>RECFM=V</code> specifies variable-length records. When stored on media, V records are prefixed by a Record Descriptor Word (RDW) containing the integer length of the record in bytes. With <code>RECFM=FB</code> and <code>RECFM=VB</code>, multiple logical records are grouped together into a single [[Block (data storage)|physical block]] on tape or disk. FB and VB stand for <code>fixed-blocked</code> and <code>variable-blocked</code>, respectively. The <code>BLKSIZE</code> parameter specifies the maximum length of the block. <code>RECFM=FBS</code> can also be specified, meaning <code>fixed-blocked standard</code>, which requires all blocks except the last one to be of the full <code>BLKSIZE</code> length. <code>RECFM=VBS</code>, or <code>variable-blocked spanned</code>, means that a logical record can span two or more blocks, with flags in the RDW indicating whether a record segment is continued into the next block and/or was continued from the previous one.
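
The variable-blocked layout described above can be illustrated with a minimal Python sketch. It is not IBM code. It assumes details the text above does not state: each descriptor word is 4 bytes, holds a 2-byte big-endian length that includes the descriptor itself, followed by 2 zero bytes, and each block starts with a Block Descriptor Word of the same shape. Spanned (VBS) records are not handled, and the function names are hypothetical.

```python
import struct

def encode_vb_block(records):
    """Pack byte-string records into one variable-blocked (VB) block."""
    body = b""
    for rec in records:
        # Assumed RDW layout: 2-byte big-endian length (counting the
        # 4-byte RDW itself), then 2 zero bytes for an unspanned record.
        body += struct.pack(">HH", len(rec) + 4, 0) + rec
    # Assumed block descriptor: same layout, giving the whole block length.
    return struct.pack(">HH", len(body) + 4, 0) + body

def decode_vb_block(block):
    """Split a VB block back into its logical records."""
    block_len, _ = struct.unpack_from(">HH", block, 0)
    records = []
    offset = 4  # skip the block descriptor
    while offset < block_len:
        rec_len, _ = struct.unpack_from(">HH", block, offset)
        records.append(block[offset + 4 : offset + rec_len])
        offset += rec_len
    return records

records = [b"short", b"a\nlonger\x00record"]
block = encode_vb_block(records)
assert decode_vb_block(block) == records
```

Because every record carries its own length, the decoder never inspects the record contents to find where a record ends.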
 
This mechanism eliminates the need for using any "delimiter" byte value to separate records. Thus data can be of any type, including binary integers, floating point, or characters, without introducing a false end-of-record condition. The data set is an abstraction of a collection of records, in contrast to files as unstructured streams of bytes.
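
The following Python sketch illustrates this point for fixed-blocked records. It is an illustration under assumptions rather than an actual access method: the helper name <code>deblock_fixed</code> and the 8-byte record layout of two 32-bit big-endian integers are hypothetical. Splitting the block by record length recovers the records exactly, while splitting on a delimiter byte (here the newline value 0x0A) breaks the binary data apart.

```python
import struct

def deblock_fixed(block, lrecl):
    """Split a fixed-blocked block into records of lrecl bytes each."""
    if len(block) % lrecl != 0:
        raise ValueError("block length is not a multiple of the record length")
    return [block[i : i + lrecl] for i in range(0, len(block), lrecl)]

# Three 8-byte records, each holding two 32-bit big-endian integers.
# Several of these integers contain the byte 0x0A (ASCII newline).
rows = [(1, 10), (2, 0x0A0A0A0A), (3, 30)]
block = b"".join(struct.pack(">ii", a, b) for a, b in rows)

# Length-based deblocking recovers the original values.
decoded = [struct.unpack(">ii", rec) for rec in deblock_fixed(block, 8)]
assert decoded == rows

# Delimiter-based splitting sees false end-of-record conditions.
naive = block.split(b"\n")
assert len(naive) != len(rows)
```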