Data set (IBM mainframe)

{{Short description|Type of computer file existing on IBM mainframe operating systems}}
{{about|computer files|data communications|modem}}
 
In the context of [[IBM]] [[mainframe computer]]s in the [[IBM System/360]] line and its successors, a '''data set''' (the form IBM prefers) or '''dataset''' is a [[computer file]] having a [[record-oriented file|record organization]]. The term came into use with, e.g., [[DOS/360]] and [[OS/360]], and is still used by their successors, including the current [[VSE (operating system)|VSE]] and [[z/OS]]. Documentation for these systems historically preferred this term over ''[[computer file|file]]''.
 
A data set is typically stored on a [[direct access storage device]] (DASD) or [[magnetic tape]],<ref>{{cite web
|url=https://www.ibm.com/support/knowledgecenter/zosbasics/com.ibm.zos.zconcepts/zconcepts_172.htm
|title=What is a catalog?
|website=[[IBM]]
|quote=Cataloging of data sets on magnetic tape ...}}</ref> however, unit record devices, such as punch card readers, card punches, line printers, and page printers, can also provide input/output (I/O) for a data set (file).<ref>{{cite web|url=http://publib.boulder.ibm.com/infocenter/zvm/v5r4/index.jsp?topic=/com.ibm.zvm.v54.hcpa7/hcse7b3050.htm|title=IBM Knowledge Center - Home of IBM product documentation|website=publib.boulder.ibm.com}}</ref>
 
Data sets are not unstructured streams of [[byte]]s, but rather are organized in various logical record<ref>{{cite web
|url=https://www.ibm.com/support/knowledgecenter/zosbasics/com.ibm.zos.zconcepts/zconc_datasetintro.htm
|title=What is a data set? |website=[[IBM]] |quote=data set .. a file that contains one or more records.}}</ref> and block structures determined by the <code>DSORG</code> (data set organization), <code>RECFM</code> (record format), and other parameters. These parameters are specified at the time of the data set allocation (creation), for example with [[Job Control Language]] <code>DD</code> statements. Within a running program they are stored in the [[Data Control Block]] (DCB) or Access Control Block (ACB), which are data structures used to access data sets using [[access method]]s.
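
As a minimal sketch of how these parameters might be coded at allocation time, the following <code>DD</code> statement creates a new sequential data set of fixed-length blocked records. The data set name <code>HLQ.EXAMPLE.DATA</code>, the unit name <code>SYSDA</code>, and the space and block size values are illustrative assumptions; actual names and values depend on the installation.
<syntaxhighlight lang="jcl">
//* Hypothetical allocation of a new physical sequential data set
//NEWDS    DD DSN=HLQ.EXAMPLE.DATA,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(10,5)),
//            DCB=(DSORG=PS,RECFM=FB,LRECL=80,BLKSIZE=27920)
</syntaxhighlight>
Here the block size is a whole multiple of the record length (27920 = 349 × 80), so each full block holds 349 logical records.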
 
Records in a data set may be fixed, variable, or “undefined” length.<ref>{{cite web
|url=https://www.ibm.com/support/knowledgecenter/zosbasics/com.ibm.zos.zconcepts/zconcepts_159.htm
|title=Data set record formats
|website=[[IBM]]
|quote=Records are either fixed length or variable length in a given data set.}}</ref>
 
==Data set organization==
For OS/360, the DCB's <code>DSORG</code> parameter specifies how the data set is organized. It may be<ref>{{cite manual
| title = IBM System/360 Operating System: Job Control Language Reference - OS Release 21.7
| id = GC28-6704-4
| section = Section IV: The DD Statement -- DCB Parameter
| section-url = http://bitsavers.org/pdf/ibm/360/os/R21.7_Apr73/GC28-6704-4_OS_JCL_Aug76.pdf#page=138
| pages = 138–139
| url = http://bitsavers.org/pdf/ibm/360/os/R21.7_Apr73/GC28-6704-4_OS_JCL_Aug76.pdf
| series = IBM Systems Reference Library
| publisher = IBM
}}
</ref>
;CQ
:[[Queued Telecommunications Access Method]] (QTAM) in Message Control Program (MCP)
;CX
:Communications line group
;DA
:[[Basic Direct Access Method]] (BDAM)
;GS
:Graphics device for Graphics Access Method (GAM)
;IS
:[[Indexed Sequential Access Method]] (ISAM)
;MQ
:QTAM message queue in application
;PO
:Partitioned Organization
;PS
:Physical Sequential
among others.
Data sets on tape may only be <code>DSORG=PS</code>. The choice of organization depends on how the data is to be accessed, and in particular, how it is to be updated.
 
Programmers use various [[access method]]s (such as [[Queued Sequential Access Method|QSAM]] or [[VSAM]]) in programs that read and write data sets. The choice of access method depends on the data set organization.
 
==Record format (RECFM)==
Regardless of organization, the physical structure of each record is essentially the same, and is uniform throughout the data set. This is specified in the DCB <code>RECFM</code> parameter. <code>RECFM=F</code> means that the records are of fixed length, specified via the <code>LRECL</code> parameter. <code>RECFM=V</code> specifies a variable-length record. V records when stored on media are prefixed by a Record Descriptor Word (RDW) containing the integer length of the record in bytes and flag bits. With <code>RECFM=FB</code> and <code>RECFM=VB</code>, multiple logical records are grouped together into a single [[Block (data storage)|physical block]] on tape or DASD. FB and VB are <em>fixed-blocked</em>, and <em>variable-blocked</em>, respectively. <code>RECFM=U</code> (undefined) is also variable length, but the length of the record is determined by the length of the block rather than by a control field.
 
The <code>BLKSIZE</code> parameter specifies the maximum length of the block. <code>RECFM=FBS</code><ref>{{cite web
|url=https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_72/rzatb/rbfvbs.htm
|title=Example: Record format VBS
|website=[[IBM]]
|quote=Variable-length, blocked, spanned (VBS)}}</ref> can also be specified, meaning <em>fixed-blocked standard</em>: all blocks except the last one are required to be of full <code>BLKSIZE</code> length. <code>RECFM=VBS</code>, or <em>variable-blocked spanned</em>, means that a logical record can span two or more blocks, with flags in the RDW indicating whether a record segment is continued into the next block and/or was continued from the previous one.
 
This mechanism eliminates the need for using any "[[delimiter]]" byte value to separate records. Thus data can be of any type, including binary integers, floating-point, or characters, without introducing a false end-of-record condition. The data set is an abstraction of a collection of records, in contrast to files as unstructured streams of bytes.
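
The difference between fixed-blocked and variable-blocked formats can be sketched with two hypothetical <code>DD</code> statements. The data set names, unit name, and sizes are assumptions chosen for illustration. The sketch also assumes the usual convention that, for variable-length records, <code>LRECL</code> counts the 4-byte RDW and each block carries a further 4-byte block descriptor word.
<syntaxhighlight lang="jcl">
//* Fixed-blocked: every record is 80 bytes, 10 records per 800-byte block
//FIXED    DD DSN=HLQ.EXAMPLE.FB,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(5,1)),
//            DCB=(DSORG=PS,RECFM=FB,LRECL=80,BLKSIZE=800)
//* Variable-blocked: up to 80 data bytes plus a 4-byte RDW per record
//VARIED   DD DSN=HLQ.EXAMPLE.VB,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(5,1)),
//            DCB=(DSORG=PS,RECFM=VB,LRECL=84,BLKSIZE=27998)
</syntaxhighlight>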
 
== Partitioned data set ==
{{anchor|Partitioned dataset}}{{anchor|Partitioned datasets}}{{anchor|Partitioned data sets}}
<!--confused? Someone dropped (an) anchor here, so let's honor it -->
{{confused|Passive data structure}}
A '''partitioned data set''' ('''PDS''')<ref>{{cite book
| title = z/OS DFSMS Using Data Sets Version 2 Release 3
| id = SC23-6855-30
| date = October 2, 2018
| section = Structure of a PDS
| url = https://www-01.ibm.com/servers/resourcelink/svc00100.nsf/pages/zOSV2R3sc236855/$file/idad400_v2r3.pdf
| mode = cs2
}}</ref>
is a data set containing multiple ''members'', each of which holds a separate sub-data set, similar to a [[directory (file systems)|directory]] in other types of [[file system]]s. This type of data set is often used to hold ''load modules'' (old format bound executable programs), source program libraries (especially Assembler macro definitions), [[ISPF]] screen definitions, and [[Job Control Language]]. A PDS may be compared to a [[ZIP (file format)|Zip]] file or [[COM Structured Storage]].
 
A partitioned data set can only be allocated on a single volume and has a maximum size of 65,535 tracks.
 
Besides members, a PDS also contains a directory. Each member can be accessed indirectly via the directory structure. Once a member is located, the data stored in that member are handled in the same manner as a PS (sequential) data set.
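
As an illustration, a PDS might be allocated and one of its members referenced as in the following sketch. The data set name <code>HLQ.EXAMPLE.PDS</code>, the member name <code>MEMBER1</code>, the unit name, and the space values are hypothetical, and the two <code>DD</code> statements would normally appear in different job steps.
<syntaxhighlight lang="jcl">
//* Allocate a PDS: the third SPACE value requests 10 directory blocks
//NEWPDS   DD DSN=HLQ.EXAMPLE.PDS,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(15,5,10)),
//            DCB=(DSORG=PO,RECFM=FB,LRECL=80,BLKSIZE=27920)
//* In a later step: read one member as if it were a sequential data set
//INPUT    DD DSN=HLQ.EXAMPLE.PDS(MEMBER1),DISP=SHR
</syntaxhighlight>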
 
Whenever a member is deleted, the space it occupied is unusable for storing other data. Likewise, if a member is re-written, it is stored in a new spot at the back of the PDS and leaves wasted “dead” space in the middle. The only way to recover “dead” space is to perform file compression.<ref name=Stephens>{{cite book|last1=Stephens|first1=David|title=What On Earth is a Mainframe?|date=Oct 2008|publisher=Lulu.com|isbn=978-1-4092-2535-5|page=52|url=https://books.google.com/books?id=1NMYOOW3gHMC|access-date=May 11, 2018}}</ref> Compression, which is done using the [[IEBCOPY]] utility,<ref>{{cite book
| title = z/OS DFSMSdfp Utilities Version 2 Release 3
| id = SC23-6864-30
| date = July 17, 2017
| publisher = IBM Corporation
| section = Compressing a Partitioned Data Set
| quote = A partitioned data set will contain unused areas (sometimes called gas) where a deleted member or the old version of an updated member once resided. This unused space is only reclaimed when a partitioned data set is copied to a new data set, or after a compress-in-place operation successfully completes. It has no meaning for a PDSE and is ignored if requested.
| url = https://www-01.ibm.com/servers/resourcelink/svc00100.nsf/pages/zOSV2R3sc236864/$file/idau100_v2r3.pdf
| mode = cs2
}}</ref>
moves all members to the front of the data space and leaves free usable space at the back. (Note that in modern parlance, this kind of operation might be called [[defragmentation]] or [[garbage collection (computer science)|garbage collection]]; [[data compression]] nowadays refers to a different, more complicated concept.) PDS files can only reside on [[direct access storage device|DASD]], not on [[magnetic tape]], in order to use the directory structure to access individual members. Partitioned data sets are most often used for storing multiple [[job control language]] files, [[IBM mainframe utility programs|utility]] control statements, and executable modules.
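
A compress-in-place operation can be sketched as the following IEBCOPY step, in which the same <code>DD</code> statement is named as both input and output. The data set name <code>HLQ.EXAMPLE.PDS</code> and the DD name <code>PDS</code> are hypothetical, and some installations may require additional work data sets.
<syntaxhighlight lang="jcl">
//* Compress a PDS in place by copying it onto itself
//COMPRESS EXEC PGM=IEBCOPY
//SYSPRINT DD SYSOUT=*
//PDS      DD DSN=HLQ.EXAMPLE.PDS,DISP=OLD
//SYSIN    DD *
  COPY OUTDD=PDS,INDD=PDS
/*
</syntaxhighlight>
<code>DISP=OLD</code> requests exclusive use of the data set, which is a sensible precaution while members are being moved.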
 
An improvement of this scheme is a [[Data Facility Storage Management Subsystem (MVS)#PDSE|Partitioned Data Set Extended]] (PDSE or PDS/E, sometimes just ''libraries'') introduced with [[Data Facility Storage Management Subsystem (MVS)#DFSMSdfp|DFSMSdfp]] for [[MVS/XA]] and [[MVS/ESA]] systems. A PDS/E library can store program objects or other types of members, but not both. BPAM cannot process a PDS/E containing program objects.
 
PDS/E structure is similar to PDS and is used to store the same types of data. However, PDS/E files have a better directory structure which does not require pre-allocation of directory blocks when the PDS/E is defined (and therefore does not run out of directory blocks if not enough were specified). Also, PDS/E automatically stores members in such a way that compression operation is not needed to reclaim "dead" space.<ref name=Stephens/> PDS/E files can only reside on DASD in order to use the directory structure to access individual members.
 
== Generation Data Group ==
A '''Generation Data Group'''<ref>{{cite web
|title=Generation Data Groups (GDG's), an Introduction with Examples
|url=http://www.simotime.com/gdgone01.htm
|quote=create and process a Generation Data Group or GDG on ...}}</ref> (''GDG'')<ref>{{cite web
|title=JCL TUTORIAL REFERENCE - Generation Data Groups
|url=http://www.mainframegurukul.com/srcsinc/drona/programming/languages/jcl/jcl.chapter9.html
|quote=Generation Data Groups (GDG)}}</ref> is a group of non-VSAM data sets<ref>{{cite web
|url=https://www.ibm.com/support/knowledgecenter/zosbasics/com.ibm.zos.zconcepts/zconcepts_175.htm
|quote=... non-VSAM ...
|title=What is a generation data group? |website=IBM.com}}</ref> that are successive generations of historically-related data<ref name=G.sets>{{cite web |title=Generation data sets
|website=[[IBM]] |quote=successive, historically related, |url=https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.ieab500/iea3b5_Generation_data_sets_.htm}}</ref> stored on an IBM mainframe (running [[OS/360 and successors|OS/360 and its successors]] or [[DOS/360 and successors|DOS/360 and its successors]]).<ref name=VSE.VSAM>{{cite web |title=VSE/VSAM Commands |url=http://ftp.www.ibm.com/s390/zos/vse/pdf3/zvse31/doc/iesvoe10.pdf |access-date=2021-10-11 |archive-date=2022-01-31 |archive-url=https://web.archive.org/web/20220131235307/http://ftp.www.ibm.com/s390/zos/vse/pdf3/zvse31/doc/iesvoe10.pdf |url-status=dead }}</ref>
 
A GDG is usually cataloged.<ref name=G.sets/>
 
An individual member of the GDG collection is called a "''Generation Data Set''."<ref name=G.sets/><ref>"A generation data set is one of ...</ref> The latter may be identified by an absolute number, {{code|ACCTG.OURGDG(1234)}}, or a relative number: {{code|(-1)}} for the previous generation, {{code|(0)}} for the current one, and {{code|(+1)}} for the next generation.<ref>{{cite web
|url=http://mainframewizard.com/content/what-gdg |title=What is a GDG?}}</ref>
 
A GDG specifies how many generations of a data set are to be kept and at what age a generation will be deleted. Whenever a new generation is created, the system checks whether one or more obsolete generations are to be deleted.
 
The purpose of GDGs is to automate archival. In [[Job Control Language|JCL]], the data set name given is generic: the <code>DSN</code> parameter names the GDG, followed by a relative generation number in parentheses, where
* {{code|(0)}} is the most recent generation
* {{code|(-1)}}, {{code|(-2)}}, ... are previous generations
* {{code|(+1)}} is a new generation (see DD)
 
Another use of GDGs is to address all generations simultaneously within a JCL script without having to know the number of currently available generations. To do this, omit the parentheses and the generation number when specifying the data set in the JCL.
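
The following sketch shows how these forms of reference might look in <code>DD</code> statements, reusing the name <code>ACCTG.OURGDG</code> from above. The DD names, unit name, space values, and DCB attributes are assumptions, and depending on the installation further parameters may be needed when creating a new generation.
<syntaxhighlight lang="jcl">
//* Read the current generation
//CURRENT  DD DSN=ACCTG.OURGDG(0),DISP=SHR
//* Read the previous generation
//PREVIOUS DD DSN=ACCTG.OURGDG(-1),DISP=SHR
//* Create a new generation
//NEWGEN   DD DSN=ACCTG.OURGDG(+1),DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(10,5)),
//            DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//* Read all existing generations together (no generation number)
//ALLGENS  DD DSN=ACCTG.OURGDG,DISP=SHR
</syntaxhighlight>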
 
===GDG JCL & features===
Generation Data Groups are defined using either the BLDG statement<ref>{{cite manual
| title = OS Utilities
| id = GC28-6586-15
| date = April 1973
| edition = Sixteenth
| section = BLDG (Build Generation Index) Statement
| section-url = http://bitsavers.org/pdf/ibm/360/os/R21.7_Apr73/GC28-6586-15_OS_Utilities_Rel_21.7_Apr73.pdf#page=269
| page = 269
| url = http://bitsavers.org/pdf/ibm/360/os/R21.7_Apr73/GC28-6586-15_OS_Utilities_Rel_21.7_Apr73.pdf
| series = IBM Systems Reference Library
| publisher = [[IBM]]
| access-date = May 19, 2022
}}
</ref> of the {{pslink|Support programs for OS/360 and successors|IEHPROGM}} utility or the {{code|DEFINE GENERATIONGROUP}} statement<ref>{{cite manual
| title = OS/VS Access Method Services
| id = GC26-3836-1
| date = May 1974
| edition = Second
| section = Defining a Generation Data Group
| section-url = http://bitsavers.org/pdf/ibm/370/OS_VS2/Release_2_1973/GC26-3836-1_OS_VS_Access_Method_Services_May1974.pdf#page=107
| pages = 107–110
| url = http://bitsavers.org/pdf/ibm/370/OS_VS2/Release_2_1973/GC26-3836-1_OS_VS_Access_Method_Services_May1974.pdf
| series = Systems
| publisher = [[IBM]]
| access-date = May 19, 2022
}}
</ref> of the newer [[IDCAMS]] utility,<ref name=How2>{{cite web
|title=IBM How to create and use Generation Data Groups (GDG)
| website=[[IBM]] | date=2 March 2012 |url=https://www.ibm.com/support/docview.wss?uid=swg21422334
|quote=Create a GDG... IDCAMS will do it}}</ref> which allows setting various parameters.
* {{code|LIMIT(10)}} would limit the number of generations to 10.
* {{code|SCRATCH FOR (91)}} would retain each generation, up to the limit on the number of generations, for at least 91 days.
 
IDCAMS can also delete (and optionally uncatalog) a GDG.<ref>{{cite web
|title=IDCAMS – Create and delete GDG base using JCL
|url=http://code.xmlgadgets.com/2011/05/16/idcams-create-and-delete-gdg-base/comment-page-1}}</ref>
 
====Example====
Creation of a standard GDG for five backup generations, each kept for at least 35 days:
<syntaxhighlight lang="jcl">
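//* Define the GDG base; IDCAMS commands occupy columns 2-72,
//* and a trailing hyphen continues the command on the next line
//STEP1    EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE GDG (NAME('DB2.FULLCOPY.DSNDB04.TSTEST') -
              LIMIT(5) -
              SCRATCH -
              FOR(35))
/*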
</syntaxhighlight>
 
Delete a standard GDG:
<syntaxhighlight lang="jcl">
//* Delete the GDG base; FORCE allows this even if generations exist
//STEP3    EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DELETE DB2.FULLCOPY.DSNDB04.TSTEST GDG FORCE
/*
</syntaxhighlight>
 
==References==
{{Reflist}}
* [http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/sg246366.html Introduction to the New Mainframe: z/OS Basics] {{Webarchive|url=https://web.archive.org/web/20190425225325/http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/sg246366.html |date=2019-04-25 }}, Ch. 5, "Working with data sets", March 29, 2011. {{ISBN|0738435341}}
 
{{Mainframe I/O access methods}}
 
{{DEFAULTSORT:Data Set (IBM Mainframe)}}
[[Category:Data management]]
[[Category:IBM mainframe operating systems]]
[[Category:Computer file systems]]
[[Category:Computer files]]
 
[[ja:データセット (IBMメインフレーム)]]