User:Markf129/Earth sciences data format interoperability: Difference between revisions

Markf129 (talk | contribs)
 
(6 intermediate revisions by 4 users not shown)
Line 1:
{{Userspace draft|date=July 2010}}
When studying the Earth sciences by observation or analytical [[model (abstract)|models]], it is often a challenge for both the user and the collector to decide how best to organize and store the vast amount of information available. Different organizations may have specific technical goals, timeline constraints, or model constraints that often drive new, unique file conventions, distribution techniques, and architectures. While developing new solutions sometimes meets short-term goals, it often causes more complex long-term problems when standards are not adhered to<ref>{{cite news
| title = Model Data Interoperability for the United States Integrated Ocean Observing System
| author = Richard P. Signell
Line 6:
| url = http://www.usnfra.org/committees/modeling/signell_final%20report_mar8.pdf
}}
</ref>. In some cases, science data has been slow to migrate to a standards-based approach<ref>{{cite news
| title = Standards-based data interoperability in the climate sciences
| author = Andrew Woolf, Ray Cramer, Marta Gutierrez, Kerstin Kleese van Dam, Siva Kondapalli,
Line 13:
| url = http://journals.cambridge.org/action/displayFulltext?type=1&fid=296181&jid=MAP&volumeId=12&issueId=01&aid=296180
}}
</ref>. Because of these issues, interoperability of data for collaboration is critical to building a continued quantitative understanding of the sciences<ref>{{cite news
| title = Achieving interoperability of spatial data
| author = Clemens Portele, Freddy Fierens, Eva Klien
Line 21:
</ref>.
 
Interoperability of observational or model data must be easy and transparent, without having to reformat the data, write special tools to read or extract the data, or rely on specific proprietary software. Adhering to common formats brings several benefits. First, it promotes the exchange of models and relevant science data. Second, observational data can be scaled and compared more easily to models. Third, it eliminates confusion and unnecessary format conversions. The last is perhaps the most important, as considerable time can be spent converting between the different data formats<ref>{{cite news
| title = Background on BUFR and GRIB Formats
| author = Doug McLain
Line 31:
 
==Overview and definition==
A [[data model]] (e.g. [[NetCDF]]) describes structured data by providing an unambiguous and neutral view of how the data is organized<ref>{{cite news
| title = DIFFERENCES AMONG THE DATA MODELS USED BY THE GEOGRAPHIC INFORMATION SYSTEMS AND ATMOSPHERIC SCIENCE COMMUNITIES
| author = Stefano Nativi, University of Florence, Prato, Italy and M. B. Blumenthal, J. Caron, B. Domenico, T. Habermann, D. Hertzmann, Y. Ho, R. Raskin, and J. Weber
Line 42:
A [[file format]] defines how data is encoded for storage using a defined structure, such as chunk-based, directory-based, or unstructured. The file format can usually be identified from the file name extension (e.g. .jpg, .bufr). Thus, the data model describes how the data is organized, and the file format describes how the data is stored. Conventions, in turn, describe what data types, formats, and design principles are applied for a given data model or format (e.g. [[Climate and Forecast Metadata Conventions]]). Identifying these three elements allows data to be described accurately.
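
A file name extension is only a hint, so tools often confirm the file format from the leading bytes of the file. The following is a minimal sketch of that idea, not a complete detector. The function name <code>sniff_format</code> and the label strings are hypothetical, and it assumes the signature sits at the very start of the file, which may not hold for GRIB or BUFR messages wrapped in transmission headers.

```python
from typing import Optional

# Leading byte signatures for several formats named in this article.
# Assumption: the signature sits at offset 0 of the file.
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "HDF5 (also used by netCDF-4)"),
    (b"\x0e\x03\x13\x01", "HDF4"),
    (b"CDF\x01", "netCDF classic"),
    (b"CDF\x02", "netCDF 64-bit offset"),
    (b"GRIB", "GRIB"),
    (b"BUFR", "BUFR"),
    (b"II*\x00", "TIFF, little-endian (possibly GeoTIFF)"),
    (b"MM\x00*", "TIFF, big-endian (possibly GeoTIFF)"),
]


def sniff_format(header: bytes) -> Optional[str]:
    """Return a label for the first bytes of a file, or None if unknown."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None
```

A caller would typically read the first eight bytes of a file in binary mode and pass them to <code>sniff_format</code>. Note that a signature identifies only the file format; the data model and convention still have to be read from the file's contents.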
 
For example, data models contain components such as dimensions, variables, types, and attributes. Some models can also logically organize these components into groups. These components can be used together to capture the meaning of data and relations among data fields in an array-oriented dataset. In contrast to variables, which are intended for bulk data, attributes are intended for ancillary data, or information about the data<ref>{{cite news
| title =
| url = http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/index.html
Line 52:
NetCDF is especially useful for gridded data and time series data, although it can be used with satellite swath data.
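
The roles of dimensions, variables, and attributes in an array-oriented data model can be illustrated with a small in-memory sketch. This is not the NetCDF library API; the class names <code>Dataset</code> and <code>Variable</code>, the method <code>add_variable</code>, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Variable:
    """Bulk data laid out along named dimensions, plus ancillary attributes."""
    dims: Tuple[str, ...]
    values: List[float]  # flattened, row-major order
    attrs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Dataset:
    """A toy array-oriented dataset: dimensions, variables, global attributes."""
    dims: Dict[str, int] = field(default_factory=dict)  # name -> length
    variables: Dict[str, Variable] = field(default_factory=dict)
    attrs: Dict[str, str] = field(default_factory=dict)

    def add_variable(self, name: str, var: Variable) -> None:
        # The value count must equal the product of the dimension lengths.
        expected = 1
        for dim in var.dims:
            expected *= self.dims[dim]
        if len(var.values) != expected:
            raise ValueError(
                f"{name}: expected {expected} values, got {len(var.values)}"
            )
        self.variables[name] = var


# A gridded field over time: 2 time steps on a 2 x 3 latitude/longitude grid.
ds = Dataset(dims={"time": 2, "lat": 2, "lon": 3}, attrs={"Conventions": "CF-1.6"})
ds.add_variable(
    "air_temperature",
    Variable(("time", "lat", "lon"), [280.0] * 12, {"units": "K"}),
)
```

Here the variable carries the bulk data, while the <code>units</code> attribute and the global <code>Conventions</code> attribute carry the information about the data.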
 
HDF is well suited to storing complex files with their associated metadata. HDF-EOS provides structural metadata at both the object and file level, making it easier for client programs to read. HDF-EOS defines certain kinds of Earth science data objects and specifies how to organize them in HDF4 and HDF5. It supports grid, swath, and point data.
 
GeoTIFF is a specialization of the TIFF
Line 58:
 
GRIB files contain one or more messages, or records, each with a single parameter and accompanying grid location (which can be a standard grid or user defined). Data is equally spaced at a defined latitude or longitude step, which is contained in the message. A single GRIB file can contain separate records for many different parameters. For example, one file could contain humidity data for several elevations over several time periods as well as snow depth for the same elevations and time periods.
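
Because a GRIB file is a sequence of self-delimiting messages, the messages can be located without decoding their contents. The sketch below assumes the standard indicator section layout: the characters <code>GRIB</code>, the edition number in the eighth octet, a 3-octet total length for edition 1 or an 8-octet total length for edition 2, and the end marker <code>7777</code>. The function name <code>scan_grib_messages</code> is hypothetical, and the sketch ignores special cases such as non-standard length encodings for very large edition 1 messages.

```python
from typing import List, Tuple


def scan_grib_messages(buf: bytes) -> List[Tuple[int, int, int]]:
    """Return (offset, edition, total_length) for each GRIB message in buf."""
    found = []
    pos = buf.find(b"GRIB")
    # Require 16 bytes so that the edition 2 length field can be read.
    while pos != -1 and pos + 16 <= len(buf):
        edition = buf[pos + 7]
        if edition == 1:
            # Edition 1: octets 5-7 hold the total message length.
            length = int.from_bytes(buf[pos + 4:pos + 7], "big")
        elif edition == 2:
            # Edition 2: octets 9-16 hold the total message length.
            length = int.from_bytes(buf[pos + 8:pos + 16], "big")
        else:
            length = 0
        end = pos + length
        if length >= 12 and end <= len(buf) and buf[end - 4:end] == b"7777":
            found.append((pos, edition, length))
            pos = buf.find(b"GRIB", end)
        else:
            # Not a valid message start; keep searching further on.
            pos = buf.find(b"GRIB", pos + 4)
    return found
```

Each returned tuple marks one record; decoding the parameter and grid inside a record requires the code tables and is better left to a dedicated GRIB library.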
 
BUFR is the primary format used operationally on the World Meteorological Organization (WMO) Global Telecommunications System for real-time global exchange of weather and satellite observations. BUFR is a self-describing, table-driven format that can encode a wide variety of meteorological data: land observations, radar data, climatological data, etc.
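
The table-driven design can be illustrated by the way a BUFR message refers to its tables. Each descriptor in the data description section is a 16-bit value split into three fields, usually written F-X-Y: 2 bits for F, 6 bits for X, and 8 bits for Y. The sketch below only unpacks these fields; it does not look anything up in the WMO tables, and the names <code>Descriptor</code> and <code>unpack_descriptor</code> are hypothetical.

```python
from typing import NamedTuple

# Meaning of the F field of a BUFR descriptor.
KINDS = {
    0: "element descriptor (Table B)",
    1: "replication descriptor",
    2: "operator descriptor (Table C)",
    3: "sequence descriptor (Table D)",
}


class Descriptor(NamedTuple):
    f: int  # 2 bits: descriptor type
    x: int  # 6 bits: class or category
    y: int  # 8 bits: entry within the class

    @property
    def kind(self) -> str:
        return KINDS[self.f]


def unpack_descriptor(two_bytes: bytes) -> Descriptor:
    """Split a 2-byte big-endian BUFR descriptor into its F, X, Y fields."""
    if len(two_bytes) != 2:
        raise ValueError("a BUFR descriptor is exactly two bytes long")
    value = int.from_bytes(two_bytes, "big")
    return Descriptor(f=value >> 14, x=(value >> 8) & 0x3F, y=value & 0xFF)
```

A decoder would use the unpacked fields as keys into the relevant table to find the name, unit, scale, and bit width of the value that follows, which is why the same small message layout can carry many kinds of observation.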
 
===Data model relationships===
Line 133 ⟶ 135:
===Coordinate systems===
[[georeference|Georeferencing]] establishes the relationship between raster or vector images and coordinates, and is also used to determine the spatial location of other geographical features. When translating between different data formats, it is often necessary to establish a common coordinate system reference. In some cases, additional reference information, such as a [[world file]], may be needed in order to do the translation. For example, challenges occur when grid data is encoded in a "thinned" format, usually in the longitudinal dimension, where interoperability algorithms are needed. Translating between formats in this way always involves trade-offs. There are various GIS tools available that can help transform image data to a geographic control framework, such as [[ArcGIS|ArcMap]], PCI Geomatica, or [[ERDAS Imagine]].
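
As an illustration of the extra reference information a world file supplies, the sketch below applies the six-parameter affine transform that a world file stores, in the usual line order A, D, B, E, C, F. The function names and the sample values are hypothetical, and the sketch assumes the map coordinates refer to the center of the upper-left pixel, as is conventional for world files. The coordinate reference system itself is not stored in a world file and must be known separately.

```python
from typing import Tuple

WorldFile = Tuple[float, float, float, float, float, float]


def parse_world_file(text: str) -> WorldFile:
    """Parse the six numeric lines of a world file (order: A, D, B, E, C, F)."""
    values = tuple(float(token) for token in text.split())
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    return values


def pixel_to_map(world: WorldFile, col: float, row: float) -> Tuple[float, float]:
    """Map a pixel (col, row) to map coordinates (x, y)."""
    a, d, b, e, c, f = world
    x = a * col + b * row + c
    y = d * col + e * row + f
    return (x, y)


# Hypothetical 30 m pixels, no rotation, north-up image (E is negative).
world = parse_world_file("30.0\n0.0\n0.0\n-30.0\n500000.0\n4100000.0\n")
origin = pixel_to_map(world, 0, 0)
```

With these sample values, moving right by one column increases x by 30 and moving down by one row decreases y by 30.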
 
 
 
* NetCDF
Line 212:
{| class="wikitable" style="text-align: center; width: 400px; height: 200px;"
|-
!
! [[File:srcdest.jpg|120px]]
! NetCDFclassic<br>classic<br>CF
! NetCDFenhanced<br>netCDF-4<br>CF
Line 231:
|-
| <b>HDF5<br>HDF5<br>HDF5</b> || No || Yes, but limited<ref>
{{cite news
| url = http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#fv15
}}</ref> || Yes, but limited<ref>
{{cite news
| url = http://www.hdfgroup.org/h5h4-diff.html
}}</ref> || || || || || || ||
|- valign="top" style="background: #cccccc;"
| <b>HDFEOS2<br>HDF4<br>HDF4</b> || || || || || || || Convert<ref>{{cite news
| url = http://newsroom.gsfc.nasa.gov/sdptoolkit/HEG/HEGHome.html
}}</ref> || || ||
Line 249:
|- valign="top" style="background: #cccccc;"
| <b>GRIB2<br>GRIB2<br>GRIB2</b> || || || || || || || || Yes, but limited<ref>
{{cite news
| url = http://www.ecmwf.int/publications/manuals/grib_api/conversion.html
}}</ref> || ||
Line 260:
{| class="wikitable" style="text-align: center; width: 400px; height: 200px;"
|-
!
! [[File:srcdest.jpg|120px]]
! NetCDFclassic<br>classic<br>CF
! NetCDFenhanced<br>netCDF-4<br>CF
Line 294:
 
==Data type representations==
For any given data stream there may be ambiguities regarding the appropriate structural data type to be used. As a general rule, the best way to resolve this ambiguity is to choose the most highly ordered data type that could describe the data.<ref>{{cite news
| author = U.S. Department of Commerce
| year = 2006
Line 303:
The table below lists some of the structural data types, and their respective recommended data formats. The data formats are defined in three lines: the data model, file format, and convention.
 
{| class="wikitable sortable"
|+ Structural data types and formats
! Structural Data Class!!Descriptions and subclasses!!Common formats
|-
| Grids || rectilinear grids, curvilinear grids, finite element mesh outputs, “unstructured” grids (variable numbers of vertices) ||
|-
| Moving-sensor multidimensional fields || swaths, radials ||
|-
| Time series || time-ordered sequence of records associated with a point in space or a more complex spatial feature ||
|-
| Profiles || height- or depth-ordered sequence of records at a fixed (or approximately fixed) point in time and position in lat/long ||
|-
| Trajectories || time-ordered sequence of records along a path through space ||
|-
| Geospatial Framework Data || lines, polygonal regions, map annotations ||
|-
| Point Data || scattered points ||
|-
| Metadata || “data about data” – context information needed for the interpretation of data ||
|}
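
The rule of choosing the most highly ordered data type can be illustrated for one simple case: deciding whether a set of horizontal positions forms a rectilinear grid or should be treated as scattered point data. This is only a sketch under strong assumptions. The function names are hypothetical, coordinates are compared exactly with no tolerance for floating-point noise, and the other structural types in the table (swaths, profiles, trajectories, and so on) are not considered.

```python
from typing import Iterable, Tuple


def is_rectilinear_grid(points: Iterable[Tuple[float, float]]) -> bool:
    """True if the points are exactly the product of their unique x and y values."""
    pts = set(points)
    xs = {x for x, _ in pts}
    ys = {y for _, y in pts}
    if len(xs) < 2 or len(ys) < 2:
        return False  # a single row or column is not treated as a grid here
    # pts is always a subset of xs x ys, so equal sizes means equal sets.
    return len(pts) == len(xs) * len(ys)


def structural_type(points: Iterable[Tuple[float, float]]) -> str:
    """Prefer the more highly ordered type (grid) when the data allows it."""
    return "grid" if is_rectilinear_grid(points) else "point data"
```

Storing such data as a grid rather than as a list of points keeps the ordering explicit, which makes later comparison with gridded model output simpler.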
 
==Interoperability guidelines==
Data interoperability is critical for integrating different models, tools, and perspectives in order to collaborate effectively. Data must be taken from multiple sources in order to study the Earth sciences as a system rather than as individual components. In many cases the chosen data types are the natural consequence of the manner in which the data is collected. However, without some sort of strict standard or policy, the ability to use observations and model data diminishes. The next best alternative is to adopt best practices or established conventions (such as, in climatology, the [[Climate and Forecast Metadata Conventions]]). For example, the Hierarchical Data Format (HDF) is the standard data format for all NASA Earth Observing System (EOS) data products<ref>{{cite news
| title = Hierarchical Data Format - Earth Observing System (HDF-EOS)
| url = http://nsidc.org/data/hdfeos/
Line 341 ⟶ 348:
 
<!--
[[:Category:Data types]]
[[:Category:Computer file formats]]
[[:Category:Science software]]
-->