Array database management systems (DBMSs) provide [[Database_management_system|database]] services specifically for [[Array|arrays]] (also called [[Raster_graphics|raster data]]), that is, homogeneous collections of data items (often called [[Pixel|pixels]], [[Voxel|voxels]], etc.) sitting on a regular grid of one, two, or more dimensions.
Often arrays are used to represent sensor, simulation, image, or statistics data.
Such arrays tend to be [[
Array databases aim at offering flexible, scalable storage and retrieval on this information category.
[[Image:
== Overview ==
In the same style as standard [[
Because arrays never appear standalone in practice, such an array model is normally embedded into some overall data model, such as the relational model.
Some systems implement arrays in analogy to tables; others introduce arrays as an additional attribute type.
To this end, arrays are partitioned during insertion into so-called ''tiles'' or ''chunks'' of convenient size, which then act as units of access during query evaluation.
Array DBMSs offer [[
As with [[SQL]], for example, expressions of arbitrary complexity can be built on top of a set of core array operations.
Due to the extensions made in the data and query model, Array DBMSs are sometimes subsumed under the [[NoSQL]] category, in the sense of "not only SQL".
Query [[
Important application domains of Array DBMSs include Earth, Space, Life, and Social sciences, as well as the related commercial applications (such as [[
This variety can be observed, for example, in geo data, which include 1-D environmental sensor time series, 2-D satellite images, 3-D x/y/t image time series and x/y/z geophysics data, as well as 4-D x/y/z/t climate and ocean data.
== History and Status ==
The [[
Another option is to resort to [[
The first significant work going beyond BLOBs was established with PICDMS.<ref>Chock, M., Cardenas, A., Klinger, A.: Database structure and manipulation capabilities of a picture database management system (PICDMS). IEEE ToPAMI, 6(4):484-492, 1984</ref>
A first declarative query language suitable for multiple dimensions and with an algebra-based semantics has been published by [[
Another array database language, constrained to 2-D, has been presented by Marathe and Salem.<ref>Marathe, A., Salem, K.: A language for manipulating arrays. Proc. VLDB’97, Athens, Greece, August 1997, pages 46 - 55</ref>
Seminal theoretical work has been accomplished by Libkin et al.;<ref>Libkin, L., Machlin, R., Wong, L.: A query language for multidimensional arrays: design, implementation and optimization techniques. Proc. ACM SIGMOD’96, Montreal, Canada, pp. 228 - 239</ref>
A map algebra, suitable for 2-D and 3-D spatial raster data, has been published by Mennis et al.<ref>Mennis, J., Viger, R., Tomlin, C.D.: Cubic Map Algebra Functions for Spatio-Temporal Analysis. Cartography and Geographic Information Science 32(1)2005, pp. 17 - 32</ref>
In terms of Array DBMS implementations, the [[rasdaman]] system has the longest implementation track record of n-D arrays with full query support.
[[
[[TerraLib]] is an open-source GIS software that extends object-relational DBMS technology to handle spatio-temporal data types; while the main focus is on vector data, there is also some support for rasters.
Starting with version 2.0, [[Postgis|PostGIS]] embeds raster support for 2-D rasters; a special function offers declarative raster query functionality.
SciQL is an array query language being added to the [[MonetDB]] DBMS. [[Michael_Stonebraker#SciDB|SciDB]] is a more recent initiative to establish array database support. As in SciQL, arrays are seen as an equivalent to tables, rather than as a new attribute type as in rasdaman and PostGIS.
For the special case of [[
As this technique does not scale in density, standard databases are not used today for dense data, like satellite images, where most cells carry meaningful information; rather, proprietary ad-hoc implementations prevail in scientific data management and similar situations. Hence, this is where Array DBMSs can make a particular contribution.
Generally, Array DBMSs are an emerging technology. While operationally deployed systems exist, like [[
== Concepts ==
Examples for ''X'' include {0..767} × {0..1023} (for [[Xga#Extended_graphics_array|XGA]] sized images), examples for ''V'' include {0..255} for 8-bit greyscale images and {0..255} × {0..255} × {0..255} for standard [[RGB]] imagery.
Following established database practice, an array query language should be [[
As iteration over an array is at the heart of array processing, declarativeness very much centers on this aspect. The requirement, then, is that conceptually all cells should be inspected simultaneously; in other words, the query does not enforce any explicit iteration sequence over the array cells during evaluation.
Evaluation safety is achieved when every query terminates after a finite number of (finite-time) steps; again, avoiding general loops and recursion is a way of achieving this.
At the same time, avoiding explicit loop sequences opens up manifold optimization opportunities.
=== Array Querying ===
</source>
Through a principle called ''induced operations'',<ref>Ritter, G. and Wilson, J. and Davidson, J.: Image Algebra: An Overview. Computer Vision, Graphics, and Image Processing, 49(1)1994, 297-336</ref>
Hence, on numeric values all the usual unary and binary arithmetic, exponential, and trigonometric operations are available in a straightforward manner, plus the standard set of Boolean operators.
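The idea of induced operations can be illustrated outside any particular Array DBMS. The following is a minimal Python sketch, assuming arrays are represented as nested lists; the helper names <code>induce_unary</code> and <code>induce_binary</code> are hypothetical and not taken from any system's API. It shows how an operation defined on single cell values is lifted to whole arrays by applying it cell by cell.

```python
import math
from typing import Callable, List

Array2D = List[List[float]]


def induce_unary(op: Callable, a: Array2D) -> list:
    """Lift a unary cell operation to a whole array (applied cell-wise)."""
    return [[op(cell) for cell in row] for row in a]


def induce_binary(op: Callable, a: Array2D, b: Array2D) -> list:
    """Lift a binary cell operation to two arrays over the same domain."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("arrays must share the same domain")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Two small example arrays over the same 2 x 2 domain (made-up values).
red = [[0, 64], [128, 255]]
nir = [[10, 64], [100, 200]]

summed = induce_binary(lambda x, y: x + y, red, nir)   # arithmetic
brighter = induce_binary(lambda x, y: x > y, red, nir)  # Boolean result
roots = induce_unary(math.sqrt, [[4.0, 9.0]])           # unary function
```

In this sketch a comparison yields a Boolean array of the same extent, which could in turn be combined with other Boolean arrays using induced logical operators.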
Commonly arrays are partitioned into sub-arrays which form the unit of access.
Regular partitioning where all partitions have the same size (except possibly for boundaries) is referred to as ''chunking''.<ref>Sarawagi, S., Stonebraker, M.: Efficient Organization of Large Multidimensional Arrays. Proc. ICDE'94, Houston, USA, 1994, pp. 328-336</ref>
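Regular partitioning can be sketched in a few lines of Python. The example below is an illustration only, assuming a 2-D array held as a nested list; the function name <code>chunk</code> and the choice of keying tiles by their origin coordinates are assumptions made for this sketch, not the storage layout of any particular system.

```python
from typing import Dict, List, Tuple

Array2D = List[List[int]]


def chunk(array: Array2D, tile_rows: int, tile_cols: int) -> Dict[Tuple[int, int], Array2D]:
    """Split a 2-D array into equally sized tiles, keyed by tile origin.

    Tiles at the lower and right boundaries may be smaller than the
    requested tile size.
    """
    tiles: Dict[Tuple[int, int], Array2D] = {}
    n_rows = len(array)
    n_cols = len(array[0]) if array else 0
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            tiles[(r0, c0)] = [row[c0:c0 + tile_cols]
                               for row in array[r0:r0 + tile_rows]]
    return tiles


# A 3 x 5 example array whose cell value encodes its position.
image = [[r * 10 + c for c in range(5)] for r in range(3)]
tiles = chunk(image, 2, 2)
```

With a 3 × 5 array and 2 × 2 tiles, the sketch produces six tiles, of which those along the last row and last column are truncated at the array boundary.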
Compression of tiles can sometimes substantially reduce the amount of storage needed. Compression is also useful for the transmission of results because, for the large amounts of data under consideration, network bandwidth often constitutes a limiting factor.
A tile-based storage structure suggests a tile-by-tile processing strategy (in [[rasdaman]] called ''tile streaming''). A large class of practically relevant queries can be evaluated by loading tile after tile, thereby allowing servers to process arrays orders of magnitude beyond their main memory.
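The tile-by-tile strategy can be sketched as follows. This is an illustrative Python example, not rasdaman code: <code>load_tiles</code> is a hypothetical stand-in for reading tiles from storage, and the aggregation shown (sum and maximum) is just one kind of query that can be answered while holding only a single tile in memory at a time.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Tile = List[List[int]]


def load_tiles() -> Iterator[Tile]:
    """Hypothetical tile source; a real server would read tiles from disk."""
    yield [[1, 2], [3, 4]]
    yield [[5], [6]]
    yield [[7, 8]]
    yield [[9]]


def streamed_sum_and_max(tiles: Iterable[Tile]) -> Tuple[int, Optional[int]]:
    """Aggregate over all cells while keeping only one tile in memory."""
    total = 0
    maximum: Optional[int] = None
    for tile in tiles:            # each tile is loaded, used, then dropped
        for row in tile:
            for cell in row:
                total += cell
                if maximum is None or cell > maximum:
                    maximum = cell
    return total, maximum


total, maximum = streamed_sum_and_max(load_tiles())
```

Because the generator yields one tile at a time, the memory needed in this sketch depends on the tile size rather than on the size of the whole array.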
[[File:
Due to the massive sizes of arrays in scientific/technical applications in combination with often complex queries, optimization plays a central role in making array queries efficient. Both hardware and software parallelization can be applied. An example of heuristic optimization is the rule "averaging over an array resulting from the cell-wise addition of two input images is equivalent to adding the averages of each input array". By replacing the left-hand variant by the right-hand expression, costs shrink from three (costly) array traversals to two array traversals plus one (cheap) scalar operation (see Figure, which uses the [[rasdaman]] query language introduced before).
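The equivalence behind this rewrite rule can be checked with a small Python sketch. The helper names <code>avg_cells</code> and <code>add_cells</code> are hypothetical, the input values are made up, and the sketch assumes that both arrays are defined over the same domain; it does not reproduce the rasdaman query shown in the figure.

```python
import math
from typing import List

Array2D = List[List[float]]


def avg_cells(a: Array2D) -> float:
    """Average over all cells of an array (one array traversal)."""
    cells = [cell for row in a for cell in row]
    return sum(cells) / len(cells)


def add_cells(a: Array2D, b: Array2D) -> Array2D:
    """Cell-wise addition of two arrays over the same domain."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[10.0, 20.0], [30.0, 40.0]]

# Left-hand variant: build the summed array, then average it.
lhs = avg_cells(add_cells(a, b))

# Right-hand variant: average each input, then add two scalars.
rhs = avg_cells(a) + avg_cells(b)
```

Both variants yield the same value up to floating-point rounding, while the right-hand variant never materializes the intermediate summed array.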
These are but examples; generally, arrays frequently represent sensor, simulation, image, and statistics data.
More and more spatial and time dimensions are combined with ''abstract'' axes, such as sales and products; one example where such abstract axes are explicitly foreseen is the [[Open_Geospatial_Consortium|Open Geospatial Consortium]] (OGC) [[
== Standardization ==
Many communities have established data exchange formats, such as [[Hdf|HDF]], [[Netcdf|NetCDF]], and [[
A de facto standard in the Earth Science communities is [[Opendap|OPeNDAP]], a data transport architecture and protocol. While this is not a database specification, it offers important components that characterize a database system, such as a conceptual model and client/server implementations.
A declarative geo raster query language, [[
== References ==
== See also ==
[[Data Intensive Computing]]
<!--- Categories --->
{{Uncategorized|date=August 2012}}
[[Category:Articles created via the Article Wizard]]