authorlinks
m add link
(5 intermediate revisions by 4 users not shown)
Line 1:
{{short description|System that provides database services specifically for arrays}}
An '''
[[File:Euclidean neighborhood in n-D arrays.png|thumb|150px|alt=Euclidean neighborhood of elements in arrays|Euclidean neighborhood of elements in arrays]]
Line 18 ⟶ 19:
The first significant work to go beyond BLOBs was PICDMS.<ref>Chock, M., Cardenas, A., Klinger, A.: Database structure and manipulation capabilities of a picture database management system (PICDMS). IEEE ToPAMI, 6(4):484–492, 1984</ref> This system offers a precursor of a 2-D array query language, albeit still procedural and without suitable storage support.
A first declarative query language suitable for multiple dimensions and with an algebra-based semantics was published by [[Peter Baumann (computer scientist)|Baumann]], together with a scalable architecture.<ref>Baumann, P.: [http://www.informatik.uni-trier.de/~ley/db/journals/vldb/vldb3.html#Baumann94 On the Management of Multidimensional Discrete Data]. VLDB Journal 4(3)1994, Special Issue on Spatial Database Systems, pp. 401–444</ref><ref>Baumann, P.: [http://www.informatik.uni-trier.de/~ley/db/conf/ngits/ngits99.html#Baumann99 A Database Array Algebra for Spatio-Temporal Data and Beyond]. Proc.
Among Array DBMS implementations, the [[rasdaman]] system has the longest implementation track record of n-D arrays with full query support. [[Oracle Spatial|Oracle GeoRaster]] offers chunked storage of 2-D raster maps, albeit without SQL integration. [[TerraLib]] is open-source GIS software that extends object-relational DBMS technology to handle spatio-temporal data types; while its main focus is on vector data, there is also some support for rasters. Starting with version 2.0, [[Postgis|PostGIS]] embeds support for 2-D rasters; a special function offers declarative raster query functionality. [[SciQL]] is an array query language being added to the [[MonetDB]] DBMS. [[Michael Stonebraker#SciDB|SciDB]] is a more recent initiative to establish array database support. As in SciQL, arrays are treated as an equivalent to tables, rather than as a new attribute type as in rasdaman and PostGIS.
Line 30 ⟶ 31:
=== Conceptual modeling ===
Formally, an array ''A'' is given by a (total or partial) function ''A'': ''X'' → ''V'' where ''X'', the ''domain'', is a ''d''-dimensional integer interval for some {{math|''d'' > 0}} and ''V'', called the ''range'', is some (non-empty) value set; in set notation, this can be rewritten as {{math|{{mset| (''p'',''v'') | ''p''
Following established database practice, an array query language should be [[Declarative programming|declarative]] and safe in evaluation.
Line 39 ⟶ 40:
The '''marray''' operator creates an array over some given domain extent and initializes its cells:
<syntaxhighlight lang="
marray index-range-specification
values cell-value-expression
Line 46 ⟶ 47:
where ''index-range-specification'' defines the result domain and binds an iteration variable to it, without specifying an iteration sequence. The ''cell-value-expression'' is evaluated at each location of the domain.
'''Example:'''
<syntaxhighlight lang="
marray p in [10:20,40:50]
values A[p]
Line 53 ⟶ 54:
This special case, pure subsetting, can be abbreviated as
<syntaxhighlight lang="
A[10:20,40:50]
</syntaxhighlight>
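The semantics of '''marray''' and of pure subsetting can be illustrated outside a database. The following is a minimal Python sketch, not the implementation of any Array DBMS: it assumes an array is modelled as a dictionary from coordinate tuples to values, and the helper <code>marray</code> and the sample array <code>A</code> are hypothetical names introduced only for illustration.

```python
from itertools import product


def marray(ranges, cell_value):
    """Build an array (dict: coordinate tuple -> value) over inclusive ranges.

    ranges     -- one (lo, hi) pair per dimension, both bounds inclusive
    cell_value -- function evaluated once at every coordinate p of the domain
    """
    axes = [range(lo, hi + 1) for lo, hi in ranges]
    return {p: cell_value(p) for p in product(*axes)}


# Hypothetical 2-D source array over [0:99,0:99] with A[x,y] = 100*x + y.
A = marray([(0, 99), (0, 99)], lambda p: 100 * p[0] + p[1])

# Counterpart of: marray p in [10:20,40:50] values A[p]
subset = marray([(10, 20), (40, 50)], lambda p: A[p])
```

Here <code>subset</code> holds the 11 × 11 cells of the requested extent and keeps their original coordinates, which is what the abbreviated form expresses as well.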
This subsetting keeps the dimension of the array; to reduce dimension by extracting slices, a single slicepoint value is indicated in the slicing dimension.
'''Example:'''
<syntaxhighlight lang="
A[*:*,*:*,100]
</syntaxhighlight>
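Slicing can be sketched in the same spirit. The Python fragment below is an illustration under stated assumptions rather than actual query engine code: arrays are again modelled as dictionaries keyed by coordinate tuples, and <code>slice_at</code> and the small 3-D array <code>A</code> are hypothetical.

```python
from itertools import product


def slice_at(array, axis, point):
    """Fix one coordinate at `point` and drop that axis from every key."""
    return {
        p[:axis] + p[axis + 1:]: v
        for p, v in array.items()
        if p[axis] == point
    }


# Hypothetical 3-D array over [0:3,0:2,99:101].
A = {
    (x, y, t): x + 10 * y + 1000 * t
    for x, y, t in product(range(4), range(3), range(99, 102))
}

# Counterpart of: A[*:*,*:*,100]
plane = slice_at(A, axis=2, point=100)
```

The result is a 2-D array: the first two dimensions keep their full extent, while the third has been removed.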
Line 67 ⟶ 68:
The above examples have simply copied the original values; instead, these values may be manipulated.
'''Example:'''
<syntaxhighlight lang="
marray p in domain(A)
values log( A[p] )
Line 74 ⟶ 75:
This can be abbreviated as:
<syntaxhighlight lang="
log( A )
</syntaxhighlight>
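Applying a cell-level function to every cell of an array can be sketched as follows. This is a minimal Python illustration assuming the dictionary model of arrays (coordinate tuple to value); the helper name <code>induce</code> and the sample values are hypothetical.

```python
import math


def induce(op, array):
    """Apply a cell-level function to every cell, keeping the domain unchanged."""
    return {p: op(v) for p, v in array.items()}


# Hypothetical 2x2 array of positive values.
A = {(0, 0): 1.0, (0, 1): math.e, (1, 0): 10.0, (1, 1): 100.0}

# Counterpart of: log( A )
log_A = induce(math.log, A)
```

The result has the same domain as <code>A</code>; only the cell values differ.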
Line 81 ⟶ 82:
The '''condense''' operator aggregates cell values into one scalar result, similar to SQL aggregates. Its application has the general form:
<syntaxhighlight lang="
condense condense-op
over index-range-specification
Line 90 ⟶ 91:
'''Example:''' "The sum over all values in A."
<syntaxhighlight lang="
condense +
over p in sdom(A)
Line 97 ⟶ 98:
A shorthand for this operation is:
<syntaxhighlight lang="
add_cells( A )
</syntaxhighlight>
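The aggregation performed by '''condense''' resembles a fold over the cells of the domain. The Python sketch below is illustrative only: it assumes the dictionary model of arrays, and it assumes that the combining operation is commutative and associative, so that the order in which cells are visited does not affect the result. The helper <code>condense</code> and the sample array are hypothetical.

```python
from functools import reduce
import operator


def condense(op, domain, cell_value):
    """Combine cell_value(p) over every coordinate p in domain using op.

    op is assumed to be commutative and associative, because no
    iteration order over the domain is guaranteed.
    """
    return reduce(op, (cell_value(p) for p in domain))


# Hypothetical 2x2 integer array.
A = {(0, 0): 3, (0, 1): 5, (1, 0): 7, (1, 1): 11}

# Counterpart of the sum over all values in A.
total = condense(operator.add, A.keys(), lambda p: A[p])
```

With the sample values above, <code>total</code> equals the plain sum of all cells; replacing <code>operator.add</code> with <code>max</code> would give the largest cell value instead.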
Line 106 ⟶ 107:
'''Example:''' "A histogram over 8-bit greyscale image A."
<syntaxhighlight lang="
marray bucket in [0:255]
values count_cells( A = bucket )
Line 133 ⟶ 134:
The following are representative domains in which large-scale multi-dimensional array data are handled:
*Earth sciences: geodesy / mapping, [[remote sensing]], geology, oceanography, hydrology, atmospheric sciences, cryospheric sciences
*Space sciences: planetary sciences, astrophysics (optical and radio telescope observations, cosmological simulations)
*Life sciences: gene data, confocal microscopy, CAT scans