Array DBMS: Difference between revisions

m Array querying: lang="text"
m add link
 
(3 intermediate revisions by 3 users not shown)
Line 19:
The first significant work going beyond BLOBs was PICDMS.<ref>Chock, M., Cardenas, A., Klinger, A.: Database structure and manipulation capabilities of a picture database management system (PICDMS). IEEE ToPAMI, 6(4):484–492, 1984</ref> This system offers the precursor of a 2-D array query language, albeit still procedural and without suitable storage support.
 
A first declarative query language suitable for multiple dimensions and with an algebra-based semantics has been published by [[Peter Baumann (computer scientist)|Baumann]], together with a scalable architecture.<ref>Baumann, P.: [http://www.informatik.uni-trier.de/~ley/db/journals/vldb/vldb3.html#Baumann94 On the Management of Multidimensional Discrete Data]. VLDB Journal 4(3)1994, Special Issue on Spatial Database Systems, pp. 401–444</ref><ref>Baumann, P.: [http://www.informatik.uni-trier.de/~ley/db/conf/ngits/ngits99.html#Baumann99 A Database Array Algebra for Spatio-Temporal Data and Beyond]. Proc. NGITS'99, LNCS 1649, Springer 1999, pp. 76–93</ref> Another array database language, constrained to 2-D, has been presented by Marathe and Salem.<ref>Marathe, A., Salem, K.: A language for manipulating arrays. Proc. VLDB'97, Athens, Greece, August 1997, pp. 46–55</ref> Seminal theoretical work has been accomplished by Libkin et al.;<ref>Libkin, L., Machlin, R., Wong, L.: A query language for multidimensional arrays: design, implementation and optimization techniques. Proc. ACM SIGMOD'96, Montreal, Canada, pp. 228–239</ref> in their model, called NCRA, they extend a nested relational calculus with multidimensional arrays; among the results are important contributions on array query complexity analysis. A map algebra, suitable for 2-D and 3-D spatial raster data, has been published by Mennis et al.<ref>Mennis, J., Viger, R., Tomlin, C.D.: Cubic Map Algebra Functions for Spatio-Temporal Analysis. Cartography and Geographic Information Science 32(1)2005, pp. 17–32</ref>
 
In terms of Array DBMS implementations, the [[rasdaman]] system has the longest implementation track record of n-D arrays with full query support. [[Oracle Spatial|Oracle GeoRaster]] offers chunked storage of 2-D raster maps, albeit without SQL integration. [[TerraLib]] is open-source GIS software that extends object-relational DBMS technology to handle spatio-temporal data types; while its main focus is on vector data, there is also some support for rasters. Starting with version 2.0, [[Postgis|PostGIS]] embeds raster support for 2-D rasters; a special function offers declarative raster query functionality. [[SciQL]] is an array query language being added to the [[MonetDB]] DBMS. [[Michael Stonebraker#SciDB|SciDB]] is a more recent initiative to establish array database support. As in SciQL, arrays are seen as an equivalent to tables, rather than as a new attribute type as in rasdaman and PostGIS.
Line 31:
 
=== Conceptual modeling ===
Formally, an array ''A'' is given by a (total or partial) function ''A'': ''X'' → ''V'' where ''X'', the ''domain'', is a ''d''-dimensional integer interval for some {{math|''d'' &gt; 0}} and ''V'', called ''range'', is some (non-empty) value set; in set notation, this can be rewritten as {{math|{{mset| (''p'',''v'') | ''p'' in ''X'', ''v'' in ''V'' }}}}. Each (''p'',''v'') in ''A'' denotes an array element or ''cell'', and following common notation we write ''A''[''p''] = ''v''. Examples for ''X'' include {0..767} × {0..1023} (for [[Xga#Extended Graphics Array|XGA]] sized images), examples for ''V'' include {0..255} for 8-bit [[greyscale image]]s and {0..255} × {0..255} × {0..255} for standard [[RGB]] imagery.
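
The definition above can be illustrated with a minimal Python sketch. It is not taken from any Array DBMS; the helper name <code>interval_domain</code> and the tiny 2 × 3 example are assumptions made purely for illustration. An array is modelled as a dictionary mapping each point ''p'' of the domain to a value ''v'', so that a partial array is simply one with missing keys.

```python
from itertools import product


def interval_domain(*bounds):
    """Return all points of a d-dimensional integer interval.

    Each bound is an inclusive (lo, hi) pair, one per dimension
    (hypothetical helper, for illustration only).
    """
    return list(product(*(range(lo, hi + 1) for lo, hi in bounds)))


# Domain X = {0..1} x {0..2}; range V = {0..255} (8-bit greyscale).
X = interval_domain((0, 1), (0, 2))

# A total array A: X -> V, stored as a set of cells (p, v).
A = {p: (40 * p[0] + 10 * p[1]) % 256 for p in X}

# A partial array: the cell at point (0, 0) is undefined.
A_partial = {p: v for p, v in A.items() if p != (0, 0)}

# Common notation A[p] = v maps directly onto dictionary lookup.
value = A[(1, 2)]
```

An RGB array would follow the same pattern with 3-tuples drawn from {0..255} × {0..255} × {0..255} as values.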
 
Following established database practice, an array query language should be [[Declarative programming|declarative]] and safe in evaluation.
Line 47:
where ''index-range-specification'' defines the result domain and binds an iteration variable to it, without specifying an iteration sequence. The ''cell-value-expression'' is evaluated at each location of the domain.
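
The semantics of this constructor can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of any particular system: the function name <code>marray</code> mirrors the keyword above, each dimension is assumed to be given as an inclusive (lo, hi) pair, and arrays are modelled as dictionaries from points to values.

```python
from itertools import product


def marray(domain, cell_expr):
    """Build an array by evaluating cell_expr at every point of domain.

    domain    -- list of inclusive (lo, hi) pairs, one per dimension
                 (assumed reading of the lo:hi notation)
    cell_expr -- function taking a point (tuple) and returning a value
    """
    points = product(*(range(lo, hi + 1) for lo, hi in domain))
    # The result does not depend on the order in which points are
    # visited, matching the "no iteration sequence" requirement.
    return {p: cell_expr(p) for p in points}


# A small source array over [0:4, 0:4] with A[x, y] = 10*x + y.
A = marray([(0, 4), (0, 4)], lambda p: 10 * p[0] + p[1])

# A cutout: the cell-value-expression just copies A[p].
cutout = marray([(1, 2), (3, 4)], lambda p: A[p])
```

Because each cell is computed independently of the others, an engine is free to evaluate the cells in any order or in parallel.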
 
'''Example:''' "A cutout of array A given by the corner points (10,20) and (40,50)."
<syntaxhighlight lang="text">
marray p in [10:20,40:50]
Line 59:
This subsetting keeps the dimension of the array; to reduce dimension by extracting slices, a single slicepoint value is indicated in the slicing dimension.
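
The difference between trimming (dimension kept) and slicing (dimension reduced) can be sketched in Python. The function <code>subset</code> and its specification format are hypothetical: per dimension, <code>None</code> stands for <code>*:*</code>, an inclusive (lo, hi) pair trims, and a single integer slices.

```python
def matches(coord, spec):
    """Check one coordinate against one per-dimension specification."""
    if spec is None:            # '*:*' keeps everything
        return True
    if isinstance(spec, int):   # slice point
        return coord == spec
    lo, hi = spec               # inclusive trim interval
    return lo <= coord <= hi


def subset(array, specs):
    """Trim and/or slice an array stored as {point: value}."""
    # Dimensions with a slice point are dropped from the result.
    kept = [i for i, s in enumerate(specs) if not isinstance(s, int)]
    result = {}
    for p, v in array.items():
        if all(matches(c, s) for c, s in zip(p, specs)):
            result[tuple(p[i] for i in kept)] = v
    return result


# A tiny x/y/t timeseries with value 100*t + 10*x + y.
A = {(x, y, t): 100 * t + 10 * x + y
     for x in range(2) for y in range(2) for t in range(3)}

slice_t2 = subset(A, [None, None, 2])      # 2-D result
trimmed = subset(A, [(0, 0), None, (1, 2)])  # still 3-D
```

Here <code>slice_t2</code> is indexed by (x, y) only, whereas <code>trimmed</code> keeps three coordinates per cell even though its x extent has shrunk to a single value.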
 
'''Example:''' "A slice through an x/y/t timeseries at position t=100, retrieving all available data in x and y."
<syntaxhighlight lang="text">
A[*:*,*:*,100]
Line 68:
The above examples have simply copied the original values; instead, these values may be manipulated.
 
'''Example:''' "Array A, with a log() applied to each cell value."
<syntaxhighlight lang="text">
marray p in domain(A)
Line 134:
 
The following are representative domains in which large-scale multi-dimensional array data are handled:
*Earth sciences: geodesy / mapping, [[remote sensing]], geology, oceanography, hydrology, atmospheric sciences, cryospheric sciences
*Space sciences: planetary sciences, astrophysics (optical and radio telescope observations, cosmological simulations)
*Life sciences: gene data, confocal microscopy, CAT scans