Array DBMS: Difference between revisions

Revision as of 00:05, 22 January 2022 by David Eppstein (edit summary: authorlinks)
Latest revision as of 16:07, 16 June 2025 by Kvng (minor edit; edit summary: add link)
(5 intermediate revisions by 4 users not shown)
Line 1:
{{short description|System that provides database services specifically for arrays}}
An '''~~Array~~array database management ~~systems~~system''' (or '''array ~~DBMSs~~DBMS''') ~~provide~~provides [[Database management system|database]] services specifically for [[array data structure|array]]s (also called [[Raster graphics|raster data]]), that is: homogeneous collections of data items (often called [[pixel]]s, [[voxel]]s, etc.), sitting on a regular grid of one, two, or more dimensions. Often arrays are used to represent sensor, simulation, image, or statistics data. Such arrays tend to be [[Big data|Big Data]], with single objects frequently ranging into Terabyte and soon Petabyte sizes; for example, today's earth and space observation archives typically grow by Terabytes a day. Array databases aim at offering flexible, scalable storage and retrieval on this information category.
[[File:Euclidean neighborhood in n-D arrays.png|thumb|150px|alt=Euclidean neighborhood of elements in arrays|Euclidean neighborhood of elements in arrays]]
Line 18 ⟶ 19:
First significant work in going beyond BLOBs has been established with PICDMS.<ref>Chock, M., Cardenas, A., Klinger, A.: Database structure and manipulation capabilities of a picture database management system (PICDMS). IEEE ToPAMI, 6(4):484–492, 1984</ref> This system offers the precursor of a 2-D array query language, albeit still procedural and without suitable storage support. A first declarative query language suitable for multiple dimensions and with an algebra-based semantics has been published by [[Peter Baumann (computer scientist)|Baumann]], together with a scalable architecture.<ref>Baumann, P.: [http://www.informatik.uni-trier.de/~ley/db/journals/vldb/vldb3.html#Baumann94 On the Management of Multidimensional Discrete Data]. VLDB Journal 4(3)1994, Special Issue on Spatial Database Systems, pp. 401–444</ref><ref>Baumann, P.: [http://www.informatik.uni-trier.de/~ley/db/conf/ngits/ngits99.html#Baumann99 A Database Array Algebra for Spatio-Temporal Data and Beyond]. Proc. ~~NGITS’99~~NGITS'99, LNCS 1649, Springer 1999, pp.76-93</ref> Another array database language, constrained to 2-D, has been presented by Marathe and Salem.<ref>Marathe, A., Salem, K.: A language for manipulating arrays. Proc. ~~VLDB’97~~VLDB'97, Athens, Greece, August 1997, pp. 46–55</ref> Seminal theoretical work has been accomplished by Libkin et al.;<ref>Libkin, L., Machlin, R., Wong, L.: A query language for multidimensional arrays: design, implementation and optimization techniques. Proc. ACM ~~SIGMOD’96~~SIGMOD'96, Montreal, Canada, pp. 228–239</ref> in their model, called NCRA, they extend a nested relational calculus with multidimensional arrays; among the results are important contributions on array query complexity analysis. A map algebra, suitable for 2-D and 3-D spatial raster data, has been published by Mennis et al.<ref>Mennis, J., Viger, R., Tomlin, C.D.: Cubic Map Algebra Functions for Spatio-Temporal Analysis. Cartography and Geographic Information Science 32(1)2005, pp. 17–32</ref>
In terms of Array DBMS implementations, the [[rasdaman]] system has the longest implementation track record of n-D arrays with full query support. [[Oracle Spatial|Oracle GeoRaster]] offers chunked storage of 2-D raster maps, albeit without SQL integration. [[TerraLib]] is an open-source GIS software that extends object-relational DBMS technology to handle spatio-temporal data types; while main focus is on vector data, there is also some support for rasters. Starting with version 2.0, [[Postgis|PostGIS]] embeds raster support for 2-D rasters; a special function offers declarative raster query functionality. [[SciQL]] is an array query language being added to the [[MonetDB]] DBMS. [[Michael Stonebraker#SciDB|SciDB]] is a more recent initiative to establish array database support.
Like SciQL, arrays are seen as an equivalent to tables, rather than a new attribute type as in rasdaman and PostGIS.
Line 30 ⟶ 31:
=== Conceptual modeling ===
Formally, an array ''A'' is given by a (total or partial) function ''A'': ''X'' → ''V'' where ''X'', the ''domain'' is a ''d''-dimensional integer interval for some {{math|''d'' > 0}} and ''V'', called ''range'', is some (non-empty) value set; in set notation, this can be rewritten as {{math|{{mset| (''p'',''v'') | ''p'' ~~in~~∈ ''X'', ''v'' ~~in~~∈ ''V'' }}}}. Each (''p'',''v'') in ''A'' denotes an array element or ''cell'', and following common notation we write ''A''[''p''] = ''v''. Examples for ''X'' include {0..767} × {0..1023} (for [[Xga#Extended Graphics Array|XGA]] sized images), examples for ''V'' include {0..255} for 8-bit [[greyscale ~~images~~image]]s and {0..255} × {0..255} × {0..255} for standard [[RGB]] imagery.
Following established database practice, an array query language should be [[Declarative programming|declarative]] and safe in evaluation.
Line 39 ⟶ 40:
The '''marray''' operator creates an array over some given domain extent and initializes its cells:
<syntaxhighlight lang="~~sql~~text">
marray index-range-specification
values cell-value-expression
Line 46 ⟶ 47:
where ''index-range-specification'' defines the result domain and binds an iteration variable to it, without specifying iteration sequence. The ''cell-value-expression'' is evaluated at each location of the domain.
'''Example:''' ~~“A~~"A cutout of array A given by the corner points (10,20) and (40,50).~~”~~"
<syntaxhighlight lang="~~sql~~text">
marray p in [10:20,40:50]
values A[p]
Line 53 ⟶ 54:
This special case, pure subsetting, can be abbreviated as
<syntaxhighlight lang="~~sql~~text">
A[10:20,40:50]
</syntaxhighlight>
This subsetting keeps the dimension of the array; to reduce dimension by extracting slices, a single slicepoint value is indicated in the slicing dimension.
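The array model and the two subsetting forms described above can be sketched in plain Python. This is only an illustration under stated assumptions, not how any array DBMS implements the operators: an array is modelled as a dictionary from coordinate tuples to values, and the helper names <code>marray</code> and <code>slice_at</code>, as well as the small sample array, are hypothetical.

```python
from itertools import product


def marray(domain, cell_value):
    """Build an array over `domain`, a list of inclusive (lo, hi) bounds per axis.

    `cell_value` is called once for every coordinate tuple in the domain.
    The result does not depend on the order in which coordinates are visited,
    which mirrors the declarative nature of the operator.
    """
    axes = [range(lo, hi + 1) for lo, hi in domain]
    return {p: cell_value(p) for p in product(*axes)}


def slice_at(array, axis, position):
    """Fix one axis at `position` and drop it, reducing the dimension by one."""
    return {
        p[:axis] + p[axis + 1:]: v
        for p, v in array.items()
        if p[axis] == position
    }


# Hypothetical 3-D x/y/t array; each cell value encodes its own coordinate.
A = marray([(0, 3), (0, 3), (0, 2)], lambda p: 100 * p[0] + 10 * p[1] + p[2])

# Trimming: the dimension is kept, only the extent shrinks
# (x in 1..2, y in 0..1, all of t).
cutout = marray([(1, 2), (0, 1), (0, 2)], lambda p: A[p])

# Slicing: fixing t = 1 leaves a 2-D array over x and y.
t1 = slice_at(A, axis=2, position=1)
```

In this sketch, trimming is just <code>marray</code> with a cell expression that copies <code>A[p]</code>, whereas slicing needs a separate step because it changes the number of coordinates per cell.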
'''Example:''' ~~“A~~"A slice through an x/y/t timeseries at position t=100, retrieving all available data in x and y.~~”~~"
<syntaxhighlight lang="~~sql~~text">
A[:,:,100]
</syntaxhighlight>
Line 67 ⟶ 68:
The above examples have simply copied the original values; instead, these values may be manipulated.
'''Example:''' ~~“Array~~"Array A, with a log() applied to each cell value.~~”~~"
<syntaxhighlight lang="~~sql~~text">
marray p in domain(A)
values log( A[p] )
Line 74 ⟶ 75:
This can be abbreviated as:
<syntaxhighlight lang="~~sql~~text">
log( A )
</syntaxhighlight>
Line 81 ⟶ 82:
The '''condense''' operator aggregates cell values into one scalar result, similar to SQL aggregates. Its application has the general form:
<syntaxhighlight lang="~~sql~~text">
condense condense-op
over index-range-specification
Line 90 ⟶ 91:
'''Example:''' "The sum over all values in A."
<syntaxhighlight lang="~~sql~~text">
condense +
over p in sdom(A)
Line 97 ⟶ 98:
A shorthand for this operation is:
<syntaxhighlight lang="~~sql~~text">
add_cells( A )
</syntaxhighlight>
Line 106 ⟶ 107:
'''Example:''' "A histogram over 8-bit greyscale image A."
<syntaxhighlight lang="~~sql~~text">
marray bucket in [0:255]
values count_cells( A = bucket )
Line 133 ⟶ 134:
The following are representative domains in which large-scale multi-dimensional array data are handled:
*Earth sciences: geodesy / mapping, [[remote sensing]], geology, oceanography, hydrology, atmospheric sciences, cryospheric sciences
*Space sciences: Planetary sciences, astrophysics (optical and radio telescope observations, cosmological simulations)
*Life sciences: gene data, confocal microscopy, CAT scans