Array DBMS: Difference between revisions

clean up
FrescoBot (talk | contribs)
m Bot: link syntax/spacing and minor changes
Line 21:
As with [[SQL]], expressions of arbitrary complexity can be built on top of a set of core array operations.
Because of these extensions to the data and query model, Array DBMSs are sometimes subsumed under the [[NoSQL]] category, in the sense of "not only SQL".
Query [[Query_optimization|optimization]] and [[Parallel_computing|parallelization]] are important for achieving [[Scalability|scalability]]; many array operators lend themselves well to parallel evaluation, with each tile processed on a separate node or core.
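
The tile-wise parallel evaluation mentioned above can be illustrated with a minimal Python sketch. It is not taken from any particular Array DBMS: the function names (<code>split_into_tiles</code>, <code>scale_tile</code>, <code>parallel_scale</code>) and the row-wise tiling scheme are assumptions made for illustration, and a thread pool merely stands in for separate nodes or cores.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(array, tile_rows):
    """Partition a 2-D array (a list of rows) into horizontal tiles.

    Hypothetical tiling scheme: real systems use n-D tiles of various shapes.
    """
    return [array[i:i + tile_rows] for i in range(0, len(array), tile_rows)]


def scale_tile(tile, factor):
    """Cell-wise operation on one tile; it needs no data from other tiles."""
    return [[cell * factor for cell in row] for row in tile]


def parallel_scale(array, factor, tile_rows=2, workers=4):
    """Apply a cell-wise operator to every tile concurrently and reassemble."""
    tiles = split_into_tiles(array, tile_rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in tile order, so reassembly is a simple concat.
        results = list(pool.map(lambda t: scale_tile(t, factor), tiles))
    return [row for tile in results for row in tile]


if __name__ == "__main__":
    print(parallel_scale([[1, 2], [3, 4], [5, 6]], 10))
```

Cell-wise operators parallelize this easily because each output cell depends on only one input cell; operators that combine neighbouring cells would additionally need data from adjacent tiles.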
 
Important application domains of Array DBMSs include Earth, Space, Life, and Social sciences, as well as the related commercial applications (such as [[Oil_exploration|hydrocarbon exploration]] in industry and [[OLAP]] in business).
Line 39:
A map algebra, suitable for 2-D and 3-D spatial raster data, has been published by Mennis et al.<ref>Mennis, J., Viger, R., Tomlin, C.D.: Cubic Map Algebra Functions for Spatio-Temporal Analysis. Cartography and Geographic Information Science 32(1), 2005, pp. 17-32</ref>
 
In terms of Array DBMS implementations, the [[Rasdaman|rasdaman]] system has the longest implementation track record of n-D arrays with full query support.
[[Oracle_Spatial|Oracle GeoRaster]] offers chunked storage of 2-D raster maps, albeit without SQL integration.
[[TerraLib|TerraLib]] is open-source GIS software that extends object-relational DBMS technology to handle spatio-temporal data types; while the main focus is on vector data, there is also some support for rasters.
Starting with version 2.0, [[Postgis|PostGIS]] embeds raster support for 2-D rasters; a special function offers declarative raster query functionality.
SciQL is an array query language being added to the [[MonetDB]] DBMS. [[Michael_Stonebraker#SciDB|SciDB]] is a more recent initiative to establish array database support. As in SciQL, arrays are treated as equivalent to tables rather than as a new attribute type as in rasdaman and PostGIS.
Line 48:
As this technique does not scale in density, standard databases are not used today for dense data, like satellite images, where most cells carry meaningful information; rather, proprietary ad-hoc implementations prevail in scientific data management and similar situations. Hence, this is where Array DBMSs can make a particular contribution.
 
Generally, Array DBMSs are an emerging technology. While operationally deployed systems exist, like [[Oracle_Spatial|Oracle GeoRaster]], [[Postgis|PostGIS 2.0]] and [[Rasdaman|rasdaman]], there are still many open research questions, including query language design and formalization, query optimization, parallelization and distributed processing, and scalability issues in general. Besides, scientific communities still appear reluctant to take up array database technology and tend to favor specialized, proprietary technology.
 
== Concepts ==
Line 68:
=== Array Querying ===
 
The [[Rasdaman|rasdaman]] algebra and query language can serve as an example of array query operators; together they establish an expression language over a minimal set of array primitives.
We begin with the generic core operators and then present common special cases and shorthands.
 
Line 157:
=== Query Processing ===
 
A tile-based storage structure suggests a tile-by-tile processing strategy (in [[Rasdaman|rasdaman]] called ''tile streaming''). A large class of practically relevant queries can be evaluated by loading tile after tile, thereby allowing servers to process arrays orders of magnitude larger than their main memory.
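
The idea of tile-by-tile evaluation can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not rasdaman's actual implementation: <code>iter_tiles</code> is a hypothetical tile source that synthesizes cell values instead of reading from storage, and the query is a plain average over all cells.

```python
def iter_tiles(n_cells, tile_size):
    """Hypothetical tile source: yields one tile (a list of cells) at a time.

    A real system would read each tile from storage; here cell values are
    generated on the fly so that only one tile exists in memory at once.
    """
    for start in range(0, n_cells, tile_size):
        stop = min(start + tile_size, n_cells)
        yield [float(i) for i in range(start, stop)]


def streaming_average(tiles):
    """Average over all cells, keeping only two running scalars in memory."""
    total = 0.0
    count = 0
    for tile in tiles:
        total += sum(tile)   # fold the current tile into the running sum
        count += len(tile)   # the tile can be discarded after this step
    if count == 0:
        raise ValueError("cannot average an empty array")
    return total / count


if __name__ == "__main__":
    print(streaming_average(iter_tiles(10, 3)))  # cells 0..9 in tiles of 3
```

Memory use is bounded by the size of one tile plus the running aggregate, regardless of the total array size; this is what makes aggregation queries of this kind amenable to streaming.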
 
[[File:Sample_heuristic_optimization_of_array_query.png|frame|x200px|alt=Sample rule for heuristic array query optimization|Sample rule for heuristic array query optimization]]
 
Due to the massive sizes of arrays in scientific and technical applications, combined with often complex queries, optimization plays a central role in making array queries efficient. Both hardware and software parallelization can be applied. An example of heuristic optimization is the rule "averaging over an array resulting from the cell-wise addition of two input images is equivalent to adding the averages of each input array". By replacing the left-hand variant with the right-hand expression, costs shrink from three (costly) array traversals to two array traversals plus one (cheap) scalar operation (see the figure, which uses the [[Rasdaman|rasdaman]] query language introduced before).
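
The equivalence behind this rewrite rule can be checked with a short Python sketch. The helper names (<code>add_cells</code>, <code>avg_cells</code>, <code>avg_of_sum</code>, <code>sum_of_avgs</code>) are chosen here for illustration and are not rasdaman functions; arrays are modelled as flat lists of equal length.

```python
import math


def add_cells(a, b):
    """Cell-wise addition of two equally sized arrays (reads a and b)."""
    return [x + y for x, y in zip(a, b)]


def avg_cells(a):
    """Average over all cells of one array (one traversal)."""
    return sum(a) / len(a)


def avg_of_sum(a, b):
    """Left-hand variant: read a, read b, then traverse the intermediate sum."""
    return avg_cells(add_cells(a, b))


def sum_of_avgs(a, b):
    """Right-hand variant: two traversals plus one scalar addition."""
    return avg_cells(a) + avg_cells(b)


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [10, 20, 30, 40]
    # Compare with a tolerance, since floating-point rounding may differ
    # between the two evaluation orders.
    print(math.isclose(avg_of_sum(a, b), sum_of_avgs(a, b)))
```

The right-hand variant also avoids materializing the intermediate array, which matters when the inputs are much larger than main memory.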
 
== Application Domains ==
In many, if not most, cases where some phenomenon is sampled or simulated, the result is a rasterized data set that can conveniently be stored, retrieved, and forwarded as an array. Typically, the array data are accompanied by metadata describing them further; for example, geographically referenced imagery will carry its geographic position and the coordinate reference system in which it is expressed.
 
Line 182:
A de facto standard in the Earth Science communities is [[Opendap|OPeNDAP]], a data transport architecture and protocol. While this is not a database specification, it offers important components that characterize a database system, such as a conceptual model and client/server implementations.
 
A declarative geo raster query language, [[WCPS|Web Coverage Processing Service]] (WCPS), has been standardized by the [[Open_Geospatial_Consortium|Open Geospatial Consortium]] (OGC).