In-database processing

===Loading C or C++ libraries into the database process space===
With C or C++ UDF libraries that run in process, the functions are typically registered as built-in functions within the database server and called like any other built-in function in a SQL statement. Running in process gives the function full access to the database server's memory, parallelism and processing-management capabilities. For the same reason, the functions must be well-behaved, because a faulty function can negatively impact the database or the engine. Of the available methods, this type of UDF typically gives the highest performance for OLAP, mathematical, statistical, univariate distribution and data mining algorithms.
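The following is a minimal sketch of the in-process pattern, assuming Python's standard `sqlite3` module as a stand-in for a database server. SQLite is an embedded engine, so a function registered with `create_function` runs in the same process as the engine and is called from SQL like a built-in. This mirrors the registration and calling model described above, although the function is written in Python rather than loaded from a compiled C or C++ library. The function name `normal_cdf` and the `scores` table are hypothetical.

```python
import math
import sqlite3


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


conn = sqlite3.connect(":memory:")
# Register the Python function under an SQL name with 3 arguments;
# the engine calls it in-process, row by row, like a built-in function.
conn.create_function("normal_cdf", 3, normal_cdf)

conn.execute("CREATE TABLE scores (id INTEGER, z REAL)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [(1, -1.0), (2, 0.0), (3, 1.96)])

# The UDF is evaluated inside the query, next to the data.
rows = conn.execute(
    "SELECT id, normal_cdf(z, 0.0, 1.0) FROM scores ORDER BY id"
).fetchall()
```

Because the function shares the engine's process, an unhandled fault in a compiled in-process UDF would affect the engine itself, which is the risk the paragraph above describes.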
 
===Out-of-process===
Out-of-process UDFs are typically written in C, C++ or Java. Because they run in their own process space with their own resources, they do not pose the same risk to the database or the engine. As a trade-off, they cannot be expected to match the performance of in-process UDFs. They are still typically registered in the database engine and called through standard SQL, usually in a stored procedure. Out-of-process UDFs are a safe way to extend the capabilities of a database server and are an ideal way to add custom data mining libraries.
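A minimal sketch of the out-of-process pattern, again assuming Python's `sqlite3` module as the host database. The SQL-visible function (the hypothetical `log1p_remote`) is registered like any other UDF, but it only forwards each argument over a pipe to a separate worker process and reads back the answer. The class `OutOfProcessUDF`, the worker source and the one-value-per-line protocol are illustrative assumptions, not the mechanism used by any particular vendor.

```python
import sqlite3
import subprocess
import sys

# Source of the hypothetical worker; it runs in its own Python process.
WORKER_SRC = r'''
import math
import sys

while True:
    line = sys.stdin.readline()
    if not line:
        break  # parent closed the pipe
    value = float(line)
    sys.stdout.write(repr(math.log1p(value)) + "\n")
    sys.stdout.flush()
'''


class OutOfProcessUDF:
    """Callable registered as a UDF that forwards each call to a worker process."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SRC],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def __call__(self, x):
        if x is None:
            return None  # SQL NULL in, NULL out
        self.proc.stdin.write(f"{x!r}\n")
        self.proc.stdin.flush()
        reply = self.proc.stdout.readline()
        if not reply:
            # The worker died; sqlite3 reports this as an SQL error,
            # while the database process itself keeps running.
            raise RuntimeError("UDF worker exited unexpectedly")
        return float(reply)

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


udf = OutOfProcessUDF()
conn = sqlite3.connect(":memory:")
conn.create_function("log1p_remote", 1, udf)
result = conn.execute("SELECT log1p_remote(0.0), log1p_remote(1.0)").fetchone()
udf.close()
conn.close()
```

The pipe round trip on every call is where the performance cost of out-of-process execution comes from; in exchange, a failure in the worker surfaces as an error in the query instead of taking down the database process.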
 
==Uses==
In-database processing makes data analysis more accessible and relevant for high-throughput, real-time applications including fraud detection, credit scoring, risk management, transaction processing, pricing and margin analysis, usage-based micro-segmenting, behavioral ad targeting and recommendation engines, such as those used by customer service organizations to determine next-best actions.<ref name=Kobelius>{{citation|last=Kobelius|first=James|title=The Power of Predictions: Case Studies in CRM Next Best Action|url=http://www.forrester.com/The+Power+Of+Predictions/fulltext/-/E-RES60094|publisher=Forrester|date=June 22, 2011|access-date=May 15, 2012|archive-date=April 13, 2012|archive-url=https://web.archive.org/web/20120413193606/http://www.forrester.com/The+Power+Of+Predictions/fulltext/-/E-RES60094|url-status=dead}}</ref>
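As an illustration of in-database scoring for a use such as fraud detection, the sketch below registers a scoring function and applies it inside the query, so only the flagged transaction ids leave the database rather than the whole table. It assumes Python's `sqlite3` module; the `transactions` table, the `fraud_score` function and its logistic-regression coefficients are hypothetical and not derived from any real model.

```python
import math
import sqlite3

# Hypothetical logistic-regression coefficients, for illustration only.
INTERCEPT = -6.0
W_AMOUNT = 0.002  # weight per currency unit
W_NIGHT = 1.5     # extra weight for transactions between 00:00 and 05:59


def fraud_score(amount, hour):
    """Logistic score in (0, 1), computed next to the data."""
    night = 1.0 if 0 <= hour < 6 else 0.0
    z = INTERCEPT + W_AMOUNT * amount + W_NIGHT * night
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
conn.create_function("fraud_score", 2, fraud_score)
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL, hour INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, 25.0, 14), (2, 3000.0, 3), (3, 4000.0, 13), (4, 80.0, 2)],
)

# Scoring and filtering both happen inside the query;
# only the ids of flagged transactions are returned to the client.
flagged = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM transactions WHERE fraud_score(amount, hour) > 0.5 ORDER BY id"
    )
]
```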
 
==Vendors==
In-database processing is performed and promoted as a feature by many of the major data warehousing vendors, including [[Teradata]] (and [[Aster Data Systems]], which it acquired), IBM (with its [[Netezza]], PureData Systems, and [https://www.ibm.com/analytics/data-management/data-warehouse Db2 Warehouse] products), EMC [[Greenplum]], [[Sybase]], [[ParAccel]], SAS, and [[EXASOL]]. Some products, such as CWI's [[MonetDB]] or IBM's Db2 Warehouse, let users write their own functions (UDFs) or extensions (UDXs) to extend the products' capabilities.<ref>{{cite web | url = https://www.monetdb.org/content/embedded-r-monetdb | title = Embedded R in MonetDB | date = 22 December 2014 | access-date = 22 December 2014 | archive-date = 13 November 2014 | archive-url = https://web.archive.org/web/20141113025427/https://www.monetdb.org/content/embedded-r-monetdb | url-status = dead }}</ref> [[Fuzzy Logix]] offers libraries of in-database models used for mathematical, statistical, data mining, simulation, and classification modelling, as well as financial models for equity, fixed income, interest rate, and portfolio optimization. [http://in-database.com In-DataBase Pioneers] collaborates with marketing and IT teams to institutionalize data mining and analytic processes inside the data warehouse for fast, reliable, and customizable consumer-behavior and predictive analytics.
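Beyond scalar functions, many engines also accept user-defined aggregates. A minimal sketch, assuming Python's `sqlite3` `create_aggregate` as a stand-in for a vendor's UDF or UDX interface (vendor APIs differ): the hypothetical `sample_var` aggregate computes the sample variance per group using Welford's online algorithm, which visits each row once and keeps only a running count, mean and sum of squared deviations per group.

```python
import sqlite3


class SampleVariance:
    """User-defined aggregate: sample variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def step(self, value):
        if value is None:
            return  # ignore NULLs, as built-in aggregates do
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def finalize(self):
        # Sample variance is undefined for fewer than two rows; return NULL.
        return self.m2 / (self.n - 1) if self.n > 1 else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("sample_var", 1, SampleVariance)
conn.execute("CREATE TABLE returns (asset TEXT, r REAL)")
conn.executemany(
    "INSERT INTO returns VALUES (?, ?)",
    [("A", v) for v in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)] + [("B", 1.0)],
)
rows = conn.execute(
    "SELECT asset, sample_var(r) FROM returns GROUP BY asset ORDER BY asset"
).fetchall()
```

The engine creates one `SampleVariance` instance per group, so the whole computation runs inside the `GROUP BY` without exporting the rows.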
 
==Related Technologies==
In-database processing is one of several technologies focused on improving data warehousing performance. Others include [[parallel computing]], shared everything architectures, [[shared nothing architecture]]s and [[massive parallel processing]]. It is an important step towards improving [[predictive analytics]] capabilities.<ref name="TimManns">https://timmanns.blogspot.com/2009/01/isnt-in-database-processing-old-news.html "Isn't In-database processing old news yet?," "Blog by Tim Manns (Data Mining Blog)," January 8, 2009</ref>
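In a shared-nothing, massively parallel system, each node aggregates only its own partition and a coordinator merges the partial results. The sketch below simulates this with three separate in-memory SQLite databases queried one after another in a single process; a real system would run the nodes in parallel on separate machines. The `sales` table and the helper names are hypothetical. Each node returns a (count, sum) pair per region rather than an average, because averages over partitions of different sizes cannot be combined correctly, whereas counts and sums can.

```python
import sqlite3


def make_node(rows):
    """Create one simulated node: a private database holding its own partition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def partial_aggregate(conn):
    """Runs on each node: per-region (count, sum) over local rows only."""
    return conn.execute(
        "SELECT region, COUNT(amount), SUM(amount) FROM sales GROUP BY region"
    ).fetchall()


def merge(partials):
    """Coordinator step: combine partial (count, sum) pairs into averages."""
    totals = {}
    for node_result in partials:
        for region, count, total in node_result:
            c, s = totals.get(region, (0, 0.0))
            totals[region] = (c + count, s + total)
    return {region: s / c for region, (c, s) in sorted(totals.items())}


nodes = [
    make_node([("east", 10.0), ("west", 4.0)]),
    make_node([("east", 20.0), ("east", 30.0)]),
    make_node([("west", 8.0)]),
]
averages = merge(partial_aggregate(node) for node in nodes)
```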
 
==External links==