In-database processing: Difference between revisions

m Emphasize the alternative (in-database analytics) name
Types: in either in → in either
 
==Types==
There are three main types of in-database processing: translating a model into SQL code, loading C or C++ libraries into the database process space as built-in user-defined functions (UDFs), and running out-of-process libraries, typically written in C, C++ or Java, that are registered in the database as UDFs and called from SQL statements.
 
===Translating models into SQL code===
In this type of in-database processing, a predictive model is converted from its source language into SQL that can run in the database, usually in a stored procedure. Many analytic model-building tools can export their models in either SQL or [[PMML]] (Predictive Model Markup Language). Once the SQL is loaded into a stored procedure, values can be passed in through parameters and the model is executed natively in the database. Tools that can use this approach include SAS, R and KXEN.
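
The translation step can be sketched as follows. This is a minimal illustration, not the export mechanism of any particular tool: the coefficients, the <code>customers</code> table and the <code>model_to_sql</code> helper are all hypothetical, and because SQLite has no stored procedures, the generated SQL is run as a plain query through Python's <code>sqlite3</code> module.

```python
import sqlite3

# Hypothetical linear-model coefficients, standing in for a tool's exported model.
INTERCEPT = -1.5
COEFFICIENTS = {"age": 0.04, "visits": 0.5}


def model_to_sql(intercept, coefficients, table):
    """Render a linear model as a SQL query that scores every row of `table`.

    Column and table names are interpolated directly, so they must come
    from a trusted model definition, never from user input.
    """
    terms = " + ".join(
        f"({weight!r} * {column})" for column, weight in coefficients.items()
    )
    return f"SELECT id, {intercept!r} + {terms} AS score FROM {table} ORDER BY id"


# In-memory database with a small hypothetical table to score.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, age REAL, visits REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 30.0, 4.0), (2, 50.0, 1.0)],
)

scoring_sql = model_to_sql(INTERCEPT, COEFFICIENTS, "customers")
# The arithmetic of the model is evaluated by the database engine, not by Python.
scores = conn.execute(scoring_sql).fetchall()
```

Here <code>scoring_sql</code> is an ordinary <code>SELECT</code> statement, so the rows never leave the database to be scored. In an engine with stored procedures, the same generated expression would be placed in a procedure body and the inputs passed as parameters.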
 
===Loading C or C++ libraries into the database process space===
With C or C++ UDF libraries that run in process, the functions are typically registered as built-in functions within the database server and called like any other built-in function in a SQL statement. Running in process gives the function full access to the database server’s memory, parallelism and processing management capabilities. Because of this, the functions must be well-behaved so as not to negatively affect the database or the engine. This type of UDF gives the highest performance of any method for OLAP, mathematical, statistical, univariate distribution and data mining algorithms.
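
The register-then-call pattern can be illustrated with SQLite, an embedded engine that runs inside the calling process. This sketch is an analogy only: it registers a Python function through <code>sqlite3.create_function</code> rather than loading a C or C++ library, so it shows how a registered function is invoked from SQL, not the performance of a native UDF. The <code>normal_pdf</code> function and the <code>readings</code> table are hypothetical.

```python
import math
import sqlite3


def normal_pdf(x, mean, stddev):
    """Normal density at x; a stand-in for a statistical library routine."""
    if x is None:
        return None  # propagate SQL NULL instead of raising inside the engine
    z = (x - mean) / stddev
    return math.exp(-0.5 * z * z) / (stddev * math.sqrt(2.0 * math.pi))


conn = sqlite3.connect(":memory:")
# Register the function with the engine: SQL name, argument count, callable.
conn.create_function("normal_pdf", 3, normal_pdf)

conn.execute("CREATE TABLE readings (value REAL)")
conn.executemany("INSERT INTO readings VALUES (?)", [(0.0,), (1.0,), (None,)])

# Once registered, the UDF is called like any built-in SQL function.
densities = conn.execute(
    "SELECT value, normal_pdf(value, 0.0, 1.0) FROM readings ORDER BY rowid"
).fetchall()
```

The explicit <code>NULL</code> check reflects the point made above: code that runs inside the engine's process has to handle every input it may be given, because an unhandled failure affects the query that called it.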
 
===Out-of-process===
Out-of-process UDFs are typically written in C, C++ or Java. Because they run in their own process space with their own resources, they do not pose the same risk to the database or the engine. They are not expected to match the performance of an in-process UDF. They are still typically registered in the database engine and called through standard SQL, usually in a stored procedure. Out-of-process UDFs are a safe way to extend the capabilities of a database server and are an ideal way to add custom data mining libraries.
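
The isolation described here can be sketched by having a registered function hand its work to a separate operating-system process. This is an illustrative assumption about how such a UDF might be wired up, not the mechanism of any specific database product: the <code>oop_mean</code> function, the JSON-over-pipes protocol and the worker script are all hypothetical, and a real system would normally keep a long-lived worker rather than start one per call.

```python
import json
import sqlite3
import subprocess
import sys

# Source of the hypothetical worker: reads a JSON list from stdin and
# writes its arithmetic mean to stdout.
WORKER_SOURCE = """
import json, sys
values = json.load(sys.stdin)
json.dump(sum(values) / len(values), sys.stdout)
"""


def mean_out_of_process(json_values):
    """Compute the mean in a separate process; return None if the worker fails."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE],
            input=json_values,
            capture_output=True,
            text=True,
            timeout=30,
            check=True,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # A worker crash or hang surfaces as SQL NULL; the engine keeps running.
        return None
    return json.loads(result.stdout)


conn = sqlite3.connect(":memory:")
# The function is still registered in the engine and called through SQL.
conn.create_function("oop_mean", 1, mean_out_of_process)

ok = conn.execute("SELECT oop_mean('[1, 2, 3, 6]')").fetchone()[0]
# An empty list makes the worker divide by zero and exit with an error.
failed = conn.execute("SELECT oop_mean('[]')").fetchone()[0]
```

The second query shows the trade-off: the worker fails, but the failure stays in its own process and the database connection remains usable. The cost is the serialization and process overhead on every call, which is why this approach is not expected to match an in-process UDF.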
 
==Uses==