In-memory processing

Definition

In-memory processing is an emerging technology that gives users immediate access to information, which supports more informed decisions. Traditional Business Intelligence (BI) technology loads data onto the computer's hard disk in the form of tables and multi-dimensional cubes against which queries are run. With in-memory processing, data is loaded into memory (RAM or flash) instead, so IT staff spend less development time on data modeling, query analysis, cube building and table design.^[1]

Traditional Business Intelligence technology

Historically, every computer has had two types of data storage: the hard disk and RAM. Modern computers have more disk storage than RAM, but reading data from disk is much slower (possibly hundreds of times slower) than reading the same data from RAM. With traditional disk-based technology (Relational Database Management Systems such as MySQL and Oracle), a query accesses information from multiple tables stored on the server’s hard disk. RDBMSs are designed around transactional processing. It is difficult for these traditional systems to support insertions and updates while also performing aggregations and joins efficiently. Moreover, these systems are designed to fetch whole rows of data efficiently, whereas BI queries usually fetch only parts of rows and involve heavy calculations.

Though SQL is a very powerful tool for running complex queries, such queries can take a long time to execute and often degrade transactional processing. To improve query performance, multidimensional databases or cubes (also called multidimensional online analytical processing, or MOLAP) were developed. Designing a cube was an elaborate and lengthy process, and changing its structure to adapt to dynamically changing business needs was cumbersome. Cubes are pre-populated with data to answer specific queries, and although they improved performance, they still could not answer ad hoc queries.^[2]
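The limitation of pre-populated cubes can be illustrated with a small sketch. The fact rows, the `build_cube` helper and the choice of dimensions are hypothetical and only meant to show the idea: a cube answers the question it was built for with a lookup, but a question about a dimension that was aggregated away has to go back to the detail rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, year, revenue)
facts = [
    ("EU", "widget", 2010, 100),
    ("EU", "gadget", 2010, 150),
    ("US", "widget", 2010, 200),
    ("US", "widget", 2011, 250),
]


def build_cube(rows):
    """Pre-aggregate revenue by (region, year), the only question planned for."""
    cube = defaultdict(int)
    for region, _product, year, revenue in rows:
        cube[(region, year)] += revenue
    return dict(cube)


cube = build_cube(facts)

# Planned query: answered by a single lookup in the pre-built cube.
us_2010 = cube[("US", 2010)]
print(us_2010)  # 200

# Ad hoc query (revenue by product): the product dimension was aggregated
# away when the cube was built, so the detail rows must be scanned again.
by_product = defaultdict(int)
for _region, product, _year, revenue in facts:
    by_product[product] += revenue
print(dict(by_product))  # {'widget': 550, 'gadget': 150}
```

Supporting the second question from a cube would mean redesigning and rebuilding it with a product dimension, which is the kind of structural change described above as cumbersome.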

Disadvantages of traditional Business Intelligence

To avoid performance issues and provide faster query processing when dealing with large volumes of data, organizations needed database optimization methods such as creating indexes and using specialized data structures and aggregate tables. The point of a data warehouse is to be able to answer any query at any time. However, to achieve better response times, many data marts are designed to precalculate summaries and answer only specific queries, which defeats the purpose of a data warehouse. Optimized aggregation algorithms were also needed to increase performance. Traditional BI tools could neither keep up with BI requirements nor deliver real-time data to the end user.^[3]

Detailed description of in-memory processing

The arrival of column-centric databases, which store similar information together, allowed data to be stored more efficiently and compressed to a higher degree. This allows huge amounts of data to be stored in the same physical space, which in turn reduces the amount of memory needed to perform a query and increases processing speed. With in-memory databases, all information is initially loaded into memory, which largely eliminates the need for database-optimizing techniques like creating indexes, aggregates, cubes and star schemas.
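A minimal sketch of why storing similar values together helps compression is shown here. The sample rows and the helper names `to_columns` and `run_length_encode` are hypothetical, and run-length encoding is used only as one simple example of a compression scheme; the source does not say which algorithms real products use.

```python
from itertools import groupby

# Hypothetical sales rows: (order_id, region, amount)
rows = [
    (1, "EU", 120.0),
    (2, "EU", 80.0),
    (3, "EU", 45.5),
    (4, "US", 300.0),
    (5, "US", 20.0),
]


def to_columns(records):
    """Pivot row tuples into one list per column."""
    order_ids, regions, amounts = zip(*records)
    return {
        "order_id": list(order_ids),
        "region": list(regions),
        "amount": list(amounts),
    }


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# Repeated values sit next to each other in a column, so they compress well.
encoded_region = run_length_encode(columns["region"])
print(encoded_region)  # [('EU', 3), ('US', 2)]

# An aggregate over one column reads only that column's values.
total = sum(columns["amount"])
print(total)  # 565.5
```

In the row layout the two region strings are interleaved with order ids and amounts, so the same runs would not appear; the column layout is what exposes them to the encoder.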

Most in-memory tools use compression algorithms to reduce the size of in-memory data. Users query the data loaded into the system’s memory, thereby avoiding disk-based database access, which is slower and prone to performance bottlenecks. This differs from caching in that a cache holds only a subset of specific, pre-defined, organized data. With in-memory tools, a data set as large as a data mart or small data warehouse can be held entirely in memory. It can be accessed within seconds by multiple concurrent users at a detailed level. Theoretically, data access is 10,000 to 1,000,000 times faster than if the data were stored on a hard disk. In-memory processing also minimizes the need for performance tuning and provides faster service to end users.
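The theoretical speed-up quoted above can be sanity-checked with a back-of-the-envelope calculation. The two latency figures below are illustrative assumptions (a typical order of magnitude for a random access on a spinning disk and in RAM), not measurements taken from the source.

```python
# Assumed, illustrative latencies; real hardware varies widely.
DISK_SEEK_SECONDS = 10e-3    # assumption: about 10 ms per random disk access
RAM_ACCESS_SECONDS = 100e-9  # assumption: about 100 ns per random RAM access


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times faster the fast access is than the slow one."""
    return slow_seconds / fast_seconds


ratio = speedup(DISK_SEEK_SECONDS, RAM_ACCESS_SECONDS)
print(f"RAM is roughly {ratio:,.0f}x faster per random access")
```

With these assumed numbers the ratio is on the order of 10^5, which falls inside the quoted range. Sequential reads narrow the gap considerably, so the figure is best read as an upper-end estimate for random access.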

Factors driving in-memory products

Cheaper and higher performing hardware: According to Moore’s law, the number of transistors on a chip, and with it computing power, doubles roughly every two years while costs decrease. CPU processing, memory and disk storage are all subject to some variation of this law. Hardware innovations such as multi-core architecture, NAND flash memory, parallel servers and increased memory processing capability, together with software innovations such as column-centric databases, compression techniques and handling of aggregate tables, have all contributed to the demand for in-memory products.^[4]

64-bit operating systems: Thanks to affordable 64-bit processors and the declining prices of memory chips, in-memory technology is being used more widely. 64-bit operating systems allow access to far more RAM (100 GB or more) than 32-bit systems (2 or 4 GB). By making terabytes of space available for storage and analysis, 64-bit operating systems make in-memory processing scalable. Flash memory also enables systems to scale to many terabytes more economically.

Data Volumes: As the data used by organizations grew, traditional data warehouses could no longer deliver timely, accurate, real-time data. The ETL (extract, transform, load) process that periodically updates data warehouses with operational data can take anywhere from a few hours to weeks to complete, so at any given point in time the data is at least a day old. In-memory processing makes it easy to have instant access to terabytes of data for real-time reporting.

Reduced Costs: In-memory processing comes at a lower cost and is easier to deploy and maintain than traditional BI tools. According to a Gartner survey, deploying traditional BI tools can take as long as 17 months. Many data warehouse vendors are choosing in-memory technology over traditional BI to shorten implementation times.
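The 4 GB ceiling mentioned under 64-bit operating systems above follows directly from the address width. The sketch below assumes a flat, byte-addressed memory model; the helper name `addressable_bytes` is hypothetical, and the 64-bit figure is a theoretical bound, since real processors and operating systems expose far less.

```python
GIB = 2 ** 30  # bytes in one gibibyte
EIB = 2 ** 60  # bytes in one exbibyte


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a flat, byte-addressed pointer of the given width."""
    return 2 ** address_bits


limit_32 = addressable_bytes(32)
limit_64 = addressable_bytes(64)

print(limit_32 // GIB)  # 4  (GiB, the 32-bit ceiling)
print(limit_64 // EIB)  # 16 (EiB, theoretical 64-bit ceiling)
```

The lower 2 GB figure sometimes quoted for 32-bit systems reflects operating systems that reserve part of the address space for the kernel, which is an observation about common practice rather than something stated in the source.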

Advantages of In-memory BI

Several in-memory vendors provide the ability to connect to existing data sources and offer visually rich interactive dashboards. This allows business analysts and end users to create custom reports and queries without much training or expertise. Easy navigation and the ability to modify queries on the fly appeal to many users. Because these dashboards can be populated with fresh data, users have access to real-time data and can create reports within minutes, which is a critical factor in any business intelligence application.

With in-memory processing, the source database is queried only once rather than every time a query is run, which eliminates repetitive processing and reducing the burden on database servers is a direct result. By scheduling the in-memory database to be populated overnight, the database servers can be used for operational purposes during peak hours.
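The load-once pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions: an SQLite database stands in for the source database, the `sales` table and its rows are invented sample data, and `load_snapshot` and `total_for_region` are hypothetical helper names.

```python
import sqlite3


def load_snapshot(conn):
    """Read the source table once and keep the rows in a Python list."""
    cursor = conn.execute("SELECT region, amount FROM sales")
    return cursor.fetchall()


def total_for_region(snapshot, region):
    """Answer a query from the in-memory rows without touching the database."""
    return sum(amount for row_region, amount in snapshot if row_region == region)


# Stand-in source database with invented sample data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EU", 120.0), ("EU", 80.0), ("US", 300.0)],
)
source.commit()

snapshot = load_snapshot(source)  # the single trip to the source database
source.close()                    # the source is free for operational work

# Every later query runs against the snapshot held in memory.
eu_total = total_for_region(snapshot, "EU")
us_total = total_for_region(snapshot, "US")
print(eu_total, us_total)  # 200.0 300.0
```

Closing the connection before running the queries makes the point explicit: once the snapshot is loaded, the source server carries no query load until the next scheduled refresh.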

In-memory processing can be a major benefit for operational workers such as call center representatives or warehouse managers, who need instant and accurate data to make fast decisions.^[5]

Disadvantages of In-memory BI

In any typical BI solution, a large number of users need access to data. As the number of users and the data volumes increase, the amount of RAM needed also increases, which in turn raises hardware costs. Many users and software vendors have integrated flash memory into their systems to scale to larger datasets more economically. Oracle has been integrating flash memory into its Oracle Exadata products for increased performance. Microsoft SQL Server 2012 BI/Data Warehousing software has been coupled with Violin Memory flash memory arrays to enable in-memory processing of datasets larger than 20 TB.^[6]

Who is it for?

While in-memory processing has great potential for end users, it is not the answer for everyone. The important question organizations need to ask is whether slow query response times are preventing users from making important decisions. If a company is a slow-moving business where things don’t change often, an in-memory solution is unlikely to pay off. Organizations with significant growth in data volume and increasing demand for reporting functionality that opens up new opportunities are the right setting for deploying in-memory BI.

Security needs to be the foremost concern when deploying in-memory tools, as they expose huge amounts of data to end users. Care should be taken over who has access to the data and how and where the data is stored. End users download huge amounts of data onto their desktops, where it is in danger of being compromised, lost or stolen. Measures should be taken to provide access to the data only to authorized users.^[7]

References

[1] Earls, A. (2011). Tips on evaluating, deploying and managing in-memory analytics tools (PDF). Tableau.
[2] Gill, John (2007). "Shifting the BI Paradigm with In-Memory Database Technologies". Business Intelligence Journal. 12 (2): 58–62.
[3] "In_memory Analytics". yellowfin. p. 6.
[4] Kote, Sparjan. "In-memory computing in Business Intelligence".
[5] "In_memory Analytics". yellowfin. p. 9.
[6] "SQL Server 2012 with Violin Memory" (PDF). Microsoft.
[7] "In_memory Analytics". yellowfin. p. 12.
