In-memory processing: Difference between revisions

Content deleted Content added
Citation bot (talk | contribs)
Added bibcode. Removed URL that duplicated identifier. Removed access-date with no URL. | Use this bot. Report bugs. | Suggested by Headbomb | Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox | #UCB_webform_linked 290/990
 
(143 intermediate revisions by 80 users not shown)
Line 1:
{{Short description|Processing data technology}}
{{multiple issues| lead missing=December 2011| wikify=December 2011| orphan=December 2011}}
{{advert|date=November 2018}}
 
The term is used for two different things:
== Definition ==
With businesses demanding faster and easy access to information in order to make reliable and smart decisions, In-memory processing is an emerging technology that is gaining attention. It enables users to have immediate access to right information which results in more informed decisions. Traditional [[Business Intelligence]] (BI) technology loads data onto the disk in the form of tables and multi-dimensional cubes against which queries are run. Using In-memory data is loaded into [[Random Access Memory]] (RAM) instead of hard disks and therefore information technology (IT) staff spends less development time on [[data modeling]], query analysis, cube building and table design.<ref>{{cite book|last=Earls|first=A|title=Tips on evaluating, deploying and managing in-memory analytics tools|year=2011|publisher=Tableau|url=http://www.analyticsearches.com/site/files/776/66977/259607/579091/In-Memory_Analytics_11.10.11.pdf}}</ref>
 
# In [[computer science]], '''in-memory processing''', also called '''compute-in-memory''' (CIM), or '''processing-in-memory''' (PIM), is a [[computer architecture]] in which operations can be performed directly on data in memory, rather than the data having to be transferred to [[Central processing unit|CPU]] registers first.<ref>{{Cite journal |last=Ghose |first=S. |date=November 2019 |title=Processing-in-memory: A workload-driven perspective |url=https://www.pdl.cmu.edu/PDL-FTP/associated/19ibmjrd_pim.pdf |journal=IBM Journal of Research and Development |volume=63 |issue=6 |pages=3:1–19|doi=10.1147/JRD.2019.2934048 |s2cid=202025511 }}</ref> This may improve [[Electric power|power usage]] and [[Computer performance|performance]] by reducing the movement of data between the processor and the main memory.
== Traditional BI ==
# In [[software engineering]], '''in-memory processing''' is a [[software architecture]] where a database is kept entirely in [[random-access memory]] (RAM) or [[flash memory]] so that usual accesses, in particular read or query operations, do not require access to [[disk storage]].<ref>{{cite journal|first=Hao|last=Zhang|author2=Gang Chen|author3=Beng Chin Ooi|author4=Kian-Lee Tan|author5=Meihui Zhang|title=In-Memory Big Data Management and Processing: A Survey|journal=IEEE Transactions on Knowledge and Data Engineering|date=July 2015|volume=27|issue=7|pages=1920–1948|doi=10.1109/TKDE.2015.2427795|bibcode=2015ITKDE..27.1920Z |doi-access=free}}</ref> This may allow faster data operations such as "joins", and faster reporting and decision-making in business.<ref>{{cite book|last1=Plattner|first1=Hasso|last2=Zeier|first2=Alexander|title=In-Memory Data Management: Technology and Applications|date=2012|publisher=Springer Science & Business Media|isbn=9783642295744|url=https://books.google.com/books?id=HySCgzCApsEC&q=%22in-memory%22|language=en}}</ref>
Every computer has two types of data storage mechanisms – disk (hard disk) and RAM (Random Access Memory). Modern computers have more available disk storage than RAM but reading data from the disk is much slower (possibly hundreds of times) when compared to reading the same data from RAM. Especially when analyzing large volumes of data, performance is severely degraded. Using traditional disk based technology the query accesses information from multiple tables stored on a server’s hard disk. Traditional disk based technologies means [[Relational Database Management Systems]] such as SQL Server, MySQL, [[Oracle]] and many others. RDMS are designed keeping transactional processing in mind. Having a database that supports both insertions, updates as well as performing aggregations, joins (typical in BI solutions) is not possible. Also the structured query language ([[SQL]]) is designed to efficiently fetch rows of data while BI queries usually involve fetching of partial rows of data involving heavy calculations.
 
Extremely large datasets may be divided between co-operating systems as in-memory [[data grid]]s.
Though SQL is a very powerful tool running complex queries took very long time to execute and often resulted in bringing down transactional processing. To improve query performance multidimensional databases or cubes also called multidimensional online analytical processing (MOLAP) were formed. Designing a cube design involved an elaborate and lengthy process which took a significant amount of time from IT staff. Changing the cubes structure to adapt to dynamically changing business needs was cumbersome. Cubes are pre populated with data to answer specific queries and although it increased performance it still failed to answer ad hoc queries.<ref>{{cite journal|last=Gill|first=John|title=Shifting the BI Paradigm with In-Memory Database Technologies|journal=Business Intelligence Journal|year=2007|month=Second Quarter|volume=12|issue=2|pages=58–62|url=http://www.highbeam.com/doc/1P3-1636785121.html}}</ref>
 
==Hardware (PIM)==
== Disadvantages of traditional BI ==
PIM could be implemented by:<ref>{{cite web |title=Processing-in-Memory Course: Lecture 1: Exploring the PIM Paradigm for Future Systems - Spring 2022 | website=[[YouTube]] | date=10 March 2022 |url=https://www.youtube.com/watch?v=R-sEqnOmDT4 |language=en}}</ref>
To avoid performance issues and provide faster query processing when dealing with large volumes of data, organizations needed optimized database methods like creating [[index (database)|index]]es, use specialized data structures and aggregate tables.
The point of having a data warehouse is to be able to get results for any queries asked at any time. But in order to achieve better response time for users many data marts are designed to pre calculate summaries and answer specific queries defeating the purpose of a data warehouse. Optimized aggregation algorithms needed to be used to increase performance.
Traditional BI tools couldn’t keep up with the ever growing BI requirements and were unable to deliver real time data for end users.<ref>{{cite web|title=In_memory Analytics|url=http://www.yellowfinbi.com/Document.i4?DocumentId=104879|publisher=yellowfin|pages=6}}</ref>
 
* Processing-using-Memory (PuM)
== How does In-memory processing Work? ==
** Adding limited processing capability (e.g., floating point multiplication units, 4K row operations such as copy or zero, bitwise operations on two rows) to conventional memory modules (e.g., DIMM modules); or
The arrival of column centric databases which stored similar information together allowed storing data more efficiently and with greater compression. This in turn allowed to store huge amounts of data in the same physical space which in turn reduced the amount memory needed to perform a query and increased the processing speed. With in-memory database, all information is initially loaded into memory. It eliminates the need for optimizing database like creating indexes, aggregates and designing of cubes and star schemas.
** Adding processing capability to memory controllers so that the data that is accessed does not need to be forwarded to the CPU or affect the CPU's cache, but is dealt with immediately.
 
* Processing-near-Memory (PnM)
Most in-memory tools use compression algorithms which reduce the size of in-memory data than what would be needed for hard disks. Users query the data loaded into the system’s memory thereby avoiding slower database access and performance bottlenecks. This is different from caching, a very widely used method to speed up query performance, in that caches are subsets of very specific pre-defined organized data. With in-memory tools, data available for analysis can be as large as data mart or small data warehouse which is entirely in memory. This can be accessed within seconds by multiple concurrent users at a detailed level and offers the potential for excellent analytics. Theoretically the improvement in data access is 10,000 to 1,000,000 times faster than from disk. It also minimizes the need for performance tuning by IT staff and provides faster service for end users.
** New 3D arrangements of silicon with memory layers and processing layers.
=== Application of in-memory technology in everyday life ===
In-memory processing techniques are frequently used by modern smartphones and tablets to improve application performance. This can result in speedier app loading times and more enjoyable user experiences.
 
* In-memory processing may be used by gaming consoles such as the [[PlayStation]] and [[Xbox]] to improve game speed.<ref>{{Cite web |last=Park |first=Kate |date=2023-07-27 |title=Samsung extends cut in memory chip production, will focus on high-end AI chips instead |url=https://techcrunch.com/2023/07/27/samsung-extends-cut-in-memory-chip-production-will-focus-on-high-end-ai-chips-instead/ |access-date=2023-12-05 |website=TechCrunch |language=en-US}}</ref>{{Failed verification|date=July 2024}} Rapid data access is critical for providing a smooth game experience.
== Factors driving In-memory products ==
Cheaper and higher performing hardware: According to Moore’s law the computing power doubles every two to three years while decreasing in costs. CPU processing, RAM and disk storage are all subject to some variation of this law. Also hardware innovations like multi-core architecture, parallel servers, increased memory processing capability, etc. and software innovations like column centric databases, compression techniques and handling aggregate tables, etc. have all contributed to the demand of In-memory products.<ref>{{cite web|last=Kote|first=Sparjan|title=In-memory computing in Business Intelligence|url=http://www.infosysblogs.com/oracle/2011/03/in-memory_computing_in_busines.html}}</ref>
 
* Certain wearable devices, like smartwatches and fitness trackers, may incorporate in-memory processing to swiftly process sensor data and provide real-time feedback to users. Several commonplace gadgets use in-memory processing to improve performance and responsiveness.<ref>{{Cite journal |last1=Tan |first1=Kian-Lee |last2=Cai |first2=Qingchao |last3=Ooi |first3=Beng Chin |last4=Wong |first4=Weng-Fai |last5=Yao |first5=Chang |last6=Zhang |first6=Hao |date=2015-08-12 |title=In-memory Databases: Challenges and Opportunities From Software and Hardware Perspectives |url=https://doi.org/10.1145/2814710.2814717 |journal=ACM SIGMOD Record |volume=44 |issue=2 |pages=35–40 |doi=10.1145/2814710.2814717 |s2cid=14238437 |issn=0163-5808|url-access=subscription }}</ref>
64-bits operating system: Though the idea of In-memory technology is not new, it is only recently emerging thanks to the widely popular and affordable 64-bit processors and declining memory chips prices. [[64 bit]] operating systems allows access to far more RAM (up to 100GB or more) than the 2 or 4 GB accessible on 32-bit systems. By providing Terabytes (1 TB = 1,024 GB) of space available for storage and analysis, 64-bit operating systems make in-memory processing scalable.
 
* In-memory processing is used by smart TVs to enhance interface navigation and content delivery. It is used in digital cameras for real-time image processing, filtering, and effects.<ref>{{Cite book |doi=10.1109/ISCAS48785.2022.9937475 |s2cid=253462291 |chapter=Approximate In-Memory Computing using Memristive IMPLY Logic and its Application to Image Processing |title=2022 IEEE International Symposium on Circuits and Systems (ISCAS) |date=2022 |last1=Fatemieh |first1=Seyed Erfan |last2=Reshadinezhad |first2=Mohammad Reza |last3=Taherinejad |first3=Nima |pages=3115–3119 |isbn=978-1-6654-8485-5 }}</ref> Voice-activated assistants and other home automation systems may benefit from faster understanding and response to user orders.
Data Volumes: As the data used by organizations grew traditional data warehouses just couldn’t deliver a timely, accurate and real time data. The extract, transform, load ([[Extract, transform, load|ETL]]) process that periodically updates data warehouses with operational data can take anywhere from a few hours to weeks to complete. So at any given point of time data is at least a day old. In-memory processing makes easy to have instant access to terabytes of data for real time reporting.
 
* In-memory processing is also used by embedded systems in appliances and high-end digital cameras for efficient data handling. Through in-memory processing techniques, certain IoT devices prioritize fast data processing and response times.<ref>{{Cite web |title=What is processing in memory (PIM) and how does it work? |url=https://www.techtarget.com/searchbusinessanalytics/definition/processing-in-memory-PIM |access-date=2023-12-05 |website=Business Analytics |language=en}}</ref>
Reduced Costs: In-memory processing comes at a lower cost and can be easily deployed and maintained when compared to traditional BI tools. According to Gartner survey deploying traditional BI tools can take as long as 17 months. Many data warehouse vendors are choosing In-memory technology over traditional BI to speed up implementation times.
 
==Software==
== Advantages of In-memory BI ==
=== Disk-based data access ===
Several in-memory vendors provide ability to connect to existing data sources and access to visually rich interactive dashboards. This allows business analysts and end users to create custom reports and queries without much training or expertise. Easy navigation and ability to modify queries on the fly is an appealing factor to many users. Since these dashboards can be populated with fresh data, it allows users to have access to real time data and create reports within minutes, which is a critical factor in any business intelligence application.
 
====Data structures====
With In-memory processing the source database is queried only once instead of accessing the database every time a query is run thereby eliminating repetitive processing and reducing the burden on database servers. By scheduling to populate In-memory database overnight the database servers can be used for operational purposes during peak hours.
With disk-based technology, data is loaded onto the computer's [[hard disk]] in the form of multiple tables and multi-dimensional structures against which queries are run. Disk-based technologies are often [[relational database management system]]s (RDBMS), typically based on the structured query language ([[SQL]]), such as [[Microsoft SQL Server|SQL Server]], [[MySQL]], [[Oracle database|Oracle]] and many others. RDBMS are designed for the requirements of [[Software transactional memory|transactional processing]]. In a database that supports insertions and updates, performing aggregations and [[join (SQL)|join]]s (typical in BI solutions) is typically very slow. Another drawback is that SQL is designed to efficiently fetch rows of data, while BI queries usually involve fetching partial rows of data and performing heavy calculations.
 
To improve query performance, multidimensional databases or [[OLAP cube]]s, also called multidimensional online analytical processing (MOLAP), may be constructed. Designing a cube may be an elaborate and lengthy process, and changing the cube's structure to adapt to dynamically changing business needs may be cumbersome. Cubes are pre-populated with data to answer specific queries and although they increase performance, they are still not optimal for answering all ad-hoc queries.<ref>{{cite journal|last=Gill|first=John|title=Shifting the BI Paradigm with In-Memory Database Technologies|journal=Business Intelligence Journal|year=2007|month=Second Quarter|volume=12|issue=2|pages=58–62|url=http://www.highbeam.com/doc/1P3-1636785121.html|archive-url=https://web.archive.org/web/20150924203158/http://www.highbeam.com/doc/1P3-1636785121.html|url-status=dead|archive-date=2015-09-24}}</ref>
In-memory processing can be a blessing in disguise for operational workers such as call center representatives or warehouse managers who need instant and accurate data to make fast decisions.<ref>{{cite web|title=In_memory Analytics|url=http://www.yellowfinbi.com/Document.i4?DocumentId=104879|publisher=yellowfin|pages=9}}</ref>
 
Information technology (IT) staff may spend substantial development time on optimizing databases, constructing [[index (database)|index]]es and [[aggregate (data warehouse)|aggregate]]s, designing cubes and [[star schema]]s, [[data modeling]], and query analysis.<ref>{{cite book|last=Earls|first=A|title=Tips on evaluating, deploying and managing in-memory analytics tools|year=2011|publisher=Tableau|url=http://www.analyticsearches.com/site/files/776/66977/259607/579091/In-Memory_Analytics_11.10.11.pdf |archiveurl=https://web.archive.org/web/20120425232535/http://www.analyticsearches.com/site/files/776/66977/259607/579091/In-Memory_Analytics_11.10.11.pdf |archivedate=2012-04-25}}</ref>
== Disadvantages of In-memory BI ==
In any typical BI solution a large number of users need to have access to data. With increase in number of users and data volumes the amount of RAM needed also increases which in turn affects the hardware costs.
 
====Processing speed====
== Who is it for? ==
Reading data from the hard disk is much slower (possibly hundreds of times) when compared to reading the same data from RAM. Especially when analyzing large volumes of data, performance is severely degraded. Though SQL is a very powerful tool, arbitrary complex queries with a disk-based implementation take a relatively long time to execute and often result in bringing down the performance of transactional processing. In order to obtain results within an acceptable response time, many [[data warehouse]]s have been designed to pre-calculate summaries and answer specific queries only. Optimized aggregation algorithms are needed to increase performance.
While In-memory processing has a great potential for end users it is not the answer to everyone. Important question organizations need to ask is if slower query response times are preventing users from making important decisions. If company is a slow moving business where things don’t change often then in-memory solution is not effective. Organizations where there is a significant growth in data volume and increase in demand for reporting functionalities that facilitate new opportunities would be a right scenario to deploy in-memory BI.
 
=== In-memory data access ===
Security needs to be the first and foremost concern when deploying In-memory tools as they expose huge amounts of data to end users. Care should be taken as to who has access to the data, how and where data is stored. End users download huge amounts of data onto their desktops and there is danger of data getting compromised. It could get lost or stolen. Measures should be taken to provide access to the data only to authorized users.<ref>{{cite web|title=In_memory Analytics|url=http://www.yellowfinbi.com/Document.i4?DocumentId=104879|publisher=yellowfin|pages=12}}</ref>
With both in-memory databases and [[data grid]]s, all information is initially loaded into RAM or flash memory instead of [[hard disk]]s. With a data grid, processing occurs three [[orders of magnitude]] faster than in relational databases, whose advanced functionality, such as [[ACID]] guarantees, degrades performance in exchange for the additional functionality. The arrival of [[Column-oriented DBMS|column centric databases]], which store similar information together, allows data to be stored more efficiently and with greater [[Data compression|compression]] ratios. This allows huge amounts of data to be stored in the same physical space, reducing the amount of memory needed to perform a query and increasing processing speed. Many users and software vendors have integrated flash memory into their systems to allow systems to scale to larger data sets more economically.
Users query the data loaded into the system's memory, thereby avoiding slower database access and performance [[Bottleneck (software)|bottlenecks]]. This differs from [[caching (computing)|caching]], a very widely used method to speed up query performance, in that caches are subsets of very specific pre-defined organized data. With in-memory tools, data available for analysis can be as large as a [[data mart]] or small data warehouse which is entirely in memory. This can be accessed quickly by multiple concurrent users or applications at a detailed level and offers the potential for enhanced analytics and for scaling and increasing the speed of an application. Theoretically, the improvement in data access speed is 10,000 to 1,000,000 times compared to the disk.{{citation needed|date=January 2016}} It also minimizes the need for performance tuning by IT staff and provides faster service for end users.
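The effect of column-centric storage on compression can be illustrated with a small sketch. It is a hypothetical illustration, not the method of any particular product: it assumes a tiny made-up table whose <code>region</code> column has few distinct values stored next to each other, and it uses run-length encoding as one example of a compression technique. The names <code>rows</code>, <code>columns</code>, <code>run_length_encode</code> and <code>run_length_decode</code> are assumptions introduced for this example.

```python
from itertools import groupby

# Hypothetical row-oriented table: each tuple is (order_id, region, amount).
rows = [
    (1, "EU", 10.0),
    (2, "EU", 12.5),
    (3, "EU", 7.0),
    (4, "US", 3.0),
    (5, "US", 9.5),
    (6, "ASIA", 4.0),
]

# Column-oriented layout: values of the same attribute are stored together.
columns = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


# Six stored region values shrink to three (value, count) pairs.
encoded_region = run_length_encode(columns["region"])
```

Because similar values sit side by side in a column, the six <code>region</code> entries collapse into three pairs, and decoding restores the original column exactly. A row-oriented layout would interleave these values with unrelated fields, leaving no such runs to exploit.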
 
==== Advantages of in-memory processing technology ====
Certain developments in computer technology and business needs have tended to increase the relative advantages of in-memory technology.<ref>{{cite web|title=In_memory Analytics|url=http://www.yellowfinbi.com/Document.i4?DocumentId=104879|publisher=yellowfin|page=6}}</ref>
The idea of running memory based databases was first developed by [[QlikTech]] in 1997 with their business intelligence product QlikView. Since lower costs are one of the benefits of In-memory processing many organizations are looking to adopt this technology and many vendors have since added in-memory to their platforms. Biggies like SAP recently unveiled High-Performance Analytical Appliance (HANA) for in-memory computing, Oracle acquired [[TimesTen]], an in-memory relational database. IBM Cognos (formerly Applix TM1) offers financial application and have many financial institutions as customers. Products such as Spotfire acquired by TIBCO, IBM SolidDB are already popular and have made their mark.<ref>{{cite journal|last=Henschen|first=Doug|title=Next-Gen BI Is Here|journal=Information Week|date=31|year=2009|month=August|issue=1239|pages=6|url=http://www.businessintelligence.info/docs/revistas/bispain_tendencias_business_intelligence.pdf}}</ref>
 
* Following [[Moore's law]], the number of transistors per unit area doubles every two or so years. This is reflected in changes to price, performance, packaging and capabilities of the components. [[Random-access memory]] price and CPU computing power in particular have improved over the decades. CPU processing, memory and disk storage are all subject to some variation of this law. In addition, hardware innovations such as [[Multi-core processor|multi-core architecture]], [[NAND flash memory]], [[Parallel computing|parallel servers]], and increased memory processing capability have contributed to the technical and economic feasibility of in-memory approaches.
== References ==
* In turn, software innovations such as column centric databases, compression techniques and handling of aggregate tables enable efficient in-memory products.<ref>{{cite web|last=Kote |first=Sparjan |title=In-memory computing in Business Intelligence |url=http://www.infosysblogs.com/oracle/2011/03/in-memory_computing_in_busines.html |url-status=dead |archiveurl=https://web.archive.org/web/20110424013629/http://www.infosysblogs.com/oracle/2011/03/in-memory_computing_in_busines.html |archivedate=April 24, 2011 }}</ref>
{{Reflist}}
* The advent of ''[[64-bit operating system]]s'', which allow access to far more RAM (up to 100&nbsp;GB or more) than the 2 or 4 GB accessible on [[32-bit computing|32-bit systems]]. By providing terabytes (1 TB = 1,024 GB) of space available for storage and analysis, 64-bit operating systems make in-memory processing scalable. The use of flash memory enables systems to scale to many terabytes more economically.
* Increasing ''volumes of data'' have meant that traditional data warehouses may be less able to process the data in a timely and accurate way. The [[extract, transform, load]] (ETL) process that periodically updates disk-based data warehouses with operational data may result in lags and stale data. In-memory processing may enable faster access to terabytes of data for better real time reporting.
* In-memory processing may be available at a ''lower cost'' compared to disk-based processing, and can be more easily deployed and maintained. According to a Gartner survey,<ref>{{Cite web |title=Survey Analysis: Why BI and Analytics Adoption Remains Low and How to Expand Its Reach |url=https://www.gartner.com/en/documents/3753469 |access-date=2023-12-05 |website=Gartner |language=en}}</ref> deploying traditional BI tools can take as long as 17 months.
* Decreases in power consumption and increases in throughput due to a lower access latency, and greater memory bandwidth and hardware parallelism.<ref>{{Cite book|last1=Upchurch|first1=E.|last2=Sterling|first2=T.|last3=Brockman|first3=J.|title=Proceedings of the ACM/IEEE SC2004 Conference |chapter=Analysis and Modeling of Advanced PIM Architecture Design Tradeoffs |date=2004|chapter-url=https://ieeexplore.ieee.org/document/1392942|location=Pittsburgh, PA, USA|publisher=IEEE|pages=12|doi=10.1109/SC.2004.11|isbn=978-0-7695-2153-4|s2cid=9089044 |url=https://resolver.caltech.edu/CaltechAUTHORS:20170103-172751346 }}</ref>
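
The difference between the memory reachable by 32-bit and 64-bit systems, mentioned in the list above, follows from simple arithmetic. The sketch below assumes a flat address space with one byte per address and binary units. It computes theoretical upper bounds only: real processors and operating systems expose fewer usable address bits, so practical limits are lower. The function name <code>addressable_bytes</code> is an assumption introduced for this example.

```python
GIB = 2 ** 30  # bytes in one binary gigabyte
TIB = 2 ** 40  # bytes in one binary terabyte


def addressable_bytes(pointer_bits):
    """Bytes reachable with a flat address of the given width (one byte per address)."""
    return 2 ** pointer_bits


# Theoretical ceiling of a 32-bit address space, in binary gigabytes.
limit_32_gib = addressable_bytes(32) // GIB

# Theoretical ceiling of a 64-bit address space, in binary terabytes.
limit_64_tib = addressable_bytes(64) // TIB

# One binary terabyte expressed in binary gigabytes.
gib_per_tib = TIB // GIB
```

Under these assumptions a 32-bit address reaches 4 binary gigabytes, while a 64-bit address reaches about 16.8 million binary terabytes, far beyond the memory installed in current machines.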
 
==== Application in business ====
<!--- Categories --->
A range of in-memory products provide the ability to connect to existing data sources and access to visually rich interactive dashboards. This allows business analysts and end users to create custom reports and queries without much training or expertise. Easy navigation and the ability to modify queries on the fly are of benefit to many users. Since these dashboards can be populated with fresh data, users have access to real time data and can create reports within minutes. In-memory processing may be of particular benefit in [[call center]]s and warehouse management.
 
With in-memory processing, the source database is queried only once instead of accessing the database every time a query is run, thereby eliminating repetitive processing and reducing the burden on database servers. By scheduling to populate the in-memory database overnight, the database servers can be used for operational purposes during peak hours.
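
The pattern of reading the source once and then answering queries from memory can be sketched with Python's standard <code>sqlite3</code> module. This is a minimal illustration under stated assumptions, not a description of how any commercial in-memory product works: the <code>sales</code> table, its data and the temporary file standing in for the disk-based source database are all hypothetical.

```python
import os
import sqlite3
import tempfile

# Build a small disk-based "source" database (hypothetical schema and data).
path = os.path.join(tempfile.mkdtemp(), "source.db")
source = sqlite3.connect(path)
source.execute("CREATE TABLE sales (region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EU", 10.0), ("EU", 12.5), ("US", 3.0), ("US", 9.5)],
)
source.commit()

# Read the source once: copy its full contents into an in-memory database.
memory = sqlite3.connect(":memory:")
source.backup(memory)
source.close()  # the source is not touched again by the queries below

# Subsequent queries run against the in-memory copy only.
totals = dict(
    memory.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
)
count = memory.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

In a scheduled deployment the copy step would run during off-peak hours, leaving the source server free for operational work while reports are served from memory. The in-memory copy reflects the data as of the last load, so freshness depends on how often it is repopulated.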
{{uncategorized|date=December 2011}}
 
==== Adoption of in-memory technology ====
With a large number of users, a large amount of [[Random-access memory|RAM]] is needed for an in-memory configuration, which in turn affects the hardware costs. The investment is more likely to be suitable in situations where speed of query response is a high priority, and where there is significant growth in data volume and increase in demand for reporting facilities; it may still not be cost-effective where information is not subject to rapid change. [[Computer security|Security]] is another consideration, as in-memory tools expose huge amounts of data to end users. Makers advise ensuring that only authorized users are given access to the data.
 
== See also ==
 
* [[Computational RAM]]
* [[System on a chip]]
* [[Network on a chip]]
 
== References ==
{{Reflist}}
 
[[Category:Computer memory]]
[[Category:Database management systems]]