The rapid growth of the [[Internet]] and [[World Wide Web]] has made huge amounts of information available online and created a need for [[Big Data]] processing capabilities. Business and government organizations create large amounts of both structured and [[unstructured data|unstructured]] information that needs to be processed, analyzed, and linked.<ref>[https://www.springer.com/computer/communication+networks/book/978-1-4419-6523-3/ Handbook of Cloud Computing], "Data-Intensive Technologies for Cloud Computing" by A. M. Middleton. Handbook of Cloud Computing. Springer, 2010.</ref> Storing, managing, accessing, and processing this vast amount of data is both a fundamental need and an immense challenge for anyone who must search, analyze, mine, and visualize it as information.<ref>"[http://www.csc.liv.ac.uk/~leszek/COMP526/2009/Akadej.pdf Got Data? A Guide to Data Preservation in the Information Age]" by F. Berman. Communications of the ACM, Vol. 51, No. 12, 2008, pp. 50–66.</ref> Declarative, data-centric languages are increasingly used to address these problems, because focusing on the data makes them much simpler to express.<ref>[http://www.cccblog.org/2008/10/20/the-data-centric-gambit/ The Data Centric Gambit], by J. Hellerstein, 2008.</ref>
Computer system architectures such as [[Hadoop]] and [[HPCC]], which can support data-parallel applications, are a potential solution to the terabyte- and petabyte-scale data processing requirements of [[data-intensive computing]].<ref>"A Design Methodology for Data-Parallel Applications" by L. S. Nyland, J. F. Prins, A. Goldberg, and P. H. Mills. Handbook of Cloud Computing. Springer, 2010.</ref><ref>"[http://www.academia.edu/download/30742657/msw2004_proceedings.pdf#page=7 The terascale challenge]{{dead link|date=July 2022|bot=medic}}{{cbignore|bot=medic}}" by D. Ravichandran, P. Pantel, and E. Hovy. Proceedings of the KDD Workshop on Mining for and from the Semantic Web, 2004.</ref> Clusters of commodity hardware are commonly used to address Big Data problems.<ref>"[http://www.academia.edu/download/5493555/eecs-2009-98.pdf BOOM: Data-Centric Programming in the Datacenter]{{dead link|date=July 2022|bot=medic}}{{cbignore|bot=medic}}" by P. Alvaro, T. Condie, N. Conway, K. Elmeleegy, J. Hellerstein, and R. Sears. Electrical Engineering and Computer Sciences Department, University of California at Berkeley, Technical Report, 2009.</ref> The fundamental challenges for Big Data applications and data-intensive computing<ref>"[https://ieeexplore.ieee.org/
Data-centric programming languages provide a processing approach in which applications are expressed in terms of high-level operations on data, and the runtime system transparently controls the scheduling, execution, load balancing, communications, and movement of programs and data across the computing cluster.<ref>[https://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt Data Intensive Scalable Computing], by R. E. Bryant, 2008.</ref> The programming abstraction and language tools allow processing to be expressed as data flows and transformations that draw on shared libraries of common data manipulation algorithms, such as sorting.
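As a rough illustration of this style, the following Python sketch expresses a small job purely as a chain of high-level data operations (filter, aggregate, sort). It is not taken from any of the systems named above: the helpers <code>where</code>, <code>total_by</code>, <code>sort_by</code>, and <code>run</code>, and the sample <code>visits</code> records, are hypothetical names invented for this example. Here <code>run</code> simply applies the steps in order on one machine, whereas a real data-centric runtime would decide how to partition, schedule, and move the work across a cluster.

```python
from collections import defaultdict

# Hypothetical helper: keep only the records that satisfy a predicate.
def where(predicate):
    return lambda rows: (row for row in rows if predicate(row))

# Hypothetical helper: sum `field` for each distinct value of `key`.
def total_by(key, field):
    def step(rows):
        totals = defaultdict(int)
        for row in rows:
            totals[row[key]] += row[field]
        return ({key: k, field: v} for k, v in totals.items())
    return step

# Hypothetical helper: a shared "library" transformation for sorting.
def sort_by(field, descending=False):
    return lambda rows: sorted(rows, key=lambda row: row[field], reverse=descending)

# Stand-in for the runtime: applies each step in order, locally.
# The job description below does not say how or where steps execute.
def run(source, *steps):
    rows = source
    for step in steps:
        rows = step(rows)
    return list(rows)

# Made-up sample data for the illustration.
visits = [
    {"site": "a.example", "hits": 3},
    {"site": "b.example", "hits": 5},
    {"site": "a.example", "hits": 4},
    {"site": "c.example", "hits": 1},
]

# The job is stated as a data flow: filter, then aggregate, then sort.
result = run(
    visits,
    where(lambda row: row["hits"] > 1),
    total_by("site", "hits"),
    sort_by("hits", descending=True),
)
print(result)
```

The job itself names only what should happen to the data; replacing <code>run</code> with a distributed executor would, in principle, leave the job description unchanged.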