Data (computer science)

{{other uses|Data (disambiguation)|Datum (disambiguation)}}
{{broader|Data}}
{{Merge to|Digital data|discuss=Talk:Digital data#Proposed merge of Data (computer science) into Digital data|date=March 2025}}
[[File:Data types - en.svg|thumb|Various types of data which can be visualized through a computer device.]]
 
In [[computer science]], '''data''' (treated as singular, plural, or as a [[mass noun]]) is [[Data|any sequence of one or more symbols]]; a '''datum''' is a single unit of data. Data requires [[Interpretation (logic)|interpretation]] to become [[information]]. [[Digital data]] is data that is represented using the [[binary number]] system of ones (1) and zeros (0), instead of [[analog signal|analog]] representation. In modern (post-1960) computer systems, all data is digital.
 
Data exists in three states: [[data at rest]], [[data in transit]] and [[data in use]]. Within a computer, data in most cases [[Parallel communication|moves as parallel data]]; data moving to or from a computer in most cases [[Serial communication|moves as serial data]]. Data sourced from an analog device, such as a temperature sensor, may be converted to digital form using an [[analog-to-digital converter]]. Data representing [[Quantity|quantities]], characters, or symbols on which a [[computer]] performs operations are [[Data storage|stored]] and [[Record (computer science)|recorded]] on [[magnetic tape data storage|magnetic]], [[optical storage|optical]], electronic, or mechanical recording media, and [[Data communication|transmitted]] in the form of digital electrical or optical signals.<ref>{{cite web|url=https://www.lexico.com/en/definition/data|title=Data|work=Lexico|access-date=14 January 2022|url-status=dead|archive-url=https://web.archive.org/web/20190623094330/https://www.lexico.com/en/definition/data |archive-date=2019-06-23 }}</ref> Data pass in and out of computers via [[peripheral|peripheral devices]].
 
Physical [[computer memory]] elements consist of an address and a byte or word of data storage. Digital data are often stored in [[Relational database#RDBMS|relational databases]], which organize them into [[table (database)|tables]] queried with SQL, and can generally be represented as abstract key/value pairs. Data can be organized in many different types of [[data structure]]s, including arrays, [[Graph (abstract data type)|graphs]], and [[Object (computer science)|objects]]. Data structures can store data of many different [[data type|types]], including [[Floating-point arithmetic|numbers]], [[string (computer science)|strings]] and even other [[Recursive data type|data structures]].

[[Metadata]] is data about data; it helps translate data into information. Metadata may be implied, specified or given.
 
Data relating to physical events or processes has a temporal component, which may be implied. For example, when a temperature logger receives a reading from a temperature [[sensor]], the reading is assumed to have a temporal reference of ''now'', so the logger records the date, time and temperature together. When the logger later communicates its temperatures, it must also report the date and time of each reading as metadata.
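
A minimal Python sketch of such a logger, assuming a hypothetical <code>TemperatureLogger</code> class that receives bare numbers from a sensor. Because the time reference is implied at receipt, the logger stamps each value with the current time and reports the timestamp alongside the temperature. The clock is passed in as a parameter so that a fixed time can be substituted when testing.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class Reading:
    """One logged sample: the measured value plus its temporal metadata."""
    timestamp: datetime
    celsius: float


class TemperatureLogger:
    """Hypothetical logger that stamps each incoming sensor value with 'now'."""

    def __init__(
        self, clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc)
    ) -> None:
        self._clock = clock                  # injectable time source
        self._readings: list[Reading] = []

    def receive(self, celsius: float) -> None:
        # The sensor sends only a number; its implied time reference is the
        # moment of receipt, so the logger makes that time explicit.
        self._readings.append(Reading(self._clock(), celsius))

    def report(self) -> list[tuple[str, float]]:
        # When readings are communicated, each carries its timestamp as metadata.
        return [(r.timestamp.isoformat(), r.celsius) for r in self._readings]


logger = TemperatureLogger()
logger.receive(21.5)
print(logger.report())
```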
 
Fundamentally, computers follow a sequence of instructions they are given in the form of data. A set of instructions to perform a given task (or tasks) is called a ''[[computer program|program]]''. A program is data in the form of coded instructions to control the operation of a computer or other machine.<ref>{{cite web|url=http://www.encyclopedia.com/topic/computer_program.aspx#2|title=Computer program|work=The Oxford pocket dictionary of current english|access-date=11 October 2012|url-status=live|archive-url=https://web.archive.org/web/20111128202415/http://www.encyclopedia.com/topic/computer_program.aspx#2|archive-date=28 November 2011}}</ref> In the nominal case, the program, as [[Execution (computing)|executed]] by the computer, consists of [[machine code]]. The elements of [[computer data storage|storage]] manipulated by the program, but not actually executed by the [[central processing unit]] (CPU), are also data. At its most essential, a single datum is a [[Value (computer science)|value]] stored at a specific ___location. Therefore, computer programs can operate on other computer programs by manipulating their programmatic data.
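
Because a program is itself data, one program can rewrite another. As a rough illustration, assuming Python's standard <code>ast</code> module as the representation, the sketch below parses a one-line program into a syntax tree, changes every addition into a subtraction, and executes the modified program. The transformer class <code>AddToSub</code> is hypothetical and exists only for this example.

```python
import ast

# A tiny program held as ordinary data: a string of source text.
source = "result = (2 + 3) * 4"


class AddToSub(ast.NodeTransformer):
    """Rewrite every binary '+' in a program into '-'."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)            # transform nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


tree = ast.parse(source)                    # program text -> syntax tree (data)
tree = ast.fix_missing_locations(AddToSub().visit(tree))

namespace: dict = {}
exec(compile(tree, "<rewritten>", "exec"), namespace)
print(namespace["result"])                  # (2 - 3) * 4 -> -4
```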
 
To store data [[byte]]s in a file, they have to be [[Serialization|serialized]] in a [[file format]]. Typically, programs are stored in special file types, different from those used for other data. [[Executable|Executable file]]s contain programs; all other files are [[data file]]s. However, executable files may also contain data that is built into the program. In particular, some executable files have a [[data segment]], which nominally contains constants and initial values for variables, both of which can be considered data.
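
Serialization can be illustrated with Python's standard <code>struct</code> module. The record layout here (a 4-byte unsigned integer followed by an 8-byte double, little-endian, no padding) is an assumed file format invented for the example, not a standard one. Writing the resulting <code>bytes</code> to disk would then only require opening a file in binary mode.

```python
import struct

# Hypothetical binary file format: each record is a little-endian 4-byte
# unsigned id followed by an 8-byte IEEE 754 double, with no padding.
RECORD = struct.Struct("<Id")


def serialize(records: list[tuple[int, float]]) -> bytes:
    """Lay the records out one after another as a flat byte sequence."""
    return b"".join(RECORD.pack(rid, value) for rid, value in records)


def deserialize(blob: bytes) -> list[tuple[int, float]]:
    """Recover the records by reading the same fixed layout back."""
    return [RECORD.unpack_from(blob, offset)
            for offset in range(0, len(blob), RECORD.size)]


readings = [(1, 21.5), (2, -3.25)]
blob = serialize(readings)
print(RECORD.size, len(blob))   # 12 bytes per record -> 12 24
```

Deserializing only works because reader and writer agree on the layout, which is exactly the role a file format plays.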
 
The line between program and data can become blurry. An [[interpreter (computing)|interpreter]], for example, is a program. The input data to an interpreter is itself a program, just not one expressed in native [[Machine code|machine language]]. In many cases, the interpreted program will be a human-readable [[text file]], which is manipulated with a [[text editor]] program. [[Metaprogramming]] similarly involves programs manipulating other programs as data. [[Compiler]]s, [[Linker (computing)|linker]]s, [[debugger]]s, [[Software Updater|program updaters]], [[Antivirus software|virus scanners]] and similar programs use other programs as their data.
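
To make the interpreter case concrete, here is a sketch of an interpreter for a hypothetical stack-based language. The program it runs arrives as an ordinary text string, which the interpreter treats purely as input data; the token set (<code>+</code>, <code>-</code>, <code>*</code>, <code>dup</code> and integers) is invented for illustration.

```python
import operator

# Operators understood by a hypothetical stack language.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def run(program: str) -> list[int]:
    """Interpret a program that arrives as plain text, i.e. as data."""
    stack: list[int] = []
    for token in program.split():
        if token in OPS:
            right, left = stack.pop(), stack.pop()   # operands in push order
            stack.append(OPS[token](left, right))
        elif token == "dup":
            stack.append(stack[-1])                   # duplicate top of stack
        else:
            stack.append(int(token))                  # anything else is a number
    return stack


print(run("2 3 + dup *"))   # (2 + 3) squared -> [25]
```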
 
For example, a [[user (computing)|user]] might first instruct the [[operating system]] to load a [[word processor]] program from one file, and then use the running program to open and edit a [[Document file format|document]] stored in another file. In this example, the document would be considered data. If the word processor also features a [[spell checker]], then the dictionary (word list) for the spell checker would also be considered data. The [[algorithm]]s used by the spell checker to suggest corrections would be either [[machine code]] data or text in some interpretable [[programming language]].
 
In an alternate usage, [[binary file]]s (which are not [[Human-readable medium|human-readable]]) are sometimes called ''data'' as distinguished from human-readable ''[[text file|text]]''.<ref>{{cite web|url=https://man.openbsd.org/file.1|title=file(1)|work=OpenBSD manual pages|date=24 December 2015|access-date=4 February 2018|url-status=live|archive-url=https://web.archive.org/web/20180205000843/https://man.openbsd.org/file.1|archive-date=5 February 2018}}</ref>
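
A crude version of the text/data distinction can be sketched as follows. The heuristic is an assumption made for illustration and is much simpler than the tests <code>file(1)</code> actually performs: bytes are treated as text only if they decode as UTF-8 and contain no control characters other than common whitespace.

```python
def classify(blob: bytes) -> str:
    """Crudely label a byte string as 'text' or 'data'.

    Assumed heuristic, far simpler than what file(1) does: content counts as
    text only if it decodes as UTF-8 and contains no control characters
    other than newline, carriage return and tab.
    """
    try:
        decoded = blob.decode("utf-8")
    except UnicodeDecodeError:
        return "data"
    allowed = {"\n", "\r", "\t"}
    if any(ord(ch) < 32 and ch not in allowed for ch in decoded):
        return "data"
    return "text"


print(classify(b"hello, world\n"))                   # text
print(classify(bytes([0x89, 0x50, 0x4E, 0x47])))     # data (invalid UTF-8)
```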
 
==Data keys and values, structures and persistence==
Keys in data provide the context for values. Regardless of the structure of data, a key component is always present, and keys in data and data structures are essential for giving meaning to data values: without a key that is directly or indirectly associated with a value, or with a collection of values in a structure, the values become meaningless and cease to be data.{{cn|date=August 2021}}
 
Data can be represented in computers in multiple ways, as per the following examples:
 
===RAM===
* [[Random access memory]] (RAM) holds data that the CPU has direct access to. A CPU may only manipulate data within its [[processor register]]s or memory. This is as opposed to data storage, where the CPU must direct the transfer of data between the storage device (disk, tape, etc.) and memory. RAM is a linear array of contiguous locations that a processor may read or write by providing an address for the read or write operation. The processor may operate on any ___location in memory at any time and in any order. In RAM the smallest element of data is the binary [[bit]]. The capabilities and limitations of accessing RAM are processor-specific. In general, [[Computer data storage|main memory]] is arranged as an array of [[Memory address|locations]] beginning at address 0 ([[hexadecimal]] 0). Each ___location usually stores 8 or 32 bits, depending on the [[computer architecture]].
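
The linear, address-from-zero model of RAM can be mimicked with a Python <code>bytearray</code>. This toy <code>Memory</code> class is hypothetical: it assumes 8-bit locations and little-endian assembly of multi-byte words, which is only one of the arrangements real architectures use. It shows reads and writes at arbitrary addresses, extraction of a single bit, and combination of consecutive locations into a 32-bit word.

```python
class Memory:
    """Toy model of byte-addressable RAM: a linear array from address 0."""

    def __init__(self, size: int) -> None:
        self.cells = bytearray(size)          # every ___location starts as 0

    def read(self, address: int) -> int:
        return self.cells[address]            # any address, in any order

    def write(self, address: int, value: int) -> None:
        self.cells[address] = value & 0xFF    # assume 8-bit locations

    def read_bit(self, address: int, bit: int) -> int:
        """Extract a single bit, the smallest element of data."""
        return (self.cells[address] >> bit) & 1

    def read_word(self, address: int, width: int = 4) -> int:
        """Combine `width` consecutive locations into one little-endian word."""
        return int.from_bytes(self.cells[address:address + width], "little")


ram = Memory(16)
for offset, byte in enumerate([0x78, 0x56, 0x34, 0x12]):
    ram.write(0x4 + offset, byte)
print(hex(ram.read_word(0x4)))   # 0x12345678
```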
 
===Keys===
* Data keys need not be a direct hardware address in memory. [[Indirection|Indirect]], abstract and logical keys or codes can be stored in association with values to form a [[data structure]]. Data structures have predetermined [[Offset (computer science)|offsets]] (or links or paths) from the start of the structure, in which data values are stored. Therefore, the data key consists of the key to the structure plus the offset (or links or paths) into the structure. When such a structure is repeated, storing variations of the data values and the data keys within the same repeating structure, the result can be considered to resemble a [[Table (information)|table]], in which each element of the repeating structure is considered to be a column and each repetition of the structure is considered as a row of the table. In such an organization of data, the data key is usually a value in one (or a composite of the values in several) of the columns.
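
The idea of a data key as "key to the structure plus offset" can be sketched with Python's <code>struct</code> module. The 14-byte record layout, the field names and the helpers <code>field</code> and <code>find_row</code> are all hypothetical. Repeating the record yields a table-like buffer in which the <code>id</code> column serves as the key for selecting a row.

```python
import struct

# Hypothetical fixed-layout record; "<" means little-endian with no padding,
# so each field sits at a predetermined offset from the start of the record.
RECORD = struct.Struct("<I8sh")                  # id, name, reading: 14 bytes
FIELDS = {"id": (0, "<I"), "name": (4, "<8s"), "reading": (12, "<h")}


def field(table: bytes, row: int, name: str):
    """Data key = key to the structure (row start) + offset into it."""
    offset, fmt = FIELDS[name]
    return struct.unpack_from(fmt, table, row * RECORD.size + offset)[0]


def find_row(table: bytes, wanted_id: int):
    """Use one column's value as the key that selects a row."""
    for row in range(len(table) // RECORD.size):
        if field(table, row, "id") == wanted_id:
            return row
    return None


# Repeating the structure yields something that resembles a table.
rows = [(101, b"north", 215), (102, b"south", -40)]
table = b"".join(RECORD.pack(*r) for r in rows)
print(field(table, find_row(table, 102), "reading"))   # -40
```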
 
===Organised recurring data structures===
* The [[Table (information)|tabular]] view of repeating data structures is only one of many possibilities. Repeating data structures can be organised [[Hierarchy|hierarchically]], such that nodes are linked to each other in a cascade of parent-child relationships. Values and potentially more complex data structures are linked to the nodes. Thus the nodal hierarchy provides the key for addressing the data structures associated with the nodes. This representation can be thought of as an [[Tree (data structure)|inverted tree]]. Modern operating-system [[file system]]s are a common example, and [[XML]] is another.
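
A hierarchical organization can be sketched as a tree of nodes in which the path from the root acts as the key, much as a file path does. The <code>Node</code> class and the <code>put</code> and <code>get</code> helpers below are hypothetical and ignore concerns a real file system would handle, such as permissions or concurrent access.

```python
from typing import Any


class Node:
    """A node in a parent-child hierarchy; a value may hang off any node."""

    def __init__(self) -> None:
        self.children: dict[str, "Node"] = {}
        self.value: Any = None


def put(root: Node, path: str, value: Any) -> None:
    """Walk (creating as needed) the nodes named by a file-system-like path."""
    node = root
    for part in path.strip("/").split("/"):
        node = node.children.setdefault(part, Node())
    node.value = value


def get(root: Node, path: str) -> Any:
    """The path through the hierarchy is the key that addresses the value."""
    node = root
    for part in path.strip("/").split("/"):
        node = node.children[part]            # KeyError if the path is absent
    return node.value


root = Node()
put(root, "/home/ada/notes.txt", "buy milk")
put(root, "/home/ada/todo.txt", "write report")
print(get(root, "/home/ada/notes.txt"))       # buy milk
```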
 
===Sorted or ordered data===
 
===Indexed data===
* Retrieving a small subset of data from a much larger set may require an inefficient sequential search through the data. '''[[Database index|Index]]es''' are a way to copy out keys and ___location addresses from data structures in files, tables and data sets, then organize them using [[Tree (data structure)|inverted tree structures]] to reduce the time taken to retrieve a subset of the original data. To do this, the key of the subset of data to be retrieved must be known before retrieval begins. The most popular indexes are the [[B-tree]] and the dynamic [[Hash function|hash]] key indexing methods. Maintaining an index adds overhead to filing and retrieving data. There are other ways of organizing indexes, e.g. sorting the keys and using a [[binary search algorithm]].
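
The sorted-keys alternative can be sketched with Python's standard <code>bisect</code> module. The data set is hypothetical. The index copies out each key together with the position of its record, sorts those pairs by key, and answers lookups by binary search instead of a sequential scan.

```python
import bisect

# Hypothetical data set, stored in arrival order rather than key order.
records = [("carol", 31), ("alice", 27), ("dave", 45), ("bob", 19)]

# Build the index: copy out (key, ___location) pairs, then sort them by key.
index = sorted((key, position) for position, (key, _) in enumerate(records))
index_keys = [key for key, _ in index]


def lookup(key: str):
    """Binary-search the index, then jump straight to the record's ___location."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return records[index[i][1]]
    return None


print(lookup("dave"))   # ('dave', 45)
```

In this sketch the index would have to be rebuilt or updated whenever records are added, which is the kind of overhead described above; structures such as B-trees are designed to keep that update cost low.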
 
===Abstraction and indirection===
 
===Parallel distributed data processing===
* Modern scalable and high-performance data persistence technologies, such as [[Apache Hadoop]], rely on massively parallel distributed data processing across many commodity computers on a high-bandwidth network. In such systems, the data is distributed across multiple computers, and therefore any particular computer in the system must be represented in the key of the data, either directly or indirectly. This makes it possible to differentiate between two identical sets of data, each being processed on a different computer at the same time.
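
The role of the computer's identity in the key can be illustrated without a real cluster. The sketch below is plain Python, not Hadoop's API: two hypothetical nodes, <code>node-a</code> and <code>node-b</code>, process identical chunks of data at the same time, and every partial result is keyed by <code>(node, word)</code>. Without the node component the second node's results would overwrite the first's; with it, a final merge step can combine them.

```python
from collections import Counter


def process(node_id: str, chunk: list[str]) -> dict[tuple[str, str], int]:
    """Count words locally; every result key records which node produced it."""
    return {(node_id, word): n for word, n in Counter(chunk).items()}


def merge(results: dict[tuple[str, str], int]) -> dict[str, int]:
    """Combine per-node partial results by dropping the node part of the key."""
    totals: Counter = Counter()
    for (_node, word), n in results.items():
        totals[word] += n
    return dict(totals)


# Two identical chunks processed on different (hypothetical) nodes at once.
chunk = ["red", "blue", "red"]
results: dict[tuple[str, str], int] = {}
results.update(process("node-a", chunk))
results.update(process("node-b", chunk))   # without node ids this would overwrite
print(merge(results))                       # {'red': 4, 'blue': 2}
```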
 
==See also==
{{Authority control}}
 
{{DEFAULTSORT:Data (Computing)}}
[[Category:Computer data| ]]