{{Short description|Organized collection of data in computing}}
{{about|the computing concept|instances of the general concept|Lists of databases}}
{{Merge from|Data bank|discuss=Talk:Database#Proposed merge of Data bank into Database|date=May 2025}}
{{protection padlock|1=vandalism|small=y}}
[[Image:DVD Rental Query.png|thumb|An [[SQL]] select statement and its result|upright=1.35]]
 
In [[computing]], a '''database''' is an organized collection of [[Data (computing)|data]] or a type of [[data store]] based on the use of a '''database management system''' ('''DBMS'''), the [[software]] that interacts with [[end user]]s, [[Application software|applications]], and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a '''database system'''. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.
 
Before digital storage and retrieval of data became widespread, [[index cards]] were used for [[data storage]] in a wide range of applications and environments: in the home to record and store recipes, shopping lists, contact information and other organizational data; in business to record presentation notes, project research and notes, and contact information; in schools as flash cards or other visual aids; and in academic research to hold data such as bibliographical citations or notes in a [[card file]]. Professional book indexers used index cards to create book indexes until they were replaced by [[indexing software]] in the 1980s and 1990s.
The term "database" may be narrowed to specify particular aspects of organized collection of data and may refer to the logical database, to physical database as data content in [[computer data storage]] or to many other database sub-definitions.
 
Small databases can be stored on a [[file system]], while large databases are hosted on [[computer clusters]] or [[cloud storage]]. The [[Database design|design of databases]] spans formal techniques and practical considerations, including [[data modeling]], efficient data representation and storage, [[query language]]s, [[Database security|security]] and [[Information privacy|privacy]] of sensitive data, and [[distributed computing]] issues, including supporting [[concurrent computing|concurrent]] access and [[fault tolerance]].
 
[[Computer scientists]] may classify database management systems according to the [[database model]]s that they support. [[Relational database]]s became dominant in the 1980s. These model data as [[Row (database)|rows]] and [[Column (database)|columns]] in a series of [[Table (database)|tables]], and the vast majority use [[SQL]] for writing and querying data. In the 2000s, non-relational databases became popular; these are collectively referred to as [[NoSQL]] because they use different [[query language]]s.
 
==Terminology and overview==
Formally, a "database" refers to a set of related data accessed through the use of a "database management system" (DBMS), which is an integrated set of [[computer software]] that allows [[user (computing)|users]] to interact with one or more databases and provides access to all of the data contained in the database (although restrictions may exist that limit access to particular data). The DBMS provides various functions that allow entry, storage and retrieval of large quantities of information and provides ways to manage how that information is organized.
 
Because of the close relationship between them, the term "database" is often used casually to refer to both a database and the DBMS used to manipulate it.
 
Outside the world of professional [[information technology]], the term ''database'' is often used to refer to any collection of related data (such as a [[spreadsheet]] or a card index) as size and usage requirements typically necessitate use of a database management system.{{sfn|Ullman|Widom|1997|p=1}}
 
Existing DBMSs provide various functions that allow management of a database and its data, which can be classified into four main functional groups (illustrated by the brief example following this list):
 
* '''Data definition''' – Creation, modification and removal of definitions that detail how the data is to be organized.
* '''Update''' – Insertion, modification, and deletion of the data itself.<ref>{{cite web|url=http://www.merriam-webster.com/dictionary/update|title=Update Definition & Meaning |work=Merriam-Webster |url-status=live |archive-url=https://web.archive.org/web/20240225065959/https://www.merriam-webster.com/dictionary/update |archive-date= Feb 25, 2024 }}</ref>
* '''Retrieval''' – Selecting data according to specified criteria (e.g., a query, a position in a hierarchy, or a position in relation to other data) and providing that data either directly to the user, or making it available for further processing by the database itself or by other applications. The retrieved data may be made available in a more or less direct form without modification, as it is stored in the database, or in a new form obtained by altering it or combining it with existing data from the database.<ref>{{cite web|url=http://www.merriam-webster.com/dictionary/retrieval|title=Retrieval Definition & Meaning |work=Merriam-Webster |url-status=live |archive-url= https://web.archive.org/web/20230627174611/https://www.merriam-webster.com/dictionary/retrieval |archive-date= Jun 27, 2023 }}</ref>
* '''Administration''' – Registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information that has been corrupted by some event such as an unexpected system failure.<ref>{{cite web|url=http://www.merriam-webster.com/dictionary/administration|title=Administration Definition & Meaning |work=Merriam-Webster |url-status=live |archive-url=https://web.archive.org/web/20231206055116/https://www.merriam-webster.com/dictionary/administration |archive-date= Dec 6, 2023 }}</ref>
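The brief sketch below illustrates the first three of these groups expressed in SQL, using Python's built-in <code>sqlite3</code> module purely as a convenient embedded DBMS; the table and column names are invented for illustration, and administration functions are omitted because they are largely specific to each DBMS.

<syntaxhighlight lang="python">
import sqlite3

conn = sqlite3.connect(":memory:")  # throw-away example database
cur = conn.cursor()

# Data definition: declare how the data is to be organized.
cur.execute("""
    CREATE TABLE hotel (
        hotel_id   INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        city       TEXT NOT NULL,
        rooms_free INTEGER NOT NULL
    )
""")

# Update: insertion, modification, and deletion of the data itself.
cur.execute("INSERT INTO hotel (name, city, rooms_free) VALUES (?, ?, ?)",
            ("Grand Hotel", "Uppsala", 12))
cur.execute("UPDATE hotel SET rooms_free = rooms_free - 1 WHERE name = ?",
            ("Grand Hotel",))

# Retrieval: select data according to specified criteria.
cur.execute("SELECT name, rooms_free FROM hotel WHERE city = ? AND rooms_free > 0",
            ("Uppsala",))
print(cur.fetchall())   # [('Grand Hotel', 11)]

conn.commit()
conn.close()
</syntaxhighlight>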
 
Both a database and its DBMS conform to the principles of a particular [[database model]].{{sfn|Tsitchizris| Lochovsky| 1982}} "Database system" refers collectively to the database model, database management system, and database.{{sfn|Beynon-Davies|2003}}
 
Physically, database [[Server (computing)|servers]] are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually [[multiprocessor]] computers, with generous memory and [[RAID]] disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large-volume [[Transaction processing system|transaction processing environments]]. DBMSs are found at the heart of most [[database application]]s. DBMSs may be built around a custom [[computer multitasking|multitasking]] [[Kernel (operating system)|kernel]] with built-in [[Computer network|networking]] support, but modern DBMSs typically rely on a standard [[operating system]] to provide these functions.{{citation needed|reason=This has become a mish-mash people's opinion and while boardly accurate may be in places undue and ignored e.g. SAN storage and needs cited rewrite from scrach|date=January 2020}}
 
Since DBMSs comprise a significant [[Market (economics)|market]], computer and storage vendors often take into account DBMS requirements in their own development plans.{{sfn|Nelson|Nelson|2001}}
 
Databases and DBMSs can be categorized according to the database model(s) that they support (such as [[Relational database|relational]] or [[XML]]), the type(s) of computer they run on (from a [[server cluster]] to a [[mobile phone]]), the [[query language]](s) used to access the database (such as SQL or [[XQuery]]), and their internal engineering, which affects performance, [[scalability]], resilience, and security.
 
==History==
The sizes, capabilities, and performance of databases and their respective DBMSs have grown in orders of magnitude. These performance increases were enabled by the technology progress in the areas of [[Central processing unit|processors]], [[computer memory]], [[computer storage]], and [[computer network]]s. The concept of a database was made possible by the emergence of direct access [[storage media]] such as [[Hard disk drive|magnetic disks]], which became widely available in the mid-1960s; earlier systems relied on sequential storage of data on [[magnetic tape]]. The subsequent development of database technology can be divided into three eras based on data model or structure: [[navigational database|navigational]],{{sfn|Bachman|1973}} SQL/[[relational database|relational]], and post-relational.
 
The two main early navigational [[data model]]s were the [[hierarchical model]] and the [[CODASYL]] model ([[network model]]). These were characterized by the use of [[Pointer (computer programming)|pointers]] (often physical disk addresses) to follow relationships from one record to another.
 
The [[relational model]], first proposed in 1970 by [[Edgar F. Codd]], departed from this tradition by insisting that [[Application software|applications]] should search for data by content, rather than by following links. The relational model employs sets of ledger-style tables, each used for a different type of [[entity]]. Only in the mid-1980s did computing hardware become powerful enough to allow the wide deployment of relational systems (DBMSs plus applications). By the early 1990s, however, relational systems dominated in all large-scale [[data processing]] applications, and {{as of | 2018 | lc = on}} they remain dominant: [[IBM Db2]], [[Oracle database|Oracle]], [[MySQL]], and [[Microsoft SQL Server]] are the most searched [[DBMS]].<ref>{{cite web|url=https://pypl.github.io/DB.html|title=TOPDB Top Database index|work=pypl.github.io}}</ref> The dominant database language, standardized SQL for the relational model, has influenced database languages for other data models.{{Citation needed|date=March 2013}}
 
[[Object database]]s were developed in the 1980s to overcome the inconvenience of [[object–relational impedance mismatch]], which led to the coining of the term "post-relational" and also the development of hybrid [[object–relational database]]s.
 
The next generation of post-relational databases in the late 2000s became known as [[NoSQL]] databases, introducing fast [[key–value store]]s and [[document-oriented database]]s. A competing "next generation" known as [[NewSQL]] databases attempted new implementations that retained the relational/SQL model while aiming to match the high performance of NoSQL compared to commercially available relational DBMSs.
 
===1960s, navigational DBMS===
{{Further|Navigational database}}
[[File:CodasylB.png|thumb|upright=1.15|Basic structure of navigational [[CODASYL]] database model]]
 
The introduction of the term ''database'' coincided with the availability of direct-access storage (disks and drums) from the mid-1960s onwards. The term represented a contrast with the tape-based systems of the past, allowing shared interactive use rather than daily [[batch processing]]. The [[Oxford English Dictionary]] cites a 1962 report by the [[System Development Corporation]] of California as the first to use the term "data-base" in a specific technical sense.<ref>{{cite web|title=database, n|url=http://www.oed.com/view/Entry/47411|work=OED Online|publisher=Oxford University Press|access-date=July 12, 2013|date=June 2013}} {{subscription required|s}}</ref>
 
As computers grew in speed and capability, a number of general-purpose database systems emerged; by the mid-1960s a number of such systems had come into commercial use. Interest in a standard began to grow, and [[Charles Bachman]], author of one such product, the [[Integrated Data Store]] (IDS), founded the [[Data Base Task Group|Database Task Group]] within [[CODASYL]], the group responsible for the creation and standardization of [[COBOL]]. In 1971, the Database Task Group delivered their standard, which generally became known as the ''CODASYL approach'', and soon a number of commercial products based on this approach entered the market.
 
The CODASYL approach offered applications the ability to navigate around a linked data set which was formed into a large network. Applications could find records by one of three methods:
#Use of a primary key (known as a CALC key, typically implemented by [[Hash function|hashing]])
#Navigating relationships (called ''sets'') from one record to another
#Scanning all the records in a sequential order
 
Later systems added [[B-tree]]s to provide alternate access paths. Many CODASYL databases also added a declarative query language for end users (as distinct from the navigational [[API]]). However, CODASYL databases were complex and required significant training and effort to produce useful applications.
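The following schematic sketch (plain Python rather than CODASYL data-manipulation syntax, with invented record names) mimics the three access methods: a hashed dictionary lookup standing in for a CALC key, pointer-chasing along a ''set'', and a sequential scan of all records.

<syntaxhighlight lang="python">
# Schematic, not CODASYL: each record may carry an explicit pointer
# ("next_in_set") to the next member of its owner's set.
records = {
    "CUST-1": {"type": "customer", "name": "Ada", "first_order": "ORD-7"},
    "ORD-7":  {"type": "order", "item": "disk pack", "next_in_set": "ORD-9"},
    "ORD-9":  {"type": "order", "item": "tape reel", "next_in_set": None},
}

# 1. Use of a primary (CALC) key: a direct, hash-based lookup.
customer = records["CUST-1"]

# 2. Navigating relationships (sets) by following pointers record to record.
order_key = customer["first_order"]
while order_key is not None:
    order = records[order_key]
    print("order:", order["item"])
    order_key = order["next_in_set"]

# 3. Scanning all the records in a sequential order.
for key, record in records.items():
    print(key, record["type"])
</syntaxhighlight>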
 
[[IBM]] also had its own DBMS in 1966, known as [[Information Management System]] (IMS). IMS was a development of software written for the [[Apollo program]] on the [[System/360]]. IMS was generally similar in concept to CODASYL, but used a strict hierarchy for its model of data navigation instead of CODASYL's network model. Both concepts later became known as navigational databases due to the way data was accessed: the term was popularized by Bachman's 1973 [[Turing Award]] presentation ''The Programmer as Navigator''. IMS is classified by IBM as a [[hierarchical database]]. IDMS and [[Cincom Systems]]' [[Cincom Systems#1970s|TOTAL]] databases are classified as network databases. IMS remains in use {{as of | 2014 | lc = on}}.<ref>{{cite web|last=IBM Corporation|title=IBM Information Management System (IMS) 13 Transaction and Database Servers delivers high performance and low total cost of ownership|url=http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS213-381|access-date=Feb 20, 2014|date=October 2013}}</ref>
 
===1970s, relational DBMS===
[[Edgar F. Codd]] worked at IBM in [[San Jose, California]], in an office primarily involved in the development of [[hard disk]] systems.{{r|rdbmsearlyyearsoh20070612}} He was unhappy with the navigational model of the CODASYL approach, notably the lack of a "search" facility. In 1970, he wrote a number of papers that outlined a new approach to database construction that eventually culminated in the groundbreaking ''A Relational Model of Data for Large Shared Data Banks''.{{sfn|Codd|1970}}
 
The paper described a new system for storing and working with large databases. Instead of records being stored in some sort of [[linked list]] of free-form records as in CODASYL, Codd's idea was to organize the data as a number of "[[Table (database)|tables]]", each table being used for a different type of entity. Each table would contain a fixed number of columns containing the attributes of the entity. One or more columns of each table were designated as a [[primary key]] by which the rows of the table could be uniquely identified; cross-references between tables always used these primary keys, rather than disk addresses, and queries would join tables based on these key relationships, using a set of operations based on the mathematical system of [[relational calculus]] (from which the model takes its name). Splitting the data into a set of normalized tables (or ''relations'') aimed to ensure that each "fact" was only stored once, thus simplifying update operations. Virtual tables called ''views'' could present the data in different ways for different users, but views could not be directly updated.
 
Codd used mathematical terms to define the model: relations, tuples, and domains rather than tables, rows, and columns. The terminology that is now familiar came from early implementations. Codd would later criticize the tendency for practical implementations to depart from the mathematical foundations on which the model was based.
 
[[File:Relational key SVG.svg|thumb|In the [[relational model]], records are "linked" using virtual keys not stored in the database but defined as needed between the data contained in the records.]]
The use of primary keys (user-oriented identifiers) to represent cross-table relationships, rather than disk addresses, had two primary motivations. From an engineering perspective, it enabled tables to be relocated and resized without expensive database reorganization. But Codd was more interested in the difference in semantics: the use of explicit identifiers made it easier to define update operations with clean mathematical definitions, and it also enabled query operations to be defined in terms of the established discipline of [[first-order predicate calculus]]; because these operations have clean mathematical properties, it becomes possible to rewrite queries in provably correct ways, which is the basis of query optimization. There is no loss of expressiveness compared with the hierarchic or network models, though the connections between tables are no longer so explicit.
 
In the hierarchic and network models, records were allowed to have a complex internal structure. For example, the salary history of an employee might be represented as a "repeating group" within the employee record. In the relational model, the process of normalization led to such internal structures being replaced by data held in multiple tables, connected only by logical keys.
 
For instance, a common use of a database system is to track information about users, their name, login information, various addresses and phone numbers. In the navigational approach, all of this data would be placed in a single variable-length record. In the relational approach, the data would be ''normalized'' into a user table, an address table and a phone number table (for instance). Records would be created in these optional tables only if the address or phone numbers were actually provided.
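A minimal sketch of such a normalized design is shown below, with the SQL issued through Python's built-in <code>sqlite3</code> module and purely illustrative table and column names; rows in the address and phone tables exist only when those facts are supplied, and they refer back to the user through a foreign key rather than being embedded in a variable-length user record.

<syntaxhighlight lang="python">
import sqlite3

conn = sqlite3.connect(":memory:")          # throw-away example database
conn.execute("PRAGMA foreign_keys = ON")    # have SQLite enforce the references

conn.executescript("""
    CREATE TABLE user (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        login   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES user(user_id),
        street     TEXT NOT NULL,
        city       TEXT NOT NULL
    );
    CREATE TABLE phone (
        phone_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES user(user_id),
        number   TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO user (user_id, name, login) VALUES (1, 'Ada', 'ada')")
# Optional facts become rows in the related tables only when actually provided.
conn.execute("INSERT INTO phone (user_id, number) VALUES (1, '555-0101')")

# A join reassembles the data through the logical key, not a disk address.
rows = conn.execute("""
    SELECT user.name, phone.number
    FROM user JOIN phone ON phone.user_id = user.user_id
""").fetchall()
print(rows)   # [('Ada', '555-0101')]
</syntaxhighlight>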
 
As well as identifying rows/records using logical identifiers rather than disk addresses, Codd changed the way in which applications assembled data from multiple records. Rather than requiring applications to gather data one record at a time by navigating the links, they would use a declarative query language that expressed what data was required, rather than the access path by which it should be found. Finding an efficient access path to the data became the responsibility of the database management system, rather than the application programmer. This process, called query optimization, depended on the fact that queries were expressed in terms of mathematical logic.
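The sketch below, again using SQLite through Python only as a convenient stand-in for a relational DBMS (the <code>employee</code> table and its index are invented for the example), shows this separation of responsibilities: the query states ''what'' rows are wanted, and the system's own <code>EXPLAIN QUERY PLAN</code> facility reports the access path the optimizer chose, here a search of the index rather than a scan.

<syntaxhighlight lang="python">
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL
    );
    CREATE INDEX employee_dept ON employee(dept);
""")
conn.executemany("INSERT INTO employee (name, dept) VALUES (?, ?)",
                 [("Ada", "engineering"), ("Grace", "research")])

# The application states only WHAT data is required...
query = "SELECT name FROM employee WHERE dept = ?"
print(conn.execute(query, ("research",)).fetchall())   # [('Grace',)]

# ...while the DBMS decides HOW to find it (the access path).
for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("research",)):
    print(row)   # reports use of the employee_dept index
</syntaxhighlight>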
 
Codd's paper inspired teams at various universities to research the subject, including one at [[University of California, Berkeley]]{{r|rdbmsearlyyearsoh20070612}} led by [[Eugene Wong]] and [[Michael Stonebraker]], who started [[INGRES]] using funding that had already been allocated for a geographical database project and student programmers to produce code. Beginning in 1973, INGRES delivered its first test products which were generally ready for widespread use in 1979. INGRES was similar to [[IBM System R|System R]] in a number of ways, including the use of a "language" for [[data access]], known as [[QUEL query languages|QUEL]]. Over time, INGRES moved to the emerging SQL standard.
 
IBM itself did one test implementation of the relational model, [[PRTV]], and a production one, [[Business System 12]], both now discontinued. [[Honeywell]] wrote [[Multics Relational Data Store|MRDS]] for [[Multics]], and now there are two new implementations: [[Dataphor|Alphora Dataphor]] and Rel. Most other DBMS implementations usually called ''relational'' are actually SQL DBMSs.
 
In 1970, the University of Michigan began development of the [[MICRO Information Management System]]{{sfn|Hershey|Easthope|1972}} based on [[David L. Childs|D.L. Childs]]' Set-Theoretic Data model.{{sfn|North|2010}}{{sfn|Childs|1968a}}{{sfn|Childs|1968b}} The university in 1974 hosted a debate between Codd and Bachman which Bruce Lindsay of IBM later described as "throwing lightning bolts at each other!".<ref name="rdbmsearlyyearsoh20070612">{{Cite interview |interviewer=Burton Grad |title=RDBMS Plenary 1: Early Years |url=https://archive.computerhistory.org/resources/access/text/2013/05/102702562-05-01-acc.pdf |access-date=2025-05-30 |publisher=Computer History Museum |date=2007-06-12}}</ref> MICRO was used to manage very large data sets by the [[US Department of Labor]], the [[U.S. Environmental Protection Agency]], and researchers from the [[University of Alberta]], the [[University of Michigan]], and [[Wayne State University]]. It ran on IBM mainframe computers using the [[Michigan Terminal System]].<ref name=MICROManual1977>{{cite book |author1=M.A. Kahn |author2=D.L. Rumelhart |author3=B.L. Bronson |date=October 1977 |url=https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B4t_NX-QeWDYZGMwOTRmOTItZTg2Zi00YmJkLTg4MTktN2E4MWU0YmZlMjE3 |title=MICRO Information Management System (Version 5.0) Reference Manual |publisher=Institute of Labor and Industrial Relations (ILIR), University of Michigan and Wayne State University}}</ref> The system remained in production until 1998.
 
===Integrated approach===
{{Main|Database machine}}
 
In the 1970s and 1980s, attempts were made to build database systems with integrated hardware and software. The underlying philosophy was that such integration would provide higher performance at a lower cost. Examples were [[IBM System/38]], the early offering of [[Teradata]], and the [[Britton Lee, Inc.]] database machine.
 
Another approach to hardware support for database management was [[International Computers Limited|ICL]]'s [[Content Addressable File Store|CAFS]] accelerator, a hardware disk controller with programmable search capabilities. In the long term, these efforts were generally unsuccessful because specialized database machines could not keep pace with the rapid development and progress of general-purpose computers. Thus most database systems nowadays are software systems running on general-purpose hardware, using general-purpose computer data storage. However, this idea is still pursued in certain applications by some companies like [[Netezza]] and Oracle ([[Exadata]]).
 
===Late 1970s, SQL DBMS===
IBM formed a team led by Codd that started working on a prototype system, ''[[IBM System R|System R]]'', despite opposition from others at the company.{{r|rdbmsearlyyearsoh20070612}} The first version was ready in 1974/5, and work then started on multi-table systems in which the data could be split so that all of the data for a record (some of which is optional) did not have to be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978 and 1979, by which time a standardized [[query language]] – SQL{{Citation needed|reason=First version of SQL standard was SQL-86 adopted in 1986|date=May 2012}} – had been added. Codd's ideas were establishing themselves as both workable and superior to CODASYL, pushing IBM to develop a true production version of System R, known as ''SQL/DS'', and, later, ''Database 2'' ([[IBM Db2]]).
 
[[Larry Ellison]]'s Oracle Database (or more simply, [[Oracle Database|Oracle]]) started from a different chain, based on IBM's papers on System R. Though Oracle V1 implementations were completed in 1978, it was not until Oracle Version 2 in 1979 that Ellison beat IBM to market.<ref>{{cite web|url=https://www.oracle.com/us/corporate/profit/p27anniv-timeline-151918.pdf |archive-url=https://web.archive.org/web/20110320220813/http://www.oracle.com/us/corporate/profit/p27anniv-timeline-151918.pdf |archive-date=2011-03-20 |url-status=live |title=Oracle 30th Anniversary Timeline |access-date=23 August 2017}}</ref>
 
Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, which is now known as [[PostgreSQL]]. PostgreSQL is often used for global mission-critical applications (the .org and .info ___domain name registries use it as their primary [[data store]], as do many large companies and financial institutions).
 
In Sweden, Codd's paper was also read and [[Mimer SQL]] was developed in the mid-1970s at [[Uppsala University]]. In 1984, this project was consolidated into an independent enterprise.
 
Another data model, the [[entity–relationship model]], emerged in 1976 and gained popularity for [[database design]] as it emphasized a more familiar description than the earlier relational model. Later on, entity–relationship constructs were retrofitted as a [[data modeling]] construct for the relational model, and the difference between the two has become irrelevant.{{Citation needed|date=March 2013}}
 
===1980s, on the desktop===
Besides IBM and various software companies such as [[Sybase]] and [[Informix Corporation]], most large computer hardware vendors by the 1980s had their own database systems such as [[DEC (company)|DEC]]'s [[VAX Rdb/VMS]].<ref name="rdbmslateryears20070612">{{Cite interview |interviewer=Burton Grad |title=RDBMS Plenary Session: The Later Years |url=https://archive.computerhistory.org/resources/access/text/2013/05/102701921-05-01-acc.pdf |access-date=2025-05-30 |publisher=Computer History Museum |date=2007-06-12}}</ref> The decade ushered in the age of [[desktop computing]]. The new computers empowered their users with spreadsheets like [[Lotus 1-2-3]] and database software like [[dBASE]]. The dBASE product was lightweight and easy for any computer user to understand out of the box. [[C. Wayne Ratliff]], the creator of dBASE, stated: "dBASE was different from programs like BASIC, C, FORTRAN, and COBOL in that a lot of the dirty work had already been done. The data manipulation is done by dBASE instead of by the user, so the user can concentrate on what he is doing, rather than having to mess with the dirty details of opening, reading, and closing files, and managing space allocation."<ref>[http://www.foxprohistory.org/interview_wayne_ratliff.htm Interview with Wayne Ratliff]. The FoxPro History. Retrieved on 2013-07-12.</ref> dBASE was one of the top selling software titles in the 1980s and early 1990s.
 
===1990s, object-oriented===
By the start of the decade, databases had become a billion-dollar industry in about ten years.{{r|rdbmslateryears20070612}} The 1990s, along with a rise in [[object-oriented programming]], saw a growth in how data in various databases were handled. Programmers and designers began to treat the data in their databases as [[object (computer science)|objects]]. That is to say that if a person's data were in a database, that person's attributes, such as their address, phone number, and age, were now considered to belong to that person instead of being extraneous data. This allows relationships between data to be expressed in terms of objects and their [[property (programming)|attributes]] rather than individual fields.<ref>Development of an object-oriented DBMS; Portland, Oregon, United States; Pages: 472–482; 1986; {{ISBN|0-89791-204-7}}</ref> The term "[[object–relational impedance mismatch]]" described the inconvenience of translating between programmed objects and database tables. [[Object database]]s and [[object–relational database]]s attempt to solve this problem by providing an object-oriented language (sometimes as extensions to SQL) that programmers can use as an alternative to purely relational SQL. On the programming side, libraries known as [[object–relational mapping]]s (ORMs) attempt to solve the same problem.
 
===2000s, NoSQL and NewSQL===
{{Main|NoSQL|NewSQL}}
 
Database sales grew rapidly during the [[dotcom bubble]] and, after its end, the rise of [[ecommerce]]. The popularity of [[open source]] databases such as [[MySQL]] has grown since 2000, to the extent that Ken Jacobs of Oracle said in 2005 that perhaps "these guys are doing to us what we did to IBM".{{r|rdbmslateryears20070612}}
 
[[XML database]]s are a type of structured document-oriented database that allows querying based on [[XML]] document attributes. XML databases are mostly used in applications where the data is conveniently viewed as a collection of documents, with a structure that can vary from the very flexible to the highly rigid: examples include scientific articles, patents, tax filings, and personnel records.
 
[[NoSQL]] databases are often very fast,<ref>{{Cite web |last=Jordan |first=Meghan |title=NoSQL Latency |url=https://www.scylladb.com/glossary/nosql-latency/ |access-date=2025-06-09 |website=ScyllaDB |language=en-US}}</ref><ref>{{Cite web |title=SQL vs. NoSQL: Full comparison of features, differences, and more |url=https://www.testgorilla.com/blog/sql-vs-nosql/ |access-date=2025-06-09 |website=www.testgorilla.com |language=en}}</ref> do not require fixed table schemas, avoid join operations by storing [[denormalization|denormalized]] data, and are designed to [[horizontal scaling|scale horizontally]].
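As a schematic contrast with the normalized tables shown earlier, the toy key–value store below (plain Python and the standard <code>json</code> module; no particular NoSQL product is implied) keeps each user as a single denormalized document, so a read needs no join and each document may have its own shape.

<syntaxhighlight lang="python">
import json

store = {}   # a toy key-value store: keys map to schema-free JSON documents

def put(key, document):
    store[key] = json.dumps(document)

def get(key):
    return json.loads(store[key])

# Related data is embedded in the document rather than split across tables,
# and different documents need not share the same fields.
put("user:1", {"name": "Ada", "phones": ["555-0101"],
               "addresses": [{"city": "London"}]})
put("user:2", {"name": "Grace", "nickname": "Amazing"})

print(get("user:1")["phones"])     # no join required
print(get("user:2")["nickname"])
</syntaxhighlight>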
 
In recent years, there has been a strong demand for massively distributed databases with high partition tolerance, but according to the [[CAP theorem]], it is impossible for a [[distributed system]] to simultaneously provide [[consistency model|consistency]], availability, and partition tolerance guarantees. A distributed system can satisfy any two of these guarantees at the same time, but not all three. For that reason, many NoSQL databases are using what is called [[eventual consistency]] to provide both availability and partition tolerance guarantees with a reduced level of data consistency.
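The deliberately simplified sketch below (invented replica logic, not the protocol of any particular database) illustrates the idea of eventual consistency: two replicas accept writes independently and later reconcile by keeping the most recently timestamped value for each key, so a read may briefly return stale data but the replicas agree once they have synchronized.

<syntaxhighlight lang="python">
import itertools

clock = itertools.count(1)   # stand-in for a timestamp source

class Replica:
    """A toy replica storing key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = (next(clock), value)   # accept the write locally

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge(self, other):
        # Anti-entropy: for each key, both replicas keep the newest write
        # (a last-writer-wins rule).
        for key in set(self.data) | set(other.data):
            newest = max(self.data.get(key, (0, None)),
                         other.data.get(key, (0, None)))
            self.data[key] = other.data[key] = newest

a, b = Replica(), Replica()
a.write("x", "written at replica A")
print(b.read("x"))   # None: replica B has not yet seen the write (stale read)

a.merge(b)           # the replicas synchronize
print(b.read("x"))   # 'written at replica A': the replicas have converged
</syntaxhighlight>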
 
[[NewSQL]] is a class of modern relational databases that aims to provide the same scalable performance of NoSQL systems for online transaction processing (read-write) workloads while still using SQL and maintaining the [[ACID]] guarantees of a traditional database system.
 
==Use cases==
{{unreferenced section|date=March 2013}}
 
Databases are used to support internal operations of organizations and to underpin online interactions with customers and suppliers (see [[Enterprise software]]).
 
Databases are used to hold administrative information and more specialized data, such as engineering data or economic models. Examples include computerized [[library]] systems, [[flight reservation system]]s, computerized [[parts inventory system]]s, and many [[content management system]]s that store [[website]]s as collections of webpages in a database.
 
==Classification==
One way to classify databases involves the type of their contents, for example: [[bibliographic database|bibliographic]], document-text, statistical, or multimedia objects. Another way is by their application area, for example: accounting, music compositions, movies, banking, manufacturing, or insurance. A third way is by some technical aspect, such as the database structure or interface type. This section lists a few of the adjectives used to characterize different kinds of databases.
 
* An [[in-memory database]] is a database that primarily resides in [[main memory]], but is typically backed-up by non-volatile computer data storage. Main memory databases are faster than disk databases, and so are often used where response time is critical, such as in telecommunications network equipment.
* An [[active database]] includes an event-driven architecture which can respond to conditions both inside and outside the database. Possible uses include security monitoring, alerting, statistics gathering and authorization. Many databases provide active database features in the form of [[database trigger]]s.
* A [[cloud database]] relies on [[cloud technology]]. Both the database and most of its DBMS reside remotely, "in the cloud", while its applications are both developed by programmers and later maintained and used by end-users through a [[web browser]] and [[Open API]]s.
* [[Data warehouse]]s{{citation needed|date=December 2022|reason=Data warehouses usually aren't classified as a type of database.}} archive data from operational databases and often from external sources such as market research firms. The warehouse becomes the central source of data for use by managers and other end-users who may not have access to operational data. For example, sales data might be aggregated to weekly totals and converted from internal product codes to use [[Universal Product Code|UPCs]] so that they can be compared with [[ACNielsen]] data. Some basic and essential components of data warehousing include extracting, analyzing, and [[Data mining|mining]] data, transforming, loading, and managing data so as to make them available for further use.
* A [[deductive database]] combines [[logic programming]] with a relational database.
* A [[distributed database]] is one in which both the data and the DBMS span multiple computers.
* A [[document-oriented database]] is designed for storing, retrieving, and managing document-oriented, or semi-structured, information. Document-oriented databases are one of the main categories of NoSQL databases.
* An [[embedded database]] system is a DBMS which is tightly integrated with an application software that requires access to stored data in such a way that the DBMS is hidden from the application's end-users and requires little or no ongoing maintenance.<ref>Graves, Steve. [http://www.embedded-computing.com/articles/id/?2020 "COTS Databases For Embedded Systems"] {{Webarchive|url=https://web.archive.org/web/20071114050734/http://www.embedded-computing.com/articles/id/?2020 |date=2007-11-14 }}, ''Embedded Computing Design'' magazine, January 2007. Retrieved on August 13, 2008.</ref>
*End-user databases consist of data developed by individual end-users. Examples of these are collections of documents, spreadsheets, presentations, multimedia, and other files. Several products{{which|date=December 2022}} exist to support such databases.
* A [[federated database system]] comprises several distinct databases, each with its own DBMS. It is handled as a single database by a federated database management system (FDBMS), which transparently integrates multiple autonomous DBMSs, possibly of different types (in which case it would also be a [[heterogeneous database system]]), and provides them with an integrated conceptual view.
* Sometimes the term ''multi-database'' is used as a synonym for federated database, though it may refer to a less integrated (e.g., without an FDBMS and a managed integrated schema) group of databases that cooperate in a single application. In this case, typically [[Middleware (distributed applications)|middleware]] is used for distribution, which typically includes an atomic commit protocol (ACP), e.g., the [[two-phase commit protocol]], to allow [[Distributed transaction|distributed (global) transactions]] across the participating databases.
* A [[graph database]] is a kind of NoSQL database that uses [[Graph (data structure)|graph structures]] with nodes, edges, and properties to represent and store information. General graph databases that can store any graph are distinct from specialized graph databases such as [[triplestore]]s and [[network database model|network databases]].
* An [[array DBMS]] is a kind of NoSQL DBMS that allows modeling, storage, and retrieval of (usually large) multi-dimensional [[Array data structure|arrays]] such as satellite images and climate simulation output.
* In a [[hypertext]] or [[hypermedia]] database, any word or a piece of text representing an object, e.g., another piece of text, an article, a picture, or a film, can be [[hyperlink]]ed to that object. Hypertext databases are particularly useful for organizing large amounts of disparate information. For example, they are useful for organizing [[online encyclopedia]]s, where users can conveniently jump around the text. The [[World Wide Web]] is thus a large distributed hypertext database.
* A [[knowledge base]] (abbreviated '''KB''', '''kb''' or Δ<ref>Argumentation in Artificial Intelligence by Iyad Rahwan, Guillermo R. Simari</ref><ref>{{cite web
| title = OWL DL Semantics
| url = http://www.obitko.com/tutorials/ontologies-semantic-web/owl-dl-semantics.html
| access-date = 10 December 2010}}</ref>) is a special kind of database for [[knowledge management]], providing the means for the computerized collection, organization, and [[Information retrieval|retrieval]] of [[knowledge]]. It is also a collection of data representing problems with their solutions and related experiences.
 
* A [[mobile database]] can be carried on or synchronized from a mobile computing device.
* [[Operational database]]s store detailed data about the operations of an organization. They typically process relatively high volumes of updates using [[transaction (database)|transactions]]. Examples include [[Customer relationship management|customer databases]] that record contact, credit, and demographic information about a business's customers, personnel databases that hold information such as salary, benefits, skills data about employees, [[enterprise resource planning]] systems that record details about product components, parts inventory, and financial databases that keep track of the organization's money, accounting and financial dealings.
* A [[parallel database]] seeks to improve performance through [[parallelization]] for tasks such as loading data, building indexes and evaluating queries. Parallel databases improve processing and [[input/output]] speeds by using multiple [[central processing unit]]s (CPUs) (including [[multi-core processor]]s) and [[Data storage|storage]] in parallel. The major parallel DBMS architectures (which are induced by the underlying [[Computer hardware|hardware]] architecture) are:
** '''[[Shared memory#In hardware|Shared memory architecture]]''', where multiple processors share the main memory space, as well as other data storage.
** '''Shared disk architecture''', where each processing unit (typically consisting of multiple processors) has its own main memory, but all units share the other storage.
** '''[[Shared-nothing architecture]]''', where each processing unit has its own main memory and other storage.
 
* [[Probabilistic database]]s employ [[fuzzy logic]] to draw inferences from imprecise data.
* [[Real-time database]]s process transactions fast enough for the result to come back and be acted on right away.
* A [[spatial database]] can store the data with multidimensional features. The queries on such data include ___location-based queries, like "Where is the closest hotel in my area?".
* A [[temporal database]] has built-in time aspects, for example a temporal data model and a temporal version of [[SQL]]. More specifically the temporal aspects usually include valid-time and transaction-time.
* A [[terminology-oriented database]] builds upon an [[object-oriented database]], often customized for a specific field.
* An [[unstructured data]] database is intended to store in a manageable and protected way diverse objects that do not fit naturally and conveniently in common databases. It may include email messages, documents, journals, multimedia objects, etc. The name may be misleading since some objects can be highly structured. However, the entire possible object collection does not fit into a predefined structured framework. Most established DBMSs now support unstructured data in various ways, and new dedicated DBMSs are emerging.<!-- Isn't this a document-oriented database? If not, clearly distinguish. -->
 
==Database management system==
 
Connolly and Begg define database management system (DBMS) as a "software system that enables users to define, create, maintain and control access to the database."{{sfn|Connolly|Begg|2014|p=64}} Examples of DBMS's include [[MySQL]], [[MariaDB]], [[PostgreSQL]], [[Microsoft SQL Server]], [[Oracle Database]], and [[Microsoft Access]].
 
The DBMS acronym is sometimes extended to indicate the underlying [[database model]], with RDBMS for the [[Relational model|relational]], OODBMS for the [[Object model|object (oriented)]] and ORDBMS for the [[object–relational model]]. Other extensions can indicate some other characteristics, such as DDBMS for distributed database management systems.
 
The major purpose of a database is to provide the information system (in its broadest sense) that uses it with the information the system needs according to its own requirements. This broad goal translates into requirements for the respective DBMS, which allow the designers and builders of a specific database to concentrate on the application's aspects rather than on building and maintaining the underlying DBMS. Since a DBMS is complex and expensive to build and maintain, it is not economical to build a new DBMS for every application; rather, a flexible, general-purpose DBMS is desired that can handle databases for as many applications as possible.

The functionality provided by a DBMS can vary enormously. The core functionality is the storage, retrieval and update of data. [[Edgar F. Codd|Codd]] proposed the following functions and services a fully-fledged general purpose DBMS should provide:{{sfn|Connolly|Begg|2014|pp=97–102}}
 
* Data storage, retrieval and update
* User accessible catalog or [[data dictionary]] describing the metadata
* Support for transactions and concurrency
* Facilities for recovering the database should it become damaged
* Support for authorization of access and update of data
* Access support from remote locations
* Enforcing constraints to ensure data in the database abides by certain rules
 
It is also generally to be expected that the DBMS will provide a set of utilities for such purposes as may be necessary to administer the database effectively, including import, export, monitoring, defragmentation and analysis utilities.{{sfn|Connolly|Begg|2014|p=102}} The core part of the DBMS interacting between the database and the application interface is sometimes referred to as the [[database engine]].
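
A minimal sketch of how a few of the services listed above surface to users of an SQL DBMS; the table, account numbers and role name are invented for illustration, and the exact transaction and catalog syntax differs somewhat between products:

<syntaxhighlight lang="sql">
-- Support for transactions: both updates below succeed or neither does
START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE account_id = 1;
UPDATE account SET balance = balance + 100 WHERE account_id = 2;
COMMIT;

-- User-accessible catalog: many products expose metadata through INFORMATION_SCHEMA views
SELECT table_name FROM information_schema.tables;

-- Authorization of access and update of data
GRANT SELECT ON account TO auditor;
</syntaxhighlight>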
 
Often DBMSs will have configuration parameters that can be statically and dynamically tuned, for example the maximum amount of main memory on a server the database can use. The trend is to minimize the amount of manual configuration, and for cases such as [[embedded database]]s the need to target zero-administration is paramount.
 
The major enterprise DBMSs have tended to increase in size and functionality and have involved up to thousands of human years of development effort throughout their lifetimes.{{efn|This article quotes a development time of five years involving 750 people for DB2 release 9 alone.{{sfn|Chong|Wang|Dang|Snow|2007}}}}
 
Early multi-user DBMSs typically only allowed the application to reside on the same computer, with access via [[Computer terminal|terminals]] or terminal emulation software. The [[client–server architecture]] was a development where the application resided on a client desktop and the database on a server, allowing the processing to be distributed. This evolved into a [[multitier architecture]] incorporating [[application server]]s and [[web server]]s, with the end user interface via a [[web browser]] and the database only directly connected to the adjacent tier.{{sfn|Connolly|Begg|2014|pp=106–113}}
 
A general-purpose DBMS will provide public [[application programming interface]]s (API) and optionally a processor for [[database language]]s such as [[SQL]] to allow applications to be written to interact with and manipulate the database. A special purpose DBMS may use a private API and be specifically customized and linked to a single application. For example, an [[email]] system performs many of the functions of a general-purpose DBMS, such as message insertion, message deletion, attachment handling, blocklist lookup, associating messages with an email address, and so forth; however, these functions are limited to what is required to handle email.
 
==Application==
{{main|Database application}}
External interaction with the database will be via an application program that interfaces with the DBMS.{{sfn|Connolly|Begg|2014|p=65}} This can range from a [[Comparison of database tools|database tool]] that allows users to execute SQL queries textually or graphically, to a website that happens to use a database to store and search information.
 
===Application program interface===
 
A [[programmer]] will [[Computer programming|code]] interactions to the database (sometimes referred to as a [[datasource]]) via an [[application program interface]] (API) or via a [[#database language|database language]]. The particular API or language chosen will need to be supported by the DBMS, possibly indirectly via a [[preprocessor]] or a bridging API. Some APIs aim to be database independent, [[ODBC]] being a commonly known example. Other common APIs include [[JDBC]] and [[ADO.NET]].
 
==Database languages==
 
Database languages are special-purpose languages, which allow one or more of the following tasks, sometimes distinguished as [[sublanguage]]s (a combined example follows the list):
 
* [[Data control language]] (DCL) – controls access to data;
* [[Data definition language]] (DDL) – defines data types such as creating, altering, or dropping tables and the relationships among them;
* [[Data manipulation language]] (DML) – performs tasks such as inserting, updating, or deleting data occurrences;
* [[Data query language]] (DQL) – allows searching for information and computing derived information.
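
The sketch below (which assumes an invented <code>employee</code> table; exact syntax varies between SQL dialects) shows one statement from each sublanguage:

<syntaxhighlight lang="sql">
-- DDL: define a table and its column types
CREATE TABLE employee (
    id     INTEGER PRIMARY KEY,
    name   VARCHAR(100) NOT NULL,
    salary DECIMAL(10,2)
);

-- DML: insert and update data occurrences
INSERT INTO employee (id, name, salary) VALUES (1, 'A. Smith', 52000.00);
UPDATE employee SET salary = 55000.00 WHERE id = 1;

-- DQL: search for information and compute derived information
SELECT name, salary FROM employee WHERE salary > 50000;

-- DCL: control access to the data
GRANT SELECT ON employee TO payroll_clerk;
</syntaxhighlight>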
 
Database languages are specific to a particular data model. Notable examples include:
 
* SQL combines the roles of data definition, data manipulation, and query in a single language. It was one of the first commercial languages for the relational model, although it departs in some respects from [[Codd's 12 rules|the relational model as described by Codd]] (for example, the rows and columns of a table can be ordered). SQL became a standard of the [[American National Standards Institute]] (ANSI) in 1986, and of the [[International Organization for Standardization]] (ISO) in 1987. The standards have been regularly enhanced since and are supported (with varying degrees of conformance) by all mainstream commercial relational DBMSs.{{sfn|Chapple|2005}}<ref name="IBM-sql">{{cite web | title = Structured Query Language (SQL) | publisher = International Business Machines | url = http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=com.ibm.db2.udb.admin.doc/doc/c0004100.htm | date = October 27, 2006 | access-date = 2007-06-10 }}</ref>
* [[OQL]] is an object model language standard (from the [[Object Data Management Group]]). It has influenced the design of some of the newer query languages like [[JDOQL]] and [[EJB QL]].
* [[XQuery]] is a standard XML query language implemented by XML database systems such as [[MarkLogic]] and [[eXist]], by relational databases with XML capability such as Oracle and Db2, and also by in-memory XML processors such as [[Saxon XSLT|Saxon]].
* [[SQL/XML]] combines [[XQuery]] with SQL.{{sfn|Wagner|2010}}
 
A database language may also incorporate features like the following (see the sketch after this list):
 
* DBMS-specific configuration and storage engine management
* Computations to modify query results, like counting, summing, averaging, sorting, grouping, and cross-referencing
* Constraint enforcement (e.g. in an automotive database, only allowing one engine type per car)
* Application programming interface version of the query language, for programmer convenience
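
As a sketch of two of the features above, the hypothetical automotive schema below (table and column names are invented, and CHECK constraint support varies between DBMSs) enforces a one-engine-per-car rule and computes grouped counts:

<syntaxhighlight lang="sql">
-- Constraint enforcement: each car may have at most one engine record
CREATE TABLE car (
    car_id INTEGER PRIMARY KEY,
    model  VARCHAR(50) NOT NULL
);
CREATE TABLE engine (
    engine_id   INTEGER PRIMARY KEY,
    car_id      INTEGER NOT NULL UNIQUE REFERENCES car (car_id),
    engine_type VARCHAR(20) CHECK (engine_type IN ('petrol', 'diesel', 'electric'))
);

-- Computations on query results: counting, grouping and sorting
SELECT engine_type, COUNT(*) AS engines_fitted
FROM engine
GROUP BY engine_type
ORDER BY engines_fitted DESC;
</syntaxhighlight>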
 
==Storage==
{{Main|Computer data storage|Database engine}}
 
Database storage is the container of the physical materialization of a database. It comprises the ''internal'' (physical) ''level'' in the database architecture. It also contains all the information needed (e.g., [[metadata]], "data about the data", and internal [[data structure]]s) to reconstruct the ''conceptual level'' and ''external level'' from the internal level when needed. Databases as digital objects contain three layers of information which must be stored: the data, the structure, and the semantics. Proper storage of all three layers is needed for future [[Database preservation|preservation]] and longevity of the database.<ref>{{cite journal |author1=Ramalho, J.C. |author2=Faria, L. |author3=Helder, S. |author4=Coutada, M. |date=31 December 2013 |title=Database Preservation Toolkit: A flexible tool to normalize and give access to databases |journal=Biblioteca Nacional de Portugal |publisher=University of Minho |url=https://core.ac.uk/display/55635702?algorithmId=15&similarToDoc=55614406&similarToDocKey=CORE&recSetID=f3ffea4d-1504-45e9-bfd6-a0495f5c8f9c&position=2&recommendation_type=same_repo&otherRecs=55614407,55635702,55607961,55613627,2255664 }}</ref> Putting data into permanent storage is generally the responsibility of the [[database engine]] a.k.a. "storage engine". Though typically accessed by a DBMS through the underlying operating system (and often using the operating systems' [[file system]]s as intermediates for storage layout), storage properties and configuration settings are extremely important for the efficient operation of the DBMS, and thus are closely maintained by database administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in the storage in structures that look completely different from the way the data look at the conceptual and external levels, but in ways that attempt to optimize (the best possible) these levels' reconstruction when needed by users and programs, as well as for computing additional types of needed information from the data (e.g., when querying the database).
 
Some DBMSs support specifying which [[character encoding]] was used to store data, so multiple encodings can be used in the same database.
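
For example, in MySQL (other products use different options and defaults) the encoding can be chosen when a database is created; the schema name below is invented:

<syntaxhighlight lang="sql">
-- MySQL: store text columns of this schema as 4-byte UTF-8 by default
CREATE DATABASE example_app
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;
</syntaxhighlight>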
 
Various low-level database storage structures are used by the storage engine to serialize the data model so it can be written to the medium of choice. Techniques such as indexing may be used to improve performance. Conventional storage is row-oriented, but there are also [[column-oriented DBMS|column-oriented]] and [[correlation database]]s.
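
For instance, in most SQL DBMSs a secondary index over an invented <code>orders</code> table is declared with a single statement, and the storage engine then maintains the extra structure automatically:

<syntaxhighlight lang="sql">
-- Speed up lookups by customer so they need not scan the whole table
CREATE INDEX idx_orders_customer ON orders (customer_id);
</syntaxhighlight>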
 
===Materialized views===
{{Main|Materialized view}}
 
Often storage redundancy is employed to increase performance. A common example is storing ''materialized views'', which consist of frequently needed ''external views'' or query results. Storing such views saves the expense of computing them each time they are needed. The downsides of materialized views are the overhead incurred when updating them to keep them synchronized with their original updated database data, and the cost of storage redundancy.
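
In DBMSs that support them directly (PostgreSQL and Oracle, for example), a materialized view over an invented sales table might be defined and later resynchronized roughly as follows; the refresh mechanism differs by product:

<syntaxhighlight lang="sql">
-- Precompute an expensive aggregation once, instead of on every query
CREATE MATERIALIZED VIEW sales_totals AS
SELECT product_id, SUM(amount) AS total_amount
FROM sale
GROUP BY product_id;

-- Later, bring the stored result back in line with the base table (PostgreSQL syntax)
REFRESH MATERIALIZED VIEW sales_totals;
</syntaxhighlight>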
 
===Replication===
{{See also|Replication (computing)#Database replication}}
 
Occasionally a database employs storage redundancy by [[database object]]s replication (with one or more copies) to increase data availability (both to improve performance of simultaneous multiple end-user accesses to the same database object, and to provide resiliency in a case of partial failure of a distributed database). Updates of a replicated object need to be synchronized across the object copies. In many cases, the entire database is replicated.
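
Replication mechanisms are product-specific. As one hedged sketch, PostgreSQL's built-in logical replication copies changes to selected tables from one server to another; the table name and connection string below are placeholders:

<syntaxhighlight lang="sql">
-- On the primary server: publish changes to the orders table
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the replica: subscribe to the publication
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=primary.example dbname=shop user=replicator'
  PUBLICATION orders_pub;
</syntaxhighlight>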
 
===Virtualization===
With [[data virtualization]], the data used remains in its original locations and real-time access is established to allow analytics across multiple sources. This can aid in resolving some technical difficulties such as compatibility problems when combining data from various platforms, lowering the risk of error caused by faulty data, and guaranteeing that the newest data is used. Furthermore, avoiding the creation of a new database containing personal information can make it easier to comply with privacy regulations. However, with data virtualization, the connection to all necessary data sources must be operational as there is no local copy of the data, which is one of the main drawbacks of the approach.<ref name="Paiho">{{cite journal | doi=10.1049/smc2.12044 | title=Opportunities of collected city data for smart cities | year=2022 | last1=Paiho | first1=Satu | last2=Tuominen | first2=Pekka | last3=Rökman | first3=Jyri | last4=Ylikerälä | first4=Markus | last5=Pajula | first5=Juha | last6=Siikavirta | first6=Hanne | journal=IET Smart Cities | volume=4 | issue=4 | pages=275–291 | s2cid=253467923 | doi-access=free | issn=2631-7680}}</ref>
 
==Major database functional areas==

===Data models===
{{Main|Data model|Database model}}
 
A data model is an abstract structure that provides the means to effectively describe the specific data structures needed to model an application. As such, a data model needs sufficient expressive power to capture the needed aspects of applications. These applications are often typical of commercial companies and other organizations (like manufacturing, human resources, stock, banking, etc.). For effective utilization and handling, it is desirable that a data model be relatively simple and intuitive. This may conflict with the high expressive power needed to deal with certain complex applications. Thus any popular general-purpose data model usually strikes a balance between being intuitive and relatively simple, and having the high expressive power needed for complex applications. The application's semantics is usually not explicitly expressed in the model, but rather is implicit (and detailed by documentation external to the model) and hinted at by data item types' names (e.g., "part-number") and their connections (as expressed by the generic data structure types provided by each specific model).
 
====Early data models====
These models were popular in the 1960s and 1970s, but nowadays can be found primarily in old [[legacy system]]s. They are characterized primarily by being [[Navigational database|navigational]], with strong connections between their logical and physical representations, and deficiencies in [[data independence]].
 
=====Hierarchical model=====
{{Main|Hierarchical database model}}
 
In the Hierarchical model different record types (representing real-world entities) are embedded in a predefined hierarchical ([[Tree (data structure)|tree]]-like) structure. This hierarchy is used as the physical order of records in storage. Record access is done by navigating through the data structure using [[pointer (computer programming)|pointer]]s combined with sequential accessing.
 
This model has been supported primarily by the IBM [[Information Management System|IMS]] DBMS, one of the earliest DBMSs. Various limitations of the model have been compensated for in later IMS versions by additional logical hierarchies imposed on the base physical hierarchy.
 
=====Network model=====
{{Main|Network model (database)}}
 
In this model a hierarchical relationship between two record types (representing real-world entities) is established via the ''set'' construct. A set consists of circular [[linked list]]s where one record type, the set owner or parent, appears once in each circle, and a second record type, the subordinate or child, may appear multiple times in each circle. In this way a hierarchy may be established between any two record types, e.g., type A is the owner of B. At the same time another set may be defined where B is the owner of A. Thus all the sets comprise a general [[directed graph]] (ownership defines a direction), or ''network'' construct. Access to records is either sequential (usually in each record type) or by navigation in the circular linked lists.
 
This model is more general and powerful than the hierarchical model, and was the most popular before being replaced by the relational model. It has been [[standardization|standardized]] by [[CODASYL]]. Popular DBMS products that utilized it were [[Cincom Systems]]' Total and [[Cullinet]]'s [[IDMS]].
 
=====Inverted file model=====
{{Main|Inverted index}}
 
An ''inverted file'' or ''[[inverted index]]'' of a first file, by a field in this file (the inversion field), is a second file in which this field is the key. A record in the second file includes a key and pointers to records in the first file where the inversion field has the value of the key. This is also the logical structure of contemporary [[Index (database)|database indexes]]. The related ''Inverted file data model'' utilizes inverted files of primary database files to efficiently directly access needed records in these files.
 
Notable for using this data model is the [[ADABAS]] DBMS of [[Software AG]], introduced in 1970. ADABAS gained a considerable customer base and is still supported today. In the 1980s it adopted the relational model and SQL in addition to its original tools and languages.
 
====Relational model====
{{Main|Relational model}}
{{Expand section|date=June 2011}}
 
====Entity-relationship model====
{{Main|Entity-relationship model}}
{{Expand section|date=June 2011}}
 
====Object model====
{{Main|Object model|Object database}}
{{Expand section|date=June 2011}}
In {{as of | 2009 | alt = recent years}}, the [[object-oriented]] paradigm has been applied in areas such as engineering and spatial databases, telecommunications and in various scientific domains. The conglomeration of object oriented programming and database technology led to this new kind of database. These databases attempt to bring the database world and the application-programming world closer together, in particular by ensuring that the database uses the same [[type system]] as the application program. This aims to avoid the overhead (sometimes referred to as the ''[[Object-Relational impedance mismatch|impedance mismatch]]'') of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce key ideas of object programming, such as [[encapsulation (computer science)|encapsulation]] and [[polymorphism (computer science)|polymorphism]], into the world of databases.
 
A variety of ways have been tried{{By whom|date=October 2009}} for storing objects in a database. Some products have approached the problem from the application-programming side, by making the objects manipulated by the program [[Persistence (computer science)|persistent]]. This typically also requires the addition of some kind of query language, since conventional programming languages do not provide language-level functionality for finding objects based on their information content. Others{{Which?|date=November 2009}} have attacked the problem from the database end, by defining an object-oriented data model for the database and defining a database programming language that allows full programming capabilities as well as traditional query facilities.
 
====Object relational model====
{{Main|Object-relational database}}
{{Expand section|date=June 2011}}
 
====XML as a database data model====
{{Main|XML database|XML|Databaseless_design}}
{{Expand section|date=June 2011}}
 
====Other database models====
{{Expand section|date=June 2011}}
Products offering a more general data model than the relational model are sometimes classified as [[post-relational]].<ref name="CONR">''Introducing databases'' by Stephen Chu, in Conrick, M. (2006) ''Health informatics: transforming healthcare with technology'', Thomson, ISBN 0-17-012731-1, p. 69.</ref> Alternate terms include "hybrid database", "Object-enhanced RDBMS" and others. The data model in such products incorporates [[relation (database)|relations]] but is not constrained by [[E.F. Codd]]'s Information Principle, which requires that{{quote| all information in the database must be cast explicitly in terms of values in relations and in no other way<ref>{{cite journal
|journal=Intelligent Enterprise
|url=http://intelligent-enterprise.informationweek.com/db_area/archives/1999/990106/online1.jhtml;jsessionid=Y2UNK1QFKXMBTQE1GHRSKH4ATMY32JVN
|date=June 1, 1999
|volume=2 |issue=8
|title=When's an extension not an extension?
|last=Date |first=C. J. |authorlink=Christopher J. Date}}</ref>}}
 
Some of these extensions to the relational model integrate concepts from technologies that pre-date the relational model. For example, they allow representation of a directed graph with [[tree data structure|trees]] on the nodes. The German company ''sones'' implements this concept in its [[GraphDB]].
 
Some post-relational products extend relational systems with non-relational features. Others arrived in much the same place by adding relational features to pre-relational systems. Paradoxically, this allows products that are historically pre-relational, such as [[Pick operating system|PICK]] and [[MUMPS]], to make a plausible claim to be post-relational.
 
The resource space model (RSM) is a non-relational data model based on multi-dimensional classification.<ref>{{cite book |last= Zhuge |first=H. |title=The Web Resource Space Model |publisher=Springer |year=2008 |isbn=978-0-387-72771-4 |series=Web Information Systems Engineering and Internet Technologies Book Series |volume=4}}</ref>
 
===Database languages===
{{Main|Data definition language|Data manipulation language|Query language}}
 
Database languages are dedicated programming languages, tailored and utilized to
*define a database (i.e., its specific data types and the relationships among them),
*manipulate its content (e.g., insert new data occurrences, and update or delete existing ones), and
*query it (i.e., request information: compute and retrieve any information based on its data).
Database languages are data-model-specific, i.e., each language assumes and is based on a certain structure of the data (which typically differs among different data models). They typically have commands to instruct execution of the desired operations in the database. Each such command is equivalent to a complex expression (program) in a regular programming language, and thus programming in dedicated (database) languages simplifies the task of handling databases considerably. An expression in a database language is automatically transformed (by a [[compiler]] or an interpreter, as with regular programming languages) into a proper computer program that runs while accessing the database and providing the needed results. The following are notable examples:
 
====SQL for the Relational model====
{{Main|SQL}}
A major Relational model language supported by all the relational DBMSs and a standard.
 
SQL was one of the first commercial languages for the relational model. Despite not adhering to [[Codd's 12 rules|the relational model as described by Codd]], it has become the most widely used database language.<ref name="SQL-Fundamentals">{{cite web
| last = Chapple
| first = Mike
| title = SQL Fundamentals
| work = Databases
| publisher = About.com
| url = http://databases.about.com/od/sql/a/sqlfundamentals.htm
| accessdate = 2009-01-28 }}</ref><ref name="IBM-sql" /> Though often described as a declarative language, and to a great extent it is one, SQL also includes [[Procedural programming|procedural]] elements. SQL became a [[Technical standard|standard]] of the [[American National Standards Institute]] (ANSI) in 1986, and of the [[International Organization for Standardization]] (ISO) in 1987. Since then, the standard has been enhanced several times with added features. However, issues of SQL code portability between major RDBMS products still exist due to lack of full compliance with, or different interpretations of, the standard. Among the reasons mentioned are the large size and incomplete specification of the standard, as well as [[vendor lock-in]].
 
====OQL for the Object model====
{{Main|OQL}}
An [[Object database|Object model]] language standard (by the [[Object Data Management Group]]) that has influenced the design of some of the newer query languages like [[JDOQL]] and [[EJB QL]], though they cannot be considered as different flavors of OQL.
 
====XQuery for the XML model====
{{Main|XQuery|XML}}
XQuery is an [[XML]]-based database query language.
 
===Database architecture===
 
Database architecture (to be distinguished from DBMS architecture; [[Database#DBMS architecture: major DBMS components|see below]]) may be viewed, to some extent, as an extension of [[Data modeling]]. It is used to conveniently answer the requirements of different end-users from the same database, as well as for other benefits. For example, a financial department of a company needs the payment details of all employees as part of the company's expenses, but not many other details about employees that are of interest to the [[human resources]] department. Thus different departments need different ''views'' of the company's database, each of which includes the employees' payments, possibly at a different level of detail (and presented in different visual forms). To meet such requirements effectively, database architecture consists of three levels: ''external'', ''conceptual'' and ''internal''. Clearly separating the three levels was a major feature of the relational database model implementations that dominate 21st century databases.<ref name=date31>{{harvnb|Date|1990|pages=31–32}}</ref>
 
*The '''external level''' defines how each end-user type understands the organization of its respective relevant data in the database, i.e., the different needed end-user views. A single database can have any number of views at the external level.
*The '''conceptual level''' unifies the various external views into a coherent whole, global view.<ref name=date31/> It provides the common denominator of all the external views. It comprises all the generic data needed by end-users, i.e., all the data from which any view may be derived/computed. It is provided in the simplest possible form of such generic data and comprises the backbone of the database. It is outside the scope of the various database end-users, and serves database application developers; it is defined by database administrators who build the database.
*The '''internal level''' (or ''physical level'') is, in fact, part of the database implementation inside a DBMS (see the Implementation section below). It is concerned with cost, performance, scalability and other operational matters. It deals with the storage layout of the conceptual level, provides supporting storage structures like [[Index (database)|indexes]] to enhance performance, and occasionally stores data of individual views ([[materialized view]]s), computed from generic data, if performance justification exists for such redundancy. It balances all the external views' performance requirements, possibly conflicting, in an attempt to optimize the overall database usage by all its end-users according to the database goals and priorities.
 
All the three levels are maintained and updated according to changing needs by database administrators who often also participate in the database design.
 
The above three-level database architecture also relates to, and is motivated by, the concept of ''[[data independence]]'', which has long been described as a desired database property and was one of the major initial driving forces of the relational model. In the context of this architecture it means that changes made at a certain level do not affect definitions and software developed with higher-level interfaces, and are incorporated at the higher level automatically. For example, changes in the internal level do not affect application programs written using conceptual-level interfaces, which saves substantial change work that would otherwise be needed.
 
In summary, the conceptual is a level of indirection between internal and external. On one hand it provides a common view of the database, independent of different external view structures, and on the other hand it is uncomplicated by details of how the data is stored or managed (internal level). In principle every level, and even every external view, can be presented by a different data model. In practice usually a given DBMS uses the same data model for both the external and the conceptual levels (e.g., relational model). The internal level, which is hidden inside the DBMS and depends on its implementation (see Implementation section below), requires a different level of detail and uses its own data structure types, typically different in nature from the structures of the external and conceptual levels which are exposed to DBMS users (e.g., the data models above): While the external and conceptual levels are focused on and serve DBMS users, the concern of the internal level is effective implementation details.
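
In relational terms, the conceptual level corresponds roughly to the base tables and the external level to views defined over them; the sketch below uses invented table, column and view names:

<syntaxhighlight lang="sql">
-- Conceptual level: generic employee data shared by all applications
CREATE TABLE employee (
    emp_id        INTEGER PRIMARY KEY,
    name          VARCHAR(100),
    department    VARCHAR(50),
    salary        DECIMAL(10,2),
    medical_notes VARCHAR(1000)
);

-- External level: a view exposing only what the financial department needs
CREATE VIEW employee_payments AS
SELECT emp_id, name, salary
FROM employee;
</syntaxhighlight>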
 
===Database security===
{{Main|Database security}}
{{Contradicts other|date=March 2013|1=Database security}}
 
[[Database security]] deals with all various aspects of protecting the database content, its owners, and its users. It ranges from protection from intentional unauthorized database uses to unintentional database accesses by unauthorized entities (e.g., a person or a computer program).
 
The following are major areas of database security (among many others).
 
====Access control====
{{Main|Access control}}
 
Database access control deals with controlling who (a person or a certain computer program) is allowed to access what information in the database. The information may comprise specific database objects (e.g., record types, specific records, data structures), certain computations over certain objects (e.g., query types, or specific queries), or utilizing specific access paths to the former (e.g., using specific indexes or other data structures to access information).
 
Database access controls are set by special authorized (by the database owner) personnel that uses dedicated protected security DBMS interfaces.
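
In SQL-based DBMSs these controls are commonly expressed through roles and privileges; a minimal sketch with invented names follows (role handling differs between products):

<syntaxhighlight lang="sql">
-- Define a role, give it limited read access, and assign it to a user
CREATE ROLE payroll_clerk;
GRANT SELECT ON employee_payments TO payroll_clerk;
GRANT payroll_clerk TO alice;
</syntaxhighlight>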
 
====Data security====
{{Main|Data security|Encryption}}
 
The definition of data security varies and may overlap with other database security aspects. Broadly, it deals with protecting specific chunks of data, both physically (i.e., from corruption, destruction or removal; e.g., see [[Physical security]]) and against interpretation of them, or of parts of them, as meaningful information (e.g., by looking at the strings of bits that they comprise and concluding specific valid credit-card numbers; e.g., see [[Data encryption]]).
 
====Database audit====
{{Main|Database audit}}
 
Database audit primarily involves monitoring that no security breach, in any aspect, has happened. If a security breach is discovered, then all possible corrective actions are taken.
 
===Database design===
{{Main|Database design}}
 
Database design is done before building the database, to meet the needs of end-users within a given application/information system that the database is intended to support. The database design defines the needed data and data structures that such a database comprises. A design is typically carried out according to the common three architectural levels of a database (see Database architecture above). First, the conceptual level is designed, which defines the overall picture/view of the database and reflects all the real-world elements (entities) the database intends to model, as well as the relationships among them. On top of it, the external level, the various views of the database, is designed according to the (possibly completely different) needs of specific end-user types. More external views can be added later. External view requirements may modify the design of the conceptual level (i.e., add/remove entities and relationships), but usually a well-designed conceptual level for an application supports most of the needed external views. The conceptual view also determines the internal level (which primarily deals with data layout in storage) to a great extent. External view requirements may add supporting storage structures, like materialized views and indexes, for enhanced performance. Typically the internal layer is optimized for top performance, in an average way that takes into account the (possibly conflicting) performance requirements of different external views according to their relative importance. While the conceptual and external levels design can usually be done independently of any DBMS (DBMS-independent design software packages exist, possibly with interfaces to some specific popular DBMSs), the internal level design highly relies on the capabilities and internal data structure of the specific DBMS utilized (see the Implementation section below).

A common way to carry out conceptual level design is to use the [[Entity-relationship model]] (ERM) (both the basic one and possible enhancements of it), since it provides a straightforward, intuitive perception of an application's elements and semantics. An alternative approach, which preceded the ERM, is to use the relational model and dependencies (mathematical relationships) among data to [[Database normalization|normalize]] the database, i.e., to define the ("optimal") relations (data record or tuple types) in the database. Though a large body of research exists for this method, it is more complex and less intuitive than the ERM method, without being more effective. Thus normalization is less utilized in practice than the ERM method.

The ERM may be less subtle than normalization in several aspects, but it captures the main needed dependencies, which are induced by [[Unique key|keys]]/identifiers of entities and relationships. The ERM also inherently includes the important inclusion dependencies (i.e., an entity instance that does not exist (has not been explicitly inserted) cannot appear in a relationship with other entities), which usually have been ignored in normalization.<ref name=Makowsky1986>Johann A. Makowsky, [[Victor M. Markowitz]] and Nimrod Rotics, 1986: [http://www.springerlink.com/content/p67756164r127m18/ "Entity-relationship consistency for relational schemas"] ''Proceedings of the 1986 Conference on Database Theory'' (ICDT '86), Lecture Notes in Computer Science, 1986, Volume 243/1986, pp. 306-322, Springer, DOI: 10.1007/3-540-17187-8_43</ref> In addition, the ERM allows entity type generalization (the [[Is-a]] relationship) and implied property (attribute) [[Inheritance (object-oriented programming)|inheritance]] (similar to that found in the [[Object model]]).

Another aspect of database design is its security. It involves both defining [[access control]] to database objects (e.g., entities, views) and defining security levels and methods for the data itself (see Database security above). Access may be managed directly on an individual basis, or by the assignment of individuals and [[Privilege (Computing)|privileges]] to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements. Data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or subsets of it called "subschemas". For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access to only work history and medical data. If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases.

Change and access logging records who accessed which attributes, what was changed, and when it was changed. Logging services allow for a forensic [[database audit]] later by keeping a record of access occurrences and changes. Sometimes application-level code is used to record changes rather than leaving this to the database. Monitoring can be set up to attempt to detect security breaches. Organizations therefore need to take database security seriously: it helps safeguard them from breaches and attacks such as intrusions, malware, and [[ransomware]], and protects essential company information that must not be disclosed to outsiders.<ref>{{Cite book |title=Continuous auditing : theory and application |date=2018 |author1=David Y. Chan |author2=Victoria Chiu |author3=Miklos A. Vasarhelyi |isbn=978-1-78743-413-4 |edition=1st |___location=Bingley, UK |oclc=1029759767 |publisher=Emerald Publishing }}</ref>

====Entities and relationships====
{{Main|Entity-relationship model}}

The most common database design methods are based on the [[Entity relationship model]] (ERM, or ER model). This model views the world in a simple but very powerful way: it consists of "entities" and the "relationships" among them. Accordingly, a database consists of ''entity'' and ''relationship'' types, each with defined ''attributes'' (field types) that model concrete entities and relationships. Modeling a database in this way typically yields an effective one with desired properties (as in some ''normal forms''; see normalization below). Such models can be translated to any other data model required by any specific DBMS for building an effective database.

====Database normalization====
{{Main|Database normalization}}

In the design of a [[relational database]], the process of organizing database relations to minimize redundancy is called ''normalization''. The goal is to produce well-structured relations so that additions, deletions, and modifications of a field can be made in just one relation (table) without worrying about the appearance and update of the same field in other relations. The process is [[algorithm]]ic and based on dependencies (mathematical relations) that exist among relations' field types. The process result is bringing the database relations into a certain [[Database normalization#Normal forms|"normal form"]]. Several normal forms exist, with different properties.

==Transactions and concurrency==
{{Further|Concurrency control}}

[[Database transactions]] can be used to introduce some level of [[fault tolerance]] and [[data integrity]] after recovery from a [[Crash (computing)|crash]]. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring or releasing a [[Lock (database)|lock]], etc.), an abstraction supported in databases and also in other systems. Each transaction has well-defined boundaries in terms of which program/code executions are included in that transaction (determined by the transaction's programmer via special transaction commands).

The acronym [[ACID]] describes some ideal properties of a database transaction: [[Atomicity (database systems)|atomicity]], [[Consistency (database systems)|consistency]], [[Isolation (database systems)|isolation]], and [[Durability (database systems)|durability]].

==Migration==
{{See also|Data migration#Database migration}}

A database built with one DBMS is not portable to another DBMS (i.e., the other DBMS cannot run it). However, in some situations, it is desirable to migrate a database from one DBMS to another. The reasons are primarily economical (different DBMSs may have different [[Total cost of ownership|total costs of ownership]] or TCOs), functional, and operational (different DBMSs may have different capabilities). The migration involves the database's transformation from one DBMS type to another. The transformation should maintain (if possible) the database related application (i.e., all related application programs) intact. Thus, the database's conceptual and external architectural levels should be maintained in the transformation. It may be desired that also some aspects of the architecture internal level are maintained. A complex or large database migration may be a complicated and costly (one-time) project by itself, which should be factored into the decision to migrate. This is in spite of the fact that tools may exist to help migration between specific DBMSs. Typically, a DBMS vendor provides tools to help import databases from other popular DBMSs.
 
==Building, maintaining, and tuning==
{{Main|Database tuning}}
 
After designing a database for an application, the next stage is building the database. Typically, an appropriate [[Database#General-purpose DBMS|general-purpose DBMS]] can be selected to be used for this purpose. A DBMS provides the needed [[user interface]]s to be used by database administrators to define the needed application's data structures within the DBMS's respective data model. Other user interfaces are used to select needed DBMS parameters (like security related, storage allocation parameters, etc.).
 
When the database is ready (all its data structures and other needed components are defined), it is typically populated with the initial application's data (database initialization, which is typically a distinct project; in many cases using specialized DBMS interfaces that support bulk insertion) before making it operational. In some cases, the database becomes operational while empty of application data, and data are accumulated during its operation.
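
The bulk-insertion interface varies by DBMS. As a sketch only, PostgreSQL's <code>COPY</code> command loads rows from a file; the file path and table below are hypothetical:

<syntaxhighlight lang="sql">
-- Bulk-load initial rows from a CSV file into the customer table (PostgreSQL syntax).
COPY customer (customer_id, name, city)
FROM '/data/initial_customers.csv'
WITH (FORMAT csv, HEADER true);
</syntaxhighlight>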
 
After the database is created, initialized, and populated, it needs to be maintained. Various database parameters may need changing and the database may need to be [[Database tuning|tuned]] for better performance; the application's data structures may be changed or added to, new related application programs may be written to add to the application's functionality, etc.
 
===Miscellaneous areas===

====Database migration between DBMSs====
{{See also|Data migration#Database migration}}

A database built with one DBMS is not [[Software portability|portable]] to another DBMS (i.e., the other DBMS cannot run it). However, in some situations, it is desirable to migrate a database from one DBMS to another. The reasons are primarily economical (different DBMSs may have different [[Total cost of ownership|total costs of ownership]], or TCOs), functional, and operational (different DBMSs may have different capabilities). The migration involves the database's transformation from one DBMS type to another. The transformation should maintain (if possible) the database-related application (i.e., all related application programs) intact. Thus, the database's conceptual and external architectural levels should be maintained in the transformation. It may be desired that also some aspects of the internal architecture level are maintained. A complex or large database migration may be a complicated and costly (one-time) project by itself, which should be factored into the decision to migrate. This is in spite of the fact that tools may exist to help migration between specific DBMSs. Typically, a DBMS vendor provides tools to help import databases from other popular DBMSs.

====Backup and restore====
{{Main|Backup}}

Sometimes it is desired to bring a database back to a previous state (for many reasons, e.g., cases when the database is found corrupted due to a software error, or if it has been updated with erroneous data). To achieve this, a backup operation is done occasionally or continuously, where each desired database state (i.e., the values of its data and their embedding in the database's data structures) is kept within dedicated backup files (many techniques exist to do this effectively). When it is decided by a database administrator to bring the database back to this state (e.g., by specifying this state by a desired point in time when the database was in this state), these files are used to restore that state.
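
The backup and restore interfaces are DBMS-specific. As one hedged sketch, Microsoft SQL Server exposes them as SQL statements; the database name and file path below are hypothetical:

<syntaxhighlight lang="sql">
-- Write a full backup of the database to a file (Microsoft SQL Server syntax).
BACKUP DATABASE sales TO DISK = 'D:\backups\sales.bak';

-- Later, return the database to the backed-up state.
RESTORE DATABASE sales FROM DISK = 'D:\backups\sales.bak';
</syntaxhighlight>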

====Static analysis====

Static analysis techniques for software verification can also be applied in the scenario of query languages. In particular, the [[abstract interpretation]] framework has been extended to the field of query languages for relational databases as a way to support sound approximation techniques.{{sfn|Halder|Cortesi|2011}} The semantics of query languages can be tuned according to suitable abstractions of the concrete ___domain of data. The abstraction of relational database systems has many interesting applications, in particular, for security purposes, such as fine-grained access control, watermarking, etc.
 
==Implementation: Database management systems==
{{Main|Database management system}}
A ''database management system'' (DBMS) is a system that allows one to build and maintain databases, as well as to utilize their data and retrieve information from them. A DBMS defines the database type that it supports, as well as its functionality and operational capabilities. A DBMS provides the internal processes for external applications built on top of it. The end-users of such a specific application are usually exposed only to that application and do not directly interact with the DBMS. Thus end-users enjoy the effects of the underlying DBMS, but its internals are completely invisible to them. Database designers and database administrators interact with the DBMS through dedicated interfaces to build and maintain the applications' databases, and thus need more knowledge and understanding of how DBMSs operate and of the DBMSs' external interfaces and tuning parameters.

A DBMS consists of [[software]] that operates databases, providing storage, access, security, backup and other facilities to meet needed requirements. DBMSs can be categorized according to the [[database model]](s) that they support, such as relational or XML; the type(s) of computer they support, such as a server cluster or a mobile phone; the [[query language]](s) that access the database, such as [[SQL]] or [[XQuery]]; performance trade-offs, such as maximum scale or maximum speed; and others. Some DBMSs cover more than one entry in these categories, e.g., supporting multiple query languages. Database software typically supports the [[Open Database Connectivity]] (ODBC) standard, which allows the database to integrate (to some extent) with other databases.

The development of a mature general-purpose DBMS typically takes several years and many man-years. Developers of DBMSs typically update their product to follow and take advantage of progress in computer and storage technologies. Several DBMS products, like Oracle and IBM DB2, have been in ongoing development since the 1970s and 1980s. Since DBMSs comprise a significant [[Economy|economic]] [[market]], computer and storage vendors often take DBMS requirements into account in their own development plans.
 
===DBMS architecture: major DBMS components===
 
DBMS [[Software architecture|architecture]] specifies its components (including descriptions of their functions) and their interfaces. DBMS architecture is distinct from database architecture. The following are major DBMS components:

*'''DBMS external [[Interface (computing)|interface]]s''' - They are the means to communicate with the DBMS (both ways, to and from the DBMS) to perform all the operations needed for the DBMS. These can be operations on a database, or operations to operate and manage the DBMS. For example:
::- Direct database operations: defining data types, assigning security levels, updating data, querying the database, etc.
::- Operations related to DBMS operation and management: backup and restore, database recovery, security monitoring, database storage allocation and database layout configuration monitoring, performance monitoring and tuning, etc.
:An external interface can be either a ''[[user interface]]'' (e.g., typically for a database administrator), or an ''[[application programming interface]]'' (API) used for communication between an application program and the DBMS.
*'''Database language engines''' (or '''processors''') - Most operations upon databases are performed through expression in Database languages (see above). Languages exist for data definition, data manipulation and queries (e.g., SQL), as well as for specifying various aspects of security, and more. Language expressions are fed into a DBMS through proper interfaces. A language engine processes the language expressions (by a compiler or language interpreter) to extract the intended database operations from the expression in a way that they can be executed by the DBMS.
*'''[[Query optimizer]]''' - Performs [[query optimization]] on every query to choose for it the most efficient ''[[query plan]]'' (a partial order (tree) of operations) to be executed to compute the query result.
*'''[[Database engine]]''' - Performs the received database operations on the database objects, typically at their higher-level representation.
*'''Storage engine''' - translates the operations to low-level operations on the storage [[bit]]s. In some references the Storage engine is viewed as part of the Database engine.
*'''[[Database transaction|Transaction]] engine''' - for correctness and reliability purposes most DBMS internal operations are performed encapsulated in transactions (see below). Transactions can also be specified externally to the DBMS to encapsulate a group of operations. The transaction engine tracks all the transactions and manages their execution according to the transaction rules (e.g., proper concurrency control, and proper ''commit'' or ''abort'' for each).
*'''DBMS management and operation component''' - Comprises many components that deal with all the DBMS management and operational aspects like performance monitoring and tuning, backup and restore, recovery from failure, security management and monitoring, database storage allocation and database storage layout monitoring, etc.
 
===Database storage===
{{Main|Computer data storage}}

Database storage is the container of the physical materialization of a database. It comprises the ''Internal'' (physical) ''level'' in the database architecture. It also contains all the information needed (e.g., [[metadata]], "data about the data", and internal [[data structure]]s) to reconstruct the ''Conceptual level'' and ''External level'' from the Internal level when needed. It is not part of the DBMS but rather is manipulated by the DBMS (by its Storage engine; see above) to manage the database that resides in it. Though typically accessed by a DBMS through the underlying [[operating system]] (and often utilizing the operating system's [[file system]]s as intermediates for storage layout), storage properties and configuration settings are extremely important for the efficient operation of the DBMS, and thus are closely maintained by database administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in the storage in structures that look completely different from the way the data look at the conceptual and external levels, but in ways that attempt to optimize (as much as possible) the reconstruction of these levels when needed by users and programs, as well as the computation of additional types of needed information from the data (e.g., when querying the database).

In principle, the database storage can be viewed as a [[linear]] [[address space]], where every bit of data has its unique address in this address space. Practically, only a very small percentage of addresses are kept as initial reference points (which also requires storage); most of the database data is accessed by indirection, using displacement calculations (distance in bits from the reference points) and data structures that define access paths (using pointers) to all needed data in an effective manner, optimized for the needed data access operations.

====Data====
=====Coding the data and Error-correcting codes=====
{{Main|Code|Character encoding|Error detection and correction|Cyclic redundancy check}}
*Data is [[Encoding|encoded]] by assigning a bit pattern to each language [[alphabet]] character, [[Numerical digit|digit]], other numerical patterns, and [[multimedia]] object. Many standards exist for encoding (e.g., [[ASCII]], [[JPEG]], [[MPEG-4]]).
*By adding bits to each encoded unit, redundancy allows both the detection of errors in coded data and their correction based on mathematical algorithms. Errors occur regularly, with low probability, due to random bit-value flipping or "physical bit fatigue" (loss of the physical bit's ability in storage to maintain a distinguishable value, 0 or 1), or due to errors in inter- or intra-computer communication. A random bit flip (e.g., due to random [[radiation]]) is typically corrected upon detection. A bit or a group of malfunctioning physical bits (the specific defective bit is not always known; the group definition depends on the specific storage device) is typically automatically fenced out, taken out of use by the device, and replaced with another functioning equivalent group in the device, where the corrected bit values are restored (if possible). The [[Cyclic redundancy check]] (CRC) method is typically used in storage for error detection.
 
=====Data compression=====
{{Main|Data compression}}
Data compression methods allow, in many cases, the representation of a string of bits by a shorter bit string ("compress") and the reconstruction of the original string ("decompress") when needed. This makes it possible to utilize substantially less storage (by tens of percent) for many types of data, at the cost of more computation (compressing and decompressing when needed). An analysis of the trade-off between the storage cost savings and the costs of the related computation and possible delays in data availability is done before deciding whether to keep certain data in a database compressed or not.
 
Data compression is typically controlled through the DBMS's data definition interface, but in some cases may be a [[Default (computer science)|default]] and automatic.
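
For example, some DBMSs expose compression as a table-level storage option in their data definition language. The following is a sketch only, using MySQL/InnoDB syntax and a hypothetical table:

<syntaxhighlight lang="sql">
-- Store the table's rows in compressed pages (MySQL/InnoDB table option).
CREATE TABLE clickstream_event (
    event_id BIGINT PRIMARY KEY,
    payload  TEXT
) ROW_FORMAT=COMPRESSED;
</syntaxhighlight>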
 
=====Data encryption=====
{{Main|Cryptography}}
For security reasons, certain types of data (e.g., credit-card information) may be kept [[encryption|encrypted]] in storage to prevent the possibility of unauthorized information reconstruction from chunks of storage snapshots (taken either via unforeseen vulnerabilities in a DBMS, or more likely, by bypassing it).

Data encryption is typically controlled through the DBMS's data definition interface, but in some cases may be a default and automatic.

====Data storage types====

This collection of bits describes both the contained database data and its related metadata (i.e., data that describes the contained data and allows computer programs to manipulate the database data correctly). The size of a database can nowadays be tens of [[Terabyte]]s, where a [[byte]] is eight bits. The physical materialization of a bit can employ various existing technologies, while new and improved technologies are constantly under development. Common examples are:
*''Magnetic medium'' (e.g., in [[Magnetic disk]]) - Orientation of [[magnetic field]] in magnetic regions on a surface of material (two orientation directions, for 0 and 1).
*''[[Dynamic random-access memory]]'' (DRAM) - State of a miniature [[electronic circuit]] consisting of few [[transistor]]s (among millions nowadays) in an [[integrated circuit]] (two states for 0 and 1).
These two examples are respectively for two major storage types:
*''Nonvolatile storage'' can maintain its bit states (0s and 1s) without electrical power supply, or when power supply is interrupted;
*''[[Volatile storage]]'' loses its bit values when power supply is interrupted (i.e., its content is erased).
 
Sophisticated storage units, which can, in fact, be effective dedicated parallel computers that support a large amount of nonvolatile storage, typically must include also components with volatile storage. Some such units employ [[Battery (electricity)|batteries]] that can provide power for several hours in case of external power interruption (e.g., see the [[EMC Symmetrix]]) and thus maintain the content of the volatile storage parts intact. Just before such a device's batteries lose their power the device typically automatically backs-up its volatile content portion (into nonvolatile) and shuts off to protect its data.
 
Databases are usually too expensive (in terms of importance and needed investment in resources, e.g., time, money, to build them) to be lost by a power interruption. Thus at any point in time most of their content resides in nonvolatile storage. Even if for operational reason very large portions of them reside in volatile storage (e.g., tens of [[Gigabyte]]s in volatile memory, for in-memory databases), most of this is backed-up in nonvolatile storage. A relatively small portion of this, which temporarily may not have nonvolatile backup, can be reconstructed by proper automatic database recovery procedures after volatile storage content loss.
 
More examples of storage types:
*Volatile storage can be found in processors, computer memory (e.g., DRAM), etc.
*Non-volatile storage types include [[ROM]], [[EPROM]], [[Hard disk drive]]s, [[Flash memory]] and [[Flash drive|drives]], [[Storage array]]s, etc.
 
=====Storage metrics=====
{{Expand section|date=July 2011}}
Databases always use several types of storage when operational (and implied several when idle). Different types may significantly differ in their properties, and the optimal mix of storage types is determined by the types and quantities of operations that each storage type needs to perform, as well as considerations like physical space and energy consumption and dissipation (which may become critical for a large database). Storage types can be categorized by the following attributes:
*Volatile/Nonvolatile.
*Cost of the medium (e.g., per Megabyte), Cost to operate (cost of energy consumed per unit time).
*Access speed (e.g., bytes per second).
*Granularity&nbsp;— from fine to coarse (e.g., size in bytes of access operation).
*Reliability (the probability of spontaneous bit value change under various conditions).
*Maximal possible number of writes (of any specific bit or specific group of bits; could be constrained by the technology used (e.g., "write once" or "write twice"), or due to "physical bit fatigue," loss of ability to distinguish between the 0, 1 states due to many state changes (e.g., in Flash memory)).
*Power needed to operate (Energy per time; energy per byte accessed), Energy efficiency, Heat to dissipate.
*Packaging density (e.g., realistic number of bytes per volume unit)
 
=====Protecting storage device content: Device mirroring (replication) and RAID=====
{{Main|Disk mirroring|RAID}}
:See also [[Storage replication|Disk storage replication]]

While a group of bits malfunction may be resolved by error detection and correction mechanisms (see above), storage device malfunction requires different solutions. The following solutions are commonly used and valid for most storage devices:
* '''Device [[Disk mirroring|mirroring]] (replication)''' - A common solution to the problem is constantly maintaining an identical copy of device content on another device (typically of a same type). The downside is that this doubles the storage, and both devices (copies) need to be updated simultaneously with some overhead and possibly some delays. The upside is possible concurrent read of a same data group by two independent processes, which increases performance. When one of the replicated devices is detected to be defective, the other copy is still operational, and is being utilized to generate a new copy on another device (usually available operational in a pool of stand-by devices for this purpose).
* '''Redundant array of independent disks''' ('''[[RAID]]''') - This method generalizes the device mirroring above by allowing one device in a group of N devices to fail and be replaced with content restored (Device mirroring is RAID with N=2). RAID groups of N=5 or N=6 are common. N>2 saves storage, when comparing with N=2, at the cost of more processing during both regular operation (with often reduced performance) and defective device replacement.
 
Device mirroring and typical RAID are designed to handle a single device failure in the RAID group of devices. However, if a second failure occurs before the RAID group is completely repaired from the first failure, then data can be lost. The probability of a single failure is typically small. Thus the probability of two failures in a same RAID group in time proximity is much smaller (approximately the probability squared, i.e., multiplied by itself). If a database cannot tolerate even such smaller probability of data loss, then the RAID group itself is replicated (mirrored). In many cases such mirroring is done geographically remotely, in a different storage array, to handle also recovery from disasters (see disaster recovery above).
 
====Database storage layout====
 
Database bits are laid out in storage in data structures and groupings that can take advantage of both known effective algorithms to retrieve and manipulate them and the storage's own properties. Typically the storage itself is designed to meet the requirements of various areas that extensively utilize storage, including databases. A DBMS in operation always simultaneously utilizes several storage types (e.g., memory and external storage), with respective layout methods.
 
=====Database storage hierarchy=====

A database, while in operation, resides simultaneously in several types of storage. By the nature of contemporary computers, most of the database part inside a computer that hosts the DBMS resides (partially replicated) in volatile storage. Data (pieces of the database) that are being processed/manipulated reside inside a processor, possibly in the processor's [[cache]]s. These data are read from/written to memory, typically through a computer [[Bus (computing)|bus]] (so far typically volatile storage components). Computer memory communicates data (transferred to/from) external storage, typically through [[standard]] storage interfaces or networks (e.g., [[fibre channel]], [[iSCSI]]). A [[Disk array|storage array]], a common external storage unit, typically has a storage hierarchy of its own, from a fast cache, typically consisting of (volatile and fast) [[DRAM]], which is connected (again via standard interfaces) to drives, possibly with different speeds, like [[flash drive]]s and magnetic [[disk drive]]s (non-volatile). The drives may be connected to [[magnetic tape]]s, on which typically the least active parts of a large database may reside, or database backup generations.
 
Typically a correlation currently exists between storage speed and price, and the faster storage is typically volatile.
 
=====Data structures=====
{{Main|Database storage structures}}
{{Expand section|date=June 2011}}
 
A data structure is an abstract construct that embeds data in a well-defined manner. An efficient data structure allows the data to be manipulated in efficient ways. The data manipulation may include data insertion, deletion, updating, and retrieval in various modes. A certain data structure type may be very effective in certain operations and very ineffective in others. A data structure type is selected upon DBMS development to best meet the operations needed for the types of data it contains. The type of data structure selected for a certain task typically also takes into consideration the type of storage it resides in (e.g., speed of access, minimal size of the storage chunk accessed, etc.). In some DBMSs database administrators have the flexibility to select among options of data structures to contain user data, for performance reasons. Sometimes the data structures have selectable parameters to tune the database performance.
 
Databases may store data in many data structure types.<ref name="Physical Database Design">{{harvnb|Lightstone|Teorey|Nadeau|2007}}</ref> Common examples are the following:
 
* ordered/unordered [[flat file database|flat files]]
* [[hash table]]s
* [[B+ tree]]s
* [[ISAM]]
* [[heap (data structure)|heaps]]
 
=====Application data and DBMS data=====
 
A typical DBMS does not store only the data of the application it serves. In order to handle the application data, the DBMS needs to store these data in data structures that themselves comprise specific data. In addition, the DBMS needs its own data structures and many types of bookkeeping data, like indexes and [[Database log|log]]s. The DBMS data is an integral part of the database and may comprise a substantial portion of it.
 
=====Database indexing=====
{{Main|Index (database)}}
 
[[Index (database)|Indexing]] is a technique for improving database performance. The many types of indexes share the common property that they reduce the need to examine every entry when running a query. In large databases, this can reduce query time/cost by orders of magnitude. The simplest form of index is a sorted list of values that can be searched using a [[binary search]] with an adjacent reference to the ___location of the entry, analogous to the index in the back of a book. The same data can have multiple indexes (an employee database could be indexed by last name and hire date.)
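
For instance, continuing the employee example, the two indexes could be declared in SQL roughly as follows (the table and column names are hypothetical):

<syntaxhighlight lang="sql">
-- Each index speeds up queries that filter or sort on its column.
CREATE INDEX idx_employee_last_name ON employee (last_name);
CREATE INDEX idx_employee_hire_date ON employee (hire_date);
</syntaxhighlight>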
 
Indexes affect performance, but not results. Database designers can add or remove indexes without changing application logic, reducing maintenance costs as the database grows and database usage evolves.
 
Given a particular query, the DBMS' query optimizer is responsible for devising the most efficient strategy for finding matching data.
 
Indexes can speed up data access, but they consume space in the database, and must be updated each time the data is altered. Indexes therefore can speed data access but slow data maintenance. These two properties determine whether a given index is worth the cost.
 
=====Database data clustering=====
 
In many cases, substantial performance improvement is gained if different types of database objects that are usually utilized together are laid out in storage in proximity, being ''clustered''. This usually allows the needed related objects to be retrieved from storage in a minimum number of input operations (each of which is sometimes substantially time-consuming). Even for in-memory databases, clustering provides a performance advantage due to the common utilization of large caches for input-output operations in memory, with similar resulting behavior.
 
For example, it may be beneficial to cluster a record of an ''item'' in stock with all its respective ''order'' records. The decision of whether to cluster certain objects or not depends on the objects' utilization statistics, object sizes, cache sizes, storage types, etc. In a relational database, clustering the two respective relations "Items" and "Orders" saves the expensive execution of a [[Join (relational algebra)|join]] operation between the two relations whenever such a join is needed in a query (the join result is already laid out in storage by the clustering and available to be utilized).
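
A few DBMSs expose clustering directly in their data definition language. The following is a sketch only, in Oracle's dialect, with hypothetical cluster, table, and column names; the items and orders rows that share an item key are stored in the same storage blocks:

<syntaxhighlight lang="sql">
-- Create a cluster keyed on the item identifier, then place both tables in it.
CREATE CLUSTER item_orders (item_id NUMBER(10));
CREATE INDEX item_orders_idx ON CLUSTER item_orders;

CREATE TABLE items (
    item_id NUMBER(10) PRIMARY KEY,
    name    VARCHAR2(100)
) CLUSTER item_orders (item_id);

CREATE TABLE orders (
    order_id NUMBER(10) PRIMARY KEY,
    item_id  NUMBER(10),
    quantity NUMBER(5)
) CLUSTER item_orders (item_id);
</syntaxhighlight>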
 
=====Database materialized views=====
{{Main|Materialized view}}
 
Often storage redundancy is employed to increase performance. A common example is storing ''[[materialized view]]s'', which are frequently needed ''external views''. Storing such views saves the expensive computation of them each time they are needed.
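
Some DBMSs (for example, Oracle and PostgreSQL) let such precomputed views be declared directly. The sketch below uses PostgreSQL syntax and hypothetical table and column names:

<syntaxhighlight lang="sql">
-- Precompute and store an expensive aggregation so that queries can read it directly.
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT product_id,
       date_trunc('month', sold_at) AS month,
       SUM(amount)                  AS total_amount
FROM   sales
GROUP  BY product_id, date_trunc('month', sold_at);

-- Periodically bring the stored results up to date.
REFRESH MATERIALIZED VIEW monthly_sales;
</syntaxhighlight>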
 
=====Database and database object replication=====
{{Main|Database replication}}
:See also ''[[Database#Replication|Replication]]'' below
 
Occasionally a database employs storage redundancy by database objects replication (with one or more copies) to increase data availability (both to improve performance of simultaneous multiple end-user accesses to a same database object, and to provide resiliency in a case of partial failure of a distributed database). Updates of a replicated object need to be synchronized across the object copies. In many cases the entire database is replicated.
 
===Database transactions===
{{Main|Database transaction}}
As with every software system, a DBMS that operates in a faulty computing environment is prone to failures of many kinds. A failure can corrupt the respective database unless special measures are taken to prevent this. A DBMS achieves certain levels of fault tolerance by encapsulating operations within transactions. The concept of a ''database transaction'' (or ''atomic transaction'') has evolved in order to enable both a well-understood database system behavior in a faulty environment where crashes can happen at any time, and ''recovery'' from a crash to a well-understood database state. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring a lock, etc.), an abstraction supported in databases and also in other systems. Each transaction has well-defined boundaries in terms of which program/code executions are included in that transaction (determined by the transaction's programmer via special transaction commands).
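
As an illustration, in SQL a group of operations can be enclosed in a single transaction so that either all of them take effect or none do; the table and column names are hypothetical:

<syntaxhighlight lang="sql">
-- Transfer funds between two accounts as one atomic unit of work.
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;   -- make both changes permanent; ROLLBACK instead would undo the partial work
</syntaxhighlight>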
 
====ACID rules====
{{Main|ACID}}
The acronym [[ACID]] describes the ideal properties of a database transaction: [[Atomicity (database systems)|atomicity]], [[Consistency (database systems)|consistency]], [[Isolation (database systems)|isolation]], and [[Durability (database systems)|durability]]. Every database transaction obeys the following rules:
*'''[[Atomicity (database systems)|Atomicity]]''' - Either the effects of all or none of its operations remain ("all or nothing" semantics) when a transaction is completed (''committed'' or ''aborted'' respectively). In other words, to the outside world a committed transaction appears (by its effects on the database) to be indivisible (atomic), and an aborted transaction does not leave effects on the database at all, as if it had never existed.
*'''[[Database Consistency (computer science)|Consistency]]''' - Every transaction must leave the database in a consistent (correct) state, i.e., maintain the predetermined integrity rules of the database (constraints upon and among the database's objects). A transaction must transform a database from one consistent state to another consistent state (however, it is the responsibility of the transaction's programmer to make sure that the transaction itself is correct, i.e., performs correctly what it intends to perform (from the application's point of view) while the predefined integrity rules are enforced by the DBMS). Thus since a database can be normally changed only by transactions, all the database's states are consistent. An aborted transaction does not change the database state it has started from, as if it never existed (atomicity above).
*'''[[Isolation (database systems)|Isolation]]''' - Transactions cannot interfere with each other (as an end result of their executions). Moreover, usually (depending on concurrency control method) the effects of an incomplete transaction are not even visible to another transaction. Providing isolation is the main goal of concurrency control.
*'''[[Durability (computer science)|Durability]]''' - Effects of successful (committed) transactions must persist through [[Crash (computing)|crash]]es (typically by recording the transaction's effects and its commit event in a [[non-volatile memory]]).
 
====Isolation, concurrency control, and locking====
{{Main|Concurrency control|Isolation (database systems)|Two-phase locking}}
'''Isolation''' provides the ability for multiple users to operate on the database at the same time without corrupting the data.
 
*'''[[Concurrency control]]''' comprises the underlying mechanisms in a DBMS that handle isolation and guarantee related correctness. It is heavily utilized by the database and storage engines (see above) both to guarantee the correct execution of concurrent transactions and (via different mechanisms) the correctness of other DBMS processes. The transaction-related mechanisms typically constrain the database data access operations' timing ([[Schedule (computer science)|transaction schedules]]) to certain orders characterized by the [[serializability]] and [[recoverability]] schedule properties. Constraining database access operation execution typically means reduced performance (rates of execution), and thus concurrency control mechanisms are typically designed to provide the best performance possible under the constraints. Often, when possible without harming correctness, the serializability property is compromised for better performance. However, recoverability cannot be compromised, since compromising it typically results in a quick database integrity violation.
*'''[[Two-phase locking|Locking]]''' is the most common transaction concurrency control method in DBMSs, used to provide both serializability and recoverability for correctness. In order to access a database object, a transaction first needs to acquire a lock for this object. Depending on the access operation type (e.g., reading or writing an object) and on the lock type, acquiring the lock may be blocked and postponed if another transaction is holding a lock for that object (see the sketch after this list).
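
Many SQL dialects expose locking indirectly through transactions. As a rough sketch only (the table and values are hypothetical), a row can be locked explicitly so that a concurrent transaction updating the same row is blocked until the first transaction ends:

<syntaxhighlight lang="sql">
-- Lock the row before modifying it (supported, e.g., by PostgreSQL, MySQL/InnoDB, and Oracle).
START TRANSACTION;
SELECT balance FROM accounts WHERE account_id = 42 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 42;
COMMIT;
-- A concurrent transaction issuing SELECT ... FOR UPDATE on account 42
-- waits until this transaction commits or aborts.
</syntaxhighlight>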
 
===Query optimization===
{{Main|Query optimization|Query optimizer}}
 
A query is a request for information from a database. It can be as simple as "find the address of a person with SS# 123-123-1234," or more complex like "find the average salary of all the employed married men in California between the ages of 30 and 39 who earn less than their wives." Query results are generated by accessing relevant database data and manipulating it in a way that yields the requested information. Since database structures are complex, in most cases, and especially for not-very-simple queries, the needed data for a query can be collected from a database by accessing it in different ways, through different data structures, and in different orders. Each different way typically requires different processing time. Processing times of the same query may have large variance, from a fraction of a second to hours, depending on the chosen way. The purpose of '''query optimization''', which is an automated process, is to find the way to process a given query in minimum time. The large possible variance in time justifies performing query optimization, though finding the exact optimal way to execute a query, among all possibilities, is typically very complex, time-consuming by itself, may be too costly, and is often practically impossible. Thus query optimization typically tries to approximate the optimum by comparing several common-sense alternatives to provide in a reasonable time a "good enough" plan which typically does not deviate much from the best possible result.
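
Most SQL DBMSs let the chosen query plan be inspected through an EXPLAIN-like statement, though the exact keyword and output format vary by product. A sketch with a hypothetical table:

<syntaxhighlight lang="sql">
-- Ask the optimizer to show the plan it would use, without running the query.
EXPLAIN
SELECT AVG(salary)
FROM   employee
WHERE  state = 'CA' AND age BETWEEN 30 AND 39;
</syntaxhighlight>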
 
===DBMS support for the development and maintenance of a database and its application===
{{Expand section|date=May 2011}}
A DBMS typically intends to provide a convenient environment to develop and later maintain an application built around its respective database type. A DBMS either provides such tools or allows integration with such external tools. Examples of such tools relate to database design, application programming, application program maintenance, database performance analysis and monitoring, database configuration monitoring, DBMS hardware configuration (a DBMS and related database may span computers, networks, and storage units) and related database mapping (especially for a distributed DBMS), storage allocation and database layout monitoring, storage migration, etc.

==Miscellaneous features==

Other DBMS features might include:
* [[Database log]]s – This helps in keeping a history of the executed functions.
* Graphics component for producing graphs and charts, especially in a data warehouse system.
* [[Query optimizer]] – Performs query optimization on every query to choose an efficient ''[[query plan]]'' (a partial order (tree) of operations) to be executed to compute the query result. May be specific to a particular storage engine.
* Tools or hooks for database design, application programming, application program maintenance, database performance analysis and monitoring, database configuration monitoring, DBMS hardware configuration (a DBMS and related database may span computers, networks, and storage units) and related database mapping (especially for a distributed DBMS), storage allocation and database layout monitoring, storage migration, etc.

Increasingly, there are calls for a single system that incorporates all of these core functionalities into the same build, test, and deployment framework for database management and source control. Borrowing from other developments in the software industry, some market such offerings as "[[DevOps]] for database".<ref>{{cite web
| title = How Database Administration Fits into DevOps |author= Ben Linders |date= January 28, 2016
| url = https://www.infoq.com/news/2016/01/database-administration-devops
| access-date = April 15, 2017 }}</ref>

==Design and modeling==
{{Main|Database design}}
[[File:Process of database design v2.png|upright=2|thumb]]
The first task of a database designer is to produce a [[conceptual data model]] that reflects the structure of the information to be held in the database. A common approach to this is to develop an [[entity–relationship model]], often with the aid of drawing tools. Another popular approach is the [[Unified Modeling Language]]. A successful data model will accurately reflect the possible state of the external world being modeled: for example, if people can have more than one phone number, it will allow this information to be captured. Designing a good conceptual data model requires a good understanding of the application ___domain; it typically involves asking deep questions about the things of interest to an organization, like "can a customer also be a supplier?", or "if a product is sold with two different forms of packaging, are those the same product or different products?", or "if a plane flies from New York to Dubai via Frankfurt, is that one flight or two (or maybe even three)?". The answers to these questions establish definitions of the terminology used for entities (customers, products, flights, flight segments) and their relationships and attributes.

Producing the conceptual data model sometimes involves input from [[Business process modeling|business processes]], or the analysis of [[workflow]] in the organization. This can help to establish what information is needed in the database, and what can be left out. For example, it can help when deciding whether the database needs to hold historic data as well as current data.

Having produced a conceptual data model that users are happy with, the next stage is to translate this into a [[Database schema|schema]] that implements the relevant data structures within the database. This process is often called logical database design, and the output is a [[logical data model]] expressed in the form of a schema. Whereas the conceptual data model is (in theory at least) independent of the choice of database technology, the logical data model will be expressed in terms of a particular database model supported by the chosen DBMS. (The terms ''data model'' and ''database model'' are often used interchangeably, but in this article we use ''data model'' for the design of a specific database, and ''database model'' for the modeling notation used to express that design).

The most popular database model for general-purpose databases is the relational model, or more precisely, the relational model as represented by the SQL language. The process of creating a logical database design using this model uses a methodical approach known as [[Database normalization|normalization]]. The goal of normalization is to ensure that each elementary "fact" is only recorded in one place, so that insertions, updates, and deletions automatically maintain consistency.

The final stage of database design is to make the decisions that affect performance, scalability, recovery, security, and the like, which depend on the particular DBMS. This is often called ''physical database design'', and the output is the [[physical data model]]. A key goal during this stage is [[data independence]], meaning that the decisions made for performance optimization purposes should be invisible to end-users and applications. There are two types of data independence: physical data independence and logical data independence. Physical design is driven mainly by performance requirements, and requires a good knowledge of the expected workload and access patterns, and a deep understanding of the features offered by the chosen DBMS.

Another aspect of physical database design is security. It involves both defining [[access control]] to database objects as well as defining security levels and methods for the data itself.

===Models===
{{Main|Database model}}
[[File:Database models.jpg|thumb|upright=2|Collage of five types of database models]]

A database model is a type of data model that determines the logical structure of a database and fundamentally determines in which manner [[data]] can be stored, organized, and manipulated. The most popular example of a database model is the relational model (or the SQL approximation of relational), which uses a table-based format.

Common logical data models for databases include:
*[[Navigational database]]s
**[[Hierarchical database model]]
**[[Network model]]
**[[Graph database]]
*[[Relational model]]
*[[Entity–relationship model]]
**[[Enhanced entity–relationship model]]
*[[Object database|Object model]]
*[[Document-oriented database|Document model]]
*[[Entity–attribute–value model]]
*[[Star schema]]

An object–relational database combines the two related structures.

[[Physical data model]]s include:
*[[Inverted index]]
*[[Flat file]]

Other models include:
*[[Multidimensional database|Multidimensional model]]
*[[Array DBMS|Array model]]
*[[Multivalue model]]

Specialized models are optimized for particular types of data:
*[[XML database]]
*[[Semantic data model|Semantic model]]
*[[Content store]]
*[[Event store]]
*[[Time series database|Time series model]]

===External, conceptual, and internal views===
[[Image:Traditional View of Data SVG.svg|thumb|upright=1.15|Traditional view of data<ref name="ITL93">itl.nist.gov (1993) [http://www.itl.nist.gov/fipspubs/idef1x.doc ''Integration Definition for Information Modeling (IDEFIX)''] {{Webarchive|url=https://web.archive.org/web/20131203223034/http://www.itl.nist.gov/fipspubs/idef1x.doc |date=2013-12-03 }}. 21 December 1993.</ref>]]
A database management system provides three views of the database data:
* The '''external level''' defines how each group of end-users sees the organization of data in the database. A single database can have any number of views at the external level.
* The '''conceptual level''' (or ''logical level'') unifies the various external views into a compatible global view.{{sfn|Date|2003|pages=31–32}} It provides the synthesis of all the external views. It is out of the scope of the various database end-users, and is rather of interest to database application developers and database administrators.
* The '''internal level''' (or ''physical level'') is the internal organization of data inside a DBMS. It is concerned with cost, performance, scalability and other operational matters. It deals with storage layout of the data, using storage structures such as [[Index (database)|indexes]] to enhance performance. Occasionally it stores data of individual views ([[materialized view]]s), computed from generic data, if performance justification exists for such redundancy. It balances all the external views' performance requirements, possibly conflicting, in an attempt to optimize overall performance across all activities.

While there is typically only one conceptual and internal view of the data, there can be any number of different external views. This allows users to see database information in a more business-related way rather than from a technical, processing viewpoint. For example, a financial department of a company needs the payment details of all employees as part of the company's expenses, but does not need details about employees that are in the interest of the [[human resources]] department. Thus different departments need different ''views'' of the company's database.

The three-level database architecture relates to the concept of ''data independence'' which was one of the major initial driving forces of the relational model.{{sfn|Date|2003|pages=31–32}} The idea is that changes made at a certain level do not affect the view at a higher level. For example, changes in the internal level do not affect application programs written using conceptual level interfaces, which reduces the impact of making physical changes to improve performance.

The conceptual view provides a level of indirection between internal and external. On the one hand it provides a common view of the database, independent of different external view structures, and on the other hand it abstracts away details of how the data are stored or managed (internal level). In principle every level, and even every external view, can be presented by a different data model. In practice usually a given DBMS uses the same data model for both the external and the conceptual levels (e.g., relational model). The internal level, which is hidden inside the DBMS and depends on its implementation, requires a different level of detail and uses its own types of data structures.

==Research==

Database technology has been an active research topic since the 1960s, both in [[academia]] and in the research and development groups of companies (for example [[IBM Research]]). Research activity includes [[Database theory|theory]] and development of [[prototype]]s. Notable research topics have included [[Data model|models]], the atomic transaction concept, related [[concurrency control]] techniques, query languages and [[query optimization]] methods, [[RAID]], and more.

The database research area has several dedicated [[academic journal]]s (for example, ''[[ACM Transactions on Database Systems]]''-TODS, ''[[Data and Knowledge Engineering]]''-DKE) and annual [[Academic conference|conferences]] (e.g., [[Association for Computing Machinery|ACM]] [[SIGMOD]], ACM [[Symposium on Principles of Database Systems|PODS]], [[VLDB conference|VLDB]], [[IEEE]] ICDE).
 
==See also==
{{for outline|Outline of databases}}
{{Wikipedia books|Databases}}
{{Div col|colwidth=18em}}
* [[Comparison of database tools]]
* [[Comparison of object database management systems]]
* [[Comparison of object–relational database management systems]]
* [[Comparison of relational database management systems]]
* [[Data hierarchy]]
* [[Data bank]]
* [[Data store]]
* [[Database theory]]
* [[Database testing]]
* [[Database-centric architecture]]
* [[Datalog]]
* [[Database-as-IPC]]
* [[DBOS]]
* [[Flat-file database]]
* [[INP (database)]]
* [[Journal of Database Management]]
* Question-focused dataset
{{Div col end}}
 
==Notes==
{{notelist}}
 
==References==
{{Reflist|colwidth=30em}}
 
==Sources==
 
{{refbegin|30em}}
* {{cite journal
| first1 = Charles W.
| last1 = Bachman
| author1-link = Charles Bachman
| title = The Programmer as Navigator
| date = 1973
| volume = 16
| issue = 11
| pages = 653–658
| journal = Communications of the ACM
| doi = 10.1145/355611.362534
| doi-access= free
}}
* {{cite book
| last1 = Beynon-Davies
| first1 = Paul
| date = 2003
| title = Database Systems
| edition = 3rd
| publisher = Palgrave Macmillan
| isbn = 978-1403916013
}}
* {{cite web
|last1 = Chapple
|first1 = Mike
|title = SQL Fundamentals
|date = 2005
|work = Databases
|publisher = About.com
|url = http://databases.about.com/od/sql/a/sqlfundamentals.htm
|access-date = 28 January 2009
|archive-url = https://web.archive.org/web/20090222225300/http://databases.about.com/od/sql/a/sqlfundamentals.htm
|archive-date = 22 February 2009
|url-status = live
}}
* {{cite tech report
| url = https://deepblue.lib.umich.edu/bitstream/handle/2027.42/4163/bac0294.0001.001.pdf?sequence=5&isAllowed=y
| title = Description of a set-theoretic data structure
| first1 = David L.
| last1 = Childs
| date = 1968a
| id = Technical Report 3
| series = CONCOMP (Research in Conversational Use of Computers) Project
| publisher = University of Michigan
| author-link = David L. Childs
}}
* {{cite tech report
| url = https://deepblue.lib.umich.edu/bitstream/handle/2027.42/4164/bac0293.0001.001.pdf?sequence=5&isAllowed=y
| title = Feasibility of a set-theoretic data structure: a general structure based on a reconstituted definition
| first1 = David L.
| last1 = Childs
| date = 1968b
| id = Technical Report 6
| series = CONCOMP (Research in Conversational Use of Computers) Project
| publisher = University of Michigan
| author-link = David L. Childs
}}
* {{cite book
|first1 = Raul F.
|last1 = Chong
|first2 = Xiaomei
|last2 = Wang
|first3 = Michael
|last3 = Dang
|first4 = Dwaine R.
|last4 = Snow
|chapter = Introduction to DB2
|chapter-url = http://www.ibmpressbooks.com/articles/article.asp?p=1163083
|title = Understanding DB2: Learning Visually with Examples
|edition = 2nd
|year = 2007
|publisher = IBM Press Pearson plc
|isbn = 978-0131580183
|access-date = 17 March 2013
}}
* {{cite journal
|last1 = Codd
|first1 = Edgar F.
|author1-link = Edgar F. Codd
|date = 1970
|url = http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
|title = A Relational Model of Data for Large Shared Data Banks
|journal = Communications of the ACM
|volume = 13
|issue = 6
|pages = 377–387
|doi = 10.1145/362384.362685
|s2cid = 207549016
}}
* {{cite book
| last1 = Connolly
| first1 = Thomas M.
| last2 = Begg
| first2 = Carolyn E.
| date = 2014
| title = Database Systems – A Practical Approach to Design Implementation and Management
| edition = 6th
| isbn = 978-1292061184
| publisher = Pearson
}}
* {{cite book
|last1 = Date
|first1 = C. J.
|author1-link = Christopher J. Date
|title = An Introduction to Database Systems
|edition = 8th
|publisher = Pearson
|year = 2003
|isbn = 978-0321197849
|url-access = registration
|url = https://archive.org/details/introductiontoda0000date
}}
* {{cite journal
|first1 = Raju
|last1 = Halder
|first2 = Agostino
|last2 = Cortesi
|url = http://www.dsi.unive.it/~cortesi/paperi/CL2012.pdf
|doi = 10.1016/j.cl.2011.10.004
|title = Abstract Interpretation of Database Query Languages
|journal = Computer Languages, Systems & Structures
|volume = 38
|issue = 2
|year = 2011
|pages = 123–157
|issn = 1477-8424
|archive-date = 2024-11-23
|access-date = 2015-06-18
|archive-url = https://web.archive.org/web/20241123005344/https://www.dsi.unive.it/~cortesi/paperi/CL2012.pdf
|url-status = dead
}}
* {{cite conference
|first1 = William
|last1 = Hershey
|first2 = Carol
|last2 = Easthope
|url = https://docs.google.com/open?id=0B4t_NX-QeWDYNmVhYjAwMWMtYzc3ZS00YjI0LWJhMjgtZTYyODZmNmFkNThh
|title = A set theoretic data structure and retrieval language
|conference = Spring Joint Computer Conference, May 1972
|journal = ACM SIGIR Forum
|volume = 7
|issue = 4
|year = 1972
|pages = 45–55
|doi = 10.1145/1095495.1095500
}}
* {{cite book
| last1 = Nelson
| first1 = Anne Fulcher
| last2 = Nelson
| first2 = William Harris Morehead
| date = 2001
| title = Building Electronic Commerce: With Web Database Constructions
| publisher = Prentice Hall
| isbn = 978-0201741308
}}
* {{cite journal
|first1 = Ken
|last1 = North
|url = http://drdobbs.com/blogs/database/228700616
|title = Sets, Data Models and Data Independence
|journal = Dr. Dobb's
|date = 10 March 2010
|archive-url = https://web.archive.org/web/20121024064523/http://www.drdobbs.com/database/sets-data-models-and-data-independence/228700616
|archive-date = 24 October 2012
|url-status = live
}}
* {{cite book
|last1 = Tsitchizris
|first1 = Dionysios C.
|last2 = Lochovsky
|first2 = Fred H.
|date = 1982
|title = Data Models
|isbn = 978-0131964280
|publisher = Prentice–Hall
|url = https://archive.org/details/datamodels00tsic
}}
* {{cite book
|first1 = Jeffrey
|last1 = Ullman
|first2 = Jennifer
|last2 = Widom
|date = 1997
|title = A First Course in Database Systems
|publisher = Prentice–Hall
|isbn = 978-0138613372
|url = https://archive.org/details/firstcourseindat00ullm
}}
* {{Citation
| title = SQL/XML:2006 – Evaluierung der Standardkonformität ausgewählter Datenbanksysteme
| last1 = Wagner
| first1 = Michael
| year = 2010
| publisher = Diplomica Verlag
| isbn = 978-3836696098
}}
{{refend}}
 
==Further reading==
{{refbegin}}
* [[Ling Liu (computer scientist)|Ling Liu]] and Tamer M. Özsu (Eds.) (2009). ''[https://www.springer.com/computer/database+management+&+information+retrieval/book/978-0-387-49616-0 Encyclopedia of Database Systems]'', 4100 p., 60 illus. {{ISBN|978-0-387-49616-0}}.
* Beynon-Davies, P. (2004). Database Systems. 3rd Edition. Palgrave, Houndmills, Basingstoke.
* Connolly, Thomas and Carolyn Begg. ''Database Systems.'' New York: Harlow, 2002.
* Gray, J. and Reuter, A. ''Transaction Processing: Concepts and Techniques'', 1st edition, Morgan Kaufmann Publishers, 1992.
* Kroenke, David M. and David J. Auer. ''Database Concepts.'' 3rd ed. New York: Prentice, 2007.
* [[Raghu Ramakrishnan]] and [[Johannes Gehrke]], ''[http://pages.cs.wisc.edu/~dbbook/ Database Management Systems]''.
* {{cite book|last1=Lightstone |first1=S. |first2=T. |last2=Teorey |first3=T. |last3=Nadeau |title=Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more |publisher=Morgan Kaufmann Press |year=2007 |isbn=0-12-369389-6 |ref=harv}}
* [[Abraham Silberschatz]], [[Henry F. Korth]], S. Sudarshan, ''[http://www.db-book.com/ Database System Concepts]''.
* Teorey, T.; Lightstone, S. and Nadeau, T. ''Database Modeling & Design: Logical Design'', 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5
{{refend}}
* ''[https://www.youtube.com/@CMUDatabaseGroup/playlists CMU Database courses playlist]''
* ''[https://ocw.mit.edu/courses/6-830-database-systems-fall-2010/ MIT OCW 6.830 | Fall 2010 | Database Systems]''
* ''[https://cs186berkeley.net Berkeley CS W186]''
 
==External links==
*[http://www.fileextension.org/DB DB File extension]&nbsp;– information about files with the DB extension
{{Sister bar|wikt=database|commons=Category:Database|v=Topic:Databases}}
*{{dmoz|Computers/Data_Formats/Database}}
 
{{Navboxes
|list=
{{Computer science}}
{{Database}}
{{Databases}}
{{Database models}}
{{Data warehouse}}
{{Semantic Web}}
}}
{{Authority control}}
 
{{DEFAULTSORT:Database}}
[[Category:Databases| ]]
[[Category:Database management systems| ]]
[[Category:Database theory]]
 
[[af:Databasis]]
[[ar:قاعدة بيانات]]
[[an:Base de datos]]
[[az:Verilənlər bazası]]
[[bn:ডেটাবেজ]]
[[be:База дадзеных]]
[[be-x-old:База зьвестак]]
[[bar:Datenbank]]
[[bs:Baza podataka]]
[[br:Stlennvon]]
[[bg:База данни]]
[[ca:Base de dades]]
[[cs:Databáze]]
[[da:Database]]
[[de:Datenbank]]
[[et:Andmebaas]]
[[el:Βάση δεδομένων]]
[[es:Base de datos]]
[[eo:Datumbazo]]
[[eu:Datu-base]]
[[fa:پایگاه داده‌ها]]
[[fr:Base de données]]
[[ga:Bunachar sonraí]]
[[gl:Base de datos]]
[[ko:데이터베이스]]
[[hy:Տվյալների բազա]]
[[hi:डेटाबेस]]
[[hr:Baza podataka]]
[[id:Basis data]]
[[ia:Base de datos]]
[[is:Gagnagrunnur]]
[[it:Database]]
[[he:בסיס נתונים]]
[[jv:Basis data]]
[[ka:მონაცემთა ბაზა]]
[[kk:Мәліметтер базасы]]
[[ku:Danegeh]]
[[lv:Datubāze]]
[[lt:Duomenų bazė]]
[[hu:Adatbázis]]
[[ml:ഡാറ്റാബേസ്]]
[[arz:قاعدة بيانات]]
[[ms:Pangkalan data]]
[[my:ဒေတာဘေ့စ်]]
[[nl:Database]]
[[ja:データベース]]
[[no:Database]]
[[mhr:Ыҥпалыпого]]
[[uz:Ma'lumotlar Bazasi]]
[[ps:توكبنسټ]]
[[pl:Baza danych]]
[[pt:Banco de dados]]
[[ro:Bază de date]]
[[ru:База данных]]
[[sq:Baza e të dhënave]]
[[scn:Databbasi]]
[[si:දත්ත සංචිතය]]
[[simple:Database]]
[[sk:Databáza]]
[[sl:Podatkovna baza]]
[[ckb:بنکەدراوە]]
[[sr:База података]]
[[sh:Baza podataka]]
[[fi:Tietokanta]]
[[sv:Databas]]
[[tl:Database]]
[[ta:தரவுத்தளம்]]
[[th:ฐานข้อมูล]]
[[tr:Veri tabanı]]
[[uk:База даних]]
[[ur:Database]]
[[vi:Cơ sở dữ liệu]]
[[wa:Båze di dnêyes]]
[[zh:数据库]]