Database transaction schedule: Difference between revisions

 
=== Order of actions ===
Operations of transactions in a schedule can interleave (i.e., transactions can be executed [[Concurrency (computer science)|concurrently]]), but time orders between operations in each transaction must remain unchanged as implied by the transaction's program. The schedule is in ''[[partial order]]'' when the operations of transactions in a schedule interleave (i.e., when the schedule is conflict-serializable but not serial). The schedule is in ''[[total order]]'' when the operations of transactions in a schedule do not interleave (i.e., when the schedule is serial).
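
As a concrete illustration, the following sketch shows a serial and an interleaved schedule of two transactions and checks that the interleaved schedule still preserves each transaction's own program order. The tuple encoding of operations and the function name are assumptions made here for illustration only, not a standard notation.

<syntaxhighlight lang="python">
# Illustrative encoding: each operation is (transaction_id, action, data_item).

# Serial schedule: all of T1's operations precede all of T2's (no interleaving).
serial_schedule = [
    ("T1", "read",  "A"),
    ("T1", "write", "A"),
    ("T2", "read",  "A"),
    ("T2", "write", "A"),
]

# Interleaved schedule: operations of T1 and T2 alternate, but the order of
# operations *within* each transaction is unchanged.
interleaved_schedule = [
    ("T1", "read",  "A"),
    ("T2", "read",  "A"),
    ("T1", "write", "A"),
    ("T2", "write", "A"),
]

def preserves_program_order(schedule, programs):
    """Check that the schedule keeps each transaction's operations in the
    order given by that transaction's program."""
    seen = {tid: 0 for tid in programs}
    for tid, action, item in schedule:
        ops = programs[tid]
        if seen[tid] >= len(ops) or (action, item) != ops[seen[tid]]:
            return False
        seen[tid] += 1
    return all(seen[tid] == len(ops) for tid, ops in programs.items())

programs = {
    "T1": [("read", "A"), ("write", "A")],
    "T2": [("read", "A"), ("write", "A")],
}
print(preserves_program_order(interleaved_schedule, programs))  # True
</syntaxhighlight>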
 
Time-order between two operations can be represented by an ''[[ordered pair]]'' of these operations (e.g., the existence of a pair (OP1, OP2) means that OP1 is always before OP2), and a schedule in the general case is a [[set (mathematics)|set]] of such ordered pairs. Such a set, a schedule, is a [[partial order]] that can be represented by a ''[[directed acyclic graph]]'' (DAG) with operations as nodes and time-order as [[directed edge]]s (no cycles are allowed, since a cycle would mean that some operation on the cycle is both before and after another operation on the cycle, which contradicts our perception of [[time]]). In many cases, a graphical representation of such a graph is used to demonstrate a schedule.
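
For example, the following sketch represents a small schedule as a set of ordered pairs and verifies, with a depth-first search, that the resulting directed graph contains no cycle and is therefore a valid partial order. The operation names (e.g., <code>r1[A]</code> for "transaction 1 reads item A") are illustrative shorthand, not a fixed notation.

<syntaxhighlight lang="python">
from collections import defaultdict

# A schedule as a set of ordered pairs (time-order edges between operations).
schedule_pairs = {
    ("r1[A]", "w1[A]"),   # within T1: read A before write A
    ("r2[A]", "w2[A]"),   # within T2: read A before write A
    ("w1[A]", "r2[A]"),   # between transactions: T1's write before T2's read
}

def is_acyclic(pairs):
    """Return True if the directed graph given by the ordered pairs has no cycle."""
    graph = defaultdict(list)
    nodes = set()
    for a, b in pairs:
        graph[a].append(b)
        nodes.update((a, b))
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on current DFS path / finished
    color = {n: WHITE for n in nodes}

    def dfs(n):
        color[n] = GRAY
        for m in graph[n]:
            if color[m] == GRAY:          # back edge: a cycle exists
                return False
            if color[m] == WHITE and not dfs(m):
                return False
        color[n] = BLACK
        return True

    return all(dfs(n) for n in nodes if color[n] == WHITE)

print(is_acyclic(schedule_pairs))  # True: the schedule is a valid partial order
</syntaxhighlight>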
|Com.
|}
Serializability is used to keep the data items of a database in a consistent state. It is a property of a transaction schedule (history) and the major criterion for the correctness of a schedule of concurrent transactions, and it is therefore supported in all general-purpose database systems. Schedules that are not serializable are likely to generate erroneous outcomes, which can be extremely harmful (e.g., when dealing with money within banks).
 
If any specific order between some transactions is requested by an application, then it is enforced independently of the underlying serializability mechanisms. These mechanisms are typically indifferent to any specific order, and generate some unpredictable [[partial order]] that is typically compatible with multiple serial orders of these transactions. This partial order results from the scheduling orders of concurrent transactions' data access operations, which depend on many factors.
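
As an illustration of such compatibility, the following sketch enumerates all serial orders of three transactions that are consistent with a given transaction-level precedence relation; the relation shown is hypothetical and chosen only to demonstrate that more than one compatible serial order can exist.

<syntaxhighlight lang="python">
from itertools import permutations

# Hypothetical precedence: T1 must come before T3; T2 is unconstrained.
precedence = {("T1", "T3")}
transactions = ["T1", "T2", "T3"]

def compatible_serial_orders(txns, prec):
    """Yield every serial order (total order) consistent with the partial order."""
    for order in permutations(txns):
        position = {t: i for i, t in enumerate(order)}
        if all(position[a] < position[b] for a, b in prec):
            yield order

for order in compatible_serial_orders(transactions, precedence):
    print(order)
# ('T1', 'T2', 'T3')
# ('T1', 'T3', 'T2')
# ('T2', 'T1', 'T3')
</syntaxhighlight>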
 
Serializability is used in [[concurrency control]] of databases,<ref name="Bernstein872">[[Phil Bernstein|Philip A. Bernstein]], Vassos Hadzilacos, Nathan Goodman (1987): [http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx ''Concurrency Control and Recovery in Database Systems''] (free PDF download), Addison Wesley Publishing Company, {{ISBN|0-201-10715-5}}</ref><ref name="Weikum012">[[Gerhard Weikum]], Gottfried Vossen (2001): [http://www.elsevier.com/wps/find/bookdescription.cws_home/677937/description#description ''Transactional Information Systems''], Elsevier, {{ISBN|1-55860-508-8}}</ref> [[transaction processing]] (transaction management), and various [[Database transaction|transactional]] applications (e.g., [[transactional memory]]<ref name="Herlihy1993">[[Maurice Herlihy]] and J. Eliot B. Moss. ''Transactional memory: architectural support for lock-free data structures.'' Proceedings of the 20th annual international symposium on Computer architecture (ISCA '93). Volume 21, Issue 2, May 1993.</ref> and [[software transactional memory]]). Transactions are normally executed concurrently (they overlap), since this is the most efficient way. Serializability is considered the highest level of [[Isolation (database systems)|isolation]] between [[Database transaction|transactions]], and plays an essential role in [[concurrency control]]. As such, it is supported in all general-purpose database systems.
 
'''Serializability theory''' provides the formal framework to reason about and analyze serializability and its techniques. Though it is [[Mathematics|mathematical]] in nature, its fundamentals are introduced below informally, without mathematical notation.
 
====Conflicting actions====
 
==== Distributed serializability ====
'''Distributed serializability''' is the serializability of a schedule of a transactional [[Distributed computing|distributed system]] (e.g., a [[distributed database]] system). Such a system is characterized by ''[[distributed transaction]]s'' (also called ''global transactions''), i.e., transactions that span computer processes (a process abstraction in a general sense, depending on the computing environment; e.g., an [[operating system]]'s [[Thread (computer science)|thread]]) and possibly network nodes. A distributed transaction comprises two or more ''local sub-transactions'', each of which has states as described above for a [[Serializability#Database transaction|database transaction]]. A local sub-transaction comprises a single process, or several processes that typically fail together (e.g., in a single [[processor core]]). Distributed transactions imply a need for an [[atomic commit]] protocol to reach consensus among their local sub-transactions on whether to commit or abort. Such protocols can vary from a simple (one-phase) handshake among processes that fail together to more sophisticated protocols, like [[Two-phase commit protocol|two-phase commit]], that handle more complicated cases of failure (e.g., process, node, or communication failure). Distributed serializability is a major goal of [[distributed concurrency control]] for correctness. With the proliferation of the [[Internet]], [[cloud computing]], [[grid computing]], and small, portable, powerful computing devices (e.g., [[smartphone]]s), the need for effective distributed serializability techniques to ensure correctness in and among distributed applications seems to increase.
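
For illustration, the following sketch shows only the decision rule at the heart of a two-phase commit coordinator, with in-process stand-ins for the participants; the class and method names are assumptions made here, and real protocols additionally involve logging, timeouts, and recovery.

<syntaxhighlight lang="python">
class Participant:
    """Stand-in for a local sub-transaction on one node."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit

    def prepare(self):
        # Phase 1 (voting): vote YES only if the local sub-transaction can commit.
        return self.can_commit

    def commit(self):
        print(f"{self.name}: commit")

    def abort(self):
        print(f"{self.name}: abort")

def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every participant voted YES; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"

print(two_phase_commit([Participant("node-1"), Participant("node-2")]))                    # committed
print(two_phase_commit([Participant("node-1"), Participant("node-2", can_commit=False)]))  # aborted
</syntaxhighlight>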
 
Distributed serializability is achieved by implementing distributed versions of the known centralized techniques.<ref name="Bernstein872"/><ref name="Weikum012"/> Typically, all such distributed versions require utilizing conflict information (of either materialized or non-materialized conflicts or, equivalently, transaction precedence or blocking information; conflict serializability is usually utilized) that is not generated locally, but rather in different processes and at remote locations. Thus, information distribution is needed (e.g., of precedence relations, lock information, timestamps, or tickets). When the distributed system is of relatively small scale and message delays across the system are small, the centralized concurrency control methods can be used unchanged, with certain processes or nodes in the system managing the related algorithms. However, in a large-scale system (e.g., ''grid'' and ''cloud''), because such information is distributed, a substantial performance penalty is typically incurred, even when distributed versions of the methods (vs. the centralized ones) are used, primarily due to computer and communication [[Latency (engineering)|latency]]. Also, when such information is distributed, the related techniques typically do not scale well. A well-known example with respect to scalability problems is a [[distributed lock manager]], which distributes lock (non-materialized conflict) information across the distributed system to implement locking techniques.
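
As a rough illustration of how such lock information can be partitioned, the following sketch assigns each lockable item to the node that manages its lock queue; the node names and the hashing scheme are assumptions made here and do not describe any particular lock manager.

<syntaxhighlight lang="python">
import hashlib

NODES = ["node-0", "node-1", "node-2"]

def lock_manager_for(item):
    """Map a lockable item to the node responsible for its lock queue."""
    digest = hashlib.sha1(item.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# A lock request for this item is sent to the returned node, which serializes
# conflicting requests for the item.
print(lock_manager_for("accounts/42"))
</syntaxhighlight>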