Concurrent data structure

This is an old revision of this page, as edited by 84.108.77.32 (talk) at 14:38, 24 November 2009 (Design and Implementation). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science, a concurrent data structure is a particular way of storing and organizing data for access by multiple computing threads (or processes) on a computer.

Historically, such data structures were used on uniprocessor machines with operating systems that supported multiple computing threads (or processes). The term concurrency captured the multiplexing/interleaving of the threads' operations on the data by the operating system, even though the processors never issued two operations that accessed the data simultaneously.

Today, as multiprocessor computer architectures that provide parallelism become the dominant computing platform (through the proliferation of multi-core processors), the term has come to stand mainly for data structures that can be accessed by multiple threads which may actually access the data simultaneously because they run on different processors that communicate with one another. The concurrent data structure (sometimes also called a shared data structure) is usually considered to reside in an abstract storage environment called shared memory, though this memory may be physically implemented as either a "tightly coupled" or a distributed collection of storage modules.

Basic principles

Concurrent data structures, intended for use in parallel or distributed computing environments, differ from "sequential" data structures, intended for use on a uniprocessor machine, in several ways) [1]. Most notably, in a sequential environment one specifies the data structure's properties and checks that they are implemented correctly, by providing safety properties. In a concurrent environment, the specification must also describe liveness properties which an implementation must provide. Safety properties usually state that something bad never happens, while liveness properties state that something good keeps happening. These properties can be expressed, for example, using Linear Temporal Logic.

The type of liveness requirements tend to define the data structure. The method calls can be blocking or non-blocking (see non-blocking synchronization). Data structures are not restricted to one type or the other, and can allow combinations where some method calls are blocking and others are non-blocking (examples can be found in the Java concurrency software library).

In order to guarantee the safety and liveness properties, concurrent data structures must typically (though not always) allow threads to reach consensus as to the results of their simultaneous data access and modification requests. To support such agreement, concurrent data structures are implemented using special primitive synchronization operations (see synchronization primitives) available on modern multiprocessor machines that allow multiple threads to reach consensus. There is a wide body of theory on the design of concurrent data structures (see bibliographical references).

Design and Implementation

Concurrent data structures are significantly more difficult to design and to verify as correct than their sequential counterparts.

The primary source of this additional difficulty is concurrency, exacerbated by the fact that threads must be thought of as being completely asynchronous: they are subject to operating system preemption, page faults, interrupts, and so on.

On today's machines, the layout of processors and memory, the layout of data in memory, the communication load on the various elements of the multiprocessor architecture all influence performance. Furthermore, there is a tension between correctness and performance: algorithmic enhancements that seek to improve performance often make it more difficult to design and verify a correct data structure implementation.

A key measure for performance is scalability, captured by the the speedup of the implementation. Speedup is a measure of how effectively the application is utilizing the machine it is running on. On a machine with P processors, the speedup is the ratio of the structures execution time on a single processor to its execution time on T processors. Ideally, we want linear speedup: we would like to achieve a speedup of P when using P processors. Data structures whose speedup grows with P are called scalable.

References

  1. ^ Mark Moir and Nir Shavit (2007). "Concurrent Data Structures". In Dinesh Metha and Sartaj Sahni (ed.). 'Handbook of Data Structures and Applications' (1st ed.). Chapman and Hall/CRC Press. pp. 47-14 – 47-30.
  • Nancy Lynch "Distributed Computing"
  • Hagit Attiya and Jennifer Welch "Distributed Computing: Fundamentals, Simulations And Advanced Topics, 2nd Ed"
  • Doug Lea, "Concurrent Programming in Java: Design Principles and Patterns"
  • Maurice Herlihy and Nir Shavit, "The Art of Multiprocessor Programming"
  • Mattson, Sanders, and Massingil "Patterns for Parallel Programming"