Synchronization (computer science): Difference between revisions

Content deleted Content added
Undid revision 1174997895 by 209.27.51.122 (talk)
Line 2:
{{Use American English|date=October 2020}}
{{More citations needed|date=November 2014}}
{{Main articledistinguish|Data synchronization}}
In [[computer science]], '''synchronization''' refers to one of two distinct but related concepts: synchronization of [[Process (computer science)|processes]], and synchronization of [[Dataset|data]]. ''Process synchronization'' refers to the idea that multiple processes are to join up or [[Handshake (computing)|handshake]] at a certain point, in order to reach an agreement or commit to a certain sequence of action. ''[[Data synchronization]]'' refers to the idea of keeping multiple copies of a dataset in coherence with one another, or to maintain [[data integrity]]. Process synchronization primitives are commonly used to implement data synchronization.
 
In [[computer science]], '''synchronization'''is the task of coordinating multiple of [[Process (computer science)|processes]] to join up or [[Handshake (computing)|handshake]] at a certain point, in order to reach an agreement or commit to a certain sequence of action.
==The need for synchronization==
 
==Motivation==
The need for synchronization does not arise merely in multi-processor systems but for any kind of concurrent processes; even in single processor systems. Mentioned below are some of the main needs for synchronization:
 
Line 13 ⟶ 15:
''Exclusive use resources:'' When multiple processes are dependent on a resource and they need to access it at the same time, the operating system needs to ensure that only one processor accesses it at a given point in time. This reduces concurrency.
 
=={{Anchor|TSync}}Thread or process synchronizationRequirements==
[[File:Multiple Processes Accessing the shared resource.png|thumb|'''Figure 1''': Three processes accessing a shared resource ([[critical section]]) simultaneously.]]
 
Line 39 ⟶ 41:
* [[busy waiting]], which occurs when a process frequently polls to determine if it has access to a critical section. This frequent polling robs processing time from other processes.
 
==Minimization==
==Minimizing synchronization==
One of the challenges for exascale algorithm design is to minimize or reduce synchronization.
Synchronization takes more time than computation, especially in distributed computing. Reducing synchronization drew attention from computer scientists for decades. Whereas it becomes an increasingly significant problem recently as the gap between the improvement of computing and latency increases. Experiments have shown that (global) communications due to synchronization on distributed computers takes a dominated share in a sparse iterative solver.<ref>
Line 59 ⟶ 61:
</ref> for ranking the top 500 supercomputers.
 
===Classic problems of synchronization===
The following are some classic problems of synchronization:
* [[Producer–consumer problem|The Producer–Consumer Problem]] (also called The Bounded Buffer Problem);
Line 67 ⟶ 69:
These problems are used to test nearly every newly proposed synchronization scheme or primitive.
 
===Hardware synchronization===
Many systems provide hardware support for [[critical section]] code.
 
Line 73 ⟶ 75:
"The key ability we require to implement synchronization in a multiprocessor is a set of hardware primitives with the ability to atomically read and modify a memory ___location. Without such a capability, the cost of building basic synchronization primitives will be too high and will increase as the processor count increases. There are a number of alternative formulations of the basic hardware primitives, all of which provide the ability to atomically read and modify a ___location, together with some way to tell if the read and write were performed atomically. These hardware primitives are the basic building blocks that are used to build a wide variety of user-level synchronization operations, including things such as [[Lock (computer science)|locks]] and [[Barrier (computer science)|barriers]]. In general, architects do not expect users to employ the basic hardware primitives, but instead expect that the primitives will be used by system programmers to build a synchronization library, a process that is often complex and tricky."<ref name="Morgan2011">{{cite book|last1=Hennessy|first1=John L.|last2=Patterson|first2=David A.|title=Computer Architecture: A Quantitative Approach|date=September 30, 2011|publisher=Morgan Kaufmann|isbn=978-0-123-83872-8|edition=Fifth|chapter=Chapter 5: Thread-Level Parallelism}}</ref> Many modern hardware provides special atomic hardware instructions by either [[test-and-set]] the memory word or [[compare-and-swap]] contents of two memory words.
 
===Synchronization strategiesSupport in programming languages===
In [[Java (programming language)|Java]], to prevent thread interference and memory consistency errors, blocks of code are wrapped into '''synchronized''' ''(lock_object)'' sections. This forces any thread to acquire the said lock object before it can execute the block. The lock is automatically released when the thread which acquired the lock, and is then executing the block, leaves the block or enters the waiting state within the block. Any variable updates, made by a thread in a synchronized block, become visible to other threads when they similarly acquire the lock and execute the block.
 
Line 82 ⟶ 84:
The [[.NET Framework]] has synchronization primitives. "Synchronization is designed to be cooperative, demanding that every thread or process follow the synchronization mechanism before accessing protected resources (critical section) for consistent results." In .NET, locking, signaling, lightweight synchronization types, spinwait and interlocked operations are some of mechanisms related to synchronization.<ref>{{cite web|title=Synchronization Primitives in .NET framework|url=http://msdn.microsoft.com/en-us/library/ms228964%28v=vs.110%29.aspx|website=MSDN, The Microsoft Developer Network|publisher=Microsoft|access-date=23 November 2014}}</ref>
 
===Implementation of Synchronization===
 
====Spinlock====
{{Main article|Spinlock}}
Another effective way of implementing synchronization is by using spinlocks. Before accessing any shared resource or piece of code, every processor checks a flag. If the flag is reset, then the processor sets the flag and continues executing the thread. But, if the flag is set (locked), the threads would keep spinning in a loop and keep checking if the flag is set or not. But, spinlocks are effective only if the flag is reset for lower cycles otherwise it can lead to performance issues as it wastes many processor cycles waiting.<ref>{{Cite book|title=Embedded Software Development with ECos|last=Massa|first=Anthony|publisher=Pearson Education Inc|year=2003|isbn=0-13-035473-2}}</ref>
 
====Barriers====
{{Main article|Barrier (computer science)}}
Barriers are simple to implement and provide good responsiveness. They are based on the concept of implementing wait cycles to provide synchronization. Consider three threads running simultaneously, starting from barrier 1. After time t, thread1 reaches barrier 2 but it still has to wait for thread 2 and 3 to reach barrier2 as it does not have the correct data. Once all the threads reach barrier 2 they all start again. After time t, thread 1 reaches barrier3 but it will have to wait for threads 2 and 3 and the correct data again.
Line 102 ⟶ 104:
Experiments show that 34% of the total execution time is spent in waiting for other slower threads.<ref name=":0" />
 
====Semaphores====
{{Main article|Semaphore (programming)}}
Semaphores are signalling mechanisms which can allow one or more threads/processors to access a section. A Semaphore has a flag which has a certain fixed value associated with it and each time a thread wishes to access the section, it decrements the flag. Similarly, when the thread leaves the section, the flag is incremented. If the flag is zero, the thread cannot access the section and gets blocked if it chooses to wait.
Line 108 ⟶ 110:
Some semaphores would allow only one thread or process in the code section. Such Semaphores are called binary semaphore and are very similar to Mutex. Here, if the value of semaphore is 1, the thread is allowed to access and if the value is 0, the access is denied.<ref>{{Cite book|title=Real-Time Concepts for Embedded Systems|last=Li, Yao|first=Qing, Carolyn|publisher=CMP Books|year=2003|isbn=978-1578201242}}</ref>
 
===Mathematical foundations===
Synchronization was originally a process-based concept whereby a lock could be obtained on an object. Its primary usage was in databases. There are two types of (file) [[File locking|lock]]; read-only and read–write. Read-only locks may be obtained by many processes or threads. Readers–writer locks are exclusive, as they may only be used by a single process/thread at a time.
 
Line 117 ⟶ 119:
An abstract mathematical foundation for synchronization primitives is given by the [[history monoid]]. There are also many higher-level theoretical devices, such as [[process calculi]] and [[Petri net]]s, which can be built on top of the history monoid.
 
==Examples include:==
===Synchronization examples===
Following are some synchronization examples with respect to different platforms.<ref name="Wiley2012">{{cite book|last1=Silberschatz|first1=Abraham|last2=Gagne|first2=Greg|last3=Galvin|first3=Peter Baer|title=Operating System Concepts|date=December 7, 2012|publisher=John Wiley & Sons.|isbn=978-1-118-06333-0|edition=Ninth|chapter=Chapter 5: Process Synchronization}}</ref>
 
====Synchronization inIn Windows====
[[Windows]] provides:
* [[interrupt|interrupt masks]], which protect access to global resources (critical section) on uniprocessor systems;
Line 126 ⟶ 128:
* [[dynamic dispatch]]ers{{citation needed|date=June 2022}}, which act like [[mutual exclusion|mutex]]es, [[Semaphore (programming)|semaphores]], [[Event (computing)|event]]s, and [[timer]]s.
 
====Synchronization inIn Linux====
[[Linux]] provides:
* [[semaphore (programming)|semaphores]];
Line 137 ⟶ 139:
Enabling and disabling of kernel preemption replaced spinlocks on uniprocessor systems. Prior to kernel version 2.6, [[Linux]] disabled interrupt to implement short critical sections. Since version 2.6 and later, Linux is fully preemptive.
 
====Synchronization inIn Solaris====
[[Solaris (operating system)|Solaris]] provides:
* [[Semaphore (programming)|semaphores]];
Line 145 ⟶ 147:
* [[turnstiles]], queue of threads which are waiting on acquired lock.<ref>{{cite web|url=http://sunsite.uakom.sk/sunworldonline/swol-08-1999/swol-08-insidesolaris.html|title=Turnstiles and priority inheritance - SunWorld - August 1999|first=Jim|last=Mauro|website=sunsite.uakom.sk}}</ref>
 
====PthreadsIn synchronization=Pthreads===
[[Pthreads]] is a platform-independent [[API]] that provides:
* mutexes;
Line 152 ⟶ 154:
* spinlocks;
* [[barrier (computer science)|barrier]]s.
 
==Data synchronization==
{{Main article|Data synchronization}}
[[File:Data Synchronization.png|thumb|'''Figure 3: '''Changes from both server and client(s) are synchronized.]]
 
A distinctly different (but related) concept is that of [[data synchronization]]. This refers to the need to update and keep multiple copies of a set of data coherent with one another or to maintain [[data integrity]], Figure 3.<ref>{{Cite journal |last1=Nakatani |first1=Kazuo |last2=Chuang |first2=Ta-Tao |last3=Zhou |first3=Duanning |date=2006 |title=Data Synchronization Technology: Standards, Business Values and Implications |url=http://dx.doi.org/10.17705/1cais.01744 |journal=Communications of the Association for Information Systems |volume=17 |doi=10.17705/1cais.01744 |issn=1529-3181|doi-access=free }}</ref> For example, database replication is used to keep multiple copies of data synchronized with database servers that store data in different locations.
 
Examples include:
* [[File synchronization]], such as syncing a hand-held MP3 player to a desktop computer;
* [[Cluster file system]]s, which are [[file system]]s that maintain data or indexes in a coherent fashion across a whole [[computing cluster]];
* [[Cache coherency]], maintaining multiple copies of data in sync across multiple [[cache (computing)|cache]]s;
* [[RAID]], where data is written in a redundant fashion across multiple disks, so that the loss of any one disk does not lead to a loss of data;
* [[Database replication]], where copies of data on a [[database]] are kept in sync, despite possible large geographical separation;
* [[Journaling file system|Journaling]], a technique used by many modern file systems to make sure that file metadata are updated on a disk in a coherent, consistent manner.
 
===Challenges in data synchronization===
Some of the challenges which user may face in data synchronization:
* data formats complexity;
* real-timeliness;
* data security;
* data quality;
* performance.
 
====Data formats complexity====
Data formats tend to grow more complex with time as the organization grows and evolves. This results not only in building simple interfaces between the two applications (source and target), but also in a need to transform the data while passing them to the target application. [[Extract, transform, load|ETL]] (extraction transformation loading) tools can be helpful at this stage for managing data format complexities.
 
====Real-timeliness====
In real-time systems, customers want to see the current status of their order in e-shop, the current status of a parcel delivery—a real time parcel tracking—, the current balance on their account, etc. This shows the need of a real-time system, which is being updated as well to enable smooth manufacturing process in real-time, e.g., ordering material when enterprise is running out stock, synchronizing customer orders with manufacturing process, etc. From real life, there exist so many examples where real-time processing gives successful and competitive advantage.
 
====Data security====
There are no fixed rules and policies to enforce data security. It may vary depending on the system which you are using. Even though the security is maintained correctly in the source system which captures the data, the security and information access privileges must be enforced on the target systems as well to prevent any potential misuse of the information. This is a serious issue and particularly when it comes for handling secret, confidential and personal information. So because of the sensitivity and confidentiality, data transfer and all in-between information must be encrypted.
 
====Data quality====
Data quality is another serious constraint. For better management and to maintain good quality of data, the common practice is to store the data at one ___location and share with different people and different systems and/or applications from different locations. It helps in preventing inconsistencies in the data.
 
====Performance====
There are five different phases involved in the data synchronization process:
* [[data extraction]] from the source (or master, or main) system;
* [[data transfer]];
* [[data transformation]];
* data load to the target system.
* data updation
 
Each of these steps is critical. In case of large amounts of data, the synchronization process needs to be carefully planned and executed to avoid any negative impact on performance.
 
==See also==