* ''Synchronization costs''. As thread context switch on modern CPUs can cost up to 1 million CPU cycles,<ref>{{Cite web |author='No Bugs' Hare |date=12 September 2016 |title=Operation Costs in CPU Clock Cycles |url=http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/}}</ref> it makes writing efficient multithreading programs difficult. In particular, special attention has to be paid to avoid inter-thread synchronization from being too frequent.
*
==Programming language support==
Many programming languages support threading in some capacity.
* IBM [[PL/I]](F) included support for multithreading (called ''multitasking'') as early as in the late 1960s, and this was continued in the Optimizing Compiler and later versions. The IBM Enterprise PL/I compiler introduced a new model "thread" API. Neither version was part of the PL/I standard.
* Many implementations of [[C (programming language)|C]] and [[C++]] support threading, and provide access to the native threading APIs of the operating system. A standardized interface for thread implementation is [[POSIX Threads]] (Pthreads), which is a set of C-function library calls. OS vendors are free to implement the interface as desired, but the application developer should be able to use the same interface across multiple platforms. Most [[Unix]] platforms, including Linux, support Pthreads. Microsoft Windows has its own set of thread functions in the [[process.h]] interface for multithreading, like [[beginthread]].
* Some [[high-level programming language|higher level]] (and usually [[cross-platform]]) programming languages, such as [[Java (programming language)|Java]], [[Python (programming language)|Python]], and [[.NET Framework]] languages, expose threading to developers while abstracting the platform specific differences in threading implementations in the runtime. Several other programming languages and language extensions also try to abstract the concept of concurrency and threading from the developer fully ([[Cilk]], [[OpenMP]], [[Message Passing Interface]] (MPI)). Some languages are designed for sequential parallelism instead (especially using GPUs), without requiring concurrency or threads ([[Ateji PX]], [[CUDA]]).
* A few interpreted programming languages have implementations (e.g., [[Ruby MRI]] for Ruby, [[CPython]] for Python) which support threading and concurrency but not parallel execution of threads, due to a [[global interpreter lock]] (GIL). The GIL is a mutual exclusion lock held by the interpreter that can prevent the interpreter from simultaneously interpreting the applications code on two or more threads at once, which effectively limits the parallelism on multiple core systems. This limits performance mostly for processor-bound threads, which require the processor, and not much for I/O-bound or network-bound ones. Other implementations of interpreted programming languages, such as [[Tcl]] using the Thread extension, avoid the GIL limit by using an Apartment model where data and code must be explicitly "shared" between threads. In Tcl each thread has one or more interpreters.
* In programming models such as [[CUDA]] designed for [[data parallel computation]], an array of threads run [[compute kernel|the same code]] in parallel using only its ID to find its data in memory. In essence, the application must be designed so that each thread performs the same operation on different segments of memory so that they can operate in parallel and use the GPU architecture.
* [[Hardware description language]]s such as [[Verilog]] have a different threading model that supports extremely large numbers of threads (for modeling hardware).
== See also ==
|