{{short description|Ability of a CPU to provide multiple threads of execution concurrently}}
{{Multiple issues|
{{Refimprove|date=October 2009}}
}}
[[File:Multithreaded process.svg|thumb|A process with two threads of execution, running on a single processor]]
In [[computer architecture]], '''multithreading''' is the ability of a [[central processing unit]] (CPU) (or a single core in a [[multi-core processor]]) to provide multiple [[thread (computer science)|threads of execution]] concurrently.
==Overview==
Multiple threads can interfere with each other when sharing hardware resources such as caches or [[translation lookaside buffer]]s (TLBs). As a result, execution times of a single thread are not improved and can be degraded, even when only one thread is executing, due to lower frequencies or additional pipeline stages that are necessary to accommodate thread-switching hardware.
Overall efficiency varies; [[Intel]] claims up to 30% improvement with its [[Hyper-Threading Technology]],<ref>{{cite web|url=http://cache-www.intel.com/cd/00/00/01/77/17705_htt_user_guide.pdf|title=Intel Hyper-Threading Technology, Technical User's Guide|page=13|archive-url=https://web.archive.org/web/20100821074918/http://cache-www.intel.com/cd/00/00/01/77/17705_htt_user_guide.pdf|archive-date=2010-08-21}}</ref> while a synthetic program just performing a loop of non-optimized dependent floating-point operations actually gains a 100% speed improvement when run in parallel. On the other hand, hand-tuned [[assembly language]] programs using [[MMX (instruction set)|MMX]] or [[AltiVec]] extensions and performing data prefetches (as a good video encoder might) do not suffer from cache misses or idle computing resources. Such programs therefore do not benefit from hardware multithreading and can indeed see degraded performance due to contention for shared resources.
Hardware support for multithreading is more visible to software than multiprocessing is, requiring more changes to both application programs and operating systems. Hardware techniques used to support [[thread (computer science)|multithreading]] often parallel the software techniques used for [[multitasking of computer programs|computer multitasking]]. Thread scheduling is also a major problem in multithreading.
Merging data from two processes often incurs significantly higher costs than processing the same data on a single thread, potentially by two or more orders of magnitude, due to overheads such as inter-process communication and synchronization.<ref>{{Cite book |title=Operating System Concepts |isbn=978-0470128725 |last1=Silberschatz |first1=Abraham |last2=Galvin |first2=Peter B. |last3=Gagne |first3=Greg |date=29 July 2008 |publisher=Wiley }}</ref><ref>{{Cite book |title=Computer Organization and Design MIPS Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design) |date=2013 |publisher=Morgan Kaufmann |isbn=978-0124077263}}</ref><ref>{{Cite book |title=Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers |date=2005 |publisher=Pearson |isbn=978-0131405639}}</ref>
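As a rough illustration of where such overhead can come from, the following C sketch (illustrative only; it is not taken from the cited sources) lets two POSIX threads merge partial sums into one shared total. The per-element locking is deliberately naive so that the synchronization cost stands out against the plain single-threaded loop over the same data.
<syntaxhighlight lang="c">
/* Minimal sketch (not from the cited sources): two POSIX threads sum halves
 * of an array into one shared total.  Taking the mutex for every element
 * deliberately makes the synchronization overhead visible; the
 * single-threaded loop over the same data pays none of that cost. */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
static long data[N];
static long shared_total = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

struct range { int begin, end; };

static void *sum_range(void *arg)
{
    struct range *r = arg;
    for (int i = r->begin; i < r->end; i++) {
        pthread_mutex_lock(&lock);    /* synchronization cost paid per element */
        shared_total += data[i];
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        data[i] = i;

    pthread_t t1, t2;
    struct range lo = { 0, N / 2 }, hi = { N / 2, N };
    pthread_create(&t1, NULL, sum_range, &lo);
    pthread_create(&t2, NULL, sum_range, &hi);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    long single = 0;                  /* same work, no synchronization */
    for (int i = 0; i < N; i++)
        single += data[i];

    printf("threaded=%ld single=%ld\n", shared_total, single);
    return 0;
}
</syntaxhighlight>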
==Types==
===Coarse-grained multithreading===
{{main|Temporal multithreading}}
The simplest type of multithreading occurs when one thread runs until it is blocked by an event that normally would create a long-latency stall. Such a stall might be a cache miss that has to access off-chip memory, which might take hundreds of CPU cycles for the data to return. Instead of waiting for the stall to resolve, a threaded processor would switch execution to another thread that was ready to run. Only when the data for the previous thread had arrived would the previous thread be placed back on the list of [[process state#Ready|ready-to-run]] threads.
For example:
# Cycle {{math|''i''}}: instruction {{math|''j''}} from thread {{mvar|A}} is issued.
# Cycle {{math|''i'' + 1}}: instruction {{math|''j'' + 1}} from thread {{mvar|A}} is issued.
# Cycle {{math|''i'' + 2}}: instruction {{math|''j'' + 2}} from thread {{mvar|A}} is issued, a load instruction that misses in the cache.
# Cycle {{math|''i'' + 3}}: the thread scheduler is invoked and switches to thread {{mvar|B}}.
# Cycle {{math|''i'' + 4}}: instruction {{math|''k''}} from thread {{mvar|B}} is issued.
# Cycle {{math|''i'' + 5}}: instruction {{math|''k'' + 1}} from thread {{mvar|B}} is issued.
Conceptually, it is similar to cooperative multi-tasking used in [[real-time operating system]]s, in which tasks voluntarily give up execution time when they need to wait upon some type of event.
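This switching policy can be modeled in a few lines of C. The sketch below is a hypothetical software model, not a description of real hardware: the simulated core keeps issuing instructions from one thread until that thread hits a long-latency miss, then hands execution to the other thread if it is ready.
<syntaxhighlight lang="c">
/* Toy software model (hypothetical, for illustration only) of block /
 * coarse-grained multithreading: the simulated core issues instructions from
 * one thread until that thread hits a long-latency stall (here, a "cache
 * miss"), then switches to the other thread if it is ready. */
#include <stdbool.h>
#include <stdio.h>

#define MISS_LATENCY 10               /* assumed stall length, in cycles */

struct hw_thread {
    const char *name;
    int pc;                           /* next "instruction" index */
    int stall_until;                  /* cycle when the pending miss resolves */
};

/* Assumption for the model: every 4th instruction is a load that misses. */
static bool misses_cache(int pc) { return pc % 4 == 3; }

int main(void)
{
    struct hw_thread threads[2] = { { "A", 0, 0 }, { "B", 0, 0 } };
    int current = 0;

    for (int cycle = 0; cycle < 30; cycle++) {
        struct hw_thread *t = &threads[current];
        if (cycle < t->stall_until) {          /* running thread is stalled */
            int other = 1 - current;           /* switch if the other is ready */
            if (cycle >= threads[other].stall_until)
                current = other;
            t = &threads[current];
            if (cycle < t->stall_until) {
                printf("cycle %2d: all threads stalled\n", cycle);
                continue;
            }
        }
        printf("cycle %2d: thread %s issues instruction %d\n",
               cycle, t->name, t->pc);
        if (misses_cache(t->pc))               /* long-latency event: block */
            t->stall_until = cycle + MISS_LATENCY;
        t->pc++;
    }
    return 0;
}
</syntaxhighlight>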
The goal of multithreading hardware support is to allow quick switching between a blocked thread and another thread ready to run. Switching from one thread to another means the hardware switches from using one register set to another. To achieve this goal, the hardware for the program-visible registers, as well as some processor control registers (such as the program counter), is replicated. For example, to quickly switch between two threads, the processor is built with two sets of registers.
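A minimal C sketch of this arrangement (the structure and field names are illustrative and do not describe any particular processor): the program-visible state exists once per hardware thread, so switching threads merely selects the other copy instead of saving and restoring registers through memory as a software context switch would.
<syntaxhighlight lang="c">
/* Illustrative sketch (names are hypothetical): hardware multithreading
 * replicates the program-visible register state per thread, so switching
 * threads only selects another copy of that state. */
#include <stdint.h>
#include <stdio.h>

struct regfile {
    uint32_t pc;          /* program counter, replicated per thread */
    uint32_t gpr[32];     /* general-purpose registers, replicated per thread */
};

struct core {
    struct regfile ctx[2];   /* one register set per hardware thread */
    int active;              /* which set the pipeline currently reads */
};

/* "Switching" is just re-pointing the pipeline at the other register set. */
static void switch_thread(struct core *c) { c->active ^= 1; }

int main(void)
{
    struct core c = { .active = 0 };
    c.ctx[0].pc = 0x1000;    /* thread 0 executing one program ... */
    c.ctx[1].pc = 0x2000;    /* ... thread 1 another */

    printf("running thread %d at pc=0x%x\n", c.active, (unsigned) c.ctx[c.active].pc);
    switch_thread(&c);       /* e.g. triggered by a cache miss */
    printf("running thread %d at pc=0x%x\n", c.active, (unsigned) c.ctx[c.active].pc);
    return 0;
}
</syntaxhighlight>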
Many families of [[microcontroller]]s and embedded processors have multiple register banks to allow quick [[context switch]]ing for interrupts. Such schemes can be considered a type of block multithreading among the user program thread and the interrupt threads.{{citation needed|date=October 2010}}
===Interleaved multithreading===
{{main|Barrel processor}}
The purpose of interleaved multithreading is to remove all [[data dependency]] stalls from the execution [[pipeline (computing)|pipeline]]. Since one thread is relatively independent from other threads, there is less chance of one instruction in one pipeline stage needing an output from an earlier instruction in the pipeline.
For example:
# Cycle {{math|''i''}}: an instruction from thread {{mvar|A}} is issued.
# Cycle {{math|''i'' + 1}}: an instruction from thread {{mvar|B}} is issued.
# Cycle {{math|''i'' + 2}}: an instruction from thread {{mvar|C}} is issued.
This type of multithreading was first called barrel processing, in which the [[stave (wood)|staves]] of a barrel represent the pipeline stages and their executing threads. Interleaved, preemptive, fine-grained or time-sliced multithreading are more modern terminology.
In addition to the hardware costs discussed in the block type of multithreading, interleaved multithreading has an additional cost of each pipeline stage tracking the thread ID of the instruction it is processing. Also, since there are more threads being executed concurrently in the pipeline, shared resources such as caches and TLBs need to be larger to avoid thrashing between the different threads.
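Both points can be seen in the following toy C model (illustrative only, not a description of real hardware): an instruction from a different thread is issued every cycle in round-robin order, and every in-flight instruction carries its thread ID so that each pipeline stage knows which thread it is processing.
<syntaxhighlight lang="c">
/* Toy model (hypothetical) of interleaved ("barrel") multithreading: each
 * cycle the issue stage takes an instruction from the next thread in
 * round-robin order, and every in-flight instruction carries the ID of the
 * thread it belongs to, since each pipeline stage must track it. */
#include <stdio.h>

#define THREADS 4
#define STAGES  3   /* issue -> execute -> writeback, for illustration */

struct pipe_slot {
    int valid;
    int thread_id;   /* per-stage bookkeeping required by interleaving */
    int instr;       /* index of the instruction within its thread */
};

int main(void)
{
    struct pipe_slot pipe[STAGES] = { { 0, 0, 0 } };
    int next_pc[THREADS] = { 0 };

    for (int cycle = 0; cycle < 8; cycle++) {
        /* Advance the pipeline: each instruction moves one stage forward. */
        for (int s = STAGES - 1; s > 0; s--)
            pipe[s] = pipe[s - 1];

        /* Round-robin issue: a different thread every cycle. */
        int tid = cycle % THREADS;
        pipe[0] = (struct pipe_slot){ .valid = 1, .thread_id = tid,
                                      .instr = next_pc[tid]++ };

        printf("cycle %d:", cycle);
        for (int s = 0; s < STAGES; s++)
            if (pipe[s].valid)
                printf("  stage%d=T%d.i%d", s, pipe[s].thread_id, pipe[s].instr);
        printf("\n");
    }
    return 0;
}
</syntaxhighlight>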
==See also==
*[[Async/await]]
*[[Super-threading]]
*[[Speculative multithreading]]