'''Explicit Multi-Threading''' ('''XMT''') is a computer science paradigm for building and programming parallel computers designed around the [[parallel random-access machine]] (PRAM) parallel computational model.
The [[random-access machine]] (RAM) is an abstract machine model used in computer science to study algorithms and complexity for standard serial computing. The PRAM is its parallel counterpart: an abstract machine model introduced to study parallel algorithms and complexity before such machines could be built.
This large body of parallel algorithms knowledge for the PRAM model, and their relative simplicity, motivated building computers whose programming can be guided by these parallel algorithms. Since the productivity of parallel programmers has long been considered crucial for the success of a parallel computer, simplicity of algorithms is important.
Experimental work published in 2011 and 2012 demonstrates significantly greater speedups for advanced PRAM algorithms on XMT prototypes than for the same problems on state-of-the-art multi-core computers.
Work published in 2018 shows that lock-step parallel programming (using ICE) can achieve the same performance as the fastest hand-tuned multi-threaded code on XMT systems. This inductive lock-step approach stands in contrast to the multi-threaded programming approaches of many other multi-core systems, which are known to challenge programmers.
The XMT paradigm was introduced by [[Uzi Vishkin]].
==The main levels of abstraction of XMT==
The Explicit Multi-Threading (XMT) computing paradigm integrates several levels of abstraction.
The work-time (WT) (sometimes called work-depth) framework, introduced by {{harvtxt|Shiloach|Vishkin|1982}}, provides a simple way for conceptualizing and describing parallel algorithms. In the WT framework, a parallel algorithm is first described in terms of parallel rounds. For each round, the operations to be performed are characterized, but several issues can be suppressed. For example, the number of operations at each round need not be clear, processors need not be mentioned and any information that may help with the assignment of processors to jobs need not be accounted for. Second, the suppressed information is provided. The inclusion of the suppressed information is, in fact, guided by the proof of a scheduling theorem due to {{harvtxt|Brent|1974}}. The WT framework is useful since while it can greatly simplify the initial description of a parallel algorithm, inserting the details suppressed by that initial description is often not very difficult. For example, the WT framework was adopted as the basic presentation framework in the parallel algorithms books (for the PRAM model) {{harvtxt|JaJa|1992}} and {{harvtxt|Keller|Kessler|Traeff|2001}}, as well as in the class notes {{harvtxt|Vishkin|2009}}. {{harvtxt|Vishkin|2011}} explains the simple connection between the WT framework and the more rudimentary ICE abstraction noted above.
The XMT paradigm can be programmed using [[XMTC]], a parallel multi-threaded programming language which is a small extension of the programming language C. The XMT paradigm includes a programmer's workflow that starts with casting an algorithm in the WT framework and proceeds to programming it in XMTC.
The XMT multi-core computer systems provide run-time load balancing of multi-threaded programs, incorporating several patents. One of them <ref>Vishkin, Uzi. Spawn-join instruction set architecture for providing explicit multithreading. U.S. Patent 6,463,527. See also {{harvtxt|Vishkin|Dascal|Berkovich|Nuzman|1998}}.</ref> generalizes the [[program counter]] concept, which is central to the [[von Neumann architecture]], to multi-core hardware.
==XMT prototyping and links to more information==
In January 2007, a 64-processor computer <ref>University of Maryland, press release, June 26, 2007: [http://www.newsdesk.umd.edu/scitech/release.cfm?ArticleID=1459 "Maryland Professor Creates Desktop Supercomputer"] {{Webarchive|url=https://web.archive.org/web/20091214195046/http://www.newsdesk.umd.edu/scitech/release.cfm?ArticleID=1459 |date=2009-12-14 }}.</ref> named Paraleap, which demonstrates the overall concept, was completed.
Experimental work reported in {{harvtxt|Caragea|Vishkin|2011}} for the [[Maximum Flow]] problem, and in two papers by {{harvtxt|Edwards|Vishkin|2012}} for the [[Graph Connectivity]], Graph Biconnectivity ([[biconnected graph]]) and Graph Triconnectivity ([[Triconnected component]]) problems demonstrated that for some of the most advanced algorithms in the parallel algorithmic literature, the [[XMT]] paradigm can offer 8 times to over 100 times greater speedups than for the same problems on state-of-the-art multi-core computers. Each reported speedup was obtained by comparing clock cycles on an XMT prototype relative to the fastest serial algorithm running on the fastest serial machines.
XMT prototyping culminated in {{harvtxt|Ghanim|Vishkin|Barua|2018}}, establishing that lock-step parallel programming (using ICE) can achieve the same performance as the fastest hand-tuned multi-threaded code on XMT systems. This 2018 result sharpens the contrast between XMT programming and the multi-threaded programming approaches employed by nearly all other many-core systems, whose race conditions and other demands tend to challenge, and sometimes even fail, programmers {{harvtxt|Vishkin|2014}}.
==References==
*{{citation
| year=1974
| volume=21
| issue=2
| pages=201–208
| doi=10.1145/321812.321815
| citeseerx=10.1.1.100.9361
| s2cid=16416106
}}.
*{{Citation
| last1 = Shiloach | first1 = Yossi
| journal = Journal of Algorithms
| volume = 3
| issue =2
| pages = 128–146
| doi =10.1016/0196-6774(82)90013-X
| first = Joseph
| last = JaJa
 | title = [[An Introduction to Parallel Algorithms]]
| edition =
| publisher = Addison-Wesley
| year = 1992
| isbn = 978-0-201-54856-
}}
* {{Citation
 | last1 = Keller
 | first1 = Jörg
 | last2 = Kessler
 | first2 = Christoph W.
 | last3 = Traeff
 | first3 = Jesper L.
 | title = [[Practical PRAM Programming]]
 | edition =
| publisher = Wiley-Interscience
| year = 2001
| isbn = 978-0-471-35351-5
}}
*{{Citation
| year = 2003
| title = Towards a First Vertical Prototyping of an Extremely Fine-Grained Parallel Programming Approach
| journal = Theory of Computing Systems
| volume = 36
| issue =5
| pages = 551–552
| doi =10.1007/s00224-003-1086-6
| s2cid = 1929495
| url=http://www.umiacs.umd.edu/users/vishkin/XMT/spaa01-j-03.pdf
}}.
| last4 = Ellison | first4 = David
| year = 2010
| title = Proceedings of the 41st ACM technical symposium on Computer science education - SIGCSE '10
| contribution = Is teaching parallel algorithmic thinking to high-school students possible? One teacher's experience
| pages = 290
| doi = 10.1145/1734263.1734363
| isbn = 9781450300063
| doi-access = free
}}.
*{{Citation
| last3=Berkovich | first3=Efraim
| last4=Nuzman | first4=Joseph
| title=Proc. 1998 ACM Symposium on Parallel Algorithms and Architectures (SPAA)
| contribution=Explicit Multi-Threading (XMT) bridging models for instruction parallelism
| pages=140–151
| year=1998
| contribution-url=http://www.umiacs.umd.edu/users/vishkin/XMT/spaa98.ps
}}.
* {{Citation
| first = Uzi
| last = Vishkin
| title = Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques, 104 pages
| edition =
| doi=10.1145/1366230.1366240
| url=http://www.umiacs.umd.edu/users/vishkin/XMT/CompFrontiers08.pdf
| isbn=9781605580777
| s2cid=11557669
}}.
*{{Citation
| last1=Vishkin | first1=Uzi
| title=Using simple abstraction to reinvent computing for parallelism
| issue=1
| pages=75–85
| year=2011
| doi=10.1145/1866739.1866757
| journal=Communications of the ACM
| volume=54
}}.
*{{Citation
| last1=Caragea | first1=George
| last2=Vishkin | first2=Uzi
| title=Proc. 23rd ACM Symposium on Parallelism in Algorithms and Architectures
| contribution=Brief announcement: better speedups for parallel max-flow
| pages=
| year=2011
| doi=10.1145/1989493.1989511
| title-link=Symposium on Parallelism in Algorithms and Architectures
| isbn=9781450307437
| s2cid=5511743
}}.
*{{Citation
| last1=Edwards | first1=James
| last2=Vishkin | first2=Uzi
| title=Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
| contribution=Better speedups using simpler parallel programming for graph connectivity and biconnectivity
| pages=
| year=2012
| doi=10.1145/2141702.2141714
| isbn=9781450312110
| s2cid=15095569
}}.
*{{Citation
| last1=Edwards | first1=James A.
| last2=Vishkin | first2=Uzi
| title=Proc. 24th ACM Symposium on Parallelism in Algorithms and Architectures
| contribution=Brief announcement: speedups for parallel graph triconnectivity
| pages=
| year=2012
| doi=10.1145/2312005.2312042
| title-link=Symposium on Parallelism in Algorithms and Architectures
| isbn=9781450312134
| s2cid=16908459
}}.
*{{Citation
| last1=Vishkin | first1=Uzi
| title=Is Multi-Core Hardware for General-Purpose Parallel Processing Broken?
| type=Viewpoint
| journal=Communications of the ACM
| volume=57
| pages=35–39
| year=2014
| doi=10.1145/2580945
| issue=4
| s2cid=30098719
}}.
*{{Citation
| last1=Ghanim | first1=Fady
| last2=Vishkin | first2=Uzi
| last3=Barua | first3=Rajeev
| journal= IEEE Transactions on Parallel and Distributed Systems
| volume=29
| issue=2
| date=February 2018
| title=Easy PRAM-Based High-Performance Parallel Programming with ICE
| pages=377–390
| doi=10.1109/TPDS.2017.2754376
| hdl=1903/18521
| doi-access=free
| hdl-access=free
}}.
==Notes==
{{reflist}}
==External links==