==Commercial examples==
*[[IBM|International Business Machines (IBM)]]'s [[POWER4]], released in [[2001]], was the first dual-core microprocessor on the market.
*IBM's [[IBM POWER|POWER5]] dual-core chip is now in production, as is its dual-core [[PowerPC 970|PowerPC 970MP]] processor, which is used in the Apple PowerMac G5.
*[[PA-RISC]] (PA-8800)
*[[Intel]] released its dual-core desktop [[Pentium D]] [[x86]] [[64-bit]] processors to [[original equipment manufacturer|OEM]]s on [[12 April]] [[2005]], though they are not true dual cores, simply two dies on the same package. Its dual-core [[Xeon]] processors, code-named ''Paxville'' and ''Dempsey'', are shipping at 3 GHz. The company is also developing dual-core versions of its high-end [[Itanium]] server CPU architecture, but these have been delayed repeatedly.
*[[Advanced Micro Devices|AMD]], Intel's chief rival, released its dual-core [[Opteron]] server/workstation processors on [[22 April]] [[2005]], and its dual-core desktop processors, the [[Athlon 64]] [[Athlon 64 X2|X2]] family, were released on [[31 May]] 2005.
*[[Motorola]]/[[Freescale]] has dual-core ICs based on the [[PowerPC]] [[e600]] and [[e700]] cores in development.
*[[Sun Microsystems]]' multi-core CPUs include the 1.05 to 1.35 [[GHz]] [[UltraSPARC]] IV and 1.5 [[GHz]] [[UltraSPARC]] IV+ models, as well as the [[UltraSPARC T1]].
*[[Cradle Technologies]] [[MDSP]] (CT3400) (CT3600)
==Architectural class==
The dual-core type of processor falls into the architectural class of tightly coupled [[multiprocessing|multiprocessor]]s. In this class, each processing unit has an independent instruction stream and executes code from a pool of shared memory. Contention for the memory as a resource is managed by arbitration and by caches private to each processing unit. These local caches make the architecture viable, since modern CPUs are highly optimized to maximize bandwidth to the memory interface; without them, each CPU would run at roughly 50% efficiency. Because several caches front the same memory, they must be kept consistent by a [[cache coherency]] protocol.
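The role of a coherency protocol can be illustrated with a minimal sketch. The example below assumes a simple three-state invalidation protocol (Modified, Shared, Invalid, commonly called MSI) and a single memory word; the class names <code>Bus</code> and <code>Cache</code> are hypothetical and chosen only for illustration. Real processors use more elaborate protocols implemented in hardware.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"  # this cache holds the only, dirty copy
    SHARED = "S"    # clean copy; other caches may hold it too
    INVALID = "I"   # no usable copy


class Bus:
    """Hypothetical shared bus joining per-core caches to one memory word."""

    def __init__(self, initial_value=0):
        self.memory = initial_value
        self.caches = []


class Cache:
    """One core's private cache for a single memory line (illustration only)."""

    def __init__(self, bus):
        self.bus = bus
        self.state = State.INVALID
        self.value = None
        bus.caches.append(self)

    def _broadcast(self, exclusive):
        # Let every other cache react to this cache's bus request.
        for other in self.bus.caches:
            if other is self:
                continue
            if other.state is State.MODIFIED:
                self.bus.memory = other.value  # write back the dirty copy
                other.state = State.SHARED
            if exclusive and other.state is not State.INVALID:
                other.state = State.INVALID    # invalidate remote copies
                other.value = None

    def read(self):
        if self.state is State.INVALID:        # read miss: fetch a shared copy
            self._broadcast(exclusive=False)
            self.value = self.bus.memory
            self.state = State.SHARED
        return self.value

    def write(self, value):
        if self.state is not State.MODIFIED:   # must gain exclusive ownership
            self._broadcast(exclusive=True)
        self.value = value
        self.state = State.MODIFIED


bus = Bus(initial_value=0)
core_a, core_b = Cache(bus), Cache(bus)
core_a.read()           # core A caches the line in the Shared state
core_b.write(7)         # core B takes ownership; A's copy is invalidated
print(core_a.read())    # A misses, B writes back, A sees the new value
```

In this sketch the second read by core A cannot return stale data, because core B's write invalidated A's copy and the later miss forces a write-back first.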
Beyond dual-core processors, there are chips with more than two cores. These include [[network processor]]s, which may have a large number of cores or microengines that operate independently on different packet-processing tasks within a [[computer network|networking]] application.
==Development motivation==
===Technical pressures===
As CMOS process technologies continue to shrink, the upper limit on the complexity that can be placed on a single die keeps rising. For CPU designs, the choice becomes one of adding more functions to the device (e.g. an [[Ethernet]] controller, memory controller, or high-speed CPU cache) or adding complexity to increase CPU throughput. Generally speaking, shrinking the features on the IC also means that they can run at lower power and at a higher clock rate.
While CMOS manufacturing technology continues to improve, reducing the size of single gates, the physical limits of semiconductor-based microelectronics have become a major design concern. These limits can cause significant heat dissipation and data synchronization problems. The demand for more complex and capable microprocessors leads CPU designers to use various methods of increasing performance. Some [[Instruction-level parallelism|instruction-level parallelism]] (ILP) methods, such as [[superscalar]] [[pipelining]], suit many applications but are inefficient for others whose code is difficult to predict. Many applications are better suited to [[thread-level parallelism]] (TLP) methods, and using multiple independent CPUs is one common way to increase a system's overall TLP. The combination of more available die area, due to refined manufacturing processes, and the demand for increased TLP led to the creation of multi-core CPUs.
Various potential architectures contend for the additional "real estate" on the die. One option is to widen the registers and/or the bus interface of an existing processor architecture. Widening the bus interface alone leads to [[superscalar]] processor architectures, and widening both usually requires new programming models. Other options include adding multiple levels of memory cache and developing [[system-on-a-chip]] solutions.
===Commercial incentives===
Several business motives drive the development of dual-core architectures. Since [[Symmetric multiprocessing|multiple-CPU SMP]] designs have long been implemented using discrete CPUs, the issues of implementing the architecture and supporting it in software are well known. Additionally, using a proven processing core design (e.g. Freescale's e700 core) without architectural changes reduces design risk significantly. Finally, the connotations of the term "dual-core" (and other multiples) lend themselves to marketing efforts.
For general-purpose processors, much of the motivation for multi-core designs comes from the increasing difficulty of improving processor performance by raising the operating frequency (frequency scaling). To continue delivering regular performance improvements, manufacturers such as [[Intel]] and [[AMD]] have turned to multi-core designs, accepting higher manufacturing costs in exchange for higher performance in some applications and systems.
While multi-core architectures are being developed, so are the alternatives. An especially strong contender for established markets is the integration of more peripheral functions into the chip.
===Advantages===
*The proximity of multiple CPU cores on the same die allows the [[cache coherency]] circuitry to operate at a much higher clock rate than is possible if the signals have to travel off-chip, so combining equivalent CPUs on a single die significantly improves the performance of [[cache snooping|cache snoop]] operations.
*Assuming that the die can physically fit into the package, multi-core CPU designs require much less [[Printed circuit board|PCB]] space than multi-chip SMP designs.
*A dual-core processor uses slightly less power than two coupled single-core processors, principally because of the increased power required to drive signals external to the chip and because the smaller silicon process geometry allows the cores to operate at lower voltages.
*In terms of competing technologies for the available silicon die area, a multi-core design can reuse proven CPU core library designs and so carries a lower risk of design error than devising a new, wider core design. Also, adding more cache suffers from diminishing returns.
===Disadvantages===
*Multi-core processors require [[operating system]] (OS) support to make optimal use of the additional computing resources.{{ref|PMTandSMP}} Also, making optimal use of multiprocessing in a desktop context requires [[application software]] support.
*The higher integration of a multi-core chip drives production yields down, and such chips are more difficult to manage thermally than lower-density single-core designs.
*From an architectural point of view, ultimately, single CPU designs may make better use of the silicon surface area than multiprocessing cores, so a development commitment to this architecture may carry the risk of obsolescence.
*Scaling efficiency depends largely on the application or problem set. For example, applications that process large amounts of data with computationally cheap algorithms may be limited by an I/O bottleneck on this architecture, leaving the cores underutilized.
<!--*If a dual-core processor has only one memory bus (which is often the case) the available memory bandwidth per core is half the one available in a dual-processor mono-core system.--><!--goes w/o saying...-->
==Licensing==
Another issue that has surfaced is whether multi-core processors should be treated as multiple CPUs for software licensing purposes. Enterprise server software is typically licensed per processor, and some software manufacturers feel that a dual-core processor, although a single chip, should be treated as two processors, with the customer charged for two licenses, one for each core. However, the trend seems to be towards counting a dual-core chip as a single processor, a view supported by Microsoft, IBM, Intel, and AMD. Oracle counts AMD and Intel dual-core CPUs as a single processor but counts other processor types differently. IBM and Microsoft count a multi-chip module as multiple processors; if multi-chip modules counted as one processor, CPU makers would have an incentive to build large, expensive multi-chip modules so that their customers saved on software licensing. The industry therefore appears to be slowly heading towards counting each die as a processor, no matter how many cores it has. Intel's Paxville is really a multi-chip module, although Intel calls it dual-core, and it is not yet clear how licensing will work for it. The issue remains unresolved for software companies and customers.
== The limitations of single-processor architecture==
* High clock frequencies impose an upper limit on chip size: at 100 GHz a clock cycle lasts 0.01 ns, during which a signal travelling at the speed of light (~300 mm/ns) covers at most 3 mm.
* A long [[pipeline]] incurs a large penalty for branch misprediction and wrong speculation.
* Higher energy density increases localized heat output and makes cooling more difficult.
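The speed-of-light bound in the first point above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function name <code>max_signal_distance_mm</code> is an assumption made for illustration, and the bound is optimistic because electrical signals in on-chip wires travel more slowly than light in a vacuum.

```python
SPEED_OF_LIGHT_MM_PER_NS = 299.792458  # vacuum value; the text rounds it to ~300


def max_signal_distance_mm(clock_ghz):
    """Upper bound on the distance a signal can cover in one clock cycle."""
    if clock_ghz <= 0:
        raise ValueError("clock frequency must be positive")
    cycle_ns = 1.0 / clock_ghz  # period in ns when the frequency is in GHz
    return SPEED_OF_LIGHT_MM_PER_NS * cycle_ns


for ghz in (1, 3, 10, 100):
    print(f"{ghz:>4} GHz: {max_signal_distance_mm(ghz):8.2f} mm per cycle")
```

At 100 GHz the function returns just under 3 mm, matching the figure quoted in the list.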
== Multicore architecture is a solution==
A multicore architecture is in effect an [[Symmetric multiprocessing|SMP]] system implemented on a single [[VLSI]] circuit. The goal is to allow greater utilization of [[thread-level parallelism]] ('''TLP'''), especially for applications that lack sufficient [[instruction-level parallelism]] ('''ILP''') to make good use of [[superscalar|superscalar processors]]. Exploiting TLP at the chip level is usually called [[Chip-level multiprocessing]] ('''CMP''') or [[Chip-level multithreading]] ('''CMT''').
The characteristics of a CMP system are:
* A slow but wide approach: improve the throughput of the whole [[computer system]].
**Good for transaction processing, [[database]] and scientific computing applications.
** No benefit for a single application that cannot be parallelized (divided and run as several tasks or threads)
* Better [[data locality]] than regular multi-processor architectures
* Better communication between processing units
* Saves space, saves energy.
* Better cost/performance ratio than a single-core processor
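The point that an application which cannot be parallelized gains nothing from extra cores can be made quantitative with Amdahl's law, which is not named in the list above but is the standard formula for this effect. The sketch below assumes a workload in which a fixed fraction of the run time can be spread perfectly across cores; the function name <code>amdahl_speedup</code> is chosen for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only part of a workload can run in parallel.

    parallel_fraction: share of single-core run time that can be divided
    among cores (0.0 = fully serial, 1.0 = fully parallel).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for fraction in (0.0, 0.5, 0.9, 1.0):
    print(f"parallel fraction {fraction:.1f}: "
          f"{amdahl_speedup(fraction, cores=2):.2f}x on two cores")
```

A fully serial program (fraction 0.0) sees no speedup on a dual-core chip, while a fully parallel one approaches a factor of two; real workloads fall in between and also pay communication overheads that this idealized formula ignores.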
== Example multicore processors==
Much of the microprocessor industry is moving to multicore designs. Recent processors that use '''CMP''' include:
* [[PA-RISC]] (PA-8800)
* [[IBM POWER]] (POWER4 and POWER5)
* [[Sun Microsystems]] [[SPARC]] (UltraSPARC IV, [[UltraSPARC T1]])
* [[AMD]] [[Opteron]] (released in April 2005)
* [[Intel]] [[Pentium D]] (shipping in May 2005)
* [[Cradle Technologies]] [[MDSP]] (CT3400) (CT3600)
* [[Cavium Networks]] [[OCTEON]] (CN3XXX)
Other microprocessor families are also expected to use CMP in future versions.
* Intel's [[Itanium]] is expected to do so in the middle of [[2006]], with a release codenamed ''[[Montecito (processor)|Montecito]]''; then even more extensively in [[2007]] with a product codenamed ''[[Tukwila (processor)|Tukwila]]''.
==Notes==