
Multi-tasking Concepts: HT, Dual Core, & Multi-Processor

If you’ve been around computers for some time, you can probably remember back to March
2000, when Intel formally introduced the gigahertz processor to the world. Touted by Intel
as the "world's highest performance microprocessor for PCs," it was unbelievably
speedy at the time. The gigahertz became part of common terminology, a number shoppers
looked for the way they scan a performance review. Then processors became faster…faster…
and faster. 2 GHz? No problem. 3 GHz? Here you go.

Inevitably, we closed in on a 'speed barrier' – a point where it was no longer possible to
crank up the clock speed, for any of several reasons. Because the market demands newer,
better processors, manufacturers were forced to find other improvements that would give
consumers an incentive to purchase new hardware. In came an emphasis on multi-tasking.

Chances are, you've heard about the new dual core processors. The corporate giants,
Intel and AMD, have shifted their emphasis to multi-tasking, with dual core as the
primary driving force. However, the concept of computer multi-tasking has been with us
for quite a while, and it is not limited to dual core. Multi-processor systems in servers
and high-end workstations, as well as Intel's coveted Hyper-Threading Technology, have been
with us for many years. Three technologies, one concept and goal. You may wonder: what's
the difference?

Let's take a step back – what exactly is multi-tasking? At the most basic level, it is
the ability to do two or more things simultaneously. You are multi-tasking right now:
reading this article and breathing. Likewise, for processors, multi-tasking is the ability
to execute (or seem to execute – more on that later) more than one task simultaneously.
With that ability, you can listen to your favorite soundtracks, finish your meeting
presentation, and surf the web all at the same time.

For most of us, it would be very hard to live without the ability to multi-task on a computer.
Yet this is exactly what the first computers lacked. The original processors could only
uni-task. If any peripheral needed to be accessed, the CPU had to come to a grinding halt
and wait for the peripheral to respond and report back. This was excruciatingly slow – but,
as with all technological concepts, it improved.

Most mainstream computer users probably have a single-core, single-processor machine inside
the case. Yet you can still do more than one task at a time with very little, if any,
drop in speed. The processor seems to be processing two sets of code at the same time.
Allow me to let you in on a little secret – it isn't! A typical single-core, single-processor
CPU cannot process more than one thread of instructions at any given instant. The processor
is quickly switching from one task to the next, creating the illusion that each task is
being processed simultaneously.

There are two basic methods by which this illusion is created: cooperative multitasking
and pre-emptive multitasking.

Cooperative Multitasking
When one task is already occupying the processor, a wait line forms for the other tasks
that also need the CPU. Each application is programmed so that, after a certain number of
cycles, it steps down and allows other tasks their processor time. This cooperative scheme
is now outdated and hampered by its limitations: programmers were free to decide if and
when their application would surrender CPU time. In a perfect world, every program would
respect the programs running alongside it, but this is not a perfect world. If Program A
hogged CPU cycles while Programs B and C waited in line, there was no way to stop Program A
unless it voluntarily stepped down. As a result of these shortcomings, cooperative
multitasking was retired on the PC with the release of Windows 95.
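The "step down" behavior described above can be sketched with Python generators, where `yield` plays the role of the voluntary hand-off. This is a toy model of my own for illustration, not how any real OS implemented it:

```python
def program(name, steps, log):
    """A well-behaved program: does one unit of work, then steps down."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntarily surrender the CPU

def cooperative_scheduler(tasks):
    """Round-robin over the wait line, trusting each task to yield."""
    queue = list(tasks)
    while queue:
        task = queue.pop(0)          # next task in the wait line
        try:
            next(task)               # run until the task steps down
            queue.append(task)       # back of the line
        except StopIteration:
            pass                     # task finished

log = []
cooperative_scheduler([program("A", 2, log), program("B", 2, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1'] -- the tasks take turns
```

A task that never reaches `yield` would monopolize this scheduler forever – exactly the flaw that retired the scheme.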

Apple's Macintosh OS used cooperative multitasking in every operating system release up to
Mac OS 9, because Apple was able to control many of the programs that were loaded onto its
systems. Starting with OS X, Apple decided to follow everyone else with the new kid on the
block – pre-emptive multitasking.

Pre-emptive Multitasking:
The inefficiency of cooperative multitasking left the computer industry scrambling for
different ideas. Finally, a new standard called pre-emptive multitasking took form. In
pre-emptive multitasking, the system has the power to halt, or "pre-empt," any task that
is hogging CPU time. After forcing an interrupt, control is in the hands of the operating
system, which can hand CPU time to another task as appropriate. Inconveniently timed
interrupts are the greatest drawback of pre-emptive multitasking, but in the end it is
better that all your programs see some CPU time than that a single program work negligibly
faster. Most current systems, including Windows XP and Apple's Mac OS X, use pre-emptive
multitasking.
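A pre-emptive scheduler can be sketched the same way; here the scheduler, not the task, decides when the hand-off happens. Again a toy model of my own, with tasks reduced to lists of "instructions" and a fixed time quantum:

```python
def preemptive_scheduler(tasks, quantum=2):
    """Run each task for at most `quantum` steps, then force a switch."""
    trace = []
    queue = [(name, list(instrs)) for name, instrs in tasks.items()]
    while queue:
        name, instrs = queue.pop(0)
        for _ in range(quantum):         # run at most `quantum` steps
            if not instrs:
                break
            trace.append(f"{name}:{instrs.pop(0)}")
        if instrs:                       # forced interrupt: back in line
            queue.append((name, instrs))
    return trace

# "hog" never yields voluntarily, yet "small" still gets CPU time,
# because the scheduler pre-empts "hog" after its quantum expires.
trace = preemptive_scheduler({"hog": [1, 2, 3, 4], "small": [1]})
print(trace)  # ['hog:1', 'hog:2', 'small:1', 'hog:3', 'hog:4']
```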

Pre-emptive and cooperative multitasking are, as mentioned, "illusory" multitasking.
There are, however, processors that can physically handle two streams of data
simultaneously, and the technologies behind them are dual/multi-processor, dual/multi-core,
and simultaneous multi-threading (Intel's Hyper-Threading).

Before we delve into these technologies, I want to make a few terms clear. I define them
here since I will refer to them constantly.

• Thread: An ordered sequence of tasks and instructions that the processor executes.
• Execution Core: The part of the processor that actually does the processing.
• Registers: Very fast storage inside the processor core which holds frequently used values for the
CPU. They reside within the actual processor, acting as the work desk and hands of the processor.
There are many types of registers (FPRs, GPRs, data registers, etc.), but for the purposes of this
article you only need to know what they do.
• On-die/Onboard cache: The very fast static RAM (SRAM) built into the processor chip. Since it is
part of the physical processor, access speeds are faster than normal system RAM. The onboard cache
is the first pool of data the processor accesses. Caches are denoted by the level at which they are
accessed: level one cache is the fastest and the first pool of data accessed, level two cache is the
second, and so on. I will use "cache" to mean level two cache, because the level one cache is very
small and, for our purposes, rather insignificant.
• Latency: In short, the initiation response time. In this article, latency describes the time
between when data is requested and when the memory is able to respond and send the data across
the bus.
• Front Side Bus: The bus connecting the processor to the system memory. If the data the CPU is
looking for is not found in the onboard cache (a "cache miss"), the processor looks to the RAM.
The faster the FSB, the faster the processor can access that data.
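To see why the cache hierarchy and FSB matter, here is a rough cost comparison in CPU cycles. The numbers are ballpark figures I have chosen for illustration, not measurements of any particular processor:

```python
# Approximate access costs, in CPU cycles (illustrative figures only).
ACCESS_CYCLES = {
    "L1 cache": 3,        # fastest, first pool of data checked
    "L2 cache": 15,       # the "cache" this article refers to
    "RAM over FSB": 200,  # the price paid on a cache miss
}

def miss_penalty():
    """Extra cycles spent when data misses L2 and must come over the FSB."""
    return ACCESS_CYCLES["RAM over FSB"] - ACCESS_CYCLES["L2 cache"]

print(miss_penalty())  # 185 -- cycles the core mostly spends waiting
```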

Simplified Image of the RAM-CPU

Two processors can easily chew through two streams of data, treating each one as if it were
the processor's only task. Each stream of data, or "thread," gets its own on-die cache, its
own set of pipelines, and, most importantly, its own execution core. Today, dual/multi-
processor systems are mainly seen in high-end workstations and servers, where their sheer
processing power suits processor-intensive workloads.

In order to take advantage of a multi-processor configuration, the application must be able
to spawn multiple threads. A good number of these multi-threaded applications tend to be
multimedia and graphics programs. In CPU-intensive tasks, such as 3D rendering and graphics
editing, splitting the program into two or more threads and having them processed
independently of one another would theoretically divide the processing time by the number
of threads spawned or the number of processors present, whichever is smaller. Notice how I
said "theoretically" – this would not occur in any real-life situation, but there would
certainly be a very noticeable speed increase. However, if an application is not
multi-threaded, then only one of the processors can be used effectively. So, for example,
if a dual Intel Xeon configuration at 2.8 GHz were used for a single-threaded program (such
as many games out there), the setup would act as if only one 2.8 GHz Xeon processor were
present. "Two processors always run faster than one processor" is a false statement.
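The "theoretical" division of processing time can be made concrete with Amdahl's law, the standard model for this (my addition; the article does not name it): if a fraction p of a program can be parallelized across n processors, the ideal speedup is 1 / ((1 - p) + p / n).

```python
def speedup(p, n):
    """Idealized Amdahl's-law speedup: only the parallel fraction p of a
    program benefits from n processors; the serial rest caps the gain."""
    return 1.0 / ((1.0 - p) + p / n)

# A fully single-threaded program (p = 0) gains nothing from a second CPU:
print(round(speedup(0.0, 2), 2))  # 1.0
# A perfectly parallel program (p = 1) would ideally halve its run time:
print(round(speedup(1.0, 2), 2))  # 2.0
# A 90%-parallel renderer on two Xeons: well short of the ideal 2x.
print(round(speedup(0.9, 2), 2))  # 1.82
```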

Naturally, multi-processor-capable hardware is very expensive compared to normal mainstream
components. Some of the costs involved in owning a multi-processor system include:
• The initial cost of the processors
• The need for specific, expensive registered memory
• A beefier power supply and a higher electricity bill

Furthermore, a single multi-processor-capable, server-line processor tends to be slower
than a mainstream desktop processor at an equivalent clock speed. Processors made
specifically for servers and high-end workstations, such as the AMD Opterons or the Intel
Xeons, tend to be more conservative about performance. Servers need to be reliable. They
frequently use registered ECC (error-correcting code) RAM, which costs more and is slower,
but very rarely goes wrong. When these processors work together, however, they can provide
massive processing power.

There are two main types of multi-processor architecture: Symmetrical Multi Processing and
Non-Uniform Memory Architecture.

Symmetrical Multi Processing (SMP)

SMP is the most commonly used multi-processor architecture, and it is based on a simple
concept: the processors share a common front side bus from which they collect and export
data. SMP is most commonly found in the x86 world, which includes the Pentium Pro P6-based
processors (the Pentium II and Pentium III lines) as well as the AMD Athlon MP and select
Opteron processors. Due to its low cost, SMP is used more commonly in lower-scale systems
than in high-end, large-scale servers and clusters.

Depiction of SMP
With a virtually symmetrical architecture, latencies remain constant between the
processors.

But there is a downside to SMP. Almost all modern-day processors run at a much higher speed
than the memory they access, which is why onboard cache exists to compensate for the timing
difference. However, cache misses are very common, because the processor will access more
data than can fit in the onboard cache at any given instant. As a result, the processor is
bottlenecked by the speed of either the front side bus or the memory, and SMP compounds the
bottleneck: because the processors have to share a common front side bus, there will often
be "congestion" on the bus. As the saying goes, a chain is only as strong as its weakest
link.

This bottleneck is less evident in lower-end, small-scale systems. Large-scale systems with
many processors will see lag reminiscent of a Los Angeles rush-hour traffic jam. And that
is why there is NUMA.

Non-Uniform Memory Architecture (NUMA)

NUMA is widely considered the more efficient and sophisticated of the two main
multi-processor technologies. Instead of the shared front side bus of the SMP architecture,
NUMA has two (or more) independent front side buses, along with a high-speed bus connecting
the CPUs. This is made possible because each processor boasts its own integrated memory
controller, which allows it to access its own pool of RAM. The biggest advantage of this
architecture is that it solves the bottlenecking issue found in SMP; latency is kept to a
minimum. The downside, though, is the expense associated with NUMA – as always, you have to
pay bigger bucks to get speed. NUMA is most often found in higher-end AMD Opteron systems.

Depiction of NUMA

Systems requiring smaller-scale multi-tasking support have now found an answer in
multi-core processors.

If you've been following computers to any degree, chances are you've at least heard of dual
core systems. It all started back in 2001 with the IBM POWER4 processor. Since IBM broke
ground with dual core, Intel and AMD have both caught onto the concept and expanded on it,
each touting it as the technology of the future. Note that I will refer to the combined
technology of dual/multi-core simply as dual core from here on – multi-core will happen,
but we are not there yet.
Intel Pentium-D – image courtesy of PC Per. Notice how thick the die is.

But what exactly do dual core systems offer? Architecturally, a dual core processor is
simply two execution cores on a single die. Threads can be processed in two separate,
physical execution cores in parallel. Because the two cores have to share some processor
resources with each other, dual core processors replicate some, but not all, of the
multi-tasking abilities of a multi-processor system.

Intel's, AMD's, and IBM's PowerPC dual core processors make up the large majority of the
dual core processors on the market right now. Here's an architectural overview of each one:

Pentium-D/Extreme Edition-based Dual Core

In a nutshell, the Intel approach is akin to SMP on a single chip. Two cores are attached
together on one processor die. The cores work independently of each other, with no
communication between them until they reach the front side bus. Each core has its own
onboard cache and its own set of registers and architectural state. The cores share an
800 MHz front side bus, which is one of the downsides of the architecture: in order for
the two cores to communicate, signals must travel off the chip, down the front side bus,
and back up again.

Intel Pentium-D Architecture

Here are some of the many amenities that Intel has prepared in the Pentium-D:

EM64T (Extended Memory 64-bit Technology): 64-bit processor support similar to AMD's
AMD64. With the processor capable of 64-bit addressing, the added processing power found
in dual core systems helps.

XD/EDB (Execute Disable Bit): an anti-malware security feature available on all the
Pentium-Ds.

EIST (Enhanced Intel SpeedStep Technology): Intel's version of "on-demand power,"
available on all but the 820 model.

The Pentium-D processors target mainstream users and are priced somewhat reasonably. As of
Q4 2005, the lowest-end Pentium-D is offered at around 270 USD. Many computer-building
giants, such as Dell and HP, have unsurprisingly jumped onto the Pentium-D bandwagon.
Currently, Intel is looking to a newer dual-core technology to replace the Pentium-D
Smithfield architecture.

The Pentium Extreme Edition, not to be confused with the Pentium 4 Extreme Edition, is
essentially a Pentium-D processor with Intel's Hyper-Threading Technology enabled,
theoretically providing four logical processors. As with many of Intel's top-end models,
its multiplier can be changed in order to manually increase the net processor clock
speed – a method of overclocking. As you might imagine, the Pentium-EE has a very heavy
price tag. You may wonder: what is Hyper-Threading? I will get to that shortly, but for
now, know that it is a single-core multi-tasking technology (even though I said such a
thing doesn't exist).

Take note: the Celeron-D, despite the "D", is a single core processor. The "D" in both the
Pentium-D and Celeron-D, according to Intel, stands for "different".

AMD's Dual Core Technology

AMD decided to put its integrated memory controller to the best possible use in dual core.
Prior to launch, there was a rumor that AMD would have trouble delivering a dual core
processor because of the complications involved in sharing an integrated memory controller.
The dual core AMD Opterons and Athlon 64 X2s silenced all doubts upon release. The two
cores (labeled CPU 0 and CPU 1) each have their own independent level two cache. While
there is only one memory controller, AMD integrated a System Request Interface and a
crossbar switch that allow the two cores to communicate with each other within the
processor die. Requests are then sent to the HyperTransport bus, AMD's special bus
connecting the processor with the rest of the computer.

AMD Dual Core diagram courtesy of AMD

Currently, the highest Athlon 64 X2 processor is the 4800+

This architectural approach is surprisingly effective. However, the Athlon 64 X2, AMD's
mainstream dual core line, is priced fairly high. AMD later merged this dual-core concept
with its high-end workstation/server product line, the Opteron: the Opteron 2xx and 8xx
series boast dual-core capabilities very similar to what is found in the Athlon 64 X2.

Apple: IBM's last stand

Out of nowhere, in early June 2005, Steve Jobs stood in front of an audience full of
reporters and announced a radical system change for Apple, from IBM's PowerPC to Intel's
line of processors. What was left unsaid was that Apple was not done with IBM yet. In late
2005, Apple announced its PowerMac G5 line with IBM's dual-core PowerPC G5 processors.
Being the frequently preferred platform of graphic designers and audio and video editors,
Apple dove into dual core, offering parallel processing to those who would greatly benefit
from it.

Apple currently offers dual-core G5s in its PowerMacs, with two dual-core G5s in its
flagship model.

IBM's PowerPC 970MP, IBM's name for this G5, behaves more like the AMD Athlon 64
X2/Opteron dual core than the Pentium-D. Each core has its own 1 MB of level two cache,
and the chip as a whole packs two Velocity Engine units, four floating-point units, and
four integer units. What does this all add up to? A good deal of processing power.

Developed in 2002-2003, Intel's Hyper-Threading Technology (or "HT Technology" for short)
is a way for two threads to be processed on a single core simultaneously. It is found in
all Intel Pentium 4 processors with an 800 MHz or 1066 MHz front side bus, in the 3.06 GHz
Pentium 4 with a 533 MHz front side bus, and in all Pentium Extreme Edition and Pentium 4
Extreme Edition processors. It is, in a sense, "real" single-core multitasking, and is
commonly referred to as Simultaneous Multi-Threading (SMT).

When a thread is executed on the core, it does not use up all the resources provided.
Modern-day processors are almost never one hundred percent utilized; in reality, one thread
will occupy around thirty to sixty percent of the available processor execution units.
Hyper-Threading streams an additional thread into the processor, using the vacant execution
resources. It is not quite as effective as dual core at processing two threads
simultaneously, because in HT the overwhelming majority of resources are shared between
the two threads being executed.
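That thirty-to-sixty-percent figure suggests a simple back-of-the-envelope model (my own illustration, not Intel's): the second thread can only fill execution units the first leaves idle, so the combined throughput of one HT core is capped at a single core's worth:

```python
def ht_utilization(thread_a, thread_b):
    """Toy model of HT throughput: two threads share one core's execution
    units, so their combined utilization can never exceed 1.0 -- unlike
    dual core, which offers two full cores."""
    return min(1.0, thread_a + thread_b)

# Two light threads, each needing 45% of the units, fit side by side:
print(ht_utilization(0.45, 0.45))  # 0.9
# Two heavy threads contend for the shared resources and saturate the core:
print(ht_utilization(0.6, 0.6))    # 1.0
```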

Images courtesy of Intel

It is important to note that a Hyper-Threading processor behaves as a single processor when
only one thread is present. As soon as a second thread appears, the processor switches from
single-task (ST) mode to multi-task (MT) mode.

Intel came up with a clever way of making Hyper-Threading work with the operating system.
After all, every new technology needs proper software support. Processors with HT present
themselves to the operating system as two standalone processors, and the operating system
treats the two threads as if each had its own individual resources. The threads share the
caches, but each gets its own architectural state and APIC (Advanced Programmable Interrupt
Controller). Once they make it through, the two threads are processed together on the
single execution core.
Screen shot of a Pentium 4 2.8 CPU with HT

Windows Task Manager sees two logical processors and streams two threads to them for what
it thinks is independent, completely parallel processing. Each logical processor, for all
intents and purposes, is treated as if it were acting entirely on its own independent
resources – which is exactly what Intel wants the OS to do.

Also notice in the screenshot that the two logical processors have similar, but not
identical, workloads. This shows that one logical processor can keep working while the
other stalls from, among other things, a cache miss.
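You can query the same logical-processor count the OS sees from most languages' standard libraries. In Python, for instance, `os.cpu_count()` reports logical (not physical) processors, so an HT-enabled single core shows up as 2:

```python
import os

# os.cpu_count() counts *logical* processors -- the same number Windows
# Task Manager graphs. On an HT-enabled Pentium 4, one physical core
# would report 2 here.
logical = os.cpu_count()
print(f"Logical processors visible to the OS: {logical}")
```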

Because HT support is OS-specific, take note of compatibility. The following operating
systems are compatible with Hyper-Threading:

• MS Windows XP Home
• MS Windows XP Professional*
• MS Windows 2000 Professional**
• Linux Operating Systems released after Nov. 2002 including, but not limited to:
o Red Hat Linux 9 and above (Professional and Personal)
o SuSE Linux 8.2 and above (Professional and Personal)
o RedFlag Linux Desktop 4.0 and above
o COSIX Linux

*Including all branches which are based on XP Pro, including but not limited to Tablet
Edt., and XP x64
**Compatible, but not optimized
The following are operating systems which do not support Hyper-Threading:

• MS Windows ME
• MS Windows 98SE
• MS Windows 98
and needless to say…
• MS Windows 95

To take full advantage of what HT has to offer, applications must support SMT. If only a
single thread is streamed into the processor, the processor acts as if Hyper-Threading
doesn't exist. Multi-threaded applications are typically the CPU-intensive ones, such as
video editing, digital image editing, and rendering software. For example, Adobe Photoshop
is multi-threaded and will make good use of Intel's Hyper-Threading Technology.
Hypothetically, the performance boost would be two-fold, since two threads are
simultaneously executed. Due to the shared resources, the realistic boost in performance
is around 20% on average, with most applications seeing around a 5~10% boost. In the
scheme of things, however, a 10% speed boost when rendering a heavy image file is a
considerable improvement. For example, in a test done by X-bit Labs, HT yielded a 15.8%
performance increase in 3ds Max 5 rendering (Underwater) over its non-HT counterpart.
That amounted to roughly 45 seconds saved – imagine how much time you could save when
rendering a heavier file.
The original idea behind Hyper-Threading was to alleviate the effects of a cache miss: if
one thread stalls on a cache miss, the other thread can keep working. But HT had a far
broader effect. The technology placed multi-tasking-capable systems within the reach of
many mainstream users. While dual processors had been available in the form of Pentium
Pro P6-based SMP (such as with the Pentium III), the first affordable multi-processing
came in the form of an efficient single core – no need for an expensive motherboard and a
new power supply to handle two chips chewing through your monthly electricity bill. It was
an open incentive for many software companies to make their applications multi-threaded,
setting the stage for dual core and dual processors to enter the mainstream.
A note on games: as of now, there is still a distinct lack of games that take full
advantage of SMT and what Intel's HT has to offer. However, with the emergence of dual
core as a universal technology, we should be seeing more and more games that utilize
multi-threading, and therefore benefit from Hyper-Threading, at least to a small degree.

Some final words…

So, is this multi-tasking concept worth buying into? The mainstream, low-cost availability
of all these multi-tasking tools – whether it be dual core, dual processors, or just an
Intel Hyper-Threading-enabled processor – has challenged the reasoning behind purchasing a
processor without one. Is it necessary? For most, hardly so right now. But the computer
giants' will to change the market will quickly transform luxury into necessity. Look out
for one the next time you go to your local electronics retailer. After all, remember that
at one point, "640K ought to be enough for anybody" – a quote famously (if dubiously)
attributed to the great computer pioneer himself, Bill Gates.

In case you're interested in further reading…

Multi-Processors:

• Hardware Central – Dual Processor Workstation: Super Highway or Dead End?
http://hardware.earthweb.com/chips/article.php/600091
• Digit-Life – Non-Uniform Memory Architecture (NUMA): Dual Processor AMD Opteron Platform
Analysis in RightMark Memory Analyzer
http://www.digit-life.com/articles2/cpu/rmma-numa.html
• SourceForge – What is NUMA?
http://lse.sourceforge.net/numa/faq/index.html

Dual-Core:

• Webopedia – All About Dual-Core Processors
http://www.webopedia.com/DidYouKnow/Hardware_Software/2005/dual_core.asp
• The Inquirer – Athlon 64 and Opteron dual-core explained
• Apple – PowerMac G5 Overview
http://images.apple.com/powermac/pdf/20051018_PowerMacG5_TO.pdf

SMT/Hyper-Threading:

• Intel HyperThreading Technology Overview
http://www.intel.com/business/bss/products/hyperthreading/overview.htm
• Intel HyperThreading Technology
http://www.intel.com/technology/hyperthread/index.htm
• Digit-Life – Intel HyperThreading Technology Review
http://www.digit-life.com/articles/pentium4xeonhyperthreading/