
Analysis: x86 Vs PPC
posted by Nicholas Blachford on Wed 9th Jul 2003 16:43 UTC

This article started life when I was asked to write a comparison of x86 and PowerPC CPUs for work. We produce PowerPC based systems and are often asked why we use PowerPC CPUs instead of x86, so a comparison is rather useful. While I have had an interest in CPUs for quite some time, I had never explored this issue in any detail, so writing the document proved an interesting exercise. I thought my conclusions would be of interest to OSNews readers, so I've done more research and written this new, rather more detailed article. This article is concerned with the technical differences between the families, not the market differences.

History and Architectural Differences

The x86 family of CPUs began life in 1978 as the 8086, an extension to the 8 bit 8080 CPU. It was a 16 bit CISC (Complex Instruction Set Computing) processor. In the following year the 8088 was introduced and was used in the original IBM PC. It is this computer which led to today's PCs, which are still compatible with the 8086 instruction set from 1978.

The PowerPC family began life with the PowerPC 601 in 1993, the result of a collaboration started in 1991 between Apple, IBM and Motorola. The family was designed to be a low cost RISC (Reduced Instruction Set Computing) CPU. It was based on the existing IBM POWER CPU used in the RS/6000 workstations, so it would have an existing software base.

RISC Vs CISC

When microprocessors such as the x86 were first developed during the 1970s, memories were very low capacity and highly expensive. Consequently keeping the size of software down was important, and the instruction sets in CPUs at the time reflected this. The x86 instruction set is highly complex, with many instructions and addressing modes. It also shows its age in the small number and complex nature of the registers (internal stores) available to the programmer. The x86 has only 8 registers, some of them special purpose; the PowerPC has 32 general purpose registers.

RISC was originally developed at IBM by John Cocke in 1974 [1]. Commercial RISC microprocessors appeared in the mid 80s, first in workstations and later moving to the desktop in the Acorn Archimedes. These use a simplified instruction set which allows the CPUs to be simpler and thus faster. They also included a number of architectural improvements such as pipelining, superscalar execution and out-of-order execution, which enabled the CPUs to perform significantly better than any CISC CPUs. CISC CPUs such as the 68040 and the Intel 80486 onwards picked up and used many of these architectural improvements.

In the mid 1990s a company called NexGen produced an x86 CPU which used a translator to convert x86 instructions to run within a RISC core. Pretty much all x86 CPUs have since used this technique. Even some RISC CPUs such as the POWER4 / PowerPC 970 use this technique for some instructions. The high level internal architecture of the vast majority of modern desktop CPUs is now strikingly similar, be they RISC or CISC.

Current State Of x86 And PowerPC CPUs

The current desktop PowerPC and x86 CPUs are the following:

x86
AMD Athlon XP
Intel Pentium 4

PowerPC
IBM 750xx (G3)
Motorola 74xx (G4)
IBM 970 (G5)

The current G4 CPUs run at significantly lower clock speeds than the x86 CPUs, which are now above 2GHz (the P4 is above 3GHz). The recently announced PowerPC 970 currently runs at up to 2GHz and delivers performance in line with the x86 CPUs.

CPUs break all operations down into stages, and these stages are performed in a pipeline. Stages can be big or small, and the number of stages depends on how much is done in each one: the more an individual stage does, the fewer stages are needed to complete an operation; if the stages are simple, more of them are needed, but each stage can complete more quickly. The clock speed of the CPU is limited by the time an individual stage needs to complete, so a CPU with a greater number of simpler stages can operate at a higher frequency.

Both the Athlon and the Pentium 4 use longer pipelines with simple stages (long and thin), whereas the PowerPC G4s use shorter pipelines with more complex stages (short and fat). This is the essence of the so-called "megahertz myth": a CPU with a very high clock speed may not be any faster than a CPU with a lower clock speed. The Pentium 4 is now at 3.2GHz, yet a 1.25GHz Alpha can easily outgun it on floating point operations.

The longer pipelines allow the x86 CPUs to attain these very high frequencies, whereas the PowerPC G4s are somewhat restricted because they use a smaller number of pipeline stages and this limits the clock frequency.

The voltage a CPU runs at also affects how fast its clock can go: x86 CPUs use relatively high voltages to allow higher clock rates, and to boost clock speeds further, power hungry high speed transistors are used. A long, thin pipeline is very fast but also very inefficient power-wise. All these things add up, so a 3GHz CPU may be fast but it is also very power hungry, with maximum power consumption now approaching or even exceeding 100 Watts. Intel have in fact taken to using a much lower frequency part for laptop computers than the top end Pentium 4: despite running at only 1.6GHz, the Pentium M performs just as well as the 2.2GHz Pentium 4.

The Law Of Diminishing Returns (Aka Amdahl's Law)

The law of diminishing returns is not exactly a new phenomenon; it was originally noticed in parallel computers by IBM engineer Gene Amdahl, one of the creators of the IBM System/360 architecture. The original describes the problem in parallel computing terms, but this simplified version pretty much describes the problem in terms of any modern computer system:

"Each component of a computer system contributes delay to the system. If you make a single component of the system infinitely fast... ...system throughput will still exhibit the combined delays of the other components." [3]

As the clock speed goes upwards, the actual performance of the CPU does not scale exactly with the clock. A 2GHz CPU is unlikely to be twice the speed of a 1GHz CPU; indeed, on everyday tasks people seem to have some difficulty telling the difference between these speeds. The reason for the lack of scaling is that memory performance has not scaled with the CPU, so the CPU sits doing nothing for much of its time (HP estimate this at 70% for server CPUs). Additionally, the latency of memory has barely improved at all, so any program which requires the CPU to access memory a lot will be affected badly by memory latency and the CPU will not get anywhere near its true potential. The CPU's memory cache can alleviate this sort of problem to a degree, but its effectiveness depends very much on the type of cache and the software algorithm used.
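Amdahl's observation can be written as a simple formula. If a fraction p of the total run time is spent in the component being sped up (here, the CPU itself) and that component is made s times faster, the overall speedup is:

    \text{speedup} = \frac{1}{(1 - p) + \frac{p}{s}}

Using the HP figure above as a rough illustration: if a server CPU spends 70% of its time waiting on memory, only p = 0.3 of the run time benefits from a faster core, so even an infinitely fast CPU (s approaching infinity) would make the whole system at most 1 / 0.7, or about 1.4 times, faster.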

Many of the techniques used within x86 CPUs may only boost performance by a small amount, but they are used because of the need for AMD and Intel to outdo one another. As the clock speed climbs ever higher the scaling problem gets worse, meaning the additional effort has less and less effect on overall performance. Recent SPEC marks for two Dell workstations show that a greater than 50% increase in CPU speed plus the addition of hyper-threading results in only a 26% increase in SPEC marks [2]. Yet when the Itanium 2 got an 11% clock speed boost and double the cache, its SPEC mark increased by around 50%.

Of course there are other factors which affect the performance of CPUs, such as the cache size and design, the memory interface, the compiler and its settings, the language the software is programmed in and the programmer who wrote it. Changing the language can in fact be shown to have a much greater effect than changing the CPU [4]. Changing the programmer can also have a very large effect [5].

Performance Differences Between The PowerPC And x86

Since AMD began competing effectively with Intel in the late 1990s, both Intel and AMD have been aggressively developing new, faster x86 CPUs. This has led them to become competitive with, and sometimes even exceed, the performance of RISC CPUs (if you believe the benchmarks; see below). However, RISC vendors are now becoming aware of this threat and are responding by making faster CPUs. Ironically, if you were to make all CPUs on the same process geometry, the Alpha 21364 would be the fastest CPU going - yet it uses a 7 year old core design.

PowerPCs, although initially designed as desktop processors, are primarily used in embedded applications where power usage concerns outweigh raw processing power. Additionally, current G4 CPUs use a relatively slow single data rate bus which cannot match the faster double or quad data rate busses found on x86 CPUs (some rough numbers follow at the end of this section).

The current (non G5) PowerPC CPUs do not match up to the level of the top x86 CPUs; however, due to the effects of the law of diminishing returns they are not massively behind in terms of CPU power. The x86 CPUs are faster, but not by as much as you might expect [6]. (Again, see the section on benchmarks below.)
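As a rough illustration of the bus gap (the exact clock rates vary from model to model, so take these as ballpark figures for the period rather than a definitive comparison):

    \text{peak bandwidth} = \text{bus width} \times \text{transfers per clock} \times \text{clock rate}

    8\ \text{bytes} \times 1 \times 167\ \text{MHz} \approx 1.3\ \text{GB/s (single data rate, G4 class bus)}

    8\ \text{bytes} \times 4 \times 200\ \text{MHz} \approx 6.4\ \text{GB/s (quad pumped, Pentium 4 class bus)}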

Vector Processing Differences

Vector processing, also known as SIMD (Single Instruction Multiple Data), is used in some types of processing; where it applies, it speeds up operations many times over the normal processing core. Both x86 and PowerPC have added extensions to support vector instructions.

x86 started with MMX and MMX2, then SSE and SSE2. SSE and SSE2 provide 8 128 bit registers; the earlier MMX extensions share their registers with the floating point unit, so those operations cannot generally be executed at the same time as floating point instructions. However, the x86 floating point unit is notoriously weak and SSE is now used for floating point operations instead. Intel has also invested in compiler technology which automatically uses the SSE2 unit even if the programmer hasn't asked for it, boosting performance.

The PowerPC gained vector processing in one go when Apple, IBM and Motorola revised the PowerPC instruction set and added the Altivec unit, which has 32 128 bit registers. It was added in the G4 CPUs but not the G3s, although these are now expected to get Altivec in a later revision. Altivec is also present in the 970. Currently the bus interface of the G4 slows Altivec down, as it is very demanding of memory. However, Altivec has more registers than SSE, so it can operate without going to memory as often, which boosts performance over SSE. The Altivec unit can also operate independently of, and simultaneously with, the floating point unit.
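As a minimal sketch of what SIMD buys you, here is a simple loop written with x86 SSE intrinsics, processing four floats per instruction; AltiVec code would look similar using vec_ld / vec_add from <altivec.h>. The function name and the assumption that n is a multiple of 4 are mine, purely for illustration:

    #include <xmmintrin.h>   /* SSE intrinsics: 128 bit vectors of four floats */

    /* Adds two float arrays four elements at a time. Assumes n is a
     * multiple of 4; an illustrative sketch, not tuned library code. */
    void add_vec(float *dst, const float *a, const float *b, int n)
    {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);              /* load 4 floats */
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));   /* 4 adds in one instruction */
        }
    }

A scalar version of the same loop would issue one floating point add per element, so for long arrays the vector version does roughly a quarter of the instruction work.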

Power Consumption Differences

One very big difference between PowerPC and x86 is in the area of power consumption. Because PowerPCs are designed for and used in the embedded sector, their power consumption is deliberately low. The x86 CPUs on the other hand have very high power consumption, due to the old, inefficient architecture as well as all the techniques used to raise performance and clock speed. The difference is greater than 10x for a 1GHz G4 (7447) compared with a 3GHz Pentium 4. The maximum rating for a G4 is less than 10 Watts, whereas Intel do not appear to give out figures for maximum power consumption, instead quoting a "thermal design rating" which is around 30 Watts lower than the maximum figure. The figure given for the design rating of a 3GHz P4 is 81.9 Watts, so the maximum is closer to, and may even exceed, 100 Watts.

A single 3GHz Pentium 4 CPU alone consumes more than 4 times as much power as an entire Pegasos PowerPC motherboard including a 1GHz G4.

Low Power x86s

There are a number of low power x86 designs from Intel, AMD, VIA and Transmeta. It seems however that cutting power consumption in an x86 also means cutting performance - sometimes drastically. Intel still sell low power Pentium III CPUs right down at 650MHz. The Pentium 4-M can reduce its power consumption, but only by scaling down its clock frequency. Transmeta use a completely different architecture and "code morphing" software to translate the x86 instructions, but their CPUs have never exactly broken speed records.

VIA have managed to get power usage down even at 1GHz, but they too use a different architecture. The VIA C3 series is a very simple CPU based on an architecture which forgoes advanced features like instruction re-ordering and multiple execution units; the nearest equivalent is the 486, launched way back in 1989. This simplified approach produces something of a compromise however: at 800MHz it still requires a fan, and even at 1GHz the performance is abysmal - a 1.3GHz Celeron completely destroys it in multiple benchmarks [7].

Why The Difference?

PowerPCs seem to have no difficulty reaching 1GHz without compromising their performance or generating much heat - how? CISC and RISC CPUs may use the same techniques and look the same at a high level, but at a lower level things are very different. RISC CPUs are a great deal more efficient.

No need to convert CISC -> RISC ISA

x86 CPUs are still compatible with the large, complex x86 instruction set which started with the 8080 and has been growing ever since. In a modern x86 CPU this has to be decoded into simpler instructions which can be executed faster. The POWER4 and PPC 970 also do this with some instructions, but it is a relatively simple process compared with handling the multi-length instructions and complex addressing modes found in the x86 instruction set.

Decoding the x86 instruction set is not a simple operation, especially if you want to do it fast. How, for instance, does a CPU know where the next instruction starts if instructions have different lengths? The answer could be found by decoding the first instruction and getting its length, but this takes time and imposes a performance bottleneck. It could of course be done in parallel: guess where the instructions might be and fetch all the possibilities, then once the first is decoded, pick the right one and drop the incorrect ones. This of course takes up silicon and consumes power.

RISC CPUs on the other hand do not have multi-length instructions, so instruction decoding is vastly simpler. Related to this is addressing modes: an x86 has to figure out which addressing mode is used before it can figure out what the instruction is. A similar parallel process to the one above could be used. RISC CPUs again have a much simpler job, as they usually have only one or two addressing modes at most.
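A toy sketch of the decoding problem follows. The two-format encoding below is invented purely for illustration (real x86 lengths depend on prefixes, the opcode, ModRM/SIB bytes and so on), but it shows the serial dependency: with fixed 4 byte instructions the address of instruction k is simply k * 4, whereas with variable lengths the start of each instruction is only known after the previous one has been decoded:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical 2-format encoding, NOT real x86: the top bit of the
     * first byte selects a 1-byte or a 3-byte instruction. */
    static size_t toy_insn_length(uint8_t first_byte)
    {
        return (first_byte & 0x80) ? 3 : 1;
    }

    int main(void)
    {
        const uint8_t code[] = { 0x01, 0x81, 0x10, 0x20, 0x02, 0x82, 0x30, 0x40 };
        size_t pc = 0;

        /* Fixed-length (RISC-style) decode: instruction k starts at k * 4,
         * so several decoders can work on consecutive instructions in
         * parallel. Variable-length decode: the start of the next
         * instruction is only known once the current one has been decoded. */
        while (pc < sizeof code) {
            size_t len = toy_insn_length(code[pc]);
            printf("instruction at offset %zu, length %zu\n", pc, len);
            pc += len;   /* serial dependency: must decode before advancing */
        }
        return 0;
    }

In hardware, that dependency is what forces the speculative "decode at every possible offset and throw most of it away" approach described above.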

To RISC Or Not To RISC

Once you have the instructions in a simpler "RISC like" format they should run just as fast - or should they?

Remember that the x86 has only 8 registers, and this makes life complicated for the execution core of an x86 CPU. x86 execution cores use the same techniques as RISC CPUs, but the limited number of registers proves problematic. Consider a loop which uses 10 variables in an iteration (a sketch of such a loop follows at the end of this section): an x86 will need hardware assistance just to perform a single iteration. Now consider a RISC CPU, which will generally have in the order of 32 registers: it can work across multiple iterations simultaneously, and the compiler can arrange this without any hardware assistance.

The hardware assistance in question is out-of-order (OOO) execution, and the tools of this trade are called rename registers. Essentially the hardware fools the executing program into thinking there are more registers than the architecture provides; in the example above this allows an iteration to be completed without the CPU needing to go to the cache for data, because the data needed will be sitting in a rename register. OOO execution is mainly used to increase the performance of a CPU by executing multiple instructions simultaneously; the instructions per cycle go up and the CPU gets its work done faster.

However, when an x86 includes this kind of hardware, the 8 registers become a problem. In order to perform OOO execution, program flow has to be tracked ahead to find instructions which can be executed out of their normal order without messing up the logic of the program. On x86 this means the 8 registers may need to be renamed many times, and this requires complex tracking logic. RISC wins out here again because of its larger number of registers: less renaming is necessary, so less hardware is required to track register usage. The Pentium 4 has 128 rename registers, the 970 has less than half that at 48, and the G4 has just 16.

Because of the sheer complexity of the x86 ISA and its limited number of architectural registers, a RISC processor requires less hardware to do the same work. Despite not using the highly aggressive methodologies found in the x86 CPUs, IBM have managed to match and even exceed the computing power of x86 CPUs with the PowerPC 970 - at lower power consumption. They were able to do this because of the efficiency of RISC and the inefficiency of x86. IBM have already managed to get this processor to run at 2.5GHz, and this should perform better than any x86 (with the possible exception of the Opteron).

The idea that x86 CPUs have RISC-like cores is a myth. They use the same techniques, but the cores of x86 CPUs require a great deal more hardware to deal with the complexities of the original instruction set and architecture.
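Here is the kind of loop referred to above: a dot product unrolled so that roughly a dozen values are live at once. It is illustrative only - the names are mine and the actual register allocation is up to the compiler - but with 32 general purpose registers everything below can stay in registers across iterations, while with only 8 architectural registers some of it must be spilled to memory or kept in flight by the rename hardware:

    /* Dot product unrolled by 8: the accumulators a..h plus the pointers,
     * count and index mean roughly a dozen values are live at once. */
    float dot8(const float *x, const float *y, int n)
    {
        float a = 0, b = 0, c = 0, d = 0, e = 0, f = 0, g = 0, h = 0;
        int i;

        for (i = 0; i + 8 <= n; i += 8) {
            a += x[i]     * y[i];
            b += x[i + 1] * y[i + 1];
            c += x[i + 2] * y[i + 2];
            d += x[i + 3] * y[i + 3];
            e += x[i + 4] * y[i + 4];
            f += x[i + 5] * y[i + 5];
            g += x[i + 6] * y[i + 6];
            h += x[i + 7] * y[i + 7];
        }
        for (; i < n; i++)          /* leftover elements */
            a += x[i] * y[i];

        return ((a + b) + (c + d)) + ((e + f) + (g + h));
    }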

PowerPC And x86 Get More Bits

Both families are in the process of transitioning to 64 bit:

AMD Opteron
AMD Athlon 64 (due September)
IBM PowerPC 970

The AMD Opteron adds 64 bit addressing and 64 bit registers to the x86 line. There is already some support for this CPU in Linux and the BSDs, and a 64 bit version of Windows is also due. The Opteron is designed as a server CPU, and as such both the CPU and its motherboards cost more than normal desktop x86 parts; the Athlon 64 can be expected to arrive at rather lower prices. Despite performing better than the best existing 32 bit Athlon, the Opteron has a slower clock speed (1.8GHz vs 2.2GHz).

AMD's x86-64 instruction set extensions give the architecture additional registers and an additional addressing mode, while at the same time removing some of the older modes and instructions. This should simplify things a bit and increase performance, but compatibility with the x86 instruction set will still hold back its potential performance.

The PowerPC 970 is, as predicted on OSNews [8], a 64 bit PowerPC CPU based on the IBM POWER4 design but with a smaller cache and the addition of the Altivec unit found in the G4. It supports 32 bit software with little or no changes, although some changes to the original 64 bit PowerPC architecture have been made in the form of a "64 bit bridge" to ease the porting of 32 bit operating systems [9]. This bridge shall be removed in subsequent processors.

The hardware architecture of the 970 is similar to that of any advanced CPU, however it does not have the aggressive hardware design of the x86 chips: IBM use automated design tools to do the layout, whereas Intel lay out by hand to boost performance. The 970 has a long pipeline but it is not run at a very high clock rate; unusually, the CPU does more per clock than other long pipeline designs, so the 970 is expected to perform very well. In addition to the new architecture, the 970 includes dual floating point units and a very high bandwidth bus which matches or exceeds anything in the x86 world. This will boost performance and especially boost the Altivec unit's capabilities.

The IBM PPC 970 closes the performance gap between PowerPC and x86 without consuming x86 levels of power (estimated at 20 Watts at 1.4GHz and 40 Watts at 1.8GHz). It has been announced in Apple Power Macintosh computers for August 2003; with the pent up demand I think we can expect Mac sales to increase significantly.

Benchmarks

There has been a great deal of controversy over the benchmarks Apple published when it announced the new PPC 970 based G5 [10]. The figures Apple gave for the Dell PC were a great deal lower than the figures presented on the SPEC website.

Many have criticised Apple for this, but all they did was use a different compiler (GCC), and this gave the lower x86 results. GCC may not be the best x86 compiler, and it contains a scheduler for neither the P4 nor the PPC 970, but it is considerably more mature on x86 than on PowerPC. In fact only very recently has its PowerPC code generation begun to approach the quality of its x86 code generation; GCC 3.2, for instance, produced incorrect code for some PowerPC applications.

This does, however, lead to the question of why the SPEC scores produced by GCC are so different from those produced by Intel's ICC compiler, which Intel uses when submitting SPEC results. Is ICC really that much better than GCC? In a recent test [11] of x86 compilers most results turned out strikingly similar, but when SSE2 is activated ICC completely floors the competition. ICC is picking up the code and auto-vectorising it for the x86 SSE2 unit; the other compilers do not have this feature, so they don't get its benefit.
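A small example of why auto-vectorisation helps some code and not other code (the function names are mine; this is a sketch of the principle, not of what ICC actually does internally):

    /* The first loop has independent iterations, so a vectorising compiler
     * (ICC with SSE2, for example) can process four floats per instruction.
     * The second has a loop-carried dependency - each element needs the
     * previous result - so it cannot simply be vectorised and gains little
     * or nothing from SSE2. */
    void scale(float *out, const float *in, float k, int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = in[i] * k;          /* vectorisable */
    }

    void prefix_sum(float *a, int n)
    {
        for (int i = 1; i < n; i++)
            a[i] += a[i - 1];            /* not vectorisable as written */
    }

If a benchmark happens to be dominated by loops of the first kind, an auto-vectorising compiler can inflate its score dramatically; code dominated by the second kind sees little benefit, which is consistent with the ICC user's comment below.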

I think it's fairly safe to assume this is at least in part the reason for the difference between the SPEC scores produced by Apple and Intel.

SPEC is a set of artificial benchmarks, but does this translate into real life speed improvements? According to this comment [12] by an ICC user, the auto-vectorising for the most part doesn't make any difference, as most code cannot be auto-vectorised. In the description of the SPEC CPU2000 benchmarks the following is stated:

"These benchmarks measure the performance of the processor, memory and compiler on the tested system."

SPEC marks are generally used to compare the performance of CPUs, however the above states explicitly that this is not all they are designed for: SPEC marks also test the compiler. There are no doubt real life areas where the auto-vectorisation works, but if these are only a small minority of applications, benchmarks that are affected by it become rather meaningless, since they do not reliably show how most applications are likely to perform.

Auto-vectorisation also works the other way: the PowerPC's Altivec unit is very powerful, and benchmarks which are vectorised for it can show a G4 outperforming a P4 by up to 3.5 times.

By using GCC, Apple removed the compiler from the factors affecting system speed and gave a more direct CPU to CPU comparison. This is a better comparison if you just want to compare CPUs, and it prevents the CPU vendor from getting inflated results due to the compiler.

x86 CPUs may use all the tricks in the book to improve performance, but for the reasons I explained above they remain inefficient and are not as fast as you may think or as benchmarks appear to indicate. I'm not the only one to hold such an opinion:

"Intel's chips perform disproportionately well on SPEC's tests because Intel has optimised its compiler for such tests" [13] - Peter Glaskowsky, editor-in-chief of Microprocessor Report.

I note that the term "chips" is used; I wonder, does the same apply to the Itanium? That architecture is also highly sensitive to the compiler, and this author has read (on more than one occasion) from Itanium users that its performance is not what the benchmarks suggest.

If SPEC marks are to be a useful measure of CPU performance they should use the same compiler. An open source compiler is ideal for this, as any optimisations added for one CPU will be in the source code and can thus be added for the other CPUs as well, keeping things rather more balanced.

People accuse Apple of fudging their benchmarks, but everybody in the industry does it - and SPEC marks are certainly not immune. It's called marketing.

Personally I liked the following comment from Slashdot, which pretty much sums the situation up:

"The only benchmarks that matter is my impression of the system while using the apps I use. Everything else is opinion." - FooGoo

The Future

x86 has the advantage of a massive marketplace and the domination of Microsoft. There is plenty of low cost hardware and tons of software to run on it; the same cannot be said for any other CPU architecture.

RISC may be technically better, but it is held in a niche by market forces which prefer the lower cost and plentiful software of x86. Market forces do not work on technical grounds and rarely choose the best solution.

Could that be about to change? There are changes afoot and these could have an unpredictable effect on the market:

1) Corporate adoption of Linux

Microsoft is now facing competition from Linux, and unlike Windows, Linux is not locked into x86. Linux runs across many different architectures: if you need more power or low heat / noise, you can run Linux on systems which have those features. If you are adopting Linux you are no longer locked into x86.

2) Market saturation

The computer age as we know it is at an end. The massive growth of the computer market is ending as the market reaches saturation. Companies wishing to sell more computers will need to find reasons for people to upgrade, and unfortunately these reasons are beginning to run out.

3) No more need for speed

Computers are now so fast it's getting difficult to tell the difference between CPUs even if their clock speeds are a GHz apart. What's the point of upgrading your computer if you're not going to notice any difference? How many people really need a computer that's even over 1GHz? If your computer feels slow at that speed, it's because the OS has not been optimised for responsiveness; it's not the fault of the CPU - just ask anyone using BeOS or MorphOS.

There have of course always been people who can use as much power as they can get their hands on, but their numbers are small and getting smaller. Notably, Apple's software division has invested in exactly these sorts of applications.

4) Heat problems

What is going to be a hurdle for x86 systems is heat. x86 CPUs already get hot and require considerable cooling, and this is getting worse; eventually it will hit a wall. A report by the publishers of Microprocessor Report indicated that Intel is expected to start hitting the heat wall in 2004.

x86 CPUs generate a great deal of heat because they are pushed to give maximum performance, and because of their inefficient instruction set this takes a lot of energy. In order to compete with one another, AMD and Intel will need to keep upping their clock rates and running their chips at the limit, so their chips are going to get hotter and hotter.

You may not think heat is important, but once you put a number of computers together, heat becomes a real problem, as does the cost of electricity. The x86's cost advantage becomes irrelevant when the cooling system costs many times as much as the computers.

RISC CPUs like the 970 are at a distinct advantage here, as they give competitive performance at significantly lower power consumption; they don't need to be pushed to their limit to perform. Once they get a die shrink into the next process generation, power consumption for the same performance will go down. This strategy looks set to continue in the next generation POWER5.

The POWER5 (of which there will be a "consumer version") will include Simultaneous Multi-Threading, which effectively doubles the performance of the processor, unlike Intel's Hyper-Threading which only boosted performance by around 20% (although this looks set to improve). IBM are also adding hardware acceleration of common functions, such as communications and virtual memory handling, onto the CPU. Despite this, the number of transistors is not expected to grow by any significant measure, so both manufacturing cost and heat dissipation will go down.

Conclusion

x86 is not what it's sold as. x86 benchmarks very well, but benchmarks can be, and are, twisted to the advantage of the manufacturer. RISC still has an advantage, as the "RISC cores" present in x86 CPUs are only a marketing myth. An instruction converter cannot remove the inherent complexity present in the x86 instruction set, and consequently x86 is large and inefficient and is going to remain so. x86 is still outgunned at the high end, and perhaps surprisingly also at the low end - you can't make an x86 both fast and cool-running. A lot of marketing goes into x86, and the market - technical people included - just laps it up.

x86 has the desktop market, and there are many large companies who depend on it. Indeed, it has been speculated that, inefficient or not, the market momentum of x86 is such that even Intel, its creator, may not be able to drag us away from it [14]. The volume of x86 production makes the chips very low cost, and the amount of software available goes without saying. Microsoft and Intel's domination of the PC world has meant no RISC CPU has ever had success in this market aside from the PowerPCs in Apple systems, and their market share is hardly huge.

In the high end markets, RISC CPUs from HP, SGI, IBM and Sun still dominate. x86 has never been able to reach these performance levels, even though x86 CPUs are sometimes a process generation or two ahead. RISC vendors will always be able to make faster, smaller CPUs; Intel, however, can make many more CPUs for less.

x86 CPUs have been getting faster and faster for the last few years, threatening even the server vendors. HP and SGI may have given up, but IBM has POWER5 and POWER6 on the way, and Sun is set to launch CPUs which handle up to 32 threads. It looks like the server vendors are fighting back.

Things are changing. Linux and other operating systems are becoming increasingly popular, and these are not locked into x86 or any other platform. x86 is running into problems, and PowerPC looks like it is going to become a real, valid alternative, both matching and exceeding x86 performance without the increasingly important power consumption and heat issues.

Notes:

Both Amdahl's Law (of diminishing returns) and Moore's Law date from around the same time, but notably we hear a great deal more about Moore's Law. Moore's Law describes how things are getting better; Amdahl's Law says why they're not. There is a difference however: Moore's Law was an observation, Amdahl's Law is a law.
