
CHAPTER 19 THE ATOM SoC: INTEL'S HIGH-END EMBEDDED PROCESSOR

In this chapter, you will learn:
o The history of Intel's embedded processors
o The history of the Atom processor
o The features of the Bonnell microarchitecture
o The components of the Tunnel Creek SoC
o The display controllers and how they work
o Power management and ACPI
o The new microarchitecture named Silvermont

Intel is known the world over for its prominence in the desktop PC and server processor market. Most of the world's desktop and laptop PCs and servers are powered by Intel's x86 processors, which have, over the years, made giant leaps in computing power and performance. Intel also has processors that are not based on the x86 architecture; the Itanium, for example, has made itself useful in servers. Besides all this, Intel has products that have been used in the embedded market as well. Let us survey the embedded processors and microcontrollers popularized by Intel.

19.1 History of Intel's embedded processors

In 1976 Intel introduced the 8048, possibly the first microcontroller, which had on-chip memory and peripherals. It was followed by Intel's 8-bit microcontroller 8051, which was launched in 1980 and became an instant hit. It included a host of variants and was called the MCS-51 family. It is still one of the most popular microcontroller families in use, because it is very simple and easy to use. Intel stopped production of the 8051 in 2006, but had already licensed it to many companies, and there are now many other manufacturers of this embedded processor; Atmel and Philips are two examples. There are many enhanced variants of this architecture, and many manufacturers have incorporated the 8051 core into their SoCs. Even though, by current standards, an 8-bit processor seems relatively low end, many applications still use such embedded processors.

In the x86 series, Intel released the 80186, the immediate successor of the 8086. Its architecture was meant for embedded use, and it was never targeted as a processor for PCs. It had an x86 core similar to the 8086, with a 16-bit data bus and a 20-bit address bus, and could run at a maximum frequency of 10 MHz. But it also had a number of internal peripherals, such as timers, DMA controllers and interrupt controllers, which allowed it to be categorized as an embedded processor. In 1987, Intel enhanced it further by adding math coprocessor support, energy-saving power-down modes and a DRAM refresh controller. Clock rates were increased up to 25 MHz, and through the years Intel continued to develop new versions of the 80186 with added features, lower voltages and different packages. In 2007, Intel discontinued this chip; however, other producers continued to manufacture it under license. AMD, for example, made versions running at up to 50 MHz. Fujitsu and Siemens were two others who continued to support it, because it was embedded in millions of devices. Because of its simplicity and familiar x86 ISA, it is still used in devices that do not need the high-end services expected of the modern gadgets we are becoming accustomed to. Besides these, Intel manufactured other embedded processors, such as the MCS-196 family, which was also a popular 16-bit embedded processor. All these had their peak times in history, but as market requirements change, new products are needed to fill the space that opens up.

Currently, high-end embedded processors are those with 32-bit computational capability, while the market for 64-bit embedded processors is just opening up. ARM is a very popular 32-bit RISC architecture, used in SoCs that boast very low power dissipation, and it has made its presence felt in tablets, mobile phones and various other handheld devices.

Intel is the market leader in the general-purpose processor market and has a range of high-performance processors. We discussed the techniques used in advanced processors in Chapter 16, and in Chapters 15, 17 and 18 we covered Intel's advanced x86 processors. In 2008, Intel released its first low-power but high-performance embedded processor, code-named Atom. Within the space of the last five years, Atom has been used in many implementations, from netbooks to tablets, mobile phones and other embedded devices. In 2010, an Atom-based SoC was released, targeted specifically at embedded use cases. This is an x86 core with on-chip peripherals, so that it can be used as a single-chip computer. This chapter explores this Atom SoC architecture in detail and comments on its use for advanced embedded computing.

19.1.2 Desktop vs embedded systems

In Chapter 16, we discussed the various ways by which the performance of a processor can be increased. Performance is measured here in terms of IPC, i.e. instructions per cycle. Over the years, processors have shown very large improvements in IPC. This parameter is very important, especially in desktop computing, where we use what we call general-purpose processors. In such systems, the processor is expected to handle computation only. It is accompanied by interfacing chips when a full-fledged computer system has to be developed. These chips interface the processor with the peripheral devices that users need in order to communicate with the processing system. Currently, such interfacing is done by a set of chips called a chipset. The latest Intel platforms (Section..) have just one chip acting as the chipset. More details of x86 platforms and chipsets are presented in Chapter 20.

Embedded systems have to be looked at from a different perspective. For embedded applications, the following points are to be noted:

i) All the peripheral controllers are expected to be inside the processor chip, and the chip then becomes an SoC.

ii) Many embedded systems operate on battery power, and hence conserving power is very important. SoCs for such uses are expected to have very low power dissipation, both in active and idle states.

iii) Most embedded products are expected to be of small physical size.

iv) As embedded systems have become very advanced in their handling of signals like audio, image and video, the computational capability of an embedded processor SoC should encompass the domains of digital signal processing, floating-point operations etc.

v) In today's world, connectivity is absolutely necessary for any embedded product. Hence the SoC being used should provide the necessary support for it.

Now that we have listed the features necessary for an embedded processor, let us examine the capabilities of Atom as a processor that can fit into the embedded domain. We will start with the history of Atom, followed by the microarchitectures used in Atom, and then examine an Atom-based SoC and its built-in peripherals.

19.1.3 The Atom processor

The first Atom processor chip was released in 2008. It was based on 45 nm technology, and its microarchitecture was named Bonnell. It is similar to the P5 (Pentium) architecture in many ways, but special attention was given to making it dissipate little power, which made Bonnell an entirely new design. It is also the smallest processor Intel has ever built. The smaller-die (32 nm) version of Bonnell is named Saltwell. An impressive list of Bonnell-based processors and SoCs for applications like netbooks, nettops, tablets and smartphones can be seen. It was in 2010 that single-core SoCs for embedded applications in the consumer electronics market began to be released; these are named Tunnel Creek and Cloverview. In May 2013, the next new microarchitecture for Atom, named Silvermont, was officially released. It is based on a 22 nm process, and it differs from Bonnell on many other counts as well. A smaller-die (14 nm) version of it, named Airmont, is planned for release in 2014.

19.2 THE BONNELL MICROARCHITECTURE

For any embedded processor, the parameter to be maximized is performance/watt, and Bonnell was designed with this objective. The Bonnell microarchitecture is not dramatically different from some of Intel's desktop processors, but in spite of these similarities, the Bonnell-based Atom is an entirely new design. It combines low power consumption, small size and low production cost.

It adopts the principles used in Intel's P5 (Pentium) architecture, in that it is superscalar with two integer ALUs and in-order execution. It does not implement speculative execution, out-of-order execution or register renaming, which are present in the more advanced processors (Chapter 16). These performance-enhancing features were left out because the extra effort of squeezing out higher performance would have cost more die space, more hardware and more power dissipation. Because of the in-order core, there is the problem of stall cycles, in which the CPU waits for data resident in the caches or even in system memory. To mitigate this problem, a mechanism named Safe Instruction Recognition has been incorporated. It allows instructions that do not need to wait for any data to get ahead in the queue and be processed first. Bonnell employs a 16-stage pipeline and also incorporates Hyper-Threading (Section..). Hyper-Threading enhances performance by around 50% but increases power by around 17%. Even so, support for Hyper-Threading is the main trump card of the Bonnell-based Atom: the operating system sees a single-core Atom processor as a dual-core CPU that can process two threads at a time.
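Taking the approximate figures quoted above at face value, the net effect of Hyper-Threading on performance per watt can be worked out directly. This is simple arithmetic on the two percentages given in the text, not a measured result:

```latex
\frac{(\text{performance}/\text{watt})_{\text{HT on}}}{(\text{performance}/\text{watt})_{\text{HT off}}}
  = \frac{1 + 0.50}{1 + 0.17} = \frac{1.50}{1.17} \approx 1.28
```

On these numbers, Hyper-Threading buys roughly 28% more performance per watt, which is why it is worth its power cost in a design whose goal is performance/watt rather than raw performance.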

Refer to Table 19.1 for some of the parameters of three Bonnell-based single-core Atom processors. Note the low supply voltages and the low TDPs (Thermal Design Power, Section 19.4.4). A desktop processor like the Core i5 has a much higher TDP, though it also delivers higher performance. The point is that, for Atom, it is performance/watt that is to be enhanced, rather than raw performance.

Table 19.1

19.2.1 Bonnell-based Atom variants

Initially, some Atom variants were 32-bit only and had no integrated graphics. The newer ones support 64-bit instructions if needed, but they can work in 32-bit mode as well. Atom is available in the Z, N, D, CE and E series. What applications are built around the Atom processor? In 2008, when Atom was released, the first application domains were netbooks and nettops, which are PCs of smaller size and lower computing power. Next, Intel envisaged MIDs, i.e. Mobile Internet Devices, which were to be smaller, connected computers without a conventional keyboard. But around this time the smartphone market, and later the tablet market, started growing, so the most recent applications targeted by Atom are smartphones and tablets. Atom platforms for such applications are discussed in Chapter 20. The following list gives the Atom processor series vis-à-vis their intended fields of application.

o Atom N-Series (Netbooks)
o Atom CE-Series (Set-top boxes, TV)
o Atom D-Series (Entry-level desktops)
o Atom E-Series (Embedded devices)
o Atom Z-Series (Smartphones and tablets)

19.3 The Atom SoC

In 2010, the E6xx series of Atom was announced. This series is targeted at embedded applications and is a System on Chip design. It is meant to take Atom into consumer electronics, automotive, industrial control and all possible embedded domains. Providing high performance at low power, along with a rich user interface on the chip and compatibility with mainstream operating systems, has been the central theme of this SoC variant of Atom. In the remaining sections of this chapter, we will look into the details of Tunnel Creek, an SoC based on the E6xx Atom series.

All Intel CPUs need a chipset (Ref ..) to put them into working mode. A chipset is a set of chips, or just a single chip, containing the controllers for all the peripherals through which the processor interfaces to the outside world. For all Intel CPUs there are proprietary chipsets available, and the processor plus chipset is placed on a motherboard, which is the platform for the working of the processor in a PC-like environment. Such platforms and their evolution are discussed in Chapter 20.

The E6xx-based Tunnel Creek SoC is different. It can be used along with certain standard chipsets, but this is not mandatory. As discussed in Section 19.1.2, an embedded system needs its peripheral controllers to be within the chip itself, so that it can perform as a single-chip computer. The E6xx series has been designed for such a scenario. It interfaces to the outside world through an open-standard, industry-proven PCI Express bus (Sec..). Because of this, it can be used along with customer-defined I/Os, ASICs, FPGAs and other discrete components. This provides flexibility for its use in different kinds of embedded applications where the I/Os differ. In addition, the controllers for many standard peripherals like audio, video and display devices are present within the chip. It also has controllers for certain standard on-board buses like SPI and SMBus. There is a set of GPIOs (General Purpose Input Output), which are single pins that can be programmed to be either inputs or outputs and can be used to connect non-standard peripherals. Figure 19.1 shows the internal details of the E6xx SoC. Let us examine each of the component parts.

Fig 19.1. The components of the E6xx SoC

19.3.1 The E6xx core


Most of the features of the core are like those of any x86 core and have been discussed thoroughly in previous chapters, so here let us just list them. The core is offered at different clock frequencies for different applications: the low-frequency versions are for very low-power applications, and the high-frequency ones are for higher performance. We can classify them as follows:

o 600 MHz (Ultra Low Power)
o 1 GHz (Entry level)
o 1.3 GHz (Mainstream)
o 1.6 GHz (Premium)

The main features of the core are:

o Macro-operation execution support (Section..)
o 2-wide instruction decode and in-order execution
o 32 KB L1 instruction cache and 24 KB L1 data cache
o 512 KB L2 cache
o 32-bit address bus
o Support for the IA-32 architecture (Section..)
o Support for Intel Virtualization Technology (Section..)
o Support for Intel Hyper-Threading Technology with two threads (Section..)
o Intel SSE2 and Intel SSE3 support (Section..)
o Advanced power management features, including Enhanced Intel SpeedStep Technology (Section)
o Deep Power Down Technology (C6) (Section..)

19.3.2 Memory Controller


It is a single-channel DDR2 memory controller that supports only soldered-down DRAM configurations; it does not support SODIMMs (Small Outline DIMMs) or any other type of DIMM (Section..). It supports memory sizes of 128 MB, 256 MB, 512 MB, 1 GB and 2 GB.

19.3.3 Graphics, Video and Display


Figure 19.2 gives a more detailed view of the integrated graphics unit of the processor, containing the 3D graphics engine, the video processing units and the display controller.

Fig 19.2 The components of the graphics unit of the processor

a) 3D graphics: An integrated 3D graphics engine is a built-in feature of the SoC, and it equips the SoC with very good capability to handle pixel shading and vertex shading, which are important in rendering the 3D displays commonly found in applications like point-of-sale terminals, games and car dashboards. To understand these terms, let us try to grasp how 3D graphics is created for such applications. Any 3D object is generated by describing it in terms of triangles, which in turn are defined by their vertices. The larger the number of triangles (and hence vertices) used to describe an object, the better its appearance, because more resolution is available to represent curves. In effect, a smoother picture results, but at the cost of higher computational complexity. A vertex shader transforms the vertex geometry information to create a 2D representation in screen space. The transformed vertices are then processed to create display lists in memory. Next comes pixel processing. A pixel, i.e. picture element, is the smallest unit of a picture. A pixel shader operates on every pixel, creating even more detail than what is provided by the vertex shader. Pixel shaders create the fine details of a picture, i.e. the texture. In essence, the quality of graphics depends on the computational capability of the vertex and pixel shaders. In the E6xx, vertex shading is realized in software and pixel shading in hardware. The unified shader of the Atom E6xx contains a specialized programmable processing unit with capabilities specifically suited for efficient processing of graphics geometries (vertex shading), graphics pixels (pixel shading), and general-purpose video and image processing programs. In addition to data processing operations, the unified shader engine has a rich set of program-control functions permitting complex branches, subroutine calls, tests etc. for runtime program execution.

b) Video Encoder and Decoder

In earlier times, video encoding and decoding were handled by the general-purpose CPU itself, but the current trend is to perform them in dedicated hardware; this is called hardware-accelerated transcoding. Thus there has been a surge in the availability of graphics cards and processors that can decode and subsequently re-encode compressed video. The Atom has such a unit integrated in the SoC. The video encode hardware accelerator in the E6xx improves video capture performance by providing dedicated hardware-based acceleration, allowing high-definition video streams to be encoded in the highly compressed H.264 format with very low main CPU utilization, thus freeing the general-purpose processor for other parallel workloads. Other benefits are low power consumption and high picture quality. This unit can take a video and encode it to the following formats: MPEG4, H.264, H.263 and VGA. Applications like video surveillance can use the on-chip video encoding accelerator to compress the incoming raw video before storing it or sending it over the network interface. The SoC also has video decoding hardware, which can handle the MPEG2, MPEG4, VC1, WMV9, H.264 and DivX formats. Applications like digital video rendering can use the video decoder to render high-definition data streams on the display.
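The vertex-shader step described under a), turning 3D vertex positions into 2D screen coordinates, can be sketched with a simple perspective projection. This is a minimal Python illustration under stated assumptions (a pinhole camera model, vertices already in camera space, no clipping, depth buffering or model/view matrices); the names `Vertex3` and `project_vertex` are hypothetical, and the sketch says nothing about how the E6xx graphics engine actually implements vertex shading.

```python
from dataclasses import dataclass


@dataclass
class Vertex3:
    """A vertex in camera (eye) space: x right, y up, z = distance in front of the viewer."""
    x: float
    y: float
    z: float


def project_vertex(v: Vertex3, focal: float, width: int, height: int) -> tuple[float, float]:
    """Project a 3D vertex to 2D screen coordinates (pixels, origin at top-left).

    Pinhole perspective model: points farther away (larger z) land closer to
    the screen centre. Normalized coordinates in [-1, 1] are mapped onto the
    full screen. Assumes the vertex lies in front of the viewer (z > 0).
    """
    if v.z <= 0:
        raise ValueError("vertex must lie in front of the viewer (z > 0)")
    px = focal * v.x / v.z          # perspective divide
    py = focal * v.y / v.z
    sx = (px + 1.0) * 0.5 * width   # [-1, 1] -> [0, width]
    sy = (1.0 - py) * 0.5 * height  # flip y: screen y grows downward
    return sx, sy


# One triangle, described by its three vertices.
triangle = [Vertex3(0.0, 0.0, 2.0), Vertex3(1.0, 0.0, 2.0), Vertex3(0.0, 1.0, 4.0)]
screen = [project_vertex(v, focal=1.0, width=800, height=480) for v in triangle]
print(screen)  # [(400.0, 240.0), (600.0, 240.0), (400.0, 180.0)]
```

Note how the third vertex, twice as far away, is pulled towards the screen centre: that division by z is what gives the flat image its sense of depth.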

c) 2D and Display Controller

The Display Controller provides the 2D graphics functionality for the display pipeline. It takes a set of source images and delivers them with proper timing to the display devices. The display output path can be divided into three stages:

Planes: The Display Controller contains a variety of planes, such as display planes and cursor planes. A plane is a rectangular image that has characteristics such as source, size, position, method and format.

Pipes: A pipe consists of a set of planes and a timing generator. The processor has two independent display pipes, which allow it to support two independent display streams. Along the display pipe, the display data can be converted from one format to another, stretched or shrunk, and colour corrected or gamma converted.

Ports: Display ports are the destinations of the display pipes. The E6xx Series has one dedicated LVDS port and one SDVO port. Since a display port is available for each of its two pipes, the processor can show two different images on two different display devices. An automotive application, for example, could use one display for the car dashboard with navigation support and the other for back-seat entertainment such as playing a DVD movie.
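To make "stretched or shrunk" concrete, the sketch below scales pixel rows with nearest-neighbour sampling, the simplest possible scaler. It is an illustrative Python model only: the function names are hypothetical, an image is assumed to be a list of rows of integer pixel values, and real display pipes may use filtered scaling instead.

```python
def scale_row(src: list[int], dst_width: int) -> list[int]:
    """Stretch or shrink one row of pixels with nearest-neighbour sampling.

    Destination pixel i copies source pixel floor(i * len(src) / dst_width),
    using integer arithmetic only.
    """
    if not src or dst_width <= 0:
        raise ValueError("need a non-empty source row and a positive width")
    n = len(src)
    return [src[(i * n) // dst_width] for i in range(dst_width)]


def scale_image(rows: list[list[int]], dst_width: int, dst_height: int) -> list[list[int]]:
    """Scale a whole image (a list of pixel rows) in both directions."""
    if not rows or dst_height <= 0:
        raise ValueError("need a non-empty image and a positive height")
    h = len(rows)
    return [scale_row(rows[(j * h) // dst_height], dst_width) for j in range(dst_height)]


print(scale_row([10, 20, 30, 40], 8))  # stretch: [10, 10, 20, 20, 30, 30, 40, 40]
print(scale_row([10, 20, 30, 40], 2))  # shrink:  [10, 30]
```

Because each destination pixel simply reuses the nearest source pixel, enlarging an image this way produces visible blocks; that is the price of needing no arithmetic beyond one multiply and one integer divide per pixel.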

Fig 19.3 The Display controller

Fig 19.3 shows two independent display pipes, Pipe A and Pipe B, which produce the outputs listed below.

Display Pipe A: outputs directly as LVDS
Display Pipe B: outputs directly as SDVO

Let us now discuss the terms LVDS and SDVO.

19.3.3.1 LVDS (Low-Voltage Differential Signaling)

LVDS was introduced in the mid-1990s and is very popular in computers, where it forms part of very high-speed networks and computer buses. The idea is that a signal is represented as the difference between the voltages on two wires. Because of the differencing, any noise common to both wires gets cancelled out; thus differential signals are much cleaner than single-ended signals. Also, the difference signal is very low in amplitude, so power dissipation is very small.
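The noise-cancelling argument can be checked numerically. The Python sketch below sends the same bits over a differential pair and, for comparison, reads a single wire against a fixed threshold, with identical noise coupled onto both wires. The 350 mV differential swing and 1.2 V common-mode level are typical LVDS figures used here as assumptions; the function names and noise values are hypothetical.

```python
SWING_MV = 350.0         # assumed differential swing |V(P) - V(N)| (typical LVDS value)
COMMON_MODE_MV = 1200.0  # assumed common-mode (offset) voltage (typical LVDS value)


def drive_pair(bits: list[int]) -> list[tuple[float, float]]:
    """Put each bit on a wire pair (P, N): a 1 puts P above N, a 0 puts N above P."""
    half = SWING_MV / 2
    hi, lo = COMMON_MODE_MV + half, COMMON_MODE_MV - half
    return [(hi, lo) if b else (lo, hi) for b in bits]


def add_common_noise(pairs, noise_mv):
    """Couple the same noise sample onto both wires of each pair."""
    return [(p + n, q + n) for (p, q), n in zip(pairs, noise_mv)]


def receive_differential(pairs) -> list[int]:
    """Decide each bit from the sign of V(P) - V(N); common noise cancels in the subtraction."""
    return [1 if p - q > 0 else 0 for p, q in pairs]


def receive_single_ended(pairs, threshold_mv: float = COMMON_MODE_MV) -> list[int]:
    """For comparison: look only at wire P against a fixed threshold."""
    return [1 if p > threshold_mv else 0 for p, _ in pairs]


bits = [1, 0, 1, 1, 0]
noise = [0.0, 400.0, -300.0, 250.0, 600.0]  # hypothetical noise samples, in mV
received = add_common_noise(drive_pair(bits), noise)
print(receive_differential(received))  # [1, 0, 1, 1, 0]  (all bits correct)
print(receive_single_ended(received))  # [1, 1, 0, 1, 1]  (three bits corrupted)
```

A single-ended link would typically need a much larger swing to survive the same noise, and a larger swing costs power; the differential pair gets its noise immunity while keeping the swing small.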

LVDS is a high-speed digital interface used in many high-data-rate applications that require high noise immunity and low power consumption. It is used in all types of devices: DDRs, buses, displays, networks etc. The LVDS output from display pipe A is designed to be used directly for flat panel displays.

19.3.3.2 SDVO (Serial Digital Video Out)

Digital display channel B is capable of driving SDVO adapters. SDVO was developed by Intel to interface with compliant third-party display controller devices, which may have a variety of output formats like DVI, LVDS, HDMI and TV-Out. Its electrical interface is that of PCIe, but its protocol and timings are unique.

19.3.4 Intel High Definition Audio Controller

This controller conforms to the Intel High Definition Audio Specification, which defines a digital interface that can be used to attach different types of audio codecs (coder-decoders). The controller supports up to four audio streams, two in and two out. Having such a high-definition audio controller inside the SoC makes it easy to use the processor to design automotive and consumer electronics products with excellent audio quality. Fig 19.4 shows the architecture of an HD audio controller with multiple links. The controller uses a set of DMA engines to manage simultaneous independent streams on the link effectively. The audio controller supports isochronous data transfers, which allow glitch-free audio. On the input side, the E6xx Series adds support for an array of microphones.

Fig 19.4 Block diagram of Intel High Definition audio management system

19.3.5 SPI Controller

SPI is a serial protocol that is very popular in embedded systems. It was developed by Motorola. SPI stands for Serial Peripheral Interface and, as the name suggests, it is a serial data transfer protocol between a microcontroller unit (MCU) and a peripheral, which is synchronous and full duplex (data can be sent in both directions simultaneously). It uses a clock line, one data line in each direction and a slave-select line for each slave. It is a single-master, multi-slave system in which only one slave is enabled at a time. It is a master-slave protocol in the sense that the master generates the clock signal and initiates data transfer; when it does so, data transfer occurs in both directions simultaneously. In most x86-based systems, the SPI bus is used to connect the flash ROM that contains the BIOS. Such an interface provides a low-cost option for connecting high-capacity flash devices. The E6xx contains an SPI controller module that offloads the Atom CPU by driving the SPI protocol on the bus and providing the necessary software interface for reading and writing devices in data blocks.

19.3.6 SMBus Controller

SMBus stands for System Management Bus, which was defined and developed by Intel in 1995. It is a two-wire bus similar to the I2C protocol developed by Philips. It is used in personal computers and servers for low-speed system management communication with various devices on the system, typically temperature sensors, fan controls, voltage regulators etc. Some embedded devices now use this protocol as well.
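The full-duplex SPI exchange described in 19.3.5 can be modelled in a few lines: on every clock the master shifts one bit out on MOSI while the selected slave shifts one bit out on MISO, so after eight clocks the two have swapped bytes. This Python simulation is a sketch, not a driver for the E6xx SPI controller: the slave is a hypothetical 8-bit shift register, the signal names MOSI and MISO follow the common SPI convention, and the most-significant-bit-first order is an assumption (many, but not all, SPI devices use it).

```python
class ShiftRegisterSlave:
    """Hypothetical SPI slave modelled as an 8-bit shift register."""

    def __init__(self, tx_byte: int):
        self.reg = tx_byte & 0xFF  # byte the slave will send back
        self.selected = False      # state of this slave's select line

    def miso(self) -> int:
        """Bit the slave presents on MISO: its most significant bit goes first."""
        return (self.reg >> 7) & 1

    def clock(self, mosi_bit: int) -> None:
        """On each clock edge a selected slave shifts in one MOSI bit."""
        if self.selected:
            self.reg = ((self.reg << 1) | mosi_bit) & 0xFF


def spi_transfer(master_byte: int, slaves: list[ShiftRegisterSlave], select: int) -> int:
    """Exchange one byte with slaves[select], MSB first; returns the byte received."""
    for i, s in enumerate(slaves):
        s.selected = (i == select)         # enable exactly one slave
    received = 0
    for bit in range(7, -1, -1):
        mosi = (master_byte >> bit) & 1    # master puts its next bit on MOSI
        miso = slaves[select].miso()       # selected slave puts its next bit on MISO
        for s in slaves:                   # SCLK and MOSI are shared by all slaves,
            s.clock(mosi)                  # but only the selected one responds
        received = (received << 1) | miso  # master shifts in the MISO bit
    for s in slaves:
        s.selected = False                 # release the select line
    return received


slaves = [ShiftRegisterSlave(0xA5), ShiftRegisterSlave(0x3C)]
reply = spi_transfer(0x5A, slaves, select=1)
print(hex(reply), hex(slaves[1].reg), hex(slaves[0].reg))  # 0x3c 0x5a 0xa5
```

The unselected slave keeps its contents even though it sees every clock and MOSI bit, which is exactly why only one select line may be active at a time.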

19.3.7 RTC (Real Time Clock)

The RTC is used to keep track of time in hours, minutes and seconds. (In PCs, the RTC is realized in a separate chip powered by a battery, which keeps it working non-stop even when the system is powered off.) The RTC module inside the Atom chip provides battery-backed date and time keeping. The design of this RTC is functionally compatible with the Motorola MC146818B, which is the RTC used in PCs. The timekeeping base clock comes from a 32.768 kHz oscillator, which is internally divided down to produce an update every second. The RTC of the E6xx series has internal registers and SRAM organized as two banks of 128 bytes each, called the standard and extended banks. The first 14 bytes of the standard bank contain the RTC time and date information along with four registers, A to D, that are used to configure the RTC. The extended bank contains a full 128 bytes of battery-backed SRAM, which typically stores BIOS settings.
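Two details of an MC146818-style RTC are easy to show in code. First, 32,768 = 2^15, so fifteen divide-by-two stages turn the 32.768 kHz oscillator into exactly one update per second. Second, the time bytes are normally stored in packed BCD. The Python sketch below decodes a 14-byte snapshot of the standard bank using the MC146818 layout (seconds at offset 0x00, minutes 0x02, hours 0x04, day of month 0x07, month 0x08, year 0x09, registers A to D at 0x0A to 0x0D). It works on a byte array rather than on hardware; on PC-compatible systems the bytes would be read through the index/data I/O ports 0x70 and 0x71, which is assumed here and not shown. The function names are hypothetical.

```python
# MC146818-compatible standard-bank offsets (time/date and control registers).
SECONDS, MINUTES, HOURS = 0x00, 0x02, 0x04
DAY_OF_MONTH, MONTH, YEAR = 0x07, 0x08, 0x09
REG_A, REG_B = 0x0A, 0x0B

UIP = 0x80        # register A, bit 7: update in progress (time bytes may be changing)
DM_BINARY = 0x04  # register B, bit 2: 1 = binary data mode, 0 = BCD
MODE_24H = 0x02   # register B, bit 1: 1 = 24-hour mode, 0 = 12-hour mode
PM_FLAG = 0x80    # hours byte, bit 7: PM flag in 12-hour mode


def bcd_to_int(value: int) -> int:
    """Convert one packed BCD byte (e.g. 0x59) to an integer (59)."""
    return (value >> 4) * 10 + (value & 0x0F)


def identity(value: int) -> int:
    """Binary data mode: the byte already holds the value."""
    return value


def decode_rtc(bank: bytes) -> dict:
    """Decode time and date from a snapshot of the first 14 bytes of the standard bank."""
    if len(bank) < 14:
        raise ValueError("need at least 14 bytes (time/date + registers A-D)")
    if bank[REG_A] & UIP:
        raise RuntimeError("RTC update in progress; read the bank again")
    conv = identity if bank[REG_B] & DM_BINARY else bcd_to_int

    raw_hour = bank[HOURS]
    if bank[REG_B] & MODE_24H:
        hour = conv(raw_hour)
    else:
        # 12-hour mode: values 1..12 plus a PM flag; convert to 0..23.
        hour = conv(raw_hour & 0x7F) % 12 + (12 if raw_hour & PM_FLAG else 0)

    return {
        "hour": hour,
        "minute": conv(bank[MINUTES]),
        "second": conv(bank[SECONDS]),
        "day": conv(bank[DAY_OF_MONTH]),
        "month": conv(bank[MONTH]),
        "year": conv(bank[YEAR]),  # two digits only; the century is not among these bytes
    }


# Example snapshot: 13:30:45 on day 21, month 6, year 13, in BCD and 24-hour mode.
snapshot = bytearray(14)
snapshot[SECONDS], snapshot[MINUTES], snapshot[HOURS] = 0x45, 0x30, 0x13
snapshot[DAY_OF_MONTH], snapshot[MONTH], snapshot[YEAR] = 0x21, 0x06, 0x13
snapshot[REG_B] = MODE_24H
print(decode_rtc(snapshot))
```

Checking the UIP bit before trusting the bytes matters because the RTC updates its time registers once a second, and a read that straddles an update could mix old and new values.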

19.3.8 8254 Timer and 8259 PIC

The 8254 is the legacy x86 timer, and the 8259 is the legacy PIC (Programmable Interrupt Controller). Both these ICs have been discussed in depth in Chapters 11 and 12. There is also the I/O APIC (Advanced Programmable Interrupt Controller), the details of which have been discussed in Section 17..
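Programming an 8254 counter mostly comes down to choosing a 16-bit count that divides the timer's input clock down to the wanted rate. The Python sketch below assumes the PC-standard 8254 input clock of about 1.193182 MHz; the E6xx timer clock is not stated in this section, so treat that value, and the helper names, as assumptions.

```python
PIT_INPUT_HZ = 1_193_182  # assumption: PC-standard 8254 input clock (about 1.193182 MHz)


def pit_divisor(target_hz: float, input_hz: int = PIT_INPUT_HZ) -> int:
    """Return the count that makes an 8254 counter run closest to target_hz.

    The counter is 16 bits wide, so the usable range is 2..65536 (a count of 1
    is illegal in the rate-generator mode, mode 2).
    """
    if target_hz <= 0:
        raise ValueError("target frequency must be positive")
    divisor = round(input_hz / target_hz)
    if not 2 <= divisor <= 65536:
        raise ValueError("target frequency out of range for a 16-bit counter")
    return divisor


def count_register_value(divisor: int) -> int:
    """Value actually written to the counter: a count of 65536 is written as 0."""
    return divisor & 0xFFFF


def actual_hz(divisor: int, input_hz: int = PIT_INPUT_HZ) -> float:
    """Frequency really produced; the divisor is an integer, so this is only close to the target."""
    return input_hz / divisor


for f in (1000, 100):
    d = pit_divisor(f)
    print(f"{f} Hz -> count {d}, actual {actual_hz(d):.3f} Hz")
```

The rounding step is the design choice here: it picks the count whose resulting frequency is nearest the target, and `actual_hz` shows how far off that still is.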

19.3.9 Low Pin Count Interface (LPC)

This is another legacy interface for low-speed peripherals (Section 20..). It is used to connect low-speed devices that do not require the bandwidth of PCI Express, such as PS/2 keyboards and mice, and legacy serial and parallel ports.

19.3.10 Watchdog Timer (WDT)

A watchdog timer is an additional timer that monitors the system and resets it if necessary. The scenario is this. Most embedded systems are expected to be self-reliant; there is very little possibility of intervention by a human operator if the software goes awry, for example by getting stuck in an infinite loop. Such anomalies can occur for various reasons, like deadlocks (in a multitasking environment), a noise voltage on some pin that causes a wrong trigger, and so on. If such a situation arises, there should be a mechanism by which it is automatically detected and the system is reset. The watchdog timer, like any timer, can be loaded with a count that decrements to zero; when it reaches zero, it resets the processor. In a system that is doing its job correctly, the watchdog count never reaches zero, because the correctly operating software restarts it periodically, reloading the original count. If the software goes awry, the watchdog does not get reloaded, and when its count reaches zero, it resets and restarts the system automatically. Fig 19.5 illustrates the general operation of a WDT, which is a standard component of almost all embedded processors. What number is loaded as the count in a WDT? This is decided by considering how much time the system should be allowed to recover on its own before it is forcibly reset. The E6xx processors offer a wide time range, from 1 s to 10 minutes, to cover a variety of applications.
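The reload-or-reset behaviour described above can be modelled as a small simulation. This Python sketch is not the E6xx watchdog's programming interface: the class and method names (`kick`, `tick`) are hypothetical, time is counted in abstract ticks rather than seconds, and it assumes the reset clears the fault so that the restarted software runs normally.

```python
class WatchdogTimer:
    """Simplified model of a watchdog: a down-counter that resets the system at zero."""

    def __init__(self, timeout_ticks: int):
        if timeout_ticks <= 0:
            raise ValueError("timeout must be positive")
        self.reload_value = timeout_ticks
        self.count = timeout_ticks

    def kick(self) -> None:
        """Healthy software calls this periodically to reload the original count."""
        self.count = self.reload_value

    def tick(self) -> bool:
        """Count down by one; return True if the count reached zero (system reset)."""
        self.count -= 1
        if self.count == 0:
            self.count = self.reload_value  # the system restarts and the count starts over
            return True
        return False


def simulate(timeout_ticks: int, steps: int, kick_every: int, hang_at=None) -> list[int]:
    """Run a main loop that kicks the watchdog every `kick_every` ticks.

    From tick `hang_at` onwards the software is stuck and stops kicking.
    Returns the ticks at which the watchdog reset the system.
    """
    wdt = WatchdogTimer(timeout_ticks)
    resets = []
    for t in range(steps):
        stuck = hang_at is not None and t >= hang_at
        if not stuck and t % kick_every == 0:
            wdt.kick()
        if wdt.tick():
            resets.append(t)
            hang_at = None  # assumption: the reset clears the fault
    return resets


print(simulate(timeout_ticks=5, steps=20, kick_every=3))             # [] (healthy, no resets)
print(simulate(timeout_ticks=5, steps=20, kick_every=3, hang_at=7))  # [10]
```

The same model shows why the timeout must be chosen with care: if the software's normal kick interval is longer than the timeout (for example `kick_every=6` with a timeout of 5), the watchdog resets a perfectly healthy system.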

Fig 19.5 The operation of a watch dog timer

19.3.11 GPIO (General Purpose I/O)

Embedded processors need signal pins to which non-standard sensors and actuators can be connected. These pins should be programmable to act as inputs or outputs. For example, an input pin may be needed to attach a temperature sensor, and an output pin to drive a single LED. Atom has 14 such GPIO pins.
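A common way to expose such pins to software is through per-pin bits in a direction register and level registers. The Python model below is a hypothetical sketch of that idea, with made-up register names; it is not the E6xx GPIO programming interface. It shows the LED-output and sensor-input cases mentioned above, assuming the sensor presents a simple digital high/low level.

```python
class GpioBank:
    """Hypothetical model of a small GPIO block with per-pin direction control.

    Bit n of each register belongs to pin n. The register names and layout are
    illustrative only; they are not the E6xx GPIO register map.
    """

    NUM_PINS = 14

    def __init__(self):
        self.direction = 0  # bit = 1: pin is an output; bit = 0: pin is an input (default)
        self.out_level = 0  # levels the SoC drives on output pins
        self.pin_level = 0  # levels present on the pins from external circuitry

    def _mask(self, pin: int) -> int:
        if not 0 <= pin < self.NUM_PINS:
            raise ValueError(f"pin {pin} does not exist")
        return 1 << pin

    def configure(self, pin: int, as_output: bool) -> None:
        m = self._mask(pin)
        self.direction = (self.direction | m) if as_output else (self.direction & ~m)

    def write(self, pin: int, level: int) -> None:
        m = self._mask(pin)
        if not self.direction & m:
            raise RuntimeError(f"pin {pin} is configured as an input")
        self.out_level = (self.out_level | m) if level else (self.out_level & ~m)

    def read(self, pin: int) -> int:
        m = self._mask(pin)
        source = self.out_level if self.direction & m else self.pin_level
        return 1 if source & m else 0

    def external_drive(self, pin: int, level: int) -> None:
        """Simulation only: an external device (e.g. a sensor) sets the level on a pin."""
        m = self._mask(pin)
        self.pin_level = (self.pin_level | m) if level else (self.pin_level & ~m)


gpio = GpioBank()
gpio.configure(3, as_output=True)   # pin 3 drives an LED
gpio.write(3, 1)                    # LED on
gpio.configure(7, as_output=False)  # pin 7 reads a digital sensor line
gpio.external_drive(7, 1)           # the sensor pulls the line high
print(gpio.read(3), gpio.read(7))   # 1 1
```

Refusing to write to a pin configured as an input mirrors the point made in the text: the same physical pin must be explicitly programmed for one role or the other.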

19.3.12 PCIe Channels

PCIe is a high-speed, scalable interface that provides point-to-point connectivity between components. There are four single-lane PCIe channels in the E6xx, which can be used to connect external peripherals. The four x1 PCIe ports operate as four independent PCIe controllers. Each root port supports up to 2.5 Gbit/s in each direction per lane. These ports may be used to attach discrete I/O components such as network adapters, or a custom I/O hub for increased I/O expansion. Fig 19.6 shows one PCIe channel being used with a standard I/O hub (Sec 20..) that has multiple devices connected to it.
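The 2.5 Gbit/s figure is the raw signalling rate. First-generation PCIe links use 8b/10b encoding, which sends 10 bits on the wire for every 8 bits of data, so the usable rate per lane is lower. A worked calculation, before packet and protocol overheads:

```latex
\begin{aligned}
\text{data rate per lane, per direction} &= 2.5\ \text{Gbit/s} \times \tfrac{8}{10}
  = 2.0\ \text{Gbit/s}
  = \frac{2.0 \times 10^{9}\ \text{bit/s}}{8\ \text{bit/byte}} = 250\ \text{MB/s} \\
\text{all four x1 ports, per direction} &= 4 \times 250\ \text{MB/s} = 1\ \text{GB/s}
\end{aligned}
```

Transaction headers and link-level traffic reduce the actual payload throughput somewhat further.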

Fig 19.6 Multiple peripherals attached through an IOH to one PCIe port of Atom

19.3.13 A general Atom-based system

Fig 19.7 shows the Atom SoC with some general external peripherals: DDR2 memory, two displays, one codec and several possible PCIe devices. In spite of being a single-chip computer, Atom still needs an external clock generator. An optional power management IC is sometimes used to take care of the different voltage and power requirements of the different peripherals that form part of the system. Observe the four PCIe lanes, to which four possible types of devices are shown connected: a discrete component, an ASIC, an FPGA or a proprietary hub. Since PCIe is an industry standard for connectivity, a plethora of devices have been developed over the years to cater to various purposes, and any peripheral with PCIe connectivity can be connected seamlessly to the system.

Fig 19.7 A general Atom based system

Now observe Fig 19.8, where the Atom processor is shown interfaced to various devices through a proprietary PCH. A PCH (Platform Controller Hub) is an Intel chipset that expands the SoC's capability by giving it access to SATA, Ethernet, USB etc.

Fig 19.8 Expanding the functionality of the chip using a custom IOH

19.4 Power reduction techniques in processors

The title of this section addresses a very general problem, but it is introduced in this chapter because power reduction is a very important issue for Atom processors. In the early years of computer evolution, power dissipation and power reduction were not issues on which much time or effort was spent. CPUs worked at low frequencies and were massive in physical size, but computers were used only for specialized applications, and getting more performance was the focus. Over time, computers became more powerful and also more power hungry. Thermal management, i.e. cooling techniques to remove the heat dissipated in computer systems, became a big research problem. Along with this, more thought was given to managing power while keeping performance high. In recent times, the whole scenario has changed dramatically and very fast. Computers are now something that everyone possesses, and they have also become portable and mobile. When high-performance processors have to be used in small devices like mobile phones and tablets, where batteries are the source of power, power reduction requirements become very stringent. In short, no processor or computer design can ignore the power issue. It is being tackled on various fronts, as listed below:

i) CPU design stage
ii) System design stage
iii) Application level

19.4.1 CPU Design Stage

This is a very specialized topic that has to be dealt with at the IC design level. It is tackled by techniques for reducing static (leakage) current and by designing CPUs that operate at low voltage levels. Some time back, microprocessors used a 5 V supply; the latest ones use levels around 1 V. Atom needs a supply voltage of just 1.8 V. Techniques such as clock gating, pulsed operation of latches, insertion of sleep transistors, use of high-threshold CMOS devices on non-critical paths, and partitioning of the chip into multiple clock and voltage domains have been used extensively in processor design. Most of these, and many more innovative design concepts, have been added to the newer processors, including Atom. We will not discuss these techniques here, as they are beyond the scope of this chapter.

19.4.2 System Design Stage

A computer system is not just the CPU alone. Every component uses power, and hence choosing low-power components like displays, disk drives, buses, memory and chipsets is very important. In embedded systems, the right choice of peripherals matters much more than in regular computer systems.

19.4.3 Application Level

This is where power management is done actively, with the support of the processor and platform hardware. We will see more of it now.

19.4.4 THERMAL DESIGN POWER (TDP)

Thermal design power (TDP), expressed in watts, is a power rating for a processor or a system. It is defined as the maximum amount of power that the cooling system in a computer must be able to dissipate. It does not represent the maximum wattage that a system can withstand; rather, it indicates how much power would be drawn when running the applications for which the system is designed. Thus there is a typical TDP for any processor, and likewise for a platform or a system as a whole. The Atom E6xx series has TDPs ranging from 3.3 W to 4.5 W, depending on the specific processor. When the processor is part of a system, other components such as chipsets and peripherals add to the power dissipation. Table 19.2 shows the TDPs of typical computer systems.

Table 19.2 TDP ratings for typical computer systems (Courtesy: .)

Why is the TDP rating so important? TDP pertains to the power that is used up while the system is in use.
i) It affects the cooling system. For higher TDPs, active cooling techniques such as cooling fans and cooling tubes (heat pipes) may become necessary. This increases the bulk of the system and affects portability. Obviously, tablets and mobile phones cannot use such cooling techniques.
ii) A great majority of computing platforms now fall into the mobile category. Desktops have given way to notebooks, which we call laptops, and tablets and mobile phones are also computing devices. All of these are battery operated, and for longer battery life a low TDP is essential.
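TDP describes the power drawn under the intended workload, but battery life depends on the power averaged over time, including idle periods. The short Python sketch below makes that link explicit. The 20 Wh battery, the active and idle power figures, and the 10% busy fraction are illustrative assumptions, not measured Atom data, and the runtime ignores conversion losses.

```python
def average_power_w(active_w: float, idle_w: float, active_fraction: float) -> float:
    """Time-weighted average power of a system that is busy for active_fraction of the time."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must lie in [0, 1]")
    return active_fraction * active_w + (1.0 - active_fraction) * idle_w


def battery_life_h(battery_wh: float, avg_power_w: float) -> float:
    """Idealized runtime: stored energy divided by average draw (no conversion losses)."""
    return battery_wh / avg_power_w


if __name__ == "__main__":
    # Hypothetical 20 Wh battery, system busy 10% of the time, 0.5 W when idle.
    for active_w in (4.5, 10.0):  # a low-power platform vs. a hotter one
        avg = average_power_w(active_w, idle_w=0.5, active_fraction=0.10)
        print(f"active {active_w:4.1f} W -> average {avg:.2f} W, "
              f"about {battery_life_h(20.0, avg):.1f} h")
```

With these assumed numbers the cooler platform averages 0.90 W against 1.45 W, and the idle term alone accounts for half of the cooler platform's average, which is why idle power matters as much as peak power.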

19.4.5 POWER MANAGEMENT

Let us list a few interesting points with respect to power management.
i) TDP is defined for a system in its running state. This implies that when the system runs at maximum performance, its power dissipation should be within the specified limit. But a running system may be at different levels of performance at different times. For example, it may be switched ON but not doing anything significant. Because it is powered on, it still draws power; this is what we call idle power. To conserve the battery, this idle power should be very low.
ii) Power to the idle parts of the system may be switched off. As the idle period extends, more and more devices in the system may be put into an idle state, called the sleep state, in which power consumption is at its lowest. Since the processor is not doing any activity, its clock, memory, etc. can be turned off. All these techniques lead to a very low average power dissipation, which, for a battery powered device, definitely extends the battery life.

iii) The dynamic power dissipated in a system is directly proportional to the clock frequency as well as to the square of the supply voltage. For higher performance, processors and systems are expected to work at high speeds, which translates to high clock frequencies.
iv) Note that a system has varying workloads, which means that the number of tasks it needs to run concurrently varies. If it has only a few tasks and these tasks do not have stringent deadlines, there is absolutely no need for it to compute at high speed, so the system can be operated at a lower frequency. In contrast, when heavy workloads with time constraints and complex computations must be handled, the system must be able to perform efficiently; this needs higher frequencies, with the cores working at higher supply voltages, since transistors need a higher voltage to switch reliably at a higher clock rate. Conversely, a lower frequency permits a lower voltage, and because power depends on the square of the voltage, the saving is large. This is the motivation for dynamic voltage scaling (DVS), which addresses the problem of how to modulate a processor's clock frequency and supply voltage in accordance with the workload.

v) Because of the importance given to low power capabilities, systems are better evaluated in terms of performance per watt rather than just performance.
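Points iii) and iv) rest on the standard CMOS switching-power model, P = αCV²f, where α is the activity factor and C is the effective switched capacitance; leakage power is ignored here. The Python sketch below applies that model to a hypothetical table of (frequency, voltage) operating points to show why DVS lowers the voltage together with the frequency: for a fixed amount of work, energy per task is αCV² per cycle, so slowing down at the same voltage saves power but not energy, while lowering the voltage saves energy. The capacitance, cycle count and operating points are illustrative assumptions, not Atom specifications.

```python
# Hypothetical operating points (frequency in Hz, core voltage in V), slowest first.
OPERATING_POINTS = [(0.8e9, 0.80), (1.2e9, 0.95), (1.6e9, 1.10)]


def dynamic_power_w(c_eff_f: float, voltage_v: float, freq_hz: float,
                    activity: float = 1.0) -> float:
    """CMOS switching power P = alpha * C * V^2 * f (leakage ignored)."""
    return activity * c_eff_f * voltage_v ** 2 * freq_hz


def task_energy_j(c_eff_f: float, voltage_v: float, freq_hz: float, cycles: float) -> float:
    """Energy for a task of `cycles` clock cycles: power times run time (cycles / f)."""
    return dynamic_power_w(c_eff_f, voltage_v, freq_hz) * (cycles / freq_hz)


def pick_operating_point(cycles: float, deadline_s: float) -> tuple[float, float]:
    """Simple DVS policy: the slowest (and lowest-voltage) point that meets the deadline."""
    for freq_hz, voltage_v in OPERATING_POINTS:
        if cycles / freq_hz <= deadline_s:
            return freq_hz, voltage_v
    return OPERATING_POINTS[-1]  # nothing meets the deadline: run flat out


if __name__ == "__main__":
    C_EFF = 1e-9    # hypothetical effective switched capacitance, farads
    CYCLES = 1.6e9  # hypothetical task length in clock cycles
    for deadline in (2.5, 1.5, 1.0):
        f, v = pick_operating_point(CYCLES, deadline)
        print(f"deadline {deadline} s -> {f / 1e9:.1f} GHz @ {v:.2f} V, "
              f"{dynamic_power_w(C_EFF, v, f):.3f} W, "
              f"{task_energy_j(C_EFF, v, f, CYCLES):.3f} J")
```

Under these assumptions, relaxing the deadline from 1.0 s to 2.5 s lets the policy drop to the 0.8 V point, cutting the energy for the same task from about 1.94 J to about 1.02 J.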

19.5 ACPI (Advanced Configuration and Power Interface)

The ideas discussed above have been formalized by the computer industry, and a standard has been devised for the dynamic power management of computers. The latest standard is named ACPI. ACPI (Advanced Configuration and Power Interface) is an open industry specification co-developed by several companies (ACPI, 2004). Before ACPI, there was APM (Advanced Power Management), a power management solution controlled by the BIOS. Since the APM scheme had no knowledge of the applications running on a system, it was inadequate for effective power management. In contrast, ACPI is controlled by the OS, and all major OSes, on systems from desktops to tablets and mobile phones, support this standard. Note, however, that the processor used must have the hardware to support the OS in power management.

What does ACPI do? ACPI defines a number of states, ranging from active through sleep to off, for CPUs, systems, devices, etc., and also defines various levels of performance. ACPI provides a well-defined interface through which the BIOS can communicate the system's capabilities to the OS in a consistent way. The ACPI standard categorizes the various states as follows.

19.5.1 Processor C States

We will start with the states relevant to the CPU, which are designated as C states. When a system is in the ON state, the CPU need not be active all the time, because there will be periods when it can be shut off or left dormant. Also, some parts of the CPU can be put into sleep states, and when there is more than one core, some of the cores may be shut off. All these possibilities are captured in the various C states, C0 to Cn. C0 is the state in which the CPU is fully ON. As the value of n increases, more components in the CPU are shut off and the CPU is in a deeper sleep state. Deep sleep states have two effects:
i) lower power consumption
ii) more latency for waking up
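These two effects are what an OS idle governor has to weigh each time a core goes idle: a deeper state pays off only if the core stays idle long enough, and it must not take longer to wake up than the system can tolerate. The Python sketch below is loosely modeled on the way OS idle governors (such as Linux cpuidle) describe each state by an exit latency and a target residency; all numbers are hypothetical and are not Atom figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    name: str
    exit_latency_us: int      # time needed to return to C0
    target_residency_us: int  # minimum idle time for the state to save energy overall


# Hypothetical values, ordered from shallowest to deepest.
C_STATES = [
    CState("C1", exit_latency_us=1, target_residency_us=1),
    CState("C2", exit_latency_us=10, target_residency_us=20),
    CState("C3", exit_latency_us=30, target_residency_us=80),
    CState("C4", exit_latency_us=50, target_residency_us=150),
    CState("C6", exit_latency_us=150, target_residency_us=500),
]


def choose_c_state(predicted_idle_us: int, latency_limit_us: int) -> CState:
    """Return the deepest state that pays off for the predicted idle period and
    wakes up within the latency limit; stay in C1 if no deeper state qualifies."""
    chosen = C_STATES[0]
    for state in C_STATES:
        if (state.target_residency_us <= predicted_idle_us
                and state.exit_latency_us <= latency_limit_us):
            chosen = state
    return chosen


if __name__ == "__main__":
    for idle_us, limit_us in [(5, 1000), (100, 1000), (10_000, 60), (10_000, 1000)]:
        print(idle_us, limit_us, choose_c_state(idle_us, limit_us).name)
```

A short predicted idle keeps the core shallow even when latency is not a concern, and a tight latency limit keeps it out of C6 even during a long idle period.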

Table 19.3 explains these states in a simplified manner.

C state | Condition of the CPU | What is done | Remarks
C0 | Active | The CPU is active in executing instructions | No power saving
C1 | Halt | The CPU is executing the HLT instruction, so no instruction processing is done | 70% power saving
C2 | Stop Grant | The core clock is stopped | -
C3 | Deep sleep | The bus clock is stopped | -
C4 | Deeper sleep | The core voltage is reduced to around 1 V and a gradual reduction of the data in the L2 cache is begun | -
C5 | Still deeper sleep | The data in the L2 cache is reduced to zero | Power saving of 98%
C6 | A new state | The core voltage is reduced down to zero | This state is implemented only in mobile processors

Table 19.3 The C states as defined in ACPI

Note that these states have to be implemented in the design of the CPU. Most x86 processors implement states C0 to C5, while Atom is one of the few processors in which the C6 state is also implemented. Now that we have had a glance at the CPU sleep states, it is easy to understand similar states for other parameters and components.

19.5.2 P States: These are performance states. We have seen that higher performance needs higher frequencies and voltages. Obviously, scaling these factors up or down directly affects performance. ACPI defines up to 16 Pn states, where P0 gives the best performance and higher values of n mean lower performance, but lower power consumption and consequently less heat generated. In Intel processors, including Atom, performance modulation is achieved by Enhanced Intel SpeedStep Technology (EIST), a thermal and power management technology that allows processor performance and power consumption levels to be modified while the system is running. The clock frequency and voltage are reduced in steps to reduce power dissipation. Any system that wants to take advantage of this technology needs support for it in the processor, chipset, motherboard, BIOS and operating system.

The BIOS lists the C states and P states supported by the system in the ACPI tables. The OS then uses this information to manage speed and power.
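As a sketch of how an OS might use such a table, the Python below models a P-state list shaped loosely like the ACPI _PSS entries (a core frequency and a power figure per state, with P0 the fastest) and a simple utilization-based policy in the spirit of on-demand frequency governors: pick the slowest P state that still covers the current demand plus some headroom. The frequencies, power values and the 25% headroom are hypothetical, and the processor-specific control mechanism that actually switches EIST states is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PState:
    index: int          # P0 is the highest-performance state
    core_freq_mhz: int
    power_mw: int       # typical power at this state, as a _PSS-style entry would list


# Hypothetical table as the BIOS might describe it, P0 first.
P_STATES = [
    PState(0, 1600, 3300),
    PState(1, 1333, 2600),
    PState(2, 1066, 2000),
    PState(3, 800, 1400),
]


def select_p_state(utilization: float, current_mhz: int, headroom: float = 1.25) -> PState:
    """Pick the slowest P state whose frequency covers the demanded frequency.

    utilization is the fraction of time the CPU was busy while running at current_mhz.
    """
    demand_mhz = utilization * current_mhz * headroom
    for state in sorted(P_STATES, key=lambda s: s.core_freq_mhz):
        if state.core_freq_mhz >= demand_mhz:
            return state
    return P_STATES[0]  # demand exceeds every state: run at P0


if __name__ == "__main__":
    for util in (0.3, 0.6, 0.95):
        s = select_p_state(util, current_mhz=1600)
        print(f"busy {util:.0%} at 1600 MHz -> P{s.index} "
              f"({s.core_freq_mhz} MHz, {s.power_mw} mW)")
```

The headroom factor trades a little power for responsiveness: without it, a CPU that is busy 100% of the time at some frequency would never be moved to a faster state.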

19.5.3 S States: These are sleep states for a system, in which parts of the system are progressively switched off. See Table 19.4.

State | Condition | What is done
S0 | System On | The system is fully ON and all parts are in their working state
S1 | Stop Grant | The processor stops executing instructions, but the processor context is kept in hardware registers
S2 | - | The S2 state is not used anymore
S3 | Suspend to RAM | The processor is OFF and its context is held in system RAM. An example of the S3 state is a laptop's Sleep mode
S4 | Suspend to disk | The processor context is available only on the disk. An example of the S4 state is a laptop's Hibernate mode
S5 | Soft OFF | The system is practically OFF and the system context is lost, though a mechanical OFF is not done

Table 19.4 System sleep states

19.5.4 Power management at multiple levels

It is not only at the CPU and system level that various sleep states are defined. They can also be defined for individual devices and links, and for a global system view. All these are listed below.
Global view: Gx states
System: Sx states
CPU: Cx states
PCI / PCI-X bus: Bx states
PCI Express links: Lx states
Devices: Dx states
Thermal (throttling): Tx states
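Point ii) of Section 19.4.5 noted that parts of the system are put to sleep progressively as the idle period grows. A minimal Python sketch of such a policy for the system S states of Table 19.4 follows. The idle timeouts are hypothetical settings chosen for illustration, and the context locations follow the table.

```python
# Where the processor/system context is kept in each S state (following Table 19.4).
CONTEXT_LOCATION = {
    "S0": "hardware registers (running)",
    "S1": "hardware registers",
    "S3": "system RAM",
    "S4": "disk",
    "S5": "lost",
}

# Hypothetical policy: idle time in seconds after which the OS moves the system deeper.
IDLE_POLICY = [
    (0, "S0"),     # stay fully on
    (300, "S3"),   # suspend to RAM after 5 minutes idle
    (3600, "S4"),  # hibernate (suspend to disk) after 1 hour idle
]


def system_state_for_idle(idle_s: float) -> str:
    """Return the deepest S state whose idle threshold has been reached."""
    state = IDLE_POLICY[0][1]
    for threshold_s, name in IDLE_POLICY:
        if idle_s >= threshold_s:
            state = name
    return state


def survives_power_loss(state: str) -> bool:
    """Only a context saved to disk (S4) survives removal of all power."""
    return CONTEXT_LOCATION[state] == "disk"


if __name__ == "__main__":
    for idle in (10, 600, 7200):
        s = system_state_for_idle(idle)
        print(idle, s, CONTEXT_LOCATION[s], survives_power_loss(s))
```

The ordering reflects the trade-off seen for C states: S4 saves the most power and survives a battery running flat, but resuming from disk takes far longer than resuming from RAM in S3.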

SILVERMONT

We have talked about the Bonnell microarchitecture, which is based on a 45 nm process. Its smaller-die 32 nm version is named Saltwell. In May 2013, Atom with a new microarchitecture was announced. This new microarchitecture for Atom is named Silvermont. It is an entirely new design, though it incorporates many advanced features used in other Intel processors. In what aspects does Silvermont differ from Bonnell? Some of the differences are:
i) Out-of-order architecture
ii) Modular core architecture with up to two cores per module
iii) Up to four modules, i.e. 8 cores
iv) Based on a 22 nm process specially tuned for SoCs
v) Uses 3D tri-gate (FinFET) transistors
vi) L2 cache of up to 1 MB per module
vii) Better branch prediction and faster recovery from branch mispredictions
viii) No hyperthreading
ix) Addition of new instructions
x) Higher speed internal buses to support high bandwidth for memory and PCI Express

An interesting feature is that Silvermont's multi-core design is organized as modules, each module having two cores in it. A dual-core chip uses just one module, and the device can be scaled up to 8 cores (see Fig 19.8). Each module is built around a pair of cores that share the module's own L2 cache (up to 1 MB), and the modules are interconnected via a system agent/crossbar.

Fig 19.8 A quad-core Silvermont Atom with two interconnected modules
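Because the two cores of a module share that module's L2 cache, it matters which cores two cooperating threads run on. The hypothetical Python helper below sketches the core-to-module grouping for a Silvermont-style part with two cores per module and at most four modules. The assumption that cores are numbered consecutively within a module is for illustration only and is not taken from Intel documentation.

```python
def module_layout(num_cores: int, cores_per_module: int = 2,
                  max_modules: int = 4) -> list[tuple[int, list[int]]]:
    """Group core IDs 0..num_cores-1 into modules of cores_per_module cores each."""
    if not 1 <= num_cores <= cores_per_module * max_modules:
        raise ValueError(f"num_cores must be between 1 and {cores_per_module * max_modules}")
    layout = []
    for first in range(0, num_cores, cores_per_module):
        cores = list(range(first, min(first + cores_per_module, num_cores)))
        layout.append((first // cores_per_module, cores))
    return layout


def share_l2(core_a: int, core_b: int, cores_per_module: int = 2) -> bool:
    """Two cores share an L2 cache exactly when they sit in the same module."""
    return core_a // cores_per_module == core_b // cores_per_module


if __name__ == "__main__":
    print(module_layout(4))  # quad core: two modules, as in Fig 19.8
    print(share_l2(0, 1), share_l2(1, 2))
```

A scheduler could use a check like share_l2 to place two threads that exchange a lot of data on the same module, so that their shared data stays in one L2 cache.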

With these innovations, Silvermont is expected to deliver around three times the performance and five times the power efficiency of Saltwell. Silvermont has been made available in the SoCs named Merrifield for smartphones, Bay Trail for tablets, and Avoton for microservers. The latter two are targeted at the embedded and communications infrastructure market segments. The processor is also available in various TDP versions, which means it can be used in desktops and servers as well, though with higher TDPs and performance.

Intel plans to release a new Atom core every year, as it does for its desktop processors. The 14 nm version of Silvermont is named Airmont and is scheduled for release in 2014.

APPLICATIONS

The high performance, cost-effective, low power processors based on the Atom architecture enable various classes of applications. Examples are industrial controllers, automotive in-vehicle infotainment systems, the Internet of Things, portable instruments, etc.

Key Points of this chapter
i) Intel has developed many general purpose processors and is the industry leader in this field.
ii) Intel has also designed many embedded processors which have been very popular.
iii) Intel's newest embedded processor is Atom, which has a very wide range of applications.
iv) The microarchitecture used in Atom is named Bonnell.
v) Tunnel Creek is the name of an Atom-based SoC. It is meant for embedded applications and contains all the peripherals needed for such use. It has specialized hardware for graphics.
vi) An Atom-based system can use an IO Hub chipset or can be interfaced to peripherals using the PCIe bus.
vii) Power reduction techniques are important in all processors, and especially in Atom.
viii) Atom has been designed for a very low TDP.
ix) Power management for computer systems is done with OS support.
x) ACPI is the current power management standard.
xi) Silvermont is the new microarchitecture for the latest Atom processors.

QUESTIONS

1. In what ways should an embedded processor be different from a general purpose processor?
2. Why has the Bonnell microarchitecture been designed as an in-order pipeline?
3. What is the downside of adding hyperthreading to the Bonnell core?
4. List a few applications where Atom has been used.
5. Why is Tunnel Creek called an SoC?
6. What are the graphics capabilities of Tunnel Creek? Explain in detail.
7. Why is LVDS a very good method of signal transmission?
8. How does the WDT function? Why is it a mandatory peripheral in all embedded devices?
9. Why is power reduction a key point in system design?
10. How is power managed dynamically?
11. What is ACPI and how is it used?
12. List the different states used in ACPI and explain the concept behind the definition of each of these states.
13. What is meant by the term TDP and how is it relevant for mobile devices?
14. List the features of Silvermont which are different from Bonnell.
15. What is Airmont?