
Functional Units

A computer has five functionally independent main parts: the input unit, the memory unit, the arithmetic and logic unit, the output unit, and the control unit.

Input Unit: Computers accept coded information through input units, which read the data. The most familiar input device is the keyboard. Whenever a key is pressed, the corresponding letter or digit is automatically translated into its binary code and transmitted over a cable to either the memory or the processor.

Memory Unit: The function of the memory unit is to store programs and data. There are two classes of storage, called primary and secondary. Primary storage is a fast memory that operates at electronic speeds, and programs must be stored in it while they are being executed. Memory in which any location can be reached in a short, fixed amount of time after specifying its address is called Random Access Memory (RAM). The time required to access one word is called the memory access time. Although primary storage is essential, it tends to be expensive, so additional, cheaper secondary storage is used when large amounts of data and many programs have to be stored, particularly for information that is accessed infrequently. Examples: magnetic disks, magnetic tapes, and optical disks.

Arithmetic and Logic Unit: Every arithmetic or logic operation is initiated by bringing the required operands into the processor, where the operation is performed by the ALU.

Output Unit: Its function is to send processed results to the outside world. Example: a printer.

Control Unit: The control unit coordinates the operations of the memory, ALU, input, and output units. It can be regarded as the nerve center that sends control signals to the other units and senses their states.

Basic Operational Concepts

The computer works on given instructions. Consider an example:

Add LOCA, R0

This instruction adds the operand at memory location LOCA to the operand in register R0 and places the sum into R0. It appears to be done in one step, but it actually involves several internal steps: first, the instruction is fetched from memory into the processor; next, the operand at LOCA is fetched and added to the contents of R0. The same work can also be written as:

Load LOCA, R1
Add R1, R0

Let us now analyse how the memory and the processor are connected. The processor contains a number of registers used for several purposes:

IR: The instruction register holds the instruction that is currently being executed.
PC: The program counter is another specialized register; it contains the memory address of the next instruction to be fetched.
MAR: The memory address register facilitates communication with the memory; it holds the address of the location to be accessed.
MDR: The memory data register also facilitates communication with the memory; it contains the data to be written into, or read out of, the addressed location.

There are also n general-purpose registers, R0 to Rn-1.

Program execution starts when the PC is set to point to the first instruction. The content of the PC is transferred to the MAR and a Read control signal is sent to the memory. The addressed word is read out of memory and loaded into the MDR, and the contents of the MDR are then transferred to the IR. The instruction is then decoded; if it requires an arithmetic or logical operation, it is sent to the ALU, and the n general-purpose registers are used to hold intermediate results. The result is placed in the MDR, the address of the location where it is to be stored is placed in the MAR, and a write cycle is initiated. The PC is then incremented and the process continues.
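As a concrete illustration, here is a minimal C sketch of this fetch-decode-execute sequence. The one-word instruction encoding (opcode in the upper bits, address in the lower bits), the opcode values, and the memory layout are all invented for the example, not any real machine:

```c
#include <stdio.h>

enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };

unsigned short mem[256];               /* main memory, one word per cell */
unsigned short PC, IR, MAR, MDR, R0, R1;

int main(void)
{
    /* tiny program: Load LOCA,R1 ; Add R1,R0 ; Halt  (LOCA = 100) */
    mem[0] = (OP_LOAD << 8) | 100;
    mem[1] = (OP_ADD  << 8);
    mem[2] = (OP_HALT << 8);
    mem[100] = 7;                      /* operand stored at LOCA */
    R0 = 5;

    PC = 0;
    for (;;) {
        MAR = PC;                      /* address of next instruction */
        MDR = mem[MAR];                /* read cycle */
        IR  = MDR;                     /* instruction now in the IR */
        PC  = PC + 1;                  /* point to the next instruction */

        unsigned op   = IR >> 8;       /* decode */
        unsigned addr = IR & 0xFF;
        if (op == OP_HALT) break;
        if (op == OP_LOAD) {           /* operand fetched via MAR/MDR */
            MAR = addr; MDR = mem[MAR]; R1 = MDR;
        } else if (op == OP_ADD) {
            R0 = R0 + R1;              /* ALU step */
        }
    }
    printf("R0 = %u\n", R0);           /* prints R0 = 12 */
    return 0;
}
```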

Bus Structures

The bus is used to connect all the individual parts of a computer in an organized manner. The simplest way to interconnect functional units is to use a single bus. The main advantages of the single-bus structure are its low cost and its flexibility. Multiple-bus structures achieve more concurrency in operation by allowing two or more transfers to be carried out at the same time.

Multiprocessors and Multicomputers

A large computer system that contains a number of processor units is called a multiprocessor system. Such systems either execute a number of different application tasks in parallel, or execute subtasks of a single large task in parallel. Their high performance comes at the price of much increased complexity and cost. A multicomputer system is formed by interconnecting a group of complete computers to achieve high total computational power. Each computer can access only its own memory; the computers communicate by exchanging messages over a communication network.

Performance

Performance means how quickly a computer can execute programs. For best performance, the compiler, the machine instruction set, and the hardware must be designed in a coordinated way. Processor circuits are controlled by a timing signal called the clock. The processor divides the actions to be performed into basic steps such that each step can be completed in one clock cycle. The basic performance equation is

T = (N × S) / R

where N is the actual number of instruction executions, S is the average number of basic steps needed to execute one machine instruction, and R is the clock rate in cycles per second. To achieve high performance, T should be reduced, which can be done by reducing N and S or by increasing R. A substantial improvement can also be obtained by overlapping the execution of successive instructions, a concept known as pipelining.
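To make the equation concrete, the short C program below simply plugs numbers into T = (N × S) / R. The figures are invented purely for illustration:

```c
#include <stdio.h>

int main(void)
{
    double N = 50e6;    /* instruction executions (assumed) */
    double S = 4.0;     /* average basic steps per instruction (assumed) */
    double R = 2e9;     /* clock rate in cycles per second (assumed) */
    printf("T = %.4f s\n", (N * S) / R);   /* prints T = 0.1000 s */
    return 0;
}
```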

PERFORMANCE AND PERFORMANCE METRICS

Computer performance is characterized by the amount of useful work accomplished by a computer system compared to the time and resources used. Depending on the context, good computer performance may involve one or more of the following:

- short response time for a given piece of work
- high throughput (rate of processing work)
- low utilization of computing resources
- high availability of the computing system or application
- fast (or highly compact) data compression and decompression
- high bandwidth / short data transmission time

Performance metrics

There is a wide variety of technical performance metrics that indirectly affect overall computer performance. Because there are too many programs to test a CPU's speed on all of them, benchmarks were developed. The most famous are the SPECint and SPECfp benchmarks developed by the Standard Performance Evaluation Corporation and the ConsumerMark benchmark developed by the Embedded Microprocessor Benchmark Consortium (EEMBC). Some important measurements include:

Instructions per second: Most consumers pick a computer architecture (normally the Intel IA-32 architecture) to be able to run a large base of pre-existing, pre-compiled software. Being relatively uninformed about computer benchmarks, some of them pick a particular CPU based on operating frequency (see the megahertz myth).

FLOPS: The number of floating-point operations per second is often important in selecting computers for scientific computations.

Performance per watt: System designers building parallel computers, such as Google, pick CPUs based on their speed per watt of power, because the cost of powering the CPU outweighs the cost of the CPU itself. [1][2] Some system designers building parallel computers pick CPUs based on speed per dollar instead.

System designers building real-time computing systems want to guarantee worst-case response. That is easier to do when the CPU has low interrupt latency and deterministic response, as in a DSP.

Computer programmers who program directly in assembly language want a CPU to support a full-featured instruction set.

Low power: for systems with limited power sources (e.g. solar, batteries, human power).

Small size or low weight: for portable embedded systems and systems for spacecraft.

Environmental impact: minimizing the environmental impact of computers during manufacturing and recycling as well as during use; reducing waste and hazardous materials (see green computing).

Giga-updates per second: a measure of how frequently the RAM can be updated.

Occasionally a CPU designer can find a way to make a CPU with better overall performance by improving one of these technical performance metrics without sacrificing any other (relevant) metric, for example by building the CPU out of better, faster transistors. However, sometimes pushing one metric to an extreme leads to a CPU with worse overall performance, because other important metrics were sacrificed to get one impressive-looking number; the megahertz myth is an example.

The total amount of time t required to execute a particular benchmark program is

t = N × C / f

where:

N is the number of instructions actually executed (the instruction path length). The code density of the instruction set strongly affects N. The value of N can either be determined exactly by using an instruction set simulator (if available) or by estimation, itself based partly on the estimated or actual frequency distribution of input variables and on examining the machine code generated by an HLL compiler. It cannot be determined from the number of lines of HLL source code, and it is not affected by other processes running on the same processor. The significant point here is that hardware normally does not keep track of (or at least does not make easily available) the value of N for executed programs; it can therefore only be accurately determined by instruction set simulation, which is rarely practiced.

f is the clock frequency in cycles per second.

C is the average cycles per instruction (CPI) for this benchmark.

Even on one machine, a different compiler, or the same compiler with different optimization switches, can change N and CPI; the benchmark executes faster if the new compiler can improve N or C without making the other worse. But there is often a trade-off between them: is it better, for example, to use a few complicated instructions that take a long time to execute, or many instructions that each execute very quickly, even though more of them are needed to run the benchmark?

A CPU designer is often required to implement a particular instruction set, and so cannot change N. Sometimes a designer focuses on improving performance by making significant improvements in f (with techniques such as deeper pipelines and faster caches), while (hopefully) not sacrificing too much C, leading to a speed-demon CPU design. Sometimes a designer focuses on improving performance by making significant improvements in CPI (with techniques such as out-of-order execution, superscalar CPUs, larger caches, caches with improved hit rates, improved branch prediction, speculative execution, and so on), while (hopefully) not sacrificing too much clock frequency, leading to a brainiac CPU design.
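The N-versus-C trade-off can be seen by evaluating t = N × C / f for two hypothetical designs at the same clock frequency. All numbers below are invented for illustration:

```c
#include <stdio.h>

int main(void)
{
    double f = 1e9;                        /* 1 GHz in both cases */
    double t_complex = 20e6 * 8.0 / f;     /* few complex instructions: N = 20M, CPI = 8 */
    double t_simple  = 60e6 * 1.5 / f;     /* many simple instructions: N = 60M, CPI = 1.5 */
    printf("complex: %.3f s, simple: %.3f s\n", t_complex, t_simple);
    /* prints 0.160 s vs 0.090 s: here the lower CPI wins despite the
       three-times-longer instruction path */
    return 0;
}
```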

INSTRUCTION CYCLE

Circuits used

The circuits used in the CPU during the cycle are:

Program Counter (PC): an incrementing counter that keeps track of the memory address of the instruction to be executed next.
Memory Address Register (MAR): holds the address of a memory block to be read from or written to.
Memory Data Register (MDR): a two-way register that holds data fetched from memory (and ready for the CPU to process) or data waiting to be stored in memory.
Instruction Register (IR): a temporary holding ground for the instruction that has just been fetched from memory.
Control Unit (CU): decodes the program instruction in the IR, selecting machine resources such as a data source register and a particular arithmetic operation, and coordinates activation of those resources.
Arithmetic Logic Unit (ALU): performs mathematical and logical operations.

The instruction cycle is the time period during which one instruction is fetched from memory and executed, when a computer is given an instruction in machine language. There are typically four stages that the CPU carries out: 1) fetch the instruction from memory; 2) decode the instruction; 3) read the effective address from memory if the instruction has an indirect address; 4) execute the instruction.

Instruction cycle

Each CPU can have different cycles based on different instruction sets, but they are all similar to the following:

1. Fetch the instruction. The next instruction is fetched from the memory address currently stored in the Program Counter (PC) and placed in the Instruction Register (IR). At the end of the fetch operation, the PC points to the next instruction, to be read on the next cycle. Clock pulses: T0-T1.

2. Decode the instruction. The decoder interprets the instruction: during this stage the instruction in the IR is decoded. Clock pulse: T2.

3. Read the effective address. For a memory instruction (direct or indirect), the execution phase occurs on the next clock pulse. If the instruction has an indirect address, the effective address is read from main memory, and any required data is fetched from main memory to be processed and then placed into data registers. If the instruction is direct, nothing is done on this clock pulse. If this is an I/O instruction or a register instruction, the operation is performed (executed) here. Clock pulse: T3.

4. Execute the instruction. The CU passes the decoded information as a sequence of control signals to the relevant functional units of the CPU to perform the actions required by the instruction, such as reading values from registers, passing them to the ALU to perform mathematical or logic functions on them, and writing the result back to a register. If the ALU is involved, it sends a condition signal back to the CU. Clock pulses: T3-T6 (up to T6). The result generated by the operation is stored in main memory or sent to an output device. Based on feedback from the ALU, the Program Counter may be updated to a different address, from which the next instruction will be fetched. The cycle is then repeated.

Initiating the cycle

The cycle starts immediately when power is applied to the system, using an initial PC value that is predefined for the system architecture (in Intel IA-32 CPUs, for instance, the predefined PC value is 0xfffffff0). Typically this address points to instructions in a read-only memory (ROM) which begin the process of loading the operating system (a process called booting).

Fetch cycle

Step 1 of the instruction cycle is called the fetch cycle. This step is the same for every instruction: the CPU fetches the instruction word, which contains an opcode, from memory.

Decode

Step 2 of the instruction cycle is called the decode. The opcode fetched from memory is decoded for the next steps and moved to the appropriate registers.

Read the effective address

Step 3 decides which kind of operation it is. If it is a memory operation, the computer checks whether it is direct or indirect: for a direct memory instruction, nothing is done; for an indirect memory instruction, the effective address is read from memory. If it is an I/O or register instruction, the computer identifies its kind and executes the instruction.

Execute cycle

Step 4 of the instruction cycle is the execute cycle, and its steps change with each instruction. The actions fall into a few categories: processor-memory and processor-I/O transfers, in which data moves between the CPU and memory or an I/O module; data processing, which applies arithmetic and logical operations to data; control alterations, such as a jump operation, which change the sequence of execution; and combinations of the above.

The fetch cycle in register transfer notation (with the PC incremented for the next cycle at the same time) can be expressed as:

MAR ← PC
MDR ← M[MAR], PC ← PC + 1
IR ← MDR

INTERFACES

Hardware Interfaces

Hardware interfaces exist in computing systems between many of the components, such as the various buses, storage devices, and other I/O devices. A hardware interface is described by the mechanical, electrical and logical signals at the interface and the protocol for sequencing them (sometimes called signaling). A standard interface, such as SCSI, decouples the design and introduction of computing hardware, such as I/O devices, from the design and introduction of other components of a computing system, thereby allowing users and manufacturers great flexibility in the implementation of computing systems. Hardware interfaces can be parallel, where performance is important, or serial, where distance is important.

Software Interfaces

A software interface may refer to a range of different types of interface at different "levels": an operating system may interface with pieces of hardware; applications or programs running on the operating system may need to interact via streams; and in object-oriented programs, objects within an application may need to interact via methods.

Types of Addressing Modes in Computers

Random access memory, or RAM, is the main memory of a computer; applications are loaded into and run in RAM. Addressing modes determine how the central processing unit, or CPU, locates the memory operand referred to by a machine instruction.

Immediate Addressing Mode: Immediate mode is the simplest form of addressing. The operand is part of the instruction, so no memory reference other than the instruction fetch itself is required to retrieve it. This mode is fast and can be used to define constants or set initial variable values. It has a limited range, because the operand is limited to the size of the address field, which in most instruction sets is small compared with the word length.

Direct Addressing Mode: In direct mode, the address field contains the address of the operand. It requires a single memory reference to read the operand from the given location, but it provides only a limited address space.

Indirect Addressing Mode: In indirect mode, the memory cell pointed to by the address field contains the address of (a pointer to) the operand; that cell holds the full-length address of the operand. This mode has a large address space, unlike direct and immediate addressing, but because multiple memory accesses are required to find the operand, it is slower.

Register Addressing Mode: Register mode is similar to direct mode. The key difference between the two is that the address field of the instruction refers to a register rather than a memory location. Register addressing has no effective address; three or four bits are used as the address field to reference registers.

Register Indirect Addressing Mode: This mode is similar to indirect addressing, except that the operand is in a memory cell pointed to by the contents of a register; the register contains the effective address of the operand. This mode uses one fewer memory access than indirect addressing. It has a large address space, limited only by the width of the registers available to store the effective address.
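As a rough illustration of the modes described so far, the following C sketch models memory as a plain array and shows how each mode arrives at its operand. The layout and values are invented:

```c
#include <stdio.h>

int main(void)
{
    int mem[16] = {0};
    int reg = 5;                 /* a general-purpose register */
    mem[3] = 42;                 /* operand, for direct and indirect */
    mem[7] = 3;                  /* a pointer to the operand */
    mem[5] = 42;                 /* operand, for register indirect */

    int imm      = 42;           /* immediate: operand is in the instruction */
    int direct   = mem[3];       /* direct: address field holds 3 */
    int indirect = mem[mem[7]];  /* indirect: mem[7] holds the operand's address */
    int regind   = mem[reg];     /* register indirect: the register holds the address */

    printf("%d %d %d %d\n", imm, direct, indirect, regind);  /* all print 42 */
    return 0;
}
```

Note how the indirect mode costs two array lookups (two memory accesses) while register indirect costs only one, matching the comparison in the text.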

Displacement Addressing Mode: Displacement mode has three variations: 1) relative addressing, 2) base-register addressing, and 3) indexed addressing. The mode can be considered a combination of direct and register-indirect addressing. The instruction's address field holds two values: a base value, and a reference to a register whose contents (an integer displacement) are added to or subtracted from the base to form the effective address in memory.

Stack Addressing Mode: Stack mode, also known as implicit addressing, uses a linear array of locations organized as a last-in, first-out (LIFO) structure. The operand is on the top of the stack; the stack pointer is a register that stores the address of the top-of-stack location.

An instruction set, or instruction set architecture (ISA), is the part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O. An ISA includes a specification of the set of opcodes (machine language) and the native commands implemented by a particular processor.

Instruction set architecture is distinguished from the microarchitecture, which is the set of processor design techniques used to implement the instruction set. Computers with different microarchitectures can share a common instruction set; for example, the Intel Pentium and the AMD Athlon implement nearly identical versions of the x86 instruction set but have radically different internal designs. Some virtual machines that support bytecode as their ISA, such as those for Smalltalk, the Java virtual machine, and Microsoft's Common Language Runtime, implement it by translating the bytecode for commonly used code paths into native machine code and executing less-frequently-used code paths by interpretation; Transmeta implemented the x86 instruction set atop VLIW processors in the same fashion.

Machine language

Machine language is built up from discrete statements or instructions. Depending on the processing architecture, a given instruction may specify:

- particular registers for arithmetic, addressing, or control functions
- particular memory locations or offsets
- particular addressing modes used to interpret the operands

More complex operations are built up by combining these simple instructions, which (in a von Neumann architecture) are executed sequentially or as otherwise directed by control flow instructions.

Instruction types

Some operations available in most instruction sets include:

Data handling and memory operations: set a register (a temporary "scratchpad" location in the CPU itself) to a fixed constant value; move data from a memory location to a register, or vice versa, to obtain data for a later computation or to store a result; read and write data from hardware devices.

Arithmetic and logic: add, subtract, multiply, or divide the values of two registers, placing the result in a register and possibly setting one or more condition codes in a status register; perform bitwise operations, taking the conjunction or disjunction of corresponding bits in a pair of registers, or the negation of each bit in a register; compare two values in registers (for example, to see if one is less, or if they are equal).

Control flow: branch to another location in the program and execute instructions there; conditionally branch to another location if a certain condition holds; indirectly branch to another location, while saving the location of the next instruction as a point to return to (a call).

Complex instructions

Some computers include "complex" instructions in their instruction set. A single "complex" instruction does something that may take many instructions on other computers. Such instructions are typified by instructions that take multiple steps, control multiple functional units, or otherwise appear on a larger scale than the bulk of the simple instructions implemented by the given processor. Examples include: saving many registers on the stack at once; moving large blocks of memory; complex and/or floating-point arithmetic (sine, cosine, square root, etc.); performing an atomic test-and-set instruction (see the sketch below); and ALU instructions that take an operand from memory rather than a register.

A complex instruction type that has become particularly popular recently is the SIMD (Single-Instruction stream, Multiple-Data stream) operation, or vector instruction: an operation that performs the same arithmetic operation on multiple pieces of data at the same time. SIMD instructions can manipulate large vectors and matrices in minimal time, and they allow easy parallelization of algorithms commonly involved in sound, image, and video processing. Various SIMD implementations have been brought to market under trade names such as MMX, 3DNow!, and AltiVec.
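As one concrete example from the list above, C11's <stdatomic.h> exposes test-and-set portably through atomic_flag; on most hardware the call below compiles down to a single atomic read-modify-write instruction. A minimal sketch:

```c
#include <stdatomic.h>
#include <stdio.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;

int main(void)
{
    /* atomic_flag_test_and_set returns the flag's previous value */
    if (!atomic_flag_test_and_set(&lock))
        puts("acquired the lock");
    if (atomic_flag_test_and_set(&lock))    /* already set now */
        puts("second attempt failed, as expected");
    atomic_flag_clear(&lock);               /* release */
    return 0;
}
```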

Parts of an instruction

One instruction may have several fields, which identify the logical operation to be performed, and may also include source and destination addresses and constant values. The MIPS "Add" instruction, for example, allows selection of source and destination registers and inclusion of a small constant (a small encoding sketch appears at the end of this subsection). On traditional architectures, an instruction includes an opcode specifying the operation to be performed, such as "add contents of memory to register", and zero or more operand specifiers, which may specify registers, memory locations, or literal data. The operand specifiers may have addressing modes determining their meaning, or may be in fixed fields. In very long instruction word (VLIW) architectures, which include many microcode architectures, multiple simultaneous opcodes and operands are specified in a single instruction. Some exotic instruction sets have no opcode field (such as Transport Triggered Architectures (TTA) or the Forth virtual machine), only operands. Other unusual "0-operand" instruction sets lack any operand specifier fields, such as some stack machines, including NOSC [1].

Instruction length

The size or length of an instruction varies widely, from as little as four bits in some microcontrollers to many hundreds of bits in some VLIW systems. Processors used in personal computers, mainframes, and supercomputers have instruction sizes between 8 and 64 bits (the longest possible instruction on x86 is 15 bytes, that is, 120 bits). Within an instruction set, different instructions may have different lengths. In some architectures, notably most reduced instruction set computers (RISC), instructions are of fixed length, typically corresponding to the architecture's word size. In other architectures, instructions have variable length, typically an integral multiple of a byte or a halfword.

Representation

The instructions constituting a program are rarely specified using their internal, numeric form; they may be specified by programmers using an assembly language or, more commonly, may be generated by compilers.
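The field packing described under "Parts of an instruction" can be sketched in C. The layout below follows the standard MIPS I-type format (6-bit opcode, two 5-bit register specifiers, 16-bit constant); the opcode value 8 is ADDI on real MIPS, though the helper itself is just for illustration:

```c
#include <stdint.h>
#include <stdio.h>

/* pack an "add immediate" word: opcode | rs | rt | 16-bit constant */
static uint32_t encode_addi(unsigned rs, unsigned rt, int16_t imm)
{
    return (8u << 26)              /* opcode field: ADDI */
         | ((rs & 31u) << 21)      /* source register */
         | ((rt & 31u) << 16)      /* destination register */
         | (uint16_t)imm;          /* sign pattern kept in the low 16 bits */
}

int main(void)
{
    uint32_t word = encode_addi(9, 10, 100);   /* addi $t2, $t1, 100 */
    printf("0x%08x\n", word);                  /* prints 0x212a0064 */
    return 0;
}
```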

Arithmetic Logic Unit (ALU) Design

We have mentioned the Arithmetic Logic Unit (ALU) in earlier pages, but we have never gone into its details. The ALU is the part of the microprocessor that handles all boolean and mathematical operations. Exploring an actual ALU from a current microprocessor would be too complex for our purposes, so we take a simplified design that, although limited in functionality, shows the general picture. It is best to familiarize yourself with the diagram below before reading on, so that you can better understand the rest of this section; if it seems confusing at first, there is nothing wrong. The diagram is divided into three sections: logical unit, decoder, and full adder. The inputs are A, B, F0, F1, and Carry In. You will also notice several 3-input AND gates and a 4-input OR gate; they are basically the same as the 2-input gates presented earlier, except that the 3-input AND gate outputs a 1 only if all three inputs are 1s, and the 4-input OR gate outputs a 1 unless all inputs are 0s.

(Diagram: 1-bit ALU)

The F inputs control the enable lines. No matter which of the four possible combinations of F inputs is applied, exactly one (and each time a different) enable line is "turned on"; the function of the decoder subpart is thus to select which of the four operations will be performed. A and B are the regular inputs for all operations. The full adder is nearly the same circuit as the one shown on the Mathematical Operations page, except that its output is ANDed with the corresponding enable line. The logical unit is simply a collection of three boolean operations, AB, A+B, and NOT B; as with the full adder, each of their outputs is ANDed with the corresponding enable line. At the far right, all of the outputs are ORed together, but only one of the four inputs can be a 1, because of the enable lines.
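For readers without the diagram, here is a rough C model of this 1-bit ALU, together with the 8-bit ripple chain described in the next paragraph. The mapping of F0/F1 combinations to enable lines is an assumption inferred from the worked sample case below (F0=1, F1=0 selecting the third line, NOT B):

```c
#include <stdio.h>

/* one-bit ALU: decoder picks exactly one of AND, OR, NOT B, ADD */
static int alu_1bit(int a, int b, int f0, int f1, int cin, int *cout)
{
    int sel = (f0 << 1) | f1;       /* assumed: F0 is the high select bit */
    int en_and = (sel == 0);        /* first enable line:  AB    */
    int en_or  = (sel == 1);        /* second enable line: A+B   */
    int en_not = (sel == 2);        /* third enable line:  NOT B */
    int en_add = (sel == 3);        /* fourth enable line: adder */

    int sum   = a ^ b ^ cin;                      /* full adder */
    int carry = (a & b) | (a & cin) | (b & cin);
    *cout = carry & en_add;         /* carry out only matters when adding */

    /* each result ANDed with its enable line, then all ORed together */
    return (en_and & (a & b)) | (en_or & (a | b))
         | (en_not & (~b & 1)) | (en_add & sum);
}

/* eight 1-bit ALUs chained Carry Out to Carry In */
static unsigned alu_8bit(unsigned a, unsigned b, int f0, int f1)
{
    unsigned result = 0;
    int carry = 0;
    for (int i = 0; i < 8; i++) {
        int bit = alu_1bit((a >> i) & 1, (b >> i) & 1, f0, f1, carry, &carry);
        result |= (unsigned)bit << i;
    }
    return result;
}

int main(void)
{
    int cout;
    /* the sample case from the text: A=1, B=0, F0=1, F1=0 -> NOT B */
    printf("%d\n", alu_1bit(1, 0, 1, 0, 0, &cout));  /* prints 1 */
    printf("%u\n", alu_8bit(100, 27, 1, 1));         /* prints 127 (100+27) */
    return 0;
}
```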

The diagram, however, represents only a 1-bit ALU; an 8-bit ALU is more convenient for useful operations. To create an 8-bit ALU, the diagram is repeated eight times, linking the Carry Out of each stage to the Carry In of the next. This concept has already been introduced with the full adder.

In case you have not fully understood how the ALU works, consider a sample case with inputs A=1, B=0, F0=1, F1=0 (we will not worry about the Carry In). Tracing the decoder, this combination enables the third enable line. Following that enable line, we find that it is ANDed with the NOT output, so we need only explore the NOT (all the other outputs are ANDed with 0, automatically producing 0). The NOT takes in only B; since B is 0, NOT B is 1. When that 1 is ANDed with the enable line, it produces a 1, which goes to the output, where it is ORed with three other 0s. Thus the output is 1 and the carry out is 0.

Reduced instruction set computing, or RISC, is a CPU design strategy based on the insight that simplified (as opposed to complex) instructions can provide higher performance if this simplicity enables much faster execution of each instruction. A computer based on this strategy is a reduced instruction set computer (also RISC). There are many proposals for precise definitions,[1] but the term is slowly being replaced by the more descriptive load-store architecture. Well-known RISC families include DEC Alpha, AMD 29k, ARC, ARM, Atmel AVR, Blackfin, MIPS, PA-RISC, Power (including PowerPC), SuperH, and SPARC.

Some aspects attributed to the first RISC-labeled designs around 1975 include the observations that the memory-restricted compilers of the time were often unable to take advantage of features intended to facilitate manual assembly coding, and that complex addressing modes take many cycles to perform due to the required additional memory accesses. It was argued that such functions would be better performed by sequences of simpler instructions if this could yield implementations small enough to leave room for many registers,[2] reducing the number of slow memory accesses. In these simple designs, most instructions are of uniform length and similar structure, arithmetic operations are restricted to CPU registers, and only separate load and store instructions access memory. These properties enable a better balancing of pipeline stages than before, making RISC pipelines significantly more efficient and allowing higher clock frequencies.

Typical characteristics of RISC

For any given level of general performance, a RISC chip will typically have far fewer transistors dedicated to the core logic, which originally allowed designers to increase the size of the register set and increase internal parallelism. Other features typically found in RISC architectures are:

- uniform instruction format, using a single word with the opcode in the same bit positions in every instruction, demanding less decoding;
- identical general-purpose registers, allowing any register to be used in any context and simplifying compiler design (although normally there are separate floating-point registers);
- simple addressing modes, with complex addressing performed via sequences of arithmetic and/or load-store operations;
- few data types in hardware (some CISCs have byte-string instructions or support complex numbers; such features are so far unlikely to be found on a RISC).
Exceptions abound, of course, within both CISC and RISC.

RISC designs are also more likely to feature a Harvard memory model, where the instruction stream and the data stream are conceptually separated; this means that modifying the memory where code is held might not have any effect on the instructions executed by the processor (because the CPU has separate instruction and data caches), at least until a special synchronization instruction is issued. On the upside, this allows both caches to be accessed simultaneously, which can often improve performance.

Many early RISC designs also shared the characteristic of having a branch delay slot: an instruction space immediately following a jump or branch. The instruction in this space is executed whether or not the branch is taken (in other words, the effect of the branch is delayed); it keeps the ALU of the CPU busy for the extra time normally needed to perform a branch. Nowadays the branch delay slot is considered an unfortunate side effect of a particular strategy for implementing some RISC designs, and modern RISC designs generally do away with it (such as PowerPC and more recent versions of SPARC and MIPS).

Fixed-point arithmetic

In computing, a fixed-point number representation is a real data type for a number that has a fixed number of digits after (and sometimes also before) the radix point (the decimal point '.' in English decimal notation). Fixed-point representation can be compared to the more complicated (and more computationally demanding) floating-point representation. Fixed-point numbers are useful for representing fractional values, usually in base 2 or base 10, when the executing processor has no floating-point unit (FPU) or when fixed-point provides improved performance or accuracy for the application at hand. Most low-cost embedded microprocessors and microcontrollers do not have an FPU.

Representation

A value of a fixed-point data type is essentially an integer that is scaled by a specific factor determined by the type. For example, the value 1.23 can be represented as 1230 in a fixed-point data type with a scaling factor of 1/1000, and the value 1230000 can be represented as 1230 with a scaling factor of 1000. Unlike floating-point data types, the scaling factor is the same for all values of the same type and does not change during the entire computation.

The scaling factor is usually a power of 10 (for human convenience) or a power of 2 (for computational efficiency). However, other scaling factors may be used occasionally; e.g. a time value in hours may be represented as a fixed-point type with a scale factor of 1/3600 to obtain values with one-second accuracy.

The maximum value of a fixed-point type is simply the largest value that can be represented in the underlying integer type multiplied by the scaling factor, and similarly for the minimum value. For example, consider a fixed-point type represented as a binary integer with b bits in two's complement format, with a scaling factor of 1/2^f (that is, the last f bits are fraction bits): the minimum representable value is -2^(b-1)/2^f and the maximum value is (2^(b-1)-1)/2^f.

Operations

To convert a number from a fixed-point type with scaling factor R to another type with scaling factor S, the underlying integer must be multiplied by the ratio R/S. Thus, for example, to convert the value 1.23 = 123/100 from a type with scaling factor R = 1/100 to one with scaling factor S = 1/1000, the underlying integer 123 must be multiplied by (1/100)/(1/1000) = 10, yielding the representation 1230/1000. If R/S is not an integer (in particular, if the new scaling factor S is smaller than the original R), the new integer will have to be rounded. The rounding rules and methods are usually part of the language's specification.

To add or subtract two values of the same fixed-point type, it is sufficient to add or subtract the underlying integers and keep their common scaling factor. The result can be exactly represented in the same type, as long as no overflow occurs (i.e. provided that the sum of the two integers fits in the underlying integer type). If the numbers have different fixed-point types, with different scaling factors, then one of them must be converted to the other before the addition.

To multiply two fixed-point numbers, it suffices to multiply the two underlying integers and take the scaling factor of the result to be the product of their scaling factors. This operation involves no rounding. For example, multiplying the numbers 123 scaled by 1/1000 (0.123) and 25 scaled by 1/10 (2.5) yields the integer 123 × 25 = 3075 scaled by (1/1000) × (1/10) = 1/10000, that is 3075/10000 = 0.3075. If the two operands belong to the same fixed-point type, and the result is also to be represented in that type, then the product of the two integers must be explicitly multiplied by the common scaling factor; in this case the result may have to be rounded, and overflow may occur. For example, if the common scaling factor is 1/100, multiplying 1.23 by 0.25 entails multiplying 123 by 25 to yield 3075 with an intermediate scaling factor of 1/10000; this must then be multiplied by 1/100 to yield either 31 (0.31) or 30 (0.30), depending on the rounding method used, with a final scale factor of 1/100.

To divide two fixed-point numbers, one takes the integer quotient of their underlying integers and takes the scaling factor of the result to be the quotient of their scaling factors. The first division involves rounding in general. For example, division of 3456 scaled by 1/100 (34.56) by 1234 scaled by 1/1000 (1.234) yields the integer 3456/1234 = 3 (rounded) with scale factor (1/100)/(1/1000) = 10, that is, 30. One can obtain a more accurate result by first converting the dividend to a more precise type: in the same example, converting 3456 scaled by 1/100 (34.56) to 3456000 scaled by 1/100000 before dividing by 1234 scaled by 1/1000 (1.234) would yield 3456000/1234 = 2801 (rounded) with scaling factor (1/100000)/(1/1000) = 1/100, that is 28.01 (instead of 30). If both operands and the desired result are represented in the same fixed-point type, then the quotient of the two integers must be explicitly divided by the common scaling factor.

Binary vs. decimal

The two most common fixed-point types are decimal and binary: decimal fixed-point types have a scaling factor that is a power of ten, binary fixed-point types a power of two. Binary fixed-point types are most commonly used, because the rescaling operations can be implemented as fast bit shifts. Binary fixed-point numbers can represent fractional powers of two exactly, but, like binary floating-point numbers, cannot exactly represent fractional powers of ten. If exact fractional powers of ten are desired, then a decimal format should be used. For example, one-tenth (0.1) and one-hundredth (0.01) can be represented only approximately by binary fixed-point or binary floating-point representations, while they can be represented exactly in decimal fixed-point or decimal floating-point representations. These representations may be encoded in many ways, including BCD.
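The shift-based rescaling mentioned above is easy to see in code. Here is a minimal binary fixed-point sketch in a Q16.16 format (a 32-bit integer scaled by 1/2^16), including the "convert the dividend to a more precise type first" trick from the division example; the format choice is just an assumption for illustration:

```c
#include <stdint.h>
#include <stdio.h>

typedef int32_t q16_16;          /* value = integer / 2^16 */
#define Q_ONE (1 << 16)

static q16_16 q_from_double(double x) { return (q16_16)(x * Q_ONE); }
static double q_to_double(q16_16 x)   { return (double)x / Q_ONE; }

static q16_16 q_mul(q16_16 a, q16_16 b)
{
    /* raw product carries scale 1/2^32; shift back to 1/2^16 */
    return (q16_16)(((int64_t)a * b) >> 16);
}

static q16_16 q_div(q16_16 a, q16_16 b)
{
    /* pre-scale the dividend so the quotient lands on scale 1/2^16 */
    return (q16_16)(((int64_t)a << 16) / b);
}

int main(void)
{
    q16_16 x = q_from_double(34.56);
    q16_16 y = q_from_double(1.234);
    printf("product:  %f\n", q_to_double(q_mul(x, y)));  /* about 42.647 */
    printf("quotient: %f\n", q_to_double(q_div(x, y)));  /* about 28.006 */
    return 0;
}
```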
