
Part A

1. Data Dependency Types

Data dependency refers to the situation in which two or more instructions share the same data. The following types of data dependence are recognized:

Flow dependence: If instruction I2 follows I1 and the output of I1 becomes an input of I2, then I2 is said to be flow dependent on I1.
Antidependence: When instruction I2 follows I1 and the output of I2 overlaps with the input of I1 on the same data, I2 is antidependent on I1.
Output dependence: When the outputs of two instructions I1 and I2 overlap on the same data, the instructions are said to be output dependent.
I/O dependence: When read and write operations by two instructions are invoked on the same file, the situation is one of I/O dependence.

Consider the following program instructions:

I1: a = b
I2: c = a + d
I3: a = c

In this segment, I1 and I2 are flow dependent because variable a is produced by I1 and consumed by I2. I2 and I3 are antidependent because variable a is written by I3 but read by I2, which comes first in program order. I3 is flow dependent on I2 because of variable c. I3 and I1 are output dependent because both write variable a. (A compilable rendering of this example appears after the SISD description below.)

2. Flynn's Classification of System Architectures

Single Instruction stream, Single Data stream (SISD): In this organization, instructions are executed sequentially by one CPU containing a single processing element (PE), i.e., an ALU, under one control unit, as shown in the figure. SISD machines are therefore conventional serial computers that process one stream of instructions and one stream of data.
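To make the dependency example concrete, here is the same three-instruction segment as a minimal C program; the variable names come from the notes, while the initial values of b and d are made up purely so the program runs.

    #include <stdio.h>

    int main(void) {
        int b = 1, d = 2;   /* illustrative initial values */
        int a, c;

        a = b;       /* I1: writes a                                 */
        c = a + d;   /* I2: reads a  -> flow dependent on I1 (via a) */
        a = c;       /* I3: writes a -> antidependent with I2 (I2
                        must read a first), flow dependent on I2
                        (via c), output dependent with I1 (both
                        write a)                                     */

        printf("a = %d, c = %d\n", a, c);
        return 0;
    }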

Single Instruction stream, Multiple Data stream (SIMD): In this organization, multiple processing elements work under the control of a single control unit, so there is one instruction stream and multiple data streams.

Multiple Instruction stream, Multiple Data stream (MIMD): In this organization there are multiple processing elements and multiple control units, organized as in the MISD case, but here the multiple instruction streams operate on multiple data streams: each processor executes its own instruction stream on its own data.
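To illustrate the SISD/SIMD distinction in code, the sketch below contrasts a scalar loop (one instruction per data element, the SISD style) with the same computation written using GCC's vector extensions, where one add operates on four data elements at once (the SIMD style). The 4-element width is an arbitrary illustrative choice.

    /* SISD style: each instruction touches one data element. */
    void add_sisd(int *c, const int *a, const int *b, int n) {
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

    /* SIMD style: one instruction touches four data elements.
       Uses GCC/Clang vector extensions; n is assumed divisible by 4. */
    typedef int v4si __attribute__((vector_size(16)));  /* 4 x 32-bit ints */

    void add_simd(v4si *c, const v4si *a, const v4si *b, int n) {
        for (int i = 0; i < n / 4; i++)
            c[i] = a[i] + b[i];   /* a single 4-wide vector add */
    }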

3. Describe the Performance Factor

The CPU time T needed to execute a program is

T = Ic x CPI x τ

where:
Ic = the number of instructions in the program (instruction count)
CPI = the average number of clock cycles per instruction
τ = the processor clock cycle time (a constant)

If the memory cycle is k times the processor cycle, the cycles per instruction can be split into processor and memory components, giving

T = Ic x (p + m x k) x τ

where:
p = number of processor cycles needed for instruction decode and execution
m = number of memory references needed
k = ratio between the memory cycle and the processor cycle
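As a numeric illustration of the refined formula, the sketch below evaluates T for made-up parameter values; Ic, p, m, k, and tau here are assumptions chosen only to show the arithmetic, not figures from the notes.

    #include <stdio.h>

    int main(void) {
        double Ic  = 1.0e6;   /* instruction count (assumed)                   */
        double p   = 4.0;     /* processor cycles for decode/execute (assumed) */
        double m   = 1.5;     /* memory references per instruction (assumed)   */
        double k   = 10.0;    /* memory cycle / processor cycle (assumed)      */
        double tau = 1.0e-9;  /* clock cycle time: 1 ns (assumed)              */

        double cpi = p + m * k;        /* effective cycles per instruction */
        double T   = Ic * cpi * tau;   /* T = Ic x (p + m x k) x tau       */

        printf("CPI = %.1f cycles, T = %.2e s\n", cpi, T);
        return 0;
    }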

4. What is communication latency?

Latency is the time it takes for one message to travel from its source to its destination, and it includes various overheads.
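The notes say only that latency includes various overheads. One common first-order breakdown (an assumption here, not something the notes spell out) splits it into sender software overhead, transmission time, time of flight through the network, and receiver software overhead:

    /* First-order message latency model; the breakdown into these four
       terms is an illustrative assumption, not taken from the notes. */
    double message_latency(double bytes,
                           double bandwidth,       /* bytes per second     */
                           double send_overhead,   /* sender software, s   */
                           double flight_time,     /* wire/switch delay, s */
                           double recv_overhead) { /* receiver software, s */
        return send_overhead + bytes / bandwidth + flight_time + recv_overhead;
    }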

5. Define vector, vector processing, and vectorization

A vector is a set of scalar data items, all of the same type, stored in memory. A vector processor is an ensemble of hardware resources, including vector registers, functional pipelines, processing elements, and register counters, for performing vector operations. Vector processing occurs when arithmetic or logical operations are applied to vectors. The conversion from scalar code to vector code is called vectorization. (A vectorization sketch appears at the end of Part B question 2.)

Part B

1. Explain the Shared-Memory Multiprocessor Model

Two categories of parallel computers are modeled architecturally; these physical models are distinguished by having a shared common memory or unshared distributed memories. There are three shared-memory multiprocessor models: the uniform memory access (UMA) model, the non-uniform memory access (NUMA) model, and the cache-only memory access (COMA) model.

1. Uniform Memory Access (UMA) Model

In this model, the main memory is uniformly shared by all processors in the multiprocessor system: every processor has equal access time to every memory word, which is why it is called uniform memory access. Each processor may use a private cache, and peripherals are also shared in some fashion. The UMA model is suitable for general-purpose and time-sharing applications by multiple users, and it can also be used to speed up the execution of a single large program in time-critical applications. To coordinate parallel events, synchronization and communication among processors are done through shared variables in the common memory.
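A minimal sketch of synchronization through shared variables in the common memory, using POSIX threads to stand in for the processors of a UMA machine; the shared counter and the choice of 4 threads are illustrative, not from the notes.

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;   /* shared variable in the common memory */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);     /* synchronize access            */
            counter++;                     /* communicate via shared memory */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[4];   /* 4 "processors" (illustrative) */
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);   /* prints 400000 */
        return 0;
    }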

2. Non-Uniform Memory Access (NUMA) Model

In a NUMA shared-memory multiprocessor, a local memory is attached to every processor, and the collection of all local memories forms the global memory shared by all processors. In this way, the global memory is distributed among the processors. Access to a processor's own local memory is fast and uniform, but a reference to the local memory of a remote processor is slower, since it depends on the location of the memory. Thus, not all memory words are accessed uniformly.

(a) Shared local memories: The shared memory is physically distributed to all processors as local memories, and the collection of all local memories forms a global address space accessible by all processors. It is faster for a processor to access its own local memory; access to remote memory attached to other processors takes longer due to the added delay through the interconnection network.

(b) A hierarchical cluster model: In a hierarchically structured multiprocessor, the processors are divided into several clusters, each of which is itself a UMA or NUMA multiprocessor. The clusters are connected to global shared-memory modules, and the entire system is considered a NUMA multiprocessor. All processors belonging to the same cluster can uniformly access that cluster's shared-memory modules.
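A toy cost model of why NUMA access is non-uniform: a reference to the processor's own local memory costs one memory access, while a reference to a remote processor's local memory pays an added interconnection-network delay. The cycle counts are invented for illustration.

    /* Toy NUMA access-time model; the cycle counts are assumptions. */
    #define LOCAL_CYCLES   100   /* access to the processor's own local memory */
    #define NETWORK_CYCLES 300   /* added delay through the interconnect       */

    /* Cycles for processor `cpu` to access a word homed at processor `home`. */
    int access_cycles(int cpu, int home) {
        if (cpu == home)
            return LOCAL_CYCLES;                 /* uniform, fast local access */
        return LOCAL_CYCLES + NETWORK_CYCLES;    /* non-uniform remote access  */
    }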

3. Cache-Only Memory Access (COMA) Model

A multiprocessor using only cache memory assumes the COMA model. The COMA model is a special case of a NUMA machine in which the distributed main memories are converted into caches; there is no memory hierarchy at each processor node. All the caches together form a global address space, and remote cache access is assisted by distributed cache directories.

2. Explain the Vector Supercomputer with a Neat Diagram

A vector computer is often built on top of a scalar processor; as shown in the figure, the vector processor is attached to the scalar processor as an optional feature. Program and data are first loaded into the main memory through a host computer. All instructions are first decoded by the scalar control unit. If the decoded instruction is a scalar operation or a program-control operation, it is executed directly by the scalar processor using the scalar functional pipelines. If the instruction is decoded as a vector operation, it is sent to the vector control unit, which supervises the flow of vector data between the main memory and the vector functional pipelines and coordinates the vector data flow. A number of vector functional pipelines may be built into a vector processor.

Vector Processor Model: The figure shows a register-to-register architecture. Vector registers are used to hold the vector operands and the intermediate and final vector results. The vector functional pipelines retrieve operands from, and put results into, the vector registers. All vector registers are programmable in user instructions. Each vector register is equipped with a component counter, which keeps track of the component registers used in successive pipeline cycles.

The length of each vector register is usually fixed, say, sixty-four 64-bit component registers per vector register in the Cray Series supercomputers. Other machines, like the Fujitsu VP2000 Series, use reconfigurable vector registers to dynamically match the register length to that of the vector operands. In general, there are fixed numbers of vector registers and functional pipelines in a vector processor, so both resources must be reserved in advance to avoid resource conflicts between vector operations. A memory-to-memory architecture differs from a register-to-register architecture in its use of a vector stream unit in place of the vector registers: vector operands and results are retrieved directly from the main memory in superwords of, say, 512 bits, as in the Cyber 205.
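As noted under Part A question 5, vectorization converts scalar code into vector code. The sketch below shows the idea for a register-to-register machine with 64-element vector registers: the scalar loop is strip-mined into chunks of 64, and each chunk corresponds to one vector operation on a full vector register. The C rendering is only a model; a real vectorizing compiler would emit vector instructions for the inner chunk.

    #define VLEN 64   /* components per vector register, as in the Cray Series */

    /* Scalar code: one add instruction per loop iteration. */
    void add_scalar(double *c, const double *a, const double *b, int n) {
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

    /* Vectorized form: strip-mined so each chunk of up to VLEN elements
       maps onto a single vector add using one vector register per operand. */
    void add_vectorized(double *c, const double *a, const double *b, int n) {
        for (int i = 0; i < n; i += VLEN) {
            int len = (n - i < VLEN) ? n - i : VLEN;   /* final partial strip */
            for (int j = 0; j < len; j++)              /* one vector operation */
                c[i + j] = a[i + j] + b[i + j];
        }
    }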
