Fat-Pyramid-NOC and Fat-Stack-NOC: New Frameworks Network-On-Chip Architectures

JOURNAL OF COMPUTING, VOLUME 4, ISSUE 4, APRIL 2012, ISSN 2151-9617 https://sites.google.com/site/journalofcomputing WWW.JOURNALOFCOMPUTING.ORG
Reza Kourdy, Department of Computer Engineering, Islamic Azad University, Khorramabad Branch, Iran
Mohammad Reza Nouri Rad, Department of Computer Engineering, Islamic Azad University, Khorramabad Branch, Iran
Abstract: Network-on-Chip (NoC) has emerged as a very promising paradigm for designing scalable communication architectures for Systems-on-Chip (SoCs). This paper proposes a general framework for the design and simulation of network-on-chip-based pyramid architectures such as Fat-Pyramid-NOC and Fat-Stack-NOC. Several parameters in the design space are investigated, namely network topology, degree of parallelism, and scalability. Emulation is necessary to evaluate and validate the performance of the NoC system.
Index Terms: Network-on-Chip (NoC), Systems on Chip (SoC), Field-Programmable Gate Array (FPGA), processing element (PE).
1 INTRODUCTION
The latest applications ported to embedded systems (e.g., scalable video rendering, communication protocols) demand large computation power while respecting other critical embedded design constraints, such as short time-to-market, low energy consumption, and reduced implementation size. Thus, embedded systems are complex Systems-on-Chip (SoCs) that consist of a large number of components, such as processing elements, storage devices, and even reconfigurable devices such as Field-Programmable Gate Arrays (FPGAs), which enhance the flexibility of the final SoCs for use in different environments [1], [2]. Nevertheless, one of the most critical areas of multiprocessor SoC (MPSoC) design is the definition of a suitable interconnect subsystem for all these SoC components, due to architectural and physical scalability concerns [3]. Traditional shared-bus interconnects are relatively easy to design but do not scale well for the latest and forthcoming SoC consumer platforms. To cope with the large communication demands of such SoCs, the use of modular and scalable Networks-on-Chip (NoCs) has been proposed [3]. Designing custom-tailored NoC interconnects that satisfy the performance and design constraints of the SoC for all combinations of applications that may be executed is therefore a key goal in achieving optimal commercial products [4], [5].

However, because general-purpose processor cores are used to run the software tasks of different applications in SoCs, the communication between the cores cannot be precharacterized and fully optimized: the application processes can be mapped to the cores in different ways, typically with the support of the compiler. Thus, to provide predictable NoC performance, the bandwidth capacity of each link must be sufficient to support the peak traffic rate on that link over all possible mappings of the tasks onto the final SoC. Otherwise, the network may experience traffic congestion, and the latency of the traffic streams, and hence the interconnect performance, will become unacceptable for consumer devices. As a result, NoC designs that guarantee worst-case bandwidth for SoC operation with multiple concurrent applications often lead to topologies and links that are oversized for regular operation of the SoC. In this context, the development of new methods and frameworks that increase the run-time versatility of initially static NoC designs, so that they adapt to the different working conditions created by the changing set of active applications, is an important research area in the NoC domain.

Networks-on-Chip have been proposed as a promising solution to complex on-chip communication problems. However, many challenging research problems remain unsolved at all levels of design abstraction, such as design exploration of NoC architectures for applications; scheduling and mapping algorithms; evaluation of switching, topology, or routing algorithms for efficient execution of applications; and optimization of communication cost, area, energy, and so forth. Addressing these problems calls for a synthesizable, parameterizable NoC framework in which such algorithms can be implemented and evaluated with minimum effort and maximum flexibility [6].
2 NETWORK-ON-CHIP IMPLEMENTATION
The proposed NoC framework consists of five main modules [6]: i) the Processing Architecture, ii) the Communication Infrastructure, iii) the Communication Paradigm, iv) the Monitor, and v) the Traffic Generation Module. The Processing Architecture module consists of a Processing Element (PE) and a Network Adapter (NA, the core network interface) module. The Communication Infrastructure consists of the network topology and a routing node. The Communication Paradigm describes the switching techniques and routing algorithms employed in the NoC Communication Infrastructure. The Monitor module includes two submodules: a) a Node monitor, which monitors the activities in a routing node, and b) an NoC monitor, which monitors the communication within the framework. Figure 1 shows the NoC framework model.

Fig.1. NoC Framework

The traffic generator (TG) module injects packets (traffic) into the network. It can initiate either a request or a start of transmission from the top level. The TG also determines the type of traffic (uniform, hotspot, sporadic) as well as the source and destination nodes for traffic flow. Different congestion scenarios and node failures can also be created through the TG. The design consists of a node monitor at each router and an NoC monitor at the network level (see Figure 1). The transaction monitor at each router contains information about the buffer count in each virtual channel and sends this information to the top NoC monitor and traffic controller. It also keeps track of PE status.

2.1. Processing Architecture
The processing element (PE) in the framework can be a master PE or a slave PE. Only master PEs can initiate a message transfer. Slave PEs respond to requests from a master PE either by sending back the requested signals/data or by saving the received information. In our framework, the UART, TIMER, Instruction/Data Memory, and slave processors are considered slave PEs; the master PEs and slave processors are capable of performing computational operations.

2.2. Communication Infrastructure
The communication infrastructure consists of a routing node and the network topology. The routing node consists of a link controller and a router. The link controller (LC) provides an interface between the NA and the NoC. Its main function is to match the NA clock rate with that of the network topology. Routing nodes run at four times the frequency of the PEs. Synchronization registers are used to match clock rates between the slow PEs and the fast routing nodes. First-in first-out (FIFO) buffers are also added in the LC to store data packets from the network before they are transmitted to adjacent PEs.

2.3. Communication Paradigm
To forward a message/packet, the implemented NoC framework can use either the Store and Forward (SF) switching technique or the Wormhole (WH) switching technique. In SF switching, the message can be sent either as packets or in the form of flits. Each flit contains 25 bits. When the message is transmitted as flits, each routing node waits until the entire message is received before processing the HEADER. The end of the message/packet is marked by the TAIL flit. In wormhole switching, the message is forwarded as soon as the HEADER is available. The path is determined from the HEADER as it moves through the network, and the remaining flits follow the same path. The path is released when the TAIL flit is received (see Figure 2).

Fig.2. Packet Format
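The difference between the two switching techniques can be illustrated with a textbook zero-load latency model. The sketch below is not taken from the framework: it assumes one flit crosses one link per cycle, ignores router pipeline delay and contention, and uses hypothetical function names.

```python
def store_and_forward_cycles(hops: int, flits: int) -> int:
    """Zero-load latency when each routing node buffers the whole message.

    Assumption: one flit per link per cycle, no contention.
    """
    if hops < 1 or flits < 1:
        raise ValueError("hops and flits must be positive")
    # Every hop waits for all flits (up to TAIL) before forwarding.
    return hops * flits


def wormhole_cycles(hops: int, flits: int) -> int:
    """Zero-load latency when flits are pipelined behind the HEADER.

    Assumption: one flit per link per cycle, no contention.
    """
    if hops < 1 or flits < 1:
        raise ValueError("hops and flits must be positive")
    # The HEADER takes one cycle per hop; the other flits stream behind it.
    return hops + flits - 1


if __name__ == "__main__":
    for hops, flits in [(1, 4), (4, 8), (6, 16)]:
        sf = store_and_forward_cycles(hops, flits)
        wh = wormhole_cycles(hops, flits)
        print(f"hops={hops} flits={flits} SF={sf} WH={wh}")
```

Under these assumptions the two techniques coincide for a single hop, and the gap grows with both path length and message length, which is the usual motivation for wormhole switching.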
2.4. Monitor Module
Every routing node in the NoC is connected to a Node monitor, which in turn connects to a top-level monitor called the NoC monitor. The main function of the NoC monitor is to collect traffic information from the individual Node monitors. The Node monitors generate control information based on the buffer conditions of their router node. Each Node monitor uses a few ON/OFF signals, such as FAIL, FULL and ALMOST FULL, to communicate with the NoC monitor.
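As an illustration of how such ON/OFF signals could be derived from buffer conditions, the following sketch models a Node monitor in software. The threshold for ALMOST FULL (`almost_full_margin`), the `alive` flag, and all names are assumptions made for this example; the text does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NodeStatus:
    """ON/OFF signals a Node monitor reports to the NoC monitor."""
    fail: bool
    full: bool
    almost_full: bool


def node_status(occupancy: int, capacity: int, alive: bool = True,
                almost_full_margin: int = 1) -> NodeStatus:
    """Derive the signals from one router buffer.

    almost_full_margin is a hypothetical threshold: ALMOST FULL is raised
    when at most that many free slots remain and the buffer is not yet full.
    """
    if capacity <= 0 or not 0 <= occupancy <= capacity:
        raise ValueError("occupancy must lie within 0..capacity")
    full = occupancy == capacity
    almost_full = (not full) and occupancy >= capacity - almost_full_margin
    return NodeStatus(fail=not alive, full=full, almost_full=almost_full)


def congested_nodes(statuses: Dict[int, NodeStatus]) -> List[int]:
    """NoC-monitor view: ids of routing nodes reporting FAIL or FULL."""
    return sorted(n for n, s in statuses.items() if s.fail or s.full)


if __name__ == "__main__":
    reports = {
        0: node_status(0, 8, alive=False),
        1: node_status(1, 8),
        2: node_status(8, 8),
    }
    print(congested_nodes(reports))
```

A top-level monitor built this way only needs the three flags per node, not the full buffer contents.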
2.5. Traffic Generator Module
The Traffic Generator (TG) module is responsible for generating different traffic distributions in the network.
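A minimal software sketch of two of the traffic types named for the TG (uniform and hotspot) is given below. It is an illustration only, not the framework's generator: the function names, the `hotspot_fraction` parameter, and the rule that a node never sends to itself are assumptions, and sporadic traffic is not modeled.

```python
import random
from typing import List, Optional, Tuple


def uniform_destination(src: int, num_nodes: int, rng: random.Random) -> int:
    """Pick any node other than src with equal probability."""
    if num_nodes < 2:
        raise ValueError("need at least two nodes")
    dst = rng.randrange(num_nodes - 1)
    # Shift indices at or above src by one so that src is skipped.
    return dst if dst < src else dst + 1


def hotspot_destination(src: int, num_nodes: int, hotspot: int,
                        hotspot_fraction: float, rng: random.Random) -> int:
    """Send a fixed fraction of packets to the hotspot, the rest uniformly."""
    if src != hotspot and rng.random() < hotspot_fraction:
        return hotspot
    return uniform_destination(src, num_nodes, rng)


def generate_traffic(num_nodes: int, packets: int, rng: random.Random,
                     hotspot: Optional[int] = None,
                     hotspot_fraction: float = 0.0) -> List[Tuple[int, int]]:
    """Return (source, destination) pairs for the requested traffic type."""
    pairs = []
    for _ in range(packets):
        src = rng.randrange(num_nodes)
        if hotspot is None:
            dst = uniform_destination(src, num_nodes, rng)
        else:
            dst = hotspot_destination(src, num_nodes, hotspot,
                                      hotspot_fraction, rng)
        pairs.append((src, dst))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are repeatable
    print(generate_traffic(16, 5, rng))
    print(generate_traffic(16, 5, rng, hotspot=5, hotspot_fraction=0.5))
```

Passing a seeded `random.Random` instance keeps experiments repeatable, which matters when the same traffic has to be replayed on different topologies.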
3 SYSTEM TOPOLOGY
Since the ability of the network to disseminate information efficiently depends largely on its topology, we focus on the following types of topologies.

3.1 Fat-tree (hyper-tree)
A typical fat-tree (see Fig. 3) assumes a 4-ary tree structure with link capacities doubling up the levels of the tree. The fat-tree was the first network proved to be universal [7]. The fat-tree (hyper-tree) architecture was proposed by Charles E. Leiserson in 1985. Processors are located at the leaves of the tree, while the internal nodes of the tree are grouped into an internal network. Sub-trees can communicate among themselves without involving higher levels of the network. K-ary n-trees are implemented using identical switches of a fixed radix. The number of stages is n, and k is the arity, that is, the number of links of a switch that connect to the previous or to the next stage (i.e., the switch radix is 2k). Notice that k-ary n-trees are bidirectional multistage interconnection networks (MINs). A k-ary n-tree connects $N = k^n$ cores using $n k^{n-1}$ switches and $2 n k^n$ unidirectional links.

Fig.3. (a) "Fat-tree" cluster architecture (b) "Fat-tree" top view (layout).

3.2 Fat-pyramid
The fat-pyramid inherits the 4-ary tree framework of the fat-tree and adds a mesh on each level of nodes up the tree. The fat-tree is universal only under the unit wire delay condition; its universality does not hold under nonunit wire delays, whereas the fat-pyramid has been proven to be universal under both unit and nonunit wire delay conditions [8]. The fat-tree has been used in the CM-5 parallel computer, whereas the fat-pyramid has not been adopted for any machine. Another clear advantage of the fat-pyramid over the fat-tree is its better absolute efficiency due to its hierarchical meshes. However, these same meshes reduce the scalability of the fat-pyramid, increase its wire usage considerably, and make it unsuitable for representing a distributed network [9].

Fig.4. Pyramid with three levels and 4×4 base along with its 2D layout.

3.3 Fat-stack
The fat-stack is a hierarchical network consisting of tiers or levels. Each level has one or more subnetworks, and each subnetwork is a ring of n nodes. A fat-stack can have an arbitrary number of levels of rings. Figure 5 shows a fat-stack topology that has three nodes in a subnetwork. Each subnetwork connects to its upper level via a node by a single link. Dashed lines represent tier boundaries. Link capacities double upwards. The fat-stack is relatively simple in structure. It can be constructed by stacking up atomic subnetwork units following a fat-tree framework. The common subnetwork unit is made of a ring of a certain number of nodes and one or more upward links, each from one node of the unit. These links connect to the same node of the subnetwork directly above the unit. The network is built up recursively. A graphical representation of an AFS (augmented fat-stack) is shown in Figure 5 [9].

Fig.5. A fat-stack topology.

4 NOC SIMULATORS
A brief summary of existing NoC simulators and emulators is presented here. Orion [10] and LUNA [11], two NoC simulators developed especially for power simulation of on-chip interconnection networks, do not consider computational cores. FAST [12] is a functionally accurate NoC simulator limited to IBM's proprietary Cyclops-64 architecture. SICOSYS [13] is a general-purpose interconnection network simulator that captures essential details of low-level simulation. RSIM simulates shared-memory multiprocessors and uniprocessors built from processors that aggressively exploit instruction-level parallelism (ILP). RSIM, which is execution-driven, models state-of-the-art ILP processors, an aggressive memory system, and a multiprocessor coherence protocol and interconnect, including contention at all resources. NoC simulators such as NNSE [14], Noxim [15], and NIRGAM [16] are flexible in configuring the parameters of on-chip networks and are capable of obtaining performance metrics; however, these simulators are based on SystemC and are not synthesizable. XPIPES [17] consists of parameterizable network building blocks that can be composed at instantiation time; the parameterizable components are the network interface, switches, and links. The Turbo NOC simulator [18] is a cycle-accurate, SystemC [19] based NoC simulator specifically tailored for turbo decoder architectures. It estimates the throughput and complexity of a parallel NoC-based turbo decoder architecture.

5 SIMULATION DETAILS
The topmost shared component in the simulation is the NoC node, in which the PE (Processing Element) and the router are the main components. The PE is a module that injects generated packets and ejects received packets according to a traffic model such as uniform or hotspot. Routers receive packets on their input channels; after a packet is routed according to the routing algorithm and its destination address, it is sent to the selected output channel. When a specific topology such as a mesh or WK-recursive network is to be modeled with these components, a top-level wrapper module is implemented that connects several nodes of this type to each other according to the structure of the specified topology. Ns-2 [20] is a discrete-event network simulator designed for the simulation of ordinary computer networks. Because many models of network components are provided, the user can simulate at a high level of abstraction, yet it is also possible to implement new components in the network model. Ns-2 supports local area networks, mobile networks, and even satellite networks. Two programming languages are used in ns-2, namely C++ and OTcl.

6 SIMULATION RESULTS
In this section, we present simulations of NoCs with different numbers of levels using the Fat-Pyramid-NOC and Fat-Stack-NOC topologies. We survey the ability and flexibility of ns-2 in these simulations. Figures 6 to 15 show different views of the simulations.

Fig.6. pyramid mesh with 3 levels
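To give a sense of how quickly such topologies grow with the number of levels, the sketch below enumerates a plain pyramid mesh. It assumes the standard pyramid structure in which level i is a 2^i × 2^i mesh and every node has one parent in the level above, which matches a three-level pyramid with a 4×4 base; the exact topologies used in the ns-2 runs are not specified here, link capacities are not modeled, and all names are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[int, int, int]  # (level, row, column); level 0 is the apex


def pyramid_nodes(levels: int) -> List[Node]:
    """All nodes of a pyramid whose level i is a 2**i x 2**i mesh."""
    return [(lvl, r, c)
            for lvl in range(levels)
            for r in range(2 ** lvl)
            for c in range(2 ** lvl)]


def pyramid_links(levels: int) -> List[Tuple[Node, Node]]:
    """Bidirectional links: mesh links inside each level plus parent links."""
    links = []
    for lvl, r, c in pyramid_nodes(levels):
        side = 2 ** lvl
        if c + 1 < side:   # mesh neighbour to the right
            links.append(((lvl, r, c), (lvl, r, c + 1)))
        if r + 1 < side:   # mesh neighbour below
            links.append(((lvl, r, c), (lvl, r + 1, c)))
        if lvl > 0:        # tree link to the parent one level up
            links.append(((lvl, r, c), (lvl - 1, r // 2, c // 2)))
    return links


if __name__ == "__main__":
    for levels in range(3, 7):
        print(levels, len(pyramid_nodes(levels)), len(pyramid_links(levels)))
```

With three levels this structure has 1 + 4 + 16 = 21 nodes; each additional level roughly quadruples the node count, which is one reason wire usage becomes a concern for pyramid-based designs.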
6.1 PYRAMID_MESH WITH 3 LEVELS
Second view of simulation of pyramid mesh with 3 levels:
a) DEGREE 3
b) DEGREE 4
c) DEGREE 5

6.2 PYRAMID_MESH WITH 4 LEVELS
Fig.7. pyramid mesh with 3 levels
Fig.9. pyramid mesh with 3 levels

6.3 PYRAMID_MESH WITH 5 LEVELS

6.4 PYRAMID_MESH WITH 6 LEVELS

REFERENCES

[1] A. Vicentelli and G. Martin, "A vision for embedded systems: Platform based design and software", IEEE Design and Test - Special Issue of Computers, 18(6):23-33, November 2001.
[2] Gordon Brebner and Delon Levi, "Networking on chip with platform FPGAs", In Proceedings of the 2003 International Conference on Field-Programmable Technology (FPT), pages 13-20, December 2003.
[3] Luca Benini and Giovanni De Micheli, editors, "Networks on chips: Technology and Tools", Morgan Kaufmann Publishers, San Francisco, CA, USA, 2006.
[4] Srinivasan Murali, Martijn Coenen, Andrei Radulescu, Kees Goossens, and Giovanni De Micheli, "Mapping and configuration methods for multi-use-case networks on chips", In Proceedings of the 2006 Conference on Asia South Pacific Design Automation (ASPDAC), pages 146-151, New York, NY, USA, 2006. ACM Press.
[5] F. Angiolini, P. Meloni, S. Carta, L. Benini, and L. Raffo, "Contrasting a NoC and a traditional interconnect fabric with layout awareness", In Proceedings of Design, Automation and Test in Europe Conference (DATE06), pages 124-129, Munich, Germany, 2006.
[6] J. Suseela, V. Muthukumar, "Parametrizable NoC Emulation Framework for Performance Evaluations", International Conference on Embedded Systems and Applications, Jul-2011.
[7] C. E. Leiserson, "Fat-trees: universal networks for hardware-efficient supercomputing", IEEE Transactions on Computers, C-34(10):892-901, Oct. 1985.
[8] R. I. Greenberg, "The fat-pyramid and universal parallel computation independent of wire delay", IEEE Transactions on Computers, 43(12):1358-1364, Dec. 1994.
[9] Kevin F. Chen, Edwin H.-M. Sha, "The Fat-Stack and Universal Routing in Interconnection Networks", In Proceedings of the ISCA 17th International Conference on Parallel and Distributed Computing Systems, Volume 66 Issue 5, May 2006.
[10] Hangsheng Wang et al., "Orion: A Power-Performance Simulator for Interconnection Networks", In Proceedings of MICRO 35, 2002.
[11] Zhonghai Lu, Rikard Thid, et al., "NNSE: Nostrum network-on-chip simulation environment", Design, Automation and Test in Europe Conference, 2005.
[12] Juan del Cuvillo et al., "FAST: A Functionally Accurate Simulation Toolset for the Cyclops64 Cellular Architecture", MoBS05 Workshop in conjunction with ISCA05, 2005.
[13] V. Puente et al., "Sicosys: An integrated framework for studying interconnection network performance in multiprocessor systems", Parallel, Distributed, and Network-Based Processing, Euromicro Conference on, vol. 0, p. 0015, 2002.
[14] Noxim. http://sourceforge.net/projects/noxim, 2008.
[15] Lavina Jain et al., "NIRGAM: A Simulator for NoC Interconnect Routing and Application Modeling", Design, Automation and Test in Europe Conference, 2007.
[16] CellSim. http://pcsostres.ac.upc.edu/cellsim, 2007.
[17] M. Dall'Osso et al., "xpipes: A Latency Insensitive Parameterized Network-on-chip Architecture for Multi-Processor SoCs", pp. 536-539, Proc. Int'l Conf. Computer Design, 2003.
[18] M. Martina, "Turbo NOC: Network On Chip based turbo decoder architectures", downloadable at www.vlsilab.polito.it/~martina.
[19] http://www.systemc.org.
[20] L. Breslau, D. Estrin, K. Fall, S. Floyd, J. Heidemann, A. Helmy, P. Huang, S. McCanne, K. Varadhan, Ya Xu, and Haobo Yu, "Advances in network simulation", IEEE Computer, 33(5):59-67, May 2000.
Reza Kourdy received his B.Sc. degree in Computer Engineering and his M.Sc. degree in Computer Architecture, both from Azad University of Arak, Iran, in 2002 and 2007, respectively. His research interests include Network-on-Chip architecture and fault tolerance.
Mohammad Reza Nouri Rad received his B.Sc. degree in Computer Engineering (Software) from Azad University of Najafabad, Iran, in 2001, and his M.Sc. degree in Computer Software from Azad University of Arak, Iran, in 2010. His research interests include Network-on-Chip architecture and network security. He is a program committee member of the following conferences: WICT 2011, CSNT 2011, CICN 2011, SocProS 2011, CSNT 2012, CICN 2012, and BIC-TA 2012.
