
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.


IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

Smart Reliable Network-on-Chip


Cédric Killian, Camel Tanougast, Fabrice Monteiro, and Abbas Dandache

Abstract: In this paper, we present a new network-on-chip (NoC) that accurately localizes its faulty parts. The proposed NoC is based on new error detection mechanisms suitable for dynamic NoCs, where the number and position of processor elements or faulty blocks vary during runtime. Specifically, we propose online detection of data packet errors and of adaptive routing algorithm errors. Both mechanisms are able to distinguish permanent from transient errors and to localize accurately the position of the faulty blocks (data bus, input port, output port) in the NoC routers, while preserving the throughput, the network load, and the data packet latency. We provide a localization capacity analysis of the presented mechanisms, NoC performance evaluations, and field-programmable gate array synthesis results.

Index Terms: Adaptive algorithm, dynamic reconfiguration, network-on-chip (NoC), reliability.

I. INTRODUCTION

Recently, the trend in embedded systems has been moving toward multiprocessor systems-on-chip (MPSoCs) in order to meet the requirements of real-time applications. The complexity of these SoCs is increasing and the communication medium is becoming a major issue of the MPSoC [1]. Generally, integrating a network-on-chip (NoC) into the SoC provides an effective means to interconnect several processor elements (PEs) or intellectual properties (IPs) (processors, memory controllers, etc.) [2]. The NoC medium features a high level of modularity, flexibility, and throughput. An NoC comprises routers and interconnections allowing communication between the PEs and/or IPs. The NoC relies on data packet exchange. The path for a data packet between a source and a destination through the routers is defined by the routing algorithm. Therefore, the path that a data packet is allowed to take in the network depends mainly on the adaptiveness permitted by the routing algorithm (partially or fully adaptive routing algorithm), which is applied locally in each router being crossed and to each data packet [3], [4].

Dynamically reconfigurable 2-D mesh NoCs (DyNoC, CuNoC, QNoC, ConoChi, etc.) are suitable for field-programmable gate array (FPGA)-based systems [2], [5]-[8]. Thanks to the partial dynamic reconfiguration of FPGAs [9], which allows the position and number of implemented PEs and IPs to vary, higher adaptiveness is allowed in MPSoCs during runtime. To achieve a reconfigurable NoC, an efficient dynamic routing algorithm is required for the data packets. The goal
Manuscript received March 8, 2012; revised October 28, 2012; accepted December 26, 2012. The authors are with Lorraine University, Metz 57000, France (e-mail: cedric.killian@univ-metz.fr; camel.tanougast@univ-metz.fr; fabrice.monteiro@univ-metz.fr; Abbas.dandache@univ-metz.fr). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TVLSI.2013.2240324

is to preserve flexibility and reliability while providing high NoC performance in terms of throughput. Fig. 1 illustrates a dynamic reliable NoC. Fig. 1(a) shows the communications between several IPs, and Fig. 1(b) and (c) depict the dynamic placement of an IP and the occurrence of a faulty node, respectively. In both cases, bypasses determined by the dynamic routing algorithm are required. Furthermore, faulty nodes or even faulty regions make communications within the network harder, and even impossible with some routing algorithms, as shown in Fig. 1(c). Therefore, dynamic component placement and faulty nodes or regions are the main reasons why fault-tolerant or adaptive algorithms have been introduced and used in runtime dynamic NoCs [5].

Regarding adaptive or fault-tolerant routing algorithms, several solutions have been proposed [10], [11]. Generally, these algorithms correspond to a modified XY routing algorithm that allows faulty or unavailable regions to be bypassed. In the case of adaptive routing algorithms based on the turn model [12], zones are defined corresponding to faulty nodes or unavailable regions already detected in the NoC. The neighboring routers of these zones must not send data packets towards these known faulty routers or unavailable regions. Several solutions have been proposed to satisfy this constraint. One solution is to include a routing table containing the output port to use for each destination in the network [13]. These tables are updated by an initialization algorithm. The main drawback of this solution is the requirement to invoke the algorithm at a nonspecified time in order to update the routing tables of the NoC routers. Another solution usually applied is the use of chains and rings formed around the adjacent faulty nodes and regions, in order to delimit rectangular parts in the NoC covering all the faulty nodes or unavailable regions.
These chains or rings of switches modify the routing tables, which therefore differ from the standard tables realizing the XY routing algorithm. These specific switches integrate in their tables additional routing rules that allow the faulty zones and the regions dedicated to dynamic IP/PE instantiations to be bypassed, while avoiding starvation, deadlock, and livelock situations [3], [12], [14]. Another reliable routing algorithm solution is the use of the de Bruijn graph [15]. This algorithm is deadlock-free and handles the bypassing of faulty links between two switches by assuming that a detection mechanism makes nodes aware of the faulty links connected to them. However, these solutions do not provide the mechanism to detect a faulty link or router.

With regard to the increasing complexity and the reliability evolution of SoCs, MPSoCs are becoming more sensitive to phenomena that generate permanent, transient, or intermittent faults [16]. These faults may generate data packet errors, or may affect router behavior, leading to data packet losses or permanent routing errors [17]. Indeed, a fault in a routing

1063-8210/$31.00 © 2013 IEEE


Fig. 1. Illustration of a dynamic reliable NoC. (a) Normal operation. (b) Dynamic implementation of an IP. (c) Online detection of a faulty router.

logic will often lead to packet routing errors and might even crash the router. To detect these errors, specific error detection blocks are required in the network to locate the faulty sources. Moreover, permanent errors must be distinguished from transient errors. Indeed, the precise location of permanently faulty parts of the NoC must be determined, in order for them to be bypassed effectively by the adaptive routing algorithm.

To protect data packets against errors, error correcting codes (ECCs) are implemented inside the NoC components. Among the well-known solutions, three are usually applied for NoC-based MPSoC communications. First, the end-to-end solution requires an ECC to be implemented in each input port of the IPs or PEs in the NoC [18]. The main drawback of this solution is its inability to locate the faulty components (PE, IP, router, data bus, etc.) in the NoC. Consequently, it is inadequate for dynamic NoCs, where the faulty and unavailable zones must be bypassed. Second, the switch-to-switch detection is based on the implementation of an ECC in each input port of the NoC switches. For instance, in a router with four communication directions (North, South, East, and West), four ECC blocks are implemented. Therefore, when a router receives a data packet from a neighbor, the ECC block analyzes its content to check the correctness of the data. This process detects and corrects data errors according to the effectiveness of the ECC being used. Third, another proposed solution is the code-disjoint approach [18]. In this approach, routers include one ECC in each input and output data port. This solution localizes the error sources, which can be either in the switches or on the data links between routers. However, if an error source is localized inside a router, this mechanism disables the entire switch.
These online detection mechanisms cannot disconnect just the faulty parts of the NoC, and hence do not give an accurate localization of the source of errors. The result is that the network throughput decreases while the network load and data packet latency increase. Moreover, they are not able to distinguish between permanent and transient errors. For all these techniques, each ECC implemented in the routers of the network adds cost in terms of logic area, latency in data packet transmission, and power consumption.

An analysis of the source and destination addresses, as presented in [19], is among the techniques usually proposed

to be able to detect faulty routing decisions. When a router receives a data packet, it compares its own address to the destination and source addresses. Then, the router checks its own position in the deterministic XY path of the NoC for the considered data packet. The router performing this check is able to decide, according to the correct XY path, whether the switch from which the packet was received made a routing error. However, this technique has a major drawback: it is unable to handle the bypass of faulty nodes and unavailable regions. Consequently, this solution cannot be applied in adaptive or fault-tolerant routing algorithms. Indeed, as specified in a turn model algorithm [12], the structure of the reconfigurable NoC may contain bypass areas in which the switches take routing decisions that differ from the XY routing algorithm. For handling message routing errors in dynamic networks, a new faulty switch detection mechanism is required for adaptive or fault-tolerant routing algorithms.

In this paper, we present a new reliable dynamic NoC. The proposed NoC is a mesh structure of routers able to detect routing errors for adaptive routing based on the XY algorithm [3], [12], [14], [20]. Our approach includes data packet error detection and correction. The originality of the proposed architecture is its ability to localize error sources accurately, allowing the throughput and network load of the NoC to be maintained. In our case study, we consider a reliable approach based on the adaptive routing module proximity algorithm [12]. The considered routing algorithm is based on the adaptive turn model routing scheme and the well-known XY algorithm. This adaptive algorithm is livelock- and deadlock-free and allows data packets to pass around faulty regions.

The remainder of this paper is organized as follows. Section II describes the architecture of the proposed reliable switch.
Section III details the proposed routing error detection suitable for adaptive routing algorithms. Section IV presents a specific self-loopback mechanism that avoids data packet loss, maintains the performance of the NoC, and effectively localizes permanent sources of data packet errors. Section V presents FPGA synthesis and NoC performance evaluations, while Section VI validates the proposed techniques by evaluating the NoC's robustness and the localization capacity of the error detection mechanisms. A discussion

KILLIAN et al.: SMART RELIABLE NETWORK-ON-CHIP 3

Fig. 2. Architecture of the reliable router RKT-switch.

on the limitations and the extensibility of the presented work is given in Section VII. Finally, conclusions and perspectives on future work are given in Section VIII.

II. BASIC CONCEPT OF THE RKT-SWITCH

We propose a new reliable NoC-based communication approach called RKT-NoC. The RKT-NoC is a packet-switched network based on intelligent independent reliable routers called RKT-switches. The architecture of the RKT-switch is depicted in Fig. 2. The RKT-switch is characterized by its architecture having four directions (North, South, East, West) suitable for a 2-D mesh NoC. The PEs and IPs can be connected directly to any side of a router. Therefore, there is no specific connection port for a PE or IP. The proposed detection mechanisms can also be applied to NoCs using five-port routers with a local port dedicated to an IP. However, the major drawback of these architectures appears when the local port has a permanent error: the IP connected to it is lost, or needs to be dynamically moved in the chip by means of dynamic partial reconfiguration. On the contrary, for the four-port RKT-NoC, an IP can replace several routers by having several input ports and hence be strongly connected in the network [5]. Moreover, by using dynamic partial reconfiguration and IPs

strongly connected in the NoC, no one fault location is more catastrophic than another. Indeed, an IP may have access to the network by being connected to several routers, or can be dynamically moved on the chip if its only access point becomes faulty.

Each port direction is composed of two unidirectional data buses (input and output ports). Each input port is associated with a first-in, first-out (FIFO) buffer and a routing logic block. The RKT-switch operation is based on the store-and-forward switching technique. This technique is suitable for dynamically reconfigurable NoCs. Indeed, in our NoC, PEs and IPs can be implemented in place of one or several routers [7]. With the store-and-forward technique, each data packet is stored in only a single router at any instant. Hence, when a router needs to be reconfigured, the router is only required to empty its buffers. On the contrary, with the wormhole switching technique [14], a single data packet can be spread over several routers. Consequently, the time required to clear all the routers containing partial packet data (flits) and to reconstruct these packets before performing a reconfiguration is more significant. The RKT-NoC uses nonbouncing routers [21], so that if a router is surrounded by three unavailable neighbors, it also becomes unavailable. Indeed, if a data packet is sent to a router


surrounded by three unavailable nodes, the packet cannot be routed.

The data flow control used in our architecture is the Ack/Nack solution, which can handle fault-tolerant transmissions [22], although this does increase the energy consumption [23]. This solution relies on the retransmission of packets received as faulty by a neighboring node. Being able to retransmit a packet after it has been sent to a node requires that a copy of the packet be saved locally until an Ack or Nack is received. If a neighboring router receives a flit containing an error that cannot be corrected by the ECC, a Nack is sent back and the whole packet is retransmitted. Otherwise, an Ack is generated at full packet reception. More precisely, an Ack is generated only when all the flits of the data packet have been received and checked by the router, which reduces latency.

The Hamming ECC is considered for our RKT-switch, in order to provide a convenient tradeoff between area overhead and error correction capacity. This choice permits the correction of single event upset (SEU) errors (one bit flip in a flit) and the detection of multiple event upset (MEU) errors (two bit flips in a flit). Moreover, the Hamming code is more suitable for NoCs based on Ack/Nack flow control than the parity bit check. Indeed, when a single bit-flip error occurs, error correction is possible with the Hamming ECC, whereas the single parity check would require packet retransmission and hence an increased transmission latency. The distinction between permanent and transient errors is made possible by a local historic, which saves the transmission results, and a loopback output mechanism (see Section IV). Furthermore, our solution combined with the loopback mechanism and the novel local historic allows the localization of errors, either on the bus connections or inside the switches, by localizing the faulty port (more details in Section IV).
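As an illustration of the single-error-correcting, double-error-detecting behavior described above, the following minimal sketch models a 4-bit payload protected by an extended Hamming (8,4) code and maps the decoder outcome to the Ack/Nack reply. The 4-bit width, the bit layout, and the names `encode`, `decode`, and `flow_control_reply` are assumptions chosen for brevity; the flit width and exact code construction of the RKT-switch are not specified here.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 follow the classic
    Hamming(7,4) layout with parity bits at positions 1, 2, and 4.
    """
    code = [0] * 8
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the overall parity even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions holding a 1
    parity_ok = sum(code) % 2 == 0
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:
        # Odd number of flips: treated as a single flip at index `syndrome`
        # (index 0 is the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but nonzero syndrome: two flips, detected only.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


def flow_control_reply(status):
    """Hypothetical mapping of the decoder status to the Ack/Nack reply."""
    return "Nack" if status == "uncorrectable" else "Ack"
```

In this sketch a single flipped bit is repaired silently and still yields an Ack, whereas a simple parity check could only detect it and request a retransmission. Three or more flips in one word fall outside what this code guarantees.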
In addition, our reliable structure is based on switch-to-switch detection, offering robustness against SEU and two-bit MEU errors, while maintaining a good tradeoff between area overhead and the capacity to locate errors.

III. ROUTING ERROR DETECTION

The proposed reliable switch incorporates an online routing fault detection mechanism. This approach can operate with adaptive algorithms based on the well-known XY routing algorithm [3], [12], [14], [20]. The main difficulty in routing error detection is to distinguish a bypass of an unavailable component in the NoC (due to the use of the adaptive algorithm) from a real routing error (due to a faulty component in the NoC). Fig. 3 illustrates the challenge for such error detection. Apart from an increase of the data packet latency, the consequence of undetected routing errors is the possible loss of data packets sent either to an already detected faulty router or to an area undergoing a dynamic reconfiguration. In order to achieve routing error detection, the proposed reliable router relies on diagonal state indications, on additional routing information in the header flits, and on the routing error detection blocks in each port (see Fig. 2). The basic concept of our approach is the following: Each router receiving a data packet checks the correctness of the routing

Fig. 3. Illustration of the routing error detection problem (a) to distinguish a dynamic bypass (b) from a routing error and (c) to avoid a loss of data packets.

decision made by the previously crossed switch. This routing error detection is performed in parallel after the Hamming ECC, as shown in Fig. 2. Consequently, this detection does not increase the data packet latency.

A. Elements Required for Routing Error Detection

1) Diagonal Availability Indications: The RKT-switch uses information links to indicate its availability status to its neighbors. We define as unavailable an input port that cannot receive data packets. To preserve the highest throughput of the NoC, our strategy is to disconnect only the faulty parts of the routers. Thereby, if a router input port is permanently faulty, it is disabled while the other input ports remain active, in order to obtain a partially operating switch. On the contrary, if all input ports are faulty, the router is considered as unavailable. Similarly, we define as unavailable the NoC components that cannot receive data packets due to permanent faults or a partial dynamic reconfiguration. The RKT-switch indicates its availability status to the eight direct neighboring routers through the diagonal availability indication (DAI) links. The network structure based on DAI links is shown in Fig. 4. These DAI links allow the correctness of the routing algorithm to be checked. Indeed, each router is able to check the availability status of the neighboring routers and components. For instance, in Fig. 4(a), router(i, j) can check the availability of router(i + 1, j - 1). The network components (PEs or IPs) are not allowed to route data packets and are restricted to accepting only data packets intended for them; hence, their DAI interconnections are set.

2) Journal of Routing Error Localizations: Each routing error detection block of the router inputs owns three journals to keep the routing error detection results. These journals are related to the routing logic blocks of the neighboring router connected to the considered input port. For example, in Fig.
4(a), the West routing error detection block of router(i, j) has three journals corresponding to the West, North, and South routing blocks of router(i - 1, j). Thanks to these journals, permanent errors can be distinguished from transient errors. In addition, the location of the faulty routing algorithm blocks in the neighboring routers can be deduced from these journals. A permanent error is considered when


Fig. 4. Case study of error detections. (a) Dynamic bypass. (b) Routing fault. (c) Role of the unique routing path indication.

Fig. 5. Illustrations of a data packet loopback and a dynamic bypass decision.

three successive routing errors are detected for a specific routing logic block.

3) Structure of Information Fields in the Data Packets: A sliding gather data (SGD) field is added to each header flit of the data packets being transmitted. Table I details the structure of a data packet. A flit-type bit is used to distinguish the header from the data flits. The SGD field contains the addresses of the previous and penultimate crossed routers. Each router receiving a data packet checks this SGD field and validates the routing choice made by the previous router. To achieve the routing validation, the SGD field is updated by each router crossed along the transmission path. This update is done by each input buffer block. Because the header flit of the data packet is modified, its Hamming code must be updated as well. A unique routing path indication (URPI) bit is added to the header of the data packets. This bit is set if a router has only a single routing output path available. This bit allows false detections to be avoided. Our approach is therefore suitable for a dynamic NoC based on a fault-tolerant routing algorithm.

B. Illustration of the Routing Error Detection

The RKT-switch allows online detection of routing faults and distinguishes routing errors from dynamic bypasses related to the adaptive routing algorithm. Several examples illustrate the efficiency of the proposed routing error detection. Fig. 4 shows two examples of data packets being routed from router(i - 1, j - 1) to router(i, j - 1). In Fig. 4(a), router(i, j - 1) is receiving a data packet from its West neighbor router. It checks the correctness of the routing decision made by router(i - 1, j - 1). As router(i - 1, j - 1) obeys the XY algorithm, no error is detected. In Fig. 4(b), router(i, j) is receiving a data packet from the South direction.
The XY routing path between router(i, j - 1) and router(i + 1, j - 1) is deduced from the analysis of the SGD field of the data packet and of the penultimate and destination router addresses. Hence, router(i, j) detects that router(i, j - 1)

is not obeying the XY routing algorithm. Consequently, router(i, j) checks the DAI links of router(i + 1, j - 1). If router(i + 1, j - 1) is unavailable, router(i, j - 1) made a bypass decision and hence is not faulty [see Fig. 4(a)]. Otherwise, router(i + 1, j - 1) is available, which means that router(i, j - 1) made a faulty routing decision [see Fig. 4(b)]. In Fig. 4(b), router(i, j) sets the error journal associated with the position of the routing logic block causing this routing error decision. This routing logic block is identified from the address of the penultimate node. More precisely, the West routing logic block of router(i - 1, j - 1) is identified by checking the SGD field and identifying the penultimate router, router(i - 2, j - 2).

Fig. 4(c) illustrates the role of the URPI bit. Here, router(i, j) is receiving a data packet from the North direction according to the XY routing algorithm. By checking the DAI of router(i + 1, j + 1), a bypass decision is deduced. However, the destination is located to the North of the previous router. According to the routing algorithm being used, the bypass should have been made through the North direction. The availability of router(i, j + 2) is checked through the URPI bit carried in the header with the SGD field. Indeed, if the URPI is set, then router(i, j + 1) has only one remaining path, and hence did not make a routing error. On the contrary, if the URPI is not set, router(i, j + 1) has two remaining available paths to route the data packets, and hence made a routing error.

C. Principle of the Routing Error Detection

An XY-based adaptive routing algorithm primarily uses the rules of the XY algorithm [12] to route data packets into the network when the required components are available. In the case of an unavailable component, a specific routing path is locally chosen to bypass its position.
When a router receives a data packet, it checks the correctness of the routing decision made by the previous node, using the routing error detection algorithm. From address comparisons, the router checks whether the previous routing decision obeyed the XY routing algorithm. If so, the previous decision is correct. Otherwise,
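The check illustrated by Fig. 4(a) and (b) can be sketched as follows. This is a simplified model under stated assumptions, not the RKT-switch logic itself: addresses are `(x, y)` tuples with x growing eastward and y growing northward, availability is supplied by a hypothetical `is_available` callback standing in for the DAI links, and the names `Header`, `xy_next_hop`, `check_previous_decision`, and `update_sgd` are illustrative. The URPI refinement of Fig. 4(c) and the identification of the faulty routing logic block from the penultimate address are left out.

```python
from dataclasses import dataclass


@dataclass
class Header:
    destination: tuple   # (x, y) of the destination router
    previous: tuple      # SGD: address of the last router crossed
    penultimate: tuple   # SGD: address of the router crossed before that


def xy_next_hop(current, destination):
    """Next router on the deterministic XY path (X first, then Y)."""
    (x, y), (dx, dy) = current, destination
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return current


def check_previous_decision(me, header, is_available):
    """Classify the routing decision of the router that sent the packet to `me`."""
    if header.previous == me:
        return "loopback"            # looped-back packet: no check is applied
    expected = xy_next_hop(header.previous, header.destination)
    if expected == me:
        return "xy"                  # the previous router obeyed XY routing
    if not is_available(expected):
        return "bypass"              # the XY hop was unavailable: legitimate detour
    return "routing_error"           # the XY hop was available: faulty decision


def update_sgd(me, header):
    """Slide the SGD field after the check, as done by the input buffer block."""
    header.penultimate = header.previous
    header.previous = me
```

For example, with `me = (2, 2)`, `previous = (2, 1)`, and `destination = (4, 1)`, the XY hop expected from the previous router is `(3, 1)`; the decision is classified as a bypass if `(3, 1)` is unavailable and as a routing error otherwise.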


TABLE I
STRUCTURE OF THE DATA PACKETS

Fig. 6. Architecture of the loopback module.

the router decides whether the previous decision is a bypass decision or a routing error. The detection algorithm is required to check the availability of the router through which the data packets should have passed according to the XY algorithm. This verification is performed thanks to the DAI links. If the router in the XY path is unavailable, the previous router decision was a correct bypass. If it is available, the previous router decision is a routing error. In the latter case, the router shifts a 1 into the error journal associated with the faulty routing logic block. The position of the faulty block is deduced from the address of the penultimate router in the SGD field. If three consecutive errors are made by the same faulty routing logic block, a permanent error is considered. In this situation, a specific data packet is generated towards the switch generating the routing errors. This specific one-flit data packet indicates the faulty input port of the considered router that must be disconnected.

For NoCs based on multiflit data packets, it may happen that a flit is received without being preceded by a header flit, which is an erroneous situation. In the proposed RKT-NoC, there is a bit in each flit indicating whether the flit is a header flit or a data flit, as depicted in Table I. When a router receives the first flit of a data packet, it checks after the Hamming decoding whether it is a header flit. If not, the flit is destroyed. In addition, when receiving a data packet, the destination IP or PE counts the number of received flits. If this number does not match the number indicated in the header flit, the packet is destroyed and a retransmission request is sent back to the emitter IP or PE.

IV. LOOPBACK MODULE

A. Basic Principles

In a dynamic reconfigurable NoC, the position and the number of components in the network can change during

operation, as illustrated in Fig. 1. Actually, the number and position of the PEs and IPs in the NoC can be dynamically modified in order to meet the requirements of the application. Partial reconfigurable regions (PRRs) must be defined inside the FPGA in order to achieve dynamic reconfiguration of the 2-D mesh NoC [24]. These PRRs are the regions where partial reconfigurable modules (PRMs) can be implemented. PRMs represent electronic instantiations of functional units. They are defined by specific partial bitstreams and can be placed according to the application needs [24]. In practice, these PRMs correspond to the PEs and IPs being implemented and placed inside the dynamic NoC, as illustrated in Fig. 1.

In a reliable NoC, faulty routers are isolated at runtime during the network operations. Let us consider a permanently faulty router that cannot be corrected. This router is permanently disabled. Similarly, during the reconfiguration of a PRR, no packet can be sent inside the area being reconfigured. Thus, these PRRs are dynamically isolated. However, these isolations can lead to data packet losses or increase packet transmission latency. More precisely, these drawbacks occur when routers containing data packets in their output buffers have their neighboring nodes unavailable due to a dynamic reconfiguration or permanent fault detection. Thereby, these data packets remain stored in the output buffers until the end of the reconfiguration (dynamic implementation case) or are lost, in the case of detection of a permanently faulty node. To overcome these drawbacks, the proposed RKT-switch contains output buffer blocks associated with loopback modules, as described in Fig. 2. The role of each loopback module is to empty the buffers of each output port by looping back the data packets into the input port of the router (more details in Section IV-C). The result is that the looped-back packets are rerouted towards another output port of the router.
This avoids data packets becoming trapped when a neighboring switch is detected as permanently faulty, and reduces latency when a neighbor is undergoing a dynamic reconfiguration. Fig. 5 illustrates the role of a loopback module. A PE or IP emitter sends data packets towards a destination IP according to the XY routing algorithm. If router(1, 3) suddenly becomes unavailable, the data packets remaining in the West output of router(2, 3) are looped back and rerouted towards its South output. This mechanism allows the stored data packets to be routed to the destination. Thereafter, with router(1, 3) being marked as unavailable, the subsequent data packets coming from the East input port are routed directly towards the South port by the dynamic routing algorithm. Furthermore, the main advantage of the combined use of the proposed loopback module, the local historic of data errors, and the switch-to-switch data error detection mechanism (see Fig. 2) is the precise localization and distinction of the sources of data


TABLE II
LOCALIZATION OF THE ERRORS

Results of the Data Transmission | Results of the Data Error Detection After Loopback | Input Port | Output Port | Data Bus
Ack | No loopback required | Not faulty | Not faulty | Not faulty
Three consecutive Nack | No error detected after loopback | Not faulty | Not faulty | Permanently faulty
Three consecutive Nack | Uncorrectable error detected after loopback | Suspect | Suspect | Suspect

errors. Therefore, we can accurately locate whether the data errors are on the data bus, the input port, or output port, and whether the faults are permanent or transient. B. Localization of Data Packet Errors To locate and distinguish permanent and transient errors, a local historic of data packet errors is implemented locally in each router, as described in Fig. 2. This block is composed of journals related to the input and output ports. These journals are 3-bit-deep shift registers. The RKT-switch uses the Ack/Nack data ow control. When a data packet is transmitted to a neighboring node, a copy of the data is stored locally until the Ack is received. If no error occurred during the transmission (reception of an Ack), a set to 0 is added to the journal related to the input port of the get-in direction and the output port. If an uncorrectable error is detected by the neighbor, a retransmission is performed in response to a Nack. If three Nacks are received, the packet is looped back and a set to 1 is added into the journal related to the input and output ports taken by the data packet. Indeed, the error source can be located on the bus, in the input port, or in the output port. After going through the loopback, the data packet is checked by the input ECC. If an uncorrectable error is detected by the ECC, the data packet is destroyed. If no error is detected, we can conclude the errors detected by the neighbor occurred on the data bus. If the error occurred consecutively three times on the bus (i.e., three Nacks), we can conclude that there is a permanent error on the data bus. Table II shows the correlation between the data error detection results and the location of the errors (input block, output block, or data bus). The local historic has a threshold before disconnecting a part of a router. This threshold is the number of consecutive errors required to ag an error source as permanent. 
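The journals and the correlation given in Table II can be modeled with a short sketch. The class and function names (`Journal`, `localize`) are hypothetical, the default depth of 3 reflects the 3-bit shift registers mentioned above, and the non-Ack rows of the mapping assume that three consecutive Nacks have already triggered the loopback.

```python
class Journal:
    """Shift register keeping the most recent transmission results (1 = error)."""

    def __init__(self, depth=3):
        self.bits = [0] * depth

    def record(self, error):
        # Shift in the newest result and drop the oldest one.
        self.bits = self.bits[1:] + [1 if error else 0]

    def is_permanent(self):
        # Permanent only if every stored result is an error, that is,
        # `depth` consecutive errors with no success in between.
        return all(self.bits)


def localize(ack_received, error_after_loopback=None):
    """Return the Table II verdict for the input port, output port, and data bus."""
    if ack_received:
        verdict = ("not faulty", "not faulty", "not faulty")
    elif error_after_loopback:
        verdict = ("suspect", "suspect", "suspect")
    else:
        verdict = ("not faulty", "not faulty", "permanently faulty")
    return dict(zip(("input_port", "output_port", "data_bus"), verdict))
```

Because any successful transmission shifts a 0 into the register, a transient error is flushed after a few good transfers and never reaches the permanent verdict on its own.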
Here, we set a threshold of 3 (see Section VI for more details on the impact of the threshold). When three consecutive errors are recorded in the same journal related to an input or output, the local historic of data packet errors concludes that a permanent error exists in the related direction.

The data packets being looped back, after being checked by the ECC, are checked by the routing error detection block. However, the routing error detection block finds in the SGD field that the previous router address is its own address and deduces a loopback. Consequently, it does not apply the routing error detection algorithm.

When a permanent fault is detected in a router, the faulty part of the NoC has to be isolated. The part to be isolated has been located accurately by using the local historic and the loopback with the switch-to-switch error detection mechanism. It can be located in the input port, the output port, or the data bus. If the error is in the input port, the router locally activates the horizontal availability link of the faulty input port, and the two associated DAI links. In this way, the neighboring component connected to the faulty port cannot send new data packets

in this direction, and the DAI flags indicate to the diagonal neighbors the possibility to bypass its position. If the error is on the data bus or in the output port, the router detecting the permanent error must tell the neighbor to activate its availability indication links. To indicate which port needs to be disconnected, the router detecting the permanent fault sends a data packet addressed to the neighboring router. This one-flit data packet contains the address of the destination router and the direction of the port to disconnect. However, the router must not send this special flit in the direction that was detected as faulty. Consequently, the data packet is generated in the input port corresponding to the direction of the faulty neighbor. As we use nonbouncing routers, the data packets cannot be routed in the same direction as the get-in port. The routing logic block will then make a routing bypass and the packet will be sent to an available input of the faulty router.

C. Architecture of the Loopback Module

A loopback module is implemented in each of the four ports of the router, as illustrated in Fig. 2. The architecture of the loopback module is depicted in Fig. 6. The logic control block checks the availability of the neighboring router in order to transmit the data packets (data_request_in signal). If no loopback is required, a semi-crossbar connects the buffer to the data_out signal in order to send the data packets towards the neighboring router and activates the data_request_out signal. Next, a multiplexer connects the input data bus to the data_in bus. When a loopback is required, either because the neighboring router is unavailable or because the output block requests it after receiving three Nacks, the logic control block configures the semi-crossbar block to send the data packet on the data_loopback bus. Therefore, the data packet is looped back inside the router and will be considered as a new packet.
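The behavior of the logic control block described above can be summarized as a small decision function. The sketch below is a behavioral illustration only: the function name and the returned dictionary layout are assumptions, and only the signal names (data_out, data_loopback, data_request_out, occ_out) come from the text.

```python
NACK_LIMIT = 3  # number of Nacks after which the packet is looped back


def loopback_control(neighbor_available, nack_count):
    """Decide where the semi-crossbar sends the buffered packet.

    Returns a dict keyed by the signal names used in the text; the dict
    layout itself is an illustrative assumption.
    """
    loopback = (not neighbor_available) or nack_count >= NACK_LIMIT
    return {
        # Bus selected by the semi-crossbar for the buffered packet.
        "route": "data_loopback" if loopback else "data_out",
        # The request towards the neighbor is raised only when forwarding.
        "data_request_out": not loopback,
        # occ_out blocks new packets from the neighbor during a loopback.
        "occ_out": loopback,
    }
```

For example, `loopback_control(True, 0)` selects data_out, whereas an unavailable neighbor or a third Nack selects data_loopback and raises occ_out.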
During this step, in order to avoid the reception of a new data packet from the neighboring switch, the occ_out signal is activated. The loopback module requires one clock cycle to be crossed. Thereby, a data packet crossing a router has its latency increased by two clock cycles. Indeed, two loopback modules are crossed: one when arriving and one when leaving the switch.

V. SYNTHESIS RESULTS AND PERFORMANCE EVALUATIONS

A. FPGA Synthesis Results

The results presented are obtained considering RKT-switches configured to process data packets of four flits and able to hold two data packets in each input buffer. Table III shows the synthesis results in terms of slice registers, slice LUTs, and maximum working frequency for different data bus widths and several FPGA technologies (Xilinx Virtex V to VII FPGAs). It can be seen that the 32-b RKT-switch


TABLE III
RKT-SWITCH SYNTHESIS RESULTS

Data Bus Width (bits) | Virtex V: Slice Registers | Slice LUTs | f [MHz] | Virtex VI: Slice Registers | Slice LUTs | f [MHz] | Virtex VII: Slice Registers | Slice LUTs | f [MHz]
24 | 3503 | 4146 | 303.93 | 3537 | 5675 | 429.54 | 3540 | 5571 | 459.71
32 | 4308 | 4998 | 311.85 | 4377 | 6661 | 431.04 | 4340 | 6542 | 459.68
64 | 8402 | 8743 | 306.28 | 8360 | 11672 | 419.21 | 8362 | 11681 | 441.62

TABLE IV
RKT-NOC SYNTHESIS RESULTS

Data Bus Width (bits) | RKT-NoC 2 × 2: Slice Registers | Slice LUTs | f [MHz] | RKT-NoC 3 × 3: Slice Registers | Slice LUTs | f [MHz] | RKT-NoC 4 × 4: Slice Registers | Slice LUTs | f [MHz]
24 | 14174 | 31977 | 414.96 | 21700 | 52696 | 390.74 | 56840 | 95279 | 390.41
32 | 17208 | 25718 | 406.17 | 38572 | 57362 | 400.15 | 68352 | 104556 | 398.51
64 | 33438 | 46335 | 390.82 | 75204 | 108601 | 393.60 | 133920 | 201074 | 402.23

TABLE V
AVERAGE LATENCY EVALUATION OF 10 000 PACKETS SENT PER COMMUNICATION MODULE WITH A RANDOM TRAFFIC AND A MAXIMUM PIR

RKT-NoC Size | Average Latency | Clock Cycles (nb) | Virtex VII (ns) | Virtex VI (ns) | Virtex V (ns)
1 × 1 | min | 20.86 | 47.24 | 49.77 | 68.12
1 × 1 | max | 21.60 | 48.91 | 51.52 | 70.52
3 × 3 | min | 73.10 | 165.53 | 174.38 | 238.68
3 × 3 | max | 129.43 | 293.09 | 308.75 | 422.59
4 × 4 | min | 96.33 | 213.59 | 225.01 | 307.98
4 × 4 | max | 168.66 | 381.91 | 402.32 | 550.66
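The nanosecond columns of Table V follow from the clock-cycle column and a working frequency. The short sketch below shows the conversion; the helper name is hypothetical. The frequencies that reproduce the 1 × 1 and 3 × 3 minimum rows are the 64-b RKT-switch values of Table III, which is an observation from the numbers rather than something the text states.

```python
# Maximum frequencies of the 64-b RKT-switch from Table III, in MHz.
F_MAX_64B_MHZ = {"Virtex V": 306.28, "Virtex VI": 419.21, "Virtex VII": 441.62}


def cycles_to_ns(cycles, f_mhz):
    """Convert a latency in clock cycles to nanoseconds at f_mhz."""
    return cycles / f_mhz * 1000.0


if __name__ == "__main__":
    # Minimum average latency of the 1 x 1 RKT-NoC (20.86 cycles, Table V).
    for family, f_mhz in F_MAX_64B_MHZ.items():
        print(family, round(cycles_to_ns(20.86, f_mhz), 2), "ns")
```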

requires 4340 registers and 6542 LUTs and can operate at up to 459.68 MHz on the Virtex VII FPGA technology. We have also synthesized the RKT-NoC for several sizes on the Xilinx Virtex VI technology. These results are given in Table IV. The synthesis results show that our architecture can be efficiently implemented in FPGA technology, with an attractive tradeoff between speed and logic resources.

B. Performance Evaluation

1) Flit Injection Rate: The packet injection rate (PIR) is the number of data packets that can be sent in a single clock cycle. For instance, an IP with a PIR of 0.5 can send 50 data packets in 100 clock cycles. The flit injection rate (FIR) is the PIR multiplied by the number of flits in each data packet. We have evaluated FIRmax by simulating NoCs of different sizes: 1 × 1, 3 × 3, and 4 × 4. Each RKT-NoC is surrounded by the maximum number of communication modules: 4 modules for the 1 × 1, 12 modules for the 3 × 3, and 16 modules for the 4 × 4 NoC. FIRmax is obtained when the network has no unavailable components and each module sends data packets only to, and receives them only from, the neighbor located at the opposite side of the network. Indeed, no router congestion can occur with this traffic pattern. FIRmax has been estimated to be 0.369 for any RKT-NoC size and any number of IPs.

2) Latency: The minimum latency of one RKT-switch (LatencyRTRmin) is defined by (1). With the store-and-forward switching technique, a data packet can be sent only

when the entire packet has been received. Consequently, this technique adds a latency equivalent to the number of flits ($N_{\mathrm{flit}}$) in the data packet. The ECC, which is carried out in series, increases the latency by the number of clock cycles it requires, denoted $\mathrm{Latency}_{\mathrm{ECC}}$ in (1). The RKT-switch adds at least three further clock cycles: one for the routing logic block and two for the loopback modules (one crossed when arriving and one when leaving the router). In the rest of this paper, we use data packets of four flits and the Hamming ECC takes two clock cycles to be performed. LatencyRTRmin is then nine clock cycles.

$$\mathrm{Latency}_{\mathrm{RTRmin}} = N_{\mathrm{flit}} + \mathrm{Latency}_{\mathrm{ECC}} + 3 \tag{1}$$

In an n × n RKT-NoC, the minimal latency ($\mathrm{Latency}_{\min}$) to cross the network from source to destination is defined by (2), where $N_{\mathrm{RKT}}$ is the number of switches crossed. Equation (2) takes into account the additional clock cycles for the Ack/Nack data flow control used in our reliable switches. This data flow control technique requires a latency of two clock cycles per router crossed.

$$\mathrm{Latency}_{\min} = N_{\mathrm{RKT}} \times \mathrm{Latency}_{\mathrm{RTRmin}} + N_{\mathrm{RKT}} \times 2 \tag{2}$$
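Equations (1) and (2) are simple enough to check numerically. The following sketch encodes them directly; the function and parameter names are hypothetical, and the default constants (three fixed cycles per switch, two Ack/Nack cycles per router) are the values given in the text.

```python
def router_min_latency(n_flits, ecc_cycles, fixed_cycles=3):
    """Equation (1): minimum latency of one RKT-switch, in clock cycles."""
    return n_flits + ecc_cycles + fixed_cycles


def network_min_latency(n_routers, n_flits, ecc_cycles, ack_nack_cycles=2):
    """Equation (2): minimum latency across n_routers switches."""
    per_router = router_min_latency(n_flits, ecc_cycles)
    return n_routers * per_router + n_routers * ack_nack_cycles


if __name__ == "__main__":
    # Four-flit packets with a two-cycle Hamming ECC, as used in the paper.
    print(router_min_latency(4, 2))      # latency of one switch
    print(network_min_latency(3, 4, 2))  # a path crossing three switches
```

With four flits and a two-cycle ECC, each switch costs nine cycles, so every router crossed adds eleven cycles once the Ack/Nack overhead is included.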

The data packet latency depends on the network traffic. We have evaluated the average latency for 1 × 1, 3 × 3, and 4 × 4 RKT-NoC sizes. To evaluate the latency, we have simulated these NoCs surrounded by the maximum number of communication modules. Each module sends and receives data packets. The destinations for the data packets are generated randomly. The data packets are emitted at the maximum PIR. Table V gives the minimal and maximal average latencies for each NoC size in terms of clock cycles and time (ns),


considering the maximum working frequencies obtained for the Xilinx Virtex V to VII. For a 3 × 3 RKT-NoC, the minimum and maximum average latencies are 73 and 129 clock cycles, respectively.

3) Throughput: The maximum throughput of an RKT-NoC depends on the data bus width $n$ (in bits), the working frequency $f$, the number of IPs $N_{\mathrm{IP}}$, and the FIR. Throughputmax is given by (3), assuming that the $N_{\mathrm{IP}}$ IPs are connected to the NoC boundary.

$$\mathrm{Throughput}_{\max} = N_{\mathrm{IP}} \times n \times \mathrm{FIR}_{\max} \times f \tag{3}$$
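Equation (3) can be evaluated with the figures reported in this section: FIRmax = 0.369 and the maximum frequencies of Tables III and IV. The sketch below is only a numerical illustration; the function name is hypothetical.

```python
FIR_MAX = 0.369  # maximum flit injection rate reported in Section V-B


def max_throughput_gbps(n_ip, bus_width_bits, fir_max, f_mhz):
    """Equation (3), with f given in MHz and the result in Gbit/s."""
    bits_per_second = n_ip * bus_width_bits * fir_max * f_mhz * 1e6
    return bits_per_second / 1e9


if __name__ == "__main__":
    # One 64-b RKT-switch with four IPs at the Virtex VII frequency (Table III).
    print(round(max_throughput_gbps(4, 64, FIR_MAX, 441.62), 2))
    # A 64-b 3 x 3 RKT-NoC with 12 IPs at the Virtex VI frequency (Table IV).
    print(round(max_throughput_gbps(12, 64, FIR_MAX, 393.60), 2))
```

Both cases agree with the values quoted in the text for a single switch and for the 3 × 3 network, to the precision given there.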

Considering one RKT-switch with a 64-b data bus and four IPs connected, operating at the maximum frequency for the Virtex VII FPGA technology, the Throughputmax is 41.72 Gbit/s. Fig. 7 shows the Throughputmax of a single RKT-switch connected to four IPs for different data bus widths and operating frequencies. Fig. 8 shows the Throughputmax of several RKT-NoCs connected to the maximum number of IPs and for different data widths. For instance, the 4 × 4 RKT-NoC is surrounded by 16 IPs. These Throughputmax results are computed with the maximum working frequencies of the Virtex VI synthesis results of Table IV. It can be noticed that for a 3 × 3 RKT-NoC using 64 b, the Throughputmax is 111.5 Gbit/s.

C. Impact of the Proposed Mechanisms on the Network Performances

To evaluate the performance degradation due to the proposed mechanisms, we compared our RKT-NoC with the same NoC architecture without the loopback module, the data packet error localization, or the routing error detection mechanism. Table VI gives the implementation comparisons of the two NoCs for a Virtex VI FPGA. We can see that for a 3 × 3 NoC the overhead is 22% and 55% for slice registers and slice LUTs, respectively. The working frequency of the RKT-NoC is better than that of the nonreliable NoC, even with a greater area. This result is due to the pipelined architecture of the RKT-NoC. More precisely, the loopback module and the ECCs reduce the critical paths through the use of pipelines.

Regarding the impact of the proposed mechanisms on the NoC performance, Table VII gives the throughput, latency, and power consumption for a 6 × 6 RKT-NoC and a 6 × 6 nonreliable NoC. Both estimations have been performed by using the highest FIR for a random traffic pattern and a transposed pattern. In the transposed pattern, each IP sends data packets to the IP located at the opposite side of the NoC. We can see that the 6 × 6 RKT-NoC has 30% less throughput than the nonreliable NoC.
Regarding the data packet latency, the 6 × 6 RKT-NoC requires 57% and 44% more clock cycles to route the packets for a transposed and a random traffic pattern, respectively (see Table VII). The power consumption given in Table VII has been estimated at 100 MHz on a Virtex VI by using the Xilinx power analyzer [25]. We can see that the power consumption overhead is 41% and 40% for the transposed and random traffic patterns, respectively. It can be noticed that the chips consume more energy with a transposed traffic pattern because the throughput is higher.
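The "Difference" percentages of Table VII can be reproduced from the raw values. The sketch below suggests that each percentage is taken relative to the smaller value of the pair, so the throughput figure is relative to the RKT-NoC rather than to the nonreliable NoC. This is an observation from the numbers, not a statement made in the paper, and the helper names are hypothetical.

```python
def relative_increase(larger, smaller):
    """Percentage by which `larger` exceeds `smaller`."""
    return (larger - smaller) / smaller * 100.0


def relative_decrease(larger, smaller):
    """Percentage by which `smaller` falls below `larger`."""
    return (larger - smaller) / larger * 100.0


if __name__ == "__main__":
    # Transposed traffic, values from Table VII.
    print(relative_increase(268.19, 205.95))  # throughput, nonreliable vs RKT-NoC
    print(relative_increase(74, 47))          # latency, RKT-NoC vs nonreliable
    print(relative_increase(5.69, 4.03))      # power, RKT-NoC vs nonreliable
    # The same throughput gap measured against the nonreliable NoC.
    print(relative_decrease(268.19, 205.95))
```

Measured against the nonreliable NoC, the transposed-traffic throughput reduction comes out at about 23.2% instead of 30.22%.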

Fig. 7. Throughput of one RKT-switch for different data widths.

Fig. 8. Maximum throughput of the RKT-NoC for different 2-D mesh sizes and data widths.

VI. VALIDATION OF THE PROPOSED CONCEPTS

A. Network Load

We simulated the network load for a 4 × 4 RKT-NoC surrounded by the maximum number of communication modules. Each communication module sent 10 000 data packets by using a random traffic pattern. Fig. 9(a) shows the network load for the NoC without any errors. This result shows nonuniformly distributed traffic in the network. Most of the traffic is located on the edge of the NoC where the source and destination modules are connected. In Fig. 9(b), one input of a router has been disconnected from the network by simulating a permanent error. The faulty router is router(3, 2). The simulation result shows a reduction


TABLE VI
VIRTEX VI SYNTHESIS COMPARISONS BETWEEN THE RKT-NOC AND A NONRELIABLE NOC

NoC Size | NoC | Slice Registers | Slice LUTs | f (MHz)
1 × 1 | RKT-NoC | 8360 | 11672 | 419.21
1 × 1 | Nonreliable NoC | 6850 | 7904 | 386.68
1 × 1 | Overhead | 22.04% | 47.67% |
3 × 3 | RKT-NoC | 75204 | 108601 | 393.60
3 × 3 | Nonreliable NoC | 61632 | 69936 | 400.52
3 × 3 | Overhead | 22.02% | 55.28% |
6 × 6 | RKT-NoC | 301650 | 453945 | 402.23
6 × 6 | Nonreliable NoC | 246885 | 278346 | 392.58
6 × 6 | Overhead | 28.18% | 63% |

Fig. 10. Throughput of a 6 × 6 RKT-NoC for several SEU (bit flips) frequencies.

of the network load for the faulty router. The load of router(3, 2) decreases from 4.39% to 3.53%. The maximum increase of network load is obtained for router(2, 2), increasing from 4.78% to 7.09%. Fig. 9(c) shows the network load for a network containing one entirely faulty router. More precisely, the four input ports of router(3, 2) are disconnected from the network. The network load clearly increases, especially around the faulty switch. The largest network load increase is for router(2, 2), from 4.33% to 6.1%, which represents an increase of 40%.

These simulation results show the interest of our error detection approach. By disconnecting accurately only the faulty parts of the NoC, we maintain the network load at a level similar to that of a network without fault. Conversely, disconnecting one entire router, as happens when using only the switch-to-switch error detection mechanism, or the code-disjoint mechanism in the case of an error inside the router, alters the network load considerably, which can lead to network congestion.

B. Fault Injection Method

The following simulations have been realized in the ModelSim environment through a C-VHDL co-simulation [26]. We assume nonfaulty detection blocks. The RTL design of the NoC is modified to generate the errors. More precisely, the input buffers, output buffers, data bus, and routing logic of all the routers have a special input to activate errors. Furthermore, the position of the errors in the data is specified by using a mask input. Similarly, the routing blocks have an input to force a routing decision. For instance, we can force a routing block to always route to the North direction. These inputs are activated by a specific IP modeled in C language, which randomly generates the error positions. Each NoC router has the same probability of getting an error, and the positions of the faulty bits are random.

C. Network Robustness Against Transient Data Packet Errors

We have analyzed the behavior of the NoC for several SEU (bit flips) frequencies. To evaluate the robustness of the network, we simulated a 6 × 6 RKT-NoC connected to 24 IPs. Each IP injects 100 000 data packets at the maximum PIR by using a random traffic pattern. Each packet contains four

flits of 64 b. The SEU locations in the network are random. The throughput and the data packet latency of the network are shown in Figs. 10 and 11, respectively. The throughput is almost constant at 116.5 Gbit/s up to an SEU rate of 0.1 (one bit flip every 10 clock cycles). Beyond this value, the throughput decreases as the SEU rate increases. The latency of the data packets, shown in Fig. 11, increases linearly up to an SEU rate of 0.1. Beyond this value, the latency increases more rapidly. These results can be correlated with Fig. 12, which shows the number of data packets lost for several SEU injection rates. We can see that the number of packets lost increases from an SEU rate of 0.1. Thus, the degradation of the NoC performance is linked to the number of packets lost. Indeed, with uncorrectable errors, the retransmissions due to the Ack/Nack data flow control increase the data packet latency and decrease the throughput. It can be seen that for an SEU rate of 0.2, only 81 data packets have been lost during the transmission of 2 400 000 data packets.

D. Evaluation of the Data Packets Error Localization Capacity

We propose an analysis of the capacity to locate the error sources regarding permanent data packet errors. This localization capacity is given for a 6 × 6 RKT-NoC using random traffic. For this analysis, 3000 simulation cases have been performed. In each simulation, the position of the permanent errors is random and the errors simulated are 2 b stuck at 1. The results are given in Table VIII. In this table, we can see that all the permanent errors are accurately localized. When the permanent errors are located on the data bus, the faulty data bus is always localized. Moreover, when localizing the data error sources on the data bus, no data packets are lost thanks to the loopback module. When the error sources are in the input or output buffer, the faulty port is accurately disconnected in 100% of the cases.
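The two fault models used in these experiments, transient bit flips (SEUs) and 2-b stuck-at-1 permanent faults, can both be expressed as a mask applied to a flit, in the spirit of the mask input described in Section VI-B. The Python sketch below is an assumption about how such masks could be modeled in software; it is not the C-VHDL co-simulation code of the paper, and all function names are hypothetical.

```python
import random


def random_mask(width, n_bits, rng):
    """Build a mask with n_bits distinct random positions set (assumed model)."""
    mask = 0
    for position in rng.sample(range(width), n_bits):
        mask |= 1 << position
    return mask


def flip_bits(flit, mask):
    """Transient SEU model: invert the masked bits (XOR)."""
    return flit ^ mask


def stuck_at_one(flit, mask):
    """Permanent fault model: force the masked bits to 1 (OR)."""
    return flit | mask


if __name__ == "__main__":
    rng = random.Random(0)
    mask = random_mask(64, 2, rng)  # two faulty bits in a 64-b flit
    flit = 0x0123456789ABCDEF
    print(hex(flip_bits(flit, mask)), hex(stuck_at_one(flit, mask)))
```

A stuck-at-1 fault corrupts only flits whose masked bits were 0, so some packets cross a faulty block unharmed. This is one reason why consecutive errors, rather than a single error, are used to declare a fault permanent.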
The impact of the threshold (number of uncorrectable errors detected before activating the flags of permanent errors) of the journals in the local historic is shown in these results. The higher the threshold, the stronger the discrimination criterion used to determine whether an


TABLE VII
PERFORMANCE COMPARISONS BETWEEN A 6 × 6 RKT-NOC AND A 6 × 6 NONRELIABLE NOC FOR VIRTEX VI TECHNOLOGY

NoC | Throughput (Gbit/s): Transposed Traffic | Random Traffic | Average Latency (Clock Cycles): Transposed Traffic | Random Traffic | Power Consumption (W): Transposed Traffic | Random Traffic
RKT-NoC | 205.95 | 117.49 | 74 | 173.833 | 5.69 | 5.57
Nonreliable NoC | 268.19 | 154.24 | 47 | 120.46 | 4.03 | 3.97
Difference | 30.22% | 31.27% | 57.44% | 44.30% | 41.19% | 40.30%

Fig. 9. Traffic distribution in a 4 × 4 RKT-NoC connected to 12 traffic generators with a random traffic pattern. (a) No faulty router in the NoC. (b) West input port of router(3, 2) is unavailable. (c) Four ports of router(3, 2) are unavailable.

error is in the input or in the output buffer of a router. Indeed, by increasing the threshold, the probability of testing more paths inside a router increases. We can see that, with a low threshold of 2, in 82% of cases where error sources are in an input buffer, both one input and one output are disconnected. If the threshold is 5, for the same case, both one input and one output are disconnected in only 46.62% of cases. We can see that the more the threshold increases, the better the localization. However, the gain is large between thresholds of 2 and 3, and smaller between thresholds of 3 and 4. Moreover, with a higher threshold, more data packets are lost before the router takes a decision. The choice of the threshold is therefore important: when it is too low, the network can waste resources by disconnecting nonfaulty blocks; when it is too high, time is wasted before faulty parts of routers are localized.

Regarding the impact of the threshold on the logic resources, Table VIII gives the FPGA slice registers and LUTs required for these blocks. It can be noticed that the area of the local historic increases with the threshold. However, this block requires a small area. For instance, only 32 registers and 56 LUTs are required for a threshold of 3. We can conclude from this analysis that a threshold of 3 is a good compromise between the capacity to locate the sources of errors and the number of data packets required to activate the unavailable indicators.

We can see that the number of cases where both one input and one output are disconnected is higher for an error located in an input than for an error located in an output. Indeed, by using adaptive XY algorithms, when a North or South input receives a data packet, this means the packet is in the column of the destination and will always be sent along the y-axis.
Hence, when error sources are located in the input North or

Fig. 11. Average latency of data packets in a 6 × 6 RKT-NoC for several SEU (bit flips) frequencies.

South, the mechanism will disconnect both one input and one output. This is true only for the XY algorithm without bypass, and for the routers not connected to an IP.

Regarding the capacity of the proposed RKT-NoC to locate errors compared with the switch-to-switch and the code-disjoint mechanisms, Table IX gives a comparison of these three data packet error source localization mechanisms. The main advantage of the proposed mechanism is that, even if the error sources are inside a switch, only the faulty port is disconnected, and in the worst case just one input and one output port are disconnected. In contrast, for the code-disjoint and the switch-to-switch mechanisms, when an error source is located inside a router, all the ports of the router are disconnected. Moreover, compared with the code-disjoint mechanism, when an error source is on a data bus, the RKT-


TABLE X
ROUTING ERROR DETECTION RATES FOR A 6 × 6 RKT-NOC FOR ONE, TWO, AND THREE PERMANENT ERRORS

Number of Permanent Routing Errors in the Network | Routing Error Detection Rate
1 error | 83.5%
2 errors | 82.5%
3 errors | 81.1%

Fig. 12. Number of data packets lost in a 6 × 6 RKT-NoC for several SEU (bit flips) frequencies.

NoC can locate precisely the faulty bus and the loopback mechanism ensures that the data packet is not lost.

E. Evaluation of the Routing Error Detection Rate

To evaluate the capacity to detect and locate routing errors, we have performed several simulation cases. We simulated a 6 × 6 RKT-NoC surrounded by 24 communication modules. We used a random traffic pattern. The routing algorithm used for this simulation is the S-XY [20]. This adaptive algorithm is a straightforward solution for a dynamic mesh NoC. The packets are first routed along the x-axis and, when reaching the destination column, along the y-axis. If an obstacle is reached in the XY path, a bypass decision is performed locally in the direction of the destination, or in a default direction if the destination is in the same column or line.

We simulated an NoC with one, two, and three permanent errors and each case was simulated 1000 times. The routing faults simulated are permanent and model routing logic blocks that always send data packets in the same direction. For example, a faulty North routing logic block may always send packets to the East direction. The faults are randomly injected at the beginning of each simulation and can be in any routing logic block of the 36 RKT-switches. For each simulation, each communication module sends a maximum of 2000 data packets. If all the errors are detected, each communication module sends 2000 data packets again, in order to check that the routing error detection blocks do not detect a bypass as a routing fault. An error is located when the unavailable flag of the input port of the faulty routing logic block is activated.

Table X gives the routing error detection rates for each simulation case. It can be seen that for one permanent error, the proposed solution can locate precisely the error in 83.3% of cases. Among the undetected errors, 13 percentage points are errors that did not modify the routing decisions.
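The S-XY rule summarized above can be sketched as a routing function. The code below is a simplified reading of that one-paragraph description, not the algorithm of [20]: the coordinate convention (x grows towards the East, y towards the North), the default bypass directions, and the function name are all assumptions, and the sketch does not handle the case where the bypass direction is itself blocked.

```python
def sxy_route(current, destination, blocked=frozenset(),
              default_x="E", default_y="N"):
    """Return the output direction chosen at `current` for `destination`.

    Assumed convention: positions are (x, y); East is +x and North is +y.
    `blocked` holds the output directions that are unavailable locally.
    """
    (cx, cy), (dx, dy) = current, destination
    if (cx, cy) == (dx, dy):
        return "LOCAL"
    x_dir = "E" if dx > cx else "W" if dx < cx else None
    y_dir = "N" if dy > cy else "S" if dy < cy else None
    # XY order: route along x first, then along y in the destination column.
    preferred = x_dir or y_dir
    if preferred not in blocked:
        return preferred
    if preferred == x_dir:
        # Obstacle on the x path: bypass along y, towards the destination
        # if possible, otherwise in the default direction (same row).
        return y_dir or default_y
    # Obstacle on the y path (same column): bypass in the default x direction.
    return default_x
```

For instance, `sxy_route((1, 1), (3, 2))` returns "E", while the same call with `blocked={"E"}` bypasses towards "N". Under this rule, a packet that enters a router through its North or South input is already in the destination column when no bypass has occurred.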
These undetected errors occur when a faulty North or South routing block receives a data packet and sends it in the opposite direction (to the North for a South routing logic, and to the South for a North routing logic). Indeed, in the S-XY, the routers first route along the x-axis. Thus, when a router receives

a data packet in its North routing block, this means that the packet has reached the destination column. In this case, sending the data packet along the y-axis matches the routing algorithm and therefore is not detected as a fault. By considering these nondetections as normal operation, the routing error detection rate is 96%.

The routing error detection rates are 82.5% and 81.1% for two and three permanent errors, respectively. The error detection rates decrease as the number of faults increases. Indeed, the accumulation of errors can prevent the detection of specific error cases. More precisely, if two errors are on the same NoC row and on the same side of the two neighboring routers, the detection of one will prevent the detection of the second. For instance, as illustrated in Fig. 13, a permanent error has been detected in the West port of router(2, 2). The bypass of this faulty port is made towards the North or South direction of router(1, 2). The West routing logic of router(3, 2) will not be used again due to the rules of the S-XY routing algorithm. Then, if there are errors in the West port of this second router, they will not be detected. This lack of detection is not problematic because this faulty routing logic is not used. We can conclude that the decrease of the routing error detection rate is not problematic because the undetected faults are in parts of the NoC that are not used (in the case of the S-XY routing algorithm), and the detection of real errors is close to 96%.

VII. LIMITATIONS AND EXTENSIBILITY

We assumed nonfaulty correction/detection blocks and DAI throughout the presented simulations and estimations. If this hypothesis is false, the faults can generate two error cases. The first disables the detection capacity of the block, which cannot detect any errors. The second is the generation of false detections. More precisely, a detection block generating false detections finds an error even when there is none.
For both cases, if the errors are transient, their effects will not affect the NoC. Indeed, the routers check journals in which an error needs to occur three consecutive times before a faulty part is disabled. Regarding the permanent faults that disable error detection, the data packet error detection relies on the subsidiarity of the NoC elements. More precisely, if an ECC does not detect an error in a data packet, this faulty packet is sent to a neighbor. This neighbor will detect the error and generate a Nack. After three Nacks, the data packet is looped back to the router that did not detect the error. After the loopback, the ECC and the local historic will locate the port in which the error was not detected and the router will isolate the input port including this faulty block. The permanent faults that disable error detection (fault in the detection


TABLE VIII
CAPACITY TO LOCATE THE ERROR SOURCES FOR DIFFERENT THRESHOLD SIZES OF THE LOCAL HISTORIC AND THE VIRTEX VI SYNTHESIS RESULTS OF THIS BLOCK

Permanent error localization rate (in parentheses: percentage of cases having both one input and one output disconnected)

Threshold Between Permanent and Transient Errors | Error Source in Input Buffer | Error Source in Output Buffer | Error Source on Data Bus | Slice Registers | Slice LUTs
2 | 100% (82%) | 100% (53.5%) | 100% (0%) | 24 | 36
3 | 100% (57%) | 100% (43.21%) | 100% (0%) | 32 | 56
4 | 100% (49.7%) | 100% (30.65%) | 100% (0%) | 40 | 68
5 | 100% (46.62%) | 100% (21.02%) | 100% (0%) | 48 | 74

TABLE IX
COMPARISON BETWEEN THE RKT-NOC, SWITCH-TO-SWITCH, AND CODE-DISJOINT MECHANISMS OF PERMANENT DATA ERROR SOURCE LOCALIZATION

Mechanism | Error Source in Input Buffer | Error Source in Output Buffer | Error Source on Data Bus
Switch-to-switch | All the ports are disconnected | All the ports are disconnected | All the ports are disconnected
Code-disjoint | All the ports are disconnected | All the ports are disconnected | The faulty bus is disconnected and the packet is lost
RKT-NoC | Only the faulty input is disconnected (57% of cases one input and one output) | Only the faulty output is disconnected (43% of cases one input and one output) | The faulty bus is disconnected and the packet is not lost by using the loopback
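The diminishing return of raising the threshold can be read directly from the input-buffer column of Table VIII. The sketch below computes, for each step of the threshold, the drop in the percentage of cases where both one input and one output are disconnected; the variable and function names are hypothetical and the data are copied from the table.

```python
# Percentage of cases with both one input and one output disconnected,
# error source in an input buffer (Table VIII), indexed by threshold.
BOTH_DISCONNECTED_PCT = {2: 82.0, 3: 57.0, 4: 49.7, 5: 46.62}


def gains(table):
    """Return the reduction, in percentage points, for each threshold step."""
    thresholds = sorted(table)
    return {
        (low, high): round(table[low] - table[high], 2)
        for low, high in zip(thresholds, thresholds[1:])
    }


if __name__ == "__main__":
    for step, gain in gains(BOTH_DISCONNECTED_PCT).items():
        print(step, gain)
```

The step from 2 to 3 removes 25 percentage points, whereas the steps from 3 to 4 and from 4 to 5 remove about 7.3 and 3.1 points, which supports the choice of 3 as a compromise.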

block or the DAI) in routing error detection cannot be detected in the NoC. However, standard fault tolerance solutions can be applied to the routing error detection blocks and the DAI, such as the duplication of the blocks/links [27]. Regarding the permanent false detections, because the presented mechanism can localize accurately the error sources, these error cases will only disconnect small parts of healthy routers, which does not critically affect the NoC, as shown in the simulations in Section VI-A.

The presented error detection mechanisms have been detailed for a router based on the store-and-forward switching technique. However, our mechanisms are also suitable for virtual cut-through and wormhole routing. Indeed, the routing error detection as presented in the RKT-NoC can be used with these switching policies without any modification. Regarding the data packet error detection based on the use of the loopback module, routers using wormhole routing need some modification. More precisely, with wormhole techniques, a data packet can be spread over several routers. Each output block receiving a header flit needs to store its routing information locally. When a flit fails to be transmitted to a neighbor (i.e., the router receives three Nacks), the packet needs to be looped back. A header flit is generated locally with the information saved in the output buffer, and the flits that were not transmitted to the neighbor are looped back and analyzed by the ECC located in the input port, as detailed in Section IV-B. The flits already transmitted to the neighbor continue to be routed towards the destination. Thus, this destination will receive the data packet in two parts, each having the same header flit, which will permit the reconstitution of the data packet.

Fig. 13. Case of undetectable errors due to a bypass for the S-XY routing algorithm.

VIII. CONCLUSION

In this paper, we proposed new error detection mechanisms for dynamic NoCs. The proposed routing error detection mechanisms allow the accurate localization of permanently faulty routing blocks in the network. They are suitable for adaptive routing algorithms based on XY, where the main difficulty is to distinguish the bypasses of an unavailable component in the NoC (due to the use of the adaptive algorithm) from real routing errors (due to faulty components in the NoC). Validation simulations of our proposed routing error detection showed a routing error localization rate close to 96% for an adaptive algorithm based on XY in a 6 × 6 NoC. Regarding the proposed data packet error localization mechanisms, the simulations presented in this paper show the efficiency of our techniques, which can localize permanent sources of errors more accurately than the switch-to-switch or code-disjoint mechanisms. Moreover, both presented techniques can distinguish permanent and transient errors, and show attractive performance as presented in the FPGA


synthesis comparisons with a nonreliable NoC. Our ongoing work focuses on evaluating accurately the impact of faulty detection blocks and improving the routing error detection mechanisms, by protecting the DAI links and routing detection blocks against errors. R EFERENCES
[1] K. Sekar, K. Lahiri, A. Raghunathan, and S. Dey, "Dynamically configurable bus topologies for high-performance on-chip communication," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 10, pp. 1413–1426, Oct. 2008.
[2] J. Shen and P. Hsiung, Dynamic Reconfigurable Network-on-Chip Design: Innovations for Computational Processing and Communication, J. Shen and P. Hsiung, Eds. Hershey, PA, USA: IGI Global, 2010.
[3] G.-M. Chiu, "The odd-even turn model for adaptive routing," IEEE Trans. Parallel Distrib. Syst., vol. 11, no. 7, pp. 729–738, Jul. 2000.
[4] Y. M. Boura and C. R. Das, "Efficient fully adaptive wormhole routing in n-dimensional meshes," in Proc. 14th Int. Conf. Distrib. Comput. Syst., Jun. 1994, pp. 589–596.
[5] C. Bobda, A. Ahmadinia, M. Majer, J. Teich, S. Fekete, and J. van der Veen, "DyNoC: A dynamic infrastructure for communication in dynamically reconfigurable devices," in Proc. Int. Conf. Field Program. Logic Appl., Aug. 2005, pp. 153–158.
[6] T. Pionteck, R. Koch, and C. Albrecht, "Applying partial reconfiguration to networks-on-chip," in Proc. Field Program. Logic Appl. Int. Conf., Aug. 2006, pp. 1–6.
[7] S. Jovanovic, C. Tanougast, and S. Weber, "A new high-performance scalable dynamic interconnection for FPGA-based reconfigurable systems," in Proc. Int. Conf. Appl.-Specific Syst., Archit. Process., Jul. 2008, pp. 61–66.
[8] S. Jovanovic, C. Tanougast, C. Bobda, and S. Weber, "CuNoC: A dynamic scalable communication structure for dynamically reconfigurable FPGAs," Microprocess. Microsyst., vol. 33, no. 1, pp. 24–36, Feb. 2009.
[9] P. Lysaght and J. Dunlop, "Dynamic reconfiguration of FPGAs," in Proc. Int. Workshop Field Program. Logic Appl. More FPGAs, 1994, pp. 82–94.
[10] J. Wu, "A fault-tolerant and deadlock-free routing protocol in 2D meshes based on odd-even turn model," IEEE Trans. Comput., vol. 52, no. 9, pp. 1154–1169, Sep. 2003.
[11] D. Park, C. Nicopoulos, J. Kim, N. Vijaykrishnan, and C. Das, "Exploring fault-tolerant network-on-chip architectures," in Proc. Int. Conf. Depend. Syst. Netw., Jun. 2006, pp. 93–104.
[12] S. Jovanovic, C. Tanougast, S. Weber, and C. Bobda, "A new deadlock-free fault-tolerant routing algorithm for NoC interconnections," in Proc. Int. Conf. Field Program. Logic Appl., Aug.–Sep. 2009, pp. 326–331.
[13] D. Fick, A. DeOrio, G. Chen, V. Bertacco, D. Sylvester, and D. Blaauw, "A highly resilient routing algorithm for fault-tolerant NoCs," in Proc. Design, Autom. Test. Eur. Conf. Exhibit., Apr. 2009, pp. 21–26.
[14] W. Dally and C. Seitz, "Deadlock-free message routing in multiprocessor interconnection networks," IEEE Trans. Comput., vol. C-36, no. 5, pp. 547–553, May 1987.
[15] M. Hosseinabady, M. Kakoee, J. Mathew, and D. Pradhan, "Low latency and energy efficient scalable architecture for massive NoCs using generalized de Bruijn graph," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 8, pp. 1469–1480, Aug. 2011.
[16] C. Grecu, L. Anghel, P. Pande, A. Ivanov, and R. Saleh, "Essential fault-tolerance metrics for NoC infrastructures," in Proc. Int. On-Line Test. Symp., 2007, pp. 37–42.
[17] A. P. Frantz, L. Carro, E. Cota, and F. L. Kastensmidt, "Evaluating SEU and crosstalk effects in network-on-chip routers," in Proc. 12th IEEE Int. Symp. On-Line Test., Jul. 2006, pp. 191–192.
[18] C. Grecu, A. Ivanov, R. Saleh, E. Sogomonyan, and P. Pande, "On-line fault detection and location for NoC interconnects," in Proc. 12th IEEE Int. On-Line Test. Symp., Jul. 2006, pp. 145–150.
[19] N. Karimi, A. Alaghi, M. Sedghi, and Z. Navabi, "Online network-on-chip switch fault detection and diagnosis using functional switch faults," J. Universal Comput. Sci., vol. 14, no. 22, pp. 3716–3736, 2008.
[20] M. Majer, C. Bobda, A. Ahmadinia, and J. Teich, "Packet routing in dynamically changing networks on chip," in Proc. 19th IEEE Int. Parallel Distrib. Process. Symp., Apr. 2005, p. 154b.

[21] J. Raik, V. Govind, and R. Ubar, "An external test approach for network-on-a-chip switches," in Proc. 15th Asian Test Symp., Nov. 2006, pp. 437–442.
[22] A. Pullini, A. Federico, D. Bertozzi, and L. Benini, "Fault tolerance overhead in network-on-chip flow control schemes," in Proc. 18th Symp. Integr. Circuits Syst. Design Conf., Sep. 2005, pp. 224–229.
[23] A. Ejlali, B. Al-Hashimi, P. Rosinger, S. Miremadi, and L. Benini, "Performability/energy tradeoff in error-control schemes for on-chip networks," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 1, pp. 1–14, Jan. 2010.
[24] Early Access Partial Reconfiguration User Guide, Xilinx Inc., San Jose, CA, USA, 2008.
[25] Xilinx Power Tools Tutorial, Xilinx Inc., San Jose, CA, USA, 2011.
[26] C. Killian, C. Tanougast, S. Jovanovic, F. Monteiro, C. Diou, and A. Dandache, "Modeling and behavioral co-simulation C-VHDL of network on chip on FPGA for education," in Proc. Reconfig. Commun. Centric SoCs, May 2010, pp. 135–139.
[27] J. Han, "Toward hardware-redundant, fault-tolerant logic for nanoelectronics," IEEE Design Test Comput., vol. 22, no. 4, pp. 328–339, Jul.–Aug. 2005.

Cédric Killian received the M.S. degree in electronic systems from the University of Metz, Metz, France, in 2009, and the Ph.D. degree in electronic systems from the University of Lorraine, Metz, in 2012. He is currently a Researcher with LICM. His current research interests include field-programmable gate array architecture design, adaptive network-on-chip, reliability, and VLSI design.

Camel Tanougast received the Ph.D. degree in microelectronic and electronic instrumentation from the Henri Poincaré University of Nancy, Nancy, France, in 2001, and the Habilitation degree from the University of Metz, Metz, France, in 2009. He is currently an Associate Professor in electronics with the University of Lorraine, Metz. He joined the Microelectronic and Sensor Interface Laboratory of Metz (LICM) in 2008. He is the Head of Research on networked adaptive and self-organized systems. He has authored and co-authored more than 70 publications. His current research interests include reconfigurable systems and NoCs, design and implementation of real-time processing architectures, system-on-chip development, FPGA design, computer vision, image processing, cryptography, and digital television broadcasting.

Fabrice Monteiro (M'96) received the Ph.D. degree in microelectronics from the University of Montpellier, France, in 1992. He has been an Associate Professor and Professor with the University of Lorraine, France (formerly University of Metz), since 1994. He has been involved in several academic and industrial research projects targeting interdisciplinary topics in the areas of telecommunications, mechatronics, and sensor networks. His current research interests include fault-tolerant digital circuits and systems, coding theory and related high-throughput circuitry for communication applications, and new MPSoC and NoC paradigms.

Abbas Dandache received Ph.D. degrees in computer science and in microelectronics from the INPG, Grenoble, in 1983 and 1986, respectively, and the Habilitation degree from the University of Metz, Metz, France, in 2000. He has been a Professor with the University of Lorraine, Metz, since 2001. Since 2007, he has been the Director of the LICM Laboratory. He has been a Principal Investigator of several research contracts from the French telecommunications industry. He has authored and co-authored several publications in the domain. His current research and teaching interests include electronics design related to embedded systems and smart sensors. Keywords of his activity include high-performance communicating circuitry, fault-tolerant and dependable computing, and error coding circuitry.
