
Model Checking a SystemC/TLM Design of the AMBA AHB Protocol

Marcel Pockrandt, Technische Universität Berlin, Berlin, Germany, marcel.pockrandt@tu-berlin.de
Paula Herber, International Computer Science Institute, Berkeley, California, USA, herber@icsi.berkeley.edu
Sabine Glesner, Technische Universität Berlin, Berlin, Germany, sabine.glesner@tu-berlin.de

Abstract: Transaction Level Modeling (TLM) is gaining more and more importance to quickly evaluate design alternatives in multimedia systems and other mixed HW/SW systems. However, the comprehensive and automated verification of TLM models is still a difficult challenge. In previous work, we presented an approach for model checking of SystemC/TLM designs based on a transformation into UPPAAL timed automata. In this paper, we present an optimized version of our previously proposed transformation, and show its effectiveness with experimental results from an industrial case study. The key idea is to generate a UPPAAL model that is especially tailored for being model checked. This significantly reduces the semantic state space and makes model checking considerably faster and less memory-consuming. We demonstrate this by comparing the verification times of both versions for our previously used case study, and by presenting results from a new and larger case study, namely a TLM implementation of the AMBA Advanced High-performance Bus (AHB). The AMBA bus is one of the most popular on-chip bus architectures in IP-based embedded SoCs, and it is used in many multimedia applications. The case study shows that with the proposed optimizations, our approach is applicable for industrial real-world examples. The detection of a serious bug, namely a deadlock situation in a certain scenario, and also the verification of some important safety, liveness, and timing properties provide evidence for the usefulness of our approach.

I. INTRODUCTION

SystemC/TLM [11], [19] is widely used for modeling and simulation in HW/SW co-design. Its main advantages lie in early platform development, fast simulation and evaluation of different design alternatives, and in its flexible integration with pure hardware description languages such as VHDL or Verilog. Transaction level models have the highest relevance for design-space exploration and rapid prototyping: they provide enough detail for architecture evaluation but are much faster to simulate than bit- and cycle-accurate models.

However, while TLM models are well-suited for architecture evaluation and rapid prototyping, they are difficult to verify. In particular, formal verification and automated testing are typically not supported by TLM design frameworks. For example, for SystemC/TLM, there exist a powerful simulation environment and many tools to ease development with graphical interfaces and static analysis tools, but not even the semantics of SystemC/TLM is defined formally in [11], [19]. There exist some approaches to formalize the semantics of SystemC, e.g., [17], [5], [7], [23], [15], but they are mostly either limited to the synthesizable subset of SystemC, or they require a tedious manual formalization of a given design. In contrast to purely synchronous hardware design languages, SystemC/TLM supports concurrent processes, dynamic sensitivity and timing, and abstract communication. State-of-the-art formal hardware verification techniques are thus not applicable either.

In previous work, we have presented an approach to overcome these problems by formalizing the semantics of SystemC/TLM with the help of UPPAAL timed automata [9], [10]. UPPAAL timed automata [2] have the advantage that their semantics is formally well-defined, and that they come with the UPPAAL tool suite. The UPPAAL tool suite provides means to animate and simulate timed automata models, and, most importantly for us, also a model checker that enables the fully automatic verification of safety, liveness, and timing properties. In [9], we presented an approach to automatically transform a given SystemC design into a UPPAAL timed automata model, which immediately enables the application of the UPPAAL model checker. This approach was extended in [10] for the TLM 2.0 standard. We showed the applicability of this approach with two small case studies, namely a loosely-timed model that uses a blocking transport and an approximately-timed model that uses a 4-phase non-blocking transport.

In this paper, we present a novel and highly optimized version of our transformation from SystemC/TLM into UPPAAL timed automata. In contrast to our previously proposed approach, we do not aim at a transformation which is as close as possible to the implementation of the TLM core interfaces. Instead, we focus on efficiency issues, with the general aim of enhancing the scalability of model checking of the resulting timed automata model.
The main challenge when formalizing the semantics of the TLM core interfaces is the payload event queue (PEQ), which is used to maintain a queue of SystemC event notifications, where each notification is associated with a transaction object. This mechanism is used for non-blocking communications, where the communication method immediately returns but a further action on the transaction object is typically scheduled for later execution. The postponed execution of actions on transaction objects is difficult to model in the semantics of UPPAAL timed automata if there are multiple concurrent notifications. In [10], we presented a formalization of the PEQ using four different timed automata, which is consistent with the SystemC/TLM reference implementation that uses as many methods and processes. However, in this paper, we present an alternative formalization that only requires two automata.

978-1-4577-2122-9/11/$26.00 © 2011 IEEE

ESTIMedia 2011

The most important optimization of our new approach is that we model the PEQ in a more abstract way. With that, we reduce the number of clocks and variables, and consequently reduce the size of each semantic state and the overall semantic state space significantly. Additionally, we use a simple live variable analysis to temporarily reset all unused variables in order to make it possible for the model checker to detect and use symmetries in the model.

To show the effect of our optimizations, we present experimental results for the approximately-timed model that uses a 4-phase non-blocking transport, which we also used in [10], and compare the verification times. To demonstrate the practical applicability of our approach, we applied it to a new and significantly larger design, namely the TLM 2.0 solution for the AMBA AHB released by Carbon Design Systems¹ in February 2011.

The rest of this paper is structured as follows: In Section II, we summarize related work. In Section III, we briefly introduce SystemC and UPPAAL timed automata. In Section IV, we review our formal semantics for SystemC/TLM as presented in [9], [10]. In Section V, we describe our optimized version for the formalization of the TLM 2.0 standard core interfaces. Finally, we present experimental results in Section VI and conclude in Section VII.

II. RELATED WORK

There have been several approaches to provide a formal semantics for SystemC.
For example, a definition of the simulation semantics based on abstract state machines is given by Müller et al. [17] and Ruf [20]. The purpose of their work is to provide a precise description of the SystemC scheduler. However, the system design itself, as built from modules, processes and channels, is not covered and therefore cannot be verified with this approach. Salem [21] presented a denotational semantics for the SystemC scheduler and for SystemC processes, but only for a synchronous subset. Similarly, Große et al. [5] present an approach for formal verification of SystemC designs using Binary Decision Diagrams (BDDs) and bounded model checking, but only for the synthesizable subset. In contrast to our approach, they are not able to cope with dynamic sensitivity or timing.

Habibi et al. [8], [7] proposed program transformations from SystemC into equivalent state machines. In these approaches, time is ignored, and the transformation is performed manually. Besides, the state machine models do not reflect the structure of the underlying SystemC designs.

¹ http://www.carbonipexchange.com/

Traulsen et al. [22] proposed a mapping from SystemC to PROMELA, but they only handle SystemC designs at an abstract level, do not model the non-deterministic scheduler and cannot cope with primitive channels. Zhang et al. [23] introduced the formalism of SystemC waiting-state automata. Those SystemC waiting-state automata are supposed to allow a formal representation of SystemC designs at the delta-cycle level. However, the approach is limited to the modeling of delta-cycles, the scheduler and complex interactions between processes are not considered, and the formal model has to be specified manually.

In [14], Man presented the formal language SystemCFL, which is based on process algebras and defines the semantics of SystemC processes by means of structural operational semantics style deduction rules. SystemCFL does not take dynamic sensitivity into account, and considers only simple communications. The concept of channels is neglected. A tool to automatically transform SystemC to SystemCFL is presented by Man in [15]. However, it does not handle any kind of interaction between processes. Karlsson et al. [12] verify SystemC designs using a Petri-net based representation. This introduces a huge overhead because interactions between subnets can only be modeled by introducing additional subnets.

To the best of our knowledge, only a few approaches directly target SystemC-TLM designs. In [16], a toolbox for the analysis of transactional SystemC designs is proposed, which is based on a transformation from SystemC to heterogeneous parallel input/output machines (HPIOM). The approach is similar to ours in that it also provides an executable formal semantics and an automatic transformation. However, time is not explicitly considered there.
In particular, the timing behavior is heavily over-approximated, which makes the models difficult to verify. Furthermore, certain aspects of the SystemC semantics are disregarded, for example, the overriding of pending notifications. They do not support the TLM 2.0 standard, but focus on STMicroelectronics' TAC implementation (Transaction Accurate Communication Channel).

In [18], the authors propose to transform SystemC-TLM models into communicating state machines. However, they target the TLM concepts on an abstract level and do not capture the precise semantics of the TLM transport mechanisms or sockets, and the transformation can only be performed manually. In [4], the authors propose a translation from SystemC/TLM into LOTOS, and they use the verification toolbox CADP to import C code into the LOTOS model. This approach is very expressive and captures a large share of SystemC and C++ constructs. However, the translation has to be done manually and they also do not support the TLM 2.0 standard. In [6], bounded model checking is used on untimed SystemC designs. Though this approach works on the abstraction level of TLM models, it can neither handle the OSCI TLM 2.0 standard nor any timed SystemC construct.

Recently, Bombieri et al. [3] presented an approach for model checking TLM 2.0 IPs by synthesizing RTL IP models from them and applying RTL model checkers to the model. An important advantage of this approach is that they can verify whether existing TLM assertions hold in the synthesized IP. Also, the separation between protocol and functionality presented there is very interesting. However, as they rely on hardware synthesis to construct the IP from the TLM model, the whole model checking approach is restricted to the synthesizable subset of SystemC.

III. PRELIMINARIES

A. SystemC

SystemC [11] is a system level design language and a framework for HW/SW co-simulation. It allows modeling and executing of both hardware and software on various levels of abstraction. The design flow usually starts with approximately timed transaction-level models that are refined to time-accurate models of hardware and software components. It is implemented as a C++ class library, which provides the language elements for the description of both hardware and software, and an event-driven simulation kernel. A SystemC design is a set of communicating processes, triggered by events and interacting through channels. Modules and channels are used to represent structural information. SystemC also introduces an integer-valued time model with arbitrary time resolution.

The execution of a SystemC design is controlled by the SystemC scheduler. It controls the simulation time and the execution of processes, handles event notifications, and updates primitive channels. Like typical hardware description languages, SystemC supports the notion of delta-cycles, which impose a partial order on parallel processes. The execution order of these processes is chosen non-deterministically.

B. The TLM Standard

Transaction Level Modeling (TLM) is mainly used for early platform evaluation, performance analysis, and fast simulation of HW/SW systems. The general idea is to use transactions as an abstraction for bit-accurate hardware data types, which are transmitted between different modules by abstract function calls rather than pin- and cycle-accurate bus protocols. This enables simulations on different abstraction levels, trading off accuracy and simulation speed.

The main goal of the TLM standard [19] is to provide interoperability between different transaction level models. The core of the TLM standard is the interoperability layer, which comprises the TLM core interfaces, sockets, a generic payload and a base protocol.

1) The core interfaces implement standard blocking and non-blocking transport mechanisms.
2) Sockets are used to connect initiator and target modules.
3) The generic payload can be used to represent arbitrary transaction objects.
4) The base protocol is a set of rules on how to use the TLM core interfaces to achieve maximal interoperability.

TLM models often use one of the following two coding styles: Loosely-timed models are typically expected to use the blocking transport interface and temporal decoupling. Approximately-timed models are more accurate, and they are typically expected to use the non-blocking transport interface and the payload event queues, which make it possible that the processing of a transaction is postponed for later execution.

C. UPPAAL Timed Automata

Timed automata [1] are finite-state machines extended by clocks. A timed automaton is a set of locations connected by directed edges. Two types of clock constraints are used to model time-dependent behavior: Invariants are assigned to locations and enforce progress by restricting the time the automaton can stay in this location. Guards are assigned to edges and enable progress only if they evaluate to true. Networks of timed automata are used to model concurrent processes, which are executed with an interleaving semantics and synchronize on channels.

UPPAAL [2] is a tool suite for modeling, simulation, and verification of networks of timed automata. The UPPAAL modeling language extends timed automata by bounded integer variables, a template mechanism, binary and broadcast channels, and urgent and committed locations. Bounded integer variables are manipulated with a C-like action language. It is possible to declare local or global variables. Global variables can be used to pass values between processes in a network of timed automata. The UPPAAL template mechanism can be used to instantiate timed automata with different variables.
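The interplay of invariants and guards introduced above can be illustrated with a minimal Python sketch. It is not UPPAAL code and not part of the transformation: the class name `TimedLocation` and its methods are hypothetical, and time is advanced in integer steps for simplicity, whereas UPPAAL clocks are real-valued.

```python
from dataclasses import dataclass


@dataclass
class TimedLocation:
    """One location with a clock x, an invariant x <= maxtime on the
    location, and a guard x >= mintime on its single outgoing edge."""
    mintime: int
    maxtime: int
    x: int = 0  # clock value; reset to zero when the location is entered

    def can_delay(self, d: int) -> bool:
        # Time may only pass while the invariant keeps holding.
        return self.x + d <= self.maxtime

    def delay(self, d: int) -> None:
        if not self.can_delay(d):
            raise ValueError("invariant x <= maxtime would be violated")
        self.x += d

    def can_fire(self) -> bool:
        # The outgoing edge is enabled only if its guard evaluates to true.
        return self.x >= self.mintime


loc = TimedLocation(mintime=3, maxtime=5)
print(loc.can_fire())    # False: guard x >= 3 does not hold at x = 0
loc.delay(3)
print(loc.can_fire())    # True: the edge may be taken from x = 3 on
print(loc.can_delay(3))  # False: x would exceed maxtime = 5
```

In this sketch the guard opens the edge at `mintime` and the invariant forces the location to be left by `maxtime`, so the edge can only be taken while the clock lies between the two bounds.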
With the template mechanism, it is in particular also possible to instantiate an automaton with a parameter p in such a way that p is replaced by a global variable (operations on p are then applied to the global variable). We will use this for the binding mechanism in our transformation. Binary channels enable a blocking synchronization between two processes, whereas broadcast channels enable non-blocking synchronization between one sender and arbitrarily many receivers. Urgent and committed locations are used to model locations where no time may pass. Furthermore, leaving a committed location has priority over non-committed locations.

A small example UPPAAL timed automaton is shown in Figure 1. The initial location is denoted by a double circle, and request? and ack! denote receiving and sending on channels, respectively. The clock variable x is first set to zero and then used in two clock constraints: the invariant x <= maxtime denotes that the corresponding location must be left before x becomes greater than maxtime, and the guard x >= mintime enables the corresponding edge at mintime. The symbols ∪ and c depict urgent and committed locations.

Figure 1. Example Timed Automaton (labels: request?, x = 0, ack!, value = f(t), invariant x <= maxtime, guard x >= mintime)

IV. FORMAL SEMANTICS OF SYSTEMC/TLM

A. SystemC Semantics

In [9], we have presented a formal semantics for SystemC by defining a transformation from SystemC into UPPAAL timed automata. The transformation preserves the (informally defined) behavioral semantics and the structure of a given SystemC design and can be applied fully automatically. It can handle all relevant SystemC language elements, including process execution, interactions between processes, dynamic sensitivity and timing behavior. It only requires two restrictions:

1) We cannot handle dynamic process or object creation.
2) Only bounded integer variables are supported.

The first restriction should hardly narrow the applicability of the approach, as dynamic object and process creation are rarely used in SystemC designs. The second restriction is also acceptable, as most data types used in SystemC designs can be converted to bounded integers.

Figure 2 shows how we represent SystemC designs in UPPAAL. Each method is mapped to a single timed automaton template. Process automata are used to encapsulate these methods and take care of the interactions with event objects, the scheduler, and primitive channels. The interactions are modeled using UPPAAL channels. For example, the processes notify events using notify, and the events trigger the processes over a wait channel if they are notified.

Figure 2. Representation of SystemC designs in UPPAAL (components: Processes, Methods, Events, Primitive Channels, Scheduler; channel labels: request update, notify, wait, activate, delta delay, deactivate, update end, update start, advance time)

To formalize the execution semantics of SystemC, we developed predefined timed automata models of the SystemC scheduler, processes, events and other SystemC constructs such as primitive channels [9]. As the semantics of the SystemC elements is only informally defined in [11], we define their formal semantics through our timed automata models. Furthermore, for the transformation of a given SystemC design, these models can be instantiated arbitrarily often. With that, we achieve a compositional transformation, i.e., we transform each module separately and compose the system in a final instantiation and binding phase. As a consequence, the transformation scales well even for large SystemC designs.

B. SystemC-TLM Semantics

In [10], we have presented a formal semantics for the TLM standard core interfaces (i.e., blocking and non-blocking transport) by mapping them to an equivalent UPPAAL timed automata representation. To achieve this, we adopted the formalization we presented in [9] for sockets and transactions, and we presented a set of timed automata models that precisely capture the semantics of the TLM core interfaces.

The formalization of sockets and the generic payload requires some restrictions on the set of input designs because some of their characteristics can generally not be transformed into an equivalent timed automata representation. This means that we have to impose the following additional restrictions on a given SystemC/TLM design:

3) We require that sockets are created statically, and that socket binding only takes place before elaboration time.
4) The instantiation of the generic payload with a concrete transaction type has to consist only of (possibly multiple) bounded integers.
5) The number of concurrent non-blocking transport requests must be bounded by a statically determinable maximum.

If these restrictions are met, the transformation of sockets only requires determining which methods are bound to a socket and then using a standard call-return semantics. The transformation of the TLM core interfaces was more challenging as it requires additional semantic constructs, namely the payload event queue (PEQ).

The purpose of PEQs is to enable the independent implementation of the delays of different communication phases in the target and the initiator. To this end, a PEQ is able to manage a time-ordered list of transaction objects. A transaction object is inserted by calling a notify function, whose parameters are the transaction object and a delay. When the delay expires, a predefined callback function peq_cb is called on the transaction object and handles its further processing. The principle of a PEQ is shown in Figure 3. To capture the semantics of the PEQ, it is necessary to keep track of a set of concurrent notifications, and to invoke the callback method at the correct times and in the correct order.

Figure 3. Payload Event Queue (PEQ) (diagram labels: notify, enqueue(), dequeue(), peq_cb; entries t0, t1, ..., ti, tj, tk ordered by time, with tj = xj + dj and ti < tj)

In [10], we presented four timed automata to capture the semantics of a PEQ: one automaton implementing a time-ordered list, one modeling the interface of the PEQ (i.e., the notify function the PEQ provides), one for fetching events from the queue and invoking the callback function, and one for the PEQ event. To be able to keep track of a set of timed event notifications, we introduced a global clock array elapsed. In this array, we used one entry for each (possible) PEQ element, i.e., the size of the array was determined by the maximal number of concurrent non-blocking transport requests of a given design. We used the difference between the delay stored in a queue element and its elapsed clock to achieve a time-ordering on PEQ elements in the timed automaton that implements the time-ordered list, and to release PEQ event notifications if the clock reaches the delay.

V. OPTIMIZED TRANSFORMATION

The main goal of the optimized version of our transformation of the TLM core interfaces into UPPAAL timed automata is to reduce the semantic state space, which must be explored during model checking. We perform this reduction without loss of information, i.e., our optimized version does not under- or over-approximate the behavior of the TLM core interfaces. The key idea of our optimization is an encoding of payload event queues that is better suited for model checking.

A semantic state of a UPPAAL timed automata model comprises the set of current locations, the values of all data variables, and the clock zone computed from the values of all clock variables. To reduce the semantic state space which is explored by the model checker, there are basically three possibilities:

1) Reduce a single semantic state by reducing the number of locations, variables or clocks.
2) Reduce the number of reachable symbolic states by, for example, reducing the number of clocks or the range of variables.
3) Make it easier for the model checker to detect symmetries in the model.

Note that symmetry detection can, for example, be eased by resetting unused variables. This is in particular helpful for local variables, because the whole timed automaton representing a method or process is only reset to its initial state if all its local variables are reset too.

In our optimized version of the TLM core interface formalization, we made use of all three potential optimization angles. However, the most effective optimization is a reduction in the number of clocks because the model checking effort is exponential in the number of clocks. With a maximum number of n concurrent PEQ notifications and k PEQs in use, our previously proposed formalization of the PEQ required n · k + 1 clocks in the global elapsed array: n for each PEQ instance and one for a global clock which is used for comparisons.

In our optimized transformation, we manage the timing for each transaction object in the PEQ locally in a separate process. The use of a separate process for each transaction object may at first sound as if it produces overhead. However, the UPPAAL model checker can cope much better with processes than with data structures. Saving the global clock, which was previously used for comparisons, yields an additional advantage. In particular, by using local clocks, which run independently from each other, we do not need to keep track of the differences between all clocks. Furthermore, the optimized version has no need for sorting the queued events, which also reduces the number of variables used and eliminates the computational overhead.

In the optimized transformation, we formalize the PEQ semantics using only two automata. Merging the PEQ mechanism into only two automata poses two difficulties: First, we want to faithfully respect the PEQ semantics, without losing or adding any behavior (i.e., we do not want to perform any under- or over-approximation). Second, we need to embed the PEQ automata properly into the rest of the SystemC semantics. In particular, it is important to correctly capture the interactions with the SystemC scheduler.
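The PEQ behaviour that any formalization has to preserve (notify with a delay, callback on expiry, entries handled in time order) can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the TLM 2.0 reference implementation and not the UPPAAL encoding: the class `PayloadEventQueue`, the method `advance_to` and the phase string `"BEGIN_REQ"` are illustrative names, time is an integer, and entries that expire at the same time are processed in insertion order, which is an assumption of the sketch.

```python
import heapq
from itertools import count


class PayloadEventQueue:
    """Toy model of a PEQ: notify() schedules a callback on a transaction."""

    def __init__(self, peq_cb):
        self._peq_cb = peq_cb   # callback invoked when a delay expires
        self._pending = []      # heap of (expiry_time, seq, trans, phase)
        self._seq = count()     # tie-breaker: insertion order (assumption)
        self.now = 0            # current simulation time (integer units)

    def notify(self, trans, phase, delay):
        """Never blocks: the entry is accepted immediately."""
        if delay < 0:
            raise ValueError("delay must be non-negative")
        entry = (self.now + delay, next(self._seq), trans, phase)
        heapq.heappush(self._pending, entry)

    def advance_to(self, time):
        """Advance time and fire every callback that expires on the way."""
        if time < self.now:
            raise ValueError("time cannot run backwards")
        while self._pending and self._pending[0][0] <= time:
            expiry, _, trans, phase = heapq.heappop(self._pending)
            self.now = expiry
            self._peq_cb(trans, phase)
        self.now = time


log = []
peq = PayloadEventQueue(lambda trans, phase: log.append((peq.now, trans, phase)))
peq.notify("t_j", "BEGIN_REQ", 30)
peq.notify("t_i", "BEGIN_REQ", 10)
peq.advance_to(50)
print(log)  # [(10, 't_i', 'BEGIN_REQ'), (30, 't_j', 'BEGIN_REQ')]
```

The sketch keeps one sorted data structure per queue. The optimized transformation described in this section avoids exactly this kind of sorted structure in the UPPAAL model by giving each pending entry its own process with a local clock.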
The requirements for the two PEQ automata are as follows:

1) For each PEQ entry, the callback method must be invoked exactly at the time where the delay expires.
2) All PEQ entries must be processed in the correct order.
3) A PEQ notification must never be blocked; it must always be immediately accepted.
4) A timed automaton with a local clock must send the scheduler an advancetime signal whenever its local clock expires. This is necessary to ensure that the scheduler starts a new delta-cycle.
5) A timed automaton with a local clock must also synchronize on advancetime as a receiver. This is necessary to ensure that whenever a new delta-cycle is started by the scheduler, all actions that should take place at the same time are executed in the same delta-cycle. This is particularly important to ensure that all possible interleavings of concurrent processes are considered.
6) When invoking the callback method, a PEQ behaves as a standard SystemC process. In the original implementation, this is ensured by a dedicated SystemC process that fetches transaction objects from the queue and invokes the callback function.

Figure 5. Timed Automaton for Callback Invocation (edge labels: peq_fetch#ctrl?; peq_cb#ctrl! with peq_cb#param#tran = peq_fetch#param#tran, peq_cb#param#phase = peq_fetch#param#phase; peq_cb#ctrl?; deactivate! with readyprocs--)

Figure 6. Timed Automaton of an nb_transport Function (edge label: m_peq_notify_ctrl! with m_peq_notify_param_trans = tran, m_peq_notify_param_phase = phase, m_peq_notify_param_t = delay)

The two UPPAAL timed automata that precisely capture the semantics of the PEQ mechanism and also meet these requirements are shown in Figures 4 and 5. The automaton in Figure 4 receives incoming notifications, stores the transaction object (its payload and its phase) in the local variables payload and phase, and the given delay in a local delay variable. Furthermore, it starts a local clock c by resetting it to zero. Then, it changes to a location with an invariant c <= delay. There, it waits until the delay expires. If the given delay is a zero delay, the transaction object is to be processed without a timed delay during the next delta-cycle. In this case, the waiting location is left if the deltadelay signal occurs, which is sent by the scheduler whenever a new delta-cycle is started. If the delay is greater than zero, the waiting location is left as soon as c == delay. Note that we have two transitions here, one where the automaton synchronizes on advancetime as a sender, and one where it synchronizes as a receiver. This ensures that all automata whose delay expires at the same time synchronize on the same advancetime signal, i.e., that requirements 4) and 5) are met.

After leaving the waiting location, i.e., when the delay has expired and the transaction is to be processed, the automaton in Figure 4 starts to behave like a standard SystemC process that was triggered by an event, i.e., it increments the variable readyprocs to inform the scheduler that an additional process is ready to execute, and waits for the activate signal, which is sent by the scheduler to start processes in arbitrary order. Then, it yields control to the second automaton shown in Figure 5. The second automaton invokes the callback method and passes the transaction object on to it.
When the callback method returns, the second automaton informs the scheduler that the process has terminated by sending deactivate and decrementing readyprocs. The splitting between the two automata ensures that the notify automaton can receive a new notification as soon as the callback method is invoked. For every PEQ, we have to create one instance of the callback invocation automaton and n instances of the notify automaton for a maximum number of n concurrent PEQ notifications.

For a better understanding of the interactions between the two PEQ automata and the rest of the UPPAAL model, see Figure 6 and Figure 7. Figure 6 shows how an nb_transport function uses the interface of the PEQ (notify) to enqueue a transaction object together with an associated delay. Figure 7 shows how a callback function receives a transaction object from the PEQ by storing its values into local variables. Note that the four automata are connected by instantiating their parameters with the same global variables. For example, for a given PEQ instance, peq_notify#ctrl in Figure 4 is instantiated with the same global channel as m_peq_notify#ctrl in Figure 6. Similarly, the global variables are used to connect the parameters, e.g., peq_notify#param#trans and m_peq_notify#param#trans are bound to the same global variable. In the reference implementation, the parameters trans and phase are passed by reference, i.e., their values must be copied into local variables within the notify automaton and back to the nb_transport automaton when the execution of notify is finished.

A schematic of the interconnections between the different automata is given in Figure 8. The nb_transport method invokes a peq_notify automaton in a non-blocking fashion, i.e., it continues execution immediately. A binary channel is used for all control channels, which ensures that this is a one-to-one communication and that another peq_notify automaton is non-deterministically chosen for each PEQ entry. The peq_notify automata synchronize themselves with the scheduler through the broadcast channels advancetime and deltadelay, and they invoke peq_fetch through a binary channel if their delay expires. Finally, peq_fetch invokes the callback method peq_cb, which optionally may invoke another call to nb_transport.

Our formalization faithfully respects the semantics of the PEQ implementation in the TLM 2.0 standard with only one exception: in the TLM 2.0 implementation, PEQ entries that expire at the same time are processed in a deterministic order. The sorting algorithm inserts those elements at the last

Figure 4. Timed Automaton for PEQ Notifications (edge labels: peq_notify#ctrl? with payload = peq_notify#param#trans, phase = peq_notify#param#phase, delay = peq_notify#param#delay, c = 0; invariant c <= delay; delay == 0, deltadelay?; delay != 0, c == delay, advancetime?; delay != 0 && c == delay, advancetime!; readyprocs++; activate?; peq_fetch#ctrl! with peq_fetch#param#tran = payload, peq_fetch#param#phase = phase, phase = 0, delay = 0, payload = data#empty)

Callback function automaton (edge labels: ctrl? with tran = param#tran, phase = param#phase; ctrl!; ... // process transaction object)
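The behaviour of the notify automaton (Figure 4) and of the callback invocation automaton (Figure 5), as described in Section V, can be paraphrased as a small Python state machine. This is only a sketch for illustration: the class and method names (`NotifyAutomaton`, `Scheduler`, `fetch_and_call`, `elapse`) are hypothetical, channel synchronizations are modeled as plain method calls, and the scheduler is reduced to the readyprocs counter.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Loc(Enum):
    IDLE = auto()     # waiting for peq_notify#ctrl?
    WAITING = auto()  # location with invariant c <= delay
    READY = auto()    # counted in readyprocs, waiting for activate?


@dataclass
class Scheduler:
    readyprocs: int = 0  # number of processes that are ready to execute


class NotifyAutomaton:
    def __init__(self, sched: Scheduler) -> None:
        self.sched = sched
        self.loc = Loc.IDLE
        self.payload = None
        self.phase = None
        self.delay = 0
        self.c = 0  # local clock

    def notify(self, trans, phase, delay: int) -> None:
        """peq_notify#ctrl?: store the entry and reset the local clock."""
        if self.loc is not Loc.IDLE:
            raise RuntimeError("instance is busy; pick another instance")
        self.payload, self.phase, self.delay, self.c = trans, phase, delay, 0
        self.loc = Loc.WAITING

    def elapse(self, d: int) -> None:
        """Let d time units pass; the invariant c <= delay must keep holding."""
        if self.loc is Loc.WAITING:
            if self.c + d > self.delay:
                raise ValueError("invariant c <= delay would be violated")
            self.c += d

    def advancetime(self) -> None:
        """advancetime (as sender or receiver): taken only if c == delay."""
        if self.loc is Loc.WAITING and self.delay != 0 and self.c == self.delay:
            self._become_ready()

    def deltadelay(self) -> None:
        """deltadelay?: releases a zero-delay entry in the next delta-cycle."""
        if self.loc is Loc.WAITING and self.delay == 0:
            self._become_ready()

    def _become_ready(self) -> None:
        self.sched.readyprocs += 1
        self.loc = Loc.READY

    def activate(self):
        """activate?, then peq_fetch#ctrl!: hand over the entry, reset locals."""
        if self.loc is not Loc.READY:
            raise RuntimeError("activate is only possible when ready")
        entry = (self.payload, self.phase)
        self.payload, self.phase, self.delay = None, None, 0
        self.loc = Loc.IDLE
        return entry


def fetch_and_call(entry, peq_cb, sched: Scheduler) -> None:
    """Callback invocation: call peq_cb, then 'deactivate' (readyprocs--)."""
    trans, phase = entry
    peq_cb(trans, phase)
    sched.readyprocs -= 1


sched = Scheduler()
a = NotifyAutomaton(sched)
a.notify("t0", 1, 10)
a.elapse(10)
a.advancetime()  # c == delay, so the automaton becomes ready
fetch_and_call(a.activate(), lambda t, p: print("peq_cb", t, p), sched)
```

Resetting the local variables in `activate` mirrors the assignments phase = 0, delay = 0, payload = data#empty on the peq_fetch#ctrl! edge, which return the instance to its initial state so that it can accept the next notification.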

VI. EXPERIMENTAL RESULTS

To evaluate the practical applicability and the performance of our approach, we have implemented the optimized transformation and applied it to an industrial case study, namely the TLM 2.0 solution for the AMBA AHB released by Carbon Design Systems in February 2011. To give a better impression of the effect of our optimizations, we also applied both the unoptimized and the optimized transformation to an enhanced version of a case study presented in [10]. All experiments were run on a machine with an Intel Pentium 3.4 GHz CPU and 4 GB RAM running a Linux operating system. Verification times are averaged over 10 runs.

A. Non-Blocking Transport

One example used in [10] is an approximately-timed model, where a producer and a consumer communicate through a communication protocol with 4 phases, using the non-blocking transport including PEQs. The example consists of 145 lines of code and contains one process. We modified the example such that we can use it with different numbers of concurrent non-blocking transport requests, in order to analyze the performance gain. We applied both our unoptimized and our optimized transformation to this modified example and used the UPPAAL model checker to verify deadlock freedom. Table I shows the verification time needed with one (NB1) to five (NB5) concurrent requests in each of the producer and the consumer for both transformations, as well as the number of states explored during model checking. Note that the unoptimized transformation leads to an out-of-memory exception in the UPPAAL model checker when more than 2 concurrent requests occur (no values are reported for these cases in Table I). It can be seen that the optimized transformation reduces the verification times drastically. This is achieved on the one hand by a reduction of the number of semantic states, and on the other hand by a reduction of the size of each semantic state, which enables the exploration of more states in a shorter period of time. Both together enable the verification of systems with more concurrent non-blocking transport requests.

Figure 7. Timed Automaton of a Callback Method

Figure 8. Interconnections in the Timed Automata Model
(Elements shown: nb_transport, peq_notify, peq_fetch, peq_cb, and the Scheduler, connected through peq_notify#ctrl, peq_fetch#ctrl, peq_cb#ctrl, advancetime, and deltadelay.)

possible position. As a consequence, the processing order is first in, first out for simultaneously expiring entries. In contrast, our formalization processes PEQ entries that expire at the same time in non-deterministic order, i.e., all possible interleavings are considered during model checking. This is a slight over-approximation of the PEQ semantics used in the TLM reference implementation, but it corresponds to the TLM 2.0 standard, where no specific order of simultaneously expiring entries is defined. Furthermore, if safety properties are proved to be correct on our model, they are also correct in the SystemC/TLM design, because our model only adds behavior: every execution of the design is also an execution of the model. Note that in the notify automaton, we reset all variables before returning to the initial location. This ensures that the model checker can detect and use symmetries in the model. Furthermore, note that the local clocks in each notify automaton now run independently of each other.
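The difference between the two processing orders can be illustrated with a small sketch. It is not the TLM 2.0 PEQ code or the UPPAAL model; the function names and the (name, expiry time) representation of PEQ entries are assumptions made only for this example. It contrasts the deterministic first in, first out order with the set of all interleavings of simultaneously expiring entries, and shows that the former is always contained in the latter.

```python
from itertools import groupby, permutations, product


def fifo_order(entries):
    """Deterministic order: by expiry time, ties in insertion order.

    sorted() is stable, so entries with equal expiry time keep the
    order in which they were inserted.
    """
    return [name for name, _ in sorted(entries, key=lambda e: e[1])]


def all_orders(entries):
    """Non-deterministic order: every interleaving of entries that
    expire at the same time, as explored during model checking."""
    ordered = sorted(entries, key=lambda e: e[1])
    # Group entry names by expiry time.
    groups = [[name for name, _ in grp]
              for _, grp in groupby(ordered, key=lambda e: e[1])]
    # Permute each group of simultaneous entries independently.
    choices = [list(permutations(g)) for g in groups]
    return [[n for grp in combo for n in grp] for combo in product(*choices)]


# Hypothetical entries: (transaction name, expiry time).
entries = [("t1", 10), ("t2", 10), ("t3", 20)]
print(fifo_order(entries))
print(all_orders(entries))
```

Because the first in, first out order is one element of the list returned by `all_orders`, a safety property that holds for every interleaving also holds for the deterministic order.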


                          NB1      NB2      NB3      NB4        NB5
UNOPTIMIZED  CPU time     16:47    50:05    -        -          -
             # states     1,953    7,862    -        -          -
OPTIMIZED    CPU time     0:01     0:01     0:01     0:29       47:54
             # states     460      3,234    92,789   1,062,532  66,359,278

Table I. Model Checking Results for Non-Blocking Transport (-: no result, see text)
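The size of the reduction can be made concrete with a short calculation on the NB1 and NB2 columns of Table I, the only configurations for which both transformations produced a result. The sketch below assumes that CPU times are given as min:sec (as in Table II) and that the four values per column are read as unoptimized time, unoptimized states, optimized time, optimized states; the helper names are hypothetical.

```python
def to_seconds(mmss):
    """Convert a 'min:sec' string into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (CPU time, explored states) per configuration, as read from Table I.
unoptimized = {"NB1": ("16:47", 1953), "NB2": ("50:05", 7862)}
optimized = {"NB1": ("0:01", 460), "NB2": ("0:01", 3234)}


def reduction(config):
    """Return (time ratio, state ratio) of unoptimized vs. optimized."""
    (t_u, s_u), (t_o, s_o) = unoptimized[config], optimized[config]
    return to_seconds(t_u) / to_seconds(t_o), s_u / s_o


for config in ("NB1", "NB2"):
    time_ratio, state_ratio = reduction(config)
    print(f"{config}: time ratio {time_ratio:.0f}, state ratio {state_ratio:.2f}")
```

Since the optimized times are reported with one-second resolution, the time ratios are only rough orders of magnitude. The state ratios are exact with respect to the table.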

Figure 9. Architecture of the TLM 2.0 AMBA AHB Design
(Components shown: TLM2_M1 and TLM2_M2 of type tlm2_master, AHB_M1 and AHB_M2 of type master_to_ahb, the AHB Bus, and AHB_S1 and AHB_S2 of type ahb_mem. The tlm2_master modules connect to the master_to_ahb modules via b_transport; the master_to_ahb modules and the ahb_mem modules connect to the bus via nb_transport_fw/bw.)

B. AMBA AHB

The Advanced Microcontroller Bus Architecture (AMBA) bus is an on-chip bus introduced by ARM Ltd. (http://www.arm.com) in 1996. The AMBA Advanced High-performance Bus (AHB) protocol was introduced in 1999 and features burst transfers, split transactions, and a bus width of up to 128 bits. Many high-performance SoCs in a wide range of applications currently use the AMBA AHB.

For our experiments, we used a TLM 2.0 implementation of the AMBA AHB provided by Carbon Design Systems. The architecture of the design is shown in Figure 9. The tlm2_master initiates communications by sending read or write transactions via a blocking transport to the master_to_ahb module. The master_to_ahb module splits the given transaction into AMBA-conformant transfers and sends those over the bus according to the AMBA AHB protocol specification, i.e., with the correct timing, protocol phases, and transfer types.

The AMBA AHB is a synchronous clocked bus. The timing and arbitration of the AMBA AHB are described in [13]. An AMBA AHB transfer starts with a bus request asserted by a bus master. The arbiter collects all bus requests and sends a grant signal to one master. The granted bus master then drives the address and control signals. These signals provide information on the address, direction, and width of the transfer, as well as an indication of whether the transfer forms part of a burst. AMBA AHB uses separate read and write buses to move data from the slave to the master and the other way around. Every transfer consists of an address and control cycle and one or more cycles for the data. The TLM 2.0 implementation of the AMBA AHB provided by Carbon Design Systems implements this by multiple clocked non-blocking transports for each transfer. The design implements an arbiter and a decoder as specified in [13]. The slave components receive transactions and read from or write to memory, respectively.

Due to the restrictions of our approach mentioned in Section IV, we had to modify the original design. Most of these modifications can be considered minor, such as the change from dynamic to static process instantiation or the static binding of sockets, as the original design consists of a fixed number of processes and the binding of sockets remains unchanged during elaboration. The most intrusive modifications were necessary because we can only cope with bounded integers. Due to this restriction, we had to simplify the transaction object and to remove the whole dynamic memory management used in the original design. Furthermore, as the AMBA AHB protocol does not depend on the content of the data that is transferred, we only sent and requested constant data over the bus. As none of these changes influence the control logic or functionality of the AMBA AHB, functional properties that can be verified on the modified design are also satisfied in the original model. The modified implementation consists of about 1800 lines of code and can be used with varying numbers of masters and slaves. This provides a case study which enables us to determine the practical applicability and the performance of our approach.
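The arbitration step described above (the arbiter collects all bus requests and grants the bus to one master) can be sketched as follows. This is a minimal illustration, not the Carbon Design Systems implementation: the function names are hypothetical, and the fixed-priority policy (the lowest master index wins) is an assumption, since the arbitration policy is not stated here. The second function exhaustively checks, for a small number of masters, that the bus is never granted to more than one master at a time.

```python
from itertools import product


def arbitrate(requests):
    """Return one grant flag per master for the given request flags.

    Assumption of this sketch: fixed priority, lowest index wins.
    """
    grants = [False] * len(requests)
    for index, requested in enumerate(requests):
        if requested:
            grants[index] = True
            break  # at most one master is granted the bus
    return grants


def check_mutual_exclusion(num_masters):
    """Check every request combination for the given number of masters."""
    for requests in product([False, True], repeat=num_masters):
        grants = arbitrate(list(requests))
        # The bus is granted to at most one master at a time.
        assert sum(grants) <= 1
        # A grant is issued exactly when at least one master requests.
        assert any(grants) == any(requests)
        # Only a requesting master can be granted the bus.
        assert all(req or not grant for req, grant in zip(requests, grants))
    return True


print(arbitrate([False, True, True]))
print(check_mutual_exclusion(2))
```

Enumerating all request combinations is feasible here only because the sketch is stateless and small. A model checker performs the analogous exhaustive check over the full timed state space of the design.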
Furthermore, the case study covers all of the most important SystemC/TLM constructs: a number of concurrent SystemC processes (2 for each master, 3 for the bus itself, and 1 for each slave) communicate through both blocking and non-blocking transfers, a payload event queue with concurrent entries is used, and the processes use all kinds of sensitivity, i.e., static, dynamic, and timed sensitivity.

We applied our transformation to the modified design and verified liveness and safety properties using the UPPAAL model checker. For the experimental evaluation, we verified the following four properties:

1) deadlock freedom:
   A[] not deadlock
2) the bus is always only granted to ONE master at a time:
   A[] M1#busgranted + M2#busgranted <= 1
3) a bus request is always eventually answered with a bus grant:
   M1#busrequest --> M1#busgranted
   M2#busrequest --> M2#busgranted
4) as soon as a master is granted the bus, the communication never takes more than N time units (for a fixed number of bursts):
   A[] (M1.isCommunicating imply x <= N) && (M2.isCommunicating imply x <= N)
   where x is an extra clock, which is reset whenever a master is granted the bus.

Table II shows the results of the verification with different numbers of masters and slaves (from 1 master and 1 slave, 1M1S, to 2 masters and 2 slaves, 2M2S). All properties could be proved to be satisfied at the end of the verification phase. During the verification, we were able to detect a bug in the original design which led to a deadlock situation when a transaction was split into several separate transfers. In case of a split transaction, a counter variable is used to store the number of successful transfers before the split occurs. This variable was not reset in the original design. As a consequence, all split transactions besides the first one failed. This is a typical example of a bug that is difficult both to detect and to correct with simulation alone. With our approach, the generation of a counter-example took only a few minutes. With the help of the graphical visualization in UPPAAL and due to the structure preservation of our transformation, it was easy to understand the cause of the problem.

                            Verification time (min:sec)
Property                    1M1S      1M2S      2M1S        2M2S
deadlock freedom            0:07      0:26      5:10        25:52
only one master             -         -         4:00        19:55
bus granted to M1           0:05      0:20      3:25        17:08
bus granted to M2           -         -         4:00        19:55
communication in time       0:17      0:25      9:53        18:08
# states                    74,345    221,152   2,009,751   7,964,195

Table II. Model Checking Results from AMBA AHB (-: not applicable with a single master)

VII. CONCLUSION

In this paper, we have presented an optimized transformation from SystemC/TLM into UPPAAL timed automata, which is especially tailored for being model checked with the UPPAAL model checker. Furthermore, we have presented experimental results from an industrial case study, which provide evidence for the performance and for the usefulness of the approach. The main idea behind the optimizations is to reduce the number of clocks and variables, and to reset variables to make it easier for the model checker to detect and use symmetries in the model. To achieve this, we move away from a direct one-to-one transformation of the implementation of the TLM core interfaces. Instead, we capture their semantics in a more direct way. The optimization does not sacrifice the correctness of the formalization, i.e., no over- or under-approximation is performed. Nevertheless, it significantly reduces the semantic state space and makes model checking considerably faster and less memory-consuming. This is shown by our experiments on an approximately-timed model, where a producer and a consumer communicate through a communication protocol with 4 phases, using the non-blocking transport including PEQs. We used a varying number of concurrent non-blocking transports, previously the main source of computational overhead and also the reason for out-of-memory problems if a limit of two concurrent notifications was exceeded. With our new approach, up to five concurrent notifications can easily be handled.

Our optimizations also enable the application of our approach to an industrial case study, namely the TLM model of the AMBA AHB released by Carbon Design Systems. This design could not be handled with the unoptimized version, but can now be model checked with reasonable effort. We had to modify the original design by removing dynamic memory management, which is not supported by our approach, and by simplifying the transaction objects. However, the whole control logic, functionality, and timing of the design remained untouched. The most serious bug we were able to detect is a deadlock situation, which is due to a missing variable reset in one of the masters. This is also a real bug in the real design. The graphical animation in UPPAAL and the structure preservation of our transformation made it very easy to understand the cause of this error once we had the counter-example generated by the UPPAAL model checker. After the defect removal, we were able to verify some important safety, liveness, and timing properties. The verification effort for 2 masters and 2 slaves is less than half an hour for each property, which is not negligible but acceptable, as it only has to be done once in the design flow. Furthermore, the generation of counter-examples only took a few minutes.

In future work, we plan to extend our approach to lower levels of abstraction. An important advantage of SystemC is that it is possible to plug IP cores implemented in VHDL or Verilog into an overall system design, and to simulate and evaluate it altogether. However, with respect to verification, different techniques are used for pure hardware components and the overall system. It would be interesting to investigate whether these different verification techniques can be integrated to achieve overall verification results. For example, our model checking approach could be combined with an SMT solver to verify a refined version of the AMBA AHB system where the masters and slaves are modeled on register transfer level rather than on transaction level.

ACKNOWLEDGMENT

The authors would like to thank Carbon Design Systems for providing the case study of the AMBA AHB.

REFERENCES

[1] R. Alur and D. L. Dill. A Theory of Timed Automata. Theoretical Computer Science, 126:183-235, 1994.

[2] G. Behrmann, A. David, and K. G. Larsen. A Tutorial on UPPAAL. In Formal Methods for the Design of Real-Time Systems, LNCS 3185, pages 200-236. Springer, 2004.
[3] N. Bombieri, F. Fummi, and V. Guarnieri. Model Checking on TLM-2.0 IPs through automatic TLM-to-RTL Synthesis. In VLSI System on Chip Conference (VLSI-SoC), pages 61-66. IEEE Computer Society, 2010.
[4] H. Garavel, C. Helmstetter, O. Ponsini, and W. Serwe. Verification of an industrial SystemC/TLM model using LOTOS and CADP. In International Conference on Formal Methods and Models for Co-Design (MEMOCODE), pages 46-55, 2009.
[5] D. Große, U. Kühne, and R. Drechsler. HW/SW Co-Verification of Embedded Systems using Bounded Model Checking. In Great Lakes Symposium on VLSI, pages 43-48. ACM Press, 2006.
[6] D. Große, H. M. Le, and R. Drechsler. Proving Transaction and System-level Properties of Untimed SystemC TLM Designs. In Formal Methods and Models for Codesign, pages 113-122. IEEE Computer Society, 2010.
[7] A. Habibi, H. Moinudeen, and S. Tahar. Generating Finite State Machines from SystemC. In Design, Automation and Test in Europe, pages 76-81. IEEE, 2006.
[8] A. Habibi and S. Tahar. An Approach for the Verification of SystemC Designs Using AsmL. In Automated Technology for Verification and Analysis, LNCS 3707, pages 69-83. Springer, 2005.
[9] P. Herber, J. Fellmuth, and S. Glesner. Model Checking SystemC Designs Using Timed Automata. In International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pages 131-136. ACM Press, 2008.
[10] P. Herber, M. Pockrandt, and S. Glesner. Transforming SystemC Transaction Level Models into UPPAAL Timed Automata. In Formal Methods and Models for Codesign (MEMOCODE), pages 161-170. IEEE Computer Society, 2011.
[11] IEEE Standards Association. IEEE Std. 1666-2005, Open SystemC Language Reference Manual, 2005.
[12] D. Karlsson, P. Eles, and Z. Peng. Formal Verification of SystemC Designs using a Petri-Net based Representation. In Design, Automation and Test in Europe (DATE), pages 1228-1233. IEEE Press, 2006.
[13] ARM Ltd. AMBA 3 AHB-Lite Protocol Specification, 2006.
[14] K. L. Man. An Overview of SystemCFL. In Research in Microelectronics and Electronics, volume 1, pages 145-148, 2005.
[15] K. L. Man, A. Fedeli, M. Mercaldi, M. Boubekeur, and M. P. Schellekens. SC2SCFL: Automated SystemC to SystemCFL Translation. In Embedded Computing Systems: Architectures, Modeling, and Simulation, LNCS 4599, pages 34-45. Springer, 2007.
[16] M. Moy, F. Maraninchi, and L. Maillet-Contoz. Lussy: A toolbox for the analysis of systems-on-a-chip at the transactional level. In International Conference on Application of Concurrency to System Design (ACSD), pages 26-35, 2005.
[17] W. Müller, J. Ruf, and W. Rosenstiel. SystemC: Methodologies and Applications, chapter An ASM based SystemC Simulation Semantics, pages 97-126. Kluwer Academic Publishers, 2003.
[18] B. Niemann and C. Haubelt. Formalizing TLM with Communicating State Machines. Forum on Specification and Design Languages, 2006.
[19] Open SystemC Initiative (OSCI). TLM 2.0 Reference Manual, 2009.
[20] J. Ruf, D. W. Hoffmann, J. Gerlach, T. Kropf, W. Rosenstiel, and W. Müller. The Simulation Semantics of SystemC. In Design, Automation and Test in Europe, pages 64-70. IEEE Press, 2001.
[21] A. Salem. Formal Semantics of Synchronous SystemC. In Design, Automation and Test in Europe (DATE), pages 10376-10381. IEEE Computer Society, 2003.
[22] C. Traulsen, J. Cornet, M. Moy, and F. Maraninchi. A SystemC/TLM semantics in Promela and its possible applications. In 14th Workshop on Model Checking Software (SPIN '07), LNCS 4595, pages 204-222, Berlin, 2007. Springer.
[23] Y. Zhang, F. Vedrine, and B. Monsuez. SystemC Waiting-State Automata. In First International Workshop on Verification and Evaluation of Computer and Communication Systems (VECoS 2007), 2007.
