
Waveform Compression

Said Mchaalia December 31, 2011

(draft)

Chapter 1

Introduction
Simulation is used both when the real-world system is too complex for mathematical analysis and to validate mathematical analyses where this is possible. One key power of simulation is the ability to model the behavior of a system in order to make decisions about it. However, it is important to note that simulation simplifies reality by omitting or changing details. Digital design simulation can be defined as a technique of building a model of a real or proposed design so that the behavior of the design under specific conditions may be investigated. The purpose of digital simulation is to gain insight into the behavior of an existing or imagined design. In each case, the purpose is to help the designers build useful mental models of their designs and provide an opportunity to test them safely and efficiently.

Indeed, recent developments in microelectronic technology, namely submicron technology, allow implementing a whole digital system as a single integrated circuit. However, the introduction of System-on-Chip (SoC) designs requires new design methodologies to increase designer productivity significantly. For such large design projects, verification struggles to keep up with the increasing circuit densities available for chip design. Thereby, verification becomes one of the most important parts of the design cycle of digital systems, especially for sub-micron designs. On the other hand, functional verification becomes more and more complex for several reasons. The functionality of designs is constantly increasing, so more and more functional units must be tested. Moreover, the length of the test sequences that must be applied to a circuit to achieve a sufficient test coverage is increasing as well. The most common functional verification method is simulation. During simulation, a set of test vectors is applied to the circuit under verification.


The response of the design is then compared to the expected results to detect errors. To efficiently pin down the erroneous part within the circuit, the designer usually inspects the signal waveforms generated by the simulator. However, the location of the erroneous part is unknown in advance. Hence, in many cases, waveforms for a large subset of signals or even the entire signal set of the design must be recorded. This is not only true for designs at gate level but also holds for deep-submicron RTL designs. For example, complex SoC (System on a Chip) designs often consist of several modules where each module itself is a complex design. The waveforms generated during simulation are stored on disk for future investigation and will finally fill up huge amounts of disk space. In addition, the large waveform data sets slow down simulation due to the frequent write-to-disk operations. As a result, compression techniques that reduce the file size of the waveform database are indispensable to speed up simulation. Moreover, as compression is usually done on-line during simulation, it must be achieved without sacrificing too much simulation speed.

Although waveform compression is a challenging problem, only little work is known in this field. In [?], the authors remove those signals from the watch list that can easily be rebuilt. However, this approach is mainly dedicated to gate level models. There are commercial tools that support dumping compressed waveform files [?, ?]; unfortunately, their algorithms are not disclosed. On the other hand, common compression algorithms may be used to decrease the file size of a waveform database [?, ?, ?]. Although these algorithms are well known and may be optimized for waveform compression, they usually consume a significant amount of computation power, and the achievable compression ratios are limited.

The objective of this work is to develop new waveform compression techniques that are able to speed up hardware design simulation and to reduce the disk space needed for storing waveforms. To achieve this objective, three waveform compression algorithms, namely Signal Set compression, Value compression and compression using the Control Data Flow Graph (CDFG), will be presented. These techniques will be used to improve the compression of waveform files (Value Change Dump (VCD) files). The obtained results will be compared to other waveform compression techniques.

This document is organized as follows: the second chapter introduces digital verification and describes how waveforms are generated and stored in VCD files. The next chapter describes the developed Control Data Flow Graph and its use. The fourth chapter gives an overview of existing data compression algorithms. The fifth chapter describes the waveform compression techniques.

The next chapter details the use of the CDFG in improving waveform compression. Experimental results are then presented in the seventh chapter. The last chapter reviews this work and discusses its features.


Chapter 2

Waveform File Generation


Nowadays, designers have to verify highly complex digital circuits, embedded software and on-chip analog circuitry with fragmented methodologies that substantially impede verification speed and efficiency. They also face a large number of technical issues including design performance, capacity, test development, test coverage, mixed-signal verification, and hardware-software co-verification methodologies [?, ?]. In fact, optimizing verification speed is a complex research subject [?, ?]. Overall, verification methodologies are used by the designers at a variety of design integration levels. To improve digital hardware design using these verification methodologies, many digital simulation techniques are used. One of them is discrete event simulation, which has a successful track record in the improvement of the hardware verification process. In contrast to other simulation methods (like differential equations), in which systems evolve continuously in continuous time, the systems in discrete event simulation are described by discrete events and appropriate processes. Discrete event simulation indeed processes each event, transaction or item individually using an appropriate process [?, ?, ?]. Simulation, however, is not a fully satisfactory solution to the validation problem of digital hardware for several reasons: each schedule (run) proves the correctness of the design under verification only for that particular sequence of inputs (stimuli), and only one state and input combination of the design under verification is visited per simulated clock cycle. Although cycle-based simulation suffers from these limitations, it is still the technology of choice for the validation of large synchronous systems, for which logic simulation scales nicely with designer requirements [?, ?]. To propagate values from system inputs to system outputs, a simulation clock cycle is required.


After one cycle finishes, the next cycle begins. Moreover, practical cycle-based simulators allow for circuits with multiple clocks and interface to event-based simulation. However, cycle-based simulation ignores system delays and inter-phase relationships. This limits the amount of information about the design that can be extracted from the simulation. Note that cycle-based simulation does not work for asynchronous designs and cannot be used in timing verification. Event-driven simulation environments use the traditional discrete event simulation mechanism and consider system delays and inter-phases [?, ?, ?]. With either cycle-based simulation or event-driven simulation, there is the opportunity of outputting the simulation results to waveform diagrams. For a detailed performance evaluation of the design under verification, a signal trace file format called the Value Change Dump file (VCD file for short) has been developed by Cadence to store signal waveforms [?]. Not only the input and output signal waveforms are stored in this file, but also the internal signal waveforms. The waveforms are needed to obtain a trace of the real behavior of the design. In this way, the physical size of this file can become excessively large, although the signal waveforms are held in a compact format. On the other hand, using cycle-based simulation reduces the amount of stored signal waveforms, because it does not consider all real transactions of signals that are, for example, caused by delays and inter-phases. But this is not useful for improving the verification process for all digital circuit types, nor for performing timing verification. Nevertheless, even when using cycle-based simulation, the generated waveform files are usually huge, often exceeding the capabilities of the storage system. In the following, we first present an overview of digital design simulation and describe how waveforms can be generated during the simulation process. Second, we introduce the benefits of discrete event simulation and cycle-based simulation in detail, and illustrate how they will be used in our work. Finally, the format of a signal trace file, which is created during the verification phase, is discussed.

2.1 Digital Design Concept

In this section, we will present the concept of digital design and show how waveforms are generated for each kind of digital design method. Figure ?? illustrates an overview of the digital hardware manufacturing process. The first step of this process is specification, including architectural modeling and partitioning.


Figure 2.1: HDL Design Example

Figure 2.2: Digital simulation algorithm

The next step is HDL (Hardware Description Language) design, where the circuit is described at RTL (Register Transfer Level). Floor-planning (including data flow, power distribution, clock distribution) and synthesis are then performed. The following step is the netlist generation and hardware verification. Finally, a design prototype is produced. To achieve the complete design of a digital circuit, several well-defined algorithms are involved. Figure ?? details the flow of the design phase. First, the RTL design is simulated to check whether it meets the specification requirements. Then, if the verification results are okay, a synthesized design is generated. The generated netlist may also optionally be verified using simulation. If the results of the previous verification phase are okay, the layout is generated. Next, the timing model of the design is generated and verified. Here, usually static timing analysis is used as the verification methodology; however, timing simulation may also optionally be applied. Based on the obtained results of the timing verification, the design cycle is either completed or restarted. Indeed, verification plays an important role in the digital design concept. In the following section, we will detail the concept of design verification and describe its features.

2.1.1 Design verification

The goal of the system verification phase is to verify the system under real-world conditions. The verification process begins with the development of a detailed test plan and the assembly of the testbench environment [?]. Figure ?? gives an overview of system verification. The testbench for the device under verification (DUV) includes a stimulus generator, a response generator and a response checker.

Figure 2.3: Verification process: in a testbench environment, a waveform storing all kinds of DUV signals will be generated.


The stimulus generator is used to apply signal values with a specific timing to the inputs of the design under verification (DUV); i.e., the stimulus generators create the data the testbench uses to stimulate the design under verification. Response generators are testbench components that calculate the expected responses of the device under verification. Response checkers are used to verify that the data responses received from the device under verification are correct. They contain the most application-specific information in the testbench and verify the functional and performance requirements of a design. Functional requirements can include calculation accuracy, event ordering and adherence to protocols. Performance requirements can include bandwidth, latency of operations and computation speeds. The testbench environment must verify that, at each stage of development, the accuracy of the verification algorithm is still acceptable for the application. Verifying this accuracy causes the testbench to become a mix of models at different abstraction levels. The testbench is then complemented with detailed instrumentation facilities that capture the responses of the subsystems, produce diagrams and calculate error rates. This instrumentation is at the heart of the testbench environment. In order to enhance the traceability of the verification process and to further study the behavior of the design under verification, waveforms that trace input, output and internal signals are also generated during simulation. Note that the testbench environment only generates the input stimuli and checks the outputs, but does not help the designer to identify the reason for a fault if one occurs. Hence, to support the designer in pinning down an error in the design, waveforms are useful as they store a detailed description of the circuit behavior. These waveforms thus serve to investigate the design deeply and also permit a good identification of testbench faults. Moreover, there are many tools (such as TestDeveloper [?], Tetramax [?], WaveMake [?]) that make use of waveforms to debug or analyze digital circuits. For future investigation, waveforms are held on disk in a corresponding file, which has a specific format. A detailed description of this file is given in section ??. Due to the current growth of the complexity of digital circuits such as embedded systems and SoCs, which comprise many hundred millions of gates, the number of signals to be recorded during simulation may become very large. Thereby, depending on the DUV, the size of the relevant signal trace file may exceed many hundreds of gigabytes and may finally fill up large amounts of disk space. Besides disk space consumption, waveform dumping also slows down simulation due to the disk activity involved in writing these files during simulation.
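As a minimal sketch of the stimulus generator / response generator / response checker arrangement described above (the DUV here is an invented trivial adder, purely for illustration and not taken from any real testbench):

#include <cstdint>
#include <iostream>

// Stand-in for the design under verification (DUV).
uint8_t duv_add(uint8_t a, uint8_t b) { return static_cast<uint8_t>(a + b); }

int main() {
    int errors = 0;
    // Stimulus generator: creates the data applied to the DUV inputs.
    for (unsigned a = 0; a < 16; ++a) {
        for (unsigned b = 0; b < 16; ++b) {
            // Response generator: calculates the expected response.
            uint8_t expected = static_cast<uint8_t>(a + b);
            // Response checker: verifies the DUV response is correct.
            if (duv_add(static_cast<uint8_t>(a), static_cast<uint8_t>(b)) != expected) {
                std::cout << "mismatch at a=" << a << " b=" << b << "\n";
                ++errors;
            }
        }
    }
    std::cout << (errors ? "FAILED" : "PASSED") << "\n";
    return errors;
}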


2.2 Discrete event simulation

2.2.1 Background

Discrete event simulation is one way of analyzing models by observing the time-based (or dynamic) behavior of a system [?]. In [?, ?], the basic concepts behind discrete event simulation have been introduced. In discrete event simulation, a digital system is modeled by a collection of system components (called entities) that exchange information. This exchange of information takes place at discrete points in time. Components are discrete objects, each being distinct from the others. Every instant at which one component provides information to another is called an event. Events thus characterize the changes of the system state. Time discreteness is hence implicit in the system itself, rather than being explicitly imposed by the simulator. In this simulation methodology, an internal clock keeps track of time advance during simulation, and a simulation kernel (executive) is responsible for controlling time advance and the logical relationships between components. Discrete event simulation of a digital system is obtained by constructing software in which the behavior of each system component is mimicked by a program. These programs are called logical processes. The exchange of information between components is then mimicked during simulation by the exchange of messages between the logical processes. Since the digital simulation does not execute in real time, each logical process has its own notion of time, and each message is tagged with the time of the corresponding design event. A single data structure, called the event list, is used to hold future events, which are ordered by time (also called the time stamp). The basic simulation cycle involves removing events from the event list, forming event combinations, and causing the appropriate logical processes to perform the action associated with the given input event combination. An event combination is a collection of one or more input events having the same scheduling time (time stamp). The basic schedule cycle begins when a set of events (usually from the head of the event list) is removed from the event list. An output event is then caused when the corresponding logical process sends messages to an appropriate logical process. These output events are inserted into the event list. Figure ?? describes the principle of discrete event simulation. It shows two essential elements: the clock and the simulation kernel. Here, the simulation kernel uses an event list (a set of chronologically ordered events).


Figure 2.4: Discrete Event Simulation

Figure 2.5: Cycle-simulation models of G4 d-latch: schematic representation of the multi-component model

Each event on the event list has three key data items. The first item is the time of the event, which allows it to be ordered on the event list. The second item is the reference to the logical process that needs to be executed. The third item is the value to be assigned to the corresponding entity. This allows the kernel to execute the appropriate process at the correct time and to assign the value to the appropriate entity. The kernel is responsible for the ordering of the events. Thereby, it removes the first event from the list and executes the appropriate logical process. Any new events that occur as a result are inserted into the list at the appropriate point.
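As a minimal illustration of this schedule cycle, the following C++ sketch implements an event list as a time-ordered priority queue; the processes, times and values are invented for the example and do not correspond to any particular simulator:

#include <cstdint>
#include <iostream>
#include <queue>
#include <vector>

// One event-list entry, mirroring the three key data items named
// above: time stamp, logical process to execute, value to assign.
struct Event {
    uint64_t time;
    int process;
    int value;
};

// Order the event list chronologically (earliest time first).
struct Later {
    bool operator()(const Event& a, const Event& b) const {
        return a.time > b.time;
    }
};

int main() {
    std::priority_queue<Event, std::vector<Event>, Later> event_list;
    event_list.push({0, 0, 1});   // seed the list with initial events
    event_list.push({10, 1, 0});

    while (!event_list.empty()) {
        Event e = event_list.top();   // remove the earliest event
        event_list.pop();
        std::cout << "t=" << e.time << ": process " << e.process
                  << " <- " << e.value << "\n";
        // Executing a logical process may cause output events, which
        // are inserted back into the event list; here process 0
        // schedules one follow-up event as a stand-in for that.
        if (e.process == 0)
            event_list.push({e.time + 5, 1, e.value ^ 1});
    }
    return 0;
}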

2.2.2 Waveforms generated during Discrete Event Simulation

In discrete event simulation, waveforms are generated based on the state-changes of the system components (entities). Indeed, for each state-change of each entity, there is a corresponding new signal value. The changes of these signal values during simulation form a waveform. Hence, a waveform for a specific signal is a set of three elements, where the first element is the signal value describing a state-change, the second element is the transition of the corresponding signal, and the last element is the corresponding simulation time. Note that each state-change of each component creates an event, and the discrete event simulation kernel uses these events to perform the simulation. Hence, waveform data can easily be generated on the fly during discrete event simulation by dumping events to disk. Figure ?? illustrates an example of generating waveforms using discrete event simulation. In figure ??, a sample waveform of a simple design is shown. To each state-change of each design element, a corresponding event is associated. The state changes appear in the waveforms as signal transitions. For this system, a global clock (clkg) is considered. A scan enable signal is associated to control the system. In addition, two other clock signals (a_clk and b_clk) are used within the system.


As the signal c2 depends on the clock signal b_clk, it changes its value at the rising edge of this clock. The signal c1 depends on the signal scan_in, so it changes its value only when this signal changes value. The change of the value of this signal produces the change of the value of the signal l1_latch. The value of the signal l2_latch depends on the signal values of c2 and l1_latch; thus a change of the value of either of them produces a change of the signal value of l2_latch. In our work, we consider discrete event simulation to generate waveforms. We deal with signals based on the state-changes of system components. Furthermore, we define transactions TRn(Vn, tn) (similar to events in the discrete event simulation modeling methodology), where Vn is the value to be assigned to the corresponding signal and tn is the relevant simulation time. That is, if a signal changes its value at simulation time tn, an event occurs on this signal.
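In C++-like terms, this transaction-based view of a waveform could be sketched as follows (a simplified illustration; the value types and names are assumptions, not the simulator's actual data structures):

#include <cstdint>
#include <iostream>
#include <vector>

// A transaction TRn(Vn, tn): the value Vn assigned to a signal at
// simulation time tn. A waveform is the ordered list of all
// transactions observed on one signal during simulation.
struct Transaction {
    int value;       // Vn
    uint64_t time;   // tn
};
using Waveform = std::vector<Transaction>;

// Record a transaction only when the value actually changes,
// i.e. when an event occurs on the signal.
void record(Waveform& w, int value, uint64_t time) {
    if (w.empty() || w.back().value != value)
        w.push_back({value, time});
}

int main() {
    Waveform w;
    record(w, 0, 0);    // initial value
    record(w, 1, 30);   // event at t = 30
    record(w, 1, 40);   // value unchanged: no event, nothing stored
    std::cout << w.size() << " transactions recorded\n";  // prints 2
    return 0;
}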

2.3 Cycle-based simulation

The dramatic growth in design size is straining the ability of designers to complete verification with event simulation alone. For years, designers have been looking to cycle simulation as a means to dramatically improve simulation speed while retaining the flexibility of software simulation. This translates into a requirement for a revolutionary methodology that provides dramatic performance gains [?, ?]. In many cases, cycle-based simulation is able to meet these requirements.

2.3.1 Background

Cycle-based simulation of synchronous digital systems is performed on a cycle-by-cycle basis. It assumes that there exist one or more clock signals in the circuit and that all inputs of the system remain unchanged while their values are evaluated in the simulation cycle. The simulation results report only the final values of the signals in the current simulation cycle. Cycle-based simulation can be abstractly illustrated by the alter-clock pattern, where clock means simulating a clock cycle and alter stands for setting the values of the model components from outside. Indeed, cycle-based simulation is treated as one possible timing abstraction, and it is still the technology of choice for the validation of large synchronous systems [?, ?]. Figure ?? illustrates a sample of cycle-based simulation. Assuming a fixed period of 60ns, the clock edges are at 30ns and 120ns, and the inputs change at 20ns and 80ns after the start of each simulation cycle.


Figure 2.6: Event Driven example

Figure 2.7: Real Simulation Example

The inputs change their values independently of the clock, because they come from outside the design environment; they are external signals. However, the outputs are generated internally. Thereby, the points where the outputs might change are fixed: they only change at the falling (active) edge of the clock. In our example, output 1 changes at the first falling clock edge (30ns), while output 2 changes at the second falling clock edge. Note that contrary to cycle-based simulation, discrete event simulation (figure ??) uses delays, where a delay represents a transition phase of the system, to deal with the system delays and inter-phases. In figure ??, whereas output 1 changes its value just one delay after the falling clock edge, output 2 changes its value two delays after the falling clock edge. Thereby, depending on the internal structure of the design, the design outputs may change values one or more delays after the active clock edge. This information can be used to get a more detailed view of the real design behavior. Moreover, if detailed delay information is extractable for a circuit, discrete event simulation can be used to estimate the detailed timing behavior of the design, including signal delays.
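The alter-clock pattern described above can be sketched in a few lines of C++ (a toy single-flip-flop design; all names and the stimulus are invented for illustration):

#include <array>
#include <cstddef>
#include <iostream>

// Toy synchronous design: one flip-flop fed by one external input.
struct Design {
    bool q = false;               // registered output
    void clock(bool d) { q = d; } // "clock": simulate one clock cycle
};

int main() {
    Design duv;
    std::array<bool, 4> stimulus = {true, false, true, true};
    for (std::size_t cycle = 0; cycle < stimulus.size(); ++cycle) {
        bool d = stimulus[cycle]; // "alter": set inputs from outside
        duv.clock(d);
        // Only the final per-cycle value is reported; intra-cycle
        // delays and inter-phase timing are ignored.
        std::cout << "cycle " << cycle << ": q = " << duv.q << "\n";
    }
    return 0;
}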

2.3.2 Cycle-based simulators

Commercial cycle-based simulators can handle circuits with multiple clocks and may also interface to event-based simulators. They only focus on the functionality of the design and can therefore highly optimize the calculations for that purpose. They also offer very fast compile times. The first cycle-based simulators were developed by IBM and Digital Equipment Corporation as internal tools for their validation process of large chips. From the system-verification level, cycle-based and event-driven simulators behave the same when it comes to the design functionality [?]. A cycle-based simulator can be used like a traditional event-driven simulator: it can be fed stimuli from a data file, and a program can be run on either an event-driven or a cycle-based simulator with little or no modification [?, ?].


2.3.3 Waveforms generated by cycle-based and event-driven simulation

Based on the principle of cycle-based simulation, waveforms are generated at each simulation cycle. They represent the responses of the system components during simulation. In cycle-based simulation, the waveforms have a reduced size, because the inputs of the system remain unchanged while their values are evaluated in the simulation cycle. In this way, we must only deal with the last inputs that produce changes on the outputs. Figure ?? shows typical waveforms as they can be observed in a real chip design, while the waveforms in figure ?? are generated by cycle-based simulation. Note that real simulation waveforms are more complex and usually contain more information. That is, whereas cycle-based simulation is very fast and can improve performance, it ignores inter-phase timing and delays inside digital designs, which are sometimes important during verification. Event-driven simulation does not ignore inter-phase timing and delays.

2.4 Value Change Dump file

As shown above, waveforms are strongly connected to verification. Waveforms are usually stored on disk during simulation. There are different waveform file formats, the most popular being the Value Change Dump format. Next, we describe the Value Change Dump format and briefly present its features.

2.4.1 Background

VCD (Value Change Dump) is a widely used file format developed by Cadence to store signal waveforms [?]. It is an ASCII file format that contains information about value changes on selected signals. A VCD file is divided into two parts:

A header: contains general information about the design. The header is itself divided into two parts:

1. General information, such as:
   date: the date when the file has been created.
   version: the version of the simulator used.
   timescale: the time scale value and unit used in the simulation run.


2. Scope modules: they identify the module hierarchy inside the design. Typically, the following items can be found in this section:
   scope module: represents a module (architecture and/or process) in the considered design.
   variable: identifies a signal of the current module. In order to save file space, each signal is assigned a unique identifier code consisting of 1 to 4 printable ASCII characters. This code is used in the following to identify the signal. There are four major kinds of variables: trireg, which identifies a bit signal; integer, which identifies an integer signal; reg, which identifies an array (register) signal; and real, which identifies a signal representing real numbers.

The header is ended with a special $enddefinitions token.

A transition part: this is the set of transitions that are generated during simulation, written as ASCII formatted text. The transition information is composed of so-called transition blocks. A transition block contains:
   simulation time: each transition block starts with a new time value, which is encoded by a hash symbol (#) followed by the actual time value. The simulation time recorded in the VCD file is the absolute simulation time of the signal transitions that are listed subsequently.
   signal value: within each transition block, each line represents a signal transition. First, the new signal value is specified, either by 0, 1, x or z for binary signals, or starting with the character b followed by an array of bit values for vector signals (each array element may again be 0, 1, x or z). The line ends with the signal identifier, which is separated by a space from the signal value in the case of vector signals. Value changes for real signals are specified by real numbers; value changes for all other signals are specified in ASCII format.

Signals may be either scalars or vectors, and each type is dumped in its own specific format. The output format for each value is right-justified, and vector values appear in the shortest possible form: redundant bit values that result from left-extending values to fill a particular vector size are eliminated.


Table 2.1: How the VCD format shortens values

value    value extended for 4-bit register    value printed in VCD file
0        0000                                 0
1        0001                                 1
10       0010                                 b10
x10      xx10                                 bx10
zx0      zzx0                                 bzx0
0x10     0x10                                 b0x10

Figure 2.8: Sample VCD File

For example, the binary value b000010 will be written in the VCD file as b10. Table ?? gives an overview of the different value forms found in a VCD file. The steps involved in creating a VCD file are: first, invoke the VCD system tasks in the considered simulator to define the dump file name and to specify the signals to be dumped; second, run the simulation to obtain the desired VCD file. As shown in Figure ??, the VCD format already incorporates some techniques to save space by encoding signal names and values. Nevertheless, the file size usually becomes huge for complex designs.
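The left-extension rule behind table ?? can be captured in a small helper. The following C++ sketch reproduces the table entries ('0' and '1' re-extend with 0, while x and z re-extend with themselves); it is an illustration of the rule only, not the actual dumper code:

#include <iostream>
#include <string>

// Shorten a 4-state vector value the way the VCD format does:
// characters that left-extension would restore are dropped.
std::string vcd_shorten(std::string v) {
    while (v.size() > 1) {
        char first = v[0], next = v[1];
        bool redundant =
            (first == '0' && (next == '0' || next == '1')) ||
            ((first == 'x' || first == 'z') && next == first);
        if (!redundant)
            break;
        v.erase(0, 1);
    }
    return v;
}

int main() {
    // Reproduces table 2.1: 0000 -> 0, 0001 -> 1, 0010 -> 10,
    // xx10 -> x10, zzx0 -> zx0, 0x10 -> 0x10 (unchanged).
    for (std::string s : {"0000", "0001", "0010", "xx10", "zzx0", "0x10"})
        std::cout << s << " -> " << vcd_shorten(s) << "\n";
    return 0;
}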

2.4.2 VCD file format example

Figure ?? illustrates the format of a VCD file. The first part is the VCD header. In this part, first the date of the corresponding simulation, including the time, is written (e.g. Sep 26 2000 16:28:52); the version of the simulator (e.g. FREEHDL 0.1) and the time scale (e.g. 1 microsecond) are then printed. These items (date, version and time scale) are independent of the design. The following items, such as scope module and var, describe the different modules, signals and variables involved in the corresponding design. First, the module name is written (e.g. struct), then the signals and the variables belonging to this module are written. To end the definitions inside a module, the token $upscope is used. Further, additional modules may be recursively embedded in the current module by appropriate $scope and $upscope sections. That is, the module hierarchy is defined using the tokens $scope and $upscope.


Each signal within a module is identified by its type, such as reg for an array and trireg for a scalar; its length, such as 8 and 1 in our example; its identifier (e.g. !, $ and ?); its user-friendly name (e.g. qsig2, qsig and clk); and its range if it is a vector (e.g. [8:1]). The range is described by a left and a right bound. The header ends with the keyword $enddefinitions. The second part of the VCD file is the transition part. The transition part consists of a sequence of transition blocks. Each transition block begins with the relevant simulation time, followed by the corresponding signal values and identifiers. As already described, vector signal values start with the letter b. For each transition, the signal value is written followed by the signal identifier. Thus, in figure ??, the dump process starts at time 0. At this time, the values of the signals are listed. At simulation time 200, a new transition block starts; it is composed of the signal !, whose value is b111010, and the signal ?, whose value is 0. The transition block contains only these two signals (! and ?), because only these signals have changed their values. At simulation time 1500, a don command is run; the dump process is then resumed and a new transition block is created. At simulation time 50ms, a new transition block containing the signals that have changed values is then illustrated. In general, tokens like #0 and #1000 mark the simulation times 0 and 1000 in units of the declared time scale; here, the binary signal ? and the vector signal ! have a transition at time #1000.
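Since figure ?? is not reproduced here, the following hand-written snippet, assembled only from the details mentioned in the text (date, version, time scale, a module struct, and the signals qsig2, qsig and clk with the identifiers !, $ and ?), gives an impression of the layout; the values dumped at #0 are invented:

$date
   Sep 26 2000 16:28:52
$end
$version
   FREEHDL 0.1
$end
$timescale
   1 us
$end
$scope module struct $end
$var reg 8 ! qsig2 [8:1] $end
$var reg 8 $ qsig [8:1] $end
$var trireg 1 ? clk $end
$upscope $end
$enddefinitions $end
#0
b0 !
b0 $
1?
#200
b111010 !
0?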

2.5 Conclusion

In this chapter, we briefly described different digital simulation concepts, namely discrete event simulation and cycle-based simulation, and how they generate waveforms. Further, the Value Change Dump file (VCD file for short) format, which is used to hold signal waveforms, has been introduced. Waveforms are very important for tracing the behavior of the design during the verification process; based on the waveforms, the erroneous part of the circuit can be easily identified and corrected. The size of VCD files can become very large, although signals are stored in a compact format. This often limits the usefulness of waveform files in the verification domain. As the VCD format is the most accepted format for storing waveforms in industry, a compression algorithm for waveforms can increase the usefulness of waveforms in improving the verification process. This is the goal of our work. Hence, our compression techniques will be applied to waveforms stored in the VCD file format.

Chapter 3

Description of the Involved Simulator


In this chapter, an overview of the involved simulator is given. Figure ?? illustrates our simulation structure. This structure is divided into four basic elements:

Simulation element: simulates a given VHDL (VHSIC Hardware Description Language) model.

Control Data Flow Graph element: generates the Control Data Flow Graph of the corresponding model using files that have been created during the previous simulation phase.

Waveform Compression element: this element serves to compress the waveform file created during the simulation phase. To improve the compression results, the generated Control Data Flow Graph is used in this phase.

Waveform Decompression element: to view the waveforms obtained during the simulation phase, a decompression mechanism is needed. This element serves to decompress the compressed waveform file. The obtained file can then be viewed.

Figure 3.1: Developed Simulator


3.1 VHDL model simulation

In this section, we present the developed simulator and describe its components in detail. As shown in figure ??, the simulation is done in different tasks. Five tasks are distinguished:

First task: the design is written in the VHDL language.
Second task: the VCD file is created during the simulation task.
Third task: the Control Data Flow Graph is created.
Fourth task: the signal selection, which identifies the signals that must be stored, is invoked.
Fifth task: the compression process is carried out.

First, a VHDL model is written; it is then compiled in order to obtain the corresponding C++ source code. During this task, an executable file with the name of the VHDL model is created. To run the simulation, this executable file is called. Figure ?? illustrates the simulation main menu. This menu offers mainly four tasks, which are described in the following sections. As illustrated in figure ??, the simulator is composed of basic components that are detailed in the following sections.

3.1.1 Design Simulator

The design simulator was created for the purpose of providing a new compiler, which translates VHDL source code into C++ source code. It then simulates the obtained C++ code in order to speed up the simulation of hardware designs.

3.1.2 Viewing simulation

To accomplish the simulation task, several software tools have been developed. Figure ?? represents the simulation main menu. To view simulation results, the run command (r), the execute-cycle command (c), the next command (n) and the show command (s) are used. This simulation task evaluates VHDL signal and variable values and displays them on the screen for the desired simulation time.


Available commands:
h : prints list of available commands
c : execute cycles = execute simulation cycles
n : next = execute next simulation cycle
q : quit = quit simulation
r : run = execute simulation for
d : dump = dump signals
doff : dump off = stop dumping signals
don : dump on = continue dumping signals
s : show = show signal values
dv : dump var = dump a signal from the signal lists
ds : dump show = shows the list of dumped signals
nds : number show = shows the number of dumped signals
wdd : write binary design info
wddl : write design info using a CDFG style syntax
dc [-f ] [-t ] [-cfg ] [-q] : control waveform dumping

Figure 3.2: Simulator main menu

3.1.3 Creating DDB (Design Data Base) and CDFG (Control Data Flow Graph) files

The CDFG (Control Data Flow Graph) file is generated during the compilation phase of the VHDL model using the -D argument. The CDFG file is a Lisp-format file that describes the design architecture. It contains functions and procedures that represent each statement inside the VHDL model. The DDB (Design Data Base) file is created during the simulation task using the wddl command. A DDB file is a Lisp-format file that serves as a database of the considered design. More details about these files are given in the next chapter.

3.1.4 Creating the VCD (Value Change Dump) file

This simulation task creates the VCD file for further use. A detailed description of this simulation task and the VCD file is given in section ??.


3.2 CDFG Generator-Simulator

The CDFG Generator-Simulator is used to create the Control Data Flow Graph of the considered model based on the DDB and CDFG files. It allows viewing the CDFG in its graphic form and creates the .cdfg.dat file, a Lisp-format file that contains the node and edge names. Section ?? illustrates the created CDFG.

3.2.1 Signal Selections Simulator

This simulator is composed of a CDFG scanner and parser. It stores the CDFG in hash tables for further use. Moreover, it defines a kernel that uses the stored data to identify the signals that must be stored and those that can be restored. This kernel uses a genetic algorithm to accomplish its task as well as possible, as sketched below. More details about the principle of this kernel are given in section ??.
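The details of this kernel are left to section ??. Purely to illustrate the idea, the following C++ sketch evolves a stored/restored selection over a toy dependency table; the dependency data, the fitness function and all parameters are invented and stand in for the actual CDFG-based analysis:

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>

// Toy dependency table: signal i can be restored from the signals
// listed in deps[i] (a stand-in for the CDFG-based check).
const std::vector<std::vector<int>> deps = {{}, {}, {0, 1}, {2}, {0, 3}};

// A chromosome marks each signal as stored (1) or restored (0).
using Chromosome = std::vector<int>;

bool restorable(const Chromosome& c, int s) {
    if (c[s]) return true;
    if (deps[s].empty()) return false;  // primary signal: must be stored
    for (int d : deps[s])
        if (!restorable(c, d)) return false;
    return true;
}

// Fitness: the fewer stored signals the better; selections that lose
// a signal entirely are penalized heavily.
int fitness(const Chromosome& c) {
    int stored = 0;
    for (std::size_t s = 0; s < c.size(); ++s) {
        stored += c[s];
        if (!restorable(c, static_cast<int>(s))) return 1000;
    }
    return stored;
}

int main() {
    std::srand(42);
    std::vector<Chromosome> pop(8, Chromosome(deps.size(), 1));
    for (auto& c : pop)
        for (auto& g : c) g = std::rand() % 2;
    for (int gen = 0; gen < 50; ++gen) {
        std::sort(pop.begin(), pop.end(),
                  [](const Chromosome& a, const Chromosome& b) {
                      return fitness(a) < fitness(b);
                  });
        // Replace the worse half by mutated copies of the better half.
        for (std::size_t i = pop.size() / 2; i < pop.size(); ++i) {
            pop[i] = pop[i - pop.size() / 2];
            pop[i][std::rand() % pop[i].size()] ^= 1;
        }
    }
    std::cout << "signals to store: " << fitness(pop[0]) << "\n";
    return 0;
}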

3.2.2 Waveform Compression Simulator

This simulator creates the compressed waveform file based on the information returned by the signal selections simulator and the data stored in the VCD file. Several compression techniques are used. These techniques are described in detail in the next chapter.

3.3 Waveform Decompression Simulator

This simulator creates the decompressed waveform file based on the data returned by the signal selections simulator and the data stored in the compressed waveform file. Several decompression techniques are used; they are described in detail in the next chapter.

3.3.1 Waveform viewer

To view the obtained waveforms, a commercial software tool is used. It allows viewing all signals for each simulation time. In the following sections, we describe the Control Data Flow Graph of each statement in a VHDL design. In the synthesis step, nodes in the control data flow graph are scheduled into control steps and mapped to functional resources in the data path, while edges in the control data flow graph are mapped to storage resources such as registers and interconnection resources (multiplexors, busses, wires, ...).


During the translation process, a data edge is represented by the relevant storage resource name defined in the design, and a node is represented by a natural number that refers to a string describing the relevant functional resource. Scheduling tools typically work on one or more intermediate representations of the behavioral description, such as a data flow graph, a control flow graph or a control data flow graph. In our work, we use the control data flow graph as the intermediate representation of the behavioral description of a given design. Indeed, there are two major steps in synthesizing a Register Transfer Level (RTL) implementation from a textual algorithm representation. The first, called the translation step, translates the original textual description into an internal control data flow graph representation. The second step, called the synthesis step, is performed by the high-level synthesis tools, which accept the control data flow graph representation and produce an RTL description [?, ?, ?]. Our work concentrates only on the first step. Architectural synthesis, also called high-level synthesis, is a process that adds structural information to a functional description at the same abstraction level. This results in a data path and a controller description. The data path consists of building blocks such as functional units and memories and the interconnection structure among them. The controller describes how the flow of data inside the data path is managed and is expressed in terms of state transitions. The controller description is translated into an implementation at the gate abstraction level using logic synthesis. Thereby, after selection, allocation and binding of functional units, memories and interconnect, a control data flow graph is generated to model the high-level synthesis scheduling.


Chapter 4

Control Data Flow Graph


The evolution of high-level design techniques has often been driven by specific application domains, such as the data flow or arithmetic domain, including digital signal and image processing, graphics and several multimedia applications, and the control flow or decision application domain, including networking protocols and embedded controllers [?, ?, ?]. The behavioral descriptions of data flow designs involve arithmetic operations such as addition, subtraction and multiplication, while those of control flow designs encompass nested conditional constructs, data-dependent loops, comparisons and logical operations with very few arithmetic operations, as shown in [?], in which the data dependencies and the logical and arithmetic operations are transformed into nodes linked to each other. The area, delay and power of structural high-level synthesis (Register Transfer Level (RTL)) implementations of intensive data flow designs are dominated by arithmetic units and registers in the data path, whereas for intensive control flow designs, they are dominated by non-arithmetic units like multiplexors, bit-manipulation units and comparators. In practice, a large number of designs tend to contain significant amounts of control flow as well as data flow. For such designs, the control flow constructs often impose bottlenecks on the performance achievable by hardware and software implementations alike [?, ?, ?, ?, ?]. This chapter describes the Control Data Flow Graphs used in the behavioral description of digital designs, and their usefulness in improving waveform compression techniques. As defined in [?, ?], a Control Data Flow Graph, CDFG = (V, E), is a directed graph composed of nodes, V, representing operations (arithmetic and logic operations) in the textual algorithm description, and of edges (arcs), E, representing the dependencies between operations and describing the transfer of values [?, ?, ?].


The flow of data between operations is represented by data edges and the control flow by control edges [?, ?, ?, ?]. Moreover, the execution of a control data flow graph follows the concept of a token-flow mechanism, in which a data value instance is defined to be a token. This data value instance can be either a single scalar or a complex data type such as an array, a record or a user-defined data type. The processing of an operation is symbolized by removing tokens from the input edges of that operation, and producing new tokens containing the result of the operation computation on all output edges. The semantic behavior of a node is defined by its operation type. To support special constructs such as loops and conditionals, nodes with a particular execution mechanism have been defined. Similar to the front-end of a high-level language compiler, the program that performs the translation from text to control data flow graph maintains a symbol table, which links lexical tokens from the text to nodes and edges in the control data flow graph. The links that are maintained are derived from the semantics of the algorithm model and appear in a predictable, consistent way. Due to the long-term research aspect of the related work and the different requirements of individual tools (software), flexibility and extensibility were major design goals for the file format of the developed control data flow graph design. A textual format was strongly preferred over a binary format for several reasons, although human readability was hardly the deciding issue for such a deployed format. The control data flow graph format is a relatively simple interface, both in syntax and in semantics. The employed graph models distinguish between data path and control flow, and allow cycles to model loops in the design behavior, providing maximal freedom for different implementation styles. The uniform and combined treatment of data and control has resulted in both a concise semantical definition and an extreme flexibility for architectural synthesis. On the other hand, control data flow graphs provide a maximally parallel description of high-level synthesis algorithms, for which many design alternatives can be generated. The generation of such graphs from a procedural sequential programming language such as VHDL requires a full data flow analysis, involving a detailed lifetime and scope investigation through conditional statements, loops, and subprogram interfaces. In practice, this means that descriptions of relatively complex digital designs (chips) can be converted to a control data flow graph representation in a few seconds of CPU time on a standard workstation.


4.1 Control Data Flow Graph concept

The concept of our Control Data Flow Graph consists of defining nodes based on the operation types and edges based on the values. Each process Control Data Flow Graph starts with a node called the Read Signal node, which serves as a data source: this node stores the values of the signals and variables that will be used by the following statements. Each signal value comes from this node for all following nodes; for its first use, the value of the corresponding variable also comes from this node. Variables change their values just after the corresponding assignment, independently of the clock cycle; i.e., they do not wait for a clock cycle until their values are assigned. The process Control Data Flow Graph is completed with a node called the End process node. This node is used to store the values of the written variables and signals and to transmit them to the Start process node. For subprograms, the Read Signal node and the Write Signal node are respectively replaced by a Start Procedure or Start Function node and an End Procedure or End Function node. In this way, the dependence between processes and subprograms can be shown.
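A minimal C++ sketch of the graph structure implied by this concept (the type names are illustrative, not the actual file format or implementation):

#include <string>
#include <vector>

// Nodes carry an operation type; edges carry a value (token)
// between nodes, labeled with a signal or variable name.
struct Node {
    enum Kind { ReadSignal, EndProcess, SignalAssign, Op } kind;
    std::string op;  // operation description, e.g. "+" or "and"
};

struct Edge {
    std::string name;  // signal or variable carried by the edge
    int from, to;      // indices of source and destination nodes
};

struct CDFG {
    std::vector<Node> nodes;
    std::vector<Edge> edges;
    int add_node(const Node& n) {
        nodes.push_back(n);
        return static_cast<int>(nodes.size()) - 1;
    }
    void add_edge(const Edge& e) { edges.push_back(e); }
};

int main() {
    // CDFG of a single process containing "data <= y;".
    CDFG g;
    int read = g.add_node({Node::ReadSignal, ""});
    int asg  = g.add_node({Node::SignalAssign, "<="});
    int end  = g.add_node({Node::EndProcess, ""});
    g.add_edge({"y", read, asg});    // value to be assigned
    g.add_edge({"data", asg, end});  // target signal value
    return 0;
}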

4.2 Wait statements

In [?], wait statements are handled together with signal assignment statements. In fact, waiting for an event is solved by associating a condition CS to a transition in the waiting process. This condition is produced as a result of an assignment to the respective signal when the value of the signal changes. This method uses the principles of Petri Nets to simulate the token mechanism of the developed control data flow graph. Another important approach to modeling wait statements is defined in [?]. This method employs a Finite State Machine mechanism to model the wait statements; unfortunately, it ignores the conditions involved in wait statements and considers them as black boxes. To model the entire conditions, we introduce new methods that allow identifying the operations included in the wait conditions. The wait statement is characterized by a special mechanism: an event node is created to check whether an event on the corresponding signal has occurred. This node has two incoming edges: a data edge, which is the signal clk, and a control edge, which is the activate edge. The latter is used to inform the event node that the assignment operation is already done. Indeed, the event node waits for the value of the signal clk; if this value changes, then a boolean true is transmitted to the Read Signal node to restart the simulation.


architecture arch of model is
  signal data : bit_vector (0 to 15);
begin
  process
    variable x : bit_vector(0 to 15);
    variable y : bit_vector(0 to 15);
  begin
    data <= y;
    wait until (data(0) and y(1)) = '1';
  end process;
end architecture;

Figure 4.1: VHDL model 11

Figure 4.2: CDFG of VHDL model 11

This method simulates the wait statement in a simple manner. Due to the diversity of wait statements, we deal with each kind of wait statement differently. Three wait statements are distinguished:

wait on statement: for this wait statement type, an event node is created. It has just one incoming edge containing the signal name and an outgoing control edge. When the corresponding signal value changes, a control boolean true is transmitted to the next node; otherwise the execution is suspended. This mechanism describes the principle of the wait on statement. A sensitivity list is converted to a wait on statement covering each signal present in the list.

wait until statement: this wait statement is composed of a logic expression and a wait on statement. First, the logic expression is evaluated; then, the evaluation result activates the event node. The example in figure ?? illustrates a simple wait until statement, and the CDFG of this model is shown in figure ??. Our wait statement handling differs from the approaches described in [?, ?], which consider a macro node for each wait statement and ignore the waiting effects on the following statements.

wait for statement: this is waiting for a period of simulation time. A local parameter containing the actual simulation time is defined.

architecture arch of model is
  signal data : bit_vector (0 to 15);
begin
  process
    variable y : bit_vector(0 to 15);
  begin
    data <= y;
    wait for 10 ns;
  end process;
end architecture;

Figure 4.3: VHDL model 12

Figure 4.4: CDFG of VHDL model 12

After adding the running time to the actual simulation time using an add node, the control data flow graph compares the obtained time with the current simulation time. This is done using an equal node, which activates the following nodes. Figure ?? describes an example of a wait for statement; the CDFG of this model is shown in figure ??.
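The add/equal pair can be pictured with a small C++ sketch (illustrative only; the times are arbitrary):

#include <cstdint>

// "wait for" mechanism: an add node computes the wake-up time,
// an equal node fires once simulation time reaches it.
struct WaitFor {
    uint64_t wake = 0;
    bool armed = false;
    void arm(uint64_t now, uint64_t delay) {  // add node
        wake = now + delay;
        armed = true;
    }
    bool fires(uint64_t now) {                // equal node
        if (armed && now == wake) {
            armed = false;
            return true;
        }
        return false;
    }
};

int main() {
    WaitFor w;
    w.arm(5, 10);  // "wait for 10 ns" issued at time 5
    for (uint64_t t = 5; t <= 20; ++t)
        if (w.fires(t)) return 0;  // process resumes at t == 15
    return 1;
}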

4.3 Signal and variable assignments

Two kinds of signal assignment are distinguished: simple assignments involving the entire signal, and complex assignments targeting signal array element(s), a signal array slice or a signal record element.

CDFG of a simple signal assignment: a simple signal assignment is described by a node whose input (called the incoming edge in the following) contains the value to be assigned to the target signal. The output value of this node (called the outgoing edge in the following) represents the signal value. Figure ?? illustrates the CDFG of the VHDL model defined in figure ??. The CDFG of this model starts with a Read Signal node representing the output source of the signals and variables inside the model, and is completed with a Write Signal node depicting the input source of the variables and signals that have been written in this model. A so-called Signal Assignment node describing the mechanism of the relevant assignment is created. This node has as its incoming edge the value to be assigned to the corresponding signal and as its outgoing edge the signal name.

process
begin
  a <= '1';
  wait on clk;
end process;

Figure 4.5: VHDL model 1

Figure 4.6: CDFG of VHDL model 1

CDFG of a complex signal assignment (array-element, array-slice or record-element assignments): for this kind of signal assignment, we have defined a specific node that has two incoming edges: one contains the assignment value and the other contains the address of the target element. This address can be either the index of an array element, an array slice or a record element. In [?], array operations are identified by providing a node type (array functions such as array declarations). The declaration contains statements giving the dimensionality and the size of the array; optionally, initial values are provided. These nodes have outgoing chain edges providing linkage to the other related node types: retrieve and update. This method is very complex and makes the CDFG files very large. Figure ?? deals with a signal assignment to a record element of type bit_vector. In the example of figure ??, we first have to identify the address of the assignment, corresponding to the data element of the signal sentword, and then to assign the value '1' to the 15th element of this signal. The CDFG of this assignment is shown in figure ??. To identify the signal sentword, we use a specific node called Get Record Element whose inputs are the address of the record element and the signal sentword. The 15th element of this signal is identified using a node called Get Array Element whose input is the address of the corresponding signal element. The assignment is done using a node called Signal Element Assignment whose inputs are the value '1' to be assigned and the corresponding signal. Figure ?? illustrates the CDFG of the VHDL model depicted in figure ??.


architecture arch of model is
  type word is record
    instruction : bit;
    data : bit_vector(0 to 15);
    address : bit_vector(0 to 7);
  end record;
  signal sentword : word;
  signal reset : bit := '0';
begin
  process
  begin
    sentword.data(15) <= '1';
    wait until reset = '1';
  end process;
end architecture;

Figure 4.7: VHDL model 2

Figure 4.8: CDFG of VHDL model 2

architecture arch of model is
  signal a : integer := 0;
  signal b : integer := 0;
begin
  process
    variable x : integer := 0;
    variable y : integer := 1;
  begin
    x := y + x;
    b <= a / x;
    y := b - x;
    a <= b * y;
    wait for 10 ns;
  end process;
end architecture;

Figure 4.9: VHDL model 3

Figure 4.10: CDFG of VHDL model 3

Here, a wait until statement is converted to a wait on until statement. Thereby, we first check whether the relevant signal changes its value; then, we compare this value with the corresponding expression. Thus, to evaluate the equal node, two control inputs are required: the first indicates that the considered value has already changed, and the second indicates that the last operation before this wait statement is already done. In this kind of signal assignment, we first identify the address of the considered signal element and then assign the corresponding value in a node called Signal Element Assignment.

The difference between signal assignment and variable assignment is characterized by the output nodes. That is, signal values are always read from the Read Signal node, while variable values are read from their last output nodes. The example of figure ?? illustrates this difference; the CDFG of the model defined in figure ?? is shown in figure ??.
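The semantic difference that this distinction encodes can be illustrated with a small sketch (plain C++ standing in for the simulator's update mechanism; the values are arbitrary):

#include <iostream>

// A variable takes its new value immediately; a signal update only
// becomes visible after the process suspends at a wait statement.
int main() {
    int y = 1;               // variable: read from its last assignment
    int b = 0;               // signal: currently visible value
    int b_next = b;          // signal: scheduled (pending) value

    y = y + 4;               // variable changes its value at once
    b_next = y * 2;          // signal change is only scheduled

    std::cout << "before wait: y=" << y << " b=" << b << "\n";
    b = b_next;              // process suspends: signal is updated
    std::cout << "after wait:  y=" << y << " b=" << b << "\n";
    return 0;
}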

architecture arch of model is
  signal a : integer := 0;
  signal b : integer;
  signal clk : bit := '1';
begin
  process
  begin
    if a < 1 then
      b'high <= a'right;
    else
      b'left <= a'low + 1;
    end if;
    wait on clk;
  end process;
end architecture;

Figure 4.11: VHDL model 4


Figure 4.12: CDFG of VHDL model 4

4.4 Attribute statements

Four VHDL attribute kinds are supported: 'high, 'low, 'left and 'right. For each of these attributes, a corresponding node is created. This node is called range-index-high for the attribute 'high, range-index-low for 'low, range-index-right for 'right and range-index-left for 'left. Figure ?? depicts an example of the attribute statements; the CDFG of the model defined in figure ?? is shown in figure ??. All other attributes are similar to those defined above; there is just one difference in the number of incoming data edges. For the attributes that are applicable to discrete and physical types, which are T'pos(x), T'val(n), T'succ(x), T'pred(x), T'leftof(x), T'rightof(x), T'image(x) and T'value(n), the number of incoming data edges is two: the first contains the data identifying T and the second contains the data identifying either x or n, depending on the attribute kind. However, for the other attribute kinds, we consider nodes that have just one incoming data edge and only one boolean control outgoing edge. These attributes are:

T'ascending: this function returns true if T is an ascending range and false otherwise. The corresponding node, called range-ascending, returns a boolean outgoing edge. It has an incoming data edge represented by the range name and a boolean true as outgoing edge.

S'event: this function is defined for all signals and returns true if there is an event on the signal S and false otherwise. The corresponding node, called signal-event, has a control boolean true as outgoing edge and an incoming data edge identified by the signal name.

S'transaction: this function represents a signal of type bit that changes its value from 0 to 1 and vice versa each time a transaction appears on the signal S. This function is represented by a node called signal-transaction, which has an incoming data edge and a boolean outgoing edge; the latter can be either 0 or 1, depending on the last value taken. To model this process, a specific node that groups these bits and returns a control edge is defined.

Other signal attributes: the attributes S'delayed(T), S'stable(T), S'quiet(T), S'active, S'last_event, S'last_active and S'last_value are not considered in our CDFG simulation; their implementation remains a feature for a future CDFG extension. Notice that these attributes are not defined in the synthesis process, so they are not implemented in this work.

4.5 If and case statements

In [?], the author defined two types of nodes to describe if and case statements. These nodes are macro-nodes, called Branch and Merge nodes. A Branch node passes the token from the incoming data edge to one output port, which is identified by the value of the token on the control input. This node can execute when both inputs hold a token, and as a result one token appears on exactly one output port. The Merge node is dual to the Branch node: it passes a token from just one incoming data edge, selected by the value of the token on the control edge, to the single output port. In [?] and [?], a different method is used. This method employs a Finite State Machine to model conditional statements [?, ?]; i.e., each conditional statement is treated as a Finite State Machine whose state is used to schedule the appropriate nodes. The methods presented in [?], [?] and [?] work well to identify the conditional statements inside VHDL models, but their implementations are very complex. Therefore, to achieve functional extensibility and flexibility, we developed our own simple method to model conditional statements. Our concept consists in associating a select node with each decision operation.

architecture arch of model is
  signal data : bit_vector (0 to 7);
begin
  process
    variable x : bit := '0';
    variable y : bit_vector(0 to 7) := "01011101";
  begin
    if x = '1' then
      data <= y;
    else
      data <= not(y);
    end if;
    wait on clk;
  end process;
end architecture;

Figure 4.13: VHDL model 5


Figure 4.14: CDFG of VHDL model 5

The evaluation of this decision operation activates the select node, which has two control outgoing edges: a true edge and a false edge. In this way, each following node inside the conditional statement has one control incoming edge, either true or false, coming from the output of the preceding select node. To complete the conditional statement, an end conditional statement node is created. This node serves as temporary storage for the edges whose values changed inside the conditional statement. The if statement is described with at least two nodes: the first one evaluates a logical expression and the second one is a select node. A select node always has just one incoming edge and two outgoing control edges: true and false. Figure ?? illustrates an example of an if statement where the logical expression is a simple logic operation, and figure ?? illustrates the CDFG of the model described in figure ??. Note that the variable y comes from its last output node, which is a variable assignment node. Thus the CDFG compiler checks whether the value of y(0) is equal to '1'. The result of this check activates the select node, which returns two outgoing control edges: false, indicating that the test failed, and true, indicating that the test succeeded. For each signal we define a range, which is a three-element list composed


architecture arch of model is
  signal a : bit;
  signal b : integer;
  signal reset : bit;
begin
  process
    variable y : bit_vector(0 to 7);
  begin
    y(0) := y(1) and y(7);
    if y(0) = '1' then
      a <= '0';
    else
      b <= 10;
    end if;
    wait on reset;
  end process;
end architecture;

Figure 4.15: VHDL model 6

Figure 4.16: CDFG of VHDL model 6

of left bound, direction (to or downto) and right bound. A corresponding edge is assumed to contain this list. For each if statement, a switch node is created to group together the signals and/or variables that have been written inside this if statement. The outgoing edge of this node represents the corresponding signal or variable name. This approach allows us to simulate our CDFG using token simulation techniques. On the other hand, when a signal or variable has been written in only one clause of an if statement, a so-called gate node is created to group this signal or variable and its synonym coming from the Read Signal node. Figure ??, which describes the CDFG of the model in figure ??, illustrates an example of this situation. When a signal is written only in the then clause or only in the else clause, the CDFG compiler creates a gate node, which is similar to a switch whose incoming data equals its outgoing data. Case statements are converted to a sequence of if statements. Figure ?? represents a simple case statement, and the CDFG illustrated in figure ?? is the CDFG of the model defined in figure ??. An or-equal node is defined when a when clause has more than one choice. Thereby, for each when clause, gate nodes are created for the signals and variables that are not written

architecture arch of model is
  signal data : bit_vector (0 to 15);
  signal a : integer := 0;
begin
  process
    variable x : bit_vector(0 to 15);
    variable y : bit_vector(0 to 15);
  begin
    case a is
      when 0 => data <= y;
      when 5 | 4 | 6 => data <= y and x;
      when others => data <= "1111111111111111";
    end case;
    wait on clk;
  end process;
end architecture;

Figure 4.17: VHDL model 7


Figure 4.18: CDFG of VHDL model 7

within.
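To make the token mechanism concrete, the sketch below shows one possible Python representation of select and switch nodes in a token simulator. This is our own minimal illustration: the names Token, SelectNode and SwitchNode are hypothetical and are not taken from the CDFG compiler described here.

class Token:
    def __init__(self, value):
        self.value = value

class SelectNode:
    """Evaluates a decision token and fires exactly one control edge."""
    def __init__(self):
        self.true_edge = []    # tokens emitted when the condition holds
        self.false_edge = []   # tokens emitted otherwise

    def fire(self, condition):
        # A node executes only when a token is present on every input.
        target = self.true_edge if condition.value else self.false_edge
        target.append(Token(condition.value))

class SwitchNode:
    """Groups the values written to the same signal in both clauses."""
    def fire(self, then_token, else_token, control):
        # Exactly one data input carries the live value, selected by the
        # control token produced by the select node of the if statement.
        return then_token if control.value else else_token

# Example: if y(0) = '1' then a <= '0'; else b <= 10; end if;
sel = SelectNode()
sel.fire(Token(True))                            # decision y(0) = '1' held
print(len(sel.true_edge), len(sel.false_edge))   # -> 1 0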

4.6 Loop statements

In this section we deal with the different kinds of loop statements. In contrast to the loop structure defined in [?], where two basic nodes called exit and entry nodes are created, our approach treats each kind of loop statement separately. In [?], the exit and entry nodes are functionally identical to the Branch and Merge nodes defined in the previous section (section ??) for conditional statements such as if and case statements. Indeed, the loop construct introduces cycles in the CDFG. To allow a proper execution of these loops with the token passing method, a special initialization is required: when the execution of a graph is started, all entry nodes must obtain a token at their control input, selecting the input port for external data to enter the loop. The exit nodes, however, do not obtain such an initialization token. On the other hand, when the graph is repeatedly executed for different sets of input tokens, the loop constructs must not be reinitialized for each input set; such tokens are automatically



left after each loop termination. This method therefore burdens the token simulation mechanism with a complex task. To reduce the complexity, we developed another method, which associates a special graph structure with each loop kind. Moreover, this method allows a simple token simulation: after the evaluation of the decision operation, a control edge activates the select node. This node decides whether the true or the false control edge is to be used. If the end of the loop is not reached, the token activates the restart-loop node, which restarts the loop statement at the beginning; otherwise, the token activates the exit-loop node, which returns the last values of the signals and variables that changed values inside the considered loop statement. Hence, the token simulation introduced by this method is very suitable for resolving all conflicts between nodes. The operations inside the loop statement are scheduled using a pipeline architecture. We classify the loop statements into three types: simple loop (the VHDL loop statement), for loop statement and while loop statement. For all kinds of loop statements, we first create three basic nodes:

start-loop node: depicts the beginning of a loop statement. This node has an incoming data edge containing the loop statement name.

restart-loop node: takes all parameters that have been written inside the loop statement and returns them to their corresponding nodes, in which the following operations will be scheduled. At the end of a loop statement, this node passes these parameters to the exit-loop node.

exit-loop node: assumed to be a temporary memory that stores the values of all parameters that have been written inside a loop statement.

In the following sections we will describe each loop statement kind in detail.

4.6.1 For loop statements

The CDFG compiler defines for this loop type a list of four parameters: identifier name, left bound, direction and right bound. First, the identifier is incremented or decremented (depending on the direction type, to or downto) using a dedicated node called add or subtract. Then the new values of the signals and variables that are assigned inside this loop statement

architecture arch of model is
  signal data : bit_vector (0 to 15);
  signal clk : bit;
begin
  process
  begin
    for i in 0 to 15 loop
      data(i) <= not(data(i));
    end loop;
    wait on clk;
  end process;
end architecture;

Figure 4.19: VHDL model 8


Figure 4.20: CDFG of VHDL model 8

will be stored in the so-called restart-loop node. The increment/decrement node has two incoming edges, the identifier and the constant 1, and one outgoing edge, the identifier. Figure ?? illustrates a for loop statement example, and the CDFG of this model is shown in figure ??.
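As an illustration of the token flow just described, the following Python sketch (ours, with hypothetical names) mimics how a simulator might drive the for-loop subgraph of model 8: an add node increments the identifier, a select node tests the bound, and the restart-loop/exit-loop nodes route the tokens.

def simulate_for_loop(left, right, body):
    """start-loop -> body -> add -> select -> restart-loop / exit-loop."""
    i = left                        # start-loop node injects the identifier
    written = {}                    # exit-loop node: temporary memory
    while True:
        body(i, written)            # operations scheduled inside the loop
        i = i + 1                   # add node: identifier and constant 1
        if i > right:               # select node evaluates the bound
            return written          # exit-loop node returns the last values
        # otherwise the restart-loop node feeds the tokens back to the body

data = ['0'] * 16
def invert_bit(i, written):
    data[i] = '1' if data[i] == '0' else '0'    # data(i) <= not(data(i))
    written['data'] = ''.join(data)

print(simulate_for_loop(0, 15, invert_bit))     # {'data': '1111111111111111'}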

4.6.2 While loop statements

For this loop type, we first check a decision expression in order to decide whether to execute the loop statement or to exit it. A select node is used to evaluate this test. Figure ?? illustrates an example of this loop type, and the CDFG of this model is shown in figure ??. Note that there is no big difference between the CDFG structure of a for loop statement and that of a while loop statement.

4.6.3 Simple loop statements

Sometimes exit and/or continue conditions are necessary when one or more simple loop statements are involved. Figure ?? illustrates an example of a simple loop statement. The CDFG of this model is shown in figure ??.



architecture arch of model is
  signal data : bit_vector (0 to 15);
  signal clk : bit;
begin
  process
    variable i : integer;
  begin
    i := 0;
    while i < 15 loop
      data(i) <= not(data(i));
      i := i + 1;
    end loop;
    wait on clk;
  end process;
end architecture;

Figure 4.21: VHDL model 9

Figure 4.22: CDFG of VHDL model 9

architecture arch of model is
  signal clk : bit;
begin
  process
    variable ok : integer := 0;
  begin
    L1: loop
      ok := ok + 1;
      exit L1 when ok > 8;
    end loop L1;
    wait on clk;
  end process;
end architecture;

Figure 4.23: VHDL model 10

Figure 4.24: CDFG of VHDL model 10


4.6.4 Next and Exit statements

The next and exit statements are used respectively to continue and to exit loop statements. Therefore, they are treated as if statements with defined conditions. Indeed, when these conditions are verified, a restart-loop (next statement) or an exit-loop (exit statement) node is involved. When there is no condition, the logic true bit is used instead. Figure ?? illustrates an example of exit and next statements.

4.7 Subprogram call statements

The CDFGs for subprograms are created independently from the rest of the model. In a process CDFG, a specific node called Subprogram-Call is generated. This node has the subprogram name as outgoing data edge; i.e., only the subprogram name appears in the corresponding process CDFG. This method keeps the subprogram CDFGs small and makes them independent of the number of their calls: the CDFG of a subprogram is created just once for each model and does not belong to any process CDFG. Our approach to modeling subprograms is similar to the approach defined by Kuchcinski and Minea [?]. However, Kuchcinski and Minea do not define any subprogram call nodes [?]. Instead, the call of subprograms is done virtually through related control edges; i.e., each subprogram is attached to a process by a defined control edge inside the corresponding process, and when the token arrives on this control edge, the related subprogram is executed. To make this more explicit, we introduced special subprogram-call nodes. The call of a subprogram within a given process is described by a node called call-subprogram. This node has as incoming edges all global signals and/or variables that are used inside the subprogram but are not passed over the subprogram interface, as well as the subprogram parameters. The outgoing edges fall into two categories: a control edge that activates the subprogram, and data edges that carry the global parameters (the global signals and/or variables that have been transmitted to the subprogram) and/or the return value when the subprogram is a function. A subprogram CDFG starts with a node called start-procedure (for procedures) or start-function (for functions). This node controls the subprogram start and represents a data source; i.e., constants, signals and variables come from this node at the first time of their use. A subprogram CDFG completes with an end-procedure or end-function node, which represents a



architecture struct of model is
  signal clk : bit := '0';
  signal top : integer := 0;
  procedure proc is
    variable lp : bit := '1';
  begin
    if lp = '1' then
      top <= 2;
    else
      top <= 10;
    end if;
  end procedure;
begin
  process
  begin
    proc;
    wait until clk = '1';
  end process;
end struct;

Figure 4.25: VHDL model 13

Figure 4.26: CDFG of VHDL model 13

temporary storage of all parameters that have been written inside this subprogram. This node does not have any real outgoing edges; its outgoing edges are virtually transmitted to the call-subprogram node. Figure ?? depicts an example of a procedure call, and figure ?? represents a function call example. Every function in VHDL has a return type. A return-type node is created to indicate the return type of the function. The output of this node is given as input to the return-value node, which stores the function type and the return value. This node passes its outgoing edges to the end-function node. This mechanism allows the return value of the considered function to be detected. The CDFG of the corresponding model is described in figure ??. Figure ?? illustrates the CDFG of a procedure call. A procedure call is simpler than a function call, since procedures have neither a return type nor a return value. For calling procedures with parameters, a check of the parameter kinds is required. Four kinds are distinguished: in, inout, out and buffer. When the parameter kind is out, inout, or buffer, these parameters


architecture struct of model is
  signal a, b : integer := 10;
  signal clk : bit := '0';
  signal c : integer := 0;
  function sum(z, y, x : integer) return integer is
  begin
    z := x + y;
    return z;
  end function;
begin
  process
  begin
    top := sum(c, a, b);
    wait until clk = '1';
  end process;
end struct;

Figure 4.27: VHDL model 14

Figure 4.28: CDFG of VHDL model 14



will be implemented as outgoing edges coming from the corresponding call-subprogram node.

4.7.1 Return statements

This statement belongs to subprograms. It is represented by a node called create-return-value, which returns the corresponding value. Figure ?? shows an example of a return statement; the CDFG of this model is represented in figure ??. When the return statement has to return an expression, the CDFG compiler first evaluates this expression, identifying the relevant nodes, and then creates the create-return-value node, which has as incoming data edge the outgoing data edge of the expression node. The create-return-value node may have one or more incoming data edges, but it always returns just one outgoing data edge. When the return statement is called inside a function, it needs to identify the type of this function, which will be used in the return-value node; thus, the create-return-value node will have two incoming data edges. For a procedure call, there is no return value. Hence, the corresponding create-return-value node will have only one incoming data edge, which serves as a control edge activating this node and comes from the previous node.

4.8 Mapping statements

The mapping statements describe the connection of components and of subprogram calls with parameters. Two kinds of mapping are distinguished: subprogram mapping and entity component mapping. The first kind belongs to the call of subprograms that have one or more parameters; the second one concerns the instantiation of components. Subprogram mapping: as described in figure ??, the function limit() has three actual parameters. To recognize that a parameter has changed its value inside this function, the compiler needs to associate the formal parameters with their actual parameters. This is done using the mapping mechanism. The de-mapping mechanism, in turn, is performed just before exiting the function, and its outgoing data edges are returned to the end-function node as incoming data edges. This explains our choice of making the call-function node a virtual image of the end-function node.


Entity mapping: the VHDL compiler instantiates each defined entity component using the associated parameters. This mechanism can be described by a function call with the same parameters as the entity component. In this function, the parameters are identified using their types. On the other hand, the hierarchy between parameters is considered in order to determine the parameter types employed in this function. Figure ?? represents an entity instantiation; its CDFG is described in figure ??. We observe that the CDFG of the entity mapping and de-mapping is independent of the process CDFG. The -1 nodes indicate that the sources of the incoming data edges are unknown.

4.8.1 Assertion and report statements

The assertion and report statements first check a given condition and then report the result using the report function. The assertion statement is modeled with a select node that has logic outgoing control edges: true when the considered expression is satisfied, false otherwise. Figure ?? shows assertion and report statements. When an assertion condition evaluates to false and a report is present, the VHDL compiler executes this report statement. The CDFG compiler does the same: the false control output of the select node (corresponding to the condition) activates the report node. The CDFG of the model of figure ?? is shown in figure ??; there, the select node has only one control outgoing edge, false, which activates the report node. Figure ?? represents an assertion statement without a report statement. In this case, the select node does not have any outgoing edges. This is similar to the example of a when others clause (case statement in section ??) without any statement. Figure ?? represents the CDFG of the model presented in figure ??.

4.9 Conclusion

In this chapter, we presented a control data flow graph standard for the synthesis and verification of digital designs from a behavioral level description. This control data flow graph is a specification of the behavioral description of digital designs. In a control data flow graph, nodes represent operations and edges depict the transfer of data values. When values are available on all incoming edges of a node, the node executes by consuming these values


entity model is
  port (a, b : in bit_vector(7 downto 0);
        q : out bit_vector(7 downto 0));
end entity;

architecture arch1 of model is
begin
  process (a, b)
  begin
    q <= a xor b;
    clk <= not(clk);
  end process;
end arch1;

use WORK.model.all;

entity test is
end entity test;

architecture struct of test is
  signal x, y, z : bit_vector(0 to 7);
  signal clk : bit := '0';
begin
  adder: model port map (a => x, b => y, q => z);
  process
  begin
    wait on clk;
    y <= x and z;
  end process;
end struct;

Figure 4.29: VHDL model 15

Figure 4.30: CDFG of VHDL model 15


entity model is
  port (a, b : in bit_vector(7 downto 0);
        q : out bit_vector(7 downto 0));
end entity;

architecture arch of model is
  signal clk : bit := '0';
begin
  process (a, b)
  begin
    q <= a xor b;
    clk <= not(clk);
    assert clk'event and clk = '1'
      report "simulation time is changed"
      severity Note;
  end process;
end arch;

Figure 4.31: VHDL model 16

Figure 4.32: CDFG of VHDL model 16


entity SR_flipflop is
  port (S, R : in bit;
        Q : out bit);
end entity;

architecture arch of SR_flipflop is
  signal clk : bit := '0';
begin
  process (S, R)
  begin
    assert S = '1' nand R = '1';
    if S = '1' then
      Q <= '1';
    end if;
    if R = '1' then
      Q <= '0';
    end if;
  end process;
end arch;

Figure 4.33: VHDL model 17

Figure 4.34: CDFG of VHDL model 17


and subsequently generates output values on all its outgoing edges. A control data flow graph explicitly shows the order of execution of operations and also illustrates which operations may be executed simultaneously. This makes control data flow graphs very suitable as starting points for high-level synthesis scheduling and allocation. To exploit the power of control data flow graphs in high-level synthesis scheduling, we use them to identify signal dependencies inside the envisaged models. This enables a useful prediction of digital design behavior and allows an a priori analysis of the relevant signal waveforms. The consistent merging of data and control flow, even for loops and wait statements, together with the maximally parallel representation, are the main features of the developed control data flow graph model. These features are used to improve the envisaged waveform compression techniques. Whereas the following chapter gives an overview of the different existing data compression algorithms, the chapter after it exploits the control data flow graphs to develop our own new approach to waveform file compression.


Chapter 5

Introduction to data compression


5.1 Background

Data compression operates in general by taking symbols from an input (text, for example), processing them, and writing codes to a compressed file. To be useful, a data compression scheme needs to be able to transform the compressed file back into an identical copy of the input [?, ?, ?]. Hence, compression is achieved when the data can be represented with an average length per symbol that is less than that of the standard representation. In order to make data compression meaningful, we assume that there is a standard representation for uncompressed data that encodes each symbol using the same number of bits [?, ?, ?]. Moreover, a basic idea in data compression is that most information sources of practical interest are not random, but possess some structure. Recognizing and exploiting such structure is a major theme in data compression: the amount of compression that is achievable depends on the amount of redundancy or structure present in the data that can be recognized and exploited. For example, by noting that certain letters or words in English texts appear more frequently than others, we can represent them using fewer bits than the less frequently occurring letters or words. This is exactly the idea behind Morse code [?, ?], which represents letters using varying numbers of dots and dashes. In data compression, there is a natural tradeoff between the speed of the compressor and the level of compression that can be achieved. In order to achieve greater compression, we generally require more complex and


time-consuming algorithms. There are many data compression techniques; they can be classified into two broad categories, lossless and lossy, and both can be applied to many sorts of data. For lossless data compression, two main classes are distinguished: statistical and dictionary-based [?]. Statistical techniques process the input one character at a time and capture redundancy using probabilistic models that predict the current character to be encoded based on previously encoded characters. The prediction is used in a process called entropy coding. Thereby, statistical methods operate by encoding symbols one at a time. The symbols are encoded into variable-length output codes; the length of an output code varies based on the probability or frequency of the symbol: low-probability symbols are encoded using many bits, and high-probability symbols are encoded using fewer bits [?]. Dictionary-based (or substitution) methods, however, use a dictionary of stored sequences of characters and substitute substrings of the input text that match some dictionary entry by pointers (or indexes) into the dictionary. Indeed, dictionary-based compression systems operate by replacing groups of symbols in the input file with fixed-length codes; thus, a dictionary coder is able to process multiple characters in each encoding step. A well-known example of dictionary techniques is the Lempel-Ziv data compression algorithm [?, ?], which will be detailed in section ??. In practice, the dividing line between statistical and dictionary methods is not always so distinct: some schemes cannot be clearly put in one camp or the other, and there are always hybrids that use features of both techniques [?]. However, the effectiveness of statistical methods depends upon the accuracy of the probabilistic models and the efficiency of the coder. Dictionary methods, on the other hand, are inherently faster, since several characters can be encoded in each encoding step. Also, efficient dictionary data structures and search algorithms are available that allow the implementation of very fast encoding and decoding systems [?]. However, since dictionary coders do not provide as flexible data modeling, they do not give as much compression as statistical coders, which are computationally more expensive. In the following sections, we first briefly introduce the two categories of data compression algorithms, lossy and lossless techniques; we explain the lossless data compression algorithms in more detail, as our approaches are based on them.


5.2 Lossy data compression algorithms

Lossy compression is compression in which some of the information from the original message sequence is lost [?]. This means that the original sequence cannot be reconstructed exactly from the compressed sequence. The fact that information is lost does not necessarily mean that the quality of the output is reduced. Most lossy compression techniques in use are highly dependent on the media being compressed. Moreover, in lossy compression, there is a direct relationship between the length of an encoding and the amount of loss or distortion that is incurred. Redundancy exists when an information source exhibits properties that allow it to be encoded with fewer bits and with little or no perceived distortion. In the following, we describe the main kinds of lossy compression algorithms.

5.2.1 Scalar Quantization

A simple way to implement lossy compression is to take the set of possible messages S and reduce it to a smaller set S' by mapping each element of S to an element of S'. In the case that the set S comes from a total order that is broken up into regions that map onto the elements of S', the mapping is called scalar quantization. Applications of scalar quantization include reducing the number of color bits or gray-scale levels in images, and classifying the intensity of frequency components in images or sound into groups; it is also used (for example, in JPEG-LS) to reduce the number of contexts instead of the number of message values. The term uniform scalar quantization is used when the mapping is linear. Scalar quantization allows one to map each color of a color image separately into a smaller set of output values; mapping several components together (vector quantization, discussed next) can be more effective, in the sense that a better compression ratio can be achieved for an equivalent loss of quality.
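As a minimal sketch of the idea (ours, not part of the referenced work), the following Python fragment implements a uniform scalar quantizer that reduces 8-bit gray-scale levels to a smaller set of output values:

def quantize(sample, n_levels=16, max_value=255):
    """Map a sample to the index of its quantization region."""
    step = (max_value + 1) / n_levels          # width of each region
    return min(int(sample / step), n_levels - 1)

def dequantize(index, n_levels=16, max_value=255):
    """Map a region index back to the midpoint of that region."""
    step = (max_value + 1) / n_levels
    return int(index * step + step / 2)

# Example: 16 gray levels instead of 256 (4 bits instead of 8 per pixel).
pixels = [0, 37, 128, 200, 255]
indices = [quantize(p) for p in pixels]
print(indices)                           # [0, 2, 8, 12, 15]
print([dequantize(i) for i in indices])  # [8, 40, 136, 200, 248]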

5.2.2 Vector Quantization

The general idea of mapping a multidimensional space into a smaller set of messages S' is called vector quantization. Vector quantization is typically implemented by selecting a set of representatives from the input space, and then mapping all other points in the space to the closest representative. The representatives could be fixed for all time and part of the compression


protocol, or they could be determined for each file (message sequence) and sent as part of the sequence. The most important aspect of vector quantization is how to select the representatives. Typically, this is done with a clustering algorithm that finds some number of clusters of points in the data. A representative is then chosen for each cluster, either by selecting one of the points in the cluster or by using some form of centroid for the cluster. Finding good clusters is an interesting topic on its own. Notice that vector quantization is most effective when the variables along the dimensions of the space are correlated. We should note that vector quantization, as well as scalar quantization, can be used as part of a lossless compression technique. In particular, if in addition to the index of the closest representative the coder sends the distance from the point to the representative, then the original point can be reconstructed. This distance is often referred to as the residual. In general, this would not lead to any compression, but if the points are tightly clustered around the representatives, then the technique can be very effective for lossless compression, since the residuals will be small and probability coding will work well for reducing the number of bits.
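The following sketch (our illustration, with a hand-picked set of representatives standing in for the output of a clustering algorithm) shows the mapping step of vector quantization, together with the residuals that would make the scheme lossless:

def nearest(point, representatives):
    """Return the index of the closest representative (squared distance)."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(range(len(representatives)),
               key=lambda i: dist2(point, representatives[i]))

# Representatives would normally come from a clustering algorithm.
reps = [(0, 0), (10, 10), (20, 0)]

points = [(1, 2), (9, 11), (19, 1)]
indices = [nearest(p, reps) for p in points]
print(indices)                     # [0, 1, 2]

# Residuals make the scheme lossless: point = representative + residual.
residuals = [tuple(a - b for a, b in zip(p, reps[i]))
             for p, i in zip(points, indices)]
print(residuals)                   # [(1, 2), (-1, 1), (-1, 1)]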

5.3 Lossless data compression algorithms

As its name suggests, in lossless coding, information is preserved by the compression and subsequent decompression operations. In lossless data compression, it is possible to reconstruct the original data exactly from the compressed data. Lossless compression is commonly used in applications where the loss of even a single bit is unacceptable. The types of data that are typically compressed losslessly include natural language texts, database files, sensitive medical images, scientific data, and binary executables. Lossless compression techniques can be applied to any type of digital data; however, there is no guarantee that compression will actually be achieved in all cases. Although digitally sampling analog data is inherently lossy, no additional loss is incurred when lossless compression is applied. The fundamental problem of lossless compression is to decompose a data set (for example, a text file or image) into a sequence of events, and then to encode the events using as few bits as possible. The idea is to assign short codewords to more probable events and longer codewords to less probable events. Data can thus be compressed whenever some events are more likely than others. Below, we discuss some of the more common lossless techniques.


5.3.1 Statistical Coding

Much of the theory of data compression relies heavily on Shannon's theory of information [?], a body of mathematics intended for analyzing the encoding and transmission of messages across a possibly noisy channel. In this formulation, Shannon offered a set of axioms describing the properties desired of a measure of uncertainty within a communications context, and deduced from these axioms a function H, which he called entropy: if we have a set of messages M = {m_i}, and each m_i has an associated probability p_i, then H is defined by the well-known formula:

H = −∑_i p_i log2 p_i

H quantifies the uncertainty regarding which message will be selected for transmission: the reduction in uncertainty then defines the information content of a transmission. Furthermore, Shannon subsequently showed that H constitutes a lower bound on the ability to compress data: given a set of items and probabilities {m_i, p_i}, it is not possible to encode the items {m_i} with a binary code such that the codeword lengths l_i satisfy ∑_i p_i l_i < H. H thus defines a theoretical limit on the compressibility of a set of messages. Shannon considers messages to be concatenations of symbols from a set A = {a_1, ..., a_m} called an alphabet. For example, the alphabet A for English messages contains the Roman letters and grammatical separation symbols; for 8-bit digital music, A contains the integers from 0 to 255. A lossless code is an invertible function C : A → B, where B represents the set of binary strings or codewords (for example 01, 10, 1001, etc.), which can be represented by nodes in a binary tree. The alphabet A can be extended to A∗, the set of sequences x^n = x_1, ..., x_n of n characters; the lossless code can then be written as C : A∗ → B∗, and the codewords are obtained by concatenation: C(x x_{n+1}) = C(x) C(x_{n+1}). A statistical coder must work in conjunction with a modeler that estimates the probability of each possible event at each point in the coding. The probability model does not need to describe the process that generates the data; it merely has to provide a probability distribution for the data items. The probabilities do not even have to be particularly accurate, but the more accurate they are, the better the compression will usually be. When the probabilities are wildly inaccurate, the file may even be expanded rather than compressed, but the original data can still be recovered. To obtain


maximum compression of a file, both a good probability model and an efficient way of representing the probabilities are required. On the other hand, to ensure decodability, the encoder is limited to using the model information that is available to the decoder. There are no other restrictions on the model; in particular, it can be changed as the file is being encoded. The model can be adaptive (dynamically estimating the probability of each event based on all events that precede it), semi-adaptive (using a preliminary pass over the input file to gather statistics), or non-adaptive (using fixed probabilities for all files). Non-adaptive models can perform arbitrarily poorly. Adaptive models allow one-pass coding but require a more complicated data structure. Semi-adaptive codes require two passes and transmission of the model data as side information; if the model data is transmitted efficiently, they can provide slightly better compression than adaptive codes. Since the beginnings of information theory, much work has been done on statistical source modeling and efficient entropy coding techniques. The logical separation of statistical compression into modeling and entropy coding was proposed by Rissanen and Langdon [?]. Early entropy coders, as exemplified by Huffman coding and Shannon-Fano coding [?, ?], assign close to ⌈−log2 p_i⌉ bits to encode the i-th symbol in an attempt to encode a source near its entropy limit. Since these approaches encode each symbol with an integral number of bits, the entropy limit cannot in general be reached exactly. In fact, if the probability of the most probable symbol is close to 1, Huffman coding can waste as much as 1 bit per symbol on average [?].
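To make the entropy bound concrete, here is a small Python sketch (ours, not from the cited literature) that computes H for a message set and compares it with the average code length of a fixed-length code and of an optimal prefix code:

import math

def entropy(probabilities):
    """Shannon entropy H = -sum(p * log2 p), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]
print(entropy(probs))                               # 1.75 bits/symbol
print(math.ceil(math.log2(len(probs))))             # fixed-length: 2 bits

# An optimal prefix code for these probabilities has lengths 1, 2, 3, 3
# and reaches the entropy bound exactly:
lengths = [1, 2, 3, 3]
print(sum(p * l for p, l in zip(probs, lengths)))   # 1.75 bits/symbol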

5.3.2 Dictionary coding

While statistical coders can give excellent compression with sophisticated modeling and entropy coding techniques, they are limited in speed since they encode one character at a time [?]. Moreover, statistical compression methods use a statistical model of the data, and the compression quality they achieve depends on how good that model is. Dictionary-based compression methods do not use a statistical model, nor do they use variable-size codes [?]. Instead, they select strings of symbols and encode each string as a token using a dictionary. The dictionary holds strings of symbols, and it may be static or dynamic (adaptive). Moreover, dictionary coders can encode several characters at a time. They work by dividing the input into phrases in a process called parsing. For example, the input text is usually parsed from left to right. During parsing, the text starting at the current coding position is matched against


a dictionary of stored phrases. If a phrase in the dictionary matches a prefix of the text starting at the current position, the identity of that phrase is encoded, usually using a fixed number of bits. The index of the matching phrase is commonly called a pointer or codeword. If no match is found, the encoder has to inform the decoder of this fact and then send the next character literally. When the input alphabet is included in the dictionary, a matching phrase can always be found. There may be more than one entry in the dictionary that matches a prefix of the text at the current position. With greedy parsing, the longest such match is chosen. Greedy parsing is optimal only for certain dictionary schemes, such as a dictionary whose entries consist of at most two characters. Although optimal parsing can be performed, greedy parsing is fast and works well in practice [?, ?].
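The following Python sketch (our own minimal illustration) shows greedy parsing against a static dictionary: at each position the longest matching entry is emitted as an index, and unmatched characters are sent as literals.

def greedy_parse(text, dictionary):
    """Emit ('D', index) for dictionary matches, ('L', char) for literals."""
    out, pos = [], 0
    while pos < len(text):
        best = None
        for idx, phrase in enumerate(dictionary):
            if text.startswith(phrase, pos):
                if best is None or len(phrase) > len(dictionary[best]):
                    best = idx            # keep the longest match
        if best is not None:
            out.append(('D', best))
            pos += len(dictionary[best])
        else:
            out.append(('L', text[pos]))  # no match: send a literal
            pos += 1
    return out

dictionary = ["ab", "abc", "ca"]
print(greedy_parse("abcabca", dictionary))
# [('D', 1), ('D', 1), ('L', 'a')]  -> "abc" + "abc" + literal 'a'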

5.4 Prefix Codes

Prefix coding is a lossless data compression technique. A code C for a message set S is a mapping from each message to a bit string. Each bit string is called a codeword, and we denote codes using the syntax C = {(s_1, w_1), (s_2, w_2), ..., (s_m, w_m)}, where w_i is the codeword for message s_i. Typically, in computer science we deal with fixed-length codes, such as the ASCII code, which maps every printable character and some control characters into 7 bits [?]. For compression, however, we would like codewords that can vary in length based on the probabilities of their occurrences in the message. With such variable-length codes it can be hard or impossible to tell where one codeword finishes and the next starts. For example, given the code {(a, 1), (b, 01), (c, 101), (d, 011)}, the bit sequence 1011 could be decoded as aba, as ca, or as ad. To avoid this ambiguity we could add a special stop symbol to the end of each codeword, or send a length before each symbol; these solutions, however, require sending extra data. A more efficient solution is to design codes in which we can always uniquely decipher a bit sequence into its codewords. We call such codes uniquely decodable codes [?]. A prefix code is a special kind of uniquely decodable code in which no codeword is a prefix of another one; for example, for the alphabet X = {a, b, c, d}, the code C = {(a, 0), (b, 10), (c, 110), (d, 111)} is a prefix code. All prefix codes are uniquely decodable: once a codeword matches, no longer codeword can also match, because that would make the first a prefix of the second.


Figure 5.1: Examples of non-prefix and prefix codes for the alphabet A = {a, b, c}

A prefix code can be viewed as a binary tree as follows: each message is a leaf in the tree, and the code for a message is obtained by following the path from the root to its leaf, appending a 0 each time a left branch is taken and a 1 each time a right branch is taken. Figure ?? illustrates the difference between a prefix code and a non-prefix code for a given alphabet A = {a, b, c}. Note that for the non-prefix code, the binary sequence 11 may be decoded either as bb or as a and is hence ambiguous.
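As an illustration (ours), decoding a prefix code needs no lookahead: we extend the current bit string until it matches a codeword, which the prefix property guarantees can never be the beginning of another codeword.

def decode_prefix(bits, code):
    """code maps message -> codeword; the prefix property makes this unique."""
    inverse = {w: s for s, w in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:             # a codeword can never be
            out.append(inverse[current])   # extended into another codeword
            current = ""
    return "".join(out) if current == "" else None   # None: truncated input

code = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(decode_prefix("010110111", code))    # -> abcd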

5.5 Huffman data compression algorithm

Huffman codes are optimal prefix codes generated from a set of probabilities by a particular algorithm called the Huffman coding algorithm [?, ?]. David Huffman developed the algorithm as a student in a class on information theory at MIT in 1950. It is now one of the most commonly used components of compression systems, serving as the back end of GZIP, JPEG, and many other utilities. The Huffman algorithm is very simple and is most easily described in terms of how it generates the prefix-code tree:

Start with a forest of trees, one for each message, where each tree contains a single vertex with weight w_i = p_i.

Repeat until only a single tree remains: select the two trees whose roots have the lowest weights, w_1 and w_2, and combine them into a single tree by adding a new root with weight w_1 + w_2 and making the two trees its children. It does not matter which is the left or the right child; the convention is to put the lower-weight root on the left if w_1 ≠ w_2.

Notice that for a code of size n this algorithm requires n − 1 steps, since every complete binary tree with n leaves has n − 1 internal nodes, and each step creates one internal node.


Figure 5.2: Binary tree for a Huffman code

Figure ?? represents an example of a Huffman code. Notice that in this example, after d and e are joined, the resulting pair has the same probability as c and as a, but since it was created later, we join c and a next. Similarly, we select b instead of the ac pair to join with the de pair, since b was created earlier. This gives the Huffman code above and the corresponding Huffman tree in figure ??.
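A compact Python sketch of the construction follows (a generic textbook implementation using a priority queue, not the thesis tool; the probabilities are hypothetical values chosen so that the tie-breaking matches the narrative above):

import heapq

def huffman_code(probabilities):
    """Build an optimal prefix code; probabilities maps symbol -> weight."""
    # Heap entries: (weight, creation order, tree). Order breaks weight ties,
    # so earlier-created trees are merged first, as in the example above.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)       # two lowest-weight roots
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):           # internal node: recurse
            walk(tree[0], prefix + "0")       # left branch appends 0
            walk(tree[1], prefix + "1")       # right branch appends 1
        else:
            code[tree] = prefix or "0"        # leaf: assign the codeword
    walk(heap[0][2], "")
    return code

print(huffman_code({"a": 0.2, "b": 0.4, "c": 0.2, "d": 0.1, "e": 0.1}))
# {'a': '00', 'c': '01', 'd': '100', 'e': '101', 'b': '11'}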

5.6 Arithmetic coding algorithm

Arithmetic coding is a lossless data compression algorithm. It is a technique that allows the information from the messages in a message sequence to be combined so as to share the same bits [?, ?]. The main idea of arithmetic coding is to represent each possible sequence of n messages by a separate interval on the number line between 0 and 1. For a sequence of n events (messages) with probabilities p_1, ..., p_n, the algorithm assigns the sequence to an interval of size ∏_{i=1}^{n} p_i, by starting with an interval of size 1 (from 0 to 1) and narrowing the interval by a factor of p_i on each message i. We can bound the number of bits required to uniquely identify an interval of size s. Arithmetic coding considers the total number of bits as the sum of the self-information of the individual messages, where the self-information of message i is defined as log2(1/p_i). Thereby, arithmetic codes assign one codeword to each possible data set. The codewords are subintervals of the half-open unit interval [0, 1[, and are expressed by specifying enough bits to distinguish the subinterval corresponding to the actual data set from all other possible subintervals. Shorter codes correspond to larger subintervals and thus to more probable input data sets. In practice, the subinterval is refined incrementally using the probabilities of the individual events (messages), with bits being output as soon as they are known. Arithmetic codes almost always give better compression than prefix codes, but they lack the direct correspondence between the events in the input data set and bits or groups of bits in the coded output file [?].


5.6.1 Basic algorithm for arithmetic coding

In this section we explain how arithmetic coding works and give operational details. The arithmetic encoding algorithm works conceptually as follows:

1. A current interval [L, H[ is initialized to [0, 1[.

2. For each event (message), we perform two steps. First, the current interval is subdivided into subintervals, one for each possible event (message); the size of an event's subinterval is proportional to the estimated probability that this event will be the next one, according to the model of the input. Second, the subinterval corresponding to the event that actually occurs next is selected and made the new current interval.

3. Enough bits are output to distinguish the final current interval from all other possible final intervals.

The length of the final subinterval is clearly equal to the product of the probabilities of the individual events, which is the probability p of the particular sequence of events. The final step uses at most ⌊−log2 p⌋ + 2 bits to distinguish the considered event sequence from all other possible event sequences. Hence, a mechanism to indicate the end of the event sequence is needed (this can be an end-of-file event, for example). In step 2, only the subinterval corresponding to the event a_i that actually occurs needs to be computed. To do this, it is convenient to use two cumulative probabilities: the cumulative probability P_C = ∑_{k=1}^{i−1} p_k and the next cumulative probability P_N = P_C + p_i = ∑_{k=1}^{i} p_k. The new subinterval is then [L + P_C(H − L), L + P_N(H − L)[. The need to maintain and supply cumulative probabilities requires the model to have a complicated data structure, especially when more than two events are possible. The steps involved in pure arithmetic coding are illustrated in table ??, where we choose between just two events at each step. We assume that we know a priori that we have a file consisting of three events (or three letters in the case of text compression). The first event is either a_1 with probability p{a_1} = 2/3 or b_1 with probability p{b_1} = 1/3. The second event is either a_2 with probability p{a_2} = 1/2 or b_2 with probability p{b_2} = 1/2. The third event is either a_3 with probability p{a_3} = 3/5 or b_3

Action                                      Subintervals
Start                                       [0, 1[
Subdivide with left prob. p{a1} = 2/3       [0, 2/3[, [2/3, 1[
Input b1, select right subinterval          [2/3, 1[
Subdivide with left prob. p{a2} = 1/2       [2/3, 5/6[, [5/6, 1[
Input a2, select left subinterval           [2/3, 5/6[
Subdivide with left prob. p{a3} = 3/5       [2/3, 23/30[, [23/30, 5/6[
Input b3, select right subinterval          [23/30, 5/6[
Output 11001                                0.11001 is the shortest binary
                                            fraction within [23/30, 5/6[

Table 5.1: Example of pure arithmetic coding

with probability p{b_3} = 2/5. The actual file to be encoded is the sequence b_1 a_2 b_3. In this example the final interval corresponding to the actual event sequence b_1 a_2 b_3 is [23/30, 5/6[. The length of this interval is 1/15, which is the probability of b_1 a_2 b_3, computed by multiplying the probabilities of the three events: p{b_1} p{a_2} p{b_3} = (1/3)(1/2)(2/5) = 1/15. In binary, the final interval is [0.11001_2, 0.11010_2[. Since all binary numbers that begin with 0.11001 are entirely within this interval, outputting 11001 suffices to uniquely identify the interval.
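For illustration, here is a minimal Python sketch (ours) of a pure arithmetic encoder over exact fractions; it reproduces the interval narrowing of table ?? and then selects the shortest bit string whose dyadic interval lies inside the final interval.

import math
from fractions import Fraction

def arithmetic_encode(events, models):
    """events: the symbol chosen at each step; models: {symbol: prob} per step."""
    low, high = Fraction(0), Fraction(1)
    for symbol, model in zip(events, models):
        width = high - low
        cum = Fraction(0)
        for sym, p in model.items():          # subdivide the current interval
            if sym == symbol:                 # select the matching subinterval
                low, high = low + cum * width, low + (cum + p) * width
                break
            cum += p
    # Shortest bit string m of length k with [m/2^k, (m+1)/2^k[ in [low, high[.
    k = 0
    while True:
        k += 1
        m = math.ceil(low * 2**k)             # first dyadic point >= low
        if Fraction(m + 1, 2**k) <= high:
            return format(m, f"0{k}b")

models = [{"a1": Fraction(2, 3), "b1": Fraction(1, 3)},
          {"a2": Fraction(1, 2), "b2": Fraction(1, 2)},
          {"a3": Fraction(3, 5), "b3": Fraction(2, 5)}]
print(arithmetic_encode(["b1", "a2", "b3"], models))   # -> 11001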

5.7 Prediction by Partial Matching

One of the more popular and successful adaptive compression algorithms based on finite-context models is the Prediction by Partial Matching (PPM) algorithm of Cleary and Witten [?]. The basic idea of PPM is to use the last few characters of the input stream to predict the upcoming one. Models that condition their predictions on a few immediately preceding symbols are called finite-context models of order k, where k is the number of preceding symbols used. PPM employs a suite of fixed-order context models with different values of k, from 0 up to some pre-determined maximum, to predict upcoming characters. For each model, a note is kept of all characters that have followed every length-k subsequence observed so far in the input, and of the number of times each has occurred. Prediction probabilities are calculated from these counts. The probabilities associated with each character that has followed the last k characters in the past are used to predict the upcoming


Figure 5.3: Example of context models created by the PPM method. A portion of the input text is shown in (a). The current coding position is indicated by the vertical bar and the current contexts are underlined. Frequency tables for current contexts are shown in (b), (c), and (d).

character. Thus, a separate predicted probability distribution is obtained from each model. These distributions are effectively combined into a single one, and arithmetic coding is used to encode the character that actually occurs relative to that distribution. The combination is achieved through the use of escape probabilities. Each model has a different value of k, and the model with the largest k is, by default, the one used for coding. However, when a novel character is encountered in this context, an escape symbol is transmitted, which tells the decoder to switch to the model with the next smaller value of k. The process continues until a model is reached in which the character is not novel, at which point the character is encoded with respect to the distribution predicted by that model. To ensure that the process terminates, a model is assumed to be present below the lowest level, containing all characters of the coding alphabet. As an example, consider the context models and input in figure ?? for a three-symbol alphabet with maximum order O = 3. The current first-, second- and third-order contexts are respectively a, ca, and bca. The table (b) for context bca keeps count of the number of times each symbol has followed bca in the previous text. In this example, the escape symbol gets a fixed count of 1. Similar counts are shown for the contexts ca and a. The current character, a, is encoded by first consulting the table for context bca. Since a has not previously occurred in this context, an escape symbol is sent with a probability value of 1/2 (using log2(2) = 1 bit). The escape tells the decoder to shift to the order-2 context. Since a has previously occurred in the order-2 context, the encoder transmits the code for a with a probability value of 1/3 (using log2(3) bits).
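The sketch below (ours, and heavily simplified: it only maintains the context counts and the escape mechanism, leaving out the arithmetic coder) illustrates how PPM falls back from the highest-order context to lower orders; the example history is made up.

def ppm_probability(history, char, max_order=3):
    """Return the list of (order, probability) coding steps for char."""
    path = []
    for k in range(max_order, -1, -1):        # highest order first
        context = history[len(history) - k:] if k else ""
        # Count what followed this context earlier in the history.
        counts = {}
        for i in range(len(history) - k):
            if history[i:i + k] == context:
                follower = history[i + k]
                counts[follower] = counts.get(follower, 0) + 1
        total = sum(counts.values()) + 1       # escape has a fixed count of 1
        if char in counts:
            path.append((k, counts[char] / total))   # code char here
            return path
        path.append((k, 1 / total))            # escape to order k-1
    path.append((-1, None))                    # bottom level: all characters
    return path

print(ppm_probability("bcabbca", "a"))
# [(3, 0.5), (2, 0.5), (1, 0.5), (0, 0.25)]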

5.8 Burrows Wheeler data compression algorithm

The Burrows Wheeler algorithm is a relatively recent algorithm. An implementation of this algorithm, called bzip, is currently one of the best overall compression algorithms for text. It achieves compression ratios that are within 10% of the best algorithms, such as PPM, but runs significantly faster [?, ?].


The Burrows Wheeler algorithm transforms a string S of N characters by forming the N rotations (cyclic shifts) of S, sorting them lexicographically, and extracting the last character of each of the rotations. A string L is formed from these characters, where the i-th character of L is the last character of the i-th sorted rotation. In addition to L, the algorithm computes the index I of the original string S in the sorted list of rotations [?, ?, ?]. The sorting operation brings together rotations with the same initial characters. Since the initial characters of the rotations are adjacent to the final characters, consecutive characters in L are adjacent to similar strings in S. When the context of a character is a good predictor for the character, L will be easy to compress with a simple locally adaptive compression algorithm. Moreover, there is an efficient algorithm to compute the original string S given only L and I. This transformation is called the Burrows Wheeler Transformation (BWT). The transformation is reversible, meaning the original ordering of the data elements can be restored with no loss of fidelity [?, ?, ?]. Furthermore, this algorithm is performed on an entire block of data at once. Most of today's familiar lossless compression algorithms operate in streaming mode, reading a single byte or a few bytes at a time, whereas this algorithm operates on the largest chunks of data possible [?, ?, ?].

5.8.1 BWT description

In this section we detail the BWT. First, we present the compression transformation, then we describe the decompression algorithm, and finally we illustrate the involved move-to-front algorithm. The BWT takes as input a string S of N characters S[0], ..., S[N−1], selected from an ordered alphabet X of characters [?, ?, ?]. The steps of the algorithm are as follows:

1. Compression transformation: the steps of the compression transformation are:

sort the rotations: form the N × N matrix M whose elements are characters and whose rows are the rotations (cyclic shifts) of S, sorted in lexicographical order. At least one of the rows of M contains the original string S; let I be the index of the first such row, numbering from 0. Consider the example S = "abcda" (N = 5); the matrix M is defined as shown in table ??.

find the last character of each rotation: let the string L be the last column of M, with characters L[0], ..., L[N−1] (the same as M[0, N−1], ..., M[N−1, N−1]). The output of the transformation is


row   cyclic shift
0     aabcd
1     abcda
2     bcdaa
3     cdaab
4     daabc

Table 5.2: Example of the matrix M where S = "abcda".

thus the pair (L, I). In the example of table ??, L = "daabc" and I = 1.

2. Decompression transformation: the input of this step is the pair (L, I). The matrix M does not exist at the beginning of the process. The steps of the decompression transformation are:

find the first characters of the rotations: this step calculates the first column F of the matrix M defined above. This is done by sorting the characters of L to form F. In our example (table ??), L = "daabc"; sorting these characters, we obtain F = "aabcd". Indeed, any column of the matrix M is a permutation of the original string S. Thus L and F are both permutations of S, and therefore of one another. Furthermore, because the rows of M are sorted and F is the first column of M, the characters in F are also sorted.

build the list of predecessor characters: only the strings F and L and the index I are needed by this step. If L[j] is the k-th instance of a character ch in L, then T[j] = i, where F[i] is the k-th instance of ch in F. Hence, T represents a one-to-one correspondence between the elements of L and those of F, with F[T[j]] = L[j]. In our example, the first character of L is d, which is the character at position 4 of F (numbering from 0), i.e. L[0] = d = F[4], and so on. This gives T[0] = 4, T[1] = 0, T[2] = 1, T[3] = 2 and T[4] = 3; thus the transformation vector is T = (4, 0, 1, 2, 3).

form the output string S: from the construction of T, we have F[T[j]] = L[j]. Substituting i = T[j], we see that L[T[j]] cyclically precedes L[j] in S. Moreover, the index I is defined such that row I of M is S. Thus, the last character of S is L[I]. The vector T is then used to give the predecessors of each character: for


each i = 0, ..., N−1 we have S[N−1−i] = L[T^i[I]], where T^0[x] = x and T^{i+1}[x] = T[T^i[x]]. This yields S, the original input to the compressor. In our example, I = 1 and L = "daabc", so the last character of S is S[4] = L[1] = a. The other characters are then determined using the transformation vector T = (4, 0, 1, 2, 3). For i = 1, we have S[3] = L[T[1]] = L[0] = d. For i = 2, we have S[2] = L[T^2[1]] = L[T[0]] = L[4] = c. For i = 3, we have S[1] = L[T^3[1]] = L[T[4]] = L[3] = b. For i = 4, we have S[0] = L[T^4[1]] = L[T[3]] = L[2] = a. So the recovered string is S = "abcda".

3. Move-to-front coding: this step encodes the output (L, I) of the compression transformation, where L is a string of length N and I is an index. It encodes L using a move-to-front algorithm followed by a Huffman or arithmetic coder [?, ?]. The steps of this algorithm are the following:

move-to-front coding: this step encodes the characters in L by applying the move-to-front technique to the individual characters. First, an integer vector R[0], ..., R[N−1] is defined, whose elements are the codes of the characters L[0], ..., L[N−1]. Second, a character list Y is initialized to contain each character of the original alphabet X exactly once, for example Y = "gfahbpcdl". For each i = 0, ..., N−1 in turn, R[i] is set to the number of characters preceding the character L[i] in the list Y, and then the character L[i] is moved to the front of Y. In our example this yields R = (7, 3, 0, 5, 7), and the character list Y ends up as Y = "cbadgfhpl"; note that the repeated character a is coded the second time with 0, which is what makes the transformed string easy to compress.

encode: apply Huffman or arithmetic coding to the elements of R, treating each element as a separate token to be coded. Any coding technique can be applied, as long as the decompressor can perform the inverse operation. If the output of this coding process is OUT, then the output of the compression is the pair (OUT, I).

4. Move-to-front decoding: this algorithm is the inverse of the one defined in the previous step. It contains two steps [?, ?, ?]:


Function BW_Decoder(In, FirstIndex, n) {
    S = MoveToFrontDecoder(In, n);
    R = Rank(S);
    j = FirstIndex;
    for i = 1 to n-1 {
        Out[i] = S[j];
        j = R[j];
    }
}

Figure 5.4: Burrows Wheeler decoding algorithm

Figure 5.5: Burrows-Wheeler decoding example with the decoded message sequence assanissimassa

decode: decode the coded stream OUT using the inverse of the coding scheme used above. The result is stored in a new integer vector R of length N.

inverse move-to-front coding: the goal of this step is to recover the string L from the move-to-front codes R[0], ..., R[N−1]. For each i = 0, ..., N−1 in turn, L[i] is set to the character at position R[i] (numbering from 0) in the character list Y, and then that character is moved to the front of Y. The resulting string L is the last column of the matrix M of the compression transformation. The output of this step is the pair (L, I).

Abstractly, the Burrows Wheeler decoding algorithm is given by the code illustrated in figure ??. For an ordered sequence S, the Rank(S) function returns a sequence of integers specifying for each character c in S how many characters are either less than c, or equal to c and appearing before c in S; another way to say this is that it specifies the position of the character in the sorted order. To show how this algorithm works, we consider an example (figure ?? (a)) in which MoveToFrontDecoder returns S = ssnasmaisssaai, a string of length n with FirstIndex = 4 (the first a). We can generate the most significant characters of the contexts simply by sorting S (figure ?? (b)). We can therefore simply rebuild the initial sequence by starting at the first character and following the rank pointers, reading the characters one by



5.9

Lempel-Ziv data compression algorithms

The Lempel-Ziv algorithms are data compression algorithms that compress by building a dictionary. Thereby, a Lempel-Ziv code groups sets of characters of varying lengths [?, ?]. The original algorithms also did not use probabilities: strings were either in the dictionary or not, and all strings in the dictionary were given equal probability. At the highest level, these algorithms can be described as follows: given a position in a file, look through the preceding part of the file to find the longest match to the string starting at the current position, and output some code that refers to that match; then move the finger past the match [?]. The two main variants of this algorithm were described by Ziv and Lempel in two separate papers in 1977 [?] and 1978 [?], and are often referred to as LZ77 and LZ78. The algorithms differ in how far back they search and how they find matches. The LZ77 algorithm is based on the idea of a sliding window: it looks for matches only in a window a fixed distance back from the current position. Gzip, ZIP, and V.42bis (a standard modem protocol) are all based on LZ77. The LZ78 algorithm is based on a more conservative approach to adding strings to the dictionary. Unix compress and the GIF format are both based on LZ78. In the following, we describe the two algorithms in detail.

5.9.1

Lempel-Ziv 77 (Sliding Windows)

At a high level, the LZ77 algorithm works by replacing a repeated substring with a pointer to an earlier occurrence in the input. The pointer is represented by the pair (d, l), where d is a displacement that indicates the position of the matching phrase relative to the current position, and l is the length of the substring, which may be empty [?, ?]. After transmitting the pointer, the encoder transmits the character following the coded substring literally, that is, as it appears in the input. Thus, Ziv and Lempel group the displacement, length and character into one codeword. The concatenation of the matched substring and the coded character is called a phrase. In the original LZ77 coder, the displacement d and the length l are limited in range and are represented using a fixed number of bits each. The effective dictionary, therefore, consists of all substrings, within the specified lengths, that occur previously in the input within a finite range of the current position. Conceptually, the LZ77 algorithm and its variants use a



sliding window that moves along with the cursor, the position from which the algorithm is currently trying to encode. The window can be divided into two parts: the part before the cursor, called the dictionary, and the part starting at the cursor, called the lookahead buffer. The sizes of these two parts are parameters of the program and are fixed during execution of the algorithm. The basic algorithm, as illustrated in figure ??, is described as follows: Find the longest match of a string starting at the cursor and completely contained in the lookahead buffer to a string starting in the dictionary. Output a triple (p, n, c) containing the position p of the occurrence in the window, the length n of the match and the next character c past the match. Move the cursor n + 1 characters forward. The position p can be given relative to the cursor, with 0 meaning no match and 1 meaning a match starting at the previous character. To decode the message, we consider a single step. Inductively, we assume that the decoder has correctly constructed the string up to the current cursor, and we want to show that given the triple (p, n, c), it can reconstruct the string up to the next cursor position. To do this, the decoder can look the string up by going back p positions, taking the next n characters, and then following this with the character c. Figure ?? illustrates the LZ77 algorithm. From the current position, the longest prefix matching a substring of the previously encoded text is bcabc. This is encoded by the pair (11, 5). The following character c is encoded literally.
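To make the steps concrete, here is a brute-force C sketch of the encoding loop; the window and lookahead sizes and the emit callback are illustrative assumptions, and real implementations accelerate the match search with hash tables:

#define WINDOW    4096   /* dictionary size; an illustrative choice */
#define LOOKAHEAD 16     /* lookahead buffer size; illustrative */

/* Brute-force LZ77 encoder: for each cursor position, search the
 * dictionary part of the window for the longest match of the lookahead
 * buffer, emit the triple (p, n, c) and advance the cursor by n + 1. */
void lz77_encode(const unsigned char *in, int len,
                 void (*emit)(int p, int n, unsigned char c))
{
    int cur = 0;
    while (cur < len) {
        int best_p = 0, best_n = 0;
        int start = cur > WINDOW ? cur - WINDOW : 0;
        for (int i = start; i < cur; i++) {
            int n = 0;
            while (n < LOOKAHEAD && cur + n < len - 1 &&
                   in[i + n] == in[cur + n])
                n++;
            if (n > best_n) {
                best_n = n;
                best_p = cur - i;      /* distance back from the cursor */
            }
        }
        emit(best_p, best_n, in[cur + best_n]);  /* literal follows match */
        cur += best_n + 1;
    }
}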

5.9.2

Lempel-Ziv 78

Whereas the LZ77 coder includes in its dictionary all substrings in the sliding window below a maximum size, the LZ78 coder builds its dictionary in a slower, more structured manner. Initially, the dictionary contains only the empty string. From the current position, the dictionary is searched for the longest entry that matches a prefix of the input starting at the current position. The index of the match is transmitted using log2 L bits (rounded up to an integer), where L is the number of entries in the dictionary. Then, the character following the match in the input is encoded literally. A new phrase that is the concatenation of the match and the literal character is added to the dictionary. It is easy to



Function LZW_Encode(file) {
    C = ReadByte(file);
    while ((x = ReadByte(file)) <> EOF) do {
        if InDict?(C, x) then {
            C = GetIndex(C, x);      /* extend the current match */
        } else {
            Output(C);               /* emit index of the longest match */
            AddDict(C, x);           /* new entry: match followed by x */
            C = x;                   /* restart matching at x */
        }
    }
    Output(C);
}

Function LZW_Decode(file) {
    C = ReadIndex(file);
    W = GetString(C);
    Output(W);
    while ((C = ReadIndex(file)) <> EOF) do {
        if IndexInDict?(C) then {
            AddDict(W, GetString(C)[0]); /* previous string + first char */
            W = GetString(C);
        } else {
            AddDict(W, W[0]);            /* special case: entry not yet known */
            W = GetString(C);            /* C refers to the entry just added */
        }
        Output(W);
    }
}

Figure 5.6: Code of LZW encoding and decoding

Figure 5.7: LZ77 coder. A portion of the input is shown in (a). The current coding position is indicated by a vertical bar. The previously parsed phrases are listed in (b), and (c) shows the next phrase, corresponding to the longest match of bcabc followed by the literal c.



Figure 5.8: LZ78: a portion of the input is shown in (a). The current coding position is indicated by the horizontal bar. The current dictionary is listed in (c), where the first entry is the empty string. The next phrase is shown in (b), corresponding to the longest match of cb followed by the literal c.

verify that if a phrase is in the dictionary, then all its prefixes are also in the dictionary. Figure ?? illustrates an example of the LZ78 algorithm. From the current position, the longest prefix matching an entry in the dictionary is cb. The corresponding index, 6, is encoded using log2 12 bits rounded up, i.e. 4 bits. The following character, c, is encoded literally. The new phrase, cbc, is then inserted into the dictionary. Two main variants of LZ78 are distinguished: Lempel-Ziv-Welch (LZW): the LZ78 variant most commonly used in practice. The algorithm maintains a dictionary of strings (sequences of bytes). The dictionary is initialized with one entry for each of the 256 possible byte values; these are strings of length one. As the algorithm progresses, it adds new strings to the dictionary such that each string is only added if a prefix one byte shorter is already in the dictionary [?]. Lempel-Ziv-Fiala-Greene (LZFG): in LZ77, the longest matching substring may be found in more than one location in the sliding window, so a phrase may be encoded by more than one codeword. This is a coding inefficiency and is referred to as pointer redundancy. In 1989, Fiala and Greene [?] introduced a dictionary coder that eliminates pointer redundancy by a judicious choice of dictionary data structure and pointer encoding scheme. LZFG eliminates pointer redundancy by encoding a substring match as a pointer into a digital search tree (trie for short) that stores substrings which have occurred previously in the text. An important property of a trie is that each substring stored in it has a unique pointer. LZFG is similar to LZ78 in that stored substrings start at boundaries of previously parsed phrases. The difference between them is that the effective LZFG dictionary consists of all prefixes (up to a maximum length) that start at boundaries of previously parsed phrases. On the other hand, LZFG is similar to LZ77 in its use of a sliding window.



5.10

Conclusion

In this chapter, an overview of data compression has been presented. There are two main kinds of data compression: lossy and lossless. For each kind, many algorithms have been introduced. For the first kind, the major approaches are Scalar and Vector quantization. For the second kind, the major approaches are Prefix Codes, Arithmetic Coding, Burrows-Wheeler and Lempel-Ziv. The next chapter will deal with our waveform compression algorithm and compare it to the other data compression algorithms.



Chapter 6

Waveform compression
6.1 Introduction

This chapter is organized as follows: first, we present the reasons for developing the waveform compression algorithm, then we describe the architecture of the compression design, and finally the Signal Set compression is illustrated in detail.

6.2

The concept of the waveform compression

In chapter ?? an overview of the different data compression algorithms has been given. Two main classes of data compression algorithms are distinguished: lossy and lossless. Although many data compression algorithms exist, it is hard to find one that gives suitable waveform compression without consuming too much computational resources. The best known data compression algorithms are used for text and media compression and are thus not optimized for waveform compression. Hence, to efficiently compress our waveform files, we need to develop our own algorithm using main data compression principles and some other developed data compression techniques. Our algorithms must be lossless, because no loss of important information can be tolerated in the waveform compression process. Moreover, static algorithms can be very effective when used on large databases of a relatively uniform data type, for example a mailing list of student names and addresses. Even if such a database is constantly being updated, character sequences such as names are still likely to recur within it. For general purpose waveform storage, however, it is not possible to rely on such expectations, and adaptive methods must be used.



Indeed, deciding on a suitable lossless algorithm for waveform compression is not simple. From a systems viewpoint, it soon becomes clear that not only are there several algorithm characteristics of importance, but also that these characteristics tend to vary widely among different algorithms. Effective waveform compression clearly matters, but other considerations such as the speed of compression and the quality of the required data may in fact sometimes outweigh it. The data used for algorithm evaluation can often influence waveform compression results, which leads to the difficult question of what is in fact representative data. Hence, the more important characteristics to be considered must include the waveform compression ratio and its robustness across different types of waveforms, the complexity of the algorithm, and the resulting speed. Algorithms or their implementations can differ in their symmetry, so that compression versus decompression might be more or less difficult or complex to implement. Some implementations also compromise data compression effectiveness in order to achieve faster speed, and others may exhibit a significant falloff in compression performance if used on small blocks of data. In order to improve the verification algorithms by searching for a suitable solution that reduces the compromise between compression and decompression, the algorithms must be applicable to small blocks of data. This can be an extremely important characteristic for waveform storage applications. Besides the disk space consumption, large waveform data sets slow down simulation because of the disk activity involved in writing the data to the file system. Some algorithms achieve good compression but require complex data structures. While waveform compression is a challenging problem, only little work is known to the author which addresses this topic. Common compression algorithms may be used to decrease the file size of a waveform database. While these algorithms are well known and may be optimized for waveform compression, they usually consume a significant amount of computation power. Contrary to these approaches, we make use of the properties of the data to compress in order to provide a good compression ratio using only a minimal amount of computation power. This approach can be further improved by exploiting knowledge about the design, i.e. by examining the source code of the design in order to guide the compression algorithms. Because the simulation waveform data may get very large, our compression techniques must operate efficiently on-line without knowing the entire waveform. Hence, we did not address any algorithms which must read the complete waveform database before compression starts. Our algorithm is, therefore, an adaptive algorithm designed for a fast on-line implementation



Figure 6.1: Compression system architecture

of general purpose lossless data compression, particularly for use in enhancing the data capacity of storage devices which are used to hold waveforms.

6.3

Compression design architecture

Waveform compression can be divided into two domains. First, for each transition block the set of signals (signal IDs) that have a transition must be stored. Further, for each signal in a transition block the new value also has to be dumped. The key idea of the current compression approach is to encode signal IDs (signal ID domain) and signal values (signal value domain) separately. As a result, different kinds of compression techniques can be applied to each domain. The architecture of the proposed compression system is shown in Figure ??. The VCD formatted waveform file is read in and then split into transition blocks by the transition block separator module. Next, the waveform stream is separated into a signal ID stream (signal ID domain) and a value stream (signal value domain). Both stream types are processed separately by appropriate compression modules. The algorithms for signal ID stream compression are explained in Section ?? and the techniques for value stream compression are shown in Section ??. Afterwards, the compressed streams are encoded and merged to form a combined waveform stream again. Optionally, the merged stream may be compressed using common dictionary based algorithms to further reduce file size [?, ?]. Note that the waveform file is compressed transition block wise, i.e., an entire transition block is read, then it is processed and finally written out. In order to use the VCD file format to output VHDL simulation data we extended the format slightly. VHDL incorporates a special mechanism to deal with signal assignments without delays. When a new value is assigned to a signal with zero delay then this new value does not show up on the signal in the same simulation cycle. Instead, an additional simulation cycle (a so-called delta cycle) is added which actually simulates the same time instance again. During this cycle the new value is displayed on the signal. To separate the delta cycle from the previous simulation cycle (which may



also be a delta cycle) a virtual time interval is added to the simulation time. In order to store VHDL simulation data in VCD format we extend the time specification by adding a delta value to the time instance where needed. Hence, #10+2 denotes the second delta cycle (third simulation cycle) for simulation time 10 while #22 is the first simulation cycle of time 22 (i.e., a delta value of 0 is not printed).

6.4

Signal ID compression

First we address how to efficiently compress the signal IDs. The simulation experiments showed that the signals of a digital circuit perform very regular activities. In detail, note that the set of different transition blocks which can be observed during simulation is rather small if we do not consider signal values.

Definition 1 Let q_s = (t, v) denote a transition at time t = tm(q_s) to value v = val(q_s) of a signal s. Then, w_s = (q_{s,0}, q_{s,1}, …, q_{s,n_s}) denotes a waveform of signal s if tm(q_{s,0}) = −∞ and tm(q_{s,i}) < tm(q_{s,i+1}) for all 0 ≤ i < n_s. q_{s,0} is called the initial value of signal s. idx_s : IR → {0, 1, …, n_s} denotes a function which associates each time value t ∈ IR with the index of the corresponding waveform element q ∈ w_s:

    idx_s(t) = i,    if tm(q_{s,i}) ≤ t < tm(q_{s,i+1}), 0 ≤ i < n_s
    idx_s(t) = n_s,  else (i.e., if tm(q_{s,n_s}) ≤ t).

v_s(t) denotes the value of signal s at time t, i.e. v_s(t) = val(q_{s,idx_s(t)}). Note that the first waveform element of a waveform actually defines the initial value of the signal, i.e. the value which is assigned to the signal before simulation starts. From these waveforms a waveform sequence can be defined which includes all transitions of all considered signals:

Definition 2 Let S = {s_1, s_2, …, s_l} denote a set of signals and w_{s_1}, w_{s_2}, …, w_{s_l} the corresponding waveforms. Then, W_S = (a_0, a_1, …, a_p) with a_i ∈ P_S = ∪_{s_k ∈ S} w_{s_k} is a waveform sequence of S if the following three conditions are met:

the time stamps of the a_i must increase monotonically, i.e. tm(a_i) ≤ tm(a_{i+1}) for all 0 ≤ i < p,


the first l elements of W_S are the initial values of all signals, i.e. a_i = q_{s_{i+1},0} for all 0 ≤ i < l,

each transition q ∈ P_S must also be an element of W_S, i.e. q ∈ P_S ⇒ q ∈ W_S.

Hence, W_S is an increasing sequence of transition values where the time values of two consecutive elements are not allowed to decrease. However, there may be several consecutive a_i with the same time value. Note that W_S stores all transitions which are included in the waveforms w_{s_i}. Hence, |W_S| = Σ_{s ∈ S} |w_s|, where |X| denotes the number of elements included in a set X. Actually, W_S is a formal notation of the transition section of a VCD file.

Definition 3 Let TRB^(c) be a transition block at simulation cycle c. Then, the transition signal block S^(c) is the set of signal identifiers that are associated with a transition in T(TRB^(c)), i.e., S^(c) = {id(tr) : tr ∈ T(TRB^(c))}.

Experiments (see chapter ??) show that in many cases for a particular set S^(c) an identical older set can be found, i.e., in many cases a simulation cycle j < c can be found for which S^(c) = S^(j) holds. Note that this does not mean that the transition blocks at cycles c and j are the same! It just means that the same signals have a transition at both cycles, while the new values of the corresponding signals will usually differ. Moreover, it turned out that a small group of the signal sets occurs over and over again while some other sets show up only once. This is not only true for designs at RTL level but also holds for gate level models. In order to exploit this property, signal ID compression is done using a so-called transition block cache TBC. This cache stores all different kinds of transition signal blocks that are detected during compression. In order to refer to a cache entry, each entry is associated with a unique cache entry



Table 6.1: TBC hit ratio for various models

model       sota   ch     da     par1   leon   sxp2   sdf    rtl2
level       RTL    gate   gate   RTL    gate   RTL    RTL    RTL
hit ratio   98%    97%    99%    89%    95%    84%    95%    99%

identifier¹. Note that all blocks in the cache are mutually different from each other. Instead of dumping the signal IDs to the output stream during compression, a reference to the corresponding cache entry identifier is stored for each transition block. Hence, a transition block at cycle i is processed as follows: 1. S^(i) is determined and looked up in the transition block cache TBC. If S^(i) is already in the cache (a cache hit) then the unique cache entry identifier associated with S^(i) is determined. Otherwise (a cache miss), a new entry S^(i) is added to TBC and associated with a new unique identifier. 2. The cache entry identifier is stored to the output stream. 3. In case of a cache miss, the signal set is also written to the output stream (see Section ??). In order to test this approach, we applied it to a set of VCD files which varied from 20 MBytes to more than 300 MBytes in file size. The hit ratios along with the types of the models (gate level = gate, RTL level = RTL) are shown in Table ??. The hit ratio states how many transition blocks could be matched against an appropriate block in the transition block cache during compression. Note that most hit rates are above 90%.
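A possible C sketch of this per-block processing follows; the Cache and Stream types and all helper functions (cache_lookup, cache_insert, stream_put_uint, dump_signal_set) are hypothetical placeholders, not part of the original design:

/* Encode one transition block: emit the cache entry identifier and,
 * on a miss, additionally the (delta encoded) signal ID set itself. */
unsigned encode_block(Cache *tbc, const int *ids, int n_ids, Stream *out)
{
    unsigned entry_id;

    if (cache_lookup(tbc, ids, n_ids, &entry_id)) {
        /* cache hit: the identifier alone is sufficient */
        stream_put_uint(out, entry_id);
    } else {
        /* cache miss: register the set under a new unique identifier */
        entry_id = cache_insert(tbc, ids, n_ids);
        stream_put_uint(out, entry_id);
        dump_signal_set(out, ids, n_ids);   /* see Section 6.4.1 */
    }
    return entry_id;
}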

6.4.1

Signal set encoding

In case of a TBC miss, the transition signal block S^(i) must be encoded and written to the output stream. Further, the associated cache entry identifier is also stored. While the cache identifier occupies only a couple of bytes and hence does not need any special encoding, the signal IDs of a signal block often form a long list of numbers. Hence, in order to save space, the signal identifiers are dumped as follows:
1 E.g., the simulation cycle at which the corresponding entry was added to the cache may be used as the cache entry identifier.



1. The signal identifier numbers of S^(i) are sorted in increasing order, i.e., the result of this operation is an ordered sequence S^(i) = (w_0, w_1, …, w_l) where w_x < w_y for all 0 ≤ x < y ≤ l.

2. The first (smallest) signal identifier number w_0 is written to the stream.

3. For the remaining identifiers, the difference between the current and the previous identifier (i.e., w_k − w_{k−1}, k = 1, 2, …, l) is written to the stream.

Example: Suppose that the following signals appear in a transition block (only signal identifier numbers are shown): 14, 12, 1, 2, 9, 5. The ID values after sorting are: 1, 2, 5, 9, 12, 14. Finally, calculating the differences gives: 1, 1, 3, 4, 3, 2.

As a result, the numbers generated from the signal identifiers are a sequence of positive integers. To further reduce the amount of storage occupied by these integers, additional encoding techniques can be applied such as dynamic Huffman encoding [?] or arithmetic encoding [?]. Unfortunately, these approaches are computationally expensive and do not efficiently exploit similarities between signal sets. Further, applying dictionary based (Lempel-Ziv) techniques [?, ?] on top of Huffman and arithmetic encoding does not produce good results. Directly applying dictionary based techniques is more promising but also requires significant computational resources. Hence, for our experiments we applied a rather simple but fast variable length encoding technique from [?]. This approach translates a 32 bit integer value into a sequence of 1 to 5 bytes. For the encoding, the integer is split into 7-bit fragments and placed into multiple bytes. The least significant fragment is placed into the first byte and the most significant bits end up in the last byte. All bytes consist of a 7 bit payload (filled up with the corresponding fragments) and a stop bit which is set to 1 if all payload bits of the following higher significant bytes are zero, and to 0 otherwise. Finally, all bytes after the first byte with a stop bit set to 1 are removed.

Example: The integer value 0x00000223 is encoded into the byte sequence 0x23 0x84: the low fragment 0x23 (bits 0 to 6) goes into the first byte with stop bit 0, and the remaining fragment 0x04 (bits 7 to 13) goes into the second byte with the stop bit set, giving 0x84.
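A minimal C sketch of this variable length scheme (our own illustration) is given below; it reproduces the 0x23 0x84 example:

#include <stdint.h>
#include <stddef.h>

/* Encode a 32 bit value into 1 to 5 bytes: 7 bit fragments, least
 * significant first; the stop bit (bit 7) marks the last byte, i.e.
 * the byte after which all payload fragments would be zero. */
size_t vlq_encode(uint32_t v, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t b = v & 0x7f;    /* next 7 bit payload fragment */
        v >>= 7;
        if (v == 0)
            b |= 0x80;           /* stop bit: no non-zero bytes follow */
        out[n++] = b;
    } while (v != 0);
    return n;
}

/* vlq_encode(0x223, buf) writes buf = { 0x23, 0x84 } and returns 2. */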

The major advantage of this approach is that it introduces a rather low runtime overhead. Further, as the output stream is byte oriented, dictionary based techniques can subsequently be applied. Moreover, due to the size reduction of the input data, the performance of dictionary techniques is



signal a, b, c : signed(0 to 7);
signal ctrl    : std_logic;
...
p: process (a, b, ctrl)
begin
    if ctrl = '1' then
        c <= a + b;
    else
        c <= a - b;
    end if;
end process;

Figure 6.2: Simple functional VHDL adder model

Figure 6.3: Example transitions for the adder model

further improved. Hence, variable length encoding may be used stand-alone if the compression ratio is sufficient or compression speed is the major goal. Alternatively, it may be applied as a preprocessing stage for dictionary based algorithms to improve the compression ratio at the cost of increased runtime.

6.4.1.1 Transition Estimation

Usually, in a circuit a signal transition may cause transitions on other signals or on the same signal. While it is very hard or even impossible to derive from a waveform set which signal transition causes which other transitions, the source code of a model can be exploited to determine all possible dependencies in advance. In Figure ?? a simple combinational model is shown. Depending on signal ctrl either the sum or the difference of signals a and b is calculated and assigned to the target signal c. Obviously, c will be stable (no transitions) as long as signals a, b and ctrl are stable. Hence, for each signal s a set of dependency rules TBase_s can be constructed to determine which transitions may cause transitions on s. Then, for a given simulation time interval I_s = (t_start, t_end] a subset TBase'_s ⊆ TBase_s is selected. TBase'_s is now used to predict transitions on signal s. Instead of listing for each time instance the signals which change their values, only those signals are listed where the prediction differs from the real transition be-



havior. E.g., Figure ?? shows some transitions for the model presented in Figure ??. Note that while c depends on the signals ctrl, a and b, not every transition on these signals causes a corresponding transition on c. In a VCD formatted file signal c must be listed for time instances 10, 17, 20, 22, 40, 42 and 52. When TBase'_c for the time interval I_c = (10, 42] is selected to predict a transition whenever a or b changes, then only time instances 35 and 42 must be explicitly listed. Instance 35 is required as b has a transition at 35 but c does not change its value. Contrary, transition time 42 must be listed as neither a nor b changes at 42. All other transitions are predicted by TBase'_c.

Definition 4 Let TBase_s = {f_1(t), f_2(t), …, f_n(t)} denote a set of n functions. Each function f_i returns a boolean value and takes a single parameter t which denotes the time for which a transition prediction shall be made. TBase_s is called a transition generator base for signal s ∈ S. Each function f_i(t) of a transition generator base predicts whether s has a transition at time t. The decision is based on the current simulation time t and may also depend on the previous transitions of all signals s_m ∈ S with a time stamp less than t. If f_i(t) returns true then it predicts a transition for signal s at time t.

Note that the prediction may be wrong, i.e. no transition may be predicted for time t although a corresponding transition is present, or a predicted transition may actually not appear on the waveform. From the source code of the model a transition generator base TBase_s can be generated easily. Based on TBase_s the transitions within the time interval I_s = (t_start, t_end] are compressed as follows: For the time interval I_s a subset TBase'_s of TBase_s is selected and the selection is stored into the compressed waveform file. Within the interval I_s only those transitions are explicitly listed which are incorrectly predicted by TBase'_s. A transition is predicted by TBase'_s for time t if at least one prediction function of TBase'_s returns true, i.e. if

    P_{TBase'_s}(t) = ∨_{f_i ∈ TBase'_s} f_i(t)    (6.1)

returns true. Hence, the compression task can be divided into three subtasks. Prior to simulation a transition generator base TBase_s must be generated for each signal s ∈ S. Further, during simulation an interval I_s as well as a



corresponding subset TBase'_s of TBase_s must be selected. We will address the different subtasks in the following. For extended time instances the following operations are defined:

Definition 5 Let t_1 = r_1 + d_1·δ and t_2 = r_2 + d_2·δ with r_i, d_i ∈ {0, 1, 2, 3, …} denote two extended time instances. r_i = real(t_i) is the real value and d_i = delta(t_i) the delta value of t_i. For two extended time instances t_1 and t_2 the following operations are defined:

t_1 = t_2 (compare equal): t_1 = t_2 is true if (r_1 = r_2) ∧ (d_1 = d_2).

t_1 < t_2 (compare less than): t_1 < t_2 is true if (r_1 < r_2) ∨ ((r_1 = r_2) ∧ (d_1 < d_2)).

t_s = t_1 + t_2 = r_s + d_s·δ (add operation):

    r_s = r_1 + r_2
    d_s = d_2        if r_2 ≠ 0
    d_s = d_1 + d_2  if r_2 = 0

t_d = t_1 − t_2 = r_d + d_d·δ (subtract operation):

    r_d = r_1 − r_2
    d_d = −d_2       if r_2 ≠ 0
    d_d = d_1 − d_2  if r_2 = 0

All other compare operations (greater than, greater or equal, less or equal, …) are derived from the equal and less than operations. Note that for the add and subtract operations the delta part of the first operand is ignored if the real value of the second operand is not equal to 0.
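The operations of Definition 5 translate directly into C; the struct layout below is an illustrative sketch (the subtract operation is analogous):

/* Extended time instance t = r + d*delta: real part plus delta cycles. */
typedef struct { unsigned long r; unsigned long d; } XTime;

int xt_eq(XTime a, XTime b)             /* t1 = t2 */
{
    return a.r == b.r && a.d == b.d;
}

int xt_less(XTime a, XTime b)           /* t1 < t2 */
{
    return a.r < b.r || (a.r == b.r && a.d < b.d);
}

XTime xt_add(XTime a, XTime b)          /* t1 + t2 */
{
    XTime s;
    s.r = a.r + b.r;
    /* the delta part of the first operand is ignored if r2 != 0 */
    s.d = (b.r != 0) ? b.d : a.d + b.d;
    return s;
}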

6.5

Value compression

Similar to signal set encoding, the values of a transition block are encoded for each transition block separately. First, the bits defined in the VCD file must be mapped to binary values. The VCD format defines 4 possible values for each bit: 0, 1, x and z. These bits are denoted as VCD bits in the following. Each VCD bit is translated into two binary bits as shown in Table ??. Dumping the signal values of a transition block TRB^(c) is done as follows:



Table 6.2: Table to map a VCD bit to binary values

VCD bit   binary
0         00
1         01
x         10
z         11

1. The transition set T(TRB^(c)) is sorted in increasing id-value order of the transitions.

2. All signal values stored in the sorted transition set are written to a temporary stream starting with the first transition. To save space, the value bits are converted to the corresponding binary bits using Table ?? and packed together, i.e., the output is a stream of binary bits which does not contain any gaps.

3. The temporary stream is then split up into sections of 32 bits. Each section is in turn variable length encoded (see Section ??) and finally written to the output.

Because signal values are written in the order determined by the corresponding signal identifiers, there is no need to also append the signal identifier to each value. While packing the value bits saves some space, applying variable length encoding to the value stream usually does not work very well. This is due to the fact that variable length encoding only works efficiently if the encoded numbers are small (i.e., the upper bits are zero). Unfortunately, in a packed value stream the probability that the upper bits of the generated integers are zero is not very high in the general case. This is addressed by the following approaches.
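A sketch of the packing step in C follows; the function names are our own and the code assumes the VCD bits arrive as the characters 0, 1, x and z:

#include <stdint.h>

/* Map one VCD bit to its 2 bit binary code (Table 6.2). */
static uint32_t vcd_code(char b)
{
    switch (b) {
    case '0': return 0;
    case '1': return 1;
    case 'x': return 2;
    default:  return 3;      /* 'z' */
    }
}

/* Pack a sequence of VCD bits gap-free into 32 bit words. */
void pack_values(const char *bits, int n, uint32_t *words)
{
    for (int i = 0; i < n; i++) {
        int word  = (2 * i) / 32;
        int shift = (2 * i) % 32;
        if (shift == 0)
            words[word] = 0;             /* start a fresh word */
        words[word] |= vcd_code(bits[i]) << shift;
    }
}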

6.5.1

Shuing

Usually, the signal values are 0 or 1 during simulation most of the time while the bit values x and z seldom appear. As a result, the odd bits in the packed value stream are usually 0. To exploit this property, we developed a function which takes a 32 bit integer value and re-arranges its bits.



unsigned int shuffle(unsigned int data)
{
    const unsigned int mask = (data ^ (data >> 15)) & 0xaaaa;
    return data ^ mask ^ (mask << 15);
}

Figure 6.4: Function to shuffle a 32 bit integer

void shuffle_long(unsigned int &data_low, unsigned int &data_high)
{
    const unsigned int mask = (data_low ^ (data_high << 1)) & 0xaaaaaaaa;
    data_low  = data_low  ^ mask;
    data_high = data_high ^ (mask >> 1);
}

Figure 6.5: Function to shuffle two 32 bit integers

All even bits go into the lower 16 bit part and the odd bits are moved to the upper 16 bits. Obviously, this can be done very efficiently as shown in Figure ??. Hence, when this function is applied to the integer sequence created from the packed value stream, the upper part of the shuffled integers will be 0 most of the time. As a result, variable length encoding can now be efficiently applied to the sequence. As shown in Figure ??, shuffling can be done very efficiently: it takes only 1 mask (and) operation, 3 xor operations, and 2 shifts to perform the operation. Further, only a single masking constant is required.

6.5.2

Long shuing

Although the shuffling approach is very efficient at rearranging the bits within a 32 bit integer, the runtime overhead can be further reduced if the data is processed 64 bits at a time. Instead of splitting up the value stream into sections of 32 bits, it is now grouped into sections of 64 bits. Then, each 64 bit word is



shuffled using the function shown in Figure ??, where the lower 32 bits are passed in via parameter data_low and the upper 32 bits via data_high. After the call, the even bits of the 64 bit word are gathered in data_low and the odd bits in data_high. Finally, each shuffled 64 bit result is variable length encoded 64 bit wise. Note that shuffle_long requires approximately the same number of operations per call as shuffle. However, it shuffles twice the amount of data in the same time.

6.5.3

Value prediction

Although the shuffling techniques introduced in the previous sections can be efficiently combined with variable length encoding to reduce the amount of data to be dumped, the achievable compression ratio is limited. On average, about 4 bytes will be required to encode one 64 bit (= 8 byte) integer value of the value stream. Even if this is a remarkable compression ratio, further reduction is desirable for today's designs. An approach to achieve this goal is value prediction. Value prediction has been successfully applied to image compression [?, ?] to predict the value of a pixel from the neighboring pixels. Similar techniques are also used in processor research [?, ?] to derive memory/register values based on previous values [?]. Our approach is similar as it tries to determine signal values based on previously observed transitions. Instead of storing the actual signal values, the differences between the predicted values and the actual signal values of a transition block are dumped. The difference is a vector where each bit is 0 if the predicted bit value and the real bit value are the same, and 1 otherwise. If the prediction is close to the real signal values, the difference vector will mostly consist of zero bits. This in turn can be exploited by variable length encoding. The quality of the prediction, i.e., the number of bits set to one in the difference vector, depends on the prediction algorithm. Although complex and exhaustive statistical analysis of the signal value patterns could be applied, such techniques would create a significant computational and memory overhead. Hence, we developed a set of simple and fast approaches which achieve a good balance between runtime and computational overhead on the one hand and compression ratio on the other hand. In detail, we developed four approaches which are addressed in the following.



6.5.4

Signal history based prediction

Usually, only a few bits of a vector signal change at a time. E.g., the number sequence transmitted over the address bus of a processor will be a counting sequence most of the time. Hence, the lower bits will often flip while the upper bits stay the same. Consequently, a simple prediction approach is to assume that the next value of a (vector) signal is similar to its previous value. Hence, instead of storing the new value, the difference (xor) of the current and the previous value is dumped. This of course requires that for each signal its previous value is recorded in a special data structure in memory. This data structure will be called the history buffer in the following. Although this approach can efficiently reduce the number of bits set in the value stream, it has a couple of significant downsides: It does not work well for single bit signals. If a transition on a single bit signal occurs, at least one of the two binary bits which represent the VCD signal value will flip. Hence, xor-ing the current value with its previous one will set at least one of the two bits to 1. A significant amount of memory is required to store the previous signal value for each signal. This becomes especially difficult when gate level designs with a large number of signals must be handled. Storing the current values into the history buffer pollutes the data cache of the processor. During processing of the VCD stream, the various transitions generate a somewhat random access pattern on the history buffer, degrading the data cache performance of the dumping system.
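A minimal C sketch of the basic mechanism, assuming 32 bit signal values and a preallocated history buffer (our own illustration):

#include <stdint.h>

/* Signal-history based prediction: instead of the new value, the xor
 * of the new and the previous value of the same signal is dumped, and
 * the history buffer is updated.  Stable upper bits xor to zero, which
 * the subsequent variable length encoding exploits. */
uint32_t history_xor(uint32_t *history, int signal_id, uint32_t new_value)
{
    uint32_t diff = new_value ^ history[signal_id];
    history[signal_id] = new_value;   /* remember for the next block */
    return diff;                      /* written to the value stream */
}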

6.5.5

Extended signal history based prediction

As discussed in Section ??, signal history based prediction does not work very well for single bit signals. On the other hand, based on the previous signal value the next value can usually be predicted very easily. As discussed in Section ??, the values x and z seldom appear on any signal. Hence, if the previous value of a (single bit) signal is 0 then the next value will most probably be 1 and vice versa. Based on this assumption, we use a prediction table to determine the new value. The new value is predicted to be 1 if the old value is 0; otherwise 0 is predicted (i.e., if the old value is 1, x or z). Note that the prediction based on previous values x and z can be set to either 0 or 1. For our experiments we arbitrarily chose 0.



Finding a suitable prediction for single bit signals is easy; however, the situation becomes more complex for vector signals. To preserve compression speed, we selected a simple schema which assumes that the least significant bit of a vector is flipping while all other bits remain stable. However, if a previous vector bit is x or z then we predict the next value to be 0. Although this approach is capable of predicting signal values more accurately than the approach described in Section ??, it also suffers from the additional memory overhead and data cache pollution.
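A one-line C sketch of this prediction table for single bit signals, using the 2 bit codes of Table ?? (an illustrative helper, not from the text):

/* Extended history prediction for a single VCD bit (2 bit code):
 * code 0 ('0') -> predict 1; codes 1, 2, 3 ('1', 'x', 'z') -> predict 0. */
static unsigned predict_bit(unsigned prev)
{
    return prev == 0u ? 1u : 0u;
}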

6.5.6

Signal set history based prediction

As mentioned in the previous sections, the signal history based predictions suffer from data cache pollution due to the random-pattern-like access to the history buffer. In order to overcome this drawback, we also developed an approach which stores history data in the transition block cache TBC introduced in Section ??.

Definition 6 A transition data block is a triple TRDB^(c) = (t, T, V), where c, t and T are defined as in Definition ?? and V is a packed data array section consisting of all signal values listed in T.

The modified transition block cache TBC consists of TRDB^(c) entries. Note that signal values are now also stored in the transition cache. However, there is still only a single entry in the TBC which can be associated with the same transition signal block. The signal values of a transition block at cycle i are now processed as follows: The signal value stream is xor-ed with the data section stored in the corresponding transition data block. In addition, the number of bits which are set to 1 in the result is counted (population count). If the population count is above a specific threshold then the original value section is variable length encoded and written to the value output stream. The value data is prepended by a special token to indicate that original values are written. Otherwise another token is written to the output stream followed by the variable length encoded xor result. The token is used to mark that the xor result has been written to the stream. Finally, the value data is written back to the corresponding cache entry of the TBC.
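The core of this scheme can be sketched in C as follows; the threshold value is an illustrative assumption and __builtin_popcount is a GCC/Clang intrinsic:

#include <stdint.h>
#include <string.h>

#define THRESHOLD 8   /* illustrative cut-off, not from the text */

/* Xor the packed values against the copy stored in the TBC entry and
 * update the entry.  Returns 1 if the raw values should be dumped
 * (too many differing bits), 0 if the xor result should be dumped. */
int tbc_value_encode(uint32_t *cache_vals, const uint32_t *vals,
                     uint32_t *diff, int n_words)
{
    int pop = 0;
    for (int i = 0; i < n_words; i++) {
        diff[i] = vals[i] ^ cache_vals[i];       /* xor with history */
        pop += __builtin_popcount(diff[i]);      /* population count */
    }
    memcpy(cache_vals, vals, n_words * sizeof *vals);  /* write back */
    return pop > THRESHOLD;
}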



The approach described in Section ?? uses the history table as a unique container for each signal value. In the current approach, history information may be scattered over several TBC cache entries. Hence, there will often be more than one previous signal value stored for each signal. However, signal values are stored in packed array format, reducing memory consumption significantly. Further, processor data cache read/write operations with respect to the signal values are more regular because the TBC value arrays are always read and written sequentially.

6.5.7

Hybrid history based prediction

The hybrid history based prediction approach is a mix of the previous techniques. It uses a history table to hold the previous value for each signal as well as corresponding value arrays for each TBC cache entry. However, the value arrays in the TBC cache are now used to hold the difference between the predicted and the real values for a corresponding transition block. The approach is similar to branch prediction techniques [?] as both exploit several kinds of history information for prediction. Our approach can be considered as a two-level prediction in which the first level does the actual prediction task and the second one tries to correct any miss-predictions introduced by the first level. The first level is based on the previous signal values while the second is based on signal sets. I.e., the hybrid technique mixes together two kinds of history information in order to improve prediction. In detail, the signal values of a transition block at cycle i are processed as follows: The previous values obtained from the history table are used to predict the values for the current cycle. The result of this operation is a set of predicted values for the signals in the current transition block. The current signal values are stored back into the history table. The predicted values are packed together to build a predicted value stream. This stream is xor-ed with the real signal values to form a predicted value difference stream. The predicted value difference stream is xor-ed with the value data section of the corresponding TBC cache entry. If the number of bits set in the result is less than the number of bits set in the predicted value difference stream then the result is dumped to the



Figure 6.6: Hybrid prediction architecture steps 1,2 and 3

Figure 6.7: Hybrid prediction architecture steps 4 and 5

output stream. Otherwise, the predicted value difference stream is written out. In each case the data is prepended with a special token to indicate which of the two data sets was written. Finally, the predicted value difference stream is stored back to the corresponding TBC cache entry. The data flow of this algorithm is shown in Figure ??. The signals are represented by small boxes containing the appropriate signal values. To separate the signals, each signal has been assigned a unique box color. On the left hand side of the figure the history buffer is shown while on the right hand side the transition block cache is displayed. Dashed arrows indicate the write back operations which update buffer and cache.
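A compact C sketch of the two prediction levels for a single value; the level-one predictor below is a simplified stand-in for the schemes of Sections ?? and ??, and the popcount intrinsic is a GCC/Clang assumption:

#include <stdint.h>

/* Level-one predictor (simplified): assume the least significant bit
 * flips while all other bits stay stable. */
static uint32_t predict_next(uint32_t prev) { return prev ^ 1u; }

/* Hybrid two-level encoding of one value; cache_diff points into the
 * value array of the matching TBC entry.  Returns the sparser word;
 * *use_level2 tells which token to prepend on the output stream. */
uint32_t hybrid_encode(uint32_t *history, uint32_t *cache_diff,
                       int id, uint32_t value, int *use_level2)
{
    uint32_t diff1 = value ^ predict_next(history[id]); /* level 1 error */
    uint32_t diff2 = diff1 ^ *cache_diff;               /* level 2 xor */

    history[id] = value;      /* update the history table */
    *cache_diff = diff1;      /* write difference back to the TBC entry */

    *use_level2 = __builtin_popcount(diff2) < __builtin_popcount(diff1);
    return *use_level2 ? diff2 : diff1;
}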

6.5.8

Value Compression Techniques based on value computation

The compression techniques introduced in the following sections either try to find similarities within the same signal or exploit dependencies between different signals. These techniques were not implemented due to their complexity and the compiler modifications they require. Hence, we just introduce the theory of the techniques and give an overview of the different involved algorithms.

6.5.8.1 Pattern Matching

The first technique, called pattern matching, tries to detect repeated patterns within the same signal waveform. An example is shown in Figure ??. In the upper part of the figure an original waveform is shown while in the middle part

Figure 6.8: Hybrid prediction architecture steps 6 and 7



Figure 6.9: Hybrid prediction architecture step 8

Figure 6.10: Pattern compression example

the value sequence after value-time separation is displayed. The transition times were replaced with consecutive sequence numbers. Obviously, the waveform shows a repeated counter sequence which starts with value 0 and counts up to 3. However, the various counter values last for different time intervals. Instead of repeating the entire counting sequence on each occurrence, it is listed only once. The repeated sequences are handled by copying the original sequence to other time positions. The copy operation is defined by the start and end sequence numbers of the copied sequence as well as the start sequence number of the copy destination. Here, [i0, i3]@i4 denotes that all transitions between sequence numbers 0 and 3 are repeated at sequence number 4. The compression technique is shown for an integer type signal but may be applied to any signal type. Note that time-value separation is a key technique in order to apply pattern matching to the given waveform sample sequence. After separation, the time intervals for the different counter states do not have to be taken into account. For further compression the pattern matching technique may be recursively applied.
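The copy operation itself is trivial to apply; a C sketch under the assumption that the separated values are held in an array:

/* Apply a copy operation [i_from, i_to]@i_dest: replay a previously
 * seen value subsequence at a later position in the separated stream. */
void apply_copy(int *values, int i_from, int i_to, int i_dest)
{
    for (int k = 0; k <= i_to - i_from; k++)
        values[i_dest + k] = values[i_from + k];
}

/* apply_copy(v, 0, 3, 4) repeats the counter run v[0..3] at v[4..7]. */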

6.5.9

Strength Reduction

For the counting sequence in Figure ??, each time the counter value changes it is written to the VCD file. However, there are actually only two actions applied to the value of the signal: one action sets the value to 0 while the other increments the previous value by one. Hence, instead of storing the new value, a set of operations {f_1, f_2, …, f_n} may be defined which derive the new value from the previous one. Then, the appropriate index number of the function is stored to the waveform file instead of the actual value.

Definition 7 Let SBase_s = {f_1(x), f_2(x), …, f_n(x)} denote a set of n functions and V(w_s) = (v_0, v_1, …, v_l) a value sequence of signal s. Then, SBase_s is called a self generator base of signal s if for all 0 < i ≤ l the following

signal count : signed(0 to 7);
signal up    : std_logic;
...
p: process (clk)
begin
    if rising_edge(clk) then
        if up = '1' then
            count <= count + 1;
        else
            count <= count - 2;
        end if;
    end if;
end process;

Figure 6.11: VHDL code for an up/down counter

condition holds: there exists an f_j such that f_j(v_{i−1}) = v_i.


I.e., a generator base is a set of different functions to generate a new signal value from a previous value. Note that SBase_s may contain more functions than are actually necessary. For the example shown in Figure ?? two operations are sufficient to build a generator base: a reset operation which sets the signal value to 0 (i.e., f_1(x) = 0) and an increment operation (i.e., f_2(x) = x + 1) which increments the signal value by one. Hence, only a single bit is required to store a new value for the signal. Bit 0 may be chosen to represent the reset operation and 1 to select the increment operation. While this approach can save a significant amount of space, it might become computationally complex to derive an appropriate SBase from the waveforms. Further, to determine SBase in advance some knowledge about the entire waveform would be required, which is impractical as stated in Section ??. However, in many cases not all values will appear on a signal. In Figure ?? the RTL VHDL code for an 8 bit wide up-by-1/down-by-2 counter is shown. In the upper part the signal (register) to store the counter value (count) as well as the signal up to control the counting direction are defined. The lower part of the model presents a synchronous process triggered by a rising edge on clk which will assign signal count a new value depending on



Figure 6.12: Pattern compression example

the control signal up and the previous value of count. Obviously, from this source code SBase can easily be created as SBase = {f_1(x), f_2(x)} with f_1(x) = x + 1 and f_2(x) = x − 2. Hence, by analyzing the source code, SBase can be determined in advance without exploring the waveform output. Moreover, this can be done very efficiently at model compile time. However, because some parts of the source code might never be executed, SBase might contain more functions than are actually needed to cover the waveform obtained from a specific simulation run. E.g., if in Figure ?? signal up stays 1 for the entire simulation run then count will never be decremented and hence function f_2 is never selected.
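A sketch of the resulting encoder for this counter in C; the enum naming is our own:

/* Strength reduction for the up/down counter: instead of the new value,
 * store the index of the generator function that produces it. */
enum { F_INC = 0, F_DEC2 = 1 };          /* f1(x) = x+1, f2(x) = x-2 */

int sbase_encode(int prev, int next)     /* function index, -1 if none */
{
    if (next == prev + 1) return F_INC;
    if (next == prev - 2) return F_DEC2;
    return -1;                           /* value must be stored literally */
}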

6.5.10

Cross Signal Strength Reduction

While in Section ?? we considered a single signal, the approach may also be extended to operate across different signals. Comparing signals a, b and c in Figure ?? shows that signal c is actually either the sum or the difference of a and b. Hence, c can be replaced by the waveform shown in the lower part of the figure, where the value is either a + b or a − b.

Definition 8 Let S denote a set of k signals S = {s_1, s_2, …, s_k} and s_p a signal s_p ∈ S. OP = {f_1(t), f_2(t), …, f_n(t)} and T = {τ_{s,i}(t)}, s ∈ S, 1 ≤ i ≤ n, denote two sets of functions where τ_{s,i}(t) returns a time instance less than t, i.e. τ_{s,i}(t) < t. Then, XBase_{s_p} is defined as

    XBase_{s_p} = ( f_1[v_{s_1}(τ_{s_1,1}(t)), …, v_{s_k}(τ_{s_k,1}(t))],
                    f_2[v_{s_1}(τ_{s_1,2}(t)), …, v_{s_k}(τ_{s_k,2}(t))],
                    …,
                    f_n[v_{s_1}(τ_{s_1,n}(t)), …, v_{s_k}(τ_{s_k,n}(t))] ).

XBase_{s_p} is called a cross value generator base of signal s_p if for each transition (t, v) ∈ w_{s_p} there is a function f_i ∈ XBase_{s_p} for which the following condition holds:

    f_i(v_{s_1}(τ_{s_1,i}(t)), …, v_{s_k}(τ_{s_k,i}(t))) = v_{s_p}(t).



Note that the function v_x(t) returns the value of signal x at time t (see Definition ??). A cross value generator base uses the previous values of all signals in the waveform to determine the current value of a signal. While in Section ?? only the previous value of the same signal was exploited, here all signals of S may be used. Further, all previous values of the signals may be used here instead of exploiting only the most recent one. If a cross generator base is found for signal s then, instead of storing the actual signal value each time s changes, the index number of an appropriate function f_i is put into the compressed waveform file. In order to maximize the compression ratio, XBase_s should contain as few functions as possible. However, this would require searching the entire waveform database, which is unacceptable for models containing a large number of different signals. While it is computationally too complex to determine functional dependencies between different signals based only on the signal waveforms, the model source code can be exploited to reduce the search space significantly. In detail, from each process a function set is generated as follows:

Combinational processes: Each signal assignment included within a combinational process is converted into a function f_m(x_0, x_1, …, x_k) where the x_j denote the corresponding signals included in the assignment value expression. Further, all τ_{x_j,m}(t) associated with f_m are set to τ_{x_j,m}(t) = t − t_d, where t_d is the delay associated with the signal assignment. If no delay is specified then t_d is set to δ (a delta delay). Hence, v_{x_j}(τ_{x_j,m}(t)) will return the value of signal x_j at time t − t_d. Finally, f_m[v_{x_0}(τ_{x_0,m}), …, v_{x_k}(τ_{x_k,m})] is added to XBase_s where s is the target of the assignment.

Synchronous processes: From the asynchronous part an appropriate set of generator functions is generated as described for combinational processes. Each signal assignment included within the synchronous part of a process P is converted into a function f_m(x_0, x_1, …, x_k) where the x_j denote the corresponding signals included in the assignment value expression.


Further, all τ_{x_j,m}(t) associated with f_m are set to

    τ_{x_j,m}(t) = t' : (t' + t_d = t) ∧ (t' ∈ T(w_clk)) ∧ (v_clk(t') = P),    (6.2)

where t_d is the delay associated with the signal assignment. If no delay is specified then t_d is set to δ. clk is the clock signal of the process. P is a constant and depends on whether the process is activated by a rising edge (P = 1) or a falling edge (P = 0) of signal clk. v_{x_j}(τ_{x_j,m}(t)) will return the value of signal x_j at the time instance t' for which the following conditions hold: t' delayed by t_d is equal to t, i.e. t' + t_d = t, and the clock signal clk made an active transition (i.e. the process was activated) at t'. See also the right hand side of Figure ??. Finally, f_m[v_{x_0}(τ_{x_0,m}), …, v_{x_k}(τ_{x_k,m})] is added to XBase_s where s is the target of the assignment. Figure ?? lists VHDL source which might be used to generate the waveform shown in Figure ??. Depending on the value of control signal ctrl either the sum or the difference of signals a and b is calculated and assigned to c. Hence, XBase_c may be set to

    XBase_c = ( f_1[v_a(τ_{a,1}(t)), v_b(τ_{b,1}(t))], f_2[v_a(τ_{a,2}(t)), v_b(τ_{b,2}(t))] )

with f_1(x, y) = x + y, f_2(x, y) = x − y and τ_{a,1}(t) = τ_{b,1}(t) = τ_{a,2}(t) = τ_{b,2}(t) = t − δ. Finally, this will lead to the function set

    XBase_c = ( v_a(t − δ) + v_b(t − δ), v_a(t − δ) − v_b(t − δ) ).

As the set only consists of two functions, a single bit is sufficient to represent a new value for signal c.

signal a, b, c, d : signed(0 to 7);
signal control    : std_logic;
...
p_sync: process (clk)
begin
    if rising_edge(clk) then
        if control = '1' then
            a <= a + b after 10 ns;
        end if;
    end if;
end process;

p_async: process (a, c)
begin
    d <= a - c after 15 ns;
end process;

Figure 6.13: VHDL sample code to illustrate timed strength reduction


Figure ?? shows an example model to illustrate our method for both a synchronous and an asynchronous process. The model consists of a synchronous process p_sync which is activated on each rising edge of clk. The process either adds b to the value of a if control is 1, or keeps the old value of a otherwise. Further, a second process p_async is shown in the figure which calculates a new value for signal d by subtracting c from a. Note that both signal assignments included in the model are associated with a delay. First, we derive XBase_a for signal a. As there is a single signal assignment which targets a, the cross signal generator base XBase_a is

    XBase_a = ( f_1[v_a(τ_{a,1}(t)), v_b(τ_{b,1}(t))] )

with f_1(x, y) = x + y and

    τ_{a,1}(t) = τ_{b,1}(t) = t' : (t' + 10ns = t) ∧ (t' ∈ T(w_clk)) ∧ (v_clk(t') = 1).

Note that only signal values which might affect the result of f_1 are included in the parameter list.


Similarly, XBase_d for signal d is:

    XBase_d = ( f_1[v_a(τ_{a,1}(t)), v_c(τ_{c,1}(t))] )

with f_1(x, y) = x − y and τ_{a,1}(t) = τ_{c,1}(t) = t − 15ns. Note that the process associated with signal d is a combinational process.
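For this example, selecting the function index on the compression side can be sketched in C as follows; the helper name is our own and the delayed values are assumed to be available:

/* Cross-signal generator base for signal c: test which candidate
 * function reproduces the new value from the delayed values of a and b,
 * and store only its index (one bit suffices for two functions). */
int xbase_encode_c(long va, long vb, long vc_new)
{
    if (va + vb == vc_new) return 0;     /* f1(x, y) = x + y */
    if (va - vb == vc_new) return 1;     /* f2(x, y) = x - y */
    return -1;                           /* no match: store vc_new itself */
}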

6.6

Conclusion

This chapter described the developed waveform compression algorithms. Two main algorithms are involved: the Signal Set compression techniques, which look for the signals that change values in a given transition block and compress the signal ID set for each transition block, and the value compression, a technique used to develop algorithms that are able to compress the signal values of each transition block. To improve the results of both algorithms, the CDFG will be invoked and used in both algorithms to make them more effective. This is the goal of the following chapter.

Chapter 7

Using CDFG to improve waveform compression


7.1 Introduction

Architectural synthesis, also called high-level synthesis [?], is a process that adds structural information to a functional description at the same abstraction level. This results in a data path and a controller description. The data path consists of building blocks such as functional units, memories and the interconnection structure among them. The controller describes how the flow of data inside the data path is managed and is described in terms of state transitions. Commonly, the controller description is translated into an implementation at the abstraction level of gates by using logic synthesis. Thereby, after selection, allocation and binding of functional units, memories and interconnect, the next step is to model the high-level synthesis scheduling [?, ?, ?]. Furthermore, a control data flow graph can easily be used to predict the state changes of signals inside a given model. Hence, the control data flow graph may be used to improve the waveform compression algorithms. The control and elimination of unrequired signals is a dominant factor in ensuring the correct functionality of waveform files. Thereby, as the size of waveform files can be very large due to the number of stored signals, the control data flow graph will be used to reduce this number. Thus, identifying and selecting which signals will be stored is the first task of the deployed approach. The principle of signal selection is the aim of the actual techniques. Indeed, the signals which must be stored will be selected according to their appearances in the waveform files and the complexity of

restoring them during the decompression process. In this chapter, we detail the use of the CDFG to improve the waveform compression defined in sections ?? and ??. First, we describe how the control data flow graph is used to select the list of signals that should be stored, then we deal with the envisaged algorithm, which is based on genetic algorithms, and finally we illustrate the developed software concept.

7.2

Signal selection

7.2.1

CDFG based compression

As defined in chapter ??, waveform compression plays a big role in improving design verification and speeding up design simulation. In chapter ?? two basic tasks of waveform compression were presented: signal ID compression (section ??) and value compression (section ??). These two techniques are suitable for waveform compression, but when the number of signals increases, not only is the size of the waveform files not reduced as much as it should be, but the runtime of these algorithms also becomes huge. To improve the effectiveness of these techniques, a new idea of using the control data flow graph was conceived. Control data flow graphs are very suitable for predicting the computation inside a given model and permit many kinds of high level optimizations. We will use the control data flow graph to reduce the size of the corresponding waveform files. Thus, we only have to apply our waveform compression techniques defined in chapter ?? to a so-called reduced-size waveform file. A common compression program can then additionally be applied to the obtained files to achieve more compression. Moreover, the hard task in the current approach is to identify which signals must be stored. Hence, an approach based on genetic algorithms is used to decide which list of signals must be stored in the corresponding waveform file. The runtime of the signal identification depends primarily on the number of signals contained in the considered model.

7.2.2 Signal dependency calculations

In essence, CDFGs describe the dependencies between signals and variables inside digital circuits; i.e., they gather a priori knowledge of the relationships between signals and variables. Based on this principle, the identification and selection of the signals that must be stored can be achieved. For example, consider the VHDL model described in Figure ??. Notice that

process(a, b)
begin
    c <= a xor b;
    d <= c and a;
end process;

Figure 7.1: VHDL model 18


signal c is a function of the signals a and b. This dependency will be used to reduce the stored information belonging to the waveform of signal c; i.e., signal c will not be stored because the signals a and b will be. On the other hand, signal d is a function of the signals a and c. Depending on the complexity of restoring signal d, its values will be either stored or not. To restore the non-stored signals, we need to consider the relationships between these signals and the stored signals from the involved control data flow graph. Hence, the signal identification and selection are determined based on the dependencies between signals and on the complexity of restoring the signals. Indeed, in Figure ?? we can predict the output signal values (signals c and d) from the knowledge of the input signal values (signals a and b). The CDFG of the model described in Figure ?? is presented in Figure ??. We notice that the CDFG illustrates the dependencies between all signals considered in this process. Figure ?? represents the CDFG of the VHDL model shown in Figure ??; this graphical representation illustrates the relationships between all signals in the VHDL model. Thereby, the incoming data edges a and b of the XOR node come from the Start Process node, which represents the source of outgoing data edges, whereas the outgoing data edge c is an incoming edge of the End Process node, which represents the sink of incoming data edges. Between the Start Process node and the End Process node there is just one clock cycle delay: after a clock cycle, the value of signal c is transferred to the Start Process node. Further, the incoming data edges a and c of the AND node come from the Start Process node, which means that there is a clock cycle delay between these signals and the outgoing data edge d of this node. The dependency of these nodes on the clock signal is characterized by the control edges true and false coming from the Select node that is activated by the clock event node. Hence, we do not need to store d and c due to the dependencies between those signals and the signals a and b. The proposed algorithm has to find a method to store the information describing the dependencies between these


Figure 7.2: CDFG of VHDL model 18

process(clk)
begin
    if clk'event and clk = '1' then
        c <= a xor b;
        d <= c and a;
    end if;
end process;

Figure 7.3: VHDL model 19

signals. This can be solved by storing the operations c = a XOR b and d = c AND a, which link each set of signals to the other. The first operation describes the relationship between the signals a, b and c, and the second one describes the relationship between the signals a, c and d. These dependencies between the corresponding signals are included in the control data flow graph. The next step is to deal with processes that depend on a clock signal. Thereby, in VHDL two kinds of synthesizable processes are distinguished: processes that depend on signals and processes that are triggered by a falling or rising edge of a single signal (CLK). In the previous paragraph, the first kind was treated; next, the second kind will be investigated. Figure ?? shows a simple example where the behavior of process P depends on a clock signal clk. Note that in the VHDL model of Figure ??, the considered process depends on whether the clock signal clk changes value or not to assign values to signals c and d. Figure ?? shows the signal waveforms in the case of ideal electrical circuits without any component delays. As shown, there is a clock cycle delay between the two assignments and the changes of a and b; i.e., a signal assignment is performed just after the clock signal changes value. Notice that at the beginning of the simulation, signals c and d have undefined values ('x'). When signals a and b change values, the change of signals c and d will

Figure 7.4: CDFG of VHDL Model 19


Figure 7.5: VHDL signal assignment procedure

happen after a clock cycle. The new relations between the signals a, b, c and d are then described as follows:

    c^(n) = f( a^(n-1), b^(n-1) )

where f(x, y) = x XOR y is a logic function and n denotes the n-th cycle of the clock signal clk, and

    d^(n) = g( c^(n-1), a^(n-1) )

where g(x, y) = x AND y is a logic function.

As a first step, we define a primary input as a signal or variable that does not depend on any other signal or variable and that is used in some part of the corresponding model. All primary inputs will be stored. Depending on the process type, we distinguish two kinds of relationships between signals and variables inside processes. The first type is a direct relationship, which binds signals and variables through mathematical and logic functions; for this kind of dependency, we need to store the relationships in their mathematical and/or logical form. The second type depends on a clock signal; it is more complex and requires determining the relationships between signals and variables relative to a clock signal. Therefore, for signals which depend on the primary inputs (signals and/or variables), only the corresponding primary inputs will be stored in the VCD file. The values of the dependent signals are then calculated for each simulation time from the relationship binding these signals to the primary inputs and from the clock signal value.
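To make the restoration step concrete, the following minimal sketch (our own illustration; the container types and function names are assumptions, not code from the tool) rebuilds the non-stored signals c and d of VHDL model 19 from the stored primary inputs a and b, applying the one-cycle delay of the registered assignments:

    #include <cstddef>
    #include <vector>

    // Sketch: a[n] and b[n] hold the values sampled at the n-th rising edge
    // of clk.  Index 0 stands in for the undefined initial value ('x' in the
    // VCD), which std::vector<bool> cannot represent.
    std::vector<bool> restore_c(const std::vector<bool>& a,
                                const std::vector<bool>& b) {
        std::vector<bool> c(a.size());
        for (std::size_t n = 1; n < a.size(); ++n)
            c[n] = a[n - 1] ^ b[n - 1];   // c^(n) = f(a^(n-1), b^(n-1)), f = XOR
        return c;
    }

    std::vector<bool> restore_d(const std::vector<bool>& a,
                                const std::vector<bool>& c) {
        std::vector<bool> d(c.size());
        for (std::size_t n = 1; n < c.size(); ++n)
            d[n] = c[n - 1] && a[n - 1];  // d^(n) = g(c^(n-1), a^(n-1)), g = AND
        return d;
    }

In a decompressor, restore_c would be run first and its output fed to restore_d, mirroring the dependency order given by the CDFG.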

7.2.3 Inter-process dependencies

In this section, we describe how the CDFG can determine the dependencies between processes inside the same VHDL model. In the example of Figure ??, we notice that the three processes P1, P2, and P3 depend on one another. Based on the CDFG of each process and on the knowledge that all processes inside the same VHDL model share the same Start Process and End Process nodes, we can identify relationships between signals inside different processes to improve the optimization of the waveform compression.

P1: process(a)
begin
    b <= a xor '1';
end process;

P2: process(b)
begin
    c <= b nor '1';
end process;

P3: process(c)
begin
    d <= c xor '1';
end process;

Figure 7.6: VHDL model 20

Figure 7.7: CDFG of VHDL Model 20

Figure ?? describes the CDFG of the model represented in Figure ??. Thus, the CDFGs allow us to get information about the dependencies between the signals inside these processes. To achieve the needed optimization, only signal a is stored, and the relationships between the considered signals are recorded as: b = f(a, 1), c = f(b, 1), d = f(c, 1), where f(x, y) = x XOR y. In Figure ??, we illustrated an example where the processes depend on signals. Let us now look at what happens when a considered model contains processes which depend on a clock signal. The simplest example is the case where all processes have the same clock signal and are activated on the same clock signal cycle. Thereby, the relationship is described as follows:


    S^(n+1) = f( S_i^(n), S_j^(n) )

where f() represents a mathematical multi-variable function. It is more complex to determine the dependencies between signals inside processes where each process has its own clock signal. Here, we must define a relationship between all clock signals and try to express the other clock signals as a function of the first one, for example. If we suppose, for example, that the clock signal clk_k is k times faster than the clock signal clk_0, and that the clock signal clk_p is p times slower than clk_0, we can write the following relationships between the clock signals:

    clk_k = k · clk_0
    clk_p = (1/p) · clk_0

The relationships between the signals inside these processes are then defined as follows:

    S_l0^(n) = f( S_i0^(n-1), S_jk^((n-1)·k), S_hp^(⌈(n-1)/p⌉) )

where ⌈x⌉ denotes the function ⌈·⌉ : IR -> IN, x ↦ n such that n - 1 < x <= n; IR is the set of real numbers and IN the set of natural numbers. Similarly:

    S_lk^(n) = f( S_i0^(k·(n-1)), S_jk^(n-1), S_hp^(k·p·(n-1)) )
    S_lp^(n) = f( S_i0^(⌈(n-1)/p⌉), S_jk^(⌈(n-1)/(p·k)⌉), S_hp^(n-1) )

While the identification of the relationships between processes that have different clock signals is very hard, the relationship between signals that differ only by a phase shift is easier. Indeed, when the processes have different phases, the relationship between these processes is described as follows:

    S_n = f( S_j^(φ + clk) )

where f(·) is the function defining the dependency between the signals S_n and S_j, and φ is the phase between these signals. Figure ?? illustrates a VHDL model where the processes inside the model depend on different clock signals. The CDFG of the VHDL model described in Figure ?? is shown in Figure ??. In the VHDL model of Figure ??, the three clock signals clk_0, clk_1 and clk_2 are distinct. So, to get a relationship between the signals a, b, c and d, we must define relations between the clock signals. Suppose for example that:


P1: process
begin
    if rising_edge(clk0) then
        b <= a xor '1';
    end if;
end process;

P2: process
begin
    if rising_edge(clk1) then
        c <= b nor '1';
    end if;
end process;

P3: process
begin
    if rising_edge(clk2) then
        d <= c xor '1';
    end if;
end process;

Figure 7.8: VHDL model 21

Figure 7.9: CDFG of VHDL Model 21

    clk_1 = 3 · clk_0   and   clk_2 = 0.5 · clk_0


To study the relations between the signals defined previously, we can consider these equations:

    b^(n+1) = a^(n) XOR 1,   where n is the n-th cycle of clock signal clk_0;
    c^(k+1) = b^(k) NOR 1,   where k is the k-th cycle of clock signal clk_1;
    d^(m+1) = c^(m) XOR 1,   where m is the m-th cycle of clock signal clk_2.

Indeed, we can use these equations to minimize the data storage. Thereby, we can store just the values of signal a together with the relations between the different clock signals clk_0, clk_1 and clk_2 and the signals a, b, c and d. These relations are stored only once for the whole duration of the VHDL model simulation. Therefore, depending on the type of the processes, the relationships between the signals and variables inside processes are identified based on mathematical and/or logical functions and on the different clock signals.
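The cycle-index arithmetic used above can be captured in a small helper. The following sketch is our own illustration under the stated assumptions (clk_1 three times faster and clk_2 twice slower than clk_0); the function name is hypothetical:

    #include <cmath>

    // Map cycle n of the reference clock clk0 to the corresponding cycle of
    // a derived clock running at `ratio` times the frequency of clk0.  For
    // integer ratios (faster clocks) the result is exact; for fractional
    // ratios (slower clocks) the ceiling is taken, matching the mapping
    // x -> n with n - 1 < x <= n defined in the previous section.
    long derived_cycle(long n, double ratio) {
        return static_cast<long>(std::ceil(n * ratio));
    }

    // Examples: derived_cycle(6, 3.0) == 18 (cycle 6 of clk0 is cycle 18 of
    // clk1) and derived_cycle(5, 0.5) == 3 (cycle 5 of clk0 falls into
    // cycle 3 of clk2).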

7.3 Stored signal identification

In this section, we describe how the CDFG will be used to improve the waveform compression techniques described in chapter ??.

7.3.1 Resolving signal dependencies

The identification of the signal list that must be stored is the major task to achieve for a suitable waveform compression based on the CDFG. Thus, we need to identify the minimal information that must be stored in order to be able to restore all other signals. As described in section ??, we have two kinds of dependencies: clock-synchronous signal dependencies and asynchronous dependencies. For these two kinds, an algorithm that identifies the relationships between signals and selects the list of signals which are required to be stored will be investigated.

In the previous sections, we investigated signal dependencies. There, the CDFG was proposed to determine the signal list that must be stored. In section ??, we illustrated that a signal will be stored only when it is a primary signal, i.e., independent of all other signals defined inside the model. Furthermore, the relationships between the signals are shown in the control data flow graph. This is done just once during a simulation run of the waveform compression. However, it is very hard to identify such a list for many reasons, such as the huge number of signals involved in a waveform file, the complex dependencies between signals and processes, and signal dependencies on one or more clock signals. While the number of primary inputs varies from model to model, this number is relatively small compared to the total number of signals included in a model. As a consequence, storing only the primary inputs would make the decompression process very hard: to restore the original waveform file for further use, such as off-line verification, a huge number of operations must be performed due to the large number of signals that have to be rebuilt from the dependencies and the primary signals. Our task is to find an algorithm that makes the compression process as well as the decompression process relatively easy. It is hard to find an algorithm which optimizes the decompression of the compressed waveform file, because we do not know in advance the complexity of the mathematical functions which link the signals to each other. For complex VHDL models, we mostly have very complex mathematical functions that link a set of signals to a specific signal, which makes the decompression process very complex. In order to find a good algorithm solving the compromise between the compression and decompression processes, genetic algorithms will be used. Thus, to settle this trade-off, we try to find an optimized solution for the waveform compression techniques based on the signal dependencies. The proposed idea is as follows: for a signal that depends on a set of signals, store only the needed signals. Consider a signal S_k that depends on a set of signals. This signal can be written as S_k = f_k(S_0, ..., S_i, S_j, ..., S_n), where S_0, ..., S_i are independent signals (primary inputs) and S_j, ..., S_n are signals that depend on one another and on other signals. Thus, we will store only the signals that do not depend on any other signals. Therefore, the decompression process complexity


will depend on the complexity of the functions f_i(). Furthermore, these functions can be arbitrarily complex, and so the decompression task may become complex too. To resolve this problem, we can proceed as follows:

1. First, store all independent signals.

2. Secondly, when the signals are dependent, calculate the complexity of the dependency functions f_i(), i = 1, ..., n, i ≠ k; the corresponding signal is then stored or not depending on the minimization of the defined criterion, which minimizes the sum Σ_i (A·SS_i + B·C_i).

3. To achieve these tasks, genetic algorithms are used (section ??). This allows identifying the set of signals that must be stored.

4. Finally, store all needed signals.

7.4 Genetic algorithms to identify stored signals

To find a compromise between complexity and storage space, we developed an approach based on genetic algorithms, which we detail in this section. The main principle of Genetic Algorithms (GA) is to start from a random initial solution, evaluate the results, and perform an auto-optimization to reach the desired result. The principle of genetic algorithms is described as follows:

- Start with a population of random numbers called individuals, x_i^0 ∈ [a, b], i = 1, ..., N. N is the number of individuals and must be kept constant during the optimization process. We define a population as a set of individuals, so X^0 = {x_1^0, ..., x_N^0} is an initial population.

- For each population we define a fitness function that is minimized during the optimization task. The definition of the fitness function is up to the user. For our work, we chose the following fitness function:

      f(SS, C) = Σ_{j=0}^{n-1} [ A·(1 - α_j)·C_j + B·α_j·SS_j ]

  where SS_j is the storage space needed for the signal S_j, C_j is the complexity of that signal, n is the number of involved signals, and α_j is a coefficient which equals 1 if the corresponding signal S_j is stored and 0 otherwise. A and B are two user-defined constants that express the user's choice to optimize either for complexity or for storage. It is possible to define a relative fitness, called the probability of reproduction, in order to improve the optimization. This probability is defined as:

      p_i = f_i / Σ_{i=1}^{n} f_i

  where i is the index of signal S_i and the f_i are the fitness values. We define the cumulative distribution of the probability density defined above for a given gene x_i as:

      P(X_i) = P_i = Σ_{j=1}^{i} p_j

  Therefore, the defined criterion can also be applied to the cumulative distribution; i.e., the new criterion can be written as min_{i=1..n} P(X_i). In this way, we can choose between minimizing the total fitness function or the cumulative distribution.

- Pick pairs of genes at random and submit their genetic individuals to crossover. The new generation can be chosen from the best, medium and bad individuals; the probabilities of these choices are arbitrary and depend on the user. For example, 40% from the best, 30% from the medium and 20% from the bad ones can be chosen for a given minimization process. The number of such pairings, called the crossover rate, should be around 0.9·N. The resulting set, i.e. the children x̃_i, i = 1, ..., N, coming from the previous crossover, is called the offspring population.

- Draw N individuals in accordance with their reproduction probability. This is called the transformation method. We then obtain a new population x_i, i = 1, ..., N.

- Finally, mutation comes into play. The resulting population is regarded as the next generation and we are back at step 2.

To finish, we defined two termination criteria: the first one is a maximum number of simulation iterations, and the second one is that the solution must converge.
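To illustrate these steps, the sketch below implements a minimal genetic algorithm for the storage-decision problem. It is a sketch under assumptions, not the actual tool: the mutation rate, the inversion used to turn the minimized fitness into selection weights, and all names are choices made for this example.

    #include <algorithm>
    #include <random>
    #include <vector>

    struct Ga {
        std::vector<double> cost;    // c_i: complexity of restoring signal i
        std::vector<double> space;   // ss_i: storage space of signal i
        double A = 1.0, B = 1.0;     // user-defined weights
        std::mt19937 rng{42};

        // f(alpha) = sum_i ( A*(1-alpha_i)*c_i + B*alpha_i*ss_i ); lower is better.
        double fitness(const std::vector<int>& a) const {
            double f = 0.0;
            for (std::size_t i = 0; i < a.size(); ++i)
                f += A * (1 - a[i]) * cost[i] + B * a[i] * space[i];
            return f;
        }

        // Roulette-wheel selection: invert the fitness so that individuals
        // with a smaller f get a larger reproduction probability p_i.
        std::size_t select(const std::vector<std::vector<int>>& pop) {
            std::vector<double> w(pop.size());
            for (std::size_t i = 0; i < pop.size(); ++i)
                w[i] = 1.0 / (1.0 + fitness(pop[i]));
            std::discrete_distribution<std::size_t> pick(w.begin(), w.end());
            return pick(rng);
        }

        // Evolve a population of N bit vectors alpha over a fixed number of
        // generations (the first termination criterion of the text).
        std::vector<int> run(std::size_t n_signals, std::size_t N, int generations) {
            std::bernoulli_distribution coin(0.5), mut(0.01);
            std::vector<std::vector<int>> pop(N, std::vector<int>(n_signals));
            for (auto& ind : pop)
                for (auto& g : ind) g = coin(rng);          // random initial population
            for (int gen = 0; gen < generations; ++gen) {
                std::vector<std::vector<int>> next;
                while (next.size() < N) {
                    const auto& p1 = pop[select(pop)];      // parents picked by
                    const auto& p2 = pop[select(pop)];      // reproduction probability
                    std::uniform_int_distribution<std::size_t> cut(1, n_signals - 1);
                    std::size_t c = cut(rng);               // one-point crossover
                    std::vector<int> child(p1.begin(), p1.begin() + c);
                    child.insert(child.end(), p2.begin() + c, p2.end());
                    for (auto& g : child)
                        if (mut(rng)) g ^= 1;               // mutation
                    next.push_back(std::move(child));
                }
                pop = std::move(next);                      // next generation
            }
            return *std::min_element(pop.begin(), pop.end(),
                [this](const std::vector<int>& x, const std::vector<int>& y)
                { return fitness(x) < fitness(y); });
        }
    };

The returned vector is the storage coefficient vector α: entry i equal to 1 means signal S_i is written to the reduced VCD file, and 0 means it is restored from its dependencies during decompression.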

7.4.1 Stored signal identification

As described above, the genetic algorithm is used to identify the signals that must be stored and to find a compromise between restoration complexity and storage space.


Our algorithm minimizes a defined criterion in order to select the signals that must be stored. This algorithm is defined as follows:

1. Minimize F(A, B, C, SS, α), where:

   A is a user-defined weight coefficient for the complexity calculation.

   B is a user-defined weight coefficient for the storage space minimization.

   C is defined as C = Σ_i c_i, where c_i is the complexity of signal S_i. We have defined a specific function to calculate the complexity of each signal: first, we identify the relationships between the considered signal and the other signals; then, we sum the complexities of all involved signals. Depending on the signal type, the complexity of a signal is defined as follows:

       c_i = 0                             if S_i is a primary input
       c_i = w_i + Σ_j (1 - α_j)·c_j       otherwise

   where w_i is the weight of the outgoing node (defined in the relevant CDFG) of the corresponding signal (for each node type a predefined weight is considered, and all these weights are stored in a corresponding weight table); α_i is either 0 or 1 and defines whether signal S_i is stored in the compressed file (α_i = 1); and j runs over the indices of the signals involved in the complexity function of signal S_i.

   SS is the sum of the needed storage space: SS = Σ_i α_i·ss_i, where ss_i is a real value characterizing the needed storage space for signal S_i. The storage space is calculated based on the type of the considered signal: a storage space table stores the corresponding value for each kind of signal. For example, for the integer type the storage space value is 8, whereas for a real type it is 16, and so on.

   α is an integer vector that stores the values of the storage coefficients of all signals: α = (α_0, ..., α_{n-1}).
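The complexity values c_i can be computed by a walk over the CDFG. The following sketch is illustrative only: the data layout is assumed, and the dependency view is taken to be acyclic (registered feedback being cut at clock boundaries), so the recursion terminates.

    #include <unordered_map>
    #include <vector>

    struct SignalNode {
        double node_weight;        // w_i from the per-node-type weight table
        std::vector<int> deps;     // indices j of the signals feeding this one
        bool primary = false;      // true for primary inputs
    };

    // c_i = 0 for primary inputs, w_i + sum_j (1 - alpha_j)*c_j otherwise.
    double complexity(int i, const std::vector<SignalNode>& sig,
                      const std::vector<int>& alpha,
                      std::unordered_map<int, double>& memo) {
        if (sig[i].primary) return 0.0;
        auto it = memo.find(i);
        if (it != memo.end()) return it->second;  // already computed
        double c = sig[i].node_weight;
        for (int j : sig[i].deps)  // a stored dependency (alpha_j = 1) adds no cost
            c += (1 - alpha[j]) * complexity(j, sig, alpha, memo);
        return memo[i] = c;
    }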

Figure 7.10: Structure of the de-compressor tool

   The criterion is now defined as follows:

       min_α Σ_i ( A·(1 - α_i)·c_i + B·α_i·ss_i )

   For this criterion, we have only one parameter to search for, namely α. This parameter will be identified using the genetic algorithm.

2. The next step of this algorithm is to create a new VCD file in which only the needed signals are stored. The compression techniques defined in chapter ?? are then applied to this file.

3. To facilitate the decompression task, a new file serving to store the integer vector α is deployed. This file is used only when the decompression process is run. As defined above, each element α_i is associated with a signal; if the value of α_i is 1, the corresponding signal has been stored. Thereby, after identifying the list of signals that must be stored, the value of the vector α is stored in a corresponding file.

7.4.2 Decompression structure

Decompression is required in order to rebuild the original VCD waveform file from the compressed file. While the content of the original file is preserved, the order of the transitions within a single time step may be altered during this process. However, as this order is arbitrarily chosen during dumping of the original file, this does not affect the information stored in the waveform file. The final goal should be to integrate the decompression routines into a browser in order to decompress only those waveform data which are required for display or signal analysis. Similar to the compression program, the decompression tool is based on the dependency file generated during elaboration. The decompression tool consists of four modules (see also Figure ??):

A pattern de-compressor, which rebuilds the original function number stream for each signal from the recoded pattern stream.

A transition predictor module, which is necessary to determine the predicted transitions. The results of the prediction may be overwritten


by the correction module. This module extracts the correction information from the compressed waveform file and uses it to alter the transition decisions made by the predictor. The output of this module is a stream of real signal transitions. As the functionality of this module is rather simple, we do not detail it in the remaining sections.

A signal value de-compressor, which reads in the results of the pattern decompression and rebuilds the original signal values based on the function number stream, the input signal values and the signal transitions.

7.5 Software Design

In this section we give an overview of the developed software design. Figure ?? represents the general structure of this software.

FreeHDL Simulator: the FreeHDL simulator analyzes the VHDL source code and generates a simulation executable from it. It also inserts code to output design information, which is used to extract the signal dependencies required by the compression algorithms. As a result, the simulator generates VCD and DDB files. The VCD file is the Value Change Dump file, which contains the output waveforms resulting from the inputs applied to the circuit during the simulation step. The DDB file is the Design Database file, which contains general information about the design, such as: signals that are read within an expression, constants which are used within an expression, the delay associated with a signal assignment, signals which might trigger a signal assignment, signal mappings, etc. It is generated during the elaboration step and serves to create the corresponding control data flow graph.

CDFG generator: the CDFG generator creates the control data flow graph of the considered VHDL model and stores it in a specific file.

Waveform Compressor: reads in the CDFG file as well as the original VCD waveform file and performs the compression task. Moreover, from the CDFG file a set of predictor functions is generated which predict, based on past transitions and the signal dependencies, new transitions for the signals. The compressed VCD file is stored for further use. In the next sections we describe this mechanism in detail.


Figure 7.11: Software Design

Waveform Decompressor: allows the decompression of the compressed VCD file in order to view it.

Waveform viewer: allows viewing the decompressed VCD file.

Figure ?? represents the software module design in detail.

Waveform Compressor: the waveform compressor is composed of two basic modules:

Waveform Compression module: this module deals with the compression techniques. It compresses the different data based on distinct algorithms. Three compression sub-modules are distinguished:

Header file compression module: as described above, the compression of the VCD header is done independently of the control data flow graph.

Simulation Time Compression module: this module serves to compress the signal IDs. The inputs of this module are the original VCD file and the signal dependencies derived from the Dependency Scheduler module.

Signal value compression module: this module serves to compress the corresponding signal values. Its inputs are the original VCD file and the signal dependencies delivered by the Dependency Scheduler module.

Dependency Scheduler module: it studies the dependencies of signals and processes using the developed CDFG. The dependency scheduler controls whether a given signal ID and value will be stored or not; thereby, it controls the Signal Value Compression and the Simulation Time Compression modules. If it finds a needed dependency between signals which must be identified, it returns it in mathematical form to the Waveform Compression module to be stored. Its input comes from the CDFG Scanner and Parser.


Figure 7.12: Software Module Designs

CDFG Scanner and Parser module: this module translates the CDFG, which is written to a file by the CDFG Simulator, into a database inside memory for further use. The inputs of this module come from the DDB and CDFG files.

Viewer: the viewer outputs the verification results of the considered VHDL model. Basically it consists of two modules:

Waveform Decompressor: it is composed of four modules, which are needed for the decompression of the compressed VCD file:

Header File Decompression module: this module is used to decompress the VCD header.

Simulation Time Decompression module: this module decompresses the simulation times based on the information returned from the compressed VCD file and from the Dependency Simulator.

Parameter Value Decompression module: this module decompresses all parameter values based on the information stored in the compressed VCD file and that returned from the Dependency Simulator.

Dependency Simulator module: this module simulates the stored dependencies between parameters and returns either the signal value or the simulation time.

Waveform viewer: it reads the decompressed VCD file to display the verification results of the VHDL model.

7.6 Conclusion

In this chapter, we presented the idea of using the CDFG to improve the developed waveform compression techniques. Two strategies for using the CDFG to improve waveform compression are involved: while the first one investigates signal and process dependencies, the second one studies the complexities of the dependency functions. We can conclude that the advantage of the CDFG is to allow a rapid interpretation of the information inside a

VHDL model. After identifying the signal list which must be stored, we create a new file which is used by the defined waveform compression algorithms. This file is much smaller than the original VCD file. In this way, we can speed up the waveform compression algorithms. Moreover, the advantage of using the CDFG to improve waveform compression is its extreme composability and simplicity, which yields all the needed dependency information. Based on this knowledge, we could define an algorithm which decides whether a signal must be stored and whether a signal can be restored from the stored signal list. To optimize the compromise between the needed storage space on disk and the complexity of restoring a signal from the stored signal list, a genetic algorithm is used. In the next chapter a comparative study of the experimental results is presented.

Chapter 8

Experimental results
8.1 Introduction

In this chapter, we show and compare the simulation results for each of the developed algorithms. The developed system should of course achieve the highest compression ratios while using as little CPU and memory as possible. Moreover, an important goal is to implement a system which can dynamically adjust and balance compression ratio and CPU/memory usage. In order to test the implementation and algorithm efficiency, all modules were benchmarked using typical VHDL designs. The results from these benchmarks were used to fine-tune the algorithms. A major issue for transition prediction is CPU usage. Hence, the data structures and implementation algorithms must be chosen carefully to support efficient prediction. Note that transition prediction is used by both the compression and the de-compression tool. As the modules used within both tools differ only slightly, we do not distinguish between transition prediction for compression and for de-compression.

8.2 Waveform compression without using CDFG

This section deals with the simulation results when the CDFG is not used. In this part of the simulation, only the signal set compression and value compression algorithms are used. Hence, ten compression programs that take a VCD file and output a compressed file were implemented to visualize the simulation results. The programs are named: plain, sh, lsh, lsh hist, lsh hist pred, xor, xor sh, xor lsh, xor lsh hist and xor lsh hist pred. All versions applied

Figure 8.1: Compression results for various models (rtl, gate, sxp2, sota) and vcompress configurations

Figure 8.2: Value hit ratio

signal id compression (Section ??). Program plain did not use any additional technique presented in this work, while the features of the other programs are encoded in their names: sh = 32 bit shuffling (Section ??), lsh = 64 bit shuffling (Section ??), hist = signal based history (Section ??), pred = value prediction (Section ??) and xor = signal set based history (Section ??). Note that xor lsh hist and xor lsh hist pred are hybrid history based prediction approaches (Section ??): xor lsh hist uses signal prediction based on Section ?? while xor lsh hist pred uses the approach from Section ??. In order to test the compression programs, we applied them to a set of VCD files. The files rtl and sxp2 were generated from RTL models while gate and sota were obtained from gate level designs. Figure ?? shows the compression ratio with respect to the original VCD file size. In order to enhance the size reduction, we also applied Lempel-Ziv (dictionary based) compression programs to the output files. In detail, we applied the UNIX compress command as well as bzip2 with option -9 (highest compression). The overall compression results are shown in the corresponding sections of each diagram. Obviously, using xor lsh hist pred gives the best compression results for RTL models, and lsh hist pred is best suited to gate level models. This is especially interesting as lsh hist pred requires less runtime overhead compared to xor lsh hist pred. In order to analyze the effectiveness of the value prediction approaches, we measured the number of (binary) value bits that were set to zero in the compressed value stream. Figure ?? shows the number of zero bits divided by the total number of (binary) bits occupied by the signal values (note that each VCD bit occupies 2 binary bits) for different models and different

Table 8.1: Compression ratio (related to original file size) and speedup (compared to runtime of bzip2). The last two columns show the results of applying bzip2 or compress to the files already compressed by xor lsh hist pred.

                bzip2           compress        xor lsh hist pred   + bzip2           + compress
    model    ratio  speedup   ratio  speedup    ratio  speedup     ratio  speedup    ratio  speedup
    rtl        5.0     1        3.5    5.4       29.9    12.6      317.9     9.5     115.6    11.9
    sxp2      27.1     1        6.3    5.9       11.7     9.0      215.4     5.8      68.1     8.3
    leon      22.3     1        6.8    5.3        9.5    10.7       26.6     4.6      18.2     8.7
    gate       2.3     1        1.8    5.0       31.9     9.9      112.6     2.9      60.2     9.5
    sota       4.0     1        2.2    4.9        7.2     5.7       71.6     4.5      19.9     4.9

Figure 8.3: Runtime with respect to the plain approach

compression algorithms. Note that the lsh approach does not perform any real prediction; instead, it just assumes that all signal values are 0. This approach is somewhat successful, as roughly 75% of the bits are predicted correctly to be 0. Actually, this is what we expected, because most VCD bits are 0 or 1 (i.e., very rarely x or z). As a result, most VCD bits are binary encoded as either 00 or 01, which gives about a 75% probability for a binary bit to be 0 (i.e., only 1 of the four bits is 1). Another interesting result is that history based prediction (lsh hist; see Section ??) performs worse than the simple approach predicting only 0 bits (lsh). The difference is especially significant for gate level designs. Finally, hybrid history based prediction (xor lsh hist pred; see also Section ??) is the most successful approach, as it achieves above a 98% hit ratio. Surprisingly, for gate level models, signal set history based prediction (hist pred; see Section ??) gives a better compression ratio than hybrid history based prediction even though its hit ratio is lower; i.e., the hit ratio does not directly correspond to the final compression ratio. Figure ?? shows the runtime of the various compression algorithms. The numbers show the runtime difference related to the runtime of the plain approach:

    (runtime of program - runtime of program plain) / runtime of program plain


Figure 8.4: A comparative investigation

E.g., program sh requires about 2% less runtime than program plain for model rtl. Note that in some cases the runtime with respect to program plain decreased. This is mainly caused by the fact that less data was stored to disk, which offsets the additional overhead introduced by the algorithms. Further, an interesting result is that the signal set based history compression has a significantly lower impact on runtime than signal based history. The reason is that the signal based history technique has a bad impact on data cache performance due to the random-like accesses to the history table. As a result, all compression programs that make use of the history table (i.e., lsh hist, lsh hist pred, xor lsh hist, xor lsh hist pred) suffer from a significant runtime overhead. Finally, as some chip designers use common compression programs to reduce the size of waveform files, we also compared the performance of our algorithm xor lsh hist pred with the UNIX programs compress and bzip2. Table ?? shows the compression ratio as well as the speedup compared to the runtime of bzip2. Note that the last two columns show the results obtained from applying compress and bzip2 to the files compressed by xor lsh hist pred. The table clearly shows the gain that can be obtained by tailoring the compression algorithms to this special application domain.

8.3 Waveform compression using CDFG

This section deals with the simulation results when the CDFG is used to improve the waveform compression process. This approach is applied on top of the previous signal set compression and value compression algorithms. Its main task is to use the CDFG to identify the list of signals to store and thereby reduce the size of the involved VCD files. The principle of CDFG-based waveform compression works in two steps. In the first step, the algorithm reduces the size of the involved VCD file by identifying the signal list that must be stored and creating the new corresponding VCD file, which makes the involved VCD file smaller. Figure ?? illustrates an example with five models (z, xy, p and leon were generated from RTL models while wave was obtained from a gate level design) and the different methods


Figure 8.5: Ratio Investigations

Figure 8.6: Compression ratio using CDFG

to reduce the size of the corresponding files. We notice that the CDFG can reduce the size of the files as well as other compression tools do. In Figure ??, we present a comparative study of the simulation results for the compression of the VCD file using the traditional Unix compression programs, such as bzip2 and gzip with the option -9, the defined waveform compression programs, and the CDFG-based waveform compression programs. We applied the same compression programs used in section ?? to the new VCD file, which has a reduced size compared to the original one. These programs take a reduced-size VCD file and output a compressed file. Figure ?? shows the compression ratio with respect to the original VCD file size. In order to enhance the original file size reduction, we also applied Lempel-Ziv (dictionary based) compression programs to the output files; in detail, we applied the UNIX compress command as well as bzip2 with option -9 (highest compression). The overall compression results are shown in the corresponding sections of each diagram. As shown in Figure ??, we notice that the CDFG gives a powerful compression result and reduces the initial size of the corresponding file. Furthermore, Figure ?? is used to compare the different ratios. Here, we notice the effectiveness of using the CDFG to reduce the initial size of the VCD files. This allows us first to speed up the compression and, more fundamentally, to obtain a better compression.

8.4 Conclusion

In this chapter, we illustrated the different kinds of simulation and showed the obtained results. Notice that the CDFG was used to improve the compression process by reducing the initial size of the VCD files. This allows speeding up the compression process and obtaining small compressed VCD files. However, the speedup of the simulation depends on the algorithms employed in the search for the signal list that must be stored; this aspect is left for future work.


Chapter 9

Conclusion
This work has introduced the tools which were used for waveform compression. First, the target of this work was introduced, then waveform files and the control data flow graph were defined, and finally the different kinds of waveform compression algorithms were detailed. The objective of our work was to improve design verification by implementing algorithms which reduce the size of the waveform files that are generated during simulation. Indeed, waveforms play a big role in improving design efficiency during the debug phase of the design cycle of complex digital circuits. Waveforms are generated in the system verification phase and are used to permit a structural system verification that serves design correctness and improvement. Moreover, nowadays many tools have been developed that use waveform files to enhance system verification and increase first-time design success rates. Thereby, advanced waveform analysis and debugging capabilities reveal potential improvements early in the verification process.

The Value Change Dump (VCD) file is a popular format to store waveforms. A VCD file is divided into two parts: a VCD header, which stores general information about the simulation and the design, and a transition part that encompasses transition blocks. For each simulation time, a transition block is involved. A transition block is composed of two items: the signal values and the corresponding time value. In this part, the signal values are stored in a simplified format: all signal values are transformed to a binary format, redundant bit values resulting from left extension are reduced, and the signal identifiers follow the signal values within each transition block. In our work, waveforms are generated using discrete event simulation.


Discrete event simulation is a model of a dynamic system which is subject to a series of instantaneous happenings, or events. Its principle is to map each discrete event onto a corresponding logical process. A defined simulation kernel generates the ordering of events and ensures the transfer of events to the appropriate logical processes. The advantage of this simulation technique is that it reduces the system behavior to individual events which are performed separately. Furthermore, the underlying implementation of discrete event simulation uses queuing techniques, for which each event is represented by a defined structure, called the event structure, composed of a next event in queue item, a previous event in queue item, an event processing routine item, and a net specific data item (see the sketch below). The event processing routine points to the code to be executed in order to process the event. The disadvantage of discrete event simulation is that it involves the use of the same central internal clock for all processes and events.

Nowadays, cycle-based simulation is often used due to its advantage of improving performance in digital design simulation. It uses algorithms that eliminate unnecessary calculations to achieve huge performance gains in verifying functionality: results are only calculated at clock edges, and inter-phase timing is ignored. Waveforms are generated using discrete event simulation by associating signals with state changes of system components, and using cycle-based simulation by evaluating the outputs at each simulation cycle. The difference between the two methods is that with the first method more signal waveforms must be stored than with the second one. In order to get more information about the real design behavior, event-driven simulation, which is based on the discrete event simulation mechanism, is further used to generate the signal waveforms. A detailed description of the waveform generation has been given in chapter two.

On the other hand, to exploit the power of control data flow graphs in design verification, we used them to identify signal dependencies inside the models and to select the signal list that must be stored. This reveals a useful prediction of digital design behavior and allows an a priori analysis of the relevant signal waveforms. The combination of consistent merging of data and control flow, even for loops and wait statements, and the maximally parallel representation are the main features of the developed control data flow graph model. These features are, indeed, used to improve the waveform compression techniques. After an overview of the different existing data compression algorithms had been given, we exploited the control data flow graphs to develop our own approach to waveform file compression.
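As a concrete illustration of the event structure described above, a minimal sketch could look as follows (the field names, and the explicit timestamp, are assumptions made for this example):

    // Doubly linked queue entry of a discrete event simulator.
    struct Event {
        Event* next;                   // next event in queue
        Event* prev;                   // previous event in queue
        void (*process)(Event* self);  // event processing routine
        void* data;                    // net specific data (event payload)
        unsigned long time;            // assumed: ordering key used by the kernel
    };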

In this work, we presented a control data flow graph standard for the synthesis and verification of digital designs from a behavioral level description. This control data flow graph is a specification of the behavioral description of digital designs. In a control data flow graph, nodes represent operations and edges depict the transfer of data values. When values are available on all incoming edges of a node, the node executes by consuming these values and subsequently generates an output value on all outgoing edges. A control data flow graph explicitly shows the order of execution of operations and also illustrates which operations may be executed simultaneously. This makes control data flow graphs very suitable as a starting point for selecting the signals to be stored. Due to the long-term research aspect of the related work and the different requirements of individual tools, flexibility and extensibility were major design goals for the file format of the developed control data flow graph design. A textual format was strongly preferred over a binary format for several reasons, but human readability was hardly a decisive issue for the deployed format. The control data flow graph format is a relatively simple interface, both in syntax and in semantics. The employed graph model distinguishes between data path and control flow, and allows cycles to model loops in the algorithmic behavior, providing maximal freedom for different implementation styles. The uniform and combined treatment of data and control has resulted in both a concise semantic definition and an extreme flexibility for architectural synthesis. On the other hand, control data flow graphs provide a maximally parallel description of design verification algorithms, for which many design alternatives can be generated. The generation of such graphs from a procedural sequential programming language such as VHDL requires a full data flow analysis, involving a detailed lifetime and scope investigation through conditional statements, loops, and subprogram interfaces. In practice, this means that descriptions of relatively complex digital designs (chips) can be converted to a control data flow graph representation in a few seconds of CPU time on a standard workstation. In this way, our work managed to model the most used statements of the VHDL language. A limitation of our work is that there is no defined data flow between the read signal and write signal nodes in the reverse direction. This means that, when we look at all CDFGs presented in this work, we observe that all incoming data come from the Start Process node and the outgoing data go to the End Process node. However, there is no data flow which indicates that all signals that have changed values are transmitted from the End Process node back to the Start Process node. In our approach this is done virtually: we can assume that there is a


kind of automatic transfer of data between the End Process and Start Process nodes. We also assume that the End Process node represents a temporary storage for signals and that it absorbs the variables. This means that it never returns the value of a variable to the Start Process node, which represents the source of all data present in a considered model. The dependency and independence aspect is treated in our approach: for each CDFG, we can identify the dependencies between all considered nodes, which allows us to remove unused statements. A detailed description of the deployed CDFG has been presented in chapter three.

Data compression is very important nowadays. Indeed, the huge increase in the number of transistors in a given design makes the verification process more and more difficult. Many data compression algorithms exist. Basically, we classified these algorithms into two categories: lossy data compression algorithms and lossless data compression algorithms. Whereas for the first category some of the information from the original data is lost, for the second category no data is lost. For each category, many algorithms exist. For lossless data compression, we distinguish two types: statistical coding, which is based on Shannon's statistical theory and consists of associating a probability with each datum and searching for the corresponding codeword; and dictionary coding, which selects strings of symbols and encodes each string as a token using a dictionary. A detailed description of these algorithms has been given in chapter four. Furthermore, for waveform compression only lossless data compression algorithms are admissible, because every datum has a significant meaning in the design verification process. For this reason, we developed our own lossless waveform compression algorithms.

The first algorithm is based on signal ID compression. It compresses a set of signal IDs using a developed algorithm based on a so-called transition block cache. This cache stores all the different kinds of transition signal blocks that are detected during compression. Mainly, three steps are involved in this algorithm: the first step is to determine the set of signals that must be cached in the transition block cache, the second step is to store the cache entry identifier to an output stream, and the third step is to write the corresponding signal set to the output stream when the cache is missed. The second algorithm compresses the values of the corresponding signals. It is similar to the previous algorithm and involves three steps: the first one consists of storing each transition block in increasing id-value order of the transitions, and the second one is to write all the signal sets that are stored in the transition block to a temporary stream.
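For illustration, a minimal sketch of the transition-block cache summarized above might look as follows; the tag bytes and binary layout are assumptions made for this example, not the actual file format:

    #include <cstdint>
    #include <map>
    #include <ostream>
    #include <vector>

    using SignalSet = std::vector<std::uint32_t>;  // sorted signal IDs of one block

    // On a cache hit only the entry ID is emitted; on a miss the full signal
    // set is written out and inserted into the cache for later blocks.
    void emit_block(const SignalSet& set,
                    std::map<SignalSet, std::uint32_t>& cache, std::ostream& out) {
        auto it = cache.find(set);
        if (it != cache.end()) {
            out.put(1);                                      // tag: cache hit
            out.write(reinterpret_cast<const char*>(&it->second),
                      sizeof(std::uint32_t));
        } else {
            std::uint32_t id = static_cast<std::uint32_t>(cache.size());
            cache.emplace(set, id);
            out.put(0);                                      // tag: cache miss
            std::uint32_t n = static_cast<std::uint32_t>(set.size());
            out.write(reinterpret_cast<const char*>(&n), sizeof n);
            out.write(reinterpret_cast<const char*>(set.data()),
                      n * sizeof(std::uint32_t));
        }
    }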

In order to improve the developed algorithms, a new approach that employs the CDFG in waveform compression has been developed. First, this approach builds a CDFG to model the envisaged design. Then it uses this CDFG to select the signal list that must be stored in order to create a new, reduced-size VCD file. The signal selection is based on genetic algorithms. The algorithm associates a storage coefficient with each signal: if the signal must be stored, this coefficient is 1, otherwise it is 0. In parallel to the creation of the reduced-size VCD file, a new file storing the storage coefficients is generated. A detailed description of this algorithm is given in chapter 7. The simulation results show that employing the CDFG to improve the waveform compression algorithms is highly effective: the compression ratio is improved by about 30%. This is a remarkable result compared to the use of other Unix compression programs such as bzip2 and gzip. A detailed description of the simulation results has been given in chapter 8. In conclusion, we can say that our approach is highly effective in improving waveform compression. Moreover, the introduction of a neural network approach to speed up the identification of the signal list that must be stored could extend our approach and give even better results.


Chapter 10

Appendix A
10.1 VHDL model simulation

In this paragraph we present the developed simulator and describe its components in detail. As shown in Figure ??, the simulation is done in different tasks. Five tasks are distinguished:

First task: the design is written in the VHDL language.
Second task: the VCD file is created during the simulation task.
Third task: the control data flow graph is created.
Fourth task: the signal selection, which identifies the signals that must be stored, is invoked.
Fifth task: the compression process is performed.

First, a VHDL model is written; then it is compiled in order to obtain the corresponding C++ source code. During this task an executable file with the name of the VHDL model is created. To run the simulation, this executable file is called. Figure ?? illustrates the simulation main menu. This menu offers mainly four tasks, which are described in the following sections. As illustrated in Figure ??, the simulator is composed of basic components that are detailed in the following sections.

10.1.1 Design Simulator

The design simulator is created with the purpose of generating a new compiler, which translates VHDL source code into C++ source code. Therefore, it simulates the obtained C++ source code in order to speed up the simulation of hardware designs.


Available commands:
h    : prints list of available commands
c    : execute cycles = execute simulation cycles
n    : next = execute next simulation cycle
q    : quit = quit simulation
r    : run = execute simulation for
d    : dump = dump signals
doff : dump off = stop dumping signals
don  : dump on = continue dumping signals
s    : show = show signal values
dv   : dump var = dump a signal from the signal lists
ds   : dump show = shows the list of dumped signals
nds  : number show = shows the number of dumped signals
wdd  : write binary design info
wddl : write design info using a CDFG style syntax
dc [-f ] [-t ] [-cfg ] [-q] : control waveform dumping

Figure 10.1: Simulator main menu

10.1.2 Viewing simulation

To achieve the simulation task, several software tools were developed. Figure ?? represents the simulation main menu. To view simulation results, the run command r, the execute cycle command c, the next command n and the print screen command s are required. This simulation task evaluates VHDL signal and variable values and displays them on the screen for the desired simulation time.

10.1.3 Creating the DDB (Design Database) and CDFG (Control Data Flow Graph) files

The CDFG (Control Data Flow Graph) file is generated during the compilation phase of the VHDL model using the -D argument. The CDFG file is a lisp-format file that describes the design architecture. It contains functions and procedures which represent each statement inside the VHDL model. The DDB (Design Database) file is created during the simulation task


using the wddl command. A DDB file is a lisp-format file that serves as a database of the considered design. More details about these files are given in the next chapter.

10.1.4 Creating the VCD (Value Change Dump) file

This simulation task creates the VCD file for further use. A detailed description of this simulation task and the VCD file is given in section ??.

10.2 CDFG Generator-Simulator

The CDFG Generator-Simulator is used to create the control data flow graph of the considered model based on the DDB and CDFG files. It allows viewing the CDFG in its graphical form and creates the .cdfg.dat file, a lisp-format file that contains the node and edge names. Section ?? illustrates the created CDFG.

10.2.1 Signal Selections Simulator

This simulator is composed of a CDFG scanner and parser. It stores the CDFG in hash tables for further use. Moreover, it defines a kernel which uses the stored data to identify the signals that must be stored and those that can be restored. This kernel uses a genetic algorithm to achieve its task as well as possible. More details about the principle of this kernel are given in section ??.

10.2.2 Waveform Compression Simulator

This simulator creates the compressed waveform file based on the information returned by the signal selections simulator and on the data stored in the VCD file. Several compression techniques are used; these techniques are described in detail in the next chapter.

10.3 Waveform Decompression Simulator

This simulator creates the decompressed waveform file based on the data returned by the signal selections simulator and on the data stored in the compressed waveform file. Several decompression techniques are used; they are described in detail in the next chapter.


10.3.1 Waveform viewer

To view the obtained waveforms, a commercial software tool is used. It allows viewing all signals for each simulation time.

Chapter 11

Appendix B
11.0.1.1 Description of keyword commands

In this section, we give an overview of the keywords that have not been defined previously.

$dumpvars: the section beginning with the $dumpvars keyword lists the initial values of all variables dumped. Its syntax is defined as follows:

    $dumpvars { value_changes } $end

$scope: the $scope section defines the scope of the signals being dumped. Its syntax is defined as follows:

    $scope scope_type scope_identifier $end
    scope_type ::= module | task | function | begin | fork

The scope type indicates one of the following scopes:

    module   : top-level modules and module instances (entities and architectures in VHDL models)
    task     : tasks
    function : functions
    begin    : named sequential blocks
    fork     : named parallel blocks

$upscope: the $upscope section indicates a change of scope to the next higher level in the design hierarchy. Its syntax is defined as follows:

    $upscope $end

$timescale: the $timescale keyword specifies which timescale was used for the simulation. Its syntax is defined as follows:

    $timescale number time_unit $end
    number ::= 1 | 10 | 100
    time_unit ::= s | ms | us | ns | ps | fs

$var: the $var section prints the names and identifier codes of the variables being dumped. Its syntax is defined as follows:

    $var var_type size identifier_code reference $end
    var_type ::= event | integer | parameter | real | reg | supply0 | supply1 | time | tri | triand | tri0 | tri1 | wand | wire | wor
    size ::= decimal_number
    reference ::= identifier | identifier [ bit_select_index ] | identifier [ msb_index : lsb_index ]
    index ::= decimal_number

Size specifies how many bits are in the signal. The identifier code specifies the name of the signal using printable ASCII characters, as previously described. The msb_index indicates the most significant index and the lsb_index the least significant index.
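Putting these keywords together, a small hand-written VCD header followed by the first transition block could look as follows (the identifier codes ! and " are arbitrary printable ASCII characters, and $enddefinitions is the standard keyword that closes the header section):

    $timescale 1 ns $end
    $scope module top $end
    $var wire 1 ! clk $end
    $var reg 8 " data [7:0] $end
    $upscope $end
    $enddefinitions $end
    #0
    $dumpvars
    0!
    b00000000 "
    $end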

Chapter 12

References


Bibliography
[1] J. Marantz, Enhanced visibility and performance in functional verification by reconstruction, in Proceedings DAC-98, June 1998.

[2] Novas Software, Inc., Debussy: Total debug system. http://www.novas.com.tw/products/index.html, 11th April 2002.

[3] Veritools, Inc., Database compression tool for Verilog users. http://www.veritools-web.com/optimizing.htm, 2002.

[4] J. Ziv and A. Lempel, A universal algorithm for sequential data compression, IEEE Transactions on Information Theory, vol. IT-23, May 1977.

[5] N. A. G. Mandyam and N. Magotra, A DCT-based scheme for lossless image compression, in IS&T/SPIE Electronic Imaging Conference, February 1995.

[6] D. T. Hoang, P. M. Long, and J. S. Vitter, Dictionary selection using partial matching, Information Sciences, 1999.

[7] G. De Micheli and R. K. Gupta, Hardware/software co-design, Proceedings of the IEEE, vol. 85, no. 3, pp. 349-365, March 1997.

[8] P. M. Maurer and W. J. Schilp, Software bit-slicing: a technique for improving simulation performance, in Proceedings DATE99, pp. 1-2, March 1999.

[9] S. Olcoz and P. Menchini, HDL interoperability: A compiler technology perspective, in DATE98, vol. 1, pp. 51-58, DATE, 1998.

[10] H. D. Foster, Techniques for higher-performance boolean equivalence verification, Technical report, The Hewlett-Packard Journal, August 1998.


[11] S. Gupta and K. Pingali, Fast compiled logic simulation using linear BDDs, Technical Report, Department of Computer Science, Cornell University, Ithaca, New York, 1998.
[12] R. E. Bryant, Symbolic Boolean manipulation with ordered binary decision diagrams, Technical Report, Fujitsu Laboratories, Kawasaki, Japan, July 1992.
[13] D. Kelf, The native compiled code performance, Technical Report, Cadence Design Systems, Inc. Marketing Services, 1997.
[14] A. Sherer, Delivering a high performance simulation solution, Technical Report, Cadence Design Systems, Inc. Marketing Services, 1998.
[15] Cadence Design Systems, Inc., Verilog-XL Reference Manual, version 2.3, 1995.
[16] C. S. Corporation, TestDeveloper: flexible, high performance test program development software for complex SoC devices. http://www.fluence.com/tdev/TDev datasheet.pdf, April 2003.
[17] R. T., M. H. H. Weusthof, and H. Kerkhoff, A complete digital design and test flow in an academic environment, Testable Design and Testing of Microsystem group, September 2002.
[18] Interface Technologies, Graphical signal generator and simulator post processor software: WaveMaker. http://www.interfacetech.com/wavemake-7.pdf, April 2003.
[19] P. Ball, Introduction to discrete event simulation, in 2nd DYCOMANS Workshop on Management and Control: Tools in Action, pp. 367–376, May 1996.
[20] P. M. Maurer, Gateways: a technique for adding event-driven behavior to compiled unit-delay simulations, Technical Report, Department of Computer Science and Engineering, University of South Florida, 1998.
[21] K. Westgate and D. McInnis, Cycle-based simulation, Technical Report, Quickturn Design Systems, Inc., 1998.
[22] K. Westgate and D. McInnis, Cycle-based simulation, Reducing Logic Verification Time with Cycle Simulation, 1999.


[23] J. M. P. Cardoso and M. P. Vestias, Architecture and compilers to support reconfigurable computing. http://www.acm.org/crossroads/xrds5-3/rcconcept.html, November 2000.
[24] S. Bashford and R. Leupers, Phase-coupled mapping of data flow graphs to irregular data paths, in Design Automation for Embedded Systems, vol. 4, pp. 1–50, Kluwer Academic Publishers, Boston, 1999.
[25] L. C. V. dos Santos, Modeling speculative execution and availability analysis with Boolean expressions, Proceedings of the ProRISC/IEEE Benelux Workshop on Circuits, Systems and Signal Processing, November 1998.
[26] J.-D. Choi, V. Sarkar, and E. Schonberg, Incremental computation of static single assignment form, Technical Report, Application Development Technology Institute, San Jose, California, November 1995.
[27] L. A. A. M., P. Kission, and A. Jerraya, Analysis of different protocol description styles in VHDL to high-level synthesis, Technical Report, EURO VHDL 2.1, 1.2, April 1995.
[28] M. Kaufmann, PLP Figures. http://www.cs.rochester.edu/www/u/scott/paragmatics/figures/, 2000.
[29] S. Unger, Transforming irreducible regions of control flow into reducible regions by optimized node splitting, Master's thesis, Computer Science Institute, Humboldt University, Berlin, 1998.
[30] R. C. Young, Path-Based Compilation, PhD thesis, Computer Science, Harvard University, Cambridge, Massachusetts, January 1998.
[31] E. J. Feigin, A case for automatic run-time code optimization, Master's thesis, Harvard College, Cambridge, Massachusetts, April 5, 1999.
[32] P. Eles, K. Kuchcinski, Z. Peng, and M. Minea, Compiling VHDL into a high-level synthesis design representation, IDA Technical Report, Department of Computer and Information Science, Linkoeping University, April 1992.
[33] L. P. A. P. A. Mesquita, High level synthesis of protocols described by formal description technique, IX IFIP International Conference on VLSI (VLSI'97), 1997.


[34] L. P. A. P. A. M. M. P. K. A. Jerraya, Analysis of different protocol description styles in VHDL to high-level synthesis, Euro-DAC, 1996.
[35] J. T. J. van Eijndhoven and L. Stok, A data flow graph exchange standard, in Proc. of the European Conf. on Design Automation (EDAC), vol. 1, (Brussels, Belgium), p. 193, EDAC-92, March 1992.
[36] G. Lakshminarayana, A. Raghunathan, and N. K. Jha, Incorporating speculative execution into scheduling of control-flow intensive behavioral descriptions, DAC-98, June 1998.
[37] D. Salomon, Data Compression: The Complete Reference, Springer, 1997.
[38] W.-K. Ng and C. V. Ravishankar, Block-oriented compression techniques for large statistical databases, IEEE Transactions on Knowledge and Data Engineering, October 1999.
[39] M. V. Mahoney, Fast text compression with neural networks, Technical Report, American Association for Artificial Intelligence, 2000.
[40] G. S. Kinnear, The compression technology in multimedia, Technical Report, University of Wolverhampton, UK, March 1999.
[41] O. I. Pentakalos and Y. Yesha, Online data compression in a mass storage file system, Technical Report, Computer Science Department, University of Maryland Baltimore County, March 1999.
[42] D. T. Hoang, Fast and Efficient Algorithms for Text and Video Compression, PhD thesis, Department of Computer Science, Brown University, Providence, Rhode Island, May 1997.
[43] J. E. Fowler and R. Yagel, Lossless compression of volume data, AT&T, National Science Foundation, USA, 2000.
[44] G. E. Blelloch, Introduction to data compression, Technical Report, Computer Science Department, Carnegie Mellon University, October 2001.
[45] C. E. Shannon, A mathematical theory of communication, The Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July/October 1948.
[46] J. Rissanen and G. G. Langdon, Universal modeling and coding, IEEE Transactions on Information Theory, IT-27, January 1981.


[47] A. Moffat, N. Sharman, and J. Zobel, Static compression for dynamic texts, Technical Report, Department of Computer Science, University of Melbourne, Australia, 2000.
[48] J. Rissanen and B. Yu, Coding and compression: a happy union of theory and practice, in Year 2000 Commemorative Vignette on Engineering and Physical Sciences, Journal of the American Statistical Association, 1999.
[49] P. G. Howard and J. S. Vitter, Arithmetic coding for data compression, Technical Report, Department of Computer Science, Brown University, USA, November 1993.
[50] J. G. Cleary and I. H. Witten, Data compression using adaptive coding and partial string matching, IEEE Transactions on Communications, COMM-32, April 1984.
[51] B. Balkenhol and S. Kurtz, Universal data compression based on the Burrows-Wheeler transformation: Theory and practice, IEEE Transactions on Computers, vol. 49, pp. 1043–1053, October 2000.
[52] M. Nelson, Data compression with the Burrows-Wheeler transform, Dr. Dobb's Journal, September 1996.
[53] M. Burrows and D. Wheeler, A block-sorting lossless data compression algorithm, Technical Report, Digital Systems Research Center, California, May 1994.
[54] B. Balkenhol, S. Kurtz, and Y. M. Shtarkov, Modification of the Burrows-Wheeler data compression algorithm, in IEEE Data Compression Conference, vol. 10, October 1999.
[55] J. Vitter, Design and analysis of dynamic Huffman codes, Journal of the ACM, 1987.
[56] J. Rissanen and G. Langdon, Arithmetic coding, IBM Journal of Research and Development, 1979.
[57] Wireless Application Protocol Forum Ltd., Wireless Application Protocol: Wireless Session Protocol Specification, 5, Wireless Application Protocol Forum, 1999.
[58] X. Wu and N. Memon, CALIC: a context based adaptive lossless image coding scheme, IEEE Transactions on Communications, 1996.
[59] B. Calder, G. Reinman, and D. M. Tullsen, Selective value prediction, in 26th International Symposium on Computer Architecture, May 1999.


[60] M. H. Lipasti and J. P. Shen, Exceeding the dataflow limit via value prediction, in International Symposium on Microarchitecture, 1996.
[61] B. Rychlik, J. Faistl, B. Krug, and J. P. Shen, Efficacy and performance impact of value prediction, in Proceedings PACT-98, October 1998.
[62] F. Gabbay and A. Mendelson, Using value prediction to increase the power of speculative execution hardware, ACM Transactions on Computer Systems, vol. 16, no. 3, pp. 234–270, August 1998.
[63] M. Evers, P.-Y. Chang, and Y. N. Patt, Using hybrid branch predictors to improve branch prediction accuracy in the presence of context switches, in Proceedings ISCA, 1996.
[64] M. Kormicki, A. Mahmood, and B. S. Carlson, Parallel logic simulation on a network of workstations using a parallel virtual machine, ACM Transactions on Design Automation of Electronic Systems, vol. 2, no. 2, pp. 123–134, April 1997.

List of Figures

Figure ??: VCD File.
Figure ??: VHDL Model 1.
Figure ??: CDFG of VHDL Model 1.
Figure ??: VHDL Model 2.
Figure ??: VHDL Signal Assignment procedure.
Figure ??: CDFG of VHDL Model 2.
Figure ??: VHDL Model 3.
Figure ??: CDFG of VHDL Model 3.
Figure ??: VHDL Model 4.
Figure ??: CDFG of VHDL Model 4.
