IEEE VLSI ABSTRACT 2013-14
HARDWARE IMPLEMENTATION
HVL01
A Current Consumption Measurement Approach for FPGA-Based
Embedded Systems
Abstract:
An approach for field programmable gate array (FPGA) based embedded system
dynamic current consumption measurement is presented in the paper. The measurement
method is based on current conversion to a time interval. The time interval is then
measured using a timer implemented in the FPGA. Measurement uncertainty budget analysis is
performed. It reveals the key components and their parameters that most affect current
measurement uncertainty. A system architecture for incorporating the measurement setup into the
standard FPGA development flow is suggested.
HVL02
Design of a Real-Time FPGA-Based Data Acquisition Architecture for
the LabPET
Abstract:
The LabPET II detector block was designed to achieve submillimeter spatial
resolution in small animal PET imaging. Each detection block consists of two arrays of
4 × 8 avalanche photodiodes (APD) individually coupled to an 8 × 8
scintillator array, to form 64 independent detectors with parallel readout channels. This
new detection block entails an eightfold increase in pixel density compared to the
LabPET I. A 64-channel mixed-signal application-specific integrated circuit (ASIC) was
designed to extract relevant PET data in real time from the LabPET II detection blocks.
In order to interface the ASICs forming the PET camera with the storage units, a
real-time FPGA-based digital data acquisition (DAQ) system was designed. The DAQ system
allows event harvesting, processing and transmission to a host computer for data storage
G2, Metha Complex, Little Mount, Saidapet, Chennai-15
Ph: 044-22200258, Mobile: 9840989556, 9952050233
Mail: projects@hades.in, contact@hades.in, www.hades.in
HVL03
Project-Based Learning in Embedded Systems Education Using an
FPGA Platform
Abstract:
With embedded systems becoming ubiquitous, there is a growing need to teach
and train engineers to be well-versed in their design and development. The
multidisciplinary nature of such systems makes it challenging to give students exposure
to and experience in all their facets. This paper proposes a generic architecture,
containing multiple processors, that allows easy integration of custom and/or predefined
peripherals. The architecture allows students to explore both the hardware and software
issues associated with real-time and embedded systems. Furthermore, the architecture can
be extended to train students in advanced concepts in embedded multiprocessor systems.
This generic architecture has been used for two courses at the National University of
Singapore: one on real-time embedded systems and the other emphasizing the hardware
aspects of embedded systems. The project in the real-time embedded systems course has
students develop a five-a-side soccer system on multiple field-programmable gate array
HVL04
A Reliable and Cost-Effective Sand Monitoring System on the Field
Programmable Gate Array
Abstract:
Sand monitoring gives the benefits of avoiding equipment erosion and production
failure in the oil industry. This paper presents the design and implementation of a reliable
and cost-effective sand monitoring system for measuring sand production in gas and oil
flows in real time. The designed monitoring system involves two acoustic emission (AE)
sensors and one Doppler sensor for gauging the sand impaction and the velocity of sand
particles in a nonintrusive manner, respectively. It is implemented on a field
programmable gate array (FPGA) as a prototype for real-time data acquisition and
processing, and evaluated using a testbed pump skid available in our laboratory. Relying
on a low-cost FPGA board to integrate all acquisition and processing functionality, our
monitoring system can measure the sand production amount on-the-fly reliably with high
accuracy, according to our experimental evaluation. It is likely to achieve better accuracy
than one without the Doppler sensor. The proposed monitoring system is affordable for
wide deployment, given its high accuracy, good resilience to surrounding noise, and low
cost (when compared with commercially available systems whose price tags can be some
tenfold higher).
HVL06
FPGA Based Real Time Systems For Position Tracking
Abstract:
Position tracking systems supported by RISS and gyroscopes are a better
solution in places where GPS is unavailable or cannot
work. Systems that rely on GPS alone for position tracking face many
problems in areas where line of sight is hard to achieve, i.e. GPS-denied environments
such as dense terrestrial areas, subways, tunnels and hidden places. This system provides
continuous and highly reliable position tracking by synchronising real-time stimulus
obtained from the sensors with the actual GPS values. The core processor of the system is
built on an FPGA, which is used in the system kernel. The key reasons for using an FPGA in
the system are its customisable core and its flexibility in interfacing with the sensors. The
core employs the hybrid Kalman filter to estimate displacement and position. In
this system we integrate the 3D-RISS with GPS to achieve reliable and uninterrupted
position tracking. In these systems the processor estimates the position of the object
based on the four inputs taken from the RISS and the odometer, they are Velocity,
HVL07
FPGA based Real Time 3D-RISS / GPS Integrated System for Position
Tracking
Abstract:
Navigation algorithms integrating measurements from multi-sensor systems
overcome the problems that arise from using GPS navigation systems in standalone
mode. Algorithms that integrate the data from a 2D low-cost reduced inertial sensor
system (RISS), consisting of a gyroscope and an odometer or wheel encoders, with
a GPS receiver via a Kalman filter have proved worthy in providing a consistent and
more reliable navigation solution than standalone GPS receivers. They have also been
shown to be beneficial, especially in GPS-denied environments such as urban canyons
and tunnels. The main objective of this paper is to narrow the idea-to-implementation gap
that follows the algorithm development by realizing a low-cost real-time embedded
navigation system capable of computing the data-fused positioning solution. The role of
the developed system is to synchronize the measurements from the three sensors, relative
to the pulse per second signal generated from the GPS, after which the navigation
algorithm is applied to the synchronized measurements to compute the navigation
solution in real-time. Employing a customizable soft-core processor on an FPGA in the
kernel of the navigation system provided the flexibility for communicating with the
various sensors and the computation capability required by the Kalman filter integration
algorithm.
HVL09
Implementation of FPGA based PID Controller for DC Motor Speed
Control System
Abstract:
In this paper, the implementation of software module using VHDL for Xilinx
FPGA (XC3S400) based PID controller for DC motor speed control system is presented.
The tools used for building and testing the software modules are Xilinx ISE 9.2i and
ModelSim XE III 6.3c. Before verifying the design on FPGA the complete design is
HVL010
Implementation of Hamming code using VLSI
Abstract:
This paper explains the implementation of Hamming code using VLSI.
Communication has many applications today, and in each of them
the data is encoded at the transmitter, transferred over a communication channel,
and decoded at the receiver. During transmission the data may
be corrupted by noise on the channel, so the receiver needs
some function that can detect errors in the received data. Hamming code is one
such forward error correcting code, and it has many applications. In this paper the
algorithm for Hamming code is discussed and then implemented in Verilog
to obtain the results. Hamming code is an improvement over the parity check method. Here a
code is implemented in Verilog in which 4 bits of information data are transmitted with 3
redundancy bits. To find the values of these redundancy bits, code is written in
Verilog and simulated in Xilinx 9.1 software. The results of simulation and test
bench waveforms are also shown.
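
The scheme of 4 information bits plus 3 redundancy bits described above matches the classic Hamming(7,4) code. The sketch below illustrates it in Python rather than in the paper's Verilog. The bit ordering (parity bits at positions 1, 2 and 4) and the function names are assumptions chosen for illustration; the abstract does not state which layout the authors used.

```python
def hamming74_encode(data_bits):
    """Encode [d1, d2, d3, d4] into a 7-bit codeword.

    Assumed layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit; return (data_bits, error_position).

    error_position is 1..7 for a corrected bit, or 0 if no error was seen.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome read as a binary position
    if position:
        c[position - 1] ^= 1         # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position
```

A receiver built this way recovers the data from any single-bit channel error; two or more flipped bits in one codeword would be miscorrected.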
HVL011
FPGA Based Critical Patient Health Monitoring Using Fuzzy Neural
Network
Abstract:
We have designed FPGA based system and trained a fuzzy neural network for
early diagnosis of a patient. The system employs a fuzzy interface cascaded with a
feed-forward neural network in order to obtain an optimum decision regarding the future
pathology physiological state of a patient. The neurons that are considered in the
HVL012
VIDEO acquisition through I2C using VHDL
Abstract:
With an I2C implementation of the video acquisition system, devices can communicate
with each other more quickly than with any other communication protocol. The main
purpose of this technique is not only to let the devices communicate but also to keep track of
every operation that can be performed with the help of this protocol. I2C uses just
two lines for data communication, making it lightweight, economical, and omnipresent.
The design is used in the acquisition system to make the overall system more efficient and
accurate; with the use of I2C the data transmission rate also increases. An OV7620 single-chip
CMOS VGA colour digital camera is used in the design, which is built using VHDL and an FPGA.
Security System
Abstract:
Home and industrial security today needs to make use of the latest technological
components. In this paper I present the design and implementation of a remote
sensing, control and home security system based on GSM (Global System for
Mobile). This system offers a complete, low-cost, powerful and user-friendly way of providing
24-hour real-time monitoring and remote control for home and industrial security. The
system remotely senses the electrical appliances at home to check whether
they are on or off; at the same time the user can control the electrical appliances at home by
sending an SMS (Short Messaging Service) message to the system, for example turning on
the AC before returning home. In case of a fire or security event the chip receives signals from
the different sensors in the monitored place and acts according to the received signal by
sending an SMS message to the user's mobile phone. It also provides automatic and
immediate reporting to the user in case of emergency for home security, as well as
immediate and automatic reporting to the fire brigade and police station according to the
activated sensor, to decrease the time required for taking action. The design has been
described using VHDL (VHSIC Hardware Description Language) and implemented in
hardware using an FPGA (Field Programmable Gate Array).
HVL014
FPGA Implementation of Picoblaze based Embedded System for
Monitoring Applications
Abstract:
PicoBlaze is an 8-bit soft core microprocessor developed by Xilinx that can be
synthesized in some FPGA families. This paper presents a set of peripherals that have
been developed to interface with PicoBlaze: VGA control, serial communication, PS/2
keyboard port and LCD control. To demonstrate its capabilities, the system has been
implemented in a FPGA board and some typical control and monitoring systems have
HVL015
Design and Development of FPGA Based Data Acquisition System for
Process Automation
Abstract:
This paper presents a novel approach to the design of a data acquisition system for
process applications. The core of the proposed system is a Field Programmable Gate
Array (FPGA), which is configured and programmed to acquire a maximum of 16 MB of
real-time data. For real-time validation of the designed system, a process plant with
three parameters i.e. pressure, temperature and level is considered. Real time data from
the process is acquired using suitable temperature, pressure and level sensors. Signal
conditioners are designed for each sensor and are tested in real time. Designed FPGA
based data acquisition system along with corresponding signal conditioners is validated
in real-time by running the process and comparing the same with the corresponding
references. The data acquired in real time compares well with the references.
HVL016
Intelligent Car Parking Management System On FPGA
Abstract:
Car parking has become an immense issue, especially in big cities. There are two
main reasons: first, the growth in population; second, security. Moreover, car
theft has become an evil art haunting drivers. In this paper, we provide an interface and a
software/ hardware module for Intelligent Car Park Management System (ICPMS). The
ICPMS will provide an extensive management for vehicles including parking facilities
and security. The ICPMS is validated using a test case scenario and extensive
experimentation proves the feasibility of the approach.
HVL018
Design Of FPGA Based PWM Solar Power Inverter For Livelihood
Generation In Rural
Abstract:
The paper presents the development of a utility-interface solar power converter
(inverter) in a grid / DG power supply for a solar lighting system used in rural homes of
Indian villages. The power supply system comprises a solar (PV) array, a PWM converter
incorporating a PWM control strategy, and energy storage battery devices. The model of the
system has been designed for its operation, along with a prototype solar power converter. The
system simulation of PWM pulse generation has been done on a Xilinx-based FPGA
Spartan 3E board using VHDL code. The test results from simulating the PWM generation program
after synthesis and compilation were recorded and verified on a prototype sample.
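
The abstract does not say how the PWM pulses are generated. One common FPGA approach is a free-running counter compared against a duty value, and the Python sketch below models that behaviour tick by tick. The counter-comparator structure, the function name and the default period are assumptions for illustration, not details taken from the paper.

```python
def pwm_waveform(duty, period=8, cycles=1):
    """Model a counter-comparator PWM and return a list of 0/1 samples.

    duty   -- compare value in clock ticks, 0..period (high time per period)
    period -- counter modulus in clock ticks (assumed value)
    cycles -- number of PWM periods to generate
    """
    if not 0 <= duty <= period:
        raise ValueError("duty must be within 0..period")
    samples = []
    for tick in range(period * cycles):
        counter = tick % period                      # free-running counter
        samples.append(1 if counter < duty else 0)   # comparator output
    return samples


# Example: 3 high ticks out of 8 gives a 37.5% duty cycle.
one_period = pwm_waveform(3, period=8)
```

In hardware the same comparison would run once per clock edge, and changing the duty value changes the average output voltage seen after filtering.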
HVL020
Design of Control Module for Serial DAC Based on FPGA
Abstract:
In order to increase the flexibility of control for a serial DAC, a new control method
for the DAC based on an FPGA is proposed in this paper. A state transition diagram can be
drawn according to the timing diagram of the DAC, which can be realized in the FPGA using
Very High-speed Integrated Circuit Hardware Description Language. The simulation
results show that the logic in the FPGA is consistent with the requirements. The module based
on the FPGA can be modified just by modifying software, not the hardware.
FPGA-based Run-Time Configuration system, JBits-based Run-Time
HVL022
Implementation of Maximum Power Point Tracking Using Kalman
Filter for Solar Photovoltaic Array on FPGA
Abstract:
This paper proposes an FPGA implementation of a novel approach to track the
maximum power point of a solar photovoltaic array. The approach uses the Kalman filter
algorithm to track the maximum power point. Using this approach, tracking becomes much
faster than with the generic Perturb & Observe algorithm in case of sudden weather
changes. In this paper the output of the proposed algorithm on the FPGA is provided.
Experimentation was performed under optimal conditions as well as under cloudy
conditions i.e. falling irradiance levels. Using the proposed technique the maximum
power point of a solar PV array is tracked with an efficiency of 97.11%. Moreover, the
HVL023
An FPGA Based Implementation of a Flexible Digital PID Controller
For a Motion Control System
Abstract:
Implementation of digital controllers in an embedded environment suffers from the
inherent problems associated with analog-digital signal interfacing in hard real-time;
therefore, the control algorithms are invariably subjected to approximations. This paper
presents a novel technique for implementation of an efficient FPGA based digital
Proportional-Integral-Derivative (PID) controller for the motion control of a permanent
magnet DC motor. The implementation technique circumvents the problem of
interfacing analog and digital systems in real-time. The controller is used in a speed
control loop. The hardware implementation has been done on a Xilinx Spartan 3 FPGA
chip. A novel technique has been adopted for the generation of the control input as a
PWM signal for controlling the motor driver circuit and decoding the optical encoder
data for using it for the speed feedback in the PID control loop. The VHDL algorithm for
the proposed implementation has also been presented in this paper. A comparison of the
experimental results with the Matlab based simulation shows the effectiveness of the
proposed method.
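
The abstract does not give the controller equations, so the following is only a textbook positional PID update for a speed loop, written in Python rather than the paper's VHDL. The class name, gains, sample period and output limits are all hypothetical; the clamped output is meant to stand in for a PWM duty cycle.

```python
class DiscretePID:
    """Textbook discrete PID with output clamping (not the paper's design)."""

    def __init__(self, kp, ki, kd, dt, out_min=0.0, out_max=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt                          # sample period in seconds
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0                   # running sum of error * dt
        self.prev_error = 0.0                 # error from the previous sample

    def update(self, setpoint, measured):
        """Compute one control sample from the speed setpoint and feedback."""
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        output = (self.kp * error
                  + self.ki * self.integral
                  + self.kd * derivative)
        # Clamp to the actuator range, e.g. a 0..1 PWM duty cycle.
        return min(max(output, self.out_min), self.out_max)
```

In a loop like the one described, `measured` would come from the decoded optical encoder and the return value would set the PWM duty for the motor driver. A hardware version would typically use fixed-point arithmetic instead of floats.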
HVL024
Design of an Oximeter Based on LED-LED Configuration and FPGA
Technology
Abstract:
A fully digital photoplethysmographic (PPG) sensor and actuator has been
developed. The sensing circuit uses one Light Emitting Diode (LED) for emitting light
into human tissue and one LED for detecting the reflectance light from human tissue. A
HVL025
Smart Camera Based on FPGA Oriented to Embedded Image
Processing
Abstract:
This paper presents an image processing system based on a smart camera platform,
whose two principal elements are a Wide-VGA CMOS Sensor and a Field Programmable
Gate Array (FPGA). The latter is used to control the various sensor parameter
configurations and, where desired, to receive and process the images captured by the
CMOS sensor. With the advent of today's highly integrated Field Programmable Gate
Array (FPGA) it is possible to have a software programmable processor and hardware
computing resources on the same chip. Apart from having sufficient logic blocks on
which the hardware is implemented these chips also have an embedded processor with
system software to implement the application software around it. In this paper, the
Spartan-3A DSP based Xilinx VSK platform is used for developing the proposed
extensible hardware-software video streaming and processing modules. In order to
develop the required hardware and software in an integrated fashion, Xilinx Embedded
Development Kit (EDK) design tool has been used. A number of Xilinx provided IPs are
customized to realize the hardware modules in the FPGA fabric. Copyright 2013 Praise
Worthy Prize S.r.l. -All rights reserved.
HVL028
FPGA-Based Educational Platform for Real-Time Image Processing
Experiments
Abstract:
In this paper, an implementation of an educational platform for real-time linear
and morphological image filtering using a FPGA NexysII, Xilinx, Spartan 3E, is
described. The system is connected to a USB port of a personal computer, which in that
HVL029
Real-time Traffic Sign detection and Recognition on FPGA
Abstract:
This paper presents the implementation of an embedded automotive system that
detects and recognizes traffic signs within a video stream. In addition, it discusses the
recent advances in driver assistance technologies and highlights the safety motivations
for smart in-car embedded systems. An algorithm is presented that processes RGB image
data, extracts relevant pixels, filters the image, labels prospective traffic signs and
evaluates them against template traffic sign images. A reconfigurable hardware system is
described which uses the Virtex-5 Xilinx FPGA and hardware/software co-design tools in
order to create an embedded processor and the necessary hardware IP peripherals. The
implementation is shown to have robust performance results, both in terms of timing and
accuracy.
HVL030
Design and implementation of a secure RFID system on FPGA
Abstract:
Radio Frequency Identification (RFID) systems are now widely used in many
applications. As a result, data security has become an important issue in RFID
communication, in order to prevent unauthorized parties from decrypting communication data.
To address this problem, a secure RFID system is designed in this study. An RFID
communication at 868MHz is demonstrated by programming transceiver modules via
SIMULATION ONLY
HVL031
A 2.63 Mbit/s VLSI Implementation of SISO Arithmetic Decoders for
High Performance Joint Source Channel Codes
Abstract:
This paper highlights the implementation challenges faced by the current high
performing error resilient joint source channel coding (JSCC) techniques based on the
concept of soft-input soft-output (SISO) decoding of arithmetic codes (AC). Further, it
proposes several efficient algorithmic and very large scale integration (VLSI)
architectural techniques to improve the throughput performance of SISO for JSCC. The
VLSI hardware implementation of the proposed algorithm, when implemented on a 90
nm standard cells technology running at 588 MHz, achieves a decoding throughput of up
to 2.63 Mbits/s capable of decoding QCIF format for video conferencing.
HVL032
3-D Mesh-Based Optical Network-on-Chip for Multiprocessor System-on-Chip
Abstract:
Optical networks-on-chip (ONoCs) are emerging communication architectures
that can potentially offer ultrahigh communication bandwidth and low latency to
multiprocessor systems-on-chip (MPSoCs). In addition to ONoC architectures, 3-D
integrated technologies offer an opportunity to continue performance improvements with
HVL033
A Built-In Repair Analyzer With Optimal Repair Rate for Word-Oriented Memories
Abstract:
This paper presents a built-in self repair analyzer with the optimal repair rate for
memory arrays with redundancy. The proposed method requires only a single test, even
in the worst case. By performing the must-repair analysis on the fly during the test, it
selectively stores fault addresses, and the final analysis to find a solution is performed on
the stored fault addresses. To enumerate all possible solutions, existing techniques use
depth first search using a stack and a finite-state machine. Instead, we propose a new
algorithm and its combinational circuit implementation. Since our formulation for the
circuit allows us to use the parallel prefix algorithm, it can be configured in various ways
to meet area and test time requirements. The total area of our infrastructure is dominated
by the number of content addressable memory entries to store the fault addresses, and it
only grows quadratically with respect to the number of repair elements. The
infrastructure is also extended to support various types of word-oriented memories.
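
For readers unfamiliar with the must-repair analysis mentioned above, the usual rule is: a row holding more faults than there are spare columns left can only be fixed with a spare row, and likewise for columns. The Python sketch below applies that rule to a complete fault list. It is an offline software illustration under assumed names and data layout, not the paper's on-the-fly hardware analyzer or its parallel prefix circuit.

```python
from collections import Counter


def must_repair(faults, spare_rows, spare_cols):
    """Apply the must-repair rule until no further line is forced.

    faults -- iterable of (row, col) fault addresses
    Returns (rows, cols, remaining, feasible): the forced spare rows and
    columns, the faults still uncovered, and whether the forced lines fit
    within the available spares.
    """
    unique_faults = set(faults)
    rows, cols = set(), set()
    while True:
        remaining = [(r, c) for r, c in unique_faults
                     if r not in rows and c not in cols]
        rows_left = spare_rows - len(rows)
        cols_left = spare_cols - len(cols)
        row_count = Counter(r for r, _ in remaining)
        col_count = Counter(c for _, c in remaining)
        # A row with more faults than spare columns left needs a spare row.
        forced_row = next((r for r, n in sorted(row_count.items())
                           if n > cols_left), None)
        if forced_row is not None:
            rows.add(forced_row)
            continue
        # A column with more faults than spare rows left needs a spare column.
        forced_col = next((c for c, n in sorted(col_count.items())
                           if n > rows_left), None)
        if forced_col is not None:
            cols.add(forced_col)
            continue
        feasible = len(rows) <= spare_rows and len(cols) <= spare_cols
        return rows, cols, remaining, feasible
```

Any faults left in `remaining` are the ones a final analysis stage still has to assign to spares, which is the step the abstract says is performed on the stored fault addresses.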
HVL035
A High-Speed Low-Complexity Modified FFT Processor for High Rate
WPAN Applications
Abstract:
This paper presents a high-speed low-complexity modified radix-2^5 512-point
fast Fourier transform (FFT) processor using an eight data-path pipelined approach for
high rate wireless personal area network applications. A novel modified radix-2^5 FFT
algorithm that reduces the hardware complexity is proposed. This method can reduce the
number of complex multiplications and the size of the twiddle factor memory. It also uses
a complex constant multiplier instead of a complex Booth multiplier. The proposed FFT
processor achieves a signal-to-quantization noise ratio of 35 dB at 12 bit internal word
length. The proposed processor has been designed and implemented using 90-nm CMOS
technology with a supply voltage of 1.2 V. The results demonstrate that the total gate
count of the proposed FFT processor is 290 K. Furthermore, the highest throughput rate
is up to 2.5 GS/s at 310 MHz while requiring much less hardware complexity.
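
The quoted peak throughput can be sanity-checked with simple arithmetic. The sketch below assumes that each of the eight data paths accepts one sample per clock cycle, which the abstract does not state explicitly; the function names are illustrative only.

```python
def throughput_gsps(data_paths, clock_mhz):
    """Throughput in GS/s if every data path takes one sample per clock."""
    return data_paths * clock_mhz / 1000.0


def cycles_per_transform(fft_points, data_paths):
    """Clock cycles needed to stream one FFT frame through the parallel inputs."""
    return fft_points // data_paths


rate = throughput_gsps(8, 310)            # 2.48 GS/s, close to the quoted 2.5 GS/s
cycles = cycles_per_transform(512, 8)     # 64 cycles per 512-point frame
frame_time_ns = cycles / 310e6 * 1e9      # a little over 206 ns per frame
```

Under that assumption, eight paths at 310 MHz give 2.48 GS/s, so the "up to 2.5 GS/s" figure looks like a rounded value.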
HVL037
A Meet-in-the-Middle Algorithm for Fast Synthesis of Depth-Optimal
Quantum Circuits
Abstract:
We present an algorithm for computing depth-optimal decompositions of logical
operations, leveraging a meet-in-the-middle technique to provide a significant speedup
over simple brute force algorithms. As an illustration of our method, we implemented this
algorithm and found factorizations of commonly used quantum logical operations into
elementary gates in the Clifford+T set. In particular, we report a decomposition of the
Toffoli gate over the set of Clifford and T gates. Our decomposition achieves a total
T-depth of 3, thereby providing a 40% reduction over the previously best known
HVL038
AC-Plus Scan Methodology for Small Delay Testing and Characterization
Abstract:
Small delay defects escaping traditional delay testing could cause a device to
malfunction in the field and thus detecting these defects is often necessary. To address
this issue, we propose three test modes in a new methodology called AC-plus scan, in
which versatile test clocks can be generated on the chip by embedding an all-digital
phase-locked loop (ADPLL) into the circuit under test (CUT). AC-plus scan can be
executed on an in-house wireless test platform called HOY system. The first test mode of
our AC-plus scan provides a more efficient way to measure the longest path delay
associated with each test pattern. Experimental results show that our method can
reduce the test time by 81.8%. The second test mode is designed for volume
production test. It could effectively detect small delay defects and provide fast
characterization on those defective chips for further processing. This mode could be used
to help predict which chips are more likely to fall victim to operational failure in the
field. The third test mode is to extract the waveform of each flip-flop's output in a real
chip. This is made possible by taking advantage of the almost unlimited test memory our
HOY test platform provides, so that we could easily store a great volume of data and
reconstruct the waveform for post-silicon debugging. We have successfully fabricated a
Viterbi decoder chip with such an AC-plus scan methodology inside to demonstrate its
capability.
HVL040
All-Digital Fast-Locking Pulsewidth-Control Circuit With
HVL041
An Analytical Latency Model for Networks-on-Chip
Abstract:
We propose an analytical model based on queueing theory for delay analysis in a
wormhole-switched network-on-chip (NoC). The proposed model takes as input an
application communication graph, a topology graph, a mapping vector, and a routing
matrix, and estimates average packet latency and router blocking time. It works for
arbitrary network topology with deterministic routing under arbitrary traffic patterns.
This model can estimate per-flow average latency accurately and quickly, thus enabling
fast design space exploration of various design parameters in NoC designs. Experimental
results show that the proposed analytical model can predict the average packet latency
more than four orders of magnitude faster than an accurate simulation, while the
computation error is less than 10% in non-saturated networks for different
system-on-chip platforms.
L2 Cache Architecture Using Way Tag
HVL043
An On-Chip Network Fabric Supporting Coarse-Grained Processor
Array
Abstract:
Coarse grained arrays (CGAs) with run-time reconfigurability play an important
role in accelerating reconfigurable computing applications. It is challenging to design
on-chip communication networks (OCNs) for such CGAs with dynamic run-time
reconfigurability whilst satisfying the tight budgets of power and area for an embedded
system. This paper presents a silicon-proven design of a 64-PE circuit-switched OCN
HVL044
An Ultrasynchronization Checking Method With Trace-Driven Space Exploration of Heterogeneous Run-Time
HVL046
Architecture for Real-Time Nonparametric Probability Density Function Estimation
Abstract:
Adaptive systems are increasing in importance across a range of application
domains. They rely on the ability to respond to environmental conditions, and hence
real-time monitoring of statistics is a key enabler for such systems. Probability density
function (PDF) estimation has been applied in numerous domains; computational
limitations, however, have meant that proxies are often used. Parametric estimators
attempt to approximate PDFs based on fitting data to an expected underlying distribution,
HVL047
Built-In Generation of Functional Broadside Tests Using a Fixed
Hardware Structure
Abstract:
Functional broadside tests are two-pattern scan-based tests that avoid overtesting
by ensuring that a circuit traverses only reachable states during the functional clock
cycles of a test. In addition, the power dissipation during the fast functional clock cycles
of functional broadside tests does not exceed that possible during functional operation.
On-chip test generation has the added advantage that it reduces test data volume and
facilitates at-speed test application. This paper shows that on-chip generation of
functional broadside tests can be done using a simple and fixed hardware structure, with a
small number of parameters that need to be tailored to a given circuit, and can achieve
high transition fault coverage for testable circuits. With the proposed on-chip test
generation method, the circuit is used for generating reachable states during test
application. This alleviates the need to compute reachable states offline.
HVL049
Checkpointing for Virtual Platforms and SystemC-TLM
Abstract:
Integrating simulation models created using different simulation systems is a
common problem when constructing virtual platforms. Different companies and different
departments can create models and virtual platforms for different purposes using
different tools. There are also existing models that need to be integrated into new tools, or
the other way around. The simulators can be quite different in details, even in the case of
transaction-level models. We present work in integrating SystemC transaction-level
models into two typical full-system simulation environments, QEMU and Simics. We
present issues in reconciling the semantics of the different platforms, and our proposed
solutions. In the Simics integration, we additionally enable checkpointing in the models,
based on the Simics checkpoint mechanism.
HVL051
Combined Architecture/Algorithm Approach to Fast FPGA Routing
Abstract:
We propose a new field-programmable gate array (FPGA) routing approach,
which, when combined with a low-cost architecture change, results in a 40% reduction in
router runtime, at the cost of a 6% area overhead and with no increase in critical path
HVL052
CORDIC Designs for Fixed Angle of Rotation
Abstract:
Rotation of vectors through fixed and known angles has wide applications in
robotics, digital signal processing, graphics, games, and animation. But, we do not find
any optimized coordinate rotation digital computer (CORDIC) design for vector-rotation
through specific angles. Therefore, in this paper, we present optimization schemes and
CORDIC circuits for fixed and known rotations with different levels of accuracy. For
reducing the area- and time-complexities, we have proposed a hardwired pre-shifting
scheme in barrel-shifters of the proposed circuits. Two dedicated CORDIC cells are
proposed for the fixed-angle rotations. In one of those cells, micro-rotations and scaling
are interleaved, and in the other they are implemented in two separate stages. Pipelined
schemes are suggested further for cascading dedicated single-rotation units and bi-rotation CORDIC units for high-throughput and reduced latency implementations. We
have obtained the optimized set of micro-rotations for fixed and known angles. The
optimized scale-factors are also derived and dedicated shift-add circuits are designed to
implement the scaling. The fixed-point mean-squared-error of the proposed CORDIC
circuit is analyzed statistically, and strategies for reducing the error are given. We have
synthesized the proposed CORDIC cells by Synopsys Design Compiler using TSMC 90
HVL053
Design and Implementation of an On-Chip Permutation Network for
Multiprocessor System-On-Chip
Abstract:
This paper presents the silicon-proven design of a novel on-chip network to
support guaranteed traffic permutation in multiprocessor system-on-chip applications.
The proposed network employs a pipelined circuit-switching approach combined with a
dynamic path-setup scheme under a multistage network topology. The dynamic path-setup scheme enables runtime path arrangement for arbitrary traffic permutations. The
circuit-switching approach offers a guarantee of permuted data and its compact overhead
enables the benefit of stacking multiple networks. A 0.13-$\mu$m CMOS test-chip validates
the feasibility and efficiency of the proposed design. Experimental results show that the
proposed on-chip network achieves 1.9$\times$ to 8.2$\times$ reduction of silicon overhead compared
to other design approaches.
HVL054
Design of Digit-Serial FIR Filters: Algorithms, Architectures, and a
CAD Tool
Abstract:
In the last two decades, many efficient algorithms and architectures have been
introduced for the design of low-complexity bit-parallel multiple constant multiplications
(MCM) operation which dominates the complexity of many digital signal processing
systems. On the other hand, little attention has been given to the digit-serial MCM design
that offers alternative low-complexity MCM operations albeit at the cost of an increased
HVL055
Efficient Implementation of Reconfigurable Warped Digital Filters
With Variable Low-Pass, High-Pass, Bandpass, and Bandstop Responses
Abstract:
In this brief, an efficient implementation of a reconfigurable warped digital filter
with variable low-pass, high-pass, bandpass, and bandstop responses is presented. The
warped filters, obtained by replacing each unit delay of a digital filter with an all-pass
filter, are widely used for various audio processing applications. However, warped filters
require first-order all-pass transformation to obtain variable low-pass or high-pass
responses, and second-order all-pass transformation to obtain variable bandpass or
bandstop responses. To overcome this drawback, the proposed method combines the
warped filters with the coefficient decimation technique. The proposed architecture
provides variable low-pass or high-pass responses with fine control over cut-off
frequency and variable bandwidth bandpass or bandstop responses at an arbitrary center
frequency without updating the filter coefficients or filter structure. The design example
shows that the proposed variable digital filter is simple to design and offers substantial
savings in gate counts and power consumption over other approaches.
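The warping step described in this abstract (replacing each unit delay with a first-order all-pass section) can be illustrated with a minimal Python sketch. The function names, the warping parameter `lam`, and the sign convention of the all-pass section are assumptions chosen for illustration; the coefficient decimation technique of the paper is not modeled here.

```python
import cmath

def allpass(z_inv, lam):
    """First-order all-pass section D(z) = (z^-1 - lam) / (1 - lam * z^-1).

    The sign convention is an assumption; other texts use the opposite sign.
    """
    return (z_inv - lam) / (1 - lam * z_inv)

def fir_response(coeffs, omega):
    """Frequency response of an ordinary FIR filter at angular frequency omega."""
    z_inv = cmath.exp(-1j * omega)
    return sum(c * z_inv ** k for k, c in enumerate(coeffs))

def warped_fir_response(coeffs, omega, lam):
    """Same coefficients, but every unit delay is replaced by allpass(z^-1, lam)."""
    d = allpass(cmath.exp(-1j * omega), lam)
    return sum(c * d ** k for k, c in enumerate(coeffs))
```

With `lam = 0` the warped filter reduces to the original one. For a real `lam` with magnitude below one, the all-pass section has unit gain at every frequency, so only the frequency axis is remapped while the filter coefficients stay unchanged.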
HVL057
Exploration and Optimization of 3-D Integrated DRAM Subsystems
Abstract:
Energy efficiency is the major optimization criterion for systems-on-chip (SoCs)
for mobile devices (smartphones and tablets). Through silicon via (TSV) technology
enables 3-D integration of dies and the heterogeneous stacking of multiple memory or
logic layers, allowing increased bandwidth and lower energy consumption of the memory
interface compared to traditional approaches. In this paper, we explore the 3-D-DRAM
architecture design space. The result is an optimized 2 Gb 3-D-DRAM, which shows an
83% lower energy/bit than a 2 Gb device. Furthermore, we propose a highly energy-efficient DRAM subsystem for next-generation 3-D-integrated SoCs, consisting of a
SDR/DDR 3-D-DRAM controller and an attached 3-D-DRAM cube with fine-grained
HVL058
Glitch-Free NAND-Based Digitally Controlled Delay-Lines
Abstract:
The recently proposed NAND-based digitally controlled delay-lines (DCDL)
present a glitching problem which may limit their use in many applications. This
paper presents a glitch-free NAND-based DCDL which overcomes this limitation,
opening the use of NAND-based DCDLs to a wide range of applications. The
proposed NAND-based DCDL maintains the same resolution and minimum delay of
previously proposed NAND-based DCDL. The theoretical demonstration of the glitch-free operation of proposed DCDL is also derived in the paper. Following this analysis,
three driving circuits for the delay control-bits are also proposed. Proposed DCDLs have
been designed in a 90-nm CMOS technology and compared, in this technology, to the
state-of-the-art. Simulation results show that novel circuits result in the lowest resolution,
with a little worsening of the minimum delay with respect to the previously proposed
DCDL with the lowest delay. Simulations also confirm the correctness of developed
glitching model and sizing strategy. As example application, proposed DCDL is used to
realize an All-digital spread-spectrum clock generator (SSCG). The employ of proposed
HVL059
Improved Trace Buffer Observation via Selective Data Capture Using
2-D Compaction for Post-Silicon Debug
Abstract:
This paper presents a novel technique for extending the capacity of trace buffers
when capturing debug data during post-silicon debug. It exploits the fact that it is not
necessary to capture error-free data in the trace buffer since that information can be
obtained from simulation. A selective data capture method is proposed in this paper that
only captures debug data during clock cycles in which errors are present. The proposed
debug method requires only three debug sessions. The first session estimates a rough
error rate, the second session identifies a set of suspect clock cycles where errors may be
present, and the third session captures the suspect clock cycles in the trace buffer. The
suspect clock cycles are determined through a 2-D compaction technique using multiple-input signature register signatures and cycling register signatures. Intersecting both
signatures generates a small number of suspect clock cycles that the trace buffer
needs to capture. The effective observation window of the trace buffer can be expanded
significantly, by up to orders of magnitude. Experimental results indicate that very significant
increases in the effective observation window for a trace buffer can be obtained.
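The idea of intersecting two signature dimensions to narrow down suspect clock cycles can be sketched in Python as follows. This is one plausible reading of the abstract, not the circuit from the paper: it assumes one dimension groups consecutive cycles into fixed-length segments and the other groups cycles by their index modulo a register length, and it flags a group as failing whenever it contains an erroneous cycle (signature aliasing is ignored). The function and parameter names are hypothetical.

```python
def suspect_cycles(error_cycles, total_cycles, segment_len, cycle_len):
    """Return the clock cycles that lie in both a failing segment and a failing residue class.

    error_cycles : cycles that really contain errors (unknown to the debugger in practice;
                   used here only to decide which groups would show a failing signature)
    segment_len  : assumed number of consecutive cycles compacted into one signature
    cycle_len    : assumed length of the cycling register (cycles grouped by index modulo)
    """
    failing_segments = {c // segment_len for c in error_cycles}
    failing_residues = {c % cycle_len for c in error_cycles}
    return [
        c
        for c in range(total_cycles)
        if c // segment_len in failing_segments and c % cycle_len in failing_residues
    ]
```

For example, with errors in cycles 5 and 130 of a 256-cycle run, `segment_len=16`, and `cycle_len=7`, only eight cycles remain suspect, so a small trace buffer could cover them in the final session.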
HVL060
IsoNet: Hardware-Based Job Queue Management for Many-Core
Architectures
Abstract:
Imbalanced distribution of workloads across a chip multiprocessor (CMP)
constitutes wasteful use of resources. Most existing load distribution and balancing
techniques employ very limited hardware support and rely predominantly on software for
HVL061
Joint Decoding of LDPC Code and Phase Factors for OFDM Systems
With PTS PAPR Reduction
Abstract:
In this paper, we investigate a low-density parity-check (LDPC)-coded orthogonal
frequency-division multiplexing (OFDM) system with a peak-to-average power ratio
(PAPR) reduction using the partial transmit sequence (PTS), which does not transmit
PTS side information about the phase factors. We view the PTS processing as a stage of
coding and call the resulting code of LDPC coding and PTS processing a concatenated
LDPC-PTS code. Then, we derive the parity-check matrix of the concatenated LDPC-PTS code. With the parity-check matrix, the LDPC code and phase factors can be jointly
decoded using belief propagation algorithms. Neither transmission of PTS side
information (phase factors) nor phase factor estimation before decoding is required by the
proposed scheme. Simulation results show that the proposed joint decoding provides
nearly perfect phase factor recovery and LDPC decoding for a small number of PTS
partitions.
HVL063
MDC FFT/IFFT Processor With Variable Length for MIMO-OFDM
Systems
Abstract:
This paper presents a multipath delay commutator (MDC)-based architecture
and memory scheduling to implement fast Fourier transform (FFT) processors for
multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems with variable length. Based on the MDC architecture, we propose to use
radix-$N_{s}$ butterflies at each stage, where $N_{s}$ is the number of data streams, so
that there is only one butterfly needed in each stage. Consequently, a 100% utilization
HVL064
Mining Hardware Assertions With Guidance From Static Analysis
Abstract:
We present GoldMine, a methodology for generating assertions automatically in
hardware. Our method involves a combination of data mining and static analysis of the
register transfer level (RTL) design. The RTL design is first simulated to generate data
about the design's dynamic behavior. The generated data is then mined for candidate
assertions that are likely to be invariants. The data mining algorithm is a decision-tree-based supervised learning algorithm. These candidate assertions are then passed through
a formal verification engine to filter out the spurious candidates. The assertions that are
attested as true by the formal engine are system invariants. These are then evaluated by a
process of designer ranking that is provided as feedback to the data mining engine. We
demonstrate the scalability of GoldMine by showing assertion generation of the RTL of
HVL065
NCTU-GR 2.0: Multithreaded Collision-Aware Global Routing with
Bounded-Length Maze Routing
Abstract:
Modern global routers employ various routing methods to improve routing speed
and quality. Maze routing is the most time-consuming process for existing global routing
algorithms. This paper presents two bounded-length maze routing (BLMR) algorithms
(optimal-BLMR and heuristic-BLMR) that perform much faster routing than traditional
maze routing algorithms. In addition, a rectilinear Steiner minimum tree aware routing
scheme is proposed to guide heuristic-BLMR and monotonic routing to build a routing
tree with shorter wirelength. This paper also proposes a parallel multithreaded collision-aware global router based on a previous sequential global router (SGR). Unlike the
partitioning-based strategy, the proposed parallel router uses a task-based concurrency
strategy. Finally, a 3-D wirelength optimization technique is proposed to further refine
the 3-D routing results. Experimental results reveal that the proposed SGR uses less
wirelength and runs faster than most other state-of-the-art global routers with a
different set of parameters. Compared to the proposed SGR, the proposed parallel
router yields almost the same routing quality with average 2.71- and 3.12-fold speedups on
overflow-free and hard-to-route cases, respectively, when running on a 4-core system.
HVL067
On the Fixed-Point Accuracy Analysis and Optimization of Polynomial
Specifications
Abstract:
Fixed-point accuracy analysis and optimization of polynomial data-flow graphs
with respect to a reference model is a challenging task in many digital signal processing
applications. Range and precision analysis are two important steps of this process to
assign suitable integer and fractional bit-widths to the fixed-point variables and constant
coefficients in a design such that no overflow occurs and a given error bound on
HVL068
Pipelined Radix-$2^{k}$ Feedforward FFT Architectures
Abstract:
The appearance of radix-$2^{2}$ was a milestone in the design of pipelined FFT
hardware architectures. Later, radix-$2^{2}$ was extended to radix-$2^{k}$. However, radix-$2^{k}$
was only proposed for single-path delay feedback (SDF) architectures, but not for
feedforward ones, also called multi-path delay commutator (MDC). This paper presents
the radix-$2^{k}$ feedforward (MDC) FFT architectures. In feedforward architectures radix-$2^{k}$
can be used for any number of parallel samples which is a power of two. Furthermore,
both decimation in frequency (DIF) and decimation in time (DIT) decompositions can be
used. In addition to this, the designs can achieve very high throughputs, which makes
them suitable for the most demanding applications. Indeed, the proposed radix-$2^{k}$
feedforward architectures require fewer hardware resources than parallel feedback ones,
also called multi-path delay feedback (MDF), when several samples in parallel must be
processed. As a result, the proposed radix-$2^{k}$ feedforward architectures not only offer an
attractive solution for current applications, but also open up a new research line on
feedforward structures.
HVL070
Reconfigurable Adaptive Singular Value Decomposition Engine Design
for High-Throughput MIMO-OFDM Systems
Abstract:
Singular value decomposition (SVD) is an optimal method to obtain spatial
multiplexing gain in multi-input multi-output (MIMO) channels. However, the high cost
of implementation and high decomposing latency of the SVD restricts its usage in current
wireless communication applications. In this paper, we present a complete adaptive SVD
HVL071
Scalability Analysis of Memory Consistency Models in NoC-based
Distributed Shared Memory SoCs
Abstract:
We analyze the scalability of six memory consistency models in network-on-chip
(NoC)-based distributed shared memory multicore systems: 1) protected release
consistency (PRC); 2) release consistency (RC); 3) weak consistency (WC); 4) partial
store ordering (PSO); 5) total store ordering (TSO); and 6) sequential consistency (SC).
Their realizations are based on a transaction counter and an address-stack-based
approach. The scalability analysis is based on different workloads mapped on various
sizes of networks using different problem sizes. For the experiments, we use Nostrum
HVL072
Scaling Energy Per Operation via an Asynchronous Pipeline
Abstract:
Statistical analysis of computations per unit energy in processors over the last 30
years is given that illustrates a sharp reduction in the rate of energy efficiency
improvements over the last several years resulting in the formation of an asymptotic
wall with our dataset; we use the measure of giga multiply accumulates per Joule. We
have developed an energy model which takes into account the realities of scaling,
specifically for asynchronous systems. Studies of an energy efficient asynchronous
pipeline show fabricated results of 17 Giga Operations per Joule in 0.6 $\mu$m at
subthreshold when fully pipelined, and simulations at a more modern 65 nm process
show a further order of magnitude improvement on that.
HVL074
Selective Flexibility: Creating Domain-Specific Reconfigurable Arrays
Abstract:
Historically, hardware acceleration technologies have either been application-specific, therefore lacking in flexibility, or fully programmable, thereby suffering from
notable inefficiencies on an application-by-application basis. To address the growing
need for domain-specific acceleration technologies, this paper describes a design
methodology (i) to automatically generate a domain-specific coarse-grained array from a
set of representative applications and (ii) to introduce limited forms of architectural
generality to increase the likelihood that additional applications can be successfully
mapped onto it. In particular, coarse-grained arrays generated using our approach are
intended to be integrated into customizable processors that use application-specific
HVL075
Self-Repairing Digital System With Unified Recovery Process Inspired
by Endocrine Cellular Communication
Abstract:
Self-repairing digital systems have recently emerged as the most promising
alternative for fault-tolerant systems. However, such systems are still impractical in many
cases, particularly due to the complex rerouting process that follows cell replacement.
They lose efficiency when the circuit size increases, due to the extra hardware in addition
to the functional circuit and the unutilization of normal operating hardware for fault
recovery. In this paper, we propose a system inspired by endocrine cellular
communication, which simplifies the rerouting process in two ways: 1) by lowering the
hardware overhead along with the increasing size of the circuit and 2) by reducing the
hardware unutilized for fault recovery while maintaining good fault-coverage. The
proposed system is composed of a structural layer and a gene-control layer. The structural
layer consists of novel modules and their interconnections. In each module of our system,
the encoded data, called the genome, contains information about the function and the
connection. Therefore, a faulty module can be replaced and the whole system's functions
and connections are maintained by simply assigning the same encoded data to a spare
HVL076
STBC-OFDM Downlink Baseband Receiver for Mobile WMAN
Abstract:
This paper proposes a space time block code-orthogonal frequency division
multiplexing downlink baseband receiver for mobile wireless metropolitan area network.
The proposed baseband receiver applied in the system with two transmit antennas and
one receive antenna aims to provide high performance in outdoor mobile environments. It
provides a simple and robust synchronizer and an accurate but hardware affordable
channel estimator to overcome the challenge of multipath fading channels. The coded bit
error rate performance for 16 quadrature amplitude modulation can achieve less than $10^{-6}$
under the vehicle speed of 120 km/h. The proposed baseband receiver designed in 90-nm
CMOS technology can support up to 27.32 Mb/s uncoded data transmission under 10
MHz channel bandwidth. It requires a core area of $2.41 \times 2.41$ mm$^{2}$ and dissipates 68.48
mW at 78.4 MHz with 1 V power supply.
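For readers unfamiliar with space-time block coding over two transmit antennas and one receive antenna, the following Python sketch shows the classic Alamouti scheme. That the receiver in this abstract uses exactly this code is an assumption; the abstract names only "space time block code". The sketch also assumes perfectly known, flat channel gains `h1` and `h2`, and all function names are hypothetical.

```python
def alamouti_encode(s1, s2):
    """Return the (antenna 1, antenna 2) symbol pairs sent in two consecutive slots."""
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]

def channel(slots, h1, h2, noise=(0j, 0j)):
    """Single receive antenna: each slot observes h1*a + h2*b plus optional noise."""
    return [h1 * a + h2 * b + n for (a, b), n in zip(slots, noise)]

def alamouti_combine(r1, r2, h1, h2):
    """Linear combining that separates s1 and s2 when h1 and h2 are known."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat
```

In the noise-free case the combiner returns the transmitted symbols exactly (up to rounding), which is why the quality of the channel estimator mentioned in the abstract matters so much in practice.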
HVL078
Test Patterns of Multiple SIC Vectors: Theory and Application in BIST
Schemes
Abstract:
This paper proposes a novel test pattern generator (TPG) for built-in self-test. Our
method generates multiple single-input change (MSIC) vectors in a pattern, i.e., each
vector applied to a scan chain is an SIC vector. A reconfigurable Johnson counter and a
scalable SIC counter are developed to generate a class of minimum transition sequences.
The proposed TPG is flexible to both the test-per-clock and the test-per-scan schemes. A
theory is also developed to represent and analyze the sequences and to extract a class of
MSIC sequences. Analysis results show that the produced MSIC sequences have the
favorable features of uniform distribution and low input transition density. The
performances of the designed TPGs and the circuits under test with 45 nm are evaluated.
Simulation results with ISCAS benchmarks demonstrate that MSIC can save test power
HVL079
The LUT-SR Family of Uniform Random Number Generators for
FPGA Architectures
Abstract:
Field-programmable gate array (FPGA) optimized random number generators
(RNGs) are more resource-efficient than software-optimized RNGs because they can take
advantage of bitwise operations and FPGA-specific features. However, it is difficult to
concisely describe FPGA-optimized RNGs, so they are not commonly used in real-world
designs. This paper describes a type of FPGA RNG called a LUT-SR RNG, which takes
advantage of bitwise xor operations and the ability to turn lookup tables (LUTs) into shift
registers of varying lengths. This provides a good resource-quality balance compared to
previous FPGA-optimized generators, between the previous high-resource high-period
LUT-FIFO RNGs and low-resource low-quality LUT-OPT RNGs, with quality
comparable to the best software generators. The LUT-SR generators can also be
expressed using a simple C++ algorithm contained within this paper, allowing 60 fully specified LUT-SR RNGs with different characteristics to be embedded in this paper,
backed up by an online set of very high speed integrated circuit hardware description
language (VHDL) generators and test benches.
HVL080
Theoretical Modeling of Elliptic Curve Scalar Multiplier on LUT-Based
FPGAs for Area and Speed
Abstract:
This paper uses a theoretical model to approximate the delay of different
characteristic two primitives used in an elliptic curve scalar multiplier architecture
(ECSMA) implemented on k input lookup table (LUT)-based field-programmable gate
HVL081
A Flexible and Customizable Architecture for the Relaxation Labeling
Algorithm
Abstract:
This brief presents a flexible and customizable architecture for the probabilistic
relaxation labeling (PRL) algorithm. The algorithm has been restructured by using a
hardware-friendly process that is executed on the proposed architecture. This enables the
design to handle different numbers of objects and labels flexibly. Moreover, in the
design, the proposed PRL unit can be easily duplicated K times according to the
available resources on the field-programmable gate array (FPGA). In this brief, K can be
scaled up to 10 by using a Virtex-6 FPGA XC6VLX240T platform. Compared with
existing architectures that are not suitable for a large number of objects, the proposed
architecture reduces the time complexity from $O(N \times M)$ to $O(N)$ with the same
$O(N \times M^{2})$ space complexity, where N and M are the numbers of objects and labels,
respectively. The experimental results show that the execution time of our design is about
15 times less for five objects and about 35 times less for a $128 \times 64$ image block than the
software implementation running on a Quad-core Intel 32-nm machine.
HVL083
An Adaptive Subsystem Based Algorithm for Channel Equalization in a
SIMO System
Abstract:
The principle of multiple input/output inversion theorem (MINT) has been
employed for multi-channel equalization. In this work, we propose to partition a single-input multiple-output system into two subsystems. The equivalence between the
deconvoluted signals of the two subsystems is termed as auto-relation and we
subsequently exploit this relation as an additional constraint to the existing adaptive
MINT algorithm. In addition, we provide analysis of the auto-relation constraint and
show that this constraint confines the solution of equalization filters within a multidimensional space. We also explain through the use of convergence analysis why our
proposed algorithm can achieve a higher rate of convergence compared to the existing
HVL084
Binary Discrete Cosine and Hartley Transforms
Abstract:
In this paper, a systematic method for developing a binary version of a given
transform by using the Walsh-Hadamard transform (WHT) is proposed. The resulting
transform approximates the underlying transform very well, while maintaining all the
advantages and properties of WHT. The method is successfully applied for developing a
binary discrete cosine transform (BDCT) and a binary discrete Hartley transform
(BDHT). It is shown that the resulting BDCT corresponds to the well-known sequency-ordered WHT, whereas the BDHT can be considered as a new Hartley-ordered WHT.
Specifically, the properties of the proposed Hartley-ordering are discussed and a shift-copy scheme is proposed for a simple and direct generation of the Hartley-ordering
functions. For software and hardware implementation purposes, a unified structure for the
computation of the WHT, BDCT, and BDHT is proposed by establishing an elegant
relationship between the three transform matrices. In addition, a spiral-ordering is
proposed to graphically obtain the BDHT from the BDCT and vice versa. The application
of these binary transforms in image compression, encryption and spectral analysis clearly
shows the ability of the BDCT (BDHT) in approximating the DCT (DHT) very well.
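Since the abstract identifies the BDCT with the sequency-ordered WHT, a small Python sketch of that ordering may help. It builds the natural-ordered (Sylvester) Hadamard matrix and sorts its rows by their number of sign changes; this is a textbook construction, not the unified structure or the Hartley ordering proposed in the paper, and the function names are hypothetical.

```python
def hadamard(n):
    """Return the n x n natural-ordered (Sylvester) Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Sylvester doubling: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def sign_changes(row):
    """Sequency of a row: the number of sign changes between neighbouring entries."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)

def sequency_ordered_wht(n):
    """Rows of the Hadamard matrix sorted by increasing sequency (Walsh ordering)."""
    return sorted(hadamard(n), key=sign_changes)

def transform(matrix, x):
    """Apply the (unnormalized) transform matrix to the vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]
```

Sorting by sign changes plays the same role for the WHT that sorting by frequency plays for the DCT, which is the intuition behind using it as a binary DCT approximation.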
HVL085
Computing Two-Pattern Test Cubes for Transition Path Delay Faults
Abstract:
Considering full-scan circuits, incompletely-specified tests, or test cubes, are used
for test data compression. When considering path delay faults, certain specified input
values in a test cube are needed only for determining the lengths of the paths associated
HVL086
Design of Hardware Function Evaluators Using Low-Overhead
Nonuniform Segmentation With Address Remapping
Abstract:
In the piecewise function evaluation with polynomial approximation, nonuniform
segmentation can effectively reduce the size of lookup tables for some arithmetic
functions compared to uniform segmentation approaches, at the cost of the extra segment
address (index) encoder that results in area and delay overhead. Also, it is observed that
the nonuniform segmentation reflects a design tradeoff between the ROM size and the
area cost of the subsequent arithmetic computation hardware. In this paper, we propose a
new nonuniform segmentation method that searches for the optimal segmentation scheme
with the goal of minimized ROM, total area, or delay. For some high-variation arithmetic
functions, the proposed segmentation method achieves significant area reduction
compared to the uniform segmentation method. We also demonstrate the design tradeoff
among uniform and nonuniform segmentation, and degree-one and degree-two
polynomial approximations, with respect to precision ranging from 12 to 32 bits for the
elementary function of reciprocal.
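The benefit of nonuniform over uniform segmentation can be illustrated with a short Python sketch for a degree-one approximation of the reciprocal. The recursive bisection used here is a simple stand-in chosen for illustration, not the search method of the paper; the chord (endpoint-interpolating) fit, the sampled error estimate, and all function names are likewise assumptions.

```python
def max_error_linear(f, lo, hi, samples=64):
    """Sampled maximum absolute error of the chord through (lo, f(lo)) and (hi, f(hi))."""
    slope = (f(hi) - f(lo)) / (hi - lo)
    worst = 0.0
    for i in range(samples + 1):
        x = lo + (hi - lo) * i / samples
        worst = max(worst, abs(f(lo) + slope * (x - lo) - f(x)))
    return worst

def nonuniform_segments(f, lo, hi, tol):
    """Bisect [lo, hi] recursively until every segment's linear fit meets tol."""
    if max_error_linear(f, lo, hi) <= tol:
        return [(lo, hi)]
    mid = (lo + hi) / 2
    return nonuniform_segments(f, lo, mid, tol) + nonuniform_segments(f, mid, hi, tol)

def uniform_segment_count(f, lo, hi, tol):
    """Smallest power-of-two number of equal-width segments that meets tol everywhere."""
    n = 1
    while True:
        width = (hi - lo) / n
        if all(
            max_error_linear(f, lo + i * width, lo + (i + 1) * width) <= tol
            for i in range(n)
        ):
            return n
        n *= 2
```

For the reciprocal on [1, 2] the curvature is largest near 1, so the bisection keeps narrow segments there and wider ones near 2. Fewer segments mean a smaller coefficient table, at the cost of the extra address encoding that the abstract mentions.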
HVL088
FPGA-Based 40.9-Gbits/s Masked AES With Area Optimization for
Storage Area Network
Abstract:
In order to protect data-at-rest in storage area networks from the risk of
differential power analysis attacks without degrading performance, a high-throughput
masked advanced encryption standard (AES) engine is proposed. However, this engine
usually adopts the unrolling technique which requires extremely large field
programmable gate array (FPGA) resources. In this brief, we aim to optimize the area for
a masked AES with an unrolled structure. We achieve this by mapping its operations
from $GF(2^{8})$ to $GF(2^{4})$ as much as possible. We reduce the number of mapping [$GF(2^{8})$ to $GF(2^{4})$] and inverse
mapping [$GF(2^{4})$ to $GF(2^{8})$] operations of the masked SubBytes step from ten to one. In order to be
compatible, the masked MixColumns, masked AddRoundKey, and masked ShiftRows
HVL089
Low-Cost FIR Filter Designs Based on Faithfully Rounded Truncated
Multiple Constant Multiplication/Accumulation
Abstract:
Low-cost finite impulse response (FIR) designs are presented using the concept of
faithfully rounded truncated multipliers. We jointly consider the optimization of bit width
and hardware resources without sacrificing the frequency response and output signal
precision. Nonuniform coefficient quantization with proper filter order is proposed to
minimize total area cost. Multiple constant multiplication/accumulation in a direct FIR
structure is implemented using an improved version of truncated multipliers.
Comparisons with previous FIR design approaches show that the proposed designs
achieve the best area and power results.
HVL090
Low-Resolution DAC-Driven Linearity Testing of Higher Resolution
ADCs Using Polynomial-Fitting Measurements
Abstract:
A low-cost linearity test methodology for high-resolution analog-to-digital
converters (ADCs) is presented in this paper. Linearity testing of ADCs requires high-precision digital-to-analog conversion (DAC) capability, commonly 3-bit higher
HVL091
One Analog STBC-DCSK Transmission Scheme not Requiring Channel
State Information
Abstract:
Both the inherently wideband differential-chaos-shift-keying (DCSK) modulation
and the space-time block code (STBC) are techniques that can mitigate the effect of
multipath fading. By applying STBC at the chaotic segment level, a novel analog
STBC-DCSK scheme is proposed in this paper. The proposed scheme is a simple configuration
that combines the advantages of STBC and chaotic modulation. Due to the very low
correlation between different analog chaotic signals, the proposed scheme can
remarkably suppress the inter-transmit-antenna interference so as to recover the desired
information and to achieve the full diversity gain. The theoretical bit-error-rate (BER)
performance and the highly consistent simulation results demonstrate that the
STBC-DCSK scheme outperforms the conventional single-input-single-output (SISO)-DCSK
scheme by about 5 dB at a BER of $10^{-4}$. The performance superiority of the
proposed scheme is further demonstrated in a typical UWB channel by simulations. More
importantly, the proposed scheme maintains the same low transceiver cost as the SISO
HVL092
Reconfigurable Accelerator for the Word-Matching Stage of BLASTN
Abstract:
BLAST is one of the most popular sequence analysis tools used by molecular
biologists. It is designed to efficiently find similar regions between two sequences that
have biological significance. However, because the size of genomic databases is growing
rapidly, the computation time of BLAST, when performing a complete genomic database
search, is continuously increasing. Thus, there is a clear need to accelerate this process. In
this paper, we present a new approach for genomic sequence database scanning utilizing
reconfigurable field programmable gate array (FPGA)-based hardware. In order to derive
an efficient structure for BLASTN, we propose a reconfigurable architecture to accelerate
the computation of the word-matching stage. The experimental results show that the
FPGA implementation achieves a speedup of about one order of magnitude compared to
the NCBI BLASTN software running on a general purpose computer.
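
For readers unfamiliar with the word-matching stage, the following Python sketch shows the idea in software: every fixed-length word of the query is indexed, and the subject (database) sequence is scanned for exact matches of those words. It is an illustrative sketch only. The function names and the word length of 4 are hypothetical choices for a short example, and the code does not represent the reconfigurable architecture proposed in the paper.

```python
from collections import defaultdict


def build_word_index(query, word_len):
    """Map every word (substring of length word_len) of the query to its positions."""
    index = defaultdict(list)
    for pos in range(len(query) - word_len + 1):
        index[query[pos:pos + word_len]].append(pos)
    return index


def find_word_hits(query, subject, word_len):
    """Return (query_pos, subject_pos) pairs where an exact word match occurs."""
    index = build_word_index(query, word_len)
    hits = []
    for s_pos in range(len(subject) - word_len + 1):
        word = subject[s_pos:s_pos + word_len]
        for q_pos in index.get(word, ()):
            hits.append((q_pos, s_pos))
    return hits


hits = find_word_hits("ACGTACGT", "TTACGTAA", 4)
print(hits)
```

Each hit would then be passed to the later extension stages of BLASTN. The scan over the subject sequence is the part that grows with database size.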
HVL093
Reduced-Complexity LCC Reed-Solomon Decoder Based on Unified
Syndrome Computation
Abstract:
Reed-Solomon (RS) codes are widely used in digital communication and storage
systems. Algebraic soft-decision decoding (ASD) of RS codes can obtain significant
coding gain over the hard-decision decoding (HDD). Compared with other ASD
algorithms, the low-complexity Chase (LCC) decoding algorithm has lower computational
complexity with similar or higher coding gain. Besides employing a complicated
interpolation algorithm, the LCC decoding can also be implemented based on the HDD.
However, the previous syndrome computation for $2^{\eta}$ test vectors and the key equation
HVL094
Scale-Free Hyperbolic CORDIC Processor and Its Application to
Waveform Generation
Abstract:
This paper presents a novel completely scaling-free CORDIC algorithm in
rotation mode for hyperbolic trajectories. We use a most-significant-1 bit detection
technique for micro-rotation sequence generation to reduce the number of iterations. By
storing the sinh/cosh hyperbolic values at octant boundaries in a ROM, we can extend the
range of convergence to the entire coordinate space. Based on this, we propose a pipelined
hyperbolic CORDIC processor to implement a direct digital synthesizer (DDS). The DDS
is further used to derive an efficient arbitrary waveform generator (AWG), where a
pseudo-random number generator modulates the linear increments of phase to produce a
random phase-modulated waveform. The proposed waveform generator requires only one
DDS to generate a variety of modulated waveforms, while existing designs require
separate DDS units for different types of waveforms, and multiple DDS units
to generate composite waveforms. Therefore, the area complexity of existing designs is
multiplied by the number of different waveform types they generate, while that of the
proposed design remains unchanged. The proposed AWG, when mapped on a Xilinx
Spartan 2E device, consumes 1076 slices and 2016 4-input LUTs. The proposed AWG
involves significantly less area and lower latency, with nearly the same throughput
compared to the existing CORDIC-based designs.
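
As background for this abstract, the sketch below implements the conventional hyperbolic CORDIC in rotation mode, which is the baseline that a scaling-free design improves on. It is not the scaling-free algorithm of the paper: it uses the textbook scale-factor compensation and the repeated iterations (4, 13, 40, ...) needed for convergence, and it only converges for arguments of magnitude up to roughly 1.1. Floating-point arithmetic stands in for the shift-and-add hardware, and all names are hypothetical.

```python
import math


def hyperbolic_iteration_indices(n):
    """Shift indices 1..n, with 4, 13, 40, ... repeated to guarantee convergence."""
    indices, repeat, i = [], 4, 1
    while i <= n:
        indices.append(i)
        if i == repeat:
            indices.append(i)          # repeat this iteration once
            repeat = 3 * repeat + 1
        i += 1
    return indices


def cordic_sinh_cosh(theta, n=16):
    """Return (sinh(theta), cosh(theta)) using conventional hyperbolic CORDIC."""
    idx = hyperbolic_iteration_indices(n)
    # Each micro-rotation scales the vector by sqrt(1 - 2^(-2i)).
    gain = 1.0
    for i in idx:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * i))
    # Pre-dividing by the gain compensates the scale factor up front.
    x, y, z = 1.0 / gain, 0.0, theta
    for i in idx:
        d = 1.0 if z >= 0 else -1.0    # rotate so the residual angle z goes to 0
        t = 2.0 ** -i                  # a right shift by i bits in hardware
        x, y = x + d * y * t, y + d * x * t
        z -= d * math.atanh(t)
    return y, x


s, c = cordic_sinh_cosh(0.5)
print(s, c)
```

The scale-factor compensation and the limited convergence range visible here are the two costs that the abstract's scaling-free algorithm and ROM-based range extension are meant to remove.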
HVL096
Two-Rate Based Low-Complexity Variable Fractional-Delay FIR Filter
Structures
Abstract:
This paper considers two-rate based structures for variable fractional-delay (VFD)
finite-length impulse response (FIR) filters. They are single-rate structures but derived
through a two-rate approach. The basic structure considered hitherto utilizes a regular
half-band (HB) linear-phase filter and the Farrow structure with linear-phase subfilters.
Especially for wide-band specifications, this structure is computationally efficient
because most of the overall arithmetic complexity is due to the HB filter which is
common to all Farrow-structure subfilters. This paper extends and generalizes existing
results. Firstly, frequency-response masking (FRM) HB filters are utilized, which offer
further complexity reductions. Secondly, both linear-phase and low-delay subfilters are
treated and combined, which offers trade-offs between the complexity, delay, and
magnitude response overshoot that is typical for low-delay filters. Thirdly, the HB
filter is replaced by a general filter, which enables additional frequency-response
constraints in the upper frequency band, which normally is treated as a don't-care band.
HVL097
VLSI Architectures for the 4-Tap and 6-Tap 2-D Daubechies Wavelet
Filters Using Algebraic Integers
Abstract:
This paper proposes a novel algebraic integer (AI) based multi-encoding of
Daubechies-4 and -6 2-D wavelet filters having error-free integer-based computation.
Digital VLSI architectures employing parallel channels are proposed, physically realized
and tested. The multi-encoded AI framework allows a multiplication-free and
computationally accurate architecture. It also guarantees a noise-free computation
throughout the multi-level multi-rate 2-D filtering operation. A single final reconstruction
step (FRS) furnishes filtered and down-sampled image outputs in fixed-point, resulting in
low levels of quantization noise. Comparisons are provided between Daubechies-4 and -6
designs in terms of SNR, PSNR, hardware structure, and power consumption, for
different word lengths. SNR and PSNR improvements of approximately 30% were
observed in favour of AI-based systems, when compared to 8-bit fixed-point schemes
(six fractional bits). Further, FRS designs based on canonical signed digit representation
and on expansion factors are proposed. The Daubechies-4 and -6 4-level VLSI
architectures are prototyped on a Xilinx Virtex-6 vcx240t-1ff1156 FPGA device at 282
MHz and 146 MHz, respectively, with dynamic power consumption of 164 mW and 339
mW, respectively, and verified on FPGA chip using an ML605 platform.
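
The idea of algebraic-integer encoding can be sketched in one dimension for the Daubechies-4 low-pass filter, whose coefficients are $(1+\sqrt{3})/(4\sqrt{2})$, $(3+\sqrt{3})/(4\sqrt{2})$, $(3-\sqrt{3})/(4\sqrt{2})$ and $(1-\sqrt{3})/(4\sqrt{2})$. Each coefficient is stored as an integer pair $(a, b)$ meaning $a + b\sqrt{3}$, so filtering integer samples is exact, and the irrational factors are applied only once in a final reconstruction step. This is a simplified sketch: the function names are hypothetical, and the paper's 2-D multi-encoding and its FRS designs may differ.

```python
import math

# Daubechies-4 low-pass coefficients as (a, b) pairs meaning a + b*sqrt(3).
# The common factor 1 / (4*sqrt(2)) is deferred to the final reconstruction.
D4_LOWPASS_AI = [(1, 1), (3, 1), (3, -1), (1, -1)]


def d4_lowpass_ai(samples):
    """Exact integer filtering of four samples; returns the pair (a, b)."""
    a = sum(coef[0] * x for coef, x in zip(D4_LOWPASS_AI, samples))
    b = sum(coef[1] * x for coef, x in zip(D4_LOWPASS_AI, samples))
    return a, b


def final_reconstruction(a, b):
    """Map the exact pair (a, b) to a real value; the only inexact step."""
    return (a + b * math.sqrt(3)) / (4 * math.sqrt(2))


a, b = d4_lowpass_ai([1, 2, 3, 4])
print(a, b, final_reconstruction(a, b))
```

The integer weights here are only 1 and 3, so in hardware the products reduce to shifts and additions, which is consistent with the multiplication-free property claimed in the abstract.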
HVL099
VLSI Implementation of a Multi-Mode Turbo/LDPC Decoder
Architecture
Abstract:
Low-density parity-check (LDPC) codes and convolutional Turbo codes are two
of the most powerful error correcting codes that are widely used in modern
communication systems. In a multi-mode baseband receiver, both LDPC and Turbo
decoders may be required. However, the different decoding approaches for LDPC and
HVL0100
An Efficient Interpolation-Based Chase BCH Decoder
Abstract:
BCH codes are adopted in many systems, such as flash memory, optical
communications, and digital video broadcasting. By trying $2^{\eta}$ test vectors, the
soft-decision Chase decoding algorithm of BCH codes can achieve significant coding gain
over hard-decision decoding. Previous one-pass Chase schemes find the error locators
based on Berlekamp's algorithm and need hardware-demanding selection methods to
decide which locator corresponds to the correct code word. In this brief, a novel
interpolation-based one-pass Chase decoder is proposed for BCH codes. By making use
of the binary property of BCH codes, an innovative yet low-complexity method is
developed to select the interpolation output leading to successful decoding without
HVL0101
An Efficient Multi-Standard LDPC Decoder Design Using Hardware-Friendly Shuffled Decoding
Abstract:
This paper presents an efficient multi-standard low-density parity-check (LDPC)
decoder architecture using a shuffled decoding algorithm, where variable nodes are
divided into several groups. In order to provide sufficient memory bandwidth without the
need for using registers, a FIFO-based check-mode memory, which dominates the
decoder area, is used. Since two compensation factors, rather than a single factor, are
dynamically used in the offset Min-Sum algorithm, the number of quantization bits, and,
hence, the memory size, can be reduced without degradation in error performance. In
order to further reduce the memory size, artificial minimum values, which do not need to
be stored in memory, are used. We also propose an algorithm that can be used to partition
variable nodes such that the hardware cost can be minimized. Using the proposed
techniques, a multi-standard decoder that supports the LDPC codes specified in the ITU
G.hn, IEEE 802.11n, and IEEE 802.16e standards was designed and implemented using a
90-nm CMOS process. This decoder supports 133 codes, occupies an area of 5.529 mm$^2$,
and achieves an information throughput of 1.956 Gbps.
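
The offset Min-Sum check-node update referred to in this abstract can be sketched as follows. For each edge, the outgoing magnitude is the minimum of the other incoming magnitudes reduced by an offset and clipped at zero, and the outgoing sign is the product of the other incoming signs. This sketch assumes a single fixed offset, whereas the paper selects dynamically between two compensation factors. The function name and the example values are hypothetical.

```python
def offset_min_sum_check_node(messages, offset):
    """Offset Min-Sum check-node update for one check node (needs >= 2 inputs)."""
    magnitudes = [abs(m) for m in messages]
    order = sorted(range(len(messages)), key=magnitudes.__getitem__)
    idx_min1 = order[0]
    min1, min2 = magnitudes[order[0]], magnitudes[order[1]]

    # Overall sign of all incoming messages (+1 or -1).
    total_sign = 1
    for m in messages:
        if m < 0:
            total_sign = -total_sign

    outputs = []
    for i, m in enumerate(messages):
        # Minimum over the *other* edges: min2 for the smallest input, else min1.
        mag = min2 if i == idx_min1 else min1
        mag = max(mag - offset, 0.0)               # offset correction, clipped at 0
        own_sign = -1 if m < 0 else 1
        outputs.append(total_sign * own_sign * mag)  # sign of the other edges
    return outputs


out = offset_min_sum_check_node([3, -1, 4, -2], 0.5)
print(out)
```

Only the two smallest magnitudes, the index of the smallest, and the signs need to be kept per check node, which is why the check-node memory size depends so directly on the number of quantization bits.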