DCIS 2004
XIX Conference on Design of Circuits and Integrated Systems
Bordeaux, France, November 24-26, 2004
Organized by
Laboratoire IXL - ENSEIRB - CNRS UMR5818 - UNIVERSITÉ BORDEAUX 1
Sponsored by
La Région Aquitaine, La Communauté Urbaine de Bordeaux, La Mairie de Bordeaux, La Maison du Tourisme de la Gironde
And
IEEE CAS, ANSOFT, CIS, ST, Le Club EEA, Mentor Graphics, NASA, Coherent
Foreword
On behalf of the DCIS Organizing and Program Committees, we would like to welcome you to the 19th Conference on Design of Circuits and Integrated Systems (DCIS), held in Bordeaux, France, November 24th-26th. The Conference aims to gather experts in the field of Microelectronics and to provide a forum for exchanging ideas and information on industrial and research results. The 2004 edition of DCIS confirms its international impact, with contributions from 16 countries. Experts in a wide range of areas have decided to participate, leading to a very high number of submitted papers and a reviewing effort shared among 178 reviewers. The quality of the submissions made the selection very difficult. Following the reviewers' recommendations, the program committee accepted 182 papers for oral presentation at the Conference. This year, the three-day technical program is organized in four parallel tracks and 36 sessions. Two plenary sessions take place, with distinguished speakers from STMicroelectronics and NASA presenting the new challenges offered to our community by Software Defined Radio and Space Electronics. A panel session is also scheduled in order to evaluate and discuss together the experience of European countries with the new higher education system introduced by the Bologna process. And for the first time this year, a special award will be conferred to honour the best paper from the previous edition of DCIS, in 2003. DCIS 2004 results from the work of many dedicated volunteers: the authors of the papers, the reviewers, the session organisers, the moderators, the invited speakers, and the sponsors. We would like to thank the ENSEIRB Graduate Engineering School for providing a modern environment matching the scientific level of this event.
We would also like to use this opportunity to express our gratitude to the members of the IXL Organizing Committee for all the time and effort they have freely given for the great pleasure of our scientific community. We wish you a productive and enjoyable stay in the pleasant area of Bordeaux.
Pascal Fouillat, General Chairman
Maria Luisa López Vallejo & Jean Tomas, Program Co-Chairs
Table of Contents
Plenary Sessions
Software Defined Radio: Theory and Applications, Ernesto Perea (STMicroelectronics)
Space Electronics: a Challenging World for Designers, Christian Poivey & Kenneth LaBel (Goddard Space Flight Center, NASA)
Panel Discussion
The Bologna Process: Return on Experiment, Moderator: Prof. Yves Danto (U Bordeaux 1)
Exhibits
Microwind: An introduction to nanoscale CMOS cell design. Prof. Etienne SICARD, Sonia BENDHIA (INSA Toulouse)
ICEmit: Comparing simulated/measured Parasitic Emission of Integrated Circuits. Prof. Etienne SICARD (INSA Toulouse), Amaury SOUBEYRAN (EADS-CCR)
"SUSANA: a MOS Mixed-Circuit Simulator Using Logic/E-Logic Algorithms Implemented in Python", Tiago Carrisosa, Tiago Félix, Miguel Jerónimo (INESC-ID/IST), José Soares Augusto (INESC-ID/FCUL-Dep. Física)
"A Distributed Enhanced Genetic Algorithm Kernel Applied to a Circuit/Level Optimization E-Design Environment", Manuel Barros (Instituto Politecnico de Tomar), Goncalo Neves (Instituto Superior Tecnico - IST/IT), Jorge Guilherme (Instituto Politecnico de Tomar), Nuno Horta (Instituto Superior Tecnico - IST/IT)
"A CAD Tool for the Design of RTD Programmable Gates based on MOBILE", Hector Pettenghi, Maria Jose Avedillo, Jose Maria Quintana (Instituto de Microelectrónica de Sevilla, Centro Nacional de Microelectrónica)
"Simulation-based High-level Synthesis of Pipeline Analog-to-Digital Converters", Jesús Ruiz-Amaya, José M. de la Rosa, Manuel Delgado-Restituto (Instituto de Microelectrónica de Sevilla, IMSE-CNM (CSIC))
"Digital Background Technique for Gain Error Correction in Pipeline ADCs", Antonio José Ginés Arteaga, Eduardo José Peralías Macías, Adoración Rueda Rueda (Instituto de Microelectrónica de Sevilla, Centro Nacional de Microelectrónica)
"Mismatch Properties of MOS and Resistors Calibrated Ladder Structures", Rafael Serrano-Gotarredona, Teresa Serrano-Gotarredona, Bernabé Linares-Barranco (IMSE-CNM-CSIC)
"Signal Processing Unit for River Tugboat Telemetry System", Humberto Campanella (Instituto de Microelectrónica de Barcelona IMB-CNM / U. del Norte), Mauricio Pardo, Víctor Manotas, Javier Páez (Universidad del Norte (Barranquilla-Colombia)), Juan Carlos Niebles, David Angulo (Flota Fluvial Carbonera Ltda)
"A Sensorless Electronically Controlled Horn for Automobiles", M. Cesar Rodriguez, Cesar Sanz (Universidad Politecnica de Madrid), Jacinto M. Acero, Fernando Nozal (Robert Bosch España S.A.)
"Design of Low-Power CMOS Read-Out ICs for Large Arrays Cryogenic Infra-Red Sensors", Bertrand Misischi, Francisco Serra-Graells (Centro Nacional de Microelectrónica - CSIC), Eduardo Casanueva, César Méndez (Indra Sistemas S.A.), Lluís Terés (Centro Nacional de Microelectrónica - CSIC)
"A Dynamic Current Mode Logic to Counteract Power Analysis Attacks", François Macé, François-Xavier Standaert, Illham Hassoune, Jean-Didier Legat, Jean-Jacques Quisquater (Laboratoire de Microélectronique, UCL, Belgium)
"An FPGA Landmine Detection System based on Infrared Images", Fernando Pardo (Universidad de Santiago de Compostela. Santiago. Spain), Marco Balsi (Università La Sapienza. Roma. Italy), Paula López (Fraunhofer Institut für Integrierte Schaltungen. Erlangen. Germany), Diego Cabello (Universidad de Santiago de Compostela. Santiago. Spain)
"Implementation of Optimized FFT on Stratix DSP Development Board", Nouvel Fabienne (IETR)
"Comparison of Two Implementations of Scalable Montgomery Coprocessor Embedded in Reconfigurable Hardware", Milos Drutarovsky (Technical University of Kosice, Slovak Republic), Viktor Fischer (Universite Jean Monnet, Saint-Etienne, France), Martin Simka (Technical University of Kosice, Slovak Republic)
"An implementation of a Parallel Architecture for the Self-Sorting FFT Algorithm applied to IEEE 802.11a", Ainhoa Cortes, Igone Velez, Juan Francisco Sevillano (CEIT), Andoni Irizar (Universidad de Navarra), Pilar Calvo (CEIT)
"Optimized FPGA implementation of Trigonometric Functions with Large Input Argument", Javier Hormigo, Manuel Sanchez, Mario A. Gonzalez, Gerardo Bandera, Julio Villalba (Dept. Computer Architecture. University of Malaga)
"A Mixed Neuromorphic ASIC for Computational Neurosciences", Sylvain Saighi, Jean Tomas, Yannick Bornat, Sylvie Renaud (Laboratoire IXL)
"Mixed-Mode Class AB Neuron Building Blocks: Analysis and Real Application", Guillermo Zatorre-Navarro, Nicolas Medrano-Marques, Santiago Celma-Pueyo (Universidad de Zaragoza)
"A Discrete-Time Cellular Neural Network Architecture for a Pixel-Level Snake On-Chip Implementation", V.M. Brea, D.L. Vilarino, D. Cabello (Dept. of Electronics and Computer Science, University of Santiago de Compostela)
"Charge-Packet Driven Mismatch-Calibrated Integrate-and-Fire Neuron for Address-Event-Representation", Rafael Serrano Gotarredona, Bernabe Linares-Barranco, Teresa Serrano Gotarredona (Instituto de Microelectrónica de Sevilla)
"Digital Implementation of a Simplicial Cellular Neural Network", Pablo Echevarria, Victoria Martinez, Jose M. Tarela, Ines del Campo (Universidad del Pais Vasco)
"Adviser Coprocessor for Image Compression on FPGA", Antonio Guzman, Marta Beltran (Rey Juan Carlos University)
"Power-Aware Tuning of Dynamic Memory Management for Embedded Real-Time Multimedia Applications", David Atienza (DACYA/Complutense University of Madrid & IMEC vzw), Stylianos Mamagkakis (VLSI Center-Demokritus University), Miguel Peon, Jose Manuel Mendias (DACYA/Complutense University of Madrid), Francky Catthoor (IMEC vzw), Dimitrios Soudris (VLSI Center-Demokritus University)
"An IIR Based 2D Adaptive and Predictive Cache for Image Processing", Stéphane Mancini, Nicolas Eveno (LIS - Laboratoire des Images et des Signaux)
"Real Time Smart Pixels Processing Array for Mobile Multimedia Applications", Sebastián López, Rafael Calzada, Ayoze Tejera, Jose Fco. López, Roberto Sarmiento (Research Institute for Applied Microelectronics (IUMA))
"Adaptation of Altera Stratix DSP Board for Real-time Stereoscopic Image Processing", Pavol Pavelka, Vincent Betheas, Viktor Fischer, Virginie Fresse (Laboratoire Traitement du signal et Instrumentation-Universite Jean Monnet)
"1.5V Square-Root Domain Magnitude Locked Loop", Carlos A. De La Cruz-Blas, Antonio Lopez-Martin, Alfonso Carlosena (Public University of Navarra)
"High-Speed High-Precision Analog Rank Order Filter with O(n) complexity in CMOS Technology", Ramon Carvajal (Dpto. de Ingenieria Electronica, Escuela Superior de Ingenieros, Universidad de Sevilla (Spain)), Jaime Ramirez-Angulo, Gladys Omayra Ducoudray (Klipsch School of Electrical and Computer Engineering, New Mexico State University), Antonio Lopez-Martin (Dept. of Electrical and Electronic Engineering, Public University of Navarra, Pamplona (Spain))
"A Seventh Order Elliptic CMOS Continuous Time Gm-C Filter for PLC applications", Juan Francisco Fernández-Bootello, Manuel Delgado-Restituto, Ángel Rodríguez-Vázquez (Instituto de Microelectrónica de Sevilla, Centro Nacional de Microelectrónica)
"Tunable Gm-C Biquadratic Filter Operating in Moderate Inversion", Jaime Ramirez-Angulo (New Mexico State University), Chandrika Durbha (Biomorphic VLSI, Inc.), Antonio J. Lopez-Martin (Public University of Navarra), Ramon G. Carvajal (University of Sevilla)
"Fully-Differential CMOS Current Conveyor Operating in Moderate Inversion", Antonio J. Lopez-Martin (Public University of Navarra), Jaime Ramirez-Angulo, Chandrika Durbha (New Mexico State University), Ramon G. Carvajal (University of Sevilla)
"A 2.45GHz Low Phase-Noise CMOS", Vincent Cheynet de Beaupré, Lakhdar Zaid, Wenceslas Rahajandraibe (L2MP - Polytech), Gilles Bas (STMicroelectronics)
"Testing of RF Systems by Zoning the Constellation Diagram", Daniel Arumí Delgado, Rosa Rodríguez Montañés, Joan Figueras (Universitat Politècnica de Catalunya)
"On the Minimum Number of Measurements for Single Fault Diagnosis in Linear Circuits", Jose Soares Augusto (INESC-ID/FCUL - Physics Dept.)
"DPA on Quasi Delay Insensitive Asynchronous Circuits: Concrete Results", Fraidy Bouesse, Marc Renaudin (TIMA Laboratory), Bruno Robisson, Edith Beigne (CEA-Grenoble), Pierre-Yvan Liardet, Solenn Prevosto (STMicroelectronics, ZI Rousset)
"Four Phase Alternating Latches Clocking Scheme for CMOS Sequential Circuits", David Guerrero, Manuel Jesús Bellido, Jorge Juan Chico, Alejandro Millán, Paulino Ruiz de Clavijo, Enrique Ostúa (Instituto de Microelectrónica de Sevilla-Centro Nacional de Microelectrónica/Departamento de Tecnología Electrónica-Universidad de Sevilla)
"A Memoryless Clock Domain Adaptation Unit IP", Roberto Esper-Chaín, Félix Tobajas, Francisco González, Rubén Arteaga, Roberto Sarmiento (Instituto Universitario de Microelectrónica Aplicada)
"Synchronization of Sequential Circuits using the Asynchronous Wave Pipelining Technique", Stephan Hermanns, Sorin Alexander Huss (Integrated Circuits and Systems, Darmstadt University of Technology)
"New Low Voltage Class-AB CMOS Unity Gain Buffer and Current Mirror", Antonio Torralba, Ramón G. Carvajal, Mariano Jiménez, Fernando Muñoz (Universidad de Sevilla (SPAIN)), Jaime Ramírez-Angulo (New Mexico State University, USA)
"New Low-voltage Class AB/AB CMOS Op-Amp with Rail-to-Rail Input/Output Swing", Milind-Subhash Sawant, Shanta Thoutam, Jaime Ramirez-Angulo (New Mexico State University), Antonio Lopez-Martin (Universidad Publica de Navarra), Ramon G. Carvajal (Escuela Superior de Ingenieros Universidad de Sevilla)
"A New Family of Low-Voltage Power-Efficient Class AB CMOS OTAs", Sushmita Baswa (New Mexico State University), Antonio J. Lopez-Martin (Public University of Navarra), Jaime Ramirez-Angulo (New Mexico State University), Ramon G. Carvajal (University of Sevilla)
"Spectral Characterization of the Digital Noise", Miguel Angel Méndez Villegas, José Luis González Jiménez, Diego Mateo Peña, José Antonio Rubio Solé (Universitat Politècnica de Catalunya)
"On the Relation between Digital Circuitry Characteristics and Power Supply Noise Spectrum in Mixed-Signal CMOS IC", Miguel Ángel Méndez, José Luis González, Enrique Barajas, Diego Mateo, Antonio Rubio (Electronic Engineering Department, Universitat Politècnica de Catalunya)
Conference Committees
Steering Committee
Daniel Auvergne (LIRMM, F)
Salvador Bracho del Pino (U. de Cantabria, E)
Rafael Burriel Lluna (CeDInt, U. Politécnica Madrid, E)
Fulvio Corno (Politecnico Torino, I)
Joan Figueras Pàmies (U. Politècnica Catalunya, E)
José Epifanio da Franca (Inst. Superior Técnico, P)
Leopoldo García Franquelo (U. de Sevilla, E)
Eugenio García Moreno (U. Illes Balears, E)
Miguel A. Hernández y Coll (Siemens A.G. Munich, D)
José Luis Huertas Díaz (CNM Sevilla, E)
Juan Carlos López López (U. Castilla-La Mancha, E)
Antonio Núñez Ordóñez (U. Las Palmas G. Canaria, E)
Emilio Olías Ruiz (U. Carlos III, E)
Michel Renovell (LIRMM, F)
Armando Roy Yarza (U. de Zaragoza, E)
Antonio Rubio Solé (U. Politècnica de Catalunya, E)
José A.R. Silva Matos (U. Porto, P)
Antonio J. Torralba Silgado (U. de Sevilla, E)
Javier Uceda Antolín (U. Politécnica Madrid, E)
General Chair
Pascal Fouillat (ENSEIRB)
Programme Co-Chairs
Maria Luisa López Vallejo (U. Politécnica de Madrid)
Jean Tomas (U. Bordeaux 1)
Local Organizing Committee
Valérie Cauhapé, Stéphane Azzopardi, Dominique Dallet, Yann Deval, Geneviève Duchamp, Régis Devreese, Isabelle Dufour, Eric Kerhervé, Hervé Lapuyade, Nathalie Malbert, Nicolas Moll, Sylvie Renaud, Angélique Ttelin, Sylvain Saïghi, Patrick Villesuzanne, Jean-Michel Vinassa
Local Secretariat
Valérie Cauhapé, Laboratoire IXL, Université Bordeaux 1, 351 Cours de la Libération, 33405 Talence Cedex, FRANCE
Tel: +33 (0) 540 002 807, Fax: +33 (0) 556 371 545, dcis2004@ixl.fr
Registration and Hotel Accommodation
Dominique Aurieres, SUD CONGRES CONSEIL - DCIS'04, 166 cours du Maréchal Gallieni, 33400 Talence, FRANCE
Fax: +33 (0) 556 249 948, dominique.aurieres@wanadoo.fr
Reviewers
Abouchi,N. Acosta Jiménez,A.J. Aguiar,R.L. Aguirre Echanove,M.A. Alarcón,E. Alcubilla,R. Alexandre,A. Alexandres Fernández,S. Alves,J.C. Amendola,G. Aragonès,X. Arapoyanni,A. Aubepart,F. Aubry,J.F. Augusto,J. Auvergne,D. Ayala,J.L. Azcondo,F.J. Badets,F. Ballester Merelo,F.J. Barthelemy,H. Bausells,J. Begueret,J.B. Belhaire,E. Bellido Diaz,M.J. Bota,S. Bourdel,S. Bracho,S. Burriel,R. Campo,E. Canas Ferreira,J. Capraro,S. Carmona,R. Carrera Usiabaga,A. Celma,S. Charlot,B. Chatelon,J.P. Crand,S. Dallet,D. Dejous,C. Del Rio Fernandez,R. Deltimple,N. Deval,Y. Dilillo,L. Dualibe,C. Duchamp,G. Dufour,I. Erwin,O. Farina Rodriguez,J. Fernandez,A. Ferreiros,J. Ferrer,C. Figueras,J. Fischer,V. Fouillat,P. Garcia Franquelo,L. Garcia Moreno,E. Garda,P. Girard,P. Granado,B. Hebrard,L. Hermida,R. Hernandez,A. Herve,Y. Houzet,D. Isern,E. Izpura,I. Jacquemod,G. Kerherve,E. Landrault,C. Lapuyade,H. Levant,J.L. Leveugle,R. Levi,H. Lewis,D. Lewis,N. Linan Cembrano,G. Lopez Nozal,L.A. Lopez Vallejo,M. Lopez,C. Lopez,J.C. Lopez-Villegas,J.M. Lorenz,M.G. Louerat,M.M. Luxey,C. Machado da Silva,J. Madrenas,J. Mancini,S. Maneux,C. Manich,S. Marc,F. Martin,J.L. Martinez Salamero,L. Martinez,M. Mengibar,L. Meresse,A. Mieyeville,F. Mir,S. Molina,M.C. Montiel-Nelson,J.A. Moreno Arostegui,J.M. Moya,F. Moya,J.M. Navarro,D. Naviner,J.F. Nebel,W. Nouet,P. O'Connor,I. Olías Ruiz,E. Oliver,J. Ortiz-Conde,A. Ousten,Y. Pérez Verdú,B. Petit,G. Petrashin,P. Pinna,A. Pissaloux,E. Psychalinos,C. Quero,J. Ramdani,M. Rebiere,D. Renaud,S. Renovell,M. Ribas,L. Ribeiro Alves,G. Rincon,F. Rius Vázquez,J. Robert,M. Roca,M. Rodríguez Andina,J.J. Rodríguez,R. Romain,O. Roy,A. Rubio,A. Rueda Rueda,A. Samitier,J. Sanchez Espeso,P.P. Sandoval Hernández,F. Santos,D.M. Santos,H. Santos,M. Sauerer,J. Silva,M.
Plenary Sessions
A brief overview of the Software Defined Radio (SDR) principle yields the required characteristics of some of the key building blocks. Although the analog-to-digital converter appears, as expected, to be a severe bottleneck, it is the huge bit stream the system has to deal with that imposes the strongest constraints on the Digital Signal Processor. It is shown that a sampled-analog signal processing approach can solve this problem, although it raises others.
In his talk, Christian Poivey will address the issue of radiation effects in the design of space electronic systems. He will first describe the radiation environment and how this environment affects electronic parts and embedded systems, with a special focus on CMOS devices. Examples of radiation effects on spacecraft will then be presented. The talk will end with a short description of hardening-by-design methods for CMOS electronic devices.
Panel Discussion
The Bologna Process : Return on Experiment
Moderator:
Prof. Yves Danto (U Bordeaux 1)
Participants:
Prof. Olivier Bonnaud (U. Rennes 1), Prof. Fausto Fantini (U. de Modena), Prof. López Barrio (U. Madrid), Prof. José Silva Matos (U. Porto), Prof. João Paulo Teixeira (U. Lisboa)
Exhibits
Microwind
An introduction to nanoscale CMOS cell design
Etienne SICARD, Professor, INSA, 135 av de Rangueil, 31077 Toulouse, France, etienne.sicard@insa-tlse.fr, www.microwind.org
Sonia BENDHIA, Senior Lecturer, INSA, 135 av de Rangueil, 31077 Toulouse, France, sonia.bendhia@insa-tlse.fr
Abstract: Microwind is a user-friendly Windows-based tool for designing and simulating microelectronic CMOS cells at layout level. The tool features full editing facilities, various views (MOS characteristics, 2D cross-section, 3D process viewer), and a high-performance built-in analog simulator. Microwind aims at illustrating technology scale-down, the major improvements allowed by nanoscale technologies, and the main substrate options (buried layer, SOI, RF). The n-channel and p-channel MOS devices, with simple/double/triple oxide and simple/double gate, are illustrated and simulated based on BSIM4 models. Basic cells such as inverters, logic gates, complex gates, arithmetic blocks and latches can be designed, simulated and optimized very efficiently with Microwind. A specific effort has been dedicated to the handling of static, dynamic, non-volatile and magnetic memories. Furthermore, radio-frequency analog cells such as mixers, voltage-controlled oscillators, fast phase-locked loops and power amplifiers are also illustrated by Microwind. Finally, input/output interfacing principles, electrostatic discharge protection, pad structures and packages are also covered through numerous examples. Technologies ranging from 1 µm down to 65 nm are supported. The tool runs on Windows 98, 2000, NT and XP. Microwind is used in more than 500 universities around the world and in industry training centers. The tool has proven very efficient in illustrating CMOS technology and design principles, both for teachers during their lectures and for students realizing integrated logic or analog functions as practical training.
ICEmit
Comparing simulated/measured Parasitic Emission of Integrated Circuits
Etienne SICARD, INSA-LESIA, Toulouse, France, etienne.sicard@insa-tlse.fr
Amaury SOUBEYRAN, EADS-CCR, Suresnes, France, amaury.soubeyran@eads.net
Abstract: ICEmit is a Windows-based environment for the simulation of parasitic emission of integrated circuits. The tool consists of a dedicated schematic editor, an IBIS translator, a core activity evaluator, an analog SPICE simulator and a dedicated post-processor. The IBIS translator provides information about the input/output characteristics and the package and supply model. The core activity evaluator translates the integrated circuit specification into a current source that models the core switching noise and on-chip decoupling. The analog simulation is performed by WinSpice, and the post-processor provides an immediate comparison of the predicted and measured spectra in the frequency domain. ICEmit handles a set of standards for integrated circuit modeling, emission modeling and test setups. ICEmit can be downloaded from www.icemc.org. The tool has been used to successfully model the parasitic emission of 16-bit and 32-bit microcontrollers, Xilinx programmable devices and dedicated ASICs, within the range from 1 MHz to 2 GHz. The freeware runs on Windows 95, 98, NT and XP.
Free copies of the package and the manual will be available at DCIS'04. ICEmit has been developed within the MEDEA+ "Mesdie" project A509.
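As a rough illustration of the frequency-domain comparison performed by the post-processor, the sketch below builds a hypothetical core-activity current (one triangular pulse per clock cycle, a common simplification that is assumed here rather than taken from ICEmit), computes its spectrum with a direct DFT, and reports the per-bin deviation in dB between two spectra. The names `triangular_current`, `magnitude_spectrum` and `deviation_db` are illustrative only and are not part of ICEmit.

```python
import cmath
import math

def triangular_current(n_periods, samples_per_period, i_peak, rise, fall):
    """Hypothetical core-activity model: one triangular current pulse per clock cycle.

    rise and fall are given in samples; the rest of each cycle carries zero current.
    """
    cycle = []
    for k in range(samples_per_period):
        if k < rise:
            cycle.append(i_peak * k / rise)
        elif k < rise + fall:
            cycle.append(i_peak * (rise + fall - k) / fall)
        else:
            cycle.append(0.0)
    return cycle * n_periods

def magnitude_spectrum(x):
    """Single-sided DFT magnitude (bins 0 .. N/2 - 1), computed directly for clarity."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
            for k in range(n // 2)]

def deviation_db(predicted, measured, floor=1e-9):
    """Per-bin difference in dB between predicted and measured spectra,
    skipping bins where either spectrum is below a noise floor."""
    return [20 * math.log10(p / m)
            for p, m in zip(predicted, measured) if p > floor and m > floor]

# 4 clock cycles of 32 samples: energy appears only at multiples of bin 4 (clock harmonics)
current = triangular_current(4, 32, 0.1, 4, 8)
spectrum = magnitude_spectrum(current)
```

Because the pulse train is periodic, only the clock harmonics carry energy; in a real comparison the `measured` argument would come from the test setup rather than from the model itself.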
Session 1a
Modern SAT solvers, which implement improved variants of the Davis-Putnam algorithm, can determine the satisfiability of large CNF formulae in a few seconds. This fact has favoured the development of non-canonical methods of representing Boolean functions. In this paper, we introduce BNSAT, a new package that implements a non-canonical representation of Boolean functions. The central idea behind BNSAT is to represent a Boolean function F as a composition of small functions fi. The main data structure employed in BNSAT is a cyclic directed graph with specific features, resembling a Boolean network. Each non-terminal vertex has an n-variable function f associated with it. The functions fi are represented by means of BDDs, and BDD variables are shared among several vertices. As a consequence, a BDD node can be used in the representation of the functions fi of different vertices. Such reuse of BDD nodes yields large memory savings. Two parameters, the maximum number of fan-in nodes and the maximum number of BDD nodes, control the size of the functions fi. The BNSAT package can compute the usual Boolean operations. The most intuitive way of performing a binary operation between two Boolean functions F1 and F2 is to operate on the BDDs of the vertices that represent those functions. However, the limits imposed by the aforementioned parameters can then be exceeded. To avoid this problem, one of the operands (or both) can be replaced by a BDD variable, which gives rise to four different methods of implementing a binary operation. Different strategies have been studied in order to determine the satisfiability of a Boolean function F represented with BNSAT. For the time being, the best-performing strategy consists of a specific method of translation into CNF formulae in conjunction with the SAT solver Zchaff. The translation into CNF format is based on ESPRESSO. We have tested BNSAT on some common combinational circuit benchmarks in BLIF format. The combination of BNSAT and Zchaff outperforms the combination of Zchaff and a direct BLIF-CNF translator in most cases.
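To make the "translate a Boolean network into CNF, then call a SAT solver" flow concrete, here is a minimal sketch. It deliberately swaps in two simpler techniques: the classic Tseitin encoding (one auxiliary variable per gate) instead of BNSAT's ESPRESSO-based translation, and an exhaustive search instead of Zchaff. Clauses use DIMACS-style integer literals; all function names are hypothetical.

```python
from itertools import product

# Literals follow the DIMACS convention: +v means variable v is true, -v means false.

def gate_and(out, a, b):
    """Tseitin clauses for out <-> (a AND b)."""
    return [[-out, a], [-out, b], [out, -a, -b]]

def gate_or(out, a, b):
    """Tseitin clauses for out <-> (a OR b)."""
    return [[out, -a], [out, -b], [-out, a, b]]

def gate_not(out, a):
    """Tseitin clauses for out <-> NOT a."""
    return [[-out, -a], [out, a]]

def is_satisfiable(clauses, n_vars):
    """Exhaustive check, standing in for a real SAT solver such as Zchaff."""
    for bits in product((False, True), repeat=n_vars):
        def value(lit):
            v = bits[abs(lit) - 1]
            return v if lit > 0 else not v
        if all(any(value(lit) for lit in clause) for clause in clauses):
            return True
    return False

# Network: x1 = 1, x2 = 2, g1 = x1 AND x2 (3), g2 = x1 OR x2 (4), g3 = NOT g2 (5), F = g1 AND g3 (6)
net = gate_and(3, 1, 2) + gate_or(4, 1, 2) + gate_not(5, 4) + gate_and(6, 3, 5)
print(is_satisfiable(net + [[6]], 6))  # asserting F = 1 is unsatisfiable: prints False
```

The exhaustive search is exponential and only suitable for toy networks; the point is the shape of the flow, where the network structure maps gate by gate onto clauses and satisfiability of F is tested by adding the unit clause for its output.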
In the short term, we are going to implement Quantified Boolean Formulae (QBF), which are widely used in formal verification algorithms. Some of these algorithms (for instance, reachability analysis) will be implemented in order to explore the efficiency of BNSAT in the field of formal verification.
In this paper we describe the simulator SUSANA (Alternative Numerical Algorithms-based Simulator), based on the E-Logic [1] simulation approach and implemented in Python/wxPython. E-Logic, an event-driven simulation algorithm traditionally used for the simulation of digital MOS circuits, is also suitable for simulating analogue and mixed circuits. SUSANA was applied to large digital ISCAS-85 benchmark circuits. Several improvements have been added to standard E-Logic, such as a logic simulator that obtains initial conditions before the E-Logic simulation of digital circuits starts. The precision of the simulation can be controlled by the user through the number of discrete states (voltages) allowed for the circuit nodes: the smaller the number of states, the faster the simulation, but the larger the simulation error. ΔV, the voltage difference between adjacent states, controls their number. The use of a very high-level programming language (Python) allowed the rapid development, testing and debugging of a fairly complex circuit simulator and of the associated visual input and data analysis components. Examples, results and a description of the simulation environment are presented in the full paper. The efficiency of SUSANA compared with Spice, despite its being written in Python, is clearly shown in Table 1. In the simulation of the ALU shown in fig. 1, a speedup of 12X was observed.

Table 1: ALU simulation run times in Spice and in SUSANA
Simulator               Time (s)   Relative speed
Spice                   4557       1
SUSANA (ΔV = 0.25 V)    380        12

The simulation of pulse propagation in a chain of 1000 inverters showed a speedup of 168X, due in part to the use of a digital simulator to correctly initialize the digital values.
[1] R. Saleh, S. J. Jou and A. R. Newton, Mixed-Mode Simulation and Analog Multilevel Simulation, Boston, Massachusetts:
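The trade-off controlled by ΔV can be seen on a single node. The sketch below is a hypothetical, simplified E-Logic-style model, not SUSANA's code: a node charging through an RC path is allowed only discrete voltages spaced `dv` apart, and the exact RC exponential is used only to schedule the time at which the node enters the next state. Smaller `dv` means more events (more work) but a finer waveform.

```python
import math

def rc_step_events(v0, v_target, rc, dv):
    """Event list (time, voltage) for one node charging through an RC path, with the
    node voltage quantized to discrete states spaced dv apart (hypothetical sketch).

    The exact RC exponential is used only to compute *when* the node crosses into
    the next discrete state; between events the node voltage is held constant.
    """
    if dv <= 0:
        raise ValueError("dv must be positive")
    events = [(0.0, v0)]
    t, v = 0.0, v0
    step = dv if v_target > v0 else -dv
    while True:
        v_next = v + step
        # An RC exponential never actually reaches its target, so stop at the
        # last discrete state strictly before it.
        if (v_next - v_target) * step >= 0:
            break
        t += rc * math.log((v_target - v) / (v_target - v_next))
        v = v_next
        events.append((t, v))
    return events

coarse = rc_step_events(0.0, 5.0, 1e-9, 1.0)   # 5 entries: 0, 1, 2, 3, 4 V
fine = rc_step_events(0.0, 5.0, 1e-9, 0.25)    # 20 entries: 0, 0.25, ..., 4.75 V
```

In a full simulator each event would also be pushed onto a global time-ordered queue and trigger re-evaluation of the fan-out nodes; that bookkeeping is omitted here.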
A Distributed Enhanced Genetic Algorithm Kernel Applied to a Circuit/Level Optimization E-Design Environment
1 IPT - Inst. Pol. de Tomar, Qt. do Contador, Est. de Serra, 2300-313 Tomar, Portugal
2 IST/IT - Centre for Microsystems, Av. Rovisco Pais, 1, 1049-001 Lisboa, Portugal
This paper presents a distributed implementation of a circuit/system-level optimization e-Design environment (fig. 1) based on an enhanced, modified genetic algorithm kernel. First, we discuss the main features of the optimization kernel, such as automatic search-space decomposition, premature-convergence prevention procedures, and the ability to optimize a broad range of circuits using either an equation-based approach or a simulation-based approach with Spice-like simulators. Then, a simple, inexpensive and efficient distributed processing method applied to the serial genetic algorithm is described. Finally, the increase in optimization efficiency achieved compared with the standard genetic algorithm implementation, as well as the validity of the proposed approach, is demonstrated by the multi-objective, multi-constraint optimization of some well-known circuits.
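One simple way to distribute a serial genetic algorithm is a master/worker split in which only the expensive fitness evaluations run concurrently, while selection, crossover and mutation stay in the master. The sketch below assumes that scheme (the paper's exact method may differ), uses a thread pool from the Python standard library, and replaces the Spice-based figure of merit with a hypothetical quadratic `cost`. Search-space decomposition and premature-convergence prevention are not modelled.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def cost(x):
    """Hypothetical stand-in for a circuit figure of merit (e.g. from a Spice run)."""
    return sum((xi - 1.0) ** 2 for xi in x)

def evolve(n_vars=2, pop_size=20, generations=40, workers=4, seed=0):
    """Serial GA loop whose fitness evaluations are farmed out to a worker pool."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n_vars)] for _ in range(pop_size)]
    history = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Master/worker split: only the evaluations run concurrently;
            # pool.map returns results in the same order as pop.
            scores = list(pool.map(cost, pop))
            ranked = [ind for _, ind in sorted(zip(scores, pop), key=lambda p: p[0])]
            history.append(min(scores))
            elite = ranked[: max(2, pop_size // 4)]
            children = [list(ind) for ind in elite]  # elitism: best survive unchanged
            while len(children) < pop_size:
                a, b = rng.sample(elite, 2)
                w = rng.random()  # blend crossover followed by Gaussian mutation
                children.append([w * ai + (1 - w) * bi + rng.gauss(0.0, 0.1)
                                 for ai, bi in zip(a, b)])
            pop = children
    return ranked[0], history

best, history = evolve()
```

Threads are enough to show the structure; since real circuit evaluations are CPU-bound external simulator runs, a practical setup would dispatch them to separate processes or machines instead.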
A CAD Tool for the Design of RTD Programmable Gates based on MOBILE¹
Héctor Pettenghi, María J. Avedillo, and José M. Quintana, Instituto de Microelectrónica de Sevilla, CNM, Sevilla, SPAIN, Email: {hector, avedillo, josem}@imse.cnm.es
Resonant Tunnelling Diodes (RTDs) exhibit a negative differential resistance (NDR) region in their current-voltage characteristics, which can be exploited to significantly increase the functionality implemented by a single gate in comparison to MOS and bipolar technologies, thus reducing circuit complexity. Because of these attractive features, they are receiving much attention as device elements for circuit applications. However, there is a wide gap between research on device development and automatic tools to design circuits using these devices, which can limit the success of this emerging technology. This paper presents a CAD tool for the design of complex programmable logic blocks (able to implement a set of functions) using RTDs. Starting from a functional specification, it generates a sized netlist implementing it. The derived circuits exploit the MOBILE operating principle but increase the logic complexity that can be implemented with a single gate by raising the number of negative differential resistance devices connected in series, and by implementing several functions simultaneously with such a structure. The tool is based on maximizing the number of functions that are simultaneously realized (minimizing the number of control variables), and on formulating the design problem as a mixed integer linear programming (MILP) problem with a suitable cost function that minimizes circuit complexity in terms of device count. From the solution, the sized circuit and the control combination selecting each function are derived. Figure 1(a) depicts the circuit derived for a 2-input programmable gate that implements the functions NAND, OR and EXOR. The simulation results in Figure 1(b) show correct operation. The proposed tool can be useful in translating the attractive features of RTDs to the circuit level.
[Figure 1: (a) circuit derived for a 2-input programmable gate implementing NAND, OR and EXOR (bias voltage Vbias, inputs x1 and x2, outputs y1 and y2, control input C = 0 / C = 1); (b) simulation results.]
1. This effort was partially supported by the EU QUDOS project IST-2001-32358.
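A single MOBILE gate with parallel-connected input RTDs is commonly modelled as a linear threshold gate, f = 1 when the weighted input sum reaches a threshold. The sketch below uses that abstraction to show two ideas from the abstract. First, a control input can select which function a gate computes. Second, a cost-minimizing search over integer weights stands in for the MILP formulation. It uses brute force rather than an MILP solver, and the weights, thresholds and function names are hypothetical. EXOR has no single-threshold realization, which is why more complex structures, such as the series NDR devices mentioned above, are needed.

```python
from itertools import product

def threshold_gate(weights, threshold, inputs):
    """Linear threshold function: 1 if sum(w_i * x_i) >= threshold, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

def find_realization(truth_table, n_inputs, max_w=3):
    """Exhaustive search (a toy stand-in for the MILP formulation) for the integer
    weights and threshold with the smallest total |w| that realize truth_table.

    truth_table maps input tuples to 0/1; returns (cost, weights, threshold),
    or None if the function is not a single threshold function.
    """
    best = None
    for weights in product(range(-max_w, max_w + 1), repeat=n_inputs):
        cost = sum(abs(w) for w in weights)  # rough proxy for device area/count
        for t in range(-max_w * n_inputs, max_w * n_inputs + 2):
            if all(threshold_gate(weights, t, x) == y for x, y in truth_table.items()):
                if best is None or cost < best[0]:
                    best = (cost, weights, t)
    return best

inputs = list(product((0, 1), repeat=2))
AND = {x: x[0] & x[1] for x in inputs}
NAND = {x: 1 - (x[0] & x[1]) for x in inputs}
OR = {x: x[0] | x[1] for x in inputs}
XOR = {x: x[0] ^ x[1] for x in inputs}

print(find_realization(AND, 2))  # (2, (1, 1), 2)
print(find_realization(XOR, 2))  # None: XOR is not linearly separable

# Programmable gate: a control input c with weight 1 selects AND (c = 0) or OR (c = 1)
for c in (0, 1):
    selected = {x: threshold_gate((1, 1, 1), 2, x + (c,)) for x in inputs}
    print(c, selected == (AND if c == 0 else OR))
```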
Session 1b
A New Capacitor-Ratio and Offset Independent Amplifier for Pipelined A/D Converters
F. Muñoz, R.G. Carvajal, A. Torralba, B. Palomo, Departamento de Ingeniería Electrónica, Universidad de Sevilla
The mismatch of ratio capacitors used in the residue amplifier of the first pipelined stages limits the [...] residue amplifier which is inherently insensitive to capacitor mismatch and amplifier offset is presented. Using a four-phase switched-capacitor circuit, the proposed technique (shown in figure 1) senses and compensates for the capacitor mismatch error.
Although other ratio-independent residue amplifiers have been proposed in the literature, the technique proposed here is, to the authors' knowledge, the only one that allows an operational amplifier to be shared between two successive pipelined stages while also cancelling the amplifier offset. Simulation results show the potential of the proposed technique for the design of very low-power, high-resolution pipelined converters.
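[Figure 1: proposed residue amplifier, a switched-capacitor network with capacitors C1, C2 and C3 and amplifiers A1 and A2, input Vin and output Vout.]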
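To see why capacitor-ratio mismatch matters in the first place, consider a conventional flip-around MDAC stage. This is an assumed textbook topology, not the proposed four-phase circuit. Its residue is (1 + C1/C2)·Vin − (C1/C2)·D·Vref, so with nominally equal capacitors and C1 = C2(1 + ε) the residue error is ε·(Vin − D·Vref). The minimal sketch below evaluates that error; the function names are hypothetical.

```python
def mdac_residue(vin, d, c1, c2, vref=1.0):
    """Residue of a conventional flip-around MDAC stage (assumed topology):
    gain 1 + C1/C2 on the input, sub-DAC level d * vref weighted by C1/C2."""
    return (1 + c1 / c2) * vin - (c1 / c2) * d * vref

def residue_error(vin, d, mismatch, vref=1.0):
    """Residue error caused by a relative mismatch between nominally equal C1 and C2;
    analytically equal to mismatch * (vin - d * vref)."""
    ideal = mdac_residue(vin, d, 1.0, 1.0, vref)
    return mdac_residue(vin, d, 1.0 + mismatch, 1.0, vref) - ideal

# A 0.1 % ratio error near the top of the input range:
err = residue_error(0.9, 1, 0.001)  # about -1e-4 V, i.e. 0.001 * (0.9 - 1.0)
```

The error depends on both the input and the sub-DAC decision. This is why it shows up as a nonlinearity, and why ratio-independent amplifiers such as the one presented here are attractive for the first, most critical stages.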
This paper presents a toolbox for the simulation, optimization and high-level synthesis of pipeline [...] C-coded S-functions to model all required subcircuits, including their main error mechanisms. This approach speeds up simulation by up to two orders of magnitude in CPU time compared with previous approaches based on SIMULINK elementary blocks. Moreover, S-functions are better suited to implementing a more detailed description of the circuit. For all subcircuits, the accuracy of the behavioural models has been verified by electrical simulation using HSPICE. For synthesis purposes, the simulator is used for performance evaluation and combined with a hybrid optimizer for design parameter selection. The optimizer combines an adaptive statistical optimization algorithm inspired by simulated annealing with a design-oriented formulation of the cost function. It has been integrated in the MATLAB/SIMULINK platform using the MATLAB engine library, so that the optimization core runs in the background while MATLAB acts as a computation engine. Implementation on the MATLAB platform brings numerous advantages in terms of signal processing, flexibility for tool expansion, and simulation together with other electronic subsystems. Additionally, the toolbox includes a user-friendly graphical interface that allows the designer to browse through all steps of simulation, synthesis and post-processing of results. To illustrate the capabilities of the toolbox, a 0.13 µm CMOS 12-bit@80MS/s A/D interface for power line communications is synthesized and high-level sized. Different experiments show the effectiveness of the proposed methodology.
This work has been supported by the MEDEA+ (A110 MIDAS) Project.
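The kind of behavioural model the toolbox implements in C-coded S-functions can be sketched in a few lines. The following is a hypothetical Python model, not the toolbox's code, of a 1.5-bit/stage pipeline ADC in which each stage has an interstage gain error while the digital reconstruction assumes the ideal gain of 2. This is the kind of error mechanism a behavioural simulator must capture in order to predict resolution.

```python
def pipeline_adc(vin, n_stages=10, vref=1.0, gain_errors=None):
    """Behavioural model of a 1.5-bit/stage pipeline ADC (hypothetical sketch).

    Each stage compares its input with +/- vref/4, outputs d in {-1, 0, 1}, and
    passes on the residue (2 + eps_k) * v - d * vref, where eps_k is that stage's
    interstage gain error. The digital output assumes the ideal gain of 2.
    """
    if gain_errors is None:
        gain_errors = [0.0] * n_stages
    estimate = 0.0
    v = vin
    for k in range(n_stages):
        d = 1 if v > vref / 4 else (-1 if v < -vref / 4 else 0)
        estimate += d * vref / 2 ** (k + 1)      # digital reconstruction (ideal weights)
        v = (2 + gain_errors[k]) * v - d * vref  # analogue residue with gain error
    return estimate

ideal = pipeline_adc(0.9)                                     # within vref / 2**10 of 0.9
degraded = pipeline_adc(0.9, gain_errors=[-0.02] + [0.0] * 9)  # roughly 9 mV low
```

With ideal stages the quantization error is bounded by vref/2^N, whereas a 1% gain error in the first stage already produces an error of several LSBs. A synthesis loop would wrap a model like this in a cost function for the optimizer.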
High-speed, high-resolution data converters for communications need good performance of their analogue blocks as well as self-correction/self-calibration techniques. In the particular case of pipeline ADCs, correction techniques can improve the linearity of the sub-ADCs by dealing with transition errors, but for resolutions greater than 10 bits a calibration technique is still necessary. Moreover, even at lower resolutions, calibration can relax the analogue block specifications and should therefore be considered an additional design variable. Foreground calibration techniques need to interrupt the normal converter operation to start a calibration cycle; normally, the error measurements are obtained just after power-up. Thus, miscalibration and environmental changes such as temperature, power supply or component aging cannot be compensated if the system works continuously. Background calibration overcomes this by performing error measurements during ADC operation. This paper presents a new digital technique for background calibration of gain errors in pipeline ADCs. The proposed algorithm estimates and corrects both the MDAC gain error of the stage under calibration (SUC) and the global gain error associated with the least significant stages. This process is performed without interrupting the conversion and without reducing the dynamic rate. The proposed system (Fig. 1a) uses a stage with two input-output characteristics, selected by the value of a digital pseudo-random noise signal N[i], to modulate the output residue of the SUC and to estimate the calibration code by an adaptive averaging process. The proposed method introduces no significant modifications in the analogue blocks of the pipeline ADC, making this technique a very promising alternative for the background calibration of the nonlinearity associated with gain errors due to capacitor mismatch and limited op-amp gain. Simulation results (Fig. 1b) demonstrate the stability of the algorithm and its ability to track fast gain-error changes, considering errors in both the sub-ADC of the SUC and the back-end stages.
Fig. 1: a) Proposed background calibration system (SUC with sub-ADC, sub-DAC and MDAC, pseudorandom generator N[i], back-end ADC2); b) calibration code and ENOB versus normalised time.
DCIS 2004
The mismatch behavior of MOS- and resistor-based calibrated ladder structures, used in arrays of DACs, is studied theoretically and experimentally. It is found that the worst-case standard deviation of the calibrated DAC output current is approximately 1/3 that of its individual components. Experimental measurements on MOS devices illustrate the discussed mismatch behavior. Directions on how to design ladder DACs for a target precision are provided.
Session 1c
SiGe Designs
Wednesday, Nov. 24, 9h00-10h00, St Emilion Room
Chairs: Jean-Baptiste Bégueret (U. Bordeaux 1), Jean-Michel Fournier (E.N.S.E.R.G., Grenoble)
An original multistandard integrated power amplifier for communication systems is presented. It consists of two sub-amplifiers in parallel, each devoted to a specific frequency range: either the 900 MHz band (GSM) or the 1700/2000 MHz band (DCS/PCS/WCDMA). These circuits are to be implemented in a SiGe BiCMOS technology. Part of the challenge lies in integrating both amplifiers on the same chip and in making the PCS/DCS/WCDMA amplifier reconfigurable, through control bits, in terms of linearity, power-added efficiency and output power. The architectures of both amplifiers are described in Figure 1. This work was done in the framework of a complete transceiver design, which is why some extra functionalities, such as sleep mode and bypass mode, had to be integrated for convenience with the upstream frequency synthesizer. The simulations were carried out in the Cadence environment with the SpectreRF simulator, with the following results:
- The GSM power amplifier is able to provide a 26.5 dBm output power and a 46% PAE with a 5 dBm input power.
- The linear output power and the PAE of the 2 GHz power amplifier were simulated to be 32 dBm and 39%, respectively, with a 5 dBm input power in UMTS mode. In DCS/PCS mode, the maximum output power is 33 dBm and the amplifier features a PAE above 30% for input power values down to 5 dBm.
Fig 1: Block diagrams of both the single-ended GSM switching-mode amplifier and the reconfigurable DCS/PCS/WCDMA power amplifier
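The PAE figures above follow the usual definition PAE = (Pout - Pin)/PDC. The minimal Python sketch below is not taken from the paper. It converts the dBm levels to milliwatts and back-computes the DC supply power that the quoted GSM numbers (26.5 dBm out, 5 dBm in, 46% PAE) would imply. The helper names are hypothetical.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def pae(p_out_dbm, p_in_dbm, p_dc_mw):
    """Power-added efficiency: (Pout - Pin) / PDC, all powers in mW."""
    return (dbm_to_mw(p_out_dbm) - dbm_to_mw(p_in_dbm)) / p_dc_mw


def dc_power_for_pae(p_out_dbm, p_in_dbm, pae_value):
    """Invert the PAE definition to obtain the implied DC power in mW."""
    return (dbm_to_mw(p_out_dbm) - dbm_to_mw(p_in_dbm)) / pae_value


p_dc = dc_power_for_pae(26.5, 5.0, 0.46)
print(f"implied DC power: {p_dc:.0f} mW")  # roughly 964 mW
```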
(1) Laboratoire des Instruments et Systèmes d'Ile de France, UPMC-P6, 4 place Jussieu, 75252 Paris cedex 05, France. Email: prele@lis.jussieu.fr
(2) Laboratoire de Génie Electronique de Paris, UMR 8507 CNRS, Supélec-P6/P11, 11 rue Joliot-Curie, 91192 Gif-sur-Yvette cedex, France. Email: Alain.Kreisler@supelec.fr
(3) Centre National d'Etudes Spatiales, 18 avenue Edouard Belin, 31055 Toulouse cedex, France. Email: cyrille.boulanger@cnes.fr
A specific readout circuit operating at cryogenic temperature has been investigated to process the low-level signals delivered by high-Tc superconducting (YBaCuO, Tc ~ 85 K) hot electron bolometers. An ASIC has been designed, including a low-noise, wide-band (quasi-DC to 1 GHz) amplifier operating from room temperature down to 77 K. This amplifier has been successfully tested at liquid nitrogen temperature (Fig. 1). Placing the processing electronics close to the sensor reduces parasitic noise due to connecting leads and improves the compactness of the overall detector. This experiment shows that an ASIC realised in SiGe BiCMOS technology can process, in a cryogenic environment, the signal delivered by a YBaCuO hot electron bolometer over a wide frequency range.
A SiGe Power Amplifier with Dynamic Bias for Efficient Power Control in UMTS/WCDMA Applications
Nathalie Deltimple(1), Eric Kerhervé(1), Didier Belot(2), Yann Deval(1) and Pierre Jarry(1)
(1) IXL Laboratory, CNRS UMR 5818, CNRS FR 2648, ENSEIRB, Bordeaux 1 University, 351 cours de la Libération, 33405 Talence Cedex, France
(2) STMicroelectronics, 850 rue Jean Monnet, 38926 Crolles Cedex, France
Email: deltimple@ixl.fr
Power amplifiers (PAs) are the most power-consuming components in portable equipment, so PAs with high power-added efficiency (PAE) are highly desirable. Moreover, in WCDMA systems, where a non-constant-envelope modulation is used, the handset rarely transmits at maximum power, so it is important to reduce power consumption at low transmitted power. An integrated two-stage power amplifier operating in the 1.95 GHz frequency range is proposed. The PA uses a 2.5 V supply voltage and was designed in a 0.25 µm SiGe BiCMOS technology from STMicroelectronics. The linear gain is 24 dB and the output 1 dB compression point (CP1) is 26.2 dBm. The amplifier achieves a maximum PAE of 54%. To fulfil UMTS/WCDMA requirements, especially on linearity, the output power is 24 dBm in linear class-A operation, with a 32.4% PAE. By acting on the bias circuits of both stages, the amplifier can dynamically shift the CP1 at constant power gain according to the input power level. As a result, greater PAE is achieved at low input power levels. For instance, if the PA is backed off by 6 dB and 11.4 dB from its CP1, the PAE is 15.6% and 5.1%, respectively. To enhance PAE, the driver-stage bias circuit is used to shift the CP1 to a lower level, as shown in Figure 1, while the power-stage bias circuit performs the gain compensation; the PAE then reaches 27.1% and 13%, respectively. The circuit, designed in a quarter-micron SiGe technology from STMicroelectronics, is still in progress; its layout is depicted in Figure 2. With a dynamically controlled CP1, this PA paves the way to reconfigurable PAs well suited to multimode, multiband transceivers.
Figure 1: Output power as a function of the input signal level for different driver stage bias conditions (Ibiasd = 1.4 mA, 500 µA and 100 µA)
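To put the back-off figures in perspective, the minimal Python sketch below (not from the paper) converts the PAE at 6 dB back-off from the 26.2 dBm CP1 into the DC power it implies. It compares fixed bias (15.6%) with dynamic bias (27.1%). It assumes the 24 dB power gain still holds at that level, so Pin = Pout - 24 dB. The function names are hypothetical.

```python
def dbm_to_mw(p_dbm):
    """Convert dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def dc_power_at_backoff(cp1_dbm, backoff_db, gain_db, pae_value):
    """DC power (mW) implied by a PAE at an output level backed off from CP1.

    Assumes the power gain gain_db still holds there, so Pin = Pout - gain_db.
    """
    p_out_dbm = cp1_dbm - backoff_db
    p_in_dbm = p_out_dbm - gain_db
    return (dbm_to_mw(p_out_dbm) - dbm_to_mw(p_in_dbm)) / pae_value


fixed = dc_power_at_backoff(26.2, 6.0, 24.0, 0.156)    # fixed bias
dynamic = dc_power_at_backoff(26.2, 6.0, 24.0, 0.271)  # dynamic bias
print(f"fixed bias: {fixed:.0f} mW, dynamic bias: {dynamic:.0f} mW")
```

Under these assumptions, the dynamic bias cuts the DC consumption at that output level from about 669 mW to about 385 mW.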
The low fabrication cost and high packing density of silicon make it the most suitable material for many RF IC applications. In many cases the device requirements cannot be fulfilled without on-chip inductors. However, standard integrated inductors suffer from a poor quality factor due to the low-resistivity silicon substrate. In this work, standard silicon integrated spiral inductors are studied and some guidelines to optimize their performance are deduced. A high-quality-factor inductor library for 5 GHz in a 0.35 µm SiGe technology has been designed using electromagnetic simulations. The inductors, designed with no changes to the process technology and no post-processing techniques, reach values up to 10 nH. As an application, a fully integrated LC voltage-controlled oscillator (VCO) for the IEEE 802.11a WLAN standard has been designed. The achieved phase noise is -113 dBc/Hz at 1 MHz offset, and the power consumption is 116 mW. The total VCO area, shown in Figure 1, is 0.424 mm². This work demonstrates the feasibility of a low-cost silicon technology for the design of 5 GHz band circuits.
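Two textbook relations underlie the inductor and VCO work above: the LC tank resonates at f0 = 1/(2π√(LC)), and a simple series-RL inductor model has Q = ωL/R. The Python sketch below evaluates them for a hypothetical 2 nH, 5 Ω inductor at 5 GHz. These values are not taken from the paper, and the model ignores the substrate losses that dominate real on-chip spirals.

```python
import math


def tank_capacitance(f0_hz, inductance_h):
    """Capacitance resonating with L at f0: C = 1 / ((2*pi*f0)^2 * L)."""
    omega = 2 * math.pi * f0_hz
    return 1.0 / (omega ** 2 * inductance_h)


def inductor_q(f_hz, inductance_h, series_resistance_ohm):
    """Series-RL quality factor Q = omega * L / R (substrate losses ignored)."""
    return 2 * math.pi * f_hz * inductance_h / series_resistance_ohm


c = tank_capacitance(5e9, 2e-9)
print(f"C = {c * 1e12:.3f} pF")                 # about 0.507 pF
print(f"Q = {inductor_q(5e9, 2e-9, 5.0):.1f}")  # about 12.6
```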
Session 1d
In this paper we present a new built-in current sensor (BICS) dedicated to monitoring the current of analog and mixed-signal building blocks. Its principle is the same as that of the initial ratiometric BICS presented by Yvan Maidon et al. in the nineties. This initial BICS was designed to operate under a 3.3 V power supply, as it was implemented in a 0.6 µm CMOS technology. Its output range showed good linearity but a large technology dispersion. The new version of the BICS (Figure 1) uses a design methodology that dramatically reduces the dispersion (from 70% to 8.5% of the output range). We first adapted the initial version of the BICS to a 130 nm VLSI CMOS technology and replaced the classical current mirrors with low-voltage bootstrapped cascode ones. This design approach allows a 1.2 V power supply and reduces the channel-length modulation effect. Finally, we added degeneration resistors to the BICS, which protect the circuit against thermal burst and improve its robustness. The new BICS presented here appears robust enough to be implemented in mass-production mixed-signal integrated circuits such as System-on-Chip (SoC) solutions, in which testability is of major importance.
Figure 1: Schematic of the new BICS
A Non-Intrusive Built-In Sensor for Transient Current Testing of Digital VLSI Circuits
B. Alorda, V. Canals and J. Segura, Univ. de les Illes Balears, Dept. Física, Cra. Valldemossa, km 7.5, 07071 Palma de Mallorca, Spain. Fax: +34 971 173 426. Tel: +34 971 172 506. Email: tomeu.alorda@uib.es
transient current based testing of digital CMOS VLSI circuits. The monitor measures the transient current iDD(t) by sensing the voltage drop across an inductance coupled to the magnetic field produced by the power-supply transient current. Designed in a 0.18 µm CMOS technology, the proposed sensor has two blocks: the transducer circuit senses the transient current and provides a voltage waveform, while a second module amplifies this voltage waveform and computes the transient current waveform IDD(t). Simulation results using an elaborate CUT model demonstrate the performance of the new sensor.
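An inductively coupled pickup produces a voltage proportional to the rate of change of the current, v(t) = M·di/dt, so recovering iDD(t) amounts to integrating that voltage. The Python sketch below shows this as digital post-processing with trapezoidal integration. The mutual inductance M, the synthetic sin² current pulse and the function name are all hypothetical. The paper's second module performs this computation on chip rather than in software.

```python
import math


def reconstruct_current(v_samples, dt, mutual_inductance, i0=0.0):
    """Recover i(t) from v(t) = M * di/dt by trapezoidal integration."""
    current = [i0]
    for k in range(1, len(v_samples)):
        area = 0.5 * (v_samples[k - 1] + v_samples[k]) * dt
        current.append(current[-1] + area / mutual_inductance)
    return current


# Synthetic switching pulse i(t) = I0 * sin^2(pi t / T), so di/dt = I0 (pi/T) sin(2 pi t / T).
M, I0, T, N = 1e-9, 10e-3, 1e-9, 1000  # hypothetical coupling, peak current, duration, samples
dt = T / N
times = [k * dt for k in range(N + 1)]
v = [M * I0 * (math.pi / T) * math.sin(2 * math.pi * t / T) for t in times]
i_rec = reconstruct_current(v, dt, M)
print(f"peak reconstructed current: {max(i_rec) * 1e3:.3f} mA")
```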
Built-In Current Sensor using Floating-Gate MOS Transistors for Low-Voltage Applications
A.A. Hatzopoulos(1), S. Siskos(2)
Aristotle University of Thessaloniki 54124 Thessaloniki GREECE (1) Dept. of Electrical and Computer Eng., Electronics Lab., alkis@vergina.eng.auth.gr (currently Visiting Professor at the Katholieke Universiteit Leuven, Belgium) (2) Dept. of Physics, Electronics Lab., siskos@surf.physics.auth.gr
In recent years, floating-gate MOS transistors (FGTs) have found many applications. When the input terminal is divided into two parts, the FGT can be used as a variable-threshold transistor: the first input serves as the signal input of the device and the second controls the threshold voltage. Supply current testing, known as IDDQ testing in CMOS digital circuits, has been recognized for over 25 years as an advantageous method complementary to conventional logic testing, since it can reveal defects missed by logic testers. Various designs have been proposed in the last decade, especially for built-in current testing circuits. A major problem with all built-in current sensors (BICS) is their influence on the normal operation and performance of the CUT; the voltage drop across the current-sensing device is a considerable drawback. In this work, floating-gate transistors are proposed for the design of a BICS. The main benefit is that the voltage drop across the sensing device can be reduced to almost zero while adequate linearity is preserved for current monitoring. This linearity also makes the proposed BICS appropriate for analog and mixed-signal circuit testing. The proposed BICS structure is given in Fig. 1. For an FGT with two input gates it can be shown that

$$I_D = \frac{K}{2}\left[w_1 V_{GS1} + w_2 V_{GS2} - V_T\right]^2 .$$

With a proper selection of the coupling ratio $w_2$ and a corresponding bias voltage $V_{GS2}$, we can set $w_2 V_{GS2} - V_T = 0$, which results in

$$I_D = \frac{K}{2}\left[w_1 V_{GS1}\right]^2 .$$

Since $V_{DS1}$ is kept quite low over the range of $I_D$ under consideration, the voltage degradation of the CUT supply is minimal, which makes this structure suitable for low-voltage built-in current sensing applications. The mirrored current in FGT2 can be converted to a voltage by a load transistor.
This voltage output, followed by an appropriate buffer or latch, may be used directly as a fault-indicating flag. The mirrored current can be scaled down for power saving by scaling the sizes of the floating-gate transistors. The proposed FGT-BICS structure of Fig. 1 has been simulated with various circuits as the CUT. Fig. 2 plots the BICS output Vout against the supply current of a simple op-amp in an inverting configuration with a 10 kΩ load as the CUT, showing very good linearity.
Fig. 1. The proposed FGT-BICS structure with 2-input floating-gate transistors.
Fig. 2. BICS voltage output versus supply current of an inverting op-amp configuration as the CUT.
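The threshold-cancellation step in the equations above can be checked numerically. The Python sketch below evaluates the square-law expression for a two-input FGT and chooses the control-gate bias so that w2·VGS2 - VT = 0. The device values (K, VT, w1, w2) and function names are hypothetical. The sketch only reproduces the algebra quoted in the abstract, not a full transistor model.

```python
def fgt_drain_current(vgs1, vgs2, w1, w2, k, vt):
    """Square-law current of a two-input FGT:
    I_D = (K/2) * (w1*VGS1 + w2*VGS2 - VT)^2, taken as zero below threshold."""
    overdrive = w1 * vgs1 + w2 * vgs2 - vt
    return 0.5 * k * overdrive ** 2 if overdrive > 0 else 0.0


def threshold_cancelling_bias(w2, vt):
    """Control-gate bias VGS2 that makes w2*VGS2 - VT = 0."""
    return vt / w2


K, VT, W1, W2 = 200e-6, 0.45, 0.5, 0.5    # hypothetical device values
vgs2 = threshold_cancelling_bias(W2, VT)  # 0.9 V
for vgs1 in (0.1, 0.2, 0.3):
    # With the threshold cancelled, I_D = (K/2) * (w1*VGS1)^2.
    print(vgs1, fgt_drain_current(vgs1, vgs2, W1, W2, K, VT))
```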
This paper presents the experimental characterization of a built-in current sensor (BICS) for analog circuits. The BICS gives greater weight to the higher-frequency components of the current waveform, so an inductive rather than a resistive load is used to convert the sampled current to a voltage. The circuit has been fabricated in the Austria Micro System (AMS) 0.6 µm technology. The test approach obtains a copy of the supply current by integrating additional transistors into the current mirrors of the CUT. A sensor placed in series with the supply/ground node would alter the effective supply voltage seen by the CUT and degrade circuit performance; this approach avoids that drawback. The proposed BICS gives an output that reflects the dynamic power-supply consumption of the CUT. This signal is digitized by a simple window comparator made of logic gates. The key parameter is the width of the pulse at the sensor output, so a low-cost counter or an integrator can easily perform the signal post-processing. The result is then compared with either the value obtained from simulation or the value obtained from a golden circuit. Finally, the sensor has been coupled to a transconductance amplifier to validate the structural test approach experimentally. Together with the fault-free circuit, three parametric faults were implemented, and they can easily be discriminated by the measured pulse width at the BICS output.
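The digital post-processing described above (window comparator, then counter, then comparison with a golden value) can be sketched in a few lines of Python. The window thresholds, the tolerance and the function names are hypothetical, and the sampled sensor output stands in for the on-chip analog signal.

```python
def pulse_width_cycles(samples, low, high):
    """Count clock cycles in which the sensor output lies outside [low, high],
    i.e. the count a counter enabled by a window comparator would accumulate."""
    return sum(1 for s in samples if s < low or s > high)


def classify(width, golden_width, tolerance):
    """Pass if the measured width is within +/- tolerance cycles of the golden circuit."""
    return "pass" if abs(width - golden_width) <= tolerance else "fail"


sensor_output = [0.0, 0.0, 0.3, 0.6, 0.7, 0.4, 0.1, 0.0, 0.0]  # hypothetical samples
width = pulse_width_cycles(sensor_output, -0.2, 0.2)
print(width, classify(width, golden_width=5, tolerance=1))
```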
Session 2a
Currently, the lack of a compact magnetic tunnel junction (MTJ) model is a truly limiting factor for the design of spintronic circuits. In this paper we present a compact MTJ model written in VHDL-AMS. This behavioral model is based on the Stoner-Wohlfarth model and takes into account most of the important phenomena, such as magnetic coupling, capacitance and magnetization-dependent conductance. The method employed to model a two-layer magnetic tunnel junction is detailed. Applications of this model, such as the simulation of the operation of an MRAM cell and of a magnetometer, are also presented.
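Two ingredients of such a behavioral model can be illustrated in Python. The first is the Stoner-Wohlfarth switching astroid for the free layer. The second is an angle-dependent junction conductance, for which the sketch assumes the common cosine law G(θ) = (G_P + G_AP)/2 + (G_P - G_AP)/2·cos θ. This is a simplified stand-in, not the authors' VHDL-AMS model, and the function names are hypothetical.

```python
import math


def outside_astroid(hx, hy):
    """Stoner-Wohlfarth criterion on reduced fields h = H / H_K.

    Outside the astroid |hx|^(2/3) + |hy|^(2/3) = 1 only one energy minimum
    remains, so a field opposing the free-layer magnetization switches it.
    """
    return abs(hx) ** (2.0 / 3.0) + abs(hy) ** (2.0 / 3.0) > 1.0


def mtj_conductance(theta, g_parallel, g_antiparallel):
    """Conductance versus angle between the two magnetizations (cosine law assumed)."""
    return (0.5 * (g_parallel + g_antiparallel)
            + 0.5 * (g_parallel - g_antiparallel) * math.cos(theta))


def tmr_ratio(g_parallel, g_antiparallel):
    """TMR = (R_AP - R_P) / R_P = G_P / G_AP - 1."""
    return g_parallel / g_antiparallel - 1.0


print(outside_astroid(0.6, 0.6), outside_astroid(0.2, 0.2))  # True False
print(tmr_ratio(2e-3, 1e-3))                                  # 1.0, i.e. 100 %
```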
This contribution presents a methodology, based on VHDL-AMS modeling, for the synthesis and optimization of systems designed with the switched-current (SI) technique. This methodology has been implemented in the Simplorer software environment. It reduces simulation run time and allows SI systems to be characterized at a high level of the hierarchical design methodology.
To develop Systems-on-Chip for imaging, models of photodetectors for APS cells are needed. The most commonly used photodetectors are photodiodes, but APS cells can also be realized with phototransistors. Complete models of vertical and lateral phototransistors are presented here. They are based on a physical approach that leads to an electrical model. These models were implemented in the VHDL-AMS language. Simulations of these structures give the spectral response of these components, in good agreement with the usual results.
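As a first-order reference for the spectral response such models produce, a photodiode's responsivity is R(λ) = η·q·λ/(h·c) in A/W. A phototransistor amplifies the base-collector photocurrent by roughly (β + 1). The Python sketch below assumes a wavelength-independent quantum efficiency η and a constant β. Both are simplifications that the paper's physical models replace, and the numbers are hypothetical.

```python
PLANCK = 6.62607015e-34             # J*s
LIGHT_SPEED = 299_792_458.0         # m/s
ELECTRON_CHARGE = 1.602176634e-19   # C


def photodiode_responsivity(wavelength_m, quantum_efficiency):
    """R = eta * q * lambda / (h * c), in A/W (constant eta assumed)."""
    return quantum_efficiency * ELECTRON_CHARGE * wavelength_m / (PLANCK * LIGHT_SPEED)


def phototransistor_responsivity(wavelength_m, quantum_efficiency, beta):
    """First-order model: junction photocurrent amplified by (beta + 1)."""
    return (beta + 1) * photodiode_responsivity(wavelength_m, quantum_efficiency)


print(f"{photodiode_responsivity(650e-9, 0.6):.3f} A/W")             # about 0.315 A/W
print(f"{phototransistor_responsivity(650e-9, 0.6, 100):.1f} A/W")   # about 31.8 A/W
```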
This work describes an optical Position Sensing Detection (PSD) algorithm. A mixed-signal model of a photodetector cell for electrical simulation has been developed, including the complete dynamic model of a photodiode. It is shown how standard simulators used in electrical environments can be adapted to describe devices included in optical systems. This makes it possible to characterize complex systems with optical and electrical parts in the same environment (Spectre), and to extend the mixed-mode simulation concept beyond purely electrical systems. A system simulation application for Position Sensing Detection (PSD) with a resolution in the micrometer range is reported in the paper.
Session 2b
Fuzzy controllers are used in many applications because they can be designed rapidly by translating heuristic knowledge, they are robust against perturbations and their control action is smooth. However, their direct implementation requires parallel processing and special operators (such as fuzzification or defuzzification) that are not available in standard digital signal processors (DSPs). The novel idea followed in this paper is to translate the fuzzy rule bases of a fuzzy controller into non-fuzzy ones. These can easily be implemented with the relational and logical operators, the standard if-then conditional statements, and the addition and multiplication operators available in a DSP. This is done by using hierarchical structures and adequate membership functions, connective operators and inference methods. The parking problem of an autonomous robot (Figure 1) is used to illustrate this design process. Experimental results (Figure 2) show the efficiency of the designed fuzzy controller embedded in a stand-alone card based on a fixed-point DSP from Texas Instruments.
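A single-input rule base shows why such a translation is possible. Suppose triangular membership functions overlap so that they always sum to one and the rules have constant (zero-order Sugeno) consequents. Then fuzzy inference collapses to an if-then search for the active interval plus one multiply-add. The Python sketch below is a hypothetical illustration of that idea, not the authors' hierarchical controller; the breakpoints and steering values are invented.

```python
def make_controller(breakpoints, consequents):
    """Zero-order Sugeno rule base with triangular MFs forming a partition of
    unity, rewritten as if-then tests plus one multiply-add per evaluation."""
    # Precompute reciprocals so the run-time path needs no division.
    inv_widths = [1.0 / (b - a) for a, b in zip(breakpoints, breakpoints[1:])]

    def control(x):
        if x <= breakpoints[0]:
            return consequents[0]
        if x >= breakpoints[-1]:
            return consequents[-1]
        for k in range(len(inv_widths)):
            if x <= breakpoints[k + 1]:
                mu_right = (x - breakpoints[k]) * inv_widths[k]  # membership of rule k+1
                # mu_left = 1 - mu_right, so the weighted average is a single lerp.
                return consequents[k] + mu_right * (consequents[k + 1] - consequents[k])

    return control


# Hypothetical rule base: heading error (rad) -> steering command (deg).
steer = make_controller([-1.0, 0.0, 1.0], [30.0, 0.0, -30.0])
print(steer(0.5), steer(-2.0))  # -15.0 30.0
```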
Universitat Rovira i Virgili (URV), Department of Electronic, Electrical and Automatic Control Engineering, Av. Països Catalans 26, 43007 Tarragona (phone: 977558522; fax: 977559605; email: ecanto@etse.urv.es).
Fingerprint-based automatic recognition systems are rapidly spreading across a wide range of applications. Most biometric authentication systems are implemented on high-performance computer-based platforms that execute a set of complex algorithms in software. Those solutions cannot be applied to low-cost embedded systems based on microprocessors without a floating-point unit. The use of fingerprint biometrics coprocessors is still a young field, and the great majority of commercial fingerprint OEM modules are based on embedded high-performance 32-bit processors or DSPs. In this article we present a biometrics coprocessor that speeds up the ridge-line-following minutiae extraction algorithm. It covers the minutiae extraction stage, which has the highest computational requirements. In our work we use the Maio-Maltoni ridge-line-following algorithm [1] for three reasons: it permits minutiae extraction directly from the grayscale fingerprint image, it is computationally less expensive than others, and it can be rewritten to avoid floating-point operations. To obtain an efficient hardware implementation of the coprocessor in terms of low cost and high speed, the floating-point computations used in the algorithm have been replaced by fixed-point computations, and other complex functions have also been substituted. A pipelined scheme has also been adopted to reduce the critical path delay and to execute several steps in parallel, increasing the clock frequency and throughput. The steps performed by the coprocessor running at 50 MHz take 14.4 µs, while an ARM7TDMI processor at the same clock speed needed 211 µs on average to execute the same tasks. The overall execution time of the algorithm running on the ARM with the coprocessor is reduced from 700 ms to 215 ms, a reduction of about 70%.
[1] D. Maio and D. Maltoni, "Direct Gray-Scale Minutiae Detection in Fingerprints," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 1, January 1997.
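The abstract does not detail the fixed-point substitutions. As a hypothetical illustration, ridge-line following repeatedly advances a point by a step along a local direction. The Python sketch below does this with integer arithmetic only, using Q-format coordinates with 14 fractional bits and cosine/sine lookup tables indexed by a quantized direction. The table size, the format and the function names are assumptions.

```python
import math

FRAC_BITS = 14          # Q-format fractional bits (assumption)
ONE = 1 << FRAC_BITS
N_DIRECTIONS = 256      # angular resolution of the lookup tables (assumption)
# Tables are built once (offline); run time uses only lookups, integer multiplies and adds.
COS_LUT = [round(math.cos(2 * math.pi * k / N_DIRECTIONS) * ONE) for k in range(N_DIRECTIONS)]
SIN_LUT = [round(math.sin(2 * math.pi * k / N_DIRECTIONS) * ONE) for k in range(N_DIRECTIONS)]


def ridge_step(x_fx, y_fx, direction, step_px):
    """Advance a fixed-point position by step_px pixels along a quantized direction."""
    return (x_fx + step_px * COS_LUT[direction],
            y_fx + step_px * SIN_LUT[direction])


def to_pixel(v_fx):
    """Round a non-negative fixed-point coordinate to the nearest pixel index."""
    return (v_fx + (ONE >> 1)) >> FRAC_BITS


x, y = ridge_step(0, 0, 32, 5)   # 5 pixels at 45 degrees
print(to_pixel(x), to_pixel(y))  # 4 4
```

For reference, the figures quoted above correspond to a 211/14.4 ≈ 14.7x speedup of the accelerated steps and an overall reduction of 1 - 215/700 ≈ 69%.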
The use of biometrics is increasing every day, as security becomes one of the most important concerns in the information society. In biometrics, one of the most promising techniques is iris recognition, which has lower error rates than other biometric techniques. One of the processes in a biometric system is verification, or matching, between the acquired data and a previously stored template. This process can be performed in centralized systems, such as a central database, or in a distributed way, using identification tokens. When developing new identification tokens, computational cost and processing time should be reduced to provide cheaper devices, which would make a commercial system viable. In this paper, the authors develop different implementations of low-cost biometric verifiers to be included in identification tokens. The biometric technique chosen is iris recognition, and therefore the verification technique is based on the Hamming distance.
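The Hamming-distance verification mentioned above is commonly computed as a masked fractional distance: the number of disagreeing bits among bits valid in both iris codes, divided by the number of such bits. The Python sketch below uses integers as bit vectors. The decision threshold of 0.32 and the function names are illustrative assumptions, not values from the paper.

```python
def fractional_hamming_distance(code_a, code_b, mask_a, mask_b):
    """Fraction of disagreeing bits among the bits valid in both codes.
    Codes and masks are Python ints used as bit vectors."""
    valid = mask_a & mask_b
    n_valid = bin(valid).count("1")
    if n_valid == 0:
        raise ValueError("no overlapping valid bits")
    return bin((code_a ^ code_b) & valid).count("1") / n_valid


def verify(code_a, code_b, mask_a, mask_b, threshold=0.32):
    """Accept the identity claim if the distance is below a (hypothetical) threshold."""
    return fractional_hamming_distance(code_a, code_b, mask_a, mask_b) < threshold


print(fractional_hamming_distance(0b1010, 0b1001, 0b1111, 0b1111))  # 0.5
```

On a token, the same computation reduces to XOR, AND and population-count operations, which suits low-cost hardware.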
authentication at very low cost. Biometrics is the only way to perform real user authentication, but it has a high computational cost. In this work we study the integration of fingerprint biometrics into smart cards. Since commercial smart cards use low-performance microprocessors, fingerprint verification may take several seconds, which is unacceptable for practical applications. On the other hand, simply moving to a more powerful processor would carry an important cost penalty and would not solve the performance bottleneck. To speed up fingerprint verification, we identify the most critical operations of an efficient fingerprint matching algorithm: the computation of Euclidean distance, radial angle and element matching. We then propose an extended instruction set that can be implemented with low hardware overhead. With this extension, the total time required for fingerprint matching is reduced by more than 60% and amounts to less than a second for medium-size minutiae sets. An improved instruction for element matching, termed vector matching, is also proposed and provides a larger speedup. Final results using vector matching allow fingerprint verification to be performed in a fraction of a second for large minutiae sets.
This work was supported in part by the Ministerio de Ciencia y Tecnología (Spain) under Project TIC2003-01793.
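The critical operations named above (distance, angle comparison and element matching) can be sketched in integer-only Python. Comparing squared Euclidean distances avoids a square root, and angle differences wrap around 360 degrees. The sketch assumes the two minutiae sets are already aligned, whereas the paper's radial-angle computation is part of that alignment. The tolerances, the greedy pairing and the function names are hypothetical.

```python
def angle_diff(a_deg, b_deg):
    """Smallest absolute difference between two integer angles in degrees."""
    d = abs(a_deg - b_deg) % 360
    return min(d, 360 - d)


def minutiae_match(m1, m2, max_dist_sq, max_angle):
    """Minutiae (x, y, angle) match if close in position and direction.
    Squared distance avoids a square root on a small integer core."""
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    return dx * dx + dy * dy <= max_dist_sq and angle_diff(m1[2], m2[2]) <= max_angle


def count_matches(template, probe, max_dist_sq=64, max_angle=20):
    """Greedy one-to-one pairing of pre-aligned minutiae sets."""
    used = set()
    matches = 0
    for m in probe:
        for j, t in enumerate(template):
            if j not in used and minutiae_match(m, t, max_dist_sq, max_angle):
                used.add(j)
                matches += 1
                break
    return matches


template = [(10, 10, 90), (50, 40, 180)]
probe = [(12, 13, 100), (80, 80, 0)]
print(count_matches(template, probe))  # 1
```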
Hardware implementation of the Bresenham line generation algorithm applied to robot movement
R. Casanova, A. Diéguez, J. Lacort, J. Samitier, Departament d'Electrònica, SIC, Universitat de Barcelona, C/ Martí Franquès 1, E-08028 Barcelona, Spain. Email: casanova@el.ub.es
presented. The circuit is able to generate trapezoidal and sawtooth signals with programmable amplitude, period and phase. These control waveforms are used to drive a bimorph locomotion unit. As the robot has to operate with nanometric resolution, the waveforms must be generated with great precision. They are generated with the Bresenham algorithm so that only integer operations are needed. The circuit has been designed in the 0.35 µm C35B4 technology from AMS.
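A waveform ramp is a line from (0, 0) to (n_steps, amplitude), so Bresenham's error-term update yields one integer sample per clock using only additions and comparisons. The Python sketch below shows a rising ramp for shallow slopes (amplitude at most n_steps) and builds a trapezoid from two ramps and a plateau. It is an illustrative software model, not the authors' circuit, and the function names are hypothetical.

```python
def bresenham_ramp(n_steps, amplitude):
    """Integer-only ramp from 0 to amplitude over n_steps (Bresenham error term).
    Assumes 0 <= amplitude <= n_steps (shallow slope)."""
    if not 0 <= amplitude <= n_steps:
        raise ValueError("requires 0 <= amplitude <= n_steps")
    dx, dy = n_steps, amplitude
    err = 2 * dy - dx
    y = 0
    out = []
    for _ in range(dx + 1):
        out.append(y)
        if err > 0:        # accumulated error exceeds half a step: move up one level
            y += 1
            err -= 2 * dx
        err += 2 * dy
    return out


def trapezoid(rise, hold, fall, amplitude):
    """Rising ramp, plateau, then falling ramp (rise + hold + fall + 1 samples)."""
    up = bresenham_ramp(rise, amplitude)
    down = bresenham_ramp(fall, amplitude)[::-1]
    return up + [amplitude] * hold + down[1:]


print(bresenham_ramp(4, 2))     # [0, 0, 1, 1, 2]
print(trapezoid(2, 1, 2, 2))    # [0, 1, 2, 2, 1, 0]
```

A sawtooth is obtained by repeating the ramp. A phase offset corresponds to starting the sequence at a different index.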
Session 2c
Industrial Applications
Wednesday, Nov. 24, 10h30-11h45, Auditorium
Chairs: Franck Badets (STMicroelectronics), Daniel Auvergne (L.I.R.M.Montpel
Dual-port serial arbiter with GSM modules for simultaneous local/remote control of RS-232-based devices
Eloi Ramon(1) and Lluís Ribas(2), Member, IEEE
(1) Electronics and (2) Computer Science Departments, Universitat Autònoma de Barcelona (UAB), Cerdanyola, Spain, {Eloi.Ramon, Lluis.Ribas}@UAB.es
Serial communications are extensively used within the electronics industry due to their relative simplicity and low hardware overhead. One of the most popular serial communication standards in use is certainly RS-232. Many devices, at home and especially in industrial applications, can be controlled through RS-232 ports. Remote control and/or monitoring lets users access these devices from anywhere. In particular, providers of such devices or related services can remotely monitor their functionality and yield, and act accordingly. SMS (Short Message Service) was introduced as a GSM (Global System for Mobile Communications) service in 1992. In recent years, SMS has been widely used for remote control due to its ubiquity, area coverage, cost and ease of use. Despite the growing number of GSM/GPRS modules on the market, most industrial applications still use SMS as a communication protocol. In this paper we present a module that both extends the serial communication port to other devices or protocol adapters and allows remote control by SMS messaging. In order not to interfere with local applications, the RS-232 device ports should be kept connected to the local application hosts. Consequently, a module is needed that, besides having two serial ports, can transmit information wirelessly. Unfortunately, RS-232 is a point-to-point protocol, so introducing a third-party port necessarily requires an arbiter that resolves conflicts and sets up the appropriate connections. The arbiter has been implemented on a Sony Ericsson GR47 module, together with its application-specific queue reading/writing functions. The application has been used in an industrial appliance designed for MAM Electrónica to remotely control UPS systems in order to monitor charger status, battery status and condition, utility status, et cetera.
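The arbitration problem can be modelled with two request queues and an ownership flag. Because RS-232 is point-to-point, the device is granted to one source (local host or remote SMS side) at a time and released when that transaction completes. The Python sketch below is a hypothetical model, not the GR47 application code. The fixed priority given to the local host is an assumption, since the abstract does not state the policy.

```python
from collections import deque


class SerialArbiter:
    """Grants a single RS-232 device to one source at a time (illustrative model)."""

    PRIORITY = ("local", "remote")  # local host served first (assumption)

    def __init__(self):
        self.queues = {port: deque() for port in self.PRIORITY}
        self.owner = None

    def request(self, port, command):
        """Queue a command coming from the local host or the remote (SMS) side."""
        self.queues[port].append(command)

    def grant_next(self):
        """Connect the device to the next waiting source, if the device is free."""
        if self.owner is not None:
            return None
        for port in self.PRIORITY:
            if self.queues[port]:
                self.owner = port
                return port, self.queues[port].popleft()
        return None

    def release(self):
        """Called when the current transaction (e.g. the device reply) completes."""
        self.owner = None


arbiter = SerialArbiter()
arbiter.request("remote", "STATUS?")
arbiter.request("local", "BATTERY?")
print(arbiter.grant_next())  # ('local', 'BATTERY?')
```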
This article describes the design and implementation of a complete signal acquisition system intended for use in telemetry systems for river tugboats. Specifically, the design aspects of the signal processing unit are covered. A complete prototype was built, installed and put into operation in a tugboat belonging to a fluvial transportation company in Northern Colombia (Flota Fluvial Carbonera Ltda). System functionality was mainly synthesized on an Altera FPGA, taking great advantage of rapid prototyping. Field measurements are reported, with emphasis on the fuel volume calculation, which proved to be more accurate than the previous method. As an industrial application, this system represents an innovation for the company's operation. Specifically, a method to estimate fuel level and volume is proposed. This method requires neither 3D angular sensors nor complex calculations to compensate the measurements for swell or navigation conditions. A single sensor was installed in each fuel tank, giving a smaller difference from the actual dumped fuel than the previous method. The telemetry system is composed of hardware and software components. The hardware component is an acquisition and processing unit. The software component is a network management application, written entirely in C++ for this telemetry application and made up of three main software modules. Processed data are transmitted to the company's network operation center through a satellite link.
Figure 1. Stand-alone enclosure with the acquisition and signal processing hardware
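The abstract does not describe how a single level sensor per tank is turned into a fuel volume. One simple approach, sketched below in Python, averages recent level readings to smooth swell-induced ripple and then converts the level to volume by linear interpolation in a tank calibration table. This is a hypothetical illustration, not the authors' method, and the table values are invented.

```python
from bisect import bisect_right


def moving_average(samples, window):
    """Average of the last `window` level readings, smoothing swell-induced ripple."""
    tail = samples[-window:]
    return sum(tail) / len(tail)


def level_to_volume(level, table):
    """Linear interpolation in a calibration table [(level, volume), ...] sorted by level."""
    levels = [lv for lv, _ in table]
    if level <= levels[0]:
        return table[0][1]
    if level >= levels[-1]:
        return table[-1][1]
    i = bisect_right(levels, level)
    (l0, v0), (l1, v1) = table[i - 1], table[i]
    return v0 + (v1 - v0) * (level - l0) / (l1 - l0)


tank_table = [(0.0, 0.0), (0.5, 1000.0), (1.0, 2500.0)]  # hypothetical (m, litres)
readings = [0.70, 0.80, 0.72, 0.78]                      # hypothetical level samples (m)
print(level_to_volume(moving_average(readings, 4), tank_table))
```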
This paper describes a new kind of sensorless, electronically controlled horn for automobiles. The main benefits of this new horn over its classical electromechanical counterparts are a much longer lifetime, a lower level of generated electromagnetic interference (EMI), better behavior under aging and stress, the elimination of the adjustment operation during manufacturing, and multifunction capabilities. To fulfil all these targets, we have replaced the electromechanical breaker of previous designs with a solid-state switch controlled by a microprocessor on board the horn. To detect the resonance condition of the horn, we use a novel technique based on the analysis of the coil current (as seen in the figure), avoiding the need for any sound-level, position or motion sensor. The lifetime of these new horns is at least forty times that of previous breaker horns, and their generated EMI is much lower than that of previous electromechanical designs. These new horns can fulfil the requirements of both the 95/54/EC directive and the CISPR 25 standard, whereas the breaker horns cannot. As the horn is now self-adjusting, it behaves better under aging and stress, since it can compensate for these factors dynamically. This self-adjusting operation also eliminated the expensive trimming operation during fabrication. We have also developed a multifunction horn capable of producing different kinds of sound at the command of the vehicle's electronic control unit. Both kinds of horns were assembled using the very same machinery as previous breaker horns and serve as perfect spare parts for them.
Design of Low-Power CMOS Read-Out ICs for Large Arrays Cryogenic Infra-Red Sensors
B. Misischi(1), F. Serra-Graells(1), E. Casanueva(2), C. Méndez(2) and L. Terés(1)
bertrand.misischi@cnm.es, paco.serra@cnm.es, ecasanueva@indra.es, cmendez@indra.es and lluis.teres@cnm.es
(1) Institut de Microelectrònica de Barcelona, CNM, CSIC (Spain)
(2) Indra Sistemas S.A. (Spain)
This paper describes a complete design methodology for a low-power cryogenic read-out integrated circuit (ROIC) for large arrays of infrared (IR) detectors. The presented methodology includes IR sensor modeling, MOSFET modeling at cryogenic temperature, circuit design, physical verification strategies and the system-on-chip realization. Novel low-power, compact CMOS circuits are also proposed to implement all the required basic building blocks, from the active pixel sensor (APS) to the composition of the output video signal. The resulting high-performance 50012 array and 60 ns/pixel system-on-chip, capable of capturing high-resolution, real-time infrared images such as 640×500 @ 100 fps, has been designed for a standard 0.35 µm CMOS technology from AMS.
Figure 1: Simplified schematic (left) and layout including bumping pads (right) of the APS cell, and landscape view of the complete ROIC layout including bonding pads (bottom).
Since their publication in 1998, power analysis attacks have attracted significant attention within the cryptographic community. So far, they have been successfully applied to different kinds of implementations (e.g., smart cards, ASICs, FPGAs) of cryptographic algorithms. To protect such devices against power analysis attacks, it has been proposed to use a dynamic and differential logic style whose power consumption does not depend on the data handled. In this paper, we suggest using Dynamic Current Mode Logic to counteract power analysis. The resulting circuits exhibit resistance similar to that of previously published proposals but significantly reduce the power-delay product. We also demonstrate that certain criteria previously used to evaluate resistance against power analysis have no cryptographic relevance.
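A toggle count, a common first-order proxy for dynamic power, shows why a dynamic differential logic style hides the data. The minimal Python sketch below compares a single-rail register, whose toggles equal the Hamming distance between successive values, with a dual-rail encoding that is precharged to zero every cycle: exactly one rail per bit rises in each evaluation, so every transition count is the same. It models generic precharged dual-rail logic rather than the current-mode circuits of the paper, and the function names are hypothetical.

```python
def transitions(seq):
    """Number of wire toggles between consecutive wire vectors."""
    return [sum(x != y for x, y in zip(a, b)) for a, b in zip(seq, seq[1:])]


def single_rail_trace(values, n_bits):
    """Toggles of a plain register: data dependent (Hamming distance)."""
    seq = [[(v >> k) & 1 for k in range(n_bits)] for v in values]
    return transitions(seq)


def encode_dual_rail(value, n_bits):
    """Each bit b becomes the rail pair (b, not b)."""
    wires = []
    for k in range(n_bits):
        b = (value >> k) & 1
        wires += [b, 1 - b]
    return wires


def dual_rail_trace(values, n_bits):
    """Toggles when every evaluation is preceded by a precharge of all rails to 0."""
    precharge = [0] * (2 * n_bits)
    seq = []
    for v in values:
        seq += [precharge, encode_dual_rail(v, n_bits)]
    return transitions(seq)


data = [0b0000, 0b1111, 0b0101]
print(single_rail_trace(data, 4))  # [4, 2]: leaks the data
print(dual_rail_trace(data, 4))    # [4, 4, 4, 4, 4]: constant
```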
Session 2d
Gildas Leger, Adoración Rueda, Instituto de Microelectrónica de Sevilla (IMSE-CNM), Universidad de Sevilla, Edificio CICA, c/ Tarifa s/n, 41012 Sevilla, Spain. Email: leger@imse.cnm.es, tel: +34 95 505 66 66
First- and second-order sigma-delta modulators are commonly used in cascaded structures to achieve high-resolution analog-to-digital converters. As these modulators gain more and more resolution, they become harder to test. Embedded test solutions and Built-In Self-Test (BIST) techniques face important issues in ensuring test stimulus precision and test data acquisition, which makes functional test a tricky path to follow. For almost all systems, and for ΣΔ modulators in particular, designers use a behavioural model that quantifies the principal non-idealities to settle high-level design specifications and run high-level simulations. Designers know that to reach a given precision they have to guarantee a number of parameters such as amplifier DC gain, amplifier slew rate and bandwidth, capacitor switching noise level, integrator output range, etc. Hence, from a test viewpoint, it can be assumed that the principal causes of unexpected performance degradation are related to these high-level design specifications. In other words, if a modulator is not performing as expected, some high-level parameter has likely drifted out of specification. It is thus of utmost interest to diagnose and measure these parametric faults. This paper presents a simple, fully digital test technique able to evaluate the amplifier settling error in first- and second-order ΣΔ modulators. These settling errors are related to the amplifier gain-bandwidth product and slew rate, which belong to the above-mentioned set of high-level design specifications, and they are known to be a source of non-linearity. The realistic simulations presented in this paper agree well with theory and show very promising results, as the integrator settling error can be determined with good precision. It is also shown that the settling error can be related directly to a precision loss, which enables a functional interpretation of the test signature.
IXL UMR CNRS 5818, ENSEIRB, Université Bordeaux 1, 351 cours de la libération, 33405 Talence CEDEX, France. Telephone: +33 5 4000 6540, Email: [jridi,dallet]@ixl.fr. Institut Supérieur des Arts du Multimédia de Manouba, Campus universitaire Manouba, 2010 Tunisie. Telephone: +216 71 602 050, Email: chiheb.rebai@voila.fr. STMicroelectronics, 850, rue Jean Monnet, F-38926 Crolles Cedex. Telephone: +33 4 76 92 50 26, Email: sylvain.engels@st.com. STMicroelectronics, 12, rue Jules Horowitz, BP217, 38019 Grenoble. Telephone: +33 4 76 58 62 54, Email: laurent.dugoujon@st.com
This work is set in the context of Analog-to-Digital Converter testing, where a precise sinusoidal source is needed. A promising solution to this challenge, a digital Lossless Discrete Integrator (LDI) resonator combined with a 1-bit Delta-Sigma modulator, has already been presented together with simulation results. This article proposes a new methodology that moves towards an ASIC using the same parameters as the FPGA implementation. As summarized later, the digital source is built from digital hardware: registers, adders, multiplexers and shifters. The work is divided into three parts. This paper first reviews the fundamentals of Delta-Sigma-modulator-based signal generation (Section II): first the digital resonator and its drawbacks are discussed; then the principle of the Sigma-Delta attenuator is detailed, in which the multiplier is emulated by a Sigma-Delta modulator and a multiplexer. This oscillator has to generate a precise 1-bit signal to test the ADC. To improve the SNR, the modulator order must be increased, and stability problems can appear when the modulator quantizer has only one bit; Schreier has proposed an empirical methodology to address this. Section III briefly outlines the FPGA hardware implementation. Simulation results are presented to validate the choice of the LDI structure and of the simulation parameters. The main difficulty in this part lies in the data flow, since fixed-point precision is used. Section IV describes the circuit, and the layout is shown at the end of the paper.
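The LDI resonator referred to above can be sketched as two integrators in a loop. This is a floating-point illustration assuming the textbook LDI oscillator, not the authors' fixed-point datapath; `ldi_oscillator` and its parameters are hypothetical.

```python
import numpy as np

def ldi_oscillator(k, n_samples, x0=1.0):
    """Digital LDI resonator: two integrators in a loop, no input.

    The state matrix [[1, -k], [k, 1 - k**2]] has determinant 1, so for
    0 < k < 2 the loop is lossless and oscillates at
    omega = 2*asin(k/2) rad/sample.
    """
    s1, s2 = x0, 0.0                 # non-zero initial state starts the oscillation
    out = np.empty(n_samples)
    for n in range(n_samples):
        s1 = s1 - k * s2             # first integrator
        s2 = s2 + k * s1             # second integrator uses the *new* s1 (LDI)
        out[n] = s2
    return out

f_ratio = 1 / 64                     # desired f0/fs (illustrative)
k = 2 * np.sin(np.pi * f_ratio)      # inverts omega = 2*asin(k/2)
y = ldi_oscillator(k, 4096)
peak_bin = np.argmax(np.abs(np.fft.rfft(y)))   # expected at 4096 * f_ratio
```

The determinant of the loop matrix is 1 for any k, so quantizing k (for instance to a few signed powers of two, so that the products become shifts) moves the frequency without adding damping; in fixed point, only the rounding of the products perturbs this.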
Optimal implementation of linear and adaptive filter bank for ADC characterization
¹ MEDIATRON Laboratory, Ecole Supérieure de Communication (SUPCOM), 2088 Cité Technologique des Communications, Tunis, TUNISIE
Laboratoire IXL UMR CNRS 5818, ENSEIRB, Université Bordeaux 1, 351 cours de la libération, 33405 Talence CEDEX, France, tel: 33 5 4000 2632, dallet@ixl.fr
IN this paper we present a linear and an adaptive filter bank employed to decompose the signal into its main spectral components in the field of ADC characterization. Both structures are based on the band-pass LDI (Lossless Discrete Integrator) filter, which is known for its efficient implementation. Using real acquisitions, we show the efficiency of the linear structure in estimating the main spectral parameters. Nevertheless, if the input frequency is not the expected one, an adaptive structure has to be used to track the fundamental component. In this way, the spectral parameters related to the ADC performance can be estimated. These two architectures were simulated in floating-point precision. This paper shows the design considerations to take into account for their implementation in fixed-point precision: the data flow for the linear structure and the latency problem for the adaptive one.
Figure: filter structure with input Xin, output Xout, delays z⁻¹, coefficients k1 and k2, a factor 1/2, and a block H(q,n) with output y'(n).
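A band-pass section of the kind used in such filter banks can be sketched with two integrators, one fed by the previous state and one by the current summing node, as in LDI structures. The sketch uses the well-known Chamberlin state-variable form as a stand-in; it is not claimed to be the authors' exact structure (their coefficients k1 and k2 are not reproduced), and all names are hypothetical. A linear filter bank would instantiate one such section per spectral component of interest.

```python
import numpy as np

def ldi_bandpass(x, f0_over_fs, q):
    """Two-integrator state-variable band-pass section (Chamberlin form).

    f0_over_fs: centre frequency divided by the sampling frequency.
    q: damping factor (1/Q); a smaller q gives a narrower pass band.
    """
    f = 2.0 * np.sin(np.pi * f0_over_fs)   # integrator coefficient
    low = band = 0.0
    y = np.empty(len(x))
    for n, xn in enumerate(x):
        low += f * band                    # integrator fed by the previous band-pass state
        high = xn - low - q * band         # summing node
        band += f * high                   # integrator fed by the current summing node
        y[n] = band
    return y

n = np.arange(8192)
f0 = 1 / 32                                # hypothetical tone at fs/32
on_tone = ldi_bandpass(np.sin(2 * np.pi * f0 * n), f0, q=0.1)[4096:]
off_tone = ldi_bandpass(np.sin(2 * np.pi * 4 * f0 * n), f0, q=0.1)[4096:]
```

At the centre frequency this section's gain is 1/q (10 here), while a tone at four times f0 is attenuated to roughly a quarter, so the output isolates the component the section is tuned to.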
THIS paper presents some guidelines for the design of an on-chip analyzer for extracting, in the digital domain, the main characteristic parameters of an analog sine-wave signal. The analyzer, reported elsewhere¹, is based on a double modulation (square-wave and sigma-delta) together with a simple digital processing algorithm. We discuss the specifications required for the analog part of the analyzer and describe an area-efficient implementation of the digital part. In addition, we show simulation results which demonstrate the validity of the proposed guidelines, while the simplicity and robustness of the circuitry make it very suitable for BIST applications.
¹ D. Vázquez, G. Huertas, G. Leger, A. Rueda, and J. L. Huertas, "A Method for Parameter Extraction of Analog Sine-Wave Signals for Mixed-Signal Built-In-Self-Test Applications," Design, Automation and Test in Europe (DATE'04), Paris, France, Feb. 2004.
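The square-wave part of the double modulation can be illustrated numerically. The sketch multiplies the sampled sine wave by in-phase and quadrature ±1 square waves and averages the products; since the fundamental of a ±1 square wave has amplitude 4/π, the averages are (2A/π)cos φ and -(2A/π)sin φ. It is an idealized sketch under our own assumptions: the sigma-delta stage of the cited analyzer is omitted, the record is assumed to span whole periods, and odd harmonics of the input (weighted 1/3, 1/5, ...) would leak into the estimate. The function name is hypothetical.

```python
import numpy as np

def square_wave_demod(x, samples_per_period):
    """Estimate A and phi of x[n] = A*cos(2*pi*n/P + phi) by multiplying with
    in-phase and quadrature +/-1 square waves and averaging the products."""
    n = np.arange(len(x))
    w = 2 * np.pi * n / samples_per_period
    sq_i = np.where(np.cos(w) >= 0, 1.0, -1.0)   # in-phase square wave
    sq_q = np.where(np.sin(w) >= 0, 1.0, -1.0)   # quadrature square wave
    i_avg = np.mean(x * sq_i)                    # ideally (2A/pi) cos(phi)
    q_avg = np.mean(x * sq_q)                    # ideally -(2A/pi) sin(phi)
    amplitude = np.pi / 2 * np.hypot(i_avg, q_avg)
    phase = np.arctan2(-q_avg, i_avg)
    return amplitude, phase

P = 1000                                         # samples per period (illustrative)
n = np.arange(50 * P)                            # 50 whole periods
x = 0.5 * np.cos(2 * np.pi * n / P + 0.3)
amp, phase = square_wave_demod(x, P)
```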
Session 3a
Implementing the FFT Algorithm on FPGA Platforms: A Comparative Study of Parallel Architectures*
M. A. Sánchez¹, M. Garrido¹, M. López-Vallejo¹ and J. Grajal²
jesus@gmr.ssr.upm.es
IN this paper we present an in-depth analysis of the implementation of different FFT architectures on FPGA platforms. The target applications are radar processing systems and wideband digital receivers, which impose hard constraints on processing speed. Thus, parallel pipelined FFT architectures have to be used. In particular, feedback and feedforward architectures are analyzed in detail, studying how the results vary with a set of key design parameters: radix, word length, number of points and the effect of truncation. Additionally, the impact of implementation on programmable devices is considered when designing and analyzing the different architectures. The two structures provide very different results in terms of area and performance, which suits them to different applications: feedback structures can be used for long N-point FFTs because of their small area, while feedforward architectures are better suited to applications with hard real-time constraints because of their higher speed. Figure 1 depicts the area and speed results obtained for different feedback (FB) and feedforward (FF) implementations.
Figure 1. Area and performance results of feedback (FB) and feedforward (FF) architectures for 16, 64, 256 and 1024 points.
* This work was supported by the Spanish Ministry of Science and Technology under contract TIC2003-07036.
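The word-length and truncation effects studied here can be explored in software before committing to an architecture. The sketch below is an architecture-independent, recursive radix-2 FFT (not a feedback or feedforward pipeline) that divides each stage by 2, a common way to prevent word growth, and then truncates to a hypothetical number of fractional bits; twiddle factors are kept exact. All names are illustrative.

```python
import numpy as np

def truncate(z, frac_bits):
    """Truncate (toward -inf) real and imaginary parts to frac_bits fractional bits."""
    if frac_bits is None:
        return z
    scale = 2.0 ** frac_bits
    return (np.floor(z.real * scale) + 1j * np.floor(z.imag * scale)) / scale

def fft_radix2(x, frac_bits=None):
    """Recursive radix-2 decimation-in-time FFT returning FFT(x) / len(x).

    Every butterfly stage is scaled by 1/2 and then truncated to frac_bits.
    len(x) must be a power of two.
    """
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x
    even = fft_radix2(x[0::2], frac_bits)
    odd = fft_radix2(x[1::2], frac_bits)
    t = np.exp(-2j * np.pi * np.arange(n // 2) / n) * odd   # twiddled odd half
    return truncate(np.concatenate([even + t, even - t]) / 2, frac_bits)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 64)
ref = np.fft.fft(x)
# worst-case error, rescaled by N = 64, for a few internal word lengths
err = {b: np.max(np.abs(fft_radix2(x, b) * 64 - ref)) for b in (8, 12, 16)}
```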
Humanitarian demining has become an important issue in regions where an armed conflict has occurred. Small plastic mines cannot be detected using classical techniques such as metal detectors, because their metal content is null or very low. The use of infrared images of the soil is an efficient technique to detect this kind of mine¹. This approach is based on thermal modeling of the heat transfer processes in the soil and at the soil-air interface. The model is used to characterize the thermal response of the soil to a given stimulus, also known as the thermal signature. Perturbations of the expected signature are reliable indicators of the presence of mines, owing to the different thermal properties of the soil and the mine. Mine detection is divided into two steps. In the first step, the real data are compared with the data obtained by simulating the thermal model under the assumption that no mines are present; the differences between the real and simulated data indicate the presence of unexpected objects. The second step is an inverse problem, in which the thermal model must be run for multiple soil configurations, representing different possible burial depths and different types of targets (mine, stone, ...). The configuration closest to the real data gives the estimated position and nature of the unexpected targets. Many soil configurations (nature and position of the object) must be run to determine the position and nature of the unexpected patterns with high precision, which makes this detection scheme very time-consuming on a personal computer. In this work, the architecture of a system that simulates the thermal model is mapped onto an FPGA in order to reduce the computing time. The system consists of four memory banks, a processing element, a unit that generates the addresses to be read from and written to memory, and an element that generates the required control signals. The pipelined structure of the design allows several nodes to be updated in parallel.
In the current implementation, the computing time is reduced by a factor of 15.
¹ P. López, "Detection of Landmines from Measured Infrared Images using Thermal Modeling of the Soil," PhD thesis, Univ. Santiago de Compostela, Spain, April 2003.
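The kind of node update such a thermal simulator performs can be sketched with an explicit finite-difference step of the 2-D heat equation. This is a generic illustration under simplifying assumptions (uniform diffusivity, fixed-temperature borders, no soil-air exchange and no buried target), not the thermal model of the cited thesis; `heat_step` and `r` are hypothetical names. Because every new node value depends only on the previous time step, all interior nodes can be updated in parallel, which is what a pipelined FPGA datapath can exploit.

```python
import numpy as np

def heat_step(T, r):
    """One explicit (FTCS) time step of the 2-D heat equation.

    r = alpha*dt/dx**2 (uniform diffusivity alpha assumed); the scheme is
    stable for r <= 0.25. Border nodes are held fixed (Dirichlet boundary).
    """
    Tn = T.copy()
    Tn[1:-1, 1:-1] = T[1:-1, 1:-1] + r * (
        T[2:, 1:-1] + T[:-2, 1:-1] + T[1:-1, 2:] + T[1:-1, :-2]
        - 4.0 * T[1:-1, 1:-1])          # discrete Laplacian of the old field
    return Tn

T = np.zeros((21, 21))
T[10, 10] = 1.0                          # local heat pulse
T1 = heat_step(T, r=0.2)
```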
This paper deals with the comparison of two FFT/IFFT core implementations on the Altera Stratix device.
As FPGAs are particularly well suited to high-speed and regular functions, they can perform DSP functions, meeting both the need for flexibility and the need for high performance. DSP functions such as the FFT, FIR filters and the DCT are iterative and need deep pipelining as well as spatial and temporal parallelism. FPGA architectures must therefore be optimized to increase core performance, for example by including hard MAC (Multiply-Accumulate) blocks; using such hard-wired blocks, DSP functions run faster. In the first part of this paper, the DSP board under study is presented: a Stratix device connected to SRAM and to DAC and ADC converters.
Multi-Carrier modulation/demodulation (MC). In fact, MC is easily carried out in the digital domain by performing IFFT and FFT operations. In the receiver, after the direct FFT, the received sequence is "equalized" in the frequency domain. Nowadays, MC combined with spread spectrum is a high-potential candidate for the air interface of 4G cellular networks.
The two IFFT/FFT IP cores are presented and compared. The Altera FFT MegaCore function is a parameterizable IP core. It uses an in-place mixed radix-4/radix-2 decimation-in-frequency architecture and implements any transform length that is a power of 2. The Jaguar II is a variable-length FFT/IFFT core of up to 1024 points. Available as a soft core, it is parameterized to allow up to 32 bits of resolution (32 in-phase / 32 quadrature).
The two cores have been implemented taking into account the specific architecture of the Stratix device. The results show that the Altera FFT core reaches a higher clock frequency than the Jaguar core and uses fewer resources (both DSP blocks and memories). However, the Jaguar core requires fewer cycles than the Altera core: a 1024-point FFT with a 45 MHz system clock is performed in only 29 µs with the Jaguar core, compared with 130 µs with the Altera core, a ratio of more than 4. Because it needs fewer cycles, the Jaguar core can meet a given latency with a lower system clock, which should result in lower power consumption.
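The roles of the IFFT and FFT described above can be shown with a small multicarrier (OFDM-style) round trip. The sketch assumes a cyclic prefix at least as long as the channel memory and a known channel without spectral nulls, and uses one-tap zero-forcing equalization; these choices and all names are illustrative, not taken from the paper.

```python
import numpy as np

def ofdm_link(symbols, channel, cp_len):
    """IFFT modulation, cyclic prefix, multipath channel, FFT demodulation
    and one-tap frequency-domain (zero-forcing) equalization.
    Assumes len(channel) - 1 <= cp_len and a known channel with no nulls."""
    n = len(symbols)
    tx = np.fft.ifft(symbols)                        # modulate N subcarriers
    tx_cp = np.concatenate([tx[-cp_len:], tx])       # prepend cyclic prefix
    rx = np.convolve(tx_cp, channel)[: n + cp_len]   # multipath propagation
    Y = np.fft.fft(rx[cp_len:])                      # drop prefix, demodulate
    H = np.fft.fft(channel, n)                       # channel frequency response
    return Y / H                                     # one complex gain per subcarrier

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=(64, 2))
qpsk = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
rx = ofdm_link(qpsk, channel=np.array([1.0, 0.5, 0.2]), cp_len=16)
```

The cyclic prefix turns the linear channel convolution into a circular one over each symbol, so after the FFT every subcarrier is only scaled by one complex channel gain; this is why equalization reduces to one division per subcarrier.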
THIS paper presents a comparison of two possible approaches for the efficient implementation of […] Programmable Logic Devices (FPLDs). The first implementation uses a data path based on the traditionally used redundant Carry-Save Adders (CSA); the second exploits standard Carry-Propagate Adders (CPA) with fast carry-chain logic, not yet used in fully scalable designs. Both implementations use the large embedded memory blocks available in recent FPLDs. Speed and logic requirements are compared on the optimized designs. The issues of targeting a design specifically to an FPLD are considered, taking into account the underlying architecture imposed by the target FPLD technology. It is shown that the carry-save adder is not an optimal building block for a constrained scalable MM (Montgomery multiplication) coprocessor in modern Altera FPLDs. The proposed implementation method can also be applied to FPLDs from other vendors, since it uses building blocks generally available in modern FPLDs: high-speed dual-port embedded memories and fast carry-propagate logic.
Figure 1. Block diagrams of the analyzed CSA- and CPA-based processing elements.
¹ A. F. Tenca, Ç. K. Koç, "A scalable architecture for modular multiplication based on Montgomery's algorithm," IEEE Transactions on Computers, vol. 52, no. 9, pp. 1215-1221, Sept. 2003.
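For readers unfamiliar with the operation being accelerated, here is a minimal bit-serial (radix-2) Montgomery multiplication. The scalable architecture of Tenca and Koç processes Y and M in w-bit words with a pipeline of processing elements; this sketch instead works on whole Python integers, so it only shows the arithmetic each step performs (multiplier bit x_i and quotient bit q_i). The function name is hypothetical.

```python
def montgomery_multiply(x, y, m, n_bits):
    """Radix-2 Montgomery multiplication.

    Returns x * y * 2**(-n_bits) mod m, for odd m and 0 <= x, y < m < 2**n_bits.
    """
    s = 0
    for i in range(n_bits):
        x_i = (x >> i) & 1            # scan the multiplier LSB first
        s += x_i * y
        q_i = s & 1                   # choose q_i so that s + q_i*m is even
        s = (s + q_i * m) >> 1        # exact division by 2
    if s >= m:                        # invariant s < 2m: one subtraction suffices
        s -= m
    return s

m, n_bits = 97, 7                     # toy odd modulus, 2**7 > 97
product = montgomery_multiply(45, 76, m, n_bits)
```

When x, y < m, each iteration keeps s below 2m, so a single conditional subtraction at the end yields a fully reduced result; the divisions by 2 are what replace the costly trial division of ordinary modular multiplication.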
An implementation of a Parallel Architecture for the Self-Sorting FFT Algorithm applied to IEEE 802.11a
Ainhoa Cortés*, Igone Vélez*, Pilar Calvo*, Juan F. Sevillano and Andoni Irizar. * CEIT Research Center, Department of Electronics and Communications, Spain. Universidad de Navarra, Department of Electrical and Electronic Engineering, Spain.
IN this paper we present an implementation of a parallel architecture for the Self-Sorting (SS) Fast Fourier Transform algorithm that optimizes the processing rate for the IEEE 802.11a standard. Two structures have been developed for the radix-2 butterfly to improve the architecture. To analyze how the FFT depends on the bit width of the input data and of the twiddle factors, the SNR of our module has been studied. The resulting design is parameterizable, regular and modular, with constant geometry. The total processing time required is (2nN ) (rQ) log r N for a number of points N = r^n, where r is the radix and n is the number of stages needed to process the FFT, computed using Q = r^u processors. The SS algorithm was implemented on a column of processing elements (PEs). The data flow between PEs, using eight processors in parallel to execute a radix-2 FFT, is shown in Figure 1. Table I compares the processing time of our design with other architectures for different clock frequencies. As IEEE 802.11a requires a processing time of 4 µs, the parallel architecture presented here fulfils the timing specification.
Table I. Comparison with other architectures

Architecture       CLK (MHz)   Processing time (µs)
Fast64¹            50          2.82
64Xilinx²          50          3.84
Cobra³             40          5.55
64Point⁴           40          3.2
64Point            100         1.3
Parallel SS        50          2.22
Parallel SS        40          2.775
Parallel SS        100         1.11

Figure 1. Data flow between the processing elements PE0 to PE7 (each with a ROM), the input FIFO (TOP_FIFO_INPUT), the output FIFO (TOP_FIFO_OUT) and the control stage.
¹ L. Fanucci, M. Forliti, and P. Terreni, "Fast: FFT ASIC automated synthesis," INTEGRATION, the VLSI journal, vol. 33, pp. 230-234, 2000.
² Xilinx Product Specification: High-Performance 64-point complex FFT/IFFT V1.0.5.
³ T. Chen and L. Zhu, "COBRA: A 100 MOPS single-chip programmable and expandable FFT," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 7, no. 2, pp. 174-182, 1999.
⁴ Tiong Jiu Ding, John V. McCanny and Yi Hu, "Rapid Design of Application Specific FFT Cores," IEEE Trans. on Signal Processing, vol. 47, no. 5, pp. 1371-1381, May 1999.
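The self-sorting property (output in natural order without a bit-reversal pass) can be illustrated with the Stockham autosort formulation, one well-known self-sorting radix-2 FFT. This is a sequential software sketch, not the authors' parallel PE column, and it does not attempt to reproduce their constant-geometry data flow; the function name is hypothetical.

```python
import numpy as np

def stockham_fft(x):
    """Radix-2 Stockham autosort FFT (len(x) must be a power of two).

    Each stage reads one buffer and writes the other, so the result comes
    out in natural order with no separate bit-reversal permutation.
    """
    a = np.array(x, dtype=complex)        # ping buffer (copy of the input)
    b = np.empty_like(a)                  # pong buffer
    n = len(a)
    half, stride = n // 2, 1
    while half >= 1:
        for j in range(half):
            w = np.exp(-1j * np.pi * j / half)              # twiddle factor
            for k in range(stride):
                c0 = a[k + j * stride]
                c1 = a[k + j * stride + half * stride]
                b[k + 2 * j * stride] = c0 + c1             # butterfly sum
                b[k + 2 * j * stride + stride] = w * (c0 - c1)  # twiddled difference
        a, b = b, a                                         # swap buffers
        half //= 2
        stride *= 2
    return a

rng = np.random.default_rng(2)
z = rng.normal(size=64) + 1j * rng.normal(size=64)   # 64 points, as in 802.11a
Z = stockham_fft(z)
```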
TRIGONOMETRIC functions are used in many fields, such as digital signal processing, image processing and the simulation of physical phenomena. An initial range reduction is required to compute forward trigonometric functions when the input angle is too large. The most usual method for range reduction involves two consecutive multiplications: the first yields a scaled version of the reduced input angle, and the second computes the correct value of the reduced input argument. The CORDIC algorithm is a well-known method for computing trigonometric functions. For sine and cosine computation, a vector (1, 1/k) is rotated by the input angle using iterative rotations over a fixed set of elementary angles, which are stored in a look-up table. In this paper, a new range reduction technique optimized for the CORDIC algorithm is proposed. To operate directly on the scaled version of the reduced input angle, the elementary angles are scaled by the same factor before being stored in the look-up table, so that the second multiplication is avoided. Designs based on our proposal require a classical CORDIC module whose table contains scaled elementary angles. Two basic implementation alternatives are considered: word-serial and pipelined. Both alternatives have been implemented on an FPGA to verify the improvement obtained with our proposal. For the word-serial implementation, the experimental results show a speedup of about 32% at similar hardware cost. For the pipelined case, the classic approach requires about 32% more CLBs, with similar cycle time and larger latency.
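The proposed range reduction can be sketched in floating point: the input is multiplied once by 2/π, the nearest integer selects the quadrant, and CORDIC then runs directly on the scaled fractional part because the elementary-angle table is stored pre-scaled by 2/π. This is an illustration under our own assumptions (rotation-mode CORDIC starting from (1/K, 0), 32 iterations, double precision), not the authors' fixed-point FPGA design; the names are hypothetical.

```python
import math

N_ITER = 32
# Elementary angles atan(2**-i), stored pre-multiplied by 2/pi: the same factor
# used in range reduction, so the reduced angle never has to be rescaled.
SCALED_ANGLES = [math.atan(2.0 ** -i) * 2.0 / math.pi for i in range(N_ITER)]
# Starting from (1/K, 0) removes the need for a final gain correction.
INV_GAIN = math.prod(1.0 / math.sqrt(1.0 + 4.0 ** -i) for i in range(N_ITER))

def cordic_sin_cos(theta):
    """Return (sin(theta), cos(theta)) using a single range-reduction multiplication."""
    z = theta * 2.0 / math.pi        # angle in units of pi/2 (the only multiplication)
    q = round(z)                     # quadrant index
    z -= q                           # scaled reduced angle, |z| <= 0.5 (|r| <= pi/4)
    x, y = INV_GAIN, 0.0
    for i in range(N_ITER):
        d = 1.0 if z >= 0.0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i   # shift-and-add rotation
        z -= d * SCALED_ANGLES[i]                             # scaled angle table
    # now (x, y) = (cos r, sin r) with theta = q*pi/2 + r
    q %= 4
    if q == 0:
        return y, x
    if q == 1:
        return x, -y
    if q == 2:
        return -y, -x
    return -x, y

s, c = cordic_sin_cos(100.0)
```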
Session 3b
CMOS Buried Double Junction Active Pixel Sensor For High-Sensitivity Low-Resolution Linear Arrays
P. Pittet, G. Carrillo, G.N. Lu, L. Hannati. LENAC, Université Lyon 1, Villeurbanne, France, Patrick.Pittet@lenac.univ-lyon1.fr
IN this paper, we present the study and design of a CMOS active pixel sensor (APS) for a high-detectivity, low-resolution linear array, intended to be used as part of biochemical microanalysis systems for imaging and spectrophotometric purposes. The proposed CMOS APS implements a large buried double p-n junction photodetector (BDJ) and charge-sensitive regulated-cascode amplifiers. One benefit of using a BDJ photodetector rather than a simple photodiode is that the former has two junctions for collecting carriers, thus providing a more sensitive response. Another advantage of a BDJ detector is that it can be used as a wavelength-sensitive device, which may help achieve selectivity in biochemical analysis applications. The detector of the APS has an area of 100 µm × 300 µm. To deal with the junction capacitances inherent to its size, we propose a pixel circuit integrating charge-sensitive regulated-cascode amplifiers. This allows the use of integration capacitors much smaller than the detector's parasitic capacitances, thus achieving a much higher conversion ratio (160 nV/e⁻) than conventional architectures. Time-domain analysis and simulations are performed to identify and quantify the dominant noise sources. At a supply voltage as low as 3 V, the proposed APS has a dynamic range larger than 60 dB. For an integration time of 200 ms, the detectivity of the proposed APS is evaluated to be 3.9 × 10¹² cm·Hz^1/2·W⁻¹ for the well channel and 2.3 × 10¹² cm·Hz^1/2·W⁻¹ for the diffusion channel.
Figure: pixel schematics a) and b), with transistors T1 to T5 and T'1 to T'5, integration capacitor Cint, Reset and Vdd.
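The abstract quotes the conversion ratio but not the integration capacitance. Assuming an ideal charge amplifier, the output step per collected electron is q/C_int, so the quoted 160 nV/e⁻ implies C_int of about 1 pF; this value is our inference, not a figure from the paper. The function name is hypothetical.

```python
Q_E = 1.602176634e-19            # elementary charge (C)

def integration_capacitance(conversion_gain_v_per_e):
    """Ideal charge amplifier: the output step per electron is q / C_int,
    so C_int = q / conversion_gain."""
    return Q_E / conversion_gain_v_per_e

c_int = integration_capacitance(160e-9)   # 160 nV per electron, from the text
```

Without the charge amplifier, the same charge would instead be divided by the detector's own, much larger junction capacitance, which is why decoupling the two raises the conversion ratio.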
THIS paper presents a first prototype of a system for monitoring a set of remote temperatures. The most important requirement of the system is that the temperature sensors must be watertight modules without any incoming or outgoing wire, either for information transmission or for power supply. This implies the use of intelligent battery-powered temperature sensors, wirelessly connected to a central monitoring unit. A reliable RF link in the 433 MHz ISM band has been selected for data transmission because this type of wireless link is the best suited to the envisaged industrial applications of the system, such as temperature monitoring of cold-storage rooms, where walls must be traversed. The temperature monitoring system consists of up to 255 intelligent temperature sensors and a master unit. Each temperature sensor is wirelessly connected to the master unit through an RF link, and all the RF links operate at the same frequency, so only one transceiver is required in the master. This architecture entails some design considerations, such as an addressing scheme for the nodes in the system and a transmission policy to avoid collisions (only one node may transmit at a time). A policy based on time multiplexing was selected, which requires a "global" clock from which to build quasi-synchronous time multiplexing in every node. This "global" clock is obtained through a synchronization mechanism based on the periodic transmission of synchronization frames. The intelligent temperature sensors are implemented by connecting an accurate temperature sensor (DS18B20 from Dallas-Maxim) and an RF transceiver (nRF401 from Nordic) to a microcontroller (µPD78F9076 from NEC). To ease debugging, the sensor node has been completed with an external user interface that includes micro-switches, LEDs and an LCD. A simple master unit has been considered, with an architecture similar to that of the sensor nodes. A low-power design has been adopted for the temperature sensor in order to improve its autonomy.
This low-power design significantly affects the temperature acquisition process and the communication mechanism established between the central unit and the remote sensors, as both the RF emitter and receiver are power-hungry subsystems. A sensor autonomy of 8 months has been achieved for the first prototype when powered by three AA battery cells. The system is able to work at distances greater than 100 m in an open area using cheap loop antennas. Improvements of the system, such as the use of the new nRF9E5 device from Nordic, are under study.
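The time-multiplexing policy can be sketched as a slot table derived from the node address and the time of the last synchronization frame. Slot and guard durations here are hypothetical values chosen for illustration; only the 255-node address space comes from the text. The guard interval is where tolerance to clock drift between resynchronizations would be budgeted.

```python
from dataclasses import dataclass

@dataclass
class TdmaPlan:
    """Hypothetical time-multiplexing plan: after each synchronization frame,
    node k (1..n_nodes) owns the k-th slot."""
    slot_ms: float                 # illustrative transmission slot length
    guard_ms: float                # illustrative guard time against clock drift
    n_nodes: int = 255             # address space stated in the text

    def window(self, address, sync_time_ms):
        """(start, end) of the transmission window of `address` in the frame
        that begins at sync_time_ms."""
        if not 1 <= address <= self.n_nodes:
            raise ValueError("address must be in 1..n_nodes")
        start = sync_time_ms + (address - 1) * (self.slot_ms + self.guard_ms)
        return start, start + self.slot_ms

    @property
    def frame_ms(self):
        return self.n_nodes * (self.slot_ms + self.guard_ms)

plan = TdmaPlan(slot_ms=20.0, guard_ms=5.0)
print(plan.window(3, sync_time_ms=0.0))   # (50.0, 70.0)
```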
Signal conditioning and analog signal conversion are essential links in the data processing chain. Each sensor needs its own interface. However, a single interface, provided that it is parameterizable, can be used with various kinds of sensors. This article describes a methodology used to design such an interface. Our approach is illustrated by the example of a Sigma-Delta modulator implemented with switched capacitors. Standard CMOS technology was chosen because it is well suited to the switched-capacitor technique. Moreover, a Sigma-Delta modulator allows small signals to be converted without prior amplification. Finally, the sensor can be integrated directly into the modulator architecture. The interface that we present uses individually parameterizable analog and digital blocks, whose assembly itself constitutes a block able to interface a given sensor. The concept is validated on a second-order Sigma-Delta converter. Adapting the first-stage integrator allows it to be associated with various kinds of sensors, while the architecture of the second stage remains the same. The methodology is partly illustrated on a capacitive sensor, which replaces the sampling capacitor in a discrete-time (DT) Sigma-Delta modulator.
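How a capacitive sensor can take the place of the sampling capacitor can be sketched with a behavioural second-order modulator whose input is a capacitance ratio. The model assumes that, each cycle, the sensing capacitor minus an offset capacitor injects a charge proportional to a reference voltage and that the feedback charge is set by a reference capacitor; the loop structure (two delaying integrators with gains of 0.5, as in the classic Boser-Wooley modulator) and all names are our assumptions, not the authors' circuit.

```python
import numpy as np

def capacitive_sdm2(c_sense, c_offset, c_ref, n_samples):
    """Behavioural second-order sigma-delta modulator whose normalised input
    is the capacitance ratio u = (c_sense - c_offset) / c_ref."""
    u = (c_sense - c_offset) / c_ref
    v1 = v2 = 0.0                            # integrator states
    y = np.empty(n_samples)
    for n in range(n_samples):
        y[n] = 1.0 if v2 >= 0.0 else -1.0    # 1-bit quantizer
        v2 += 0.5 * (v1 - y[n])              # second integrator (uses the old v1)
        v1 += 0.5 * (u - y[n])               # first integrator: sensor charge minus feedback
    return y

bits = capacitive_sdm2(1.25e-12, 1.0e-12, 1.0e-12, 20_000)   # u = 0.25
```

The bitstream average converges to u = (c_sense - c_offset)/c_ref as long as |u| stays well inside the stable input range of the loop; only the first stage depends on the sensor, which matches the idea of adapting the first integrator while keeping the second stage unchanged.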
An improved Love wave oscillator for low concentration chemical sensing application
Nicolas Moll¹, Corinne Déjous¹, Dominique Rebière¹, Jacques Pistré¹, Roger Planade²
¹ Laboratoire IXL, ENSEIRB, CNRS UMR5818, Univ. Bordeaux 1, 351 cours de la libération, F-33405 Talence, France; ² Centre d'Etudes du Bouchet (DCE/DGA), 91710 Vert Le Petit, France
LOVE wave chemical sensors are very sensitive Surface Acoustic Wave (SAW) sensors; their high sensitivity is due to the confinement of the acoustic energy in a guiding layer near the surface of the sensing area. We present an electronic system devoted to chemical and biological sensing applications using Love wave delay lines. In an oscillator configuration, detection of target compounds corresponds to a frequency shift. This oscillation frequency has to be as low-noise as possible in order to reduce the detection limit. The oscillator loop is composed of a Love wave delay line and an electronic feedback that satisfies the Barkhausen conditions on phase and gain. The role of the electronic feedback is essentially to compensate the insertion losses of the delay line, estimated at up to 40 dB for gaseous or liquid detection. A change of the wave phase velocity in the delay line due to sensing changes the phase condition in the oscillation loop, and hence shifts the oscillation frequency. The electronic feedback is composed of amplifiers, an attenuator, a filter and a coupler to sample the oscillation frequency. The noise figure of each component, especially the amplifiers, has been studied to improve the stability of the oscillator. Signal transmission has also been studied to avoid signal reflection: the characteristic impedance of the transmission lines has been matched to 50 Ω, and upper and lower ground planes have been introduced in the design of the electronic card. Short-term stability has been monitored in order to evaluate the stability of the oscillator. The resulting oscillator (Figure 1) shows a short-term stability lower than 1 Hz/s at a 110 MHz working frequency, with a phase noise of -100 dBc/Hz at 1 kHz. Such stability enables very low concentration detection of the target compound, below 1 ppm for gaseous detection.
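The link between a velocity change and the frequency shift follows from the Barkhausen conditions. As a first-order sketch, assume that the whole delay-line path of length L is traversed at phase velocity v, that the feedback electronics (gain G_e) add a frequency-independent phase φ_e, and that the delay line has transfer function H_DL:

$$
|G_e(f)|\,|H_{DL}(f)| = 1, \qquad 2\pi f\,\frac{L}{v} + \varphi_e = 2\pi n, \quad n \in \mathbb{N}
$$

$$
f_n = \frac{v}{L}\left(n - \frac{\varphi_e}{2\pi}\right) \quad\Longrightarrow\quad \frac{\Delta f}{f_n} = \frac{\Delta v}{v} \qquad (L,\ n,\ \varphi_e \text{ fixed})
$$

so the relative frequency shift follows the relative velocity change. In a real device only part of the path is perturbed by the sensing layer and φ_e varies with frequency, so the measured shift is generally smaller than this idealized value.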
Hot-carrier effects and electroluminescence are well-known phenomena in CMOS technology; many devices are not affected by the light emitted by transistors, but CMOS imagers are sensitive to it. The requirements of both scaling down the pixel size of CMOS image sensors and maximizing their fill factor are commonly met by using deep-submicron processes and close-to-minimum-size geometries for in-pixel transistors. In this context, we analyze the pixel degradation induced by hot-carrier generation in the in-pixel source-follower transistor and show that it is associated with electroluminescence. We present the different kinds of parasitic effects in CMOS imagers. These effects have been observed in several process generations, and more particularly in a 0.25 µm technology under various operating conditions. In CMOS imagers, hot-carrier (HC) generation can occur in the source-follower transistor Msf when it operates in saturation, i.e. during the selection of the pixel. In this condition, secondary impact ionization induces minority carriers that flow in the substrate and can be collected by the photosensitive areas. Physical 2D simulations (ISE-TCAD environment) of the pixel behavior clearly locate the HC generation and demonstrate that carriers induced by impact ionization are measured as an excess current in darkness. Comparison of analog simulations of the substrate current of Msf with the measured photodiode current demonstrates a strong correlation between the excess dark current of the photosensitive area and the HC generation revealed by the substrate current. Furthermore, this impact ionization phenomenon causes photon generation, whose light intensity increases as the gate voltage Vin decreases. The correlation between HC creation (shown by the substrate current of Msf) and electroluminescence is presented for a 0.25 µm technology. Figure 1 shows the electroluminescence phenomenon from a pixel sub-array using a 0.35 µm process, and shows that only the selected rows emit light.
Figure 1. Electroluminescence (light emission) from the source-follower transistors when successively reading three different rows (0.35 µm process, Ibias = 13 µA).
WITH the increasing power density in deep-submicron integrated circuits, the occurrence of failures due to overheating has considerably increased. In this paper, a simple and efficient built-in temperature sensor for the on-line thermal monitoring of standard-cell based VLSI circuits is presented. The proposed smart temperature sensor is based on a ring oscillator. It has been found that the oscillation period can be made to depend linearly on temperature by using suitably ratioed inverters (Fig. 1). This implies that, to obtain an adequate nonlinearity error, the ring oscillator must be optimised at transistor level, which involves transistor sizes different from those of the inverters of the target standard-cell library. To produce an output signal with a period proportional to temperature while staying within the library, we have replaced the inverters of the oscillator with more complex inverting gates. Simulation results obtained in a 0.18 µm CMOS technology show that the nonlinearity error of the sensor can be reduced when an adequate set of standard logic gates is used (Fig. 2).
Plots (Figs. 1 and 2): oscillation period (s, about 3.0 × 10⁻¹⁰ to 4.2 × 10⁻¹⁰) and nonlinearity error (%) versus temperature (°C).
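Nonlinearity error, as used above, can be computed from simulated period-versus-temperature data. The sketch uses one common definition (maximum deviation from the least-squares straight line, as a percentage of the output span); whether the authors used exactly this definition is an assumption, and the example data are hypothetical.

```python
import numpy as np

def nonlinearity_percent(temperature, period):
    """Maximum deviation of period(T) from its least-squares straight line,
    expressed as a percentage of the output span."""
    slope, intercept = np.polyfit(temperature, period, 1)   # best-fit line
    residual = period - (slope * temperature + intercept)
    span = period.max() - period.min()
    return 100.0 * np.max(np.abs(residual)) / span

# Hypothetical simulated data with a slight quadratic bow
temps = np.linspace(-40.0, 125.0, 34)
periods = 3.2e-10 + 6e-13 * temps + 1e-15 * temps ** 2
error = nonlinearity_percent(temps, periods)
```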
Session 3c
Bioinspired Circuits
Wednesday Nov. 24, 14h15-15h45, Auditorium. Chairs: Sylvie Renaud (E.N.S.E.I.R. Bordeaux), Hervé Barthélemy (E.P.U. de Marseille)
We present an analog electronic circuit modelling a FitzHugh-Nagumo neuron with modified excitability. To characterize this basic cell, the bifurcation curves between stability with an excitation threshold, bistability and oscillations are investigated. An electrical circuit is then proposed to realize a unidirectional coupling between two cells, mimicking a chemical synaptic coupling. In this master-slave configuration, we show experimentally that the coupling strength and the master interspike period control the dynamics of the slave neuron, leading to period doubling, chaotic behavior and synchronization. The architecture of the neural network is then described, allowing the study of small neuron assemblies.
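The master-slave experiment can be mimicked numerically with two FitzHugh-Nagumo equations and a one-way sigmoidal coupling current. The parameters (a = 0.7, b = 0.8, ε = 0.08) are textbook values and the sigmoid (threshold 0, slope 10) is our own choice; the paper's modified excitability and circuit values are not reproduced, and the function name is hypothetical.

```python
import math

def fhn_master_slave(i_master, g_syn, t_end=300.0, dt=0.01,
                     a=0.7, b=0.8, eps=0.08):
    """Forward-Euler simulation of two FitzHugh-Nagumo neurons with a one-way
    (master -> slave) sigmoidal synaptic current of strength g_syn.
    Returns spike counts (upward crossings of v = 1) for master and slave."""
    vm, wm = -1.2, -0.625            # both start near the resting state for I = 0
    vs, ws = -1.2, -0.625
    spikes_m = spikes_s = 0
    for _ in range(int(round(t_end / dt))):
        i_syn = g_syn / (1.0 + math.exp(-10.0 * vm))   # depends on the master only
        vm_new = vm + dt * (vm - vm ** 3 / 3.0 - wm + i_master)
        vs_new = vs + dt * (vs - vs ** 3 / 3.0 - ws + i_syn)
        wm += dt * eps * (vm + a - b * wm)             # slow recovery variables
        ws += dt * eps * (vs + a - b * ws)
        if vm < 1.0 <= vm_new:
            spikes_m += 1
        if vs < 1.0 <= vs_new:
            spikes_s += 1
        vm, vs = vm_new, vs_new
    return spikes_m, spikes_s

uncoupled = fhn_master_slave(i_master=0.5, g_syn=0.0)   # master oscillates, slave rests
coupled = fhn_master_slave(i_master=0.5, g_syn=1.0)     # master drives the slave
```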
This paper presents an original mixed IC designed for the development of computational neuroscience hardware/software tools. This ASIC integrates configurable neuromorphic functionalities and computes in real time the electrical activity of various neural elements, described by conductance-based models. The neuron structure and model parameters are programmable on chip. We present the key issues of the design, the main characteristics of the ASIC, and how it will be integrated in the complete simulation system. Different neural behaviors (spiking, bursting) are then programmed on the chip, and the results are evaluated and compared with those obtained by software simulation tools. Index Terms: Mixed Integrated Circuit, Silicon neurons, Neuromorphic engineering, BiCMOS
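To illustrate what a conductance-based model computes, here is a forward-Euler Hodgkin-Huxley neuron with the classic squid-axon parameters. These parameters and the integration scheme are our choice for illustration; the ASIC's programmable models and parameters are not given in the abstract, and the function names are hypothetical.

```python
import math

def hh_rates(v):
    """Classic Hodgkin-Huxley gating rates (1/ms), v in mV (rest at -65 mV)."""
    a_m = 0.1 * (v + 40.0) / (1.0 - math.exp(-(v + 40.0) / 10.0))
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * (v + 55.0) / (1.0 - math.exp(-(v + 55.0) / 10.0))
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n

def hodgkin_huxley_spikes(i_ext, t_end=100.0, dt=0.01):
    """Forward-Euler simulation; i_ext in uA/cm^2, C_m = 1 uF/cm^2.
    Returns the number of action potentials (upward crossings of 0 mV)."""
    g_na, g_k, g_l = 120.0, 36.0, 0.3          # maximal conductances (mS/cm^2)
    e_na, e_k, e_l = 50.0, -77.0, -54.387      # reversal potentials (mV)
    v = -65.0
    a_m, b_m, a_h, b_h, a_n, b_n = hh_rates(v)
    m, h, n = a_m / (a_m + b_m), a_h / (a_h + b_h), a_n / (a_n + b_n)  # steady state
    spikes = 0
    for _ in range(int(round(t_end / dt))):
        a_m, b_m, a_h, b_h, a_n, b_n = hh_rates(v)
        i_ion = (g_na * m ** 3 * h * (v - e_na) + g_k * n ** 4 * (v - e_k)
                 + g_l * (v - e_l))            # sodium, potassium and leak currents
        v_new = v + dt * (i_ext - i_ion)
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        if v < 0.0 <= v_new:
            spikes += 1
        v = v_new
    return spikes

tonic = hodgkin_huxley_spikes(10.0)            # repetitive spiking under a DC input
```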
ARTIFICIAL Neural Networks are computing tools consisting of small processing elements, called artificial neurons, that are highly interconnected and arranged in layers. The input-output function carried out by these systems is learned by means of a training process that adjusts the free parameters connecting the inputs of a neuron layer to the outputs of the preceding layer. Previous works have presented the use of mixed-mode electronic blocks in artificial neuron implementations, showing promising results on real problems. In this work, the design and simulation of some basic electronic blocks for building a class AB mixed-signal current-mode artificial neural network are shown. The main electronic blocks are a four-quadrant mixed-mode multiplier based on an R-2R ladder (Fig. 1) and a current-mode nonlinear transfer function (Fig. 2). The resulting practical models are used to design a Multilayer Perceptron, which is applied to a sensor linearization problem. To minimize the effects of mismatch and offsets in the proposed blocks during training, a perturbative algorithm is used to find suitable weights. The performance achieved with the neural model on four different sensor samples shows an extension of the sensor linear range (error lower than 1 degree) of 50% or more.
Figs. 1 and 2 (schematics): R-2R-ladder multiplier with binary-weighted currents I0, I0/2, I0/4, I0/8 switched by digital bits, and current-mode nonlinear transfer-function block.
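The abstract does not specify the perturbative algorithm. One common scheme is sequential weight perturbation, sketched below for a tiny software perceptron learning to linearize a hypothetical saturating sensor characteristic. The network size, learning rate, perturbation size and sensor model are all illustrative assumptions.

```python
import numpy as np

def mlp(params, x, n_hidden):
    """1-input/1-output perceptron with one tanh hidden layer.
    params = [w1 (n_hidden), b1 (n_hidden), w2 (n_hidden), b2]."""
    w1, b1, w2 = np.split(params[:3 * n_hidden], 3)
    return np.tanh(np.outer(x, w1) + b1) @ w2 + params[-1]

def train_weight_perturbation(x, target, n_hidden=4, epochs=300,
                              delta=1e-3, lr=0.1, seed=0):
    """Sequential weight perturbation: perturb one weight, measure the change
    of the output error, and step that weight against the measured slope.
    Only measured outputs are used, so offsets and mismatch inside the blocks
    are automatically part of the error being minimised."""
    rng = np.random.default_rng(seed)
    params = rng.normal(0.0, 0.5, 3 * n_hidden + 1)

    def error(p):
        return np.mean((mlp(p, x, n_hidden) - target) ** 2)

    history = [error(params)]
    for _ in range(epochs):
        for i in range(params.size):
            base = error(params)
            params[i] += delta                      # perturb one weight
            slope = (error(params) - base) / delta  # measured sensitivity
            params[i] -= delta + lr * slope         # undo perturbation, then update
        history.append(error(params))
    return params, history

# Hypothetical saturating sensor: reading x for a true value t in [-1, 1].
t = np.linspace(-1.0, 1.0, 50)
x = np.tanh(1.5 * t) / np.tanh(1.5)
params, history = train_weight_perturbation(x, t)   # learn the map x -> t
```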
A Discrete-Time Cellular Neural Network Architecture for a Pixel-Level Snake On-chip Implementation
V.M. Brea, D.L. Vilariño, D. Cabello. Department of Electronics and Computer Science, University of Santiago de Compostela, Santiago de Compostela, Spain. Phone: +34981563100, Ext. 13572. Fax: +34981528012. Email: victor@dec.usc.es
IN this paper, we approach the hardware-level design for the on-chip implementation of an active contour-based technique, an improved Pixel-Level Snake (PLS) version. This new pixel-level snake technique combines features of both parametric and implicit models, which leads to better contour detection performance at low computational cost. The computation time is also reduced, reaching video-rate processing with an on-chip implementation by means of an SIMD architecture with a direct correspondence between pixels and processing elements, namely Discrete-Time Cellular Neural Networks (DTCNN). The synergy of PLS and CNN, either Continuous-Time (CT) or Discrete-Time (DT), is a promising tool for real-time processing. This has been proven by running the PLS technique reported here on a general-purpose CT-CNN chip, the ACE4K¹. The design of a specific CMOS DTCNN chip has also been successful for the original PLS². This paper addresses the new DTCNN PLS architecture together with the new DTCNN PLS cell for a future specific CMOS on-chip implementation.
[1] D.L. Vilariño, Cs. Rekeczky, "Implementation of a Pixel-Level Snake Algorithm on a CNNUM-based Chip Set Architecture," IEEE Transactions on Circuits and Systems I, vol. 51, no. 5, pp. 885-891, May 2004.
[2] V.M. Brea, D.L. Vilariño, A. Paasio, D. Cabello, "Design of the Processing Core of a Mixed-Signal CMOS DTCNN Chip for Pixel-Level Snakes," IEEE Transactions on Circuits and Systems I, vol. 51, no. 5, pp. 997-1013, May 2004.
Address-Event-Representation (AER) transceiver chips such that (a) input events can be weighted according to a digital word, (b) this weight includes a sign bit, (c) the incoming event is accompanied by a sign bit, and (d) the pixel can be calibrated to compensate for mismatch in large arrays of these pixels. A prototype has been fabricated in the AMS 0.35 µm CMOS process, and experimental measurement results are provided.
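The four pixel features listed above can be illustrated with a behavioral model. The sketch below is a minimal one under stated assumptions: an integrate-and-fire pixel in which each input event adds or subtracts the signed digital weight, scaled by a per-pixel calibration gain, and a signed output event is emitted when the state crosses plus or minus the threshold. The integrate-and-fire behavior, the gain-style calibration and the class name `WeightedEventPixel` are all hypothetical; the abstract does not describe the pixel's internal dynamics.

```python
class WeightedEventPixel:
    """Behavioral sketch of a signed, weighted AER integrate-and-fire pixel (hypothetical)."""

    def __init__(self, weight, threshold, gain=1.0):
        self.weight = weight        # signed digital weight, e.g. -15..15
        self.threshold = threshold  # firing threshold (arbitrary units)
        self.gain = gain            # per-pixel mismatch calibration (assumed form)
        self.state = 0.0

    def receive(self, event_negative):
        """Integrate one incoming event; return +1/-1 if an output event fires, else 0."""
        sign = -1 if event_negative else 1      # sign bit carried by the event
        self.state += sign * self.weight * self.gain
        if self.state >= self.threshold:
            self.state -= self.threshold
            return 1
        if self.state <= -self.threshold:
            self.state += self.threshold
            return -1
        return 0
```

The event sign and the weight sign combine multiplicatively, so a negative event arriving at a pixel with a negative weight pushes the state upwards.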
This paper presents a high-performance, fully digital implementation of cells of the recently introduced simplicial cellular neural network (CNN). The simplicial CNN exhibits a higher functional capacity than the standard CNN while keeping the complexity within acceptable limits. The theory of canonical piecewise-linear (PWL) representation underlying the simplicial CNN makes the structure particularly advantageous when the CNN needs to be trained from examples. This work presents an FPGA implementation of the simplest two-dimensional configuration of a cell in a simplicial CNN, i.e. a 3×3 neighborhood architecture. Using reconfigurable devices to implement emulated digital CNNs provides more flexibility than VLSI designs, because different architectures can be implemented on the same FPGA device. The design proposed here, shown in Fig. 1, implements the interconnection behavior of a simplicial cell and has been compiled for an Altera FPGA of the Stratix family, the EP1S80F1508C7 (Stratix I). It occupies less than 1% of the logic elements, memory bits and DSP blocks, and reaches a maximum operating frequency of 171 MHz. The design is intended to be part of emulated digital simplicial CNNs, which could offer better characteristics than software versions for the synthesis of the PWL models used in many nonlinear-system applications. The results of this work are being compared with software and digital implementations of other techniques aimed at efficiently representing PWL components.
Figure 1. Digital implementation of the connectivity behavior of a simplicial cell in a CNN architecture. (Block labels: 8-bit inputs u1 to u9, controller, ROM holding the coefficients c_{i,j}, 8-bit signals v_{i,l}, 16-bit MAC, overflow flag, output.)
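The datapath in Fig. 1 feeds 8-bit inputs and ROM coefficients into a 16-bit multiply-accumulate unit with an overflow output. The sketch below is a minimal behavioral model of that MAC, assuming signed 8-bit operands and a two's-complement 16-bit accumulator that wraps on overflow while setting a sticky flag; the signedness, the wrap-around behavior and the function name `mac16` are assumptions rather than details from the paper.

```python
def mac16(inputs, coeffs):
    """Accumulate sum(inputs[k] * coeffs[k]) in a signed 16-bit register.

    Assumes signed 8-bit operands (-128..127). The accumulator wraps in
    two's complement and a sticky overflow flag records any out-of-range sum.
    """
    acc, overflow = 0, False
    for u, c in zip(inputs, coeffs):
        if not (-128 <= u <= 127 and -128 <= c <= 127):
            raise ValueError("operands must fit in 8 bits")
        acc += u * c
        if not -32768 <= acc <= 32767:
            overflow = True
            acc = (acc + 32768) % 65536 - 32768   # two's-complement wrap-around
    return acc, overflow
```

With nine products of two 8-bit values, the full-scale sum needs more than 16 bits, which is why the flag matters: nine products of 127 × 127 total 145161, which wraps to 14089 with the overflow flag set.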
Session 3d
Optimization of a high-voltage p-channel transistor fabricated using a standard CMOS process
A. Pérez-Tomás, X. Jordà, P. Godignon, M. Vellvehí and J. Millán. Centre Nacional de Microelectrònica (CNM-CSIC). Campus UAB, 08193, Barcelona, Spain. Author coordinates: Tel. (34) 93 594 77 00, Fax (34) 93 580 14 96, e-mail: aperez@cnm.es
This paper focuses on the optimization of a simple high-voltage extended-drain p-channel structure (EDpMOSFET) fabricated in a standard low-cost twin-tub 2.5 µm CMOS technology. There is strong interest in monolithically integrating CMOS and high-voltage power devices within the same process. An existing low-voltage technology should preferably be used to fabricate Power Integrated Circuits (PICs), in order to reduce development effort and time. One of the main advantages of this approach is that all the existing standard cells and libraries can be used to design the low-voltage parts of the circuit. CMOS technologies can be n-well, p-well or twin-tub. Fabricating high-voltage pMOSFET devices that are fully CMOS-compatible with nMOSFET devices in a twin-tub process can be a difficult or even impossible task. Only one process step and one mask level have to be added to the standard CMOS process to implement the EDpMOSFET. The structure has been optimized with respect to the channel length LCH, the extended-drain length LED and the extended-drain doping level NED; other parameters (e.g. the n-well doping level) are fixed by the CMOS process. The simulation results (TMA-MEDICI) have been verified experimentally. EDpMOSFET transistors with a low specific on-resistance (active area) Ron = 6.0 mΩ·cm² (at Vg = 5 V) and a breakdown voltage of 36 V have been implemented. These results show that, with adequate optimization, this device is competitive with most previously reported p-channel devices that use more expensive and sophisticated technologies and processes. The EDpMOSFET is one of the simplest, fully CMOS-compatible designs usually proposed. Together with an n-channel LDMOS (completely integrated within the standard CMOS process), the EDpMOSFET forms a CMOS-based technology (50 V/1 A) suitable for power integrated circuits.
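A specific on-resistance converts to a device resistance by dividing it by the active area. As a worked example, assuming a hypothetical 1 mm² active area (0.01 cm²): 6.0 mΩ·cm² / 0.01 cm² = 0.6 Ω, i.e. roughly 0.6 V of conduction drop at 1 A. The helper name and the chosen area are illustrative only.

```python
def on_resistance(r_sp_mohm_cm2, active_area_mm2):
    """Device on-resistance (ohm) from specific on-resistance (mOhm*cm^2) and active area (mm^2)."""
    r_sp_ohm_cm2 = r_sp_mohm_cm2 * 1e-3
    area_cm2 = active_area_mm2 * 1e-2      # 1 mm^2 = 0.01 cm^2
    return r_sp_ohm_cm2 / area_cm2


r = on_resistance(6.0, 1.0)   # about 0.6 ohm for a hypothetical 1 mm^2 device
drop_at_1a = r * 1.0          # about 0.6 V conduction drop at 1 A
```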
Multiphase [...] This is due to their advantages, such as higher current capability, improved dynamic response and reduced EMI and harmonics, which compensate for their increased component count. Key applications of multiphase converters include Voltage Regulation Modules (VRM), Dynamic Voltage Scaling (DVS) and automotive power supplies. As a drawback, multiphase controllers are more complex than their single-phase counterparts. However, digital controllers based on custom hardware (FPGAs or ASICs) easily solve the new problems, such as generating multiple driving signals, sharing current among phases, or phase-shifting. This work focuses on the phase-shifting problem (generating the driving signal of each phase shifted from the previous one), proposing and comparing two different hardware structures for this task. The first structure is based on additions and comparisons, while the second one uses a shift register (see Figure 1). Both methods are explained in detail and compared. The comparison criterion showing the largest difference is area: the addition-and-comparison phase-shifter is appropriate for high duty-cycle resolutions, while the shift-register solution is better for a high number of phases. Both methods have been implemented and tested on a prototype, showing that both allow passive current sharing (no current loop).
Figure 1. Addition and comparison phase-shifter (a) and shift-register phase-shifter (b). (Panel labels: (a) a counter plus offsets Resol/N to Resol(N-1)/N gives phase-shifted counters for Phase 1 to Phase N, each compared (<) with the duty cycle to produce the driving signals; (b) a shift register producing the driving signals of Phases 2 to 4 from Phase 1.)
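The two phase-shifting structures can be compared in a cycle-level model. The sketch below assumes a free-running counter of period Resol, a reference PWM that is high while the counter is below the duty value, and phase k shifted by k·Resol/N clock cycles; the function names are hypothetical. In the first function, each phase compares an offset copy of the counter with the duty cycle. In the second, each phase is the previous one delayed through a Resol/N-stage shift register.

```python
from collections import deque


def pwm(counter, duty):
    """Reference driving signal: high while the counter is below the duty value."""
    return 1 if counter < duty else 0


def adder_comparator_shifter(resol, n_phases, duty, cycles):
    """Each phase compares its own offset copy of the counter with the duty cycle."""
    step = resol // n_phases
    waves = [[] for _ in range(n_phases)]
    for t in range(cycles):
        counter = t % resol
        for k in range(n_phases):
            shifted = (counter - k * step) % resol   # counter offset by k*Resol/N
            waves[k].append(pwm(shifted, duty))
    return waves


def shift_register_shifter(resol, n_phases, duty, cycles):
    """Phase k is phase k-1 delayed by Resol/N clock cycles through a shift register."""
    step = resol // n_phases
    lines = [deque([0] * step, maxlen=step) for _ in range(n_phases - 1)]
    waves = [[] for _ in range(n_phases)]
    for t in range(cycles):
        bit = pwm(t % resol, duty)
        waves[0].append(bit)
        for k, line in enumerate(lines, start=1):
            delayed = line[0]   # value that entered `step` cycles ago
            line.append(bit)    # maxlen makes the deque drop its oldest element
            bit = delayed
            waves[k].append(bit)
    return waves
```

After the shift registers have filled (one counter period is enough), both functions produce identical waveforms. In this model the shift-register version needs (N - 1)·Resol/N flip-flops, so its cost grows with resolution, while the adder-comparator version needs one adder and comparator per phase, so its cost grows with the number of phases. This is consistent with the area conclusion above, although the authors' circuits may differ in detail.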
Ultracapacitors are attractive devices for electric storage. Because of their very low series resistance, they can exchange high levels of instantaneous power, and the cell capacitance can reach five thousand farads. They can work at very low temperatures and can withstand a large number of charge/discharge cycles. As a typical application, they can be used as peak power sources in Hybrid Electric Vehicles (HEV). The pulsed charge-discharge current mode needed for these applications leads us to propose an original approach for the electrothermal characterization of ultracapacitors. In the first part, the electrical behavior of the ultracapacitor is investigated, leading to a specific electrical model. It is composed of an access resistor and capacitor, a series inductor and a nonlinear transmission line approximated by four RC branches. The capacitance of the transmission line depends on the ultracapacitor voltage. The model parameters are identified using both constant-current and impedance spectroscopy tests. Finally, the model is validated with a pulsed current profile. In the second part, regarding thermal characterization, a simple model based on two thermal time constants is presented. Using high current levels and repetitive, energetically neutral charge-discharge profiles made it possible to heat the ultracapacitor significantly, and the model parameters have been extracted in order to predict the maximum heating of the device. Finally, combining the thermal and electrical models allows simulation and experimental results to be compared. The good agreement between these data suggests that Joule losses are the only heating source, which validates the proposed electrothermal characterization.
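The thermal model above uses two thermal time constants and treats Joule losses as the only heat source. A common way to write such a model is a two-cell Foster network driven by the Joule loss P = ESR·I_rms², and the sketch below assumes that form. The resistance, time-constant and current values are made-up placeholders, not the paper's identified parameters.

```python
import math


def temperature_rise(t, power, rth1, tau1, rth2, tau2):
    """Temperature rise (K) at time t (s) after a constant loss `power` (W) starts at t = 0.

    Two-time-constant (Foster-type) model:
    dT(t) = P * (Rth1 * (1 - exp(-t/tau1)) + Rth2 * (1 - exp(-t/tau2)))
    """
    return power * (rth1 * (1 - math.exp(-t / tau1)) + rth2 * (1 - math.exp(-t / tau2)))


# Joule losses of an energetically neutral cycle (hypothetical values)
esr, i_rms = 0.5e-3, 200.0     # ohm, A
p = esr * i_rms ** 2           # 20 W
```

The maximum heating predicted by this form is the steady-state value P·(Rth1 + Rth2), reached after several times the longer time constant.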
Specific Drivers and Integrated 20V Regulated Charge-Pump for an Autonomous Micro-Robot: MiCRoN
A. Saiz-Vela, P. Miribel-Català, J. Brufau, R. Casanova, M. Puig-Vidal, J. Samitier. Electronics Department, Instrumentation & Communication Systems Lab., Universidad de Barcelona. C/ Martí i Franquès, 1. Barcelona 08028 (SPAIN). {asaiz,catala}@el.ub.es
This paper deals with the design of the driving and power supply system for a microrobotic [...]. It presents the evolution of the MINIMANV microrobot [1], whose main limitation was its autonomy. We present the drivers and the solution adopted to step up the standard 3.3 V battery voltage to the level needed to bias and actuate the piezoelectric structures developed by partners within the MICRON project (IST-2001-33567), in whose framework this work is being carried out. A first prototype of the electronics, based on SMD components, has been developed; power consumption and area are its two key concerns. Its drivers, based on the LM7301 operational amplifier, show a significant on-board dc power dissipation of around 40 mW per driver. This general-purpose amplifier has a 4 MHz GBW and no power-down control. Our High Voltage Operational Amplifiers (HVOA) have been designed to be stable for a fixed gain rather than for the general unity-gain-frequency (fu) case. This allows the power dissipation to be reduced, and a power-down control has also been designed. For the signals needed by two types of piezoelectric actuators, drivers based on high-voltage operational amplifiers with bias voltages up to 20 V have been investigated. Class A and class AB operational amplifiers in a control-loop configuration have been designed to meet the requirements of both actuator types. The class A op amp is a specific design for one actuator. For both piezoelectric actuators, a class AB operational amplifier has been designed with a power consumption below the 15 mW per output threshold defined for the total power dissipation. For the integrated power supply, a two-phase voltage-doubler charge pump has been adopted to raise the output voltage up to 20 V.
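The abstract does not give the stage count of the charge pump. As a back-of-the-envelope aid only, the sketch below counts how many lossless stages two textbook topologies would need to lift a 3.3 V supply to 20 V: cascaded doublers (gain 2^n) and a Dickson-type ladder ((N + 1)·V_in). Both are idealizations that ignore threshold drops, load current and regulation, and neither is claimed to be the authors' circuit; the function names are illustrative.

```python
def cascaded_doublers_needed(v_in, v_target):
    """Ideal cascaded doublers: each stage doubles its input (no losses)."""
    n, v = 0, v_in
    while v < v_target:
        n, v = n + 1, 2 * v
    return n


def dickson_stages_needed(v_in, v_target):
    """Ideal Dickson-type ladder: N stages give (N + 1) * v_in (no losses)."""
    n = 0
    while (n + 1) * v_in < v_target:
        n += 1
    return n
```

For 3.3 V to 20 V, three ideal doublers (26.4 V) or six ideal Dickson stages (23.1 V) would be enough, leaving headroom that a regulation loop can trim down to 20 V.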
[1] J. López-Sánchez, U. Simu, M. Puig-Vidal, S. Johansson, P. Miribel-Català, E. Montané, S.A. Bota, J. Samitier, "A miniature robot driven by Smart Power Integrated Circuits," IEEE/RSJ Intl. Conference on Intelligent Robots and Systems, pp. 1954-1959, Lausanne (Switzerland), 2002.
The power converter technologies for motor drives are very popular in various industrial applications such as robots and machine tools. These dc or ac drives need a step-up/step-down dc bus voltage and regeneration capability. For normal operation, this type of converter basically requires three kinds of sensors, detecting the ac current, the dc voltage and the load current. A dc voltage sensor is needed for the dc voltage feedback control. The load current sensor is needed to improve the dynamic response of the dc voltage control. The two line current sensors are required for the input current control and to ensure power factor control. This paper presents a novel control scheme for bidirectional three-phase PWM boost-buck rectifiers that eliminates both the ac input current sensors and the dc output voltage sensor (Figure 1). The purposes of the proposed control scheme are: (1) reducing the number of sensors, which minimizes the cost of the system; (2) improving reliability by removing the influence of input current disturbances; and (3) obtaining a completely programmable control scheme, which can evolve without hardware modifications.
Figure 1. Bidirectional three-phase boost-buck converter without ac-current sensors or dc-voltage sensor
The dc output voltage is estimated by measuring the load and coupling-inductance currents. The ac currents are reconstructed from the switching states of the PWM and from the measured voltage and coupling-circuit current. The estimators are developed with MATLAB/SIMULINK (Power System Blockset), and converter operation is validated for a 5.5 kVA model. The implementation is currently being carried out on a TMS320LF2407A DSP in order to build a prototype of this converter.
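Estimating the dc voltage from the load and coupling-inductance currents amounts to integrating the charge balance of the output capacitor. The sketch below is a minimal version, assuming the capacitor current equals the coupling-inductance current minus the load current (the exact topology is not given here) and a forward-Euler update; the capacitance, sampling period and sample values are hypothetical.

```python
def estimate_dc_voltage(v0, i_coupling, i_load, capacitance, ts):
    """Forward-Euler dc-link voltage estimate from two measured current sequences.

    Assumes C * dv/dt = i_coupling - i_load. Returns one estimate per sample.
    """
    v, history = v0, []
    for ic, il in zip(i_coupling, i_load):
        v += ts * (ic - il) / capacitance
        history.append(v)
    return history


# 1 A net charging current into 1 mF for ten 100-us steps raises the estimate by about 1 V
estimates = estimate_dc_voltage(400.0, [11.0] * 10, [10.0] * 10, 1e-3, 1e-4)
```

Open-loop integration accumulates any current-sensor offset, so a practical estimator would need some correction; the abstract does not say how the authors handle this.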
Session 4a
Image Processing
Wednesday Nov. 24, 16h15-17h45, Lacanau Room. Chairs: Yannick Berthoumieu (E.N.S.E.I.R. Bordeaux), José Luis Martín (U. del País Vasco / Euskal Herriko U.)
***
Heterogeneous multiprocessor platforms are an interesting option to satisfy the computational performance requirements of dynamic multimedia applications at a reasonable energy cost in portable embedded systems. This paper shows different inter-frame compression algorithms for different GOPs (Groups of Pictures) and different trade-offs in terms of power consumption/execution time and image quality/bit rate. This study gives information about this multidimensional optimization objective, taking into account that the data are packetized and transmitted through a wireless channel. The objective is to adapt the amount of compressed data to the channel bandwidth while optimizing system performance. This paper focuses on the study of different schedulings for a GOP that has to be compressed under real-time requirements, instead of designing for the worst-case execution time (WCET). On a multiprocessor platform, different schedulings yield different working points (power consumed, execution time). Hence, we can decide which scheduling (working point) is needed in each case, depending on the power consumption and the time available to compress the information.
For example, if information is transmitted over a noisy, narrow-bandwidth channel, we can decide to compress more, at the cost of a longer compression time and higher power consumption, while keeping the image quality (PSNR) roughly constant; if the transmission channel is wider, we can relax the compression. In this way, performance is adapted to the channel while the PSNR is maintained.
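Choosing a scheduling (working point) can be phrased as picking, among measured (energy, time) points, the cheapest one that still meets the time budget left by the channel. The sketch below shows that selection; the `WorkingPoint` type, the schedule names and the numbers are hypothetical, and a real policy would also account for the PSNR and bit-rate targets.

```python
from typing import NamedTuple, Optional


class WorkingPoint(NamedTuple):
    name: str          # which GOP schedule
    energy_mj: float   # energy to compress one GOP
    time_ms: float     # execution time for one GOP


def pick_schedule(points, deadline_ms) -> Optional[WorkingPoint]:
    """Lowest-energy schedule that still meets the deadline, or None if none does."""
    feasible = [p for p in points if p.time_ms <= deadline_ms]
    return min(feasible, key=lambda p: p.energy_mj, default=None)


points = [
    WorkingPoint("A", 50.0, 40.0),   # fast but power-hungry
    WorkingPoint("B", 30.0, 60.0),
    WorkingPoint("C", 20.0, 90.0),   # slow but frugal
]
```

A tighter deadline (for example, when heavier compression leaves less time per GOP) pushes the choice toward faster, more power-hungry schedules.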
The image activity measure (IAM) indicates how busy or complicated an image is in terms of edges, contours or textures. It has been demonstrated that there is a strong relationship between the IAM and the reconstruction error after decompressing an image. This paper proposes a coprocessor that performs the Discrete Wavelet Transform (DWT) of images and measures their activity from this transform. From the IAM value, the proposed coprocessor can advise the user or the host system which compression algorithm is most suitable for each image, given the desired compression ratio and reconstruction error. Furthermore, this coprocessor makes it possible to choose the compression algorithm that obtains a greater compression ratio with less reconstruction error. The adviser coprocessor is implemented on a Field Programmable Gate Array (FPGA), because these reconfigurable platforms provide flexible and high-performance solutions at relatively low cost. The main contribution of this paper is the design of an adviser coprocessor capable of predicting the reconstruction error of an image compressed with a given wavelet family at a fixed compression ratio; that is, it can advise which compression technique best suits the user's requirements. In addition, the implemented prototype makes it possible to evaluate whether it is worth sacrificing performance (area and operating frequency) in favour of accuracy by using floating-point instead of fixed-point arithmetic. The design has been developed in a high-level hardware description language, Handel-C, because it allows the developer to obtain general, parametrized prototypes that are not biased by architectural choices, without spending considerable time on hardware signal semantics.
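The abstract does not define the IAM formula. As an illustration only, the sketch below computes a one-level 2D Haar transform and uses the mean absolute value of the three detail sub-bands as an activity score, so flat images score 0 and edge-rich images score high. This definition, the averaging normalization and the function name `haar_activity` are assumptions; the coprocessor may use other wavelet families and a different normalization.

```python
def haar_activity(image):
    """Activity score from a one-level 2D Haar transform (hypothetical definition).

    For each 2x2 block [[a, b], [c, d]] the averaging Haar transform gives
    LH = (a - b + c - d)/4, HL = (a + b - c - d)/4, HH = (a - b - c + d)/4.
    The score is the mean of |LH| + |HL| + |HH| over all blocks.
    """
    rows, cols = len(image), len(image[0])
    total, blocks = 0.0, 0
    for i in range(0, rows - 1, 2):
        for j in range(0, cols - 1, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            lh = (a - b + c - d) / 4
            hl = (a + b - c - d) / 4
            hh = (a - b - c + d) / 4
            total += abs(lh) + abs(hl) + abs(hh)
            blocks += 1
    return total / blocks
```

Vertical stripes such as `[[0, 4, 0, 4], [0, 4, 0, 4]]` put all their energy into one detail sub-band and score 2.0, while a constant image scores 0.0.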
Power-Aware Tuning of Dynamic Memory Management for Embedded Real-Time Multimedia Applications
David Atienza (1), Stylianos Mamagkakis (2), Miguel Peon (1), Francky Catthoor (3), Jose M. Mendias (1), Dimitrios Soudris (2). (1) DACYA/UCM, Avda. Complutense s/n, 28040, Madrid, Spain. E-mail: {datienza, mendias, mikepeon}@dacya.ucm.es (2) VLSI Center-Demokritus Univ., Thrace, 67100 Xanthi, Greece. E-mail: {smamagka, dsoudris}@ee.duth.gr (3) IMEC, Kapeldreef 75, 3001 Heverlee, Belgium. E-mail: catthoor@imec.be
In the near future, portable embedded devices must run multimedia applications with enormous memory footprints, and must rely on dynamic memory due to the unpredictability of input data (e.g. features of 3D streams) and of system behavior (e.g. a variable number of applications running concurrently). In this context, the dynamic memory subsystem is one of the main sources of power consumption, and embedded systems have very limited batteries with which to support general-purpose dynamic memory management. As a result, consistent design methodologies that can efficiently customize dynamic memory managers to the complex dynamic behavior of these new applications on low-power embedded systems are greatly needed. Nowadays, the complex engineering process of partially customizing dynamic memory managers to the specific platform features (and to the range of dynamic applications that will run on it) can take several months. The reason is that it relies on manual profiling and testing, guided by each developer's intuition and programming style when applying suitable transformations to his or her own code. Moreover, if the goal of the customized dynamic memory manager (e.g. maximizing performance) changes slightly (e.g. power reduction is added as a system constraint in the new design), the custom dynamic memory manager has to be redesigned from scratch to respect the new requirements. In this paper, we present a new system-level design approach that obtains a detailed view of the dynamic behavior, i.e. the (de)allocation pattern, of new multimedia applications and optimizes it with a stepwise refinement flow to reduce the power consumption of dynamic memory managers in such embedded systems. In our approach, the dynamic memory managers are built systematically using our own C++ library, which greatly simplifies the profiling and implementation effort for the designer.
The experimental results on real-life case studies show that our approach reduces power consumption by up to 89% compared with current state-of-the-art dynamic memory managers for complex applications.
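The first step of the flow is a detailed view of the (de)allocation pattern. The sketch below is a minimal profiling pass over a hypothetical trace format of `("alloc", id, size)` and `("free", id)` events: it collects the request-size histogram and the peak number of live bytes, the kind of information a designer could use to choose pool sizes. It is not the authors' C++ library, and `profile_trace` is an illustrative name.

```python
from collections import Counter


def profile_trace(trace):
    """Summarize a (de)allocation trace.

    trace: iterable of ("alloc", block_id, size) or ("free", block_id) tuples.
    Returns (request-size histogram, peak number of live bytes).
    """
    sizes, live, current, peak = Counter(), {}, 0, 0
    for event in trace:
        if event[0] == "alloc":
            _, block_id, size = event
            sizes[size] += 1
            live[block_id] = size
            current += size
            peak = max(peak, current)
        else:
            _, block_id = event
            current -= live.pop(block_id)
    return sizes, peak


trace = [("alloc", 1, 64), ("alloc", 2, 64), ("free", 1),
         ("alloc", 3, 256), ("free", 2), ("free", 3)]
```

For this trace, two 64-byte requests and one 256-byte request are seen, and at most 320 bytes are live at once, which suggests, for instance, a small pool of 64-byte blocks plus a separate path for larger requests.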
Image processing for tasks such as computer vision would benefit from a data cache that fits the [...] illustrated in figure 1. Therefore we propose a 2D adaptive and predictive cache (2DAP cache) relying on a statistical analysis of 2D coordinate accesses: the position and geometry of the 2D cached zone are determined by parameters such as the mean and pseudo-standard deviation (PSD) of the issued coordinates, considered as a signal. Furthermore, a predictive mechanism tries to predict the next image zone to be used, so that it can be downloaded from memory in time. Results show that this strategy is very efficient compared to standard TM32 and PowerPC caches: the 2DAP cache is 50% faster and the cache memory size is reduced by a factor of 4 to 40. The mean of the 2D coordinates issued by the processing unit is used to track the path followed by the algorithm. To that end, a guard region is defined around the current cached zone center, and a cache displacement is performed when the computed mean leaves this region. The current cached zone size, guard zone size and displacement speed are computed from the PSD and evolve along the followed path, assuming first-order (constant-speed) motion, as shown in figure 2. To reach a high clock frequency and reduce complexity, the mean and PSD are computed with first-order IIR filters whose coefficients are powers of 2. As a result, the cache architecture consists only of adders and simple shifters. An interesting result is that the cache control and performance are independent of the cached memory size.
[Figure: processing unit and cached zones]
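With an IIR coefficient of 2^-k, each mean or PSD update reduces to one subtraction, one arithmetic right shift and one addition, which is why the 2DAP cache needs only adders and shifters. The sketch below models that update and a guard-region test in integer arithmetic. The fixed guard half-width, the PSD defined as a filtered absolute deviation and the class name `CacheTracker` are assumptions made for brevity; in the paper, the guard and zone sizes follow the PSD.

```python
def iir(state, sample, shift):
    """First-order IIR low-pass with coefficient 2**-shift: subtract, shift, add."""
    return state + ((sample - state) >> shift)


class CacheTracker:
    """Tracks mean and PSD of issued (x, y) coordinates and recentres the cached zone."""

    def __init__(self, center, guard, shift=3):
        self.center = list(center)   # centre of the current cached zone
        self.mean = list(center)
        self.psd = [0, 0]
        self.guard = guard           # fixed guard half-width (simplification)
        self.shift = shift
        self.moves = 0

    def access(self, x, y):
        for axis, coord in enumerate((x, y)):
            self.mean[axis] = iir(self.mean[axis], coord, self.shift)
            deviation = abs(coord - self.mean[axis])
            self.psd[axis] = iir(self.psd[axis], deviation, self.shift)
        # displace the cached zone once the mean leaves the guard region
        if any(abs(self.mean[a] - self.center[a]) > self.guard for a in (0, 1)):
            self.center = list(self.mean)
            self.moves += 1
```

Because the shift truncates, a filtered value approaching a constant input from below settles within 2^k - 1 of it rather than exactly on it, a small bias that the guard region easily absorbs.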
Real Time Smart Pixels Processing Array for Mobile Multimedia Applications
S. López, R. Calzada, A. Tejera, J.F. López and R. Sarmiento. Institute for Applied Microelectronics (IUMA), Department of Electronic Engineering and Control (DIEA), University of Las Palmas de Gran Canaria, Spain, E-35017. seblopez@iuma.ulpgc.es
Mobile multimedia communications are expected to achieve unprecedented growth and worldwide success in the next couple of years, with a potential market of millions of users around the world. Due to their mobile nature, good visual and voice quality at high compression ratios, as well as reduced area and power dissipation, are key factors for commercial products.
In this paper, a Smart Pixels Array designed to perform key video coding operations efficiently is presented. In particular, the design is capable of computing the Discrete Wavelet Transform (DWT), Zerotree Entropy (ZTE) coding and Frame Differencing (FD) over SQCIF video frames (128×96 pixels). The array is a 128×96 two-dimensional network of interconnected smart pixel processors working in a massively parallel fashion, which allows operation at very low clock frequencies and hence reduces power dissipation. Each smart pixel cell dissipates as little as 4.15 µW at 128 kHz in a square area of 110×110 µm², while the whole array dissipates 57.3 mW at 128 kHz in an area of 166.24 mm² using a 0.25 µm CMOS technology. These characteristics make the designed Smart Pixel Array a highly suitable option for next-generation mobile multimedia devices.
Adaptation of Altera Stratix DSP Board for real-time stereoscopic image processing
Pavol Pavelka, Vincent Bertheas, Viktor Fisher, Virginie Fresse. Laboratoire Traitement du Signal et Instrumentation, 10 rue Barrouin, 42 000 Saint Etienne, France. Virginie.fresse@univ-st-etienne.fr
Particle Image Velocimetry (PIV) is a method of imaging and analyzing flow fields in fluid mechanics. In the PIV technique, motion vectors are extracted by analyzing the displacement of particles added to the liquid or air flow. The method retained as most appropriate for our applications consists in acquiring images with two separate cameras and an FPGA-based architecture. The two video cameras are placed in a stereoscopic position, and the measurement consists in comparing two images of particles in real time. The major difficulty of such a system lies in camera synchronization, as a lack of precision in data acquisition can produce incorrect results. The essence of this work is the design of generic cores and the adaptation of a Stratix DSP board for stereo applications. Based on our existing PIV system, the principle of the generic core design is presented. Great flexibility and accessibility are required for the cores to be truly generic. They can be parameterized to fit any FPGA device and its embedded platform (including additional devices such as memories and A/D converters) and possibly coupled architectures, to ensure immediate reuse. Following this novel approach, the cores dedicated to video acquisition are camera-independent and include the synchronization of both video signals. Real-time memorization cores targeting buffers and external memories have been developed, and a set of parameterized cores is available to cover any external or peripheral component. The PIV algorithm is implemented on a Stratix DSP board coupled to an expansion board. The synchronization problem is observed to be solved by internal synchronization mechanisms. The development time for image processing is estimated at a few hours, since effort need only be concentrated on the image processing unit. The number of LEs used by the synchronization mechanisms is a small percentage of the FPGA area, and the QoR of the implemented algorithm is satisfactory for real-time video applications.
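PIV extracts motion vectors by locating the displacement that best matches two particle images. The sketch below is a minimal direct spatial cross-correlation over a small integer search range; the function name and parameters are illustrative. Practical PIV processors typically use normalized or FFT-based correlation with sub-pixel peak fitting, and nothing here is taken from the authors' cores.

```python
def piv_displacement(frame_a, frame_b, max_shift):
    """Integer displacement (dy, dx) maximizing the cross-correlation of two windows.

    frame_b is assumed to show the particles of frame_a moved by (dy, dx).
    Pixels shifted outside the window are ignored.
    """
    rows, cols = len(frame_a), len(frame_a[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0
            for i in range(rows):
                for j in range(cols):
                    ii, jj = i + dy, j + dx
                    if 0 <= ii < rows and 0 <= jj < cols:
                        score += frame_a[i][j] * frame_b[ii][jj]
            if score > best_score:
                best, best_score = (dy, dx), score
    return best
```

The search cost grows with the square of both the window size and the search range, which is one reason real-time PIV is attractive to implement in FPGA hardware.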
Session 4b
On the Performance of Three-State and Multiplexor Logic Interconnection for Shared Bus SoC Design
Unai Bidarte, Armando Astarloa, José Luis Martín, Jaime Jiménez, and Carlos Cuadrado. Universidad del País Vasco, E.T.S. de Ingeniería de Bilbao, Departamento de Electrónica y Telecomunicaciones, Urquijo s/n 48013 Bilbao
Several control applications require very high-speed data exchange between data source and sink elements: industrial machinery such as filling or milling machines, polyphonic audio, three-dimensional images, video servers, PC equipment such as plotters and printers, etc. After dealing with such applications for quite a long time, we considered that much work could be reused, and a generic, reusable core-based architecture for circuits requiring high-bandwidth data transfers was designed in order to reduce the SoC design cycle time as much as possible. The first application of the generic architecture was a high-bandwidth router. The selected technology was Xilinx Spartan II. In the Spartan II architecture, horizontal routing resources are provided for on-chip three-state buses. Four partitionable bus lines are provided per Configurable Logic Block (CLB) row, permitting multiple buses within a row. The bit rate, defined as the amount of information per second exchanged between two cores, is the key parameter of the architecture, but the amount of resources needed to meet the bit-rate specification must also be taken into account. When the time came to implement the application, an important decision was whether to use internal three-state buffers or multiplexor logic. The main drawback of three-state buffers is that many integrated circuits do not have any internal three-state routing resources, so multiplexor logic interconnections are more portable than three-state logic designs; other integrated circuits are very restrictive in the location or quantity of these interconnects. The second drawback is that three-state logic is inherently slower than direct interconnections, and usually slower than multiplexed constructions, because minimum timing parameters must always be met to turn the buffers on and off. The main disadvantage of the multiplexor logic interconnection is that it requires a larger number of routed interconnects and logic gates (which are not required with the three-state bus approach).
This paper presents a quantitative comparison between the three-state and multiplexor logic design alternatives. We have observed that resource usage, mainly the number of LUTs, can be significantly reduced by using three-state buffers, without a significant drop in the maximum clock frequency. Moreover, synthesis and place-and-route tools achieve good optimization results and have no problem managing internal three-state buffers.
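The functional difference between the two interconnection styles can be captured in a few lines. In the sketch below, a multiplexor always forwards exactly one source chosen by a select signal, while a three-state bus resolves whichever drivers are enabled: none leaves the bus floating, and more than one is contention. This is a generic behavioral illustration with hypothetical function names, not a model of the Spartan II resources.

```python
def mux_bus(sources, select):
    """Multiplexor interconnect: the select index chooses exactly one source."""
    return sources[select]


def tristate_bus(drivers):
    """Three-state interconnect: each driver outputs a value or None (high impedance).

    Returns the single enabled value, None if the bus floats, and raises on
    contention (more than one driver enabled).
    """
    enabled = [v for v in drivers if v is not None]
    if len(enabled) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return enabled[0] if enabled else None
```

The multiplexor guarantees a single driver by construction at the cost of selection logic and routing, whereas the three-state bus shares one set of wires and relies on the enable signals to keep exactly one driver active.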
Dynamically Reconfigurable Systems (DRS), in which the hardware can be changed at run-time, have the potential to make hardware nearly as flexible as software. At the same time, they may lead to better performance and a smaller system size, because parts of the system that are not needed during some time interval can be removed from the hardware to make room for other parts required during that same interval. On the other hand, potential drawbacks of Run-Time Reconfiguration (RTR) techniques are the performance penalty induced by long reconfiguration times and the area overhead of the hardware that controls the reconfiguration process. Moreover, the deployment of DRSs requires extensive support that is not yet available, consisting of tools that enable the use of RTR techniques and infrastructure to implement DRSs. The authors have proposed a framework for the design, verification and implementation of DRS, named PaDReH, as a step towards overcoming this lack of support. One of the main obstacles to reconfigurable systems is the unavailability of a module to control the hardware reconfiguration process. This configuration controller decides which reconfigurable IP core(s) must be inserted into the reconfigurable device at any moment, and which must be removed. The main contribution of this paper is a configuration controller built entirely in hardware, unlike previous approaches, where software implementations dominate. The proposed controller has been designed, validated and successfully prototyped on Xilinx Virtex-II FPGAs.
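The abstract does not describe the controller's replacement policy. The sketch below is a minimal behavioral model of the decision such a controller makes, assuming a fixed number of reconfigurable slots and a least-recently-used policy; both assumptions and the class name `ConfigurationController` are hypothetical. In hardware, the commented line would correspond to loading a partial bitstream into the chosen slot.

```python
from collections import OrderedDict


class ConfigurationController:
    """Decides which reconfigurable IP cores occupy a fixed number of slots (hypothetical LRU policy)."""

    def __init__(self, n_slots):
        self.n_slots = n_slots
        self.loaded = OrderedDict()   # core name -> slot index, least recently used first
        self.reconfigurations = 0

    def request(self, core):
        """Return the slot holding `core`, reconfiguring the device if it is absent."""
        if core in self.loaded:
            self.loaded.move_to_end(core)                # mark as most recently used
            return self.loaded[core]
        if len(self.loaded) < self.n_slots:
            slot = len(self.loaded)                      # still a free slot
        else:
            _, slot = self.loaded.popitem(last=False)    # evict the LRU core
        self.loaded[core] = slot
        self.reconfigurations += 1                       # partial bitstream load here
        return slot
```

Counting reconfigurations for a given request sequence gives a rough handle on the reconfiguration-time penalty mentioned above, which is what a replacement policy tries to minimize.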
A SoC-based Architecture coupled with a CMOS Image Sensor for measurements by Image Processing
Lionel Lelong, Guy Motyl, Gérard Jacquet and Nathalie Bochard. Laboratoire Traitement du Signal et Instrumentation, UMR CNRS 5516, Université Jean MONNET, Bâtiment F, 10 rue BARROUIN, 42000 SAINT-ETIENNE, FRANCE. E-mail: {lionel.lelong, motyl, jacquet, nathalie.bochard}@univ-st-etienne.fr
Measuring and controlling physical parameters by image processing implies real-time operations on a high data flow (high-resolution images). The real-time constraints of many image-processing applications can be met with specific embedded systems, overcoming the inefficiency of current computer systems. In this paper we present an architecture based on a System-on-Chip approach to perform real-time measurement by image processing in flow visualisation applications. The technique used is Particle Image Velocimetry (PIV), which measures velocities in flows in a non-intrusive way: PIV techniques extract a motion vector field from a flow seeded with fine particles. The system consists of two main devices, a CMOS image sensor and an FPGA. The processing design is implemented inside the FPGA, mixing ring and bus communication for the data and control flows. Our architecture is based on a dominant input data flow model, which features a large input data bandwidth and a reduced output data flow coupled with a command flow (Fig. 1). An embedded processor (NIOS) controls the entire process; the acquisition logic and processing elements are all written in VHDL. The implementation of the processing design with one processing module (PM) takes up 51% of the Logic Elements and 87% of the embedded RAM (Quartus II results). Our system runs at 50 MHz but can run at up to 61 MHz. The system's flexibility allows a compromise between speed and resolution without any hardware changes. With the retained FPGA, external RAM is needed; in future implementations, all memory banks will be hosted inside the FPGA. Real-time measurement of high-speed flows reaches at least 100 images/s using adapted correlation algorithms.
[Fig. 1 block labels: CMOS image sensor, acquisition module, memory module, control module, processing modules (PM), static RAM, PC, FPGA; data flow and command/result flow.]
CCORDING as
the integration capability of silicon increases, the number of resources that can be
implemented on the same chip increases as well. However, new technological problems appear,
and these must be overcome. The Network on Chip paradigm arises to give solutions to such problems. Appropriate simulation tools are necessary for the investigation of NoCs. Since the NoC paradigm is still a recent idea, there is not yet a wide offer of simulation tools. Reusing existing tools originally aimed at the simulation of general computer networks is an interesting alternative. In this work we have developed a component for the simulator Ns2 (Network Simulator) which allows the simulation of the communications in a NoC-based heterogeneous system. We present a simple model of a NoC-based heterogeneous system in which a set of tasks is executed concurrently by different computation nodes (which represent resources of the NoC). Each computation node sends data requests to other computation nodes. The time between two successive data requests in the same computation node is known as Tproc. When a computation node receives a data request, it sends a response packet with load data after a time Tresp. Using this model as a basis, we developed a component for Ns2, named NocprocAgent. Agents in Ns2 represent active nodes that generate traffic in the network. The special characteristics of NocprocAgent, which make it different from other agents, are: 1) the destination of the requests is randomly selected; 2) the requests are sent at random instants of time (Tproc); 3) the responses are sent at random times after the request is received (Tresp). We demonstrated the application of the developed component in practical experiments by simulating a 5×5 mesh (25 computation nodes). By varying the mean and standard deviation of Tproc and Tresp, as well as the buffer size of the links, we obtained the number of dropped packets in the network, which is a representative measure of network behaviour. Several simulations with different parameter values showed that both the link buffer size and Tproc have a great impact on NoC performance.
With this work we show that new components that adapt a general computer network simulator to Networks on Chip can be integrated with relatively low effort.
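The workload model above can be sketched as a traffic generator. This is a minimal Python sketch, not the NocprocAgent code (which lives inside Ns2): it only produces the request and response packets such an agent would inject, and it assumes Tproc and Tresp are Gaussian and truncated at zero, a distribution the abstract does not state. The names `Packet` and `noc_traffic` are hypothetical. Counting dropped packets would additionally need the link and buffer model that Ns2 provides.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    time: float   # injection instant
    src: int      # sending computation node
    dst: int      # receiving computation node
    kind: str     # "request" or "response"


def noc_traffic(n_nodes, t_end, tproc_mean, tproc_std, tresp_mean, tresp_std, seed=0):
    """Generate request/response traffic for the simple NoC workload model.

    Every node issues requests separated by a random Tproc, each to a uniformly
    chosen other node; that node answers after a random Tresp. Gaussian delays
    truncated at zero are an assumption of this sketch.
    """
    if tproc_mean <= 0:
        raise ValueError("tproc_mean must be positive")
    rng = random.Random(seed)

    def draw(mean, std, floor=0.0):
        return max(floor, rng.gauss(mean, std))

    packets = []
    for src in range(n_nodes):
        others = [n for n in range(n_nodes) if n != src]
        # A small positive floor on Tproc guarantees that time always advances.
        t = draw(tproc_mean, tproc_std, floor=1e-3 * tproc_mean)
        while t < t_end:
            dst = rng.choice(others)                     # random destination
            packets.append(Packet(t, src, dst, "request"))
            t_resp = t + draw(tresp_mean, tresp_std)     # reply after Tresp
            packets.append(Packet(t_resp, dst, src, "response"))
            t += draw(tproc_mean, tproc_std, floor=1e-3 * tproc_mean)
    return sorted(packets, key=lambda p: p.time)


# 5x5 mesh -> 25 computation nodes; time units are arbitrary here.
trace = noc_traffic(25, t_end=1000.0, tproc_mean=10.0, tproc_std=2.0,
                    tresp_mean=5.0, tresp_std=1.0, seed=1)
```

Feeding such a trace into a network model with finite link buffers is where the dropped-packet statistic would come from.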
THIS article presents a new proposal for realising a Checkpoint and Rollback (C&R) scheme in single-processor SoCs, based on the non-orthodox reuse of the context-switch (CS) function, the heart of all multitasking systems. The idea comes from a comparison between the two methods: both are based on restoring a previously saved state (the checkpoint for C&R, the context for CS). However, CS does not save the memory state, so a successful recovery is subject to certain restrictions, which are pointed out and analysed in the article. In particular, three validity criteria are derived and detailed, together with possible application conditions. For the experimental part, two practical implementation possibilities are analysed: the direct reuse of the context-switch function and the use of the ANSI C non-local jump functions setjmp and longjmp. Preliminary evaluation measures, reported in Table 1, showed that the latter would yield better performance and be easier to apply as a general method, so it was chosen for implementation in a prototype. The experimental platform is based on Leon, a SPARC V8 VHDL IP processor developed by Gaisler Research, running eCos, the embedded operating system from Red Hat. Overhead measurements, reported in Table 2, were performed on TSIM, the time-accurate Leon simulator, also from Gaisler Research. A real prototype using a complex development board based on a Xilinx Virtex-II Pro FPGA is under development. The results are encouraging and confirm the good estimations. The prototype also raises good expectations: its realisation is proceeding swiftly and, once completed, it will enable complete and accurate measurements in real environments.
Table 1. Summary of overhead evaluations (with a scheduler allocating time slots of 50 ms per thread). Overheads in normal operation (without faults), columns Runtime (%), CSwitch (%) and Memory (bytes); values as extracted: 0, 0.006, 0, 6.25, 80, 128; CSwitch (%): 20.
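The paper's three validity criteria are not reproduced here, but the reason they are needed can be illustrated. The toy Python model below uses only hypothetical names; the real implementation uses setjmp/longjmp in C on eCos. At a checkpoint it saves only the "registers", as a context switch would, and leaves memory untouched. Rollback then reproduces the fault-free result when the re-executed code only reads memory. It fails when that code performs a non-idempotent read-modify-write after the checkpoint.

```python
import copy


class Task:
    """Toy task: `regs` play the role of the saved context, `memory` is shared
    state that a context-switch style checkpoint does NOT save."""

    def __init__(self, memory, read_modify_write):
        self.regs = {"pc": 0, "acc": 0}
        self.memory = memory
        self.rmw = read_modify_write

    def step(self):
        pc = self.regs["pc"]
        if self.rmw:
            self.memory[pc] += 1          # non-idempotent write to memory
        self.regs["acc"] += self.memory[pc]
        self.regs["pc"] = pc + 1


def run(task, n):
    for _ in range(n):
        task.step()


def checkpoint(task):
    return copy.deepcopy(task.regs)       # registers only, like a saved context


def rollback(task, saved):
    task.regs = copy.deepcopy(saved)      # memory is left as it is


def final_acc(read_modify_write, inject_fault):
    task = Task([1, 2, 3, 4, 5], read_modify_write)
    run(task, 2)
    saved = checkpoint(task)
    run(task, 1)
    if inject_fault:
        task.regs["acc"] = -999           # transient fault corrupts a register
        rollback(task, saved)
        run(task, 3)                      # re-execute from the checkpoint
    else:
        run(task, 2)
    return task.regs["acc"]
```

With read-only execution the recovered result equals the golden one (15). With the read-modify-write step, the re-executed increment is applied twice, giving 21 instead of 20.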
Realistic 3D information is needed in several application fields. In robotics, for example, an autonomous robot roving in an unknown field must, to locate itself, discover the morphology of its path and identify and estimate the shape and dimensions of the objects in its surroundings in order to choose the best trajectory. In medicine, observing the interior of the human body with tools that provide additional information about 3D shape will help physicians sharpen their diagnoses. The paper describes the development of an original integrated 3D image sensor. It is a chip-scale component based on active stereovision that reconstructs a realistic 3D representation of an object or a scene. It has wireless communication ability to transmit its results and to be reconfigured over the air. It is intended for use in hazardous and unreachable locations. The novel aspect of this approach is the miniaturization and chip-level integration, in the same package, of several parts that are not technologically compatible: a light-emitting component, an image sensor, an HF circuit and a standard digital signal processing circuit.
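Active stereovision recovers depth by triangulation between the light emitter and the image sensor. As a hedged illustration, the sketch below assumes an ideal pinhole camera and a rectified emitter/sensor pair; the abstract gives no geometry. Under those assumptions depth follows Z = f·B/d, with focal length f in pixels, baseline B and disparity d. The function names are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole triangulation: Z = f * B / d (metres if B is in metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def point_3d(u, v, cx, cy, focal_px, baseline_m, disparity_px):
    """Back-project pixel (u, v), principal point (cx, cy), to (X, Y, Z)."""
    z = depth_from_disparity(focal_px, baseline_m, disparity_px)
    return ((u - cx) * z / focal_px, (v - cy) * z / focal_px, z)


# Hypothetical numbers: f = 500 px, B = 10 cm, disparity 25 px -> Z = 2 m.
x, y, z = point_3d(350, 250, 300, 250, 500, 0.1, 25)
```

Because Z is inversely proportional to d, a one-pixel disparity error costs more depth accuracy at long range, which is one reason baseline and sensor resolution matter in a miniaturized design.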
Session 4c
High sensitivity and Wide Bandwidth CMOS Transimpedance Amplifier for Optical Receiver Circuit
M. B. Guermaz¹, L. Bouzerara¹, H. Escid², and M. T. Belaroussi¹
¹ Centre de Développement des Technologies Avancées, Microelectronics and Nanotechnologies Division, Cité 20 Août 1956, BP.17, Baba Hassen, 16303, Algiers, ALGERIA
² Université des Sciences et de la Technologie Houari Boumediene, Systems Engineering Laboratory, BP.32, Bab Ezzouar 16111, Algiers, ALGERIA
guermaz@cdta.dz, lbouzerara@cdta.dz
THIS paper describes and analyzes a low-noise, high-bandwidth transimpedance amplifier featuring a large dynamic range. The designed amplifier consists of three identical stages that use an active load compensated by an active resistor to improve the stability of the amplifier. This topology displays a transimpedance gain of 150 kΩ, which is necessary to obtain a high sensitivity of −32 dBm. The structure operates from a 5 V power supply and exhibits a gain-bandwidth product of 18 THz and a low noise level of about 0.94 pA/√Hz. This transimpedance amplifier can reach a transmission speed of 240 Mb/s for a photocurrent of 0.5 µA. A transmission speed of 622 Mb/s can be achieved, for a photocurrent of 9.5 µA, by using a four-channel optical fiber connection. The predicted performance is verified by PSPICE simulations with 0.8 µm CMOS AMS parameters. The stability problems that occur in this kind of amplifier have been solved by using an active resistor in the active load of a stage; this compensation technique has improved the phase margin of the designed amplifier. The proposed transimpedance topology presents good performance in terms of noise and bandwidth. The obtained performance fulfills the expected specifications, such as a considerable gain, a very high gain-bandwidth product, a good dynamic range and, in particular, a very low input noise level, as required by most communication standards. The main advantage of the designed architecture is its very low noise, which gives better reception sensitivity; combined with a large bandwidth, this makes higher transmission speeds more feasible.
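A few back-of-the-envelope relations connect the quoted figures. This Python sketch makes two simplifying assumptions, not statements from the paper. First, the 18 THz gain-bandwidth product is read as transimpedance gain times −3 dB bandwidth, so the bandwidth is the GBP divided by 150 kΩ. Second, the noise bandwidth is treated as brick-wall. The function names are hypothetical.

```python
import math


def tia_output_voltage(i_photo, r_t):
    """Output swing of an ideal transimpedance stage: V = I * R_T."""
    return i_photo * r_t


def bandwidth_from_gbp(gbp_hz_ohm, r_t):
    """-3 dB bandwidth if GBP is defined as R_T times bandwidth (assumption)."""
    return gbp_hz_ohm / r_t


def input_noise_current(noise_density, bandwidth):
    """RMS input noise of a white density over a brick-wall bandwidth."""
    return noise_density * math.sqrt(bandwidth)


r_t = 150e3                                   # 150 kOhm transimpedance
v_out = tia_output_voltage(0.5e-6, r_t)       # 0.5 uA photocurrent -> 75 mV
bw = bandwidth_from_gbp(18e12, r_t)           # -> 120 MHz under the assumption
i_noise = input_noise_current(0.94e-12, bw)   # 0.94 pA/sqrt(Hz) -> about 10 nA RMS
```

Under these assumptions the 0.5 µA photocurrent is roughly 50 times the integrated input noise, which is consistent with the emphasis the abstract places on low input noise.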
IN this paper, a current-mode magnitude locked loop based on CMOS companding techniques is presented. The nonlinear transconductors that form the companding systems are based on the nonlinear behavior of class-AB transconductors. This novel approach is an alternative to the conventional technique based on single MOS translinear loops, leading to more compact and simpler implementations. The circuits are able to operate with a very low supply voltage (as low as V_GS + 2V_DSsat). Both numerical and measurement results are provided to demonstrate the proposed circuits and technique.
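A magnitude locked loop adjusts a gain until the measured magnitude of its output equals a reference. The behavioral Python sketch below is discrete-time, uses hypothetical parameters and does not model the companding circuits themselves. It uses a rectifier plus a one-pole low-pass filter as the magnitude detector and integrates the magnitude error into the gain.

```python
import math


def magnitude_locked_loop(amplitude, ref, n_samples=20000, samples_per_period=20,
                          mu=0.01, alpha=0.01):
    """Return the final gain and detected magnitude of a behavioral MLL.

    amplitude: input sine amplitude; ref: target mean rectified output.
    mu: integrator step; alpha: one-pole low-pass coefficient (both hypothetical).
    """
    g, env = 1.0, 0.0
    for n in range(n_samples):
        x = amplitude * math.sin(2 * math.pi * n / samples_per_period)
        y = g * x
        env += alpha * (abs(y) - env)           # rectifier + low-pass: magnitude detector
        g = max(0.0, g + mu * (ref - env))      # integrate the magnitude error into the gain
    return g, env


g_small, env_small = magnitude_locked_loop(0.05, 0.5)
g_large, env_large = magnitude_locked_loop(0.2, 0.5)
```

Both inputs end up with the same detected magnitude, while the loop gain scales inversely with the input amplitude.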
High-Speed High-Precision Analog Rank Order Filter with O(n) complexity in CMOS Technology
R. G. Carvajal, J. Ramírez-Angulo, G. O. Ducoudray, and A. López-Martín, Klipsch School of Electrical Eng., New Mexico State University, Las Cruces NM (USA)
A new scheme for analog rank order filtering based on analog buffers is presented. This scheme is characterized by high speed, high precision and simple circuit architectures. The overall architecture exhibits linear complexity with the number of inputs (O(n)), at the rate of one buffer per input. The rank is easily programmed through the tail current source for all rank-order values from the maximum to the minimum case, and its precision does not depend on the accuracy of the current copy. Simulation and experimental results are presented that verify the functionality and accuracy of the proposed circuit.
[Figure: rank order filter schematic with voltage buffers 1 to N (transistors M11 to M15 and MN1 to MN5), bias currents Ibias, Ib and (N−k+1)·Ibias, inputs V1 to VN, and output Vout with load current IL and load capacitance CL.]
Figure 8. Experimental transient response of the rank order filter with four inputs, with k varied from k = 4 to k = 3 and buffers with high open-loop gain: (a) k = 4, (b) k = 3.
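The function of the filter can be stated behaviorally. The Python sketch below is an idealized reading of the tail current label (N−k+1)·Ibias in the schematic and is an assumption, not the authors' analysis. If each conducting buffer can source at most Ibias, the tail current is balanced when the output sits at the (N−k+1)-th largest input. So k = N gives the maximum and k = 1 the minimum. Sorting is just a software convenience here; the O(n) claim in the paper refers to hardware, one buffer per input.

```python
def rank_order(inputs, k):
    """Idealized rank order filter output for rank k (1 <= k <= N).

    Assumes each conducting buffer sources at most Ibias, so a tail current of
    (N - k + 1) * Ibias is balanced when the output sits at the
    (N - k + 1)-th largest input: k = N gives the maximum, k = 1 the minimum.
    """
    n = len(inputs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of inputs")
    active = n - k + 1                      # buffers needed to absorb the tail current
    return sorted(inputs, reverse=True)[active - 1]


# Four inputs, as in Figure 8 (hypothetical voltages).
v_in = [0.1, 0.5, 0.3, 0.2]
```

With these inputs, k = 4 selects 0.5 (maximum), k = 3 selects 0.3, and k = 1 selects 0.1 (minimum).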
A Seventh Order Elliptic CMOS Continuous Time GmC Filter for PLC applications
Juan F. Fernández-Bootello, Manuel Delgado-Restituto, and Ángel Rodríguez-Vázquez, Instituto de Microelectrónica de Sevilla, Centro Nacional de Microelectrónica, Avda. Reina Mercedes s/n, Campus Universidad de Sevilla, E-41012 Sevilla (Spain). E-mails: {bootello, mandel, angel}@imse.cnm.es; phone +34 955056666
IN this paper, simulations and design considerations of a seventh-order low-pass elliptic filter are presented. The filter has the option to provide a high-frequency boost to correct possible attenuation in the communication channel. It has a cut-off frequency of 34 MHz, a passband ripple of less than 1 dB and a stopband attenuation of up to 65 dB (without boosting). It is able to provide a boost of 12 dB. Its noise is below 56 nV/√Hz … amplitudes A = 70 mV and frequencies of 30 and 31 MHz. It has been simulated in a 0.18 µm process and consumes 485 mW from a 1.8 V power supply. The filter uses the Gm-C technique. The transconductor uses source degeneration to improve linearity. To evaluate the characteristics of the filter at the system level, we have used a program that quickly estimates characteristics such as noise and distortion. Distortion is evaluated with a method based on the Volterra series, which allows the distortion to be computed at all the frequencies of interest in a short time, with no need for transient analysis. Figure 1 shows the conceptual schematic of the filter, where GB are extra transconductors that correct the attenuation introduced by the channel at high frequencies.
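For a memoryless, weakly nonlinear element the Volterra series reduces to a power series, y = a1·x + a2·x² + a3·x³, and the harmonic amplitudes have closed forms. The Python sketch below covers only that memoryless special case; the filter analysis needs frequency-dependent kernels. The coefficients a1, a2, a3 are hypothetical. The sketch checks the closed forms against a direct projection of one sampled period, which is what a transient simulation followed by an FFT would compute.

```python
import math


def power_series(a1, a2, a3, x):
    return a1 * x + a2 * x ** 2 + a3 * x ** 3


def harmonic_ratios(a1, a2, a3, amp):
    """Closed-form HD2, HD3 (amplitude ratios) for x = amp * cos(wt)."""
    fund = a1 * amp + 0.75 * a3 * amp ** 3   # the cubic term also feeds the fundamental
    h2 = 0.5 * a2 * amp ** 2
    h3 = 0.25 * a3 * amp ** 3
    return abs(h2 / fund), abs(h3 / fund)


def harmonic_ratios_sampled(a1, a2, a3, amp, n=64):
    """Same ratios from one sampled period projected onto cos(k*wt) (a DFT)."""
    ys = [power_series(a1, a2, a3, amp * math.cos(2 * math.pi * i / n)) for i in range(n)]

    def cos_coeff(k):
        return 2.0 / n * sum(y * math.cos(2 * math.pi * k * i / n) for i, y in enumerate(ys))

    c1, c2, c3 = cos_coeff(1), cos_coeff(2), cos_coeff(3)
    return abs(c2 / c1), abs(c3 / c1)


def to_db(ratio):
    return 20 * math.log10(ratio)


# Hypothetical transconductor coefficients and a 70 mV tone.
coeffs = (1e-3, 2e-4, -5e-4, 0.07)
```

The closed forms cost a few multiplications per frequency, which illustrates why a Volterra-based estimate avoids long transient simulations.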
AN integrated tunable Tow-Thomas Gm-C biquadratic filter is presented, with independently adjustable frequency, gain, and quality factor in both low-pass and band-pass responses. The transconductors employed operate in the moderate inversion region, leading to an excellent trade-off between bandwidth and power dissipation, a wide adjustment range of filter parameters, a large dynamic range, and low die area. The filter has been fabricated in a 0.5 µm CMOS technology. Measurement results of the transconductor and the complete filter are presented. They are in good agreement with theoretical and simulation results, and demonstrate that operation in the moderate inversion region can lead to circuits with very high linearity and tuning range. The authors believe this is the first time it has been recognized that operation in moderate inversion results in very low distortion levels. This potentially opens many other applications of moderate inversion, of which the proposed filter is just one.
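The roles of the transconductances can be seen in a generic two-integrator-loop Gm-C biquad. This is an assumed textbook topology, not necessarily the authors' exact circuit. Node 1 (capacitor C1) is driven by gm_in·Vin, fed back by gm2 from node 2 and damped by gmq, and node 2 (capacitor C2) integrates gm1·V1. Node 1 gives the band-pass output and node 2 the low-pass output. The resulting parameters are ω0² = gm1·gm2/(C1·C2), Q = C1·ω0/gmq, a band-pass peak gain of gm_in/gmq and a low-pass DC gain of gm_in/gm2.

```python
import math


def gmc_biquad_params(gm_in, gm1, gm2, gmq, c1, c2):
    w0 = math.sqrt(gm1 * gm2 / (c1 * c2))   # natural frequency (rad/s)
    q = c1 * w0 / gmq                       # quality factor
    return w0, q, gm_in / gm2, gm_in / gmq  # w0, Q, LP DC gain, BP peak gain


def gmc_biquad_response(s, gm_in, gm1, gm2, gmq, c1, c2):
    """Low-pass and band-pass transfer functions at complex frequency s."""
    den = c1 * c2 * s * s + gmq * c2 * s + gm1 * gm2
    h_bp = gm_in * c2 * s / den   # voltage on node 1
    h_lp = gm_in * gm1 / den      # voltage on node 2
    return h_lp, h_bp


# Hypothetical values: 100 uS integrator transconductances, 1 pF capacitors.
params = dict(gm_in=50e-6, gm1=100e-6, gm2=100e-6, gmq=25e-6, c1=1e-12, c2=1e-12)
w0, q, lp_gain, bp_gain = gmc_biquad_params(**params)
```

In this assumed topology, ω0 depends on gm1·gm2, the band-pass peak gain on gm_in/gmq, and the low-pass gain on gm_in/gm2, which is what allows the parameters to be tuned through separate transconductors.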
A novel fully differential CMOS second-generation current conveyor (CCII) topology is presented. The circuit operates in the moderate inversion region and features high linearity over a wide input range. The current gain can be tuned over a wide range. These features are essential to extend the use of CCII-based circuits to high-performance VLSI applications. The circuit also features very low input impedance at the X terminal and low die area. It can be applied as a fully differential universal active block in several circuit topologies, such as filters and oscillators. The proposed circuit, shown in Figure 1, has been implemented in a 0.5 µm CMOS technology, and its main performance characteristics have been measured. When the circuit is employed as a transconductor, measurements show a total harmonic distortion of −66.5 dB with differential input swings equal to 77% of the ±1.3 V supply voltage, transconductance tuning over two decades, and 1.7 mW of static power consumption.
[Figure 1: schematic of the proposed fully differential CCII: transistors M1A to M6A and M1B to M6B, bias currents IB, control voltage VCN, terminals Y+, Y−, X+, X−, Z+ and Z−, and current-direction signals IDIR and IINV.]
[Figure 2 panels (a) to (d): FDCCII blocks with terminals Y, X and Z, differential input Vid, resistors R, and outputs IOUT+/IOUT− or VOUT+/VOUT−.]
Figure 2. Application examples: (a) current amplifier, (b) transconductor, (c) transresistor, (d) voltage amplifier.
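The application examples follow from the ideal port relations of a current conveyor: X follows Y, and Z copies the X current scaled by the current gain. The Python sketch below uses these ideal relations with hypothetical names (r_x, r_z, k for the tunable current gain). It ignores the finite X impedance and all nonidealities of the real circuit. It also converts a distortion level in dB to a percentage, so that a THD 66.5 dB below the fundamental can be read as about 0.047%.

```python
def fdccii_transconductor(v_id, r_x, k):
    """Ideal FDCCII as transconductor: X follows Y, so I_X = Vid / R_X,
    and the Z outputs deliver k * I_X (k = tunable current gain)."""
    return k * v_id / r_x


def fdccii_voltage_amplifier(v_id, r_x, r_z, k):
    """Same output current developed across a load R_Z at the Z terminals."""
    return fdccii_transconductor(v_id, r_x, k) * r_z


def db_to_percent(level_db):
    """Convert a distortion level in dB (relative to the fundamental) to %."""
    return 100.0 * 10 ** (level_db / 20.0)


# Hypothetical values: R_X = 1 kOhm, R_Z = 5 kOhm, current gain k = 2.
i_out = fdccii_transconductor(1.0, 1e3, 2.0)          # 2 mA per volt of Vid
v_out = fdccii_voltage_amplifier(0.1, 1e3, 5e3, 2.0)  # gain k * R_Z / R_X = 10
```

In this idealization the transconductance is k/R_X, so tuning the current gain k is what provides the wide transconductance tuning range.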
Session 4d
FAULT Tolerance (FT) has been a traditional requirement for safety-critical applications working in harsh environments. Very deep submicron and nanometer technologies have notably increased the sensitivity of integrated circuits (ICs) to radiation, and soft errors are now appearing in ICs working at the Earth's surface. Therefore, hardened circuits are currently required in many applications where FT was not a requirement in the very recent past. During the hardening process of a circuit, fault tolerance evaluation is a key factor. In this sense, the use of platform FPGAs to emulate single-event upset (SEU) effects is gaining attention as a way to speed up the fault tolerance evaluation process. In this work, two techniques for evaluating FT with respect to SEU effects are described and compared: a Fault-Mask-based and a Scan-Path-based architecture. Both proposals take advantage of the hardware resources, executing most of the tasks in the FPGA instead of in the host in order to minimise the communication bottleneck between software and hardware. The main difference between the two approaches is the fault injection strategy. The first includes a fault mask chain that is applied to the circuit when the injection time is reached, while the second uses scan-path techniques to download the state of the circuit at the fault injection time. Both techniques are analyzed and compared with respect to area overhead and the execution time required by the emulation process for a benchmark circuit. The experiments show that emulation techniques can obtain results in seconds, while similar experiments can take hours with simulation-based techniques. Furthermore, the Mask Scan Injection Technique is better in terms of fault emulation time when the number of testbench cycles is smaller than the number of flip-flops in the circuit (datapath applications). On the other hand, the State Scan Injection Technique is interesting when evaluating the fault tolerance of circuits with a small number of flip-flops and a large number of testbench cycles (control applications).
The results obtained prove that this system is a cost-effective solution for transient fault evaluation.
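The mask-based injection idea can be illustrated in software. The Python sketch below is a toy and not the emulator: it runs a hypothetical 8-bit accumulator once as a golden reference and once with an XOR mask applied to its state register at the injection cycle. The fault is classified as a failure if any output differs, and as masked otherwise. A triple-modular-redundant (TMR) variant with bitwise majority voting shows how hardening masks a single bit-flip.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 vote."""
    return (a & b) | (a & c) | (b & c)


def run(inputs, tmr, inject_cycle=None, mask=0):
    """Hypothetical 8-bit accumulator, optionally triplicated (TMR)."""
    regs = [0, 0, 0] if tmr else [0]
    outputs = []
    for cycle, x in enumerate(inputs):
        if cycle == inject_cycle:
            regs[0] ^= mask                      # fault-mask injection: XOR flips masked bits
        state = majority(*regs) if tmr else regs[0]
        outputs.append(state)
        nxt = (state + x) & 0xFF                 # next state from the (voted) state
        regs = [nxt] * len(regs)
    return outputs


def classify(inputs, tmr, inject_cycle, mask):
    golden = run(inputs, tmr)
    faulty = run(inputs, tmr, inject_cycle, mask)
    return "masked" if faulty == golden else "failure"


stimulus = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical testbench
```

An FPGA-based emulator runs the same golden-versus-faulty comparison at hardware speed, which is where the seconds-versus-hours difference comes from.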
Analysis of Input and Feedback Capacitances Effect on Low Noise Preamplifier Performance for X-ray Silicon Strip Detectors
T. Noulis*, S. Siskos*, G. Sarrabayrouse** *Electronics Laboratory of Physics Department, Aristotle University of Thessaloniki, Greece **LAAS-CNRS, Toulouse, France tnoul@physics.auth.gr, siskos@physics.auth.gr, sarra@laas.fr
AN analysis of the noise behavior of a charge-sensitive preamplifier (CSA) for a low-energy X-ray silicon strip detector for space applications is presented. Design criteria for CSA noise optimization are examined in relation to the total input stage capacitance, specifically the detector and feedback capacitances and the parasitic capacitances of the input and reset MOSFETs. The variation of the total output noise, charge-discharge time and gain with the detector capacitance is demonstrated. The effect of the input and reset MOS parasitic capacitances on stability and noise contribution is also examined in a CSA configuration with no feedback capacitance. The preamplifier was designed in a 0.8 µm DMILL process, and the analysis is supported by simulation results. The output signal of the CSA with zero feedback capacitance is shown in Fig. 1. The preamplifier operates, but not properly: the output discharge voltage level is unstable and the amplifier shows a strong tendency to oscillate. A low-noise preamplifier with no feedback capacitance can be designed by optimizing the dimensions of the input and reset MOS transistors (Fig. 2). Increasing the dimensions of the input MOS M1 results in a higher Cgd, which is considered to be in parallel with the feedback path. Reducing the size of the reset MOS reduces its gate and substrate parasitic capacitances and therefore attenuates the tendency to oscillate. Preliminary results of a work aimed at designing a low-noise preamplifier with no feedback capacitance are presented.
Fig. 2. CSA output signal for large input MOS and small reset MOS (Cf = 0 pF and Cd = 2 pF).
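The role of the input and feedback capacitances appears directly in the ideal charge-amplifier gain. This Python sketch makes several assumptions: an inverting amplifier with open-loop gain a0, a mean energy of 3.62 eV per electron-hole pair in silicon, and a hypothetical 5.9 keV photon. With Cf = 0 the effective feedback capacitance would be the Cgd parasitic discussed above. The sketch is not the authors' simulation model.

```python
Q_E = 1.602176634e-19   # elementary charge (C)
E_PAIR_SI = 3.62        # mean energy per electron-hole pair in silicon (eV), assumed


def collected_charge(photon_ev):
    """Charge released by a fully absorbed X-ray photon of the given energy."""
    return photon_ev / E_PAIR_SI * Q_E


def csa_output_step(q_in, c_f, c_in, a0):
    """|Vout| of an inverting charge amplifier with open-loop gain a0:
    charge balance Q = Vx * (C_in + (1 + a0) * C_f) with Vout = -a0 * Vx."""
    return q_in * a0 / (c_in + (1.0 + a0) * c_f)


q = collected_charge(5900.0)                     # about 0.26 fC
v_ideal = csa_output_step(q, 0.1e-12, 2e-12, 1e6)
v_low_gain = csa_output_step(q, 0.1e-12, 2e-12, 100)
```

With large open-loop gain the output step is close to Q/Cf. With modest gain, the input capacitance (detector plus input MOS) visibly reduces the charge gain.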
M. Aguirre¹, J. N. Tombs¹, F. Muñoz¹, V. Baena¹, A. Torralba¹, L. G. Franquelo¹, A. Fernández-León², F. Tortosa-López² and D. González-Gutiérrez²
¹ Escuela Superior de Ingenieros, Universidad de Sevilla. Camino de los Descubrimientos s/n, 41092 Sevilla (SPAIN) {aguirre,jon,fmunoz,baena,torralba}@gte.esi.us.es
² Data Systems Division, ESTEC/TOS-ED, European Space Agency. Noordwijk (THE NETHERLANDS)
which many can lead to incorrect values being stored in the memory cells of a digital design during execution time. Unexpected bit-flips produced by radiation are classified as single event upsets (SEUs) and are critical. Designs must mitigate their effect through special cell libraries at the physical level, redundant logic and voting logic in the memory cells, and robust deadlock-free state machines at the architectural level. The verification of SEU tolerance in a VLSI design netlist is currently an expensive, difficult and time-consuming task. This paper presents FT-UNSHADES, a custom circuit emulator that permits the insertion, monitoring and analysis of bit-flips in digital designs. The system requires simple, fully automatic and non-intrusive preparation of the design to be tested, while the use of state-of-the-art FPGAs allows the circuit emulation to be performed at full hardware speed in a highly controlled manner. The SEU insertion strategy allows selective provocation of bit-flips in any desired flip-flop at any desired time during a given test, and enables a detailed analysis of the fault-tolerance properties of the circuit itself against soft errors. The current system can analyse large designs of up to almost 3 million system gates while inserting over 80K SEUs per hour in a test of 2 million test vectors. The FT-UNSHADES system is going to be integrated into the design flow of the microelectronics section of the European Space Agency as a test platform that should reduce the design-fabrication-test cycles of space-application VLSI designs.
FT-UNSHADES System
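The throughput figures can be turned into a time budget per injected fault. The Python arithmetic below is hedged. The second function rests on the unconfirmed assumption that each injected SEU replays the whole 2-million-vector test. Under that assumption it gives the vector rate the emulator would have to sustain; an early-stopping strategy would lower it.

```python
def seconds_per_seu(seus_per_hour):
    """Average time available per injected SEU."""
    return 3600.0 / seus_per_hour


def implied_vector_rate(seus_per_hour, vectors_per_test):
    """Vectors per second IF every injection replayed the full test (hypothetical)."""
    return vectors_per_test / seconds_per_seu(seus_per_hour)


budget = seconds_per_seu(80_000)                   # 45 ms per SEU
rate = implied_vector_rate(80_000, 2_000_000)      # about 44 million vectors per second
```

A rate in the tens of millions of vectors per second is only reachable in hardware, which is consistent with the "full hardware speed" claim.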
Radiation Hardness Assessment of an ADC for Space Application using a Laser Test Equipment.
V. Pouget, P. Fouillat, D. Lewis, F. Darracq, IXL, UMR CNRS 5818, Université Bordeaux 1, 33405 Talence, France, pouget@ixl.fr
SPACE electronics is exposed to heavy ions and other ionizing particles that can produce high densities of electron-hole pairs in a semiconductor. The resulting flow of electrons inside the devices may lead to temporary voltage spikes at internal circuit nodes, termed Single-Event Transients (SETs). Simulating this environment is mandatory for testing ICs and essential to evaluate, understand and mitigate their sensitivity. Evaluating the effects of SETs in Analog-to-Digital Converters (ADCs) and their impact at system level is a complex task due to the mixed-signal nature of these architectures and the various error signatures that may be encountered [1]. An original methodology is presented for characterizing the impact of single-event effects (SEE) on ADC performance parameters, based on the use of a pulsed laser system (Fig. 1). The spatial and temporal resolution of the pulsed laser beam (Fig. 2) is used to identify the SEU mechanisms in two half-flash ADCs.
[Fig. 1: pulsed laser test setup: 10 W 532 nm pump laser, Ti:Sa pulsed laser, pulse picker, pulse energy control, pulse diagnosis, white light and CCD imaging, 100× objective, 3-axis DUT stage and test board, 1.3 µm laser diode, and PC-controlled instruments (power meter, oscilloscope, lock-in amplifier, function generators, pattern generator, power supplies, 4-axis controller) over GPIB and RS232.]
[Fig. 2 panels: scan window; error maps for WL = 30 ns, 380 ns and 480 ns.]
Fig. 2: Scan window on two comparators of an ADC and corresponding laser-induced error maps for three laser pulse delays.
[1] W. F. Heidergott, R. Ladbury, P. W. Marshall, S. Buchner, A. B. Campbell, R. A. Reed, J. Hockmuth, N. Kha, C. Hammond, C. Seidleck, A. Assad, "Complex SEU Signatures in High-Speed Analog-to-Digital Conversion," IEEE Trans. Nucl. Sci., vol. 48, no. 6, pp. 1828–1832, 2001.
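The error maps of Fig. 2 come from scanning the laser spot over a window and recording, for each position and pulse delay, whether the converter output deviates from the expected code. The Python sketch below shows that bookkeeping only. The function `hypothetical_adc` is a made-up stand-in for the real DUT and instrument chain, and the sensitive region and delays are invented for illustration.

```python
def error_maps(measure, xs, ys, delays, expected_code, tolerance=0):
    """For each laser delay, build a map (rows = y, columns = x) that is True
    where the ADC output deviated from expected_code by more than tolerance."""
    return {
        d: [[abs(measure(x, y, d) - expected_code) > tolerance for x in xs] for y in ys]
        for d in delays
    }


def hypothetical_adc(x, y, delay):
    """Stand-in DUT: a made-up sensitive spot that upsets only for early pulses."""
    sensitive = 2 <= x <= 3 and y == 1
    return 131 if sensitive and delay < 400e-9 else 128


maps = error_maps(hypothetical_adc, range(5), range(3),
                  [30e-9, 380e-9, 480e-9], expected_code=128)
```

Comparing the maps across delays is what reveals when, within the conversion cycle, a given comparator is sensitive.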
THIS paper deals with the use of ICEM (Integrated Circuit Electromagnetic Model) in the EMC performance assessment of a high-density integrated circuit. This model has already been used for IC modeling with good accuracy. This work demonstrates how this approach makes it possible to analyze IC performance issues such as PLL jitter or ADC resolution loss. In particular, the problem of PLL jitter in a high-density ASIC is solved using this methodology.
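One common way to quantify ADC resolution loss is through effective bits. The Python sketch below assumes that supply-induced noise adds, uncorrelated, to the ideal quantization noise LSB²/12. This is a textbook approximation, not the paper's ICEM-based method.

```python
import math


def effective_bits(n_bits, noise_rms, full_scale):
    """ENOB of an ideal n-bit ADC when uncorrelated noise of RMS value
    noise_rms adds to the quantization noise LSB^2 / 12."""
    lsb = full_scale / 2 ** n_bits
    return n_bits - 0.5 * math.log2(1.0 + 12.0 * noise_rms ** 2 / lsb ** 2)


lsb_10 = 1.0 / 2 ** 10                          # 10-bit converter, 1 V full scale
enob_clean = effective_bits(10, 0.0, 1.0)
enob_noisy = effective_bits(10, lsb_10 / 2, 1.0)
```

Under this model, coupled noise of half an LSB RMS already costs one full bit.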
Electromagnetic compatibility (EMC) compliance is a pressing requirement for any electronic system, often representing a substantial part of the design effort. The sooner EMC rules are taken into account in the design phase, the sooner the target product can be released on the market, while also lowering engineering costs. However, with the ever-increasing complexity of integrated circuits, and thus ever higher emission and susceptibility levels, reducing these levels in the IC design phase itself makes it much easier to ensure system-level EMC compliance. For that purpose, the recent ICEM (Integrated Circuit Electromagnetic Model) proposal, under the IEC (International Electrotechnical Commission) 62014-3 reference, allows the designer to predict the conducted and radiated emission of an integrated circuit within its environment (PCB with ground planes, decoupling networks, connectors). The IC part of this model contains passive elements (package, bonding, metal and MOS capacitances) as well as an equivalent current generator representing internal activity. This current generator can be obtained by simulating the whole transistor netlist of the IC; however, this method leads to huge simulation times, making it unusable for EMC expertise, in which the influence of "tunable" parameters (package, decoupling capacitors) on emission levels has to be assessed as fast as possible. Moreover, since the design of complex ICs is based mostly on reusable blocks (for example microprocessor cores and memory blocks), either proprietary or from third-party intellectual property (IP), a similar ICEM-based reuse methodology for EMC prototyping can be proposed. The VHDL-AMS language, which enables the description of event-driven, mixed-signal behavioral models, turns out to be well suited to this methodology, thanks to its upward compatibility with "digital" VHDL and its standardization.
For this purpose, each digital block is associated with a VHDL-AMS block using the same inputs; the VHDL-AMS model computes the dynamic current drawn from the power supply rail as a function of these inputs, allowing activity-dependent emission to be evaluated. These models can then be assembled to obtain the whole dynamic supply current of the integrated circuit, along with functional simulations. Full-chip EMC virtual prototyping is thus made possible, including not only building blocks but also inputs and outputs (I/Os). For evaluation purposes, VHDL-AMS ICEM models have been written for an 8-bit ATMEL microcontroller, including the core, the embedded SRAM block and the I/O buffers. Except for the buffers, the corresponding models rely on event-driven, piecewise linear (PWL) approximations of supply current waveforms obtained from transistor-level electrical simulations. For example, SRAM dynamic supply currents can easily be modeled as a function of read/write modes, addressing and rise/fall times (control signals). Moreover, VHDL-AMS allows I/O buffers to be described without revealing technological data, while including their mutual influence with the IC core; this fills the gap left by the IBIS (Input/output Buffer Information Specification) and IMIC (Input/output interface Model for Integrated Circuits) models. Comparisons between simulations and measurements are very promising and demonstrate the validity of this approach for virtual EMC prototyping.
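The event-driven PWL idea can be mimicked outside VHDL-AMS to see how activity maps to supply current. In this Python sketch each SRAM access event contributes a triangular current pulse whose peak depends on the operation. The pulses are summed and integrated to obtain the charge drawn from the rail. The peak currents, edge times and helper names are hypothetical, not values from the ATMEL models.

```python
def triangle(t, t0, rise, fall, peak):
    """Piecewise-linear current pulse starting at t0."""
    if t0 <= t < t0 + rise:
        return peak * (t - t0) / rise
    if t0 + rise <= t < t0 + rise + fall:
        return peak * (1.0 - (t - t0 - rise) / fall)
    return 0.0


def supply_current(t, events, idle=0.0):
    """Dynamic supply current: idle level plus one PWL pulse per event."""
    return idle + sum(triangle(t, *e) for e in events)


# Hypothetical SRAM access currents and edge times (not from the paper).
PEAK_A = {"read": 2e-3, "write": 3e-3}
RISE_S, FALL_S = 0.2e-9, 0.8e-9


def events_from_trace(trace):
    """trace: list of (start_time, "read" | "write") access events."""
    return [(t0, RISE_S, FALL_S, PEAK_A[op]) for t0, op in trace]


def charge(events, t_end, dt=1e-12):
    """Rectangle-rule integral of the supply current from 0 to t_end."""
    steps = int(round(t_end / dt))
    return dt * sum(supply_current(i * dt, events) for i in range(steps))


ev = events_from_trace([(1e-9, "read"), (3e-9, "write")])
q_total = charge(ev, 5e-9)   # about 2.5 pC for these made-up pulses
```

Because each pulse is generated only when an access event occurs, the resulting current waveform is activity dependent, which is the property the ICEM current generator needs.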
Session 5a
IN this work, integrated accumulation-mode MOS varactors for RF applications have been designed, fabricated and measured. The varactor behavior is achieved by changing the operation mode from depletion to accumulation, whereby the capacitance rises from a minimum to a maximum value. All the integrated varactors have been fabricated in the standard AMS 0.8 µm SiGe process technology and are surrounded by measurement structures, the guard ring, so that Cascade ACP40 GSG microprobes can be used (Figure 1). Their characterization has been carried out with a measurement system based on the HP8719ES Vector Network Analyzer. We report studies of the scalability of the capacitance with the geometry of the structure (gate length and gate width) and, more specifically, with the number and arrangement of the basic cells. The impact of metallization is also considered. All the proposed structures present a wide tuning range, above 35%. We demonstrate that the capacitance of accumulation-mode MOS varactors for RF applications scales with the occupied area. Thus, an integrated varactor library based on these devices can easily be implemented. Tuning ranges higher than 57% have been obtained with small gate voltage variations (2 V), while keeping the quality factor above 10. Finally, the performance of the implemented varactors can be optimized by increasing the gate length while keeping the gate width constant.
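Area scalability suggests a simple library model. In this Python sketch N identical cells are placed in parallel, so their capacitances add and their series resistances divide. Under a series R-C model, Q = 1/(2π·f·Rs·C) then stays constant as the array grows. The unit values are hypothetical, and the model ignores the metallization parasitics that the paper does consider.

```python
import math


def varactor_array(n_cells, c_unit, rs_unit):
    """N identical cells in parallel: capacitances add, series resistances divide."""
    return n_cells * c_unit, rs_unit / n_cells


def quality_factor(freq, c, rs):
    """Q of a series R-C model: Q = 1 / (2*pi*f*Rs*C)."""
    return 1.0 / (2 * math.pi * freq * rs * c)


# Hypothetical unit cell: 0.25 pF with 8 Ohm series resistance, evaluated at 2 GHz.
c1, r1 = varactor_array(1, 0.25e-12, 8.0)
c4, r4 = varactor_array(4, 0.25e-12, 8.0)
q1 = quality_factor(2e9, c1, r1)
q4 = quality_factor(2e9, c4, r4)
```

In this idealized model a library entry is fully characterized by its unit cell, which is why scalability with area makes a varactor library practical.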
CURRENTLY, the majority of passive devices employed in electronic systems are discrete components. Small discrete devices dominate the PCB-mounted area of a typical electronic product. For example, a cellular phone may contain only about 20 integrated circuits (ICs) compared to 300 to 400 passives [1]. Thus, passive components have a substantial influence on system cost, size and reliability. In order to meet the requirements of next-generation electronic packages (smaller, faster, cheaper and more reliable), alternatives to discrete passives are necessary. In this framework, the present paper proposes a guideline to design embedded capacitors, particularly for high-frequency applications, and some examples are given. A design rule is derived: for a square electrode structure (either single- or multi-layer) and a given technology, characterized by the layer thickness h, the best compromise between the capacitor value C and the maximum operating frequency f is given by:
$$C \cdot f^{2} = \frac{\varepsilon_0 \, c_0^{2}}{16\,h}$$
Figure 1. One
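One derivation that reproduces a rule of the form C·f² = ε0·c0²/(16h), with the relative permittivity cancelling out, is sketched below in Python. It assumes that the usable frequency ends when the electrode side reaches a quarter wavelength in the dielectric, a = c0/(4f√εr), combined with the parallel-plate formula C = ε0·εr·a²/h. This is our hedged reconstruction of the reasoning, not necessarily the authors' derivation; fringing fields are ignored.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
C0 = 299_792_458.0        # speed of light in vacuum (m/s)


def cf2_limit(h):
    """Right-hand side of C * f^2 = eps0 * c0^2 / (16 h), h in metres."""
    return EPS0 * C0 ** 2 / (16.0 * h)


def max_frequency(cap, h):
    return math.sqrt(cf2_limit(h) / cap)


def square_capacitor(side, h, eps_r):
    """Parallel-plate capacitance of a square electrode (fringing ignored)."""
    return EPS0 * eps_r * side ** 2 / h


def quarter_wave_side(freq, eps_r):
    """Side equal to a quarter wavelength in the dielectric (assumed limit)."""
    return C0 / (4.0 * freq * math.sqrt(eps_r))


h = 10e-6   # hypothetical 10 um dielectric layer
```

For this hypothetical layer, a 10 pF capacitor would be limited to roughly 22 GHz. Because εr cancels, choosing a higher-permittivity dielectric shrinks the electrode but does not move the C·f² trade-off.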
THE aim of this work is to carry out a comparative study of two ideally linear configurations: a MOS Resistive Circuit (MRC), widely employed in the literature, and a MOS Current Divider (MCD) based on a linear current division principle. The suitability of some MOS models for distortion simulation purposes is discussed. The general charge-sheet model is used to perform simulations in the MATLAB environment. The harmonic distortion terms are calculated by means of the FFT, and the influence of mismatches on the linearity of both cells is estimated. Approximate strong-inversion models do not perform well in distortion analysis, so it is necessary to resort to the general charge-sheet model. Simulations carried out in the MATLAB environment show that the harmonic distortion components in the MRC are far from being cancelled out, whereas the MCD is ideally completely linear. It is also shown that, although somewhat more sensitive to mismatch, the MCD has lower distortion levels and is thus more suitable for high-linearity applications. As an application of MOS resistive cells, Programmable Gain Amplifiers (PGAs) based on the MRC and the MCD are introduced, and their performance in terms of linearity and accuracy is studied and compared. Figure 1 shows the variation of HD2 and HD3 with the digital word A(3) for a 500 mVpp output signal in both PGAs. An outstanding improvement in linearity is achieved when employing the MCD. Further simulations show the influence of mismatch on linearity in both systems.
[Plot: harmonic component (dB), axis ticks 40 to 160, versus digital word.]
Fig. 10. Variation of HD2 and HD3 with the digital word for the (*) MRC-PGA and the (o) MCD-PGA for a 500 mVpp output signal.
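The current division principle can be checked with any MOS model in which the drain current scales with W/L at identical terminal voltages. The Python sketch below uses a simple long-channel triode equation, an assumption chosen for brevity; the paper uses the general charge-sheet model. With it, the fraction carried by one transistor stays exactly at its W/L share even though each individual current is a nonlinear function of V_DS.

```python
def triode_current(beta, v_ov, v_ds):
    """Long-channel triode model (assumed): I = beta * (Vov*Vds - Vds^2/2)."""
    return beta * (v_ov * v_ds - v_ds ** 2 / 2.0)


def divider_fraction(w_over_l_a, w_over_l_b, k_prime, v_ov, v_ds):
    """Fraction of the total current carried by transistor A when A and B
    share gate, source and drain voltages."""
    i_a = triode_current(k_prime * w_over_l_a, v_ov, v_ds)
    i_b = triode_current(k_prime * w_over_l_b, v_ov, v_ds)
    return i_a / (i_a + i_b)


# Hypothetical W/L ratio of 3:1, k' = 100 uA/V^2, 1 V overdrive.
fractions = [divider_fraction(3.0, 1.0, 1e-4, 1.0, v) for v in (0.01, 0.1, 0.3)]
```

The ratio cancels the nonlinear factor exactly, which is why the MCD is ideally linear. In practice, mismatch between the transistors breaks the cancellation, consistent with its higher mismatch sensitivity.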
THE RF filters and duplexers, essential elements of radio front-ends, are traditionally based on dielectric electromagnetic resonators (ceramic materials) or surface acoustic wave (SAW) devices.
SAW filters present good selectivity and very small size, but they are limited in frequency (up to 3 GHz) and have high insertion loss at high power levels (over 1 W). Dielectric filters are capable of handling high power levels and present good selectivity; however, their application in mobile front-ends is still limited by their large dimensions. Film Bulk Acoustic Resonator (FBAR) filters, based on Bulk Acoustic Wave (BAW) technology, are expected to replace the traditional RF filter technologies. FBAR filters offer unique advantages, since they present good selectivity (Q up to 1000), can handle high power levels of up to several watts (up to 3 W) and have reduced dimensions¹. Besides, FBAR devices can be manufactured on the same VLSI-CMOS basis and be directly integrated above RF active circuits, making possible the design of a fully integrated RF front-end at very competitive cost (above-IC technology). This work proposes a synthesis methodology based on the in-line bimode technique to design ladder-type FBAR filters. Until now, the proposed design methodologies were based on ladder crystal filters; they focus on placing the transmission zeros and need intensive optimization work to place the poles of the desired transfer function. The proposed methodology permits not only the placement of the transmission zeros but also the determination of the exact location of the transmission poles. Finally, a design example of a third-degree filter for the 2 GHz frequency band is addressed.
¹ R. Aigner et al., "Bulk-Acoustic-Wave Filters: Performance Optimization and Volume Manufacturing," IEEE MTT-S 2003, pp. 2001–2004.
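Ladder FBAR filters are usually described with the Butterworth-Van Dyke (BVD) resonator model: a motional branch Lm, Cm, Rm in parallel with the static capacitance C0. This Python sketch uses the standard BVD resonance formulas with hypothetical element values chosen near 2 GHz. It is not the in-line bimode synthesis proposed in the paper, only the resonator behavior that any ladder design builds on. The series resonance fs is the impedance minimum and the parallel resonance fp is the impedance maximum.

```python
import math


def bvd_resonances(lm, cm, c0):
    """Series and parallel resonance of a Butterworth-Van Dyke resonator."""
    fs = 1.0 / (2 * math.pi * math.sqrt(lm * cm))
    fp = fs * math.sqrt(1.0 + cm / c0)
    return fs, fp


def bvd_impedance(f, lm, cm, rm, c0):
    """Complex impedance of the motional branch in parallel with C0."""
    w = 2 * math.pi * f
    z_motional = rm + 1j * w * lm + 1.0 / (1j * w * cm)
    z_static = 1.0 / (1j * w * c0)
    return z_motional * z_static / (z_motional + z_static)


# Hypothetical element values: Lm = 100 nH, Cm = 0.06 pF, Rm = 1 Ohm, C0 = 1 pF.
LM, CM, RM, C0_F = 100e-9, 0.06e-12, 1.0, 1e-12
fs, fp = bvd_resonances(LM, CM, C0_F)
```

A common ladder arrangement places the series resonators' fs and the shunt resonators' fp near the passband center, so that the impedance minima and maxima create the transmission zeros on either side.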
Study of the Proximity Effect in High-Q Inductors with CMOS 0.18 µm Technology
I. Cendoya*, N. Sainz*, J. Mendizabal**, R. Berenguer**, U. Alvarado**, A. García-Alonso** * Escuela Superior de Ingenieros de San Sebastián (TECNUN), Universidad de Navarra, Spain. (icendoya@tecnun.es) ** Centro de Estudios e Investigaciones Técnicas de Guipúzcoa (CEIT), Spain. (jmendizabal@ceit.es)
THE objective of this study is to obtain high-quality inductors designed in 0.18 µm CMOS. One of the effects most harmful to the quality factor of an inductor is the proximity effect, which is due to the magnetic field generated by the inductor itself inducing parasitic currents in the tracks. High quality factors are obtained (between 8.1 and 12.6). Focusing on the proximity effect, this study reports some simple design rules to obtain high-Q inductors. These rules take into account the spacing between tracks and the central hollow of the inductor: 1) Regarding the central hollow of an inductor, it has been shown that the important parameter is the ratio between the external and internal radius. An inductor with a ratio of 1.75 will be nearly optimal in quality factor, while occupying less area than a bigger one with a slightly better Q. 2) The second conclusion is that improving Q requires minimizing the spacing between tracks.
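The radius-ratio rule can be turned into a quick layout check. This Python sketch assumes the winding occupies r_out − r_in = n·w + (n − 1)·s for n turns of width w and spacing s, a simplified geometry that is our assumption rather than the paper's. It solves for the number of turns that gives the recommended ratio of 1.75 for a hypothetical outer radius, track width and spacing.

```python
def turns_for_ratio(r_out, ratio, width, spacing):
    """Number of turns n such that r_out / r_in == ratio, assuming the winding
    occupies r_out - r_in = n*width + (n - 1)*spacing."""
    r_in = r_out / ratio
    return (r_out - r_in + spacing) / (width + spacing)


def inner_radius(r_out, n, width, spacing):
    return r_out - (n * width + (n - 1) * spacing)


# Hypothetical layout in micrometres: 150 um outer radius, 10 um tracks, 2 um gaps.
n_turns = turns_for_ratio(150.0, 1.75, 10.0, 2.0)   # about 5.5 turns
```

A non-integer result means the designer rounds the turn count and then adjusts the outer radius or track width. Keeping the spacing at its minimum follows the second rule.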
Session 5b
Department of Electronic Engineering, School of Engineering, University of Sevilla Camino de los Descubrimientos, 41092 Sevilla, Spain Email: {carvajal,torralba}@gte.esi.us.es
(2) Department of Electronics and Elec. Eng., University of Extremadura, Avda. de Elvas s/n, 06071 Badajoz, Spain. Email: {jmcarcal,duque}@unex.es
IN this paper, a low-power rail-to-rail CMOS analog buffer is presented. The circuit is based on a class-AB input stage made up of two complementary differential pairs, while a simple additional circuit allows rail-to-rail operation at the output terminal. Moreover, the input capacitance of the circuit can be reduced by scaling the size of the input devices, decreasing loading effects on the nodes to be tested or on preceding stages. The class-AB capability of the proposed circuit combines low static power consumption in quiescent operating conditions with high drive capability in dynamic operation, making it very suitable for applications with large capacitive loads. The buffer has been designed in a 0.35 µm CMOS technology to operate with a 1.5 V dual supply. Simulation results are provided to demonstrate the proper operation of the proposed circuit. A rail-to-rail signal swing is achieved, and a THD lower than −44 dB is obtained for a 2.4 Vpp, 100 kHz input sine wave, whereas the input capacitance is lower than 32 fF.
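The benefit of class-AB drive with large capacitive loads can be expressed through slew rate. This Python sketch compares the slope a sine wave demands, 2π·f·Vpp/2, with the slew rate available when at most a given current charges the load. The 100 pF load and the 10 µA and 100 µA currents are hypothetical and do not come from the paper.

```python
import math


def required_slew_rate(v_pp, freq):
    """Peak slope of a sine wave: 2*pi*f*(Vpp/2), in V/s."""
    return math.pi * freq * v_pp


def bias_limited_slew_rate(i_max, c_load):
    """Slew rate when at most i_max can charge c_load, in V/s."""
    return i_max / c_load


sr_needed = required_slew_rate(2.4, 100e3)        # about 0.75 V/us
sr_class_a = bias_limited_slew_rate(10e-6, 100e-12)   # quiescent-limited stage
sr_class_ab = bias_limited_slew_rate(100e-6, 100e-12) # dynamic peak current
```

With these numbers a stage limited to its quiescent current would slew-limit the test signal. A class-AB stage that momentarily delivers ten times that current would not, while keeping the same low static power.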
Klipsch School of Electrical and Computer Engineering, New Mexico State University (Las Cruces, New Mexico), USA
email: carlos.aristoteles[antonio.lopez][carlosen]@unavarra.es
In this paper, a current-mode CMOS RMS-DC converter is presented. The basic building blocks are based on a novel approach to the design of current-mode computational cells. In this approach, the large-signal V-I behavior of class-AB transconductors is exploited, leading to a very regular and compact implementation. A proper biasing scheme for these transconductors allows very low-voltage operation, with supply voltages as low as VGS + 2VDS,sat. Measurement results from a prototype are presented to demonstrate the proposed technique.
New low-voltage high-performance WTA circuits based on flipped voltage followers
J. Ramírez-Angulo(1), G. Ducoudray-Acevedo(1), R. G. Carvajal(2) and A. Lopez-Martin(3); (1) Klipsch School of Electrical and Computer Engineering, New Mexico State University, (2) Escuela Superior de Ingenieros, Universidad de Sevilla, (3) Universidad Publica de Navarra
A new low-voltage CMOS WTA circuit is presented. The proposed circuit exhibits linear complexity with the number of inputs and is based on a modified version of the common-source scheme. In this case, each input follower is enhanced by local shunt feedback to increase its gain and reduce its output impedance. Simulations demonstrate the potential of the circuit to operate at very high speed, with high precision, and with a supply voltage close to a transistor's threshold voltage. Experimental verification of the circuit using a 0.5 µm CMOS technology is also provided.
Fig. 1. Proposed circuits: (a) voltage-mode FVF-based MIN circuit; (b) current-mode version.
(1) Klipsch School of Electrical and Computer Eng., New Mexico State University, Las Cruces, NM (USA); (2) Dept. of Electrical and Electronic Engineering, Public University of Navarra, Pamplona (Spain); (3) Dpto. de Ingeniería Electrónica, Escuela Superior de Ingenieros, Universidad de Sevilla (Spain)
A versatile low-voltage CMOS circuit with a triangular/trapezoidal transconductance characteristic and independently programmable height, slope, and horizontal position is presented. Simulation results using Cadence DFWII that verify the functionality of the circuit with ±1.5 V supplies are presented. A chip prototype has been fabricated in a 0.5 µm technology and experimentally verified. The circuit can be used to implement membership functions in analog and mixed-signal neuro-fuzzy systems, for piecewise-linear approximation, and to implement high-resolution, high-speed folding A/D converters.
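One way to picture the programmable characteristic is as an idealized piecewise-linear curve. The sketch below is a behavioral model only, not a model of the transistor circuit. The parameter names `height`, `slope`, `center` and `plateau` are hypothetical stand-ins for the independently programmable height, slope and horizontal position. A zero-width plateau gives the triangular case.

```python
def trapezoid(x: float, height: float, slope: float, center: float, plateau: float = 0.0) -> float:
    """Idealized trapezoidal (plateau > 0) or triangular (plateau == 0) characteristic.

    height  -- maximum output level
    slope   -- magnitude of the rising/falling edge slope (output units per input unit)
    center  -- horizontal position of the middle of the characteristic
    plateau -- width of the flat top centred on `center`
    """
    if height < 0 or slope <= 0 or plateau < 0:
        raise ValueError("height >= 0, slope > 0 and plateau >= 0 are required")
    # Distance beyond the edge of the flat top (<= 0 inside the plateau).
    d = abs(x - center) - plateau / 2.0
    if d <= 0:
        return height
    # Linear roll-off from the plateau edge, clipped at zero.
    return max(0.0, height - slope * d)


if __name__ == "__main__":
    # Triangle of height 1 centred at 0 with unit slope: non-zero on (-1, 1).
    for x in (-1.5, -0.5, 0.0, 0.5, 1.5):
        print(x, trapezoid(x, height=1.0, slope=1.0, center=0.0))
```

Shifting `center` moves the curve horizontally without changing its shape. This mirrors the claimed independence of the three programmable parameters, a property that a piecewise-linear membership function or folding characteristic relies on.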
is based on the use of floating-gate MOS (FGMOS) transistors biased in weak inversion to implement the required nonlinear internal processing. A simple circuit topology is obtained thanks to the FGMOS devices, without the penalty of post-fabrication removal of initial charge often encountered in former FGMOS circuits. A prototype fabricated in a 0.8 µm CMOS technology (Figure 2), using a single 1.2 V supply, achieves three decades of frequency tuning with a static power dissipation of less than 5 µW, and occupies an active area of 0.1 mm². The 1%-THD dynamic range at 1 kHz is 75 dB. To the authors' knowledge, this is the first FGMOS log-domain filter reported operating in class AB, which leads to a dynamic range significantly larger than in former FGMOS filter topologies. The filter is readily cascadable, leading to higher-order topologies.
Figure 2. Microphotograph of the filter (scale bar: 500 µm).
Session 5c
An Infrastructure and Application-Specific Processor for Testing Analogue and Mixed-Signal SoCs
Francisco X. Duarte, José Machado da Silva, José C. Alves, and José S. Matos, Universidade do Porto, Faculdade de Engenharia, and INESC Porto, R. Dr. Roberto Frias, 4200-465 Porto, Portugal, {fduarte, jms, jca, jsm}@fe.up.pt
Conventional test approaches are unable to cope with the test requirements of tens or even hundreds of cores, such as digital and analogue I/O interfaces, complex communication subsystems (including optical and radio-frequency circuits), power management, and multiple processors, deeply embedded in complex systems. The IEEE P1500 Standard for Embedded Core Test infrastructure is currently the main standard proposal for a modular embedded design-for-testability approach for digital and memory cores. However, the test of analogue and mixed-signal cores has not been addressed. This paper presents an infrastructure and methodology for testing analogue and mixed-signal cores embedded in systems-on-chip. The proposed solution resorts to the reconfigurable block of the system (e.g., field-programmable gate arrays), which can be reused to implement an application-specific instruction-set processor that controls and schedules on-chip test operations. This processor's architecture can be adjusted according to the test needs and the space available for implementation. Furthermore, by performing on-chip pre-processing operations, it reduces the number of signals to be sourced by the tester, the amount of data to be transferred between the tester and the core under test, and the test time. The reconfigurable block can also be used to implement the wrappers' digital cells, which route system and test signals, as well as the digital test infrastructure. A demonstration prototype is described which implements the testing of an ADC (analogue-to-digital converter). The test processor was designed to accomplish a specific test method that reduces the relevant ADC response data to only a few bytes. Besides the conventional operations (load/store, arithmetic/logic operations, …) and loop control, the instruction set comprises test-specific instructions to handle the IEEE 1149.1/4 port that controls the ADC wrapper, to handle the ADC's specific control signals, and to accumulate the ADC's output data samples. A first-order Sigma-Delta modulator is embedded in the processor to provide analogue test stimuli.
After the test signatures are uploaded to the tester, a polynomial-fitting algorithm is used to compute the harmonic coefficients that characterize the ADC's nonlinearity.
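The abstract does not describe how the embedded first-order Sigma-Delta modulator is implemented. For illustration only, the sketch below is a textbook discrete-time first-order modulator: one integrator, a one-bit quantizer, and unit feedback of the previous output bit. Low-pass filtering the ±1 bitstream recovers the encoded waveform. This is why a one-bit digital output can serve as an analogue test stimulus. The sine amplitude and record length in the usage block are hypothetical.

```python
import math


def first_order_sigma_delta(x):
    """Encode samples x (|x| < 1) as a +/-1 bitstream with a first-order modulator."""
    integrator = 0.0
    y = 1.0  # previous quantizer output, fed back to the input summer
    bits = []
    for sample in x:
        integrator += sample - y                  # integrate input minus feedback
        y = 1.0 if integrator >= 0.0 else -1.0    # 1-bit quantizer
        bits.append(y)
    return bits


def moving_average(bits, window):
    """Crude low-pass filter used to recover the encoded waveform."""
    return [sum(bits[i:i + window]) / window for i in range(len(bits) - window + 1)]


if __name__ == "__main__":
    n, window = 4096, 64
    # Hypothetical stimulus: one period of a 0.5-amplitude sine over the record.
    stimulus = [0.5 * math.sin(2 * math.pi * k / n) for k in range(n)]
    stream = first_order_sigma_delta(stimulus)
    recovered = moving_average(stream, window)
    print(stream[:16])
    print(max(recovered))
```

Because the integrator state stays bounded, the average of the bitstream tracks the input to within a few counts over the record. An on-chip RC filter would typically play the role that `moving_average` plays here.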
Test Planning for Mixed-Signal SoCs and Analog BIST: a Case Study
Antonio Andrade, Jr.(1), Érika Cota(2), Marcelo Negreiros(2), Luigi Carro(1), Marcelo Lubaszewski(3)
(1) Electrical Engineering Dept., Univ. Federal do R. Grande do Sul, Porto Alegre, Brazil, {andradejr, carro}@eletro.ufrgs.br
(2) Informatics Institute, Univ. Federal do R. Grande do Sul, Porto Alegre, Brazil, {erika, negreiro}@inf.ufrgs.br
(3) Instituto de Microelec. de Sevilla, CNM, Sevilla, Spain, luba@imse.cnm.es
Analog BIST and SoC testing are two topics that have been studied extensively, but independently, in the last few years. However, current mixed-signal systems require these subjects to be combined in order to generate a cost-effective test plan for the whole SoC. This paper discusses how analog BIST, based on reusing the digital processors already embedded in the system, affects the global system testing time. Advantages of the proposed technique include a shorter system test time, thanks to analog BIST, and no need for external mixed-signal test equipment, since the analog test response is processed by the reused processor. Experimental results show that, as long as the BIST technique reduces the analog testing time, reusing digital blocks to test analog signals is indeed a very efficient strategy despite the test serialization, as depicted in Figure 1. In addition, better test results may be achieved if the number of reusable processors available on-chip increases. Requirements in terms of extra test pins and area overhead are evaluated in the test planning for mixed-signal SoCs. Power restrictions during test are also considered, since SoCs are widely used in portable electronic devices and the battery life of such devices is a growing concern in industry.
Figure 1. Test Scheduling of the cores of a SoC (a) without and (b) with analog BIST.
In the XY zone testing method, fault detection is based on the XY composition of two signals of the CUT, x(t) and y(t), in the same way that an oscilloscope in XY mode represents the evolution of two signals on the plane. Figure 1 shows the XY composition curves for a good circuit and for a defective one. To detect defects in the CUT, a control line is drawn tangentially to the non-defective curve. When a defect changes the shape of the curve, the control line cuts the curve. The XY zone detector implements the control line. It consists of a block configured as a weighted adder of the two composed signals plus a reference voltage, followed by a comparator block. In this paper, a Quasi-Floating Gate (QFG) structure (see Figure 2) is used to design an XY zone detector for BIST in CMOS technologies. A QFG-based layout core has been developed using 0.35 µm AMS CMOS technology. To use the detector for multiple test purposes, the input capacitors Cx and Cy are assumed to be configurable in order to adjust the slope parameters of the control line. Simulation results based on Spectre analysis show the advantages of the system and provide important clues for future designs based on this method. Because the presented scheme relies only on capacitors, it is suitable for integration as a configurable BIST solution in CMOS.
Fig. 1. XY composition curves for a non-defective and a defective circuit, and a control line drawn tangential to the non-defective curve.
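The detector described above (weighted adder plus reference, then a comparator) reduces to testing the sign of a linear function of x and y. The sketch below models it numerically. The weights `wx` and `wy` stand in for what the configurable capacitors Cx and Cy set in the QFG circuit, but that mapping is an assumption. The elliptical test curves are hypothetical stand-ins for real CUT responses.

```python
import math


def zone_detector(x, y, wx, wy, vref):
    """Weighted adder followed by a comparator: True when the point (x, y)
    lies on the far side of the control line wx*x + wy*y + vref = 0."""
    return wx * x + wy * y + vref > 0.0


def xy_zone_test(xs, ys, wx, wy, vref):
    """Flag the CUT as faulty if the XY curve ever crosses the control line."""
    return any(zone_detector(x, y, wx, wy, vref) for x, y in zip(xs, ys))


def ellipse(rx, ry, n=720):
    """Hypothetical XY composition curve: x(t) = rx cos t, y(t) = ry sin t."""
    ts = [2 * math.pi * k / n for k in range(n)]
    return [rx * math.cos(t) for t in ts], [ry * math.sin(t) for t in ts]


if __name__ == "__main__":
    # The control line x + y - 1.45 = 0 sits just outside the fault-free curve
    # (the maximum of x + y on the unit circle is sqrt(2), about 1.414).
    print(xy_zone_test(*ellipse(1.0, 1.0), 1.0, 1.0, -1.45))   # False: fault-free
    print(xy_zone_test(*ellipse(1.1, 1.1), 1.0, 1.0, -1.45))   # True: curve has grown
```

The small margin between the tangent value (about 1.414) and the threshold (1.45) plays the role of a tolerance band. A real detector would size it to absorb process variation and comparator offset.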
The miniaturization of CMOS technology has enabled complex analog and digital cores to be integrated onto the same silicon substrate. Cores that are fully accessible from the external pins can be tested using appropriate test equipment. However, for embedded cores, access to internal nodes is not possible. Moreover, analog testing dominates the cost of testing monolithic analog and mixed-signal circuits, so there is considerable interest in including some on-chip circuitry to facilitate industrial testing. The most basic analog functional measurement setup consists of a signal generator that excites the circuit-under-test (CUT) with a periodic signal, and an instrument that extracts appropriate parameters from the output response, i.e., performs a spectral analysis of the output signal. In this work, an effective approach to the design of on-chip spectrum analyzers based on switched-capacitor (SC) techniques is presented. The proposed spectrum analyzer for the measurement of spectral-based metrics of analog circuits is shown in Fig. 1. A programmable SC biquad is used to implement both the sine-wave generator and the filter, which ensures the synchronization of the system. High programmability resolution is obtained by using a non-uniform sampling scheme without modifying any capacitor value. As a result, capacitor spread and total capacitor area are reduced compared with traditional solutions, and test area overhead can therefore be minimized. Frequency response, THD and SNR can be obtained with the proposed spectrum analyzer. To prove the feasibility of the proposed approach, the design and implementation of an SC spectrum analyzer in a 0.35 µm technology are discussed. The circuit occupies a chip area of 0.17 mm².
Fig. 1. Block diagram of the proposed spectrum analyzer (frequency synthesizer, voltage reference, sine-wave oscillator, CUT, S & H, A/D converter).
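The SC analyzer computes its spectral metrics in the analog domain. As a digital analogue of the same measurement, and not the paper's circuit, the sketch below estimates harmonic amplitudes with the Goertzel single-bin DFT (a swapped-in technique) and combines them into THD. It assumes coherent sampling, meaning an integer number of signal periods in the record. The bin numbers and distortion levels in the usage block are hypothetical.

```python
import math


def goertzel_amplitude(samples, k):
    """Amplitude of DFT bin k (0 < k < N/2) of a real, coherently sampled record."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2  # |X[k]|^2
    return 2.0 * math.sqrt(max(power, 0.0)) / n


def thd(samples, fundamental_bin, n_harmonics=5):
    """Total harmonic distortion (as a ratio) using harmonics 2..n_harmonics."""
    a1 = goertzel_amplitude(samples, fundamental_bin)
    harmonics = [goertzel_amplitude(samples, h * fundamental_bin)
                 for h in range(2, n_harmonics + 1)]
    return math.sqrt(sum(a * a for a in harmonics)) / a1


if __name__ == "__main__":
    n, k = 1024, 8  # 8 full periods in the record: coherent sampling
    sig = [math.sin(2 * math.pi * k * i / n)
           + 0.01 * math.sin(2 * math.pi * 2 * k * i / n)
           + 0.001 * math.sin(2 * math.pi * 3 * k * i / n) for i in range(n)]
    print(goertzel_amplitude(sig, k), 20 * math.log10(thd(sig, k)))
```

Evaluating only the few bins of interest, instead of a full FFT, resembles what a tuned filter does when it is stepped from the fundamental to each harmonic.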
Testing embedded radio-frequency circuits has become a bottleneck for manufacturers. This paper presents built-in test methodologies to compute typical LNA characterization parameters. Three methods are suggested which allow the calculation of the LNA's gain, 1 dB compression and third-order intercept points, harmonic distortion, signal-to-noise ratio and noise figure. The first method relies on finding the transfer-function polynomial that best fits a set of points obtained from the LNA test operation. A stimulus of varying amplitude is applied and the respective output levels are captured. The 1 dB compression and third-order intercept points are then calculated from the polynomial coefficients. The harmonic coefficients can also be obtained, using the mathematical relationship between the polynomial coefficients (ai) and the harmonic coefficients (hi). The second method obtains gain, phase, and harmonic distortion through correlation operations. Gain and phase are obtained by correlating the LNA's output signal y(t) with the in-phase, xs = A sin(ωt), and quadrature, xc = A cos(ωt), input signals. The distortion-related parameters can be obtained by correlating y(t) with harmonics of the input test signals. Consider the LNA response to a two-tone test signal x(t) = A sin(ω1 t) + A sin(ω2 t). Correlating it first with xs = A sin(ω2 t) and xc = A cos(ω2 t), and then with xs = A sin((2ω2 - ω1)t) and xc = A cos((2ω2 - ω1)t), gives the amplitude ratio between the fundamental component at ω2 and the third-order intermodulation distortion component at 2ω2 - ω1. Correlation is also used to estimate the signal-to-noise ratio, by cross-correlating images of the signal under evaluation taken at different times. Calculating both the input and output signal-to-noise ratios, SNRin and SNRout, yields the LNA's noise figure. The infrastructure proposed to implement the first method is shown in Figure 1. A variable local oscillator generates the appropriate LNA test stimulus, and an RF amplifier with an RSSI output measures the LNA output power.
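Figure 1. Proposed LNA test infrastructure (transceiver with LNA, RF switches, T/R, mixers, IF amplifier, demodulator, modulator, PA, LO, ADC/DAC and DSP; RF amplifier with RSSI output).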
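The first method can be sketched under a common simplifying assumption that the abstract does not state: a memoryless cubic LNA model y = a1 x + a2 x² + a3 x³. For an input A cos(ωt), the output fundamental amplitude is then a1 A + (3/4) a3 A³. The sketch fits a1 and a3 by least squares from (input amplitude, output fundamental) pairs. It then applies the textbook expressions A1dB = sqrt(0.145 |a1/a3|) and AIIP3 = sqrt((4/3) |a1/a3|). Function names and the synthetic data are hypothetical.

```python
import math


def fit_a1_a3(amplitudes, fundamentals):
    """Least-squares fit of Y(A) = a1*A + (3/4)*a3*A^3 (memoryless cubic model)."""
    # Normal equations for the basis functions u = A and v = A^3.
    suu = sum(a ** 2 for a in amplitudes)
    suv = sum(a ** 4 for a in amplitudes)
    svv = sum(a ** 6 for a in amplitudes)
    suy = sum(a * y for a, y in zip(amplitudes, fundamentals))
    svy = sum(a ** 3 * y for a, y in zip(amplitudes, fundamentals))
    det = suu * svv - suv * suv
    a1 = (suy * svv - suv * svy) / det
    c3 = (suu * svy - suv * suy) / det      # c3 = (3/4) * a3
    return a1, c3 / 0.75


def compression_and_intercept(a1, a3):
    """Input amplitudes of the 1 dB compression point and of IIP3 (needs a1*a3 < 0)."""
    if a1 * a3 >= 0:
        raise ValueError("model is not compressive (a1 and a3 must have opposite signs)")
    ratio = abs(a1 / a3)
    a_1db = math.sqrt((1 - 10 ** (-1 / 20)) / 0.75 * ratio)  # 0.145 * |a1/a3|
    a_iip3 = math.sqrt(4.0 / 3.0 * ratio)
    return a_1db, a_iip3


if __name__ == "__main__":
    amps = [0.01 * i for i in range(1, 41)]              # hypothetical input sweep
    measured = [10.0 * a - 1.5 * a ** 3 for a in amps]   # synthetic: a1 = 10, a3 = -2
    a1, a3 = fit_a1_a3(amps, measured)
    print(a1, a3, compression_and_intercept(a1, a3))
```

With measured RSSI data, the same fit would typically be done on calibrated output amplitudes. The a2 term does not affect the fundamental in this model, which is why only a1 and a3 are fitted.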
Hardware Requirements for Testing MS Circuits based on Multidimensional Lissajous Curves
E. Lupon, L. Balado, L. García, and J. Figueras, Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya, Av. Diagonal 647, planta 9, E-08028 Barcelona (Spain). Phone: (+34) 934017784, Fax: (+34) 934017785, e-mail: lupon@eel.upc.es
Lissajous-based Testing (LBT) of analog circuits, which monitors the combined evolution of several signals, has been shown to effectively detect both parametric and catastrophic faults. If the composition is made with more than two signals of different frequencies, the resulting curve is called a Lissajous knot. For test purposes, the basic idea is to partition the space of the observed signals into zones and count the number of zone-to-zone crossings of the knot during a test session. The contents of these counters constitute the digital signature of the analog circuit under test. In this paper, a generalized multidimensional LBT methodology is proposed. The observation space is partitioned into zones using planes or hyperplanes (see an example in Fig. 1). To obtain the signature of a knot, a matrix structure containing a cell for every possible transition between zones is used. Each cell of the matrix counts the specific transition assigned to it over a complete cycle of the knot. A defect changes the counter values, which makes it possible to define metrics for the pass/fail decision. The paper proposes an architecture to determine the position of the Lissajous point as time evolves. The circuit monitors in which half-space of each hyperplane the point is located. Weighted adders and fast hysteresis comparators determine the zone crossings to be counted. Simulation results show the effectiveness of the proposed testing applied to a biquad filter CUT for three-dimensional knots.
Fig. 1. Cube with a knot curve and a set of possible control planes that divide the cube into different zones, one of which is shown in the right figure.
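The signature matrix can be pictured with a small software model. Each hyperplane contributes one bit (on which side the point lies, as the weighted adder and comparator would report). The resulting zone code is tracked along a sampled cycle of the knot, and every (from-zone, to-zone) change increments one counter. A `Counter` keyed by pairs stands in for the counter matrix. Hysteresis is not modeled, and the planes and curve in the usage block are hypothetical.

```python
import math
from collections import Counter


def zone_of(point, planes):
    """Zone code: bit i is set when point lies on the positive side of plane i.

    Each plane is (w, b), describing the hyperplane w . p + b = 0.
    """
    code = 0
    for bit, (w, b) in enumerate(planes):
        if sum(wi * pi for wi, pi in zip(w, point)) + b > 0.0:
            code |= 1 << bit
    return code


def transition_signature(points, planes, closed=True):
    """Count zone-to-zone transitions along a sampled curve (one counter per pair)."""
    zones = [zone_of(p, planes) for p in points]
    if closed:
        zones.append(zones[0])  # close one complete cycle of the knot
    counts = Counter()
    for a, b in zip(zones, zones[1:]):
        if a != b:
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 3-D Lissajous knot and three control planes.
    n = 600
    knot = [(math.sin(2 * math.pi * i / n),
             math.sin(2 * 2 * math.pi * i / n + 0.3),
             math.sin(3 * 2 * math.pi * i / n + 0.7)) for i in range(n)]
    planes = [((1.0, 0.0, 0.0), 0.0), ((0.0, 1.0, 0.0), 0.0), ((0.0, 0.0, 1.0), -0.2)]
    print(sorted(transition_signature(knot, planes).items()))
```

A pass/fail metric could then compare this signature with the fault-free one, for example by summing absolute counter differences. The abstract leaves the choice of metric open.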
Session 5d
RF Building Blocks
Thursday Nov. 25, 8h30-10h00, Auditorium. Chairs: Eric Kerhervé (E.N.S.E.I.R.Bordeaux), Antonio Torralba (U. de Sevilla)
Synchronous Oscillator Locked Loop: A New Delay Locked Loop Using Injection Locked Oscillators as Delay Elements.
F. Badets (1), M. Benyahia (2), D.Belot (1) (1) STMicroelectronics C R&D, Crolles, France (2) STMicroelectronics, CR&D, Rabat, Morocco
In this paper, it is explained how injection-locked oscillators can be used as delay elements. An example of implementation of a new DLL called SOLL (Synchronous Oscillator Locked Loop)
A Fully Integrated Mixer in CMOS 0.35 µm Technology for 802.11a WIFI Applications
R. Diaz, R. Pulido, A. GoniIturri, S. L. Khemchandani, B. Gonzalez, J. del Pino Institute for Applied Microelectronics of Las Palmas de Gran Canaria University, Spain. sunil@iuma.ulpgc.es
This work presents the design of a fully integrated passive mixer for the IEEE 802.11a wireless LAN standard using a 0.35 µm standard CMOS technology. An operational amplifier has been used to compensate for the mixer attenuation. The average DC output voltage of the operational amplifier is fixed using a common-mode feedback (CMFB) circuit. All passive devices are integrated on chip, including the impedance-matching spiral inductors, which have been designed using electromagnetic simulations. The circuit layout, shown in Figure 1, occupies a total area of 0.605 mm² including the spiral inductors. The mixer provides 43 dB of conversion gain, 45 dB of single-sideband noise figure (NF), a third-order input intercept point (IIP3) of 40 dBm, and a power consumption of 3.4 mW. This performance is therefore valid for the 802.11a standard. This work shows that, with a proper mixer topology and design techniques, it is possible to design a mixer suitable for use in the 5 GHz band with a low-cost silicon technology.
Although the study of circuits capable of performing two functions, oscillation and mixing, is still
relaxation oscillator has two outputs with the same frequency which are accurately in quadrature. In this paper, we study how to perform mixing directly in the oscillator, which yields an oscillator/mixer circuit. We evaluate how the inclusion of mixing in the oscillator affects its performance. This is a high-level study of the oscillator/mixer, in which equations for the duty cycle, the quadrature relation and the oscillation frequency are obtained as functions of the circuit parameters. We design a 2.4 GHz CMOS relaxation oscillator-mixer using AMS 0.35 µm technology to confirm the theoretical results by simulation.
Two Low Noise Amplifiers (LNAs) for the 802.11a WLAN standard have been implemented using UMC 0.18 µm 6-metal-layer CMOS technology. This first-stage amplifier must fulfil the stringent requirements of the standard, summarised here: the available signal varies between -82 dBm and -30 dBm. Furthermore, the noise figure must stay below 14 dB, and the 1 dB compression point specification is -20 dBm, over the entire working bandwidth (300 MHz) of the 5-6 GHz band. In this work, two circuits have been implemented. The first design uses a single-ended architecture, whereas the second is differential; both use inductive degeneration in the input stage and a cascode at the output. These two concepts deserve a brief explanation. Inductive degeneration consists of an inductor connected to the source terminal of the amplifying transistor; it provides good input matching while adding little noise. The cascode improves the frequency response of the whole circuit, as it mitigates the parasitic Miller effect, and it also increases the reverse isolation. To complement the study, the passive components used have been modelled with a pi-model and characterised by accurate on-wafer measurements, which show a maximum error of 3% in the inductance value and in the quality factor of the inductor. Finally, the complete results are presented. The noise figure (NF) values obtained vary from 2.2 to 2.8 dB, with gains as high as 19.6 and 13.9 dB, respectively. The single-ended LNA provides high gain with a low noise figure (close to the minimum NF achievable with this circuit). The differential one, however, loses performance in all figures of merit because of the noise optimisation, noise being the most important parameter in this kind of design. In any case, the WLAN 802.11a requirements are easily met, which proves that the UMC 0.18 µm 6-metal-layer CMOS technology is adequate for this application.
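The matching role of inductive degeneration follows from the usual first-order input impedance of a source-degenerated common-source stage, Zin(s) = s(Lg + Ls) + 1/(sCgs) + gm Ls / Cgs. Here Cgd, gate resistance and inductor losses are ignored. The real part gm Ls / Cgs = ωT Ls sets the resistance without a physical resistor, which is why little noise is added. The gate inductor Lg tunes out Cgs at the operating frequency. The device values below (gm = 40 mS, Cgs = 200 fF, f0 = 5.25 GHz) are hypothetical and not taken from the paper.

```python
import math


def degeneration_match(gm, cgs, f0, rs=50.0):
    """Return (Ls, Lg) so that Zin = rs + j0 at f0 (first-order model)."""
    w0 = 2 * math.pi * f0
    ls = rs * cgs / gm               # real part gm*Ls/Cgs (= omega_T * Ls) set equal to rs
    lg = 1.0 / (w0 ** 2 * cgs) - ls  # series resonance of (Lg + Ls) with Cgs at f0
    if lg < 0:
        raise ValueError("Ls alone over-resonates Cgs at f0; no positive Lg exists")
    return ls, lg


def zin(f, gm, cgs, ls, lg):
    """First-order input impedance of the inductively degenerated stage."""
    s = 2j * math.pi * f
    return s * (lg + ls) + 1 / (s * cgs) + gm * ls / cgs


if __name__ == "__main__":
    gm, cgs, f0 = 40e-3, 200e-15, 5.25e9   # hypothetical device values
    ls, lg = degeneration_match(gm, cgs, f0)
    print(f"Ls = {ls * 1e9:.3f} nH, Lg = {lg * 1e9:.3f} nH, Zin(f0) = {zin(f0, gm, cgs, ls, lg):.3f}")
```

Because the match holds exactly only at f0, the 300 MHz working bandwidth mentioned above is one reason the quality factor of the on-chip inductors, characterised to within 3%, matters in practice.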
Several authors have already analyzed the noise of the gate mixer, whose noise figure varies between 7 and 10 dB in microwave applications. The noise figure of the mixer is higher than that of the low noise amplifier (LNA). In a downconversion receiver, the mixer noise figure is masked by the amplifier gain, so decreasing it allows the number of cascaded amplifier stages to be reduced. The design philosophy of the HEMT gate mixer is based on exploiting nonlinear behaviour by maximizing the magnitude of the transconductance (Gm) at the fundamental frequency to obtain a high conversion gain. This paper presents the design, simulation and calculation of two downconversion mixers for LO/RF frequencies in the 9.5-11 GHz band. The first mixer is the classic design, which is characterized by a high conversion gain. The design of the second mixer aimed at obtaining a low noise figure in the HEMT gate mixer by optimizing the input circuit. The low noise mixer concept is optimised and designed in two ways. Simulation uses the harmonic balance tool of the ADS simulator, and calculation uses an analytical formula for the mixer noise figure as a function of the LO power level. This analytical calculation can be carried out using the superposition method [1] generally used to describe amplifier noise figure. The analytical noise figure calculation points out the role of this nonlinear element and enables the optimization of the mixer noise performance. The aim of this work is to demonstrate that the noise figure of the HEMT gate mixer can be decreased by optimising the input matching circuit [2]. The noise figure performances of the mixers are measured and compared with the calculated and simulated performances. The single-sideband noise figure results are in good agreement with the experimental data. The LO, RF and IF frequencies chosen for this test are 9.5, 11 and 1.5 GHz, respectively. It is shown that the noise figure is reduced by 4 dB in the low noise mixer circuit.
[1] F. Amrouche, R. Allam, J. M. Paillot, "Simulation and analytical calculation of the noise figure in HEMT gate mixers," 33rd European Microwave Conference, pp. 351-354, Munich, October 2003.
[2] F. Amrouche and R. Allam, "Analysis and Design of Microwaves Low Noise Mixers," IEEE Mediterranean Microwave Symposium, Abstract 137, Marseille, June 2004.
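The masking effect mentioned at the start of this abstract is Friis' formula for cascaded stages: F = F1 + (F2 - 1)/G1 + (F3 - 1)/(G1 G2) + … . It shows how much a mixer's noise figure contributes once it sits behind an LNA. The stage values in the usage block are hypothetical: a 15 dB-gain, 2 dB-NF LNA, and mixer noise figures of 10 dB and 6 dB, the first taken from the 7-10 dB range quoted above.

```python
import math


def db_to_lin(db):
    return 10 ** (db / 10)


def lin_to_db(x):
    return 10 * math.log10(x)


def cascade_nf_db(stages):
    """Friis formula. stages: list of (gain_dB, nf_dB), ordered from the antenna side."""
    f_total = 0.0
    g_running = 1.0                      # product of the gains of preceding stages
    for i, (g_db, nf_db) in enumerate(stages):
        f = db_to_lin(nf_db)
        f_total += f if i == 0 else (f - 1.0) / g_running
        g_running *= db_to_lin(g_db)
    return lin_to_db(f_total)


if __name__ == "__main__":
    lna = (15.0, 2.0)                    # hypothetical LNA: 15 dB gain, 2 dB NF
    print(cascade_nf_db([lna, (0.0, 10.0)]))   # classic mixer, NF = 10 dB
    print(cascade_nf_db([lna, (0.0, 6.0)]))    # mixer NF reduced by 4 dB
```

The comparison shows why a lower mixer noise figure can be traded for less LNA gain, or fewer amplifier stages, for the same overall receiver noise figure.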
This work deals with the analysis and design of a 2.45 GHz CMOS ring oscillator (VCO) with phase noise lower than -90 dBc/Hz at a 500 kHz frequency offset. The study is placed in a low-cost context with severe constraints on silicon area and power consumption. Generally speaking, a VCO must be able to work at high frequency with optimal power consumption, but one of the key specifications is phase-noise minimization. In this paper, we propose the design of a CMOS ring oscillator dedicated to radio-frequency systems for IEEE 802.15 standard applications. Derived from the Yan and Luong cell [1], the proposed structure can work at a frequency twice as high as the original circuit while ensuring low phase noise and low power consumption. Implemented in 0.28 µm CMOS technology, the circuit occupies only 35 µm × 35 µm with a power consumption of 19 mW. A detailed study of the proposed structure is performed. Simulation results obtained with STMicroelectronics 0.28 µm CMOS technology are presented.
[1] W. S. T. Yan and H. C. Luong, "A 900-MHz Low-Phase-Noise Voltage-Controlled Ring Oscillator," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 48, no. 2, pp. 216-221, Feb. 2001.
Session 6a
This paper discusses several architectures for a floating-point processing module capable of executing 100 million multiply-and-accumulate (MAC) instructions per second. Pipelining and parallel operation are considered in order to meet the specified goals. The paper studies whether the processing module (Figure 1) can be implemented in a 0.35 µm CMOS fabrication technology with three metal layers and a 3.3 V supply voltage. The module is to serve as a stand-alone IP or as a coprocessor to a software-programmable DSP core, specially designed for mixed-signal integrated circuits. Besides the MAC instruction, the module needs to compute ADD, SUB, Fixed2Float and Float2Fixed instructions. Data are assumed to be 16 bits wide, of which 12 bits are used for the mantissa and 4 bits for the exponent. The module must execute 100 million instructions per second from a 100 MHz clock, a speed requirement set by the DSP core. Low power operation and small silicon area are of secondary interest; therefore, pipelining and parallel logic are considered as ways to increase the operating speed. Both techniques increased speed, but at an additional hardware cost. The parallel solution did not meet the 100 MHz specification in the 0.35 µm mixed-signal fabrication technology, whilst the pipelined solution did. On the other hand, the pipelined solution showed a loss of accuracy for input sequences at the Nyquist frequency, which is not visible in the parallel implementation.
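The abstract fixes only the field widths (12-bit mantissa, 4-bit exponent in a 16-bit word), not the encoding. The sketch below assumes one plausible encoding purely for illustration: a two's-complement mantissa m in the upper 12 bits, an unsigned exponent e in the lower 4 bits, and value m × 2^e. It shows what the Fixed2Float and Float2Fixed instructions have to do under that assumption. The bit layout, truncation choice and function names are all hypothetical.

```python
from typing import Tuple

# Hypothetical encoding (not specified in the abstract): 12-bit two's-complement
# mantissa m and 4-bit unsigned exponent e, with value = m * 2**e.
M_BITS, E_BITS = 12, 4
M_MIN, M_MAX = -(1 << (M_BITS - 1)), (1 << (M_BITS - 1)) - 1


def fixed2float(x: int) -> Tuple[int, int]:
    """Convert a 16-bit two's-complement integer to (mantissa, exponent).

    The smallest exponent that makes the mantissa fit is chosen; shifted-out bits
    are truncated (Python's >> is an arithmetic shift, rounding toward -infinity).
    """
    if not -(1 << 15) <= x < (1 << 15):
        raise ValueError("input must be a 16-bit two's-complement integer")
    e = 0
    while not M_MIN <= (x >> e) <= M_MAX:
        e += 1
    return x >> e, e


def float2fixed(m: int, e: int) -> int:
    """Inverse conversion: value = m * 2**e."""
    return m << e


def pack(m: int, e: int) -> int:
    """Place the mantissa in the upper 12 bits and the exponent in the lower 4 bits."""
    return ((m & 0xFFF) << E_BITS) | (e & 0xF)


def unpack(word: int) -> Tuple[int, int]:
    m = (word >> E_BITS) & 0xFFF
    if m & 0x800:          # restore the sign of the 12-bit mantissa
        m -= 0x1000
    return m, word & 0xF


if __name__ == "__main__":
    for x in (100, 5000, 5001, -32768):
        m, e = fixed2float(x)
        print(x, (m, e), hex(pack(m, e)), float2fixed(m, e))
```

The round trip for 5001 returns 5000. This illustrates that a 12-bit mantissa cannot hold all 16 input bits, the kind of precision loss any such module has to manage.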
Computer Science Department, Universitat Autònoma de Barcelona, Bellaterra, Spain, {Lluis.Ribas, David.Castells, Jordi.Carrabina}@UAB.es
Hardware sorters exploit inherent concurrency to improve on the performance of sequential, software-based sorting algorithms. They are usually based on Batcher's odd-even or bitonic merging networks, which reduce the area of otherwise area-greedy hardware solutions. These sorting networks require O(n log² n) processing elements (shown in Fig. 1) and, with appropriate pipelining, can sort n data in a single clock tick. Unfortunately, their size might not fit in single-chip solutions. For cases where log² n >> 1, elements of the merging networks can be reused by recirculating data through them. The cost is an area overhead for the additional programmable interconnection networks and their controllers. Simpler control schemes can be applied to linear arrays, minimizing area complexity but with a time penalty, i.e., the number of clock cycles grows linearly rather than logarithmically. In this paper, a new hardware sorter architecture built on a programmable register file is presented. It is inspired by insertion sorting and composed of n data-slice cells (shown in Fig. 2). Each cell optionally shifts its contents to the next one; hence the resulting module is named shifter sorter. Like other approaches based on linear arrays, shifter sorters are easily expandable and require minimal control schemes. Unlike them, they use simpler processing elements. Results show drastic area savings with respect to other approaches. On the other hand, although shifter sorters can operate with much faster clock signals, pipelined sorting networks achieve better time responses with parallel input data. However, the former exhibit better area-time performance. For serialized input data, shifter sorters outperform previous approaches in both area and time.
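A software model of the shifter-sorter idea is sketched below. It assumes ascending order and empty cells marked as `None`, and it is not the authors' RTL. On each clock, every cell compares the incoming datum with its own register, using only the old register contents as the hardware would. A cell whose flag is set either loads its left neighbour's value (shift) or, if it is the first flagged cell, loads the new datum (insertion). The names `shifter_sorter_step` and `shifter_sort` are hypothetical.

```python
def shifter_sorter_step(cells, d):
    """One clock of an n-cell shift-based insertion sorter (ascending order).

    cells holds sorted values, with None marking empty cells at the tail.
    All cells decide in parallel from the OLD contents:
      p[i] = cell i is empty or holds a value larger than d
      - p[i] and p[i-1]     -> load the left neighbour's value (shift right)
      - p[i] and not p[i-1] -> load d (insertion point)
      - otherwise           -> keep the current value
    """
    p = [c is None or d < c for c in cells]
    new = list(cells)
    for i in range(len(cells)):
        if p[i]:
            new[i] = cells[i - 1] if i > 0 and p[i - 1] else d
    return new


def shifter_sort(data):
    """Feed one datum per clock (serial input) into len(data) cells."""
    cells = [None] * len(data)
    for d in data:
        cells = shifter_sorter_step(cells, d)
    return cells


if __name__ == "__main__":
    print(shifter_sort([5, 3, 8, 1, 9, 2]))   # [1, 2, 3, 5, 8, 9]
```

Each cell needs only a comparator, a two-input multiplexer and a register, and sorting n serial inputs takes n clocks. This matches the linear time complexity and simple processing elements described above.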
E. Pons(1), J. L. Merino(2), L. Terés(2), J. Carrabina(1); (1) Departament d'Informàtica, UAB, Barcelona, Spain, Enric.Pons@uab.es; (2) Centre Nacional de Microelectrònica CSIC, Campus UAB, Barcelona, Spain
This paper presents a methodology for the design, verification and prototyping of complete systems and microsystems that contain CMOS image sensor arrays together with acquisition, processing and communication features. When an optical sensor must be selected for a concrete application, some control and image processing must be performed to provide the image format the application requires. A rapid prototyping environment has been designed to adapt the image sensor to the desired application. It allows testing commercial and custom sensors with their own control capabilities, such as image format, frame rate, readout management or sensitivity calibration. The prototyping environment allows a SW environment to be used to develop the image processing needed by the application (OCR, pattern detection). Thanks to the reconfigurability of the system, different communication resources are available, such as wireless, Bluetooth, USB and UART, so SW development can be performed over a user-defined platform. In conclusion, the prototyping environment permits the selection of the best sensor for the application and the refinement of the processing application using several SW platforms. It is a useful and powerful platform for the initial development of an image-sensing-based system and also for testing custom CMOS sensors. This methodology has been validated for different CMOS sensors, including proprietary developments, research-oriented sensors and commercial sensors. Acquisition, processing and communication were managed using proprietary FPGA platforms. Communication methods include UART, Ethernet and wireless interfaces.
The use of FPGAs in rapid prototyping systems allows powerful digital hardware emulators to be created quickly. However, the debugging capability of such systems is usually very limited, mainly to monitoring the external interfaces. To improve their analysis capabilities, some commercial packages have been developed that allow internal signals to be captured, but these are limited in both the number of signals and the number of captures that can be made. This article describes the most recent advances in a hardware debugger system known as UNSHADES. This system works by adding a small debug controller to the design to be debugged. This controller provides many new design analysis features such as single clock stepping, state modification or register inspection. All these options are provided over the entire design without the need for pre-synthesis signal selection. The debug controller can be added with only minor design modifications and very few dedicated FPGA resources. A sample debug controller implemented in a Virtex-II FPGA requires just 3 I/O pins and 43 logic slices. Over half of these logic slices are dedicated to an optional 32-bit cycle counter.
Session 6b
A 3-30 MHz Tunable Continuous-Time Bandpass Sigma-Delta A/D Converter for Direct Conversion of Radio Signals
D. Bisbal, J. San Pablo, J. Arias, L. Quintanilla, J. Vicente, and J. Barbolla, Departamento de E. y Electrónica, E. T. S. I. de Telecomunicación, Universidad de Valladolid, 47011 Valladolid, Spain, e-mail: davbis@tel.uva.es
A CMOS fourth-order continuous-time bandpass delta-sigma modulator has been designed and simulated. The outstanding features of the proposed modulator are wide-range tuning capability and low power consumption. We suggest its use in a radio communications receiver front-end to perform A/D conversion of the RF signal coming from the receiver antenna before any mixing (Fig. 1a). In this way, all of the signal processing (mixing, filtering and demodulation) is carried out by digital circuitry, allowing high performance to be achieved with low power consumption and at low cost. The receiver analog components are reduced to just a low noise amplifier (LNA), a bandpass filter and an A/D converter. In this paper, the ADC is designed to be integrated in a single-chip shortwave radio receiver, implemented in standard triple-metal 0.35 µm CMOS technology. The architecture chosen to implement the bandpass ΔΣ ADC (shown in Fig. 1b) consists of the cascade of two Gm-C resonators. It allows high-resolution digitization of narrowband signals modulated at fo = gm/(2πC) when clocked at fs ≈ 4fo. The capacitors denoted by C are actually capacitor arrays, which allow the modulator to be coarsely tuned. Fine tuning may also be done by adjusting the transconductances through the bias point of the transconductors. Of course, the sampling frequency fs must follow any changes in C or in the transconductances. Tuning is done automatically by means of a master-slave tuning system, which is also presented in the paper. Simulation results show that the proposed ADC achieves an SNDR higher than 60 dB for 10 kHz-bandwidth AM/SSB signals modulated on a carrier anywhere in the HF band (3-30 MHz), while consuming only 6 mW.
Fig. 1. (a) Proposed receiver architecture: LNA, BPF, AGC and tunable bandpass ADC (with tuning circuit and clock synthesizer driven by a reference clock) in the analog domain; quadrature mixing by the sequences 1, 0, -1, 0, ... and 0, 1, 0, -1, ... (cosine and sine at fCLK/4), decimator (LPF) and DSP (demodulation and filtering) in the digital domain. (b) Single-ended block diagram of the 4th-order bandpass ΣΔ modulator, with loop filter based on the cascade of two Gm-C resonators.
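With the converter clocked at f_s = 4f_o, the digital quadrature mixer of Fig. 1(a) reduces to multiplications by the sequences 1, 0, -1, 0, ... and 0, 1, 0, -1, ..., which are cos(πn/2) and sin(πn/2). The sketch below generates both sequences and shows that mixing a hypothetical tone at f_s/4 with them produces a DC (baseband) component ready for decimation; the tone amplitude and phase are illustrative assumptions.

```python
import math


def fs4_lo(n_samples: int):
    """Quadrature LO at fs/4: cos(pi*n/2) and sin(pi*n/2), rounded to the exact values 1, 0, -1."""
    cos_seq = [round(math.cos(math.pi * n / 2)) for n in range(n_samples)]
    sin_seq = [round(math.sin(math.pi * n / 2)) for n in range(n_samples)]
    return cos_seq, sin_seq


def mix_to_baseband(x, n_samples: int):
    """Multiply the digitized signal by the fs/4 quadrature sequences (no multipliers needed in hardware)."""
    cos_seq, sin_seq = fs4_lo(n_samples)
    i = [xn * c for xn, c in zip(x, cos_seq)]
    q = [-xn * s for xn, s in zip(x, sin_seq)]
    return i, q
```

Because every LO sample is 0 or ±1, the mixer only needs sign changes and zeroing, which is what makes this choice of f_s attractive for a low-power digital back-end.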
A New Method for the High-Level Synthesis of Continuous-Time Cascaded ΣΔ Modulators
Ramón Tortosa, José M. de la Rosa, Angel Rodríguez-Vázquez and Francisco V. Fernández, Instituto de Microelectrónica de Sevilla, IMSE-CNM (CSIC), Ed. CNM-CICA, Av. Reina Mercedes s/n, 41012 Sevilla, SPAIN. E-mail: {tortosa, jrosa, angel, pacov}@imse.cnm.es*
Continuous-time (CT) ΣΔ modulators (ΣΔMs) [...] communication systems. In addition to providing intrinsic anti-aliasing filtering, CT ΣΔMs offer faster operation with lower power consumption than their discrete-time (DT) counterparts. Despite these advantages, CT ΣΔMs are more sensitive than DT ΣΔMs to some circuit errors, namely clock jitter, excess loop delay and technology parameter variations. The latter are especially critical in cascaded architectures, which is why most reported prototypes use single-loop topologies. However, the need to achieve medium-high resolutions (>12 bits) within high signal bandwidths (>20 MHz) while guaranteeing stability has prompted interest in proper methods for the synthesis of cascaded CT ΣΔMs. These methods apply a DT-to-CT transformation to an equivalent DT ΣΔM that fulfils the required specifications. One problem of this transformation is that additional feedforward coefficients are normally needed to achieve exact equivalence. As a consequence, a large number of analog components (transconductors and/or amplifiers and digital-to-analog converters) must be included to implement all the resulting coefficients. This paper presents a new methodology for the high-level synthesis of cascaded CT ΣΔMs which, by dispensing with the DT-to-CT equivalence, places the zeros and poles of the loop-filter transfer function efficiently and reduces the number of analog components. This leads to more efficient architectures in terms of circuit complexity, power consumption and robustness against circuit non-idealities. As an application of the proposed method, several new cascaded CT ΣΔMs have been synthesized and optimized to meet VDSL specifications, i.e. 12-bit resolution within a 20 MHz signal bandwidth. Behavioural simulations that include the most critical error mechanisms validate the presented approach.
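As background for the DT-to-CT transformation mentioned above, its simplest case can be checked numerically: with a rectangular (NRZ) DAC pulse lasting one clock period T, a CT integrator 1/(sT) sampled every T reproduces the impulse response of the DT integrator z^-1/(1 - z^-1), namely 0, 1, 1, 1, .... This is a generic textbook equivalence, not the synthesis method proposed in the paper, and the forward-Euler time stepping is an illustrative assumption.

```python
def ct_integrator_pulse_response(n_samples: int, period: float = 1.0, steps: int = 1000):
    """Sample, at t = nT, a CT integrator 1/(sT) driven by an NRZ DAC pulse on [0, T)."""
    dt = period / steps
    y = 0.0
    samples = []
    for n in range(n_samples):
        samples.append(y)  # sampling instant t = n*T, taken before integrating the next period
        for k in range(steps):
            t = n * period + k * dt
            u = 1.0 if t < period else 0.0  # NRZ DAC pulse of unit height and width T
            y += u * dt / period            # forward-Euler step of dy/dt = u / T
    return samples


def dt_integrator_impulse_response(n_samples: int):
    """Impulse response of z^-1 / (1 - z^-1): a zero followed by ones."""
    return [0.0] + [1.0] * (n_samples - 1)
```

For higher-order loops and other DAC pulse shapes the equivalence generally requires the extra feedforward coefficients discussed above, which is the overhead the proposed methodology avoids.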
*This work has been supported by the Spanish CICYT Project TIC2001-0929/ADAVERE.
This paper presents new high-level modeling techniques to improve the simulation speed of all-MOS oversampling A/D converters implemented in the log-domain [1]. Functional models for all basic building blocks are obtained from analytical circuit analysis at transistor level using advanced MOS device models, including thermal noise, moderate inversion, nonlinear MOS capacitance and DAC waveform asymmetry. The resulting behavioral models improve simulation speed by more than 1000 times with respect to classic SPICE verification while preserving device-level accuracy, and they also allow circuit non-idealities to be studied independently. A complete design example of a fourth-order single-bit modulator is given for a digital 0.35 µm CMOS technology. In this case, the overall dynamic range of the ADC can be estimated, with the same computing resources, in 1 hour instead of 18 weeks.
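The speed advantage of behavioral modeling comes from replacing transistor-level netlists with difference equations. For scale, 18 weeks is 3024 hours, so the 1-hour estimate corresponds to a speed-up of roughly 3000. As a hedged illustration of the modeling style only, the sketch below describes a generic second-order single-bit ΣΔ modulator (Boser-Wooley style, integrator gains of 0.5) with two hypothetical non-idealities, an integrator leakage factor and input-referred Gaussian noise; it is not the paper's fourth-order log-domain model derived from transistor-level analysis.

```python
import random


def sigma_delta_2nd_order(u, leak=1.0, noise_rms=0.0, seed=0):
    """Behavioral second-order single-bit sigma-delta modulator.

    u         -- input samples (|u| well below 1 for stable operation)
    leak      -- integrator pole, 1.0 = ideal; < 1 mimics finite opamp DC gain (hypothetical parameter)
    noise_rms -- rms of Gaussian noise added at the input (hypothetical thermal-noise model)
    """
    rng = random.Random(seed)
    x1 = x2 = 0.0
    bits = []
    for un in u:
        v = 1.0 if x2 >= 0.0 else -1.0  # single-bit quantizer
        bits.append(v)
        noisy = un + (rng.gauss(0.0, noise_rms) if noise_rms > 0 else 0.0)
        # delaying integrators: x2 uses the previous x1; both receive the fed-back bit v
        x2 = leak * x2 + 0.5 * (x1 - v)
        x1 = leak * x1 + 0.5 * (noisy - v)
    return bits
```

Each non-ideality is an independent parameter, which mirrors the paper's point that behavioral models let circuit imperfections be studied one at a time.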
Figure 1: Example of high-level modeling for the integrator basic building block (left) and some comparative results from functional and electrical simulations (right).
[1] F. Serra-Graells, "1-V All-MOS A/D Converters in the Log-Domain," Journal of Analog Integrated Circuits and Signal Processing, Kluwer Academic Publishers, Special Issue on ISCAS'02, vol. 35, no. 1, pp. 47-57, Apr. 2003.
This paper presents a dual-band sigma-delta modulator for GSM/WCDMA receivers. The modulator uses a low-distortion sigma-delta architecture to attain high linearity over a wide bandwidth. It employs a low-distortion 2nd-order single-bit sigma-delta modulator for GSM mode and a low-distortion 4th-order modified cascaded modulator, with a single-bit quantizer in the first stage and a 4-bit quantizer in the second stage, for WCDMA mode. In GSM mode, the second stage is switched off to reduce power dissipation. Our sigma-delta modulator, shown in Fig. 1, involves two key design issues. The first is the 2nd-order sigma-delta modulator with a feedforward signal path, which has reduced sensitivity to opamp nonlinearities. The second is an architectural approach that combines the merits of the modified cascaded (2-2) architecture and multibit quantization in the last stage to make all quantization noise sources negligible at low OSR. The modulator is designed in 0.18 µm CMOS technology and operates from a 1.8 V supply. Simulation results show that the proposed architecture tolerates circuit non-idealities well and achieves a peak SNDR of 75 dB in WCDMA mode for an OSR of 16 and a peak SNDR of 83 dB in GSM mode for an OSR of 160.
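The benefit of raising the modulator order and the quantizer resolution at low OSR can be quantified with the standard ideal peak-SQNR expression for an L-th order modulator with an N-bit quantizer, SQNR = 6.02N + 1.76 + 10log10((2L+1)/π^(2L)) + (2L+1)·10log10(OSR) dB. The sketch below evaluates it. Treating the 2-2 cascade with a 4-bit last stage as an ideal fourth-order, 4-bit modulator is a simplifying assumption, and the formula ignores circuit non-idealities, so its results are upper bounds well above the simulated SNDR values reported above.

```python
import math


def ideal_peak_sqnr_db(order: int, bits: int, osr: float) -> float:
    """Ideal peak SQNR (dB) of an order-L modulator with an N-bit quantizer and a full-scale sine input."""
    return (6.02 * bits + 1.76
            + 10 * math.log10((2 * order + 1) / math.pi ** (2 * order))
            + (2 * order + 1) * 10 * math.log10(osr))


if __name__ == "__main__":
    # GSM mode: 2nd order, single bit, OSR = 160
    print(f"GSM   : {ideal_peak_sqnr_db(2, 1, 160):.1f} dB")
    # WCDMA mode: 2-2 cascade approximated as 4th order with a 4-bit last-stage quantizer, OSR = 16
    print(f"WCDMA : {ideal_peak_sqnr_db(4, 4, 16):.1f} dB")
```

Each doubling of the OSR adds about (2L+1)·3 dB, i.e. 15 dB for L = 2 but 27 dB for L = 4, which is why the higher-order multibit path is the one used where the OSR is only 16.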
Session 6c
Analog Test
Thursday, Nov. 25, 10h30-11h30, Pyla Room. Chairs: Jean-Louis Carbonero (STMicroelectronics), Salvador Bracho (U. de Cantabria)
Experimental Analysis of Transient Current Test Based on δIDD Variations in S²I Memory Cells
Y. Lechuga, R. Mozuelos, M. A. Allende, M. Martínez, S. Bracho, Microelectronics Engineering Group; Electronics Technology, Systems and Automation Engineering Department; University of Cantabria; Santander; Spain. {yolanda, roman, allende, martinez, bracho}@teisa.unican.es
The current variations, δIDD, appearing in the memory cells of SI circuits in the presence of faults give rise to changes in the overall dynamic supply current, IDDT, which are analyzed by test methods based on this IDDT current. The ability of the effects of faults injected inside the circuit to propagate can be considered; we call it fault reflection. Based on this fault reflection mechanism, a new test method has been developed that directly analyzes the current variations, δIDD, appearing in one of the memory cells that constitute the SI circuit. This test method has the advantage of avoiding the loss of information caused by integrating the IDDX signal, which can mask the effects of the faults. We have designed and built, in AMS 0.6 µm technology, a benchmark circuit, shown in Figure 1, based on a switched-current algorithmic A/D converter topology, in order to establish through real measurements the fault coverage obtained with this new test method, and conclusions have been drawn.
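The argument that integration can mask faults can be illustrated with two current waveforms that carry the same charge but differ in shape: their integrals coincide, so an integrated (charge-based) signature cannot tell them apart, whereas a sample-by-sample comparison of the current variation can. The waveforms, time step and detection threshold below are invented for illustration and are not measurements from the benchmark circuit.

```python
def integrate(samples, dt):
    """Charge seen by an integrating monitor (rectangle rule)."""
    return sum(samples) * dt


def max_deviation(reference, observed):
    """Largest sample-wise current deviation between two waveforms."""
    return max(abs(a - b) for a, b in zip(reference, observed))


DT = 1e-9         # hypothetical sampling step (s)
THRESHOLD = 5e-5  # hypothetical detection threshold (A)
# hypothetical fault-free and faulty supply-current transients (A) carrying equal total charge
good = [0.0, 2e-4, 4e-4, 2e-4, 0.0, 0.0]
faulty = [0.0, 4e-4, 0.0, 4e-4, 0.0, 0.0]

charge_differs = abs(integrate(good, DT) - integrate(faulty, DT)) > THRESHOLD * DT
shape_differs = max_deviation(good, faulty) > THRESHOLD
```

Here the integrated test passes the faulty circuit while the direct waveform comparison flags it, which is the loss of information the δIDD method is meant to avoid.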
RF systems take advantage of digital modulation/demodulation techniques to transmit data. These techniques separate the signal into two orthogonal components: I (in-phase) and Q (quadrature). The representation of the instantaneous value of the signal in the I-Q diagram is called the constellation diagram. Sometimes the receiver is unable to recover the data, and the information it obtains contains errors. This paper presents an error detection method that takes into account the symbols detected close to the limits of the decision regions in constellation diagrams. This information reveals defects and mismatches that are hardly detected by the classical phase division method. The method consists of a suitable division of the I-Q diagram: once the symbols are obtained, each symbol is associated with an I-Q division by means of a code, and this coding allows errors in RF systems to be detected. Figure 1 shows the constellation symbols for a QPSK (Quadrature Phase Shift Keying) modulation and its I-Q diagram division when [1,1] is the expected symbol. The method defines four different regions for every symbol: the expected division, the error division, and the close undefined and far undefined divisions. The close undefined divisions enclose symbols that are correct but close to being wrongly demodulated, whereas the far undefined divisions enclose errors that are close to being correctly demodulated. An OQPSK (Offset QPSK) modulation system has been implemented in MATLAB, and its faulty behavior is simulated by means of a mixer with Vt mismatch in the quadrature component of the receiver. Table I shows the simulation results. The difference between the numbers of close undefined and far undefined symbols reveals the mismatch, which is harder to detect from the BER value.
Fig. 1. QPSK constellation symbols and I-Q diagram division when [1,1] is the expected symbol, with regions labeled correct, close undefined, far undefined and error.

Table I. Simulation results with Vt mismatch (mismatch values applied: Vt(RF) = 5, 3, 2, 6 mV; Vt(LO1) = 5, 4, 5, 2.5 mV; Vt(LO2) = 3, 3, 4 mV).

Case   Ncu/NT         Nfu/NT         Ne/NT       BER
1      1.076·10^-1    6.337·10^-4    1·10^-7     6.338·10^-4
2      1.107·10^-1    7.091·10^-4    1·10^-7     7.092·10^-4
3      1.137·10^-1    7.873·10^-4    1·10^-7     7.874·10^-4
4      1.078·10^-1    6.337·10^-4    1·10^-7     6.338·10^-4
5      1.117·10^-1    7.130·10^-4    1·10^-7     7.131·10^-4
6      1.087·10^-1    6.664·10^-4    1·10^-7     6.665·10^-4
7      1.128·10^-1    7.709·10^-4    2·10^-7     7.711·10^-4

Ncu: number of close undefined symbols; Nfu: number of far undefined symbols; Ne: number of error symbols; BER: bit error rate.
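The four-region coding described above can be sketched for QPSK as follows. The geometry is a hypothetical simplification, not the paper's exact division: symbols sit at (±1, ±1), the decision boundaries are the I and Q axes, and a sample counts as "undefined" when it lies within a guard margin of a boundary.

```python
def classify(i: float, q: float, expected=(1, 1), margin: float = 0.2) -> str:
    """Classify a received QPSK sample relative to the expected symbol (hypothetical geometry).

    correct         -- right quadrant, farther than `margin` from both decision boundaries
    close undefined -- right quadrant, but within `margin` of a boundary
    far undefined   -- wrong quadrant, but every wrong coordinate is within `margin` of its boundary
    error           -- wrong quadrant and beyond the margin
    """
    ei, eq = expected
    wrong = [v for v, e in ((i, ei), (q, eq)) if v * e <= 0]  # coordinates on the wrong side
    if not wrong:
        return "close undefined" if min(abs(i), abs(q)) < margin else "correct"
    if all(abs(v) < margin for v in wrong):
        return "far undefined"
    return "error"
```

Counting the four labels over many received samples gives ratios of the Ncu/NT and Nfu/NT kind reported in Table I; the margin value is an assumption and would set how sensitive the counts are to small mismatches.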
On the Minimum Number of Measurements for Single Fault Diagnosis in Linear Circuits
J. Soares Augusto
INESC-ID, R. Alves Redol, 9, 1000-029 Lisboa, Portugal; Physics Dept., Fac. Ciências da Univ. de Lisboa
We describe a method for performing single fault diagnosis in linear dynamic circuits. The method is useful for building compact fault dictionaries. It also demonstrates that the minimum number of diagnosis (circuit) variables needed for single fault diagnosis at a single frequency is two, and that this number is independent of circuit complexity. In practical cases this number can be larger because of relationships between circuit variables. The faulty circuit equations are obtained with a numerical modification of the LU factors of the nominal circuit matrix resulting from Modified Nodal Analysis (MNA). The main result in the paper uses the t vectors, which are unique vectors associated with each circuit component that appear in the mathematical development of the faulty circuit equations. Fault diagnosis relies on the (complex) element-wise ratio between the t vector and the (complex) vector difference dx between the solution of the nominal (good) circuit ($x$) and the faulty circuit solution ($x_f$). This ratio is the same for all elements when the selected t vector corresponds to the correct fault. The minimum number of variables needed for this task is two. An example of dx and t vectors is shown in Figure 1. The main result is: suppose two circuit variables (voltages or currents), $x$ and $x'$, are chosen as diagnosis variables, and that the corresponding elements of the t vector of component $k$ are $t_k$ and $t'_k$. Suppose also that there are $N_c$ different parameters in the MNA system of equations and that there is a fault in the $k$-th parameter. Then $x$ and $x'$ are sufficient as diagnosis variables iff:

$$\bigwedge_{k=1}^{N_c}\left[\frac{x_k - x_0}{t_k} = \frac{x'_k - x'_0}{t'_k}\right] \;\wedge\; \bigwedge_{k=1}^{N_c}\,\bigwedge_{j=1,\,j\neq k}^{N_c}\left[\frac{x_k - x_0}{t_j} \neq \frac{x'_k - x'_0}{t'_j}\right]$$

where $x_0$ and $x'_0$ are, respectively, the nominal values of the diagnosis variables $x$ and $x'$, and $x_k$ and $x'_k$ are, respectively, the values calculated for $x$ and $x'$ when there is a fault in the $k$-th component.
Figure 1: Scaled pictorial representation of t and dx vectors (t1-t3, dx1-dx3) for 3 faults in a small circuit. The vectors represent phasors of circuit variables (V1, V2, V3, IV, I3); those not shown are zero.
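The diagnosis rule can be sketched directly: for each candidate fault k, divide the measured deviations of the two diagnosis variables element-wise by the corresponding t-vector entries, and keep the candidates whose two ratios coincide. The t-vector entries, the fault scale and the tolerance below are hypothetical complex values, not data from the paper; in practice they would come from the LU-based MNA computation described above.

```python
def diagnose(dx, t_vectors, rel_tol=1e-6):
    """Return the components k whose t vector makes the element-wise ratio dx / t_k constant.

    dx        -- deviations (faulty minus nominal) of the two diagnosis variables, complex
    t_vectors -- {k: (t_k, t'_k)}: entries of component k's t vector at the two variables
    """
    candidates = []
    for k, tk in t_vectors.items():
        if any(abs(t) == 0 for t in tk):
            continue  # a zero entry cannot give a finite ratio
        r = [d / t for d, t in zip(dx, tk)]
        if abs(r[0] - r[1]) <= rel_tol * max(abs(r[0]), abs(r[1])):
            candidates.append(k)
    return candidates
```

A single returned candidate means the two chosen variables discriminate that fault, which is the situation the sufficiency condition above guarantees for every k.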
Session 6d
This paper presents a new approach for realizing digitally programmable VHF/UHF filters that are suitable for pure digital CMOS technologies and for hard disk drive (HDD) read channel applications. The strategy is based on a technique that provides a programmable/tunable transconductance through a parallel connection of unit folded-cascode cells, where the total parasitic capacitance is kept constant thanks to the specific design of the unit cell. A fully-balanced current-mode Gm-C integrator has been implemented (Figs. 1 and 2). It operates over the 30 MHz to 220 MHz range with a phase error of less than 4° and 80 dB of dynamic range for 1% total harmonic distortion (THD) over the whole programming range. The cell has been validated in 0.35 µm and is intended to be built in 0.18 µm CMOS technology. The transconductor cell consumes 1.63 mW from a 2 V power supply. The simulation results confirm this approach as a good choice for filters with a good trade-off between tuning capability and dynamic range in the very high frequency range. A comparison between several programmable filters implemented in similar technologies is also included; from it, conclusions are drawn about the proposed design and the benefits of this technique, and the most relevant characteristics of programmable filters are summarized. The most striking conclusion of this comparison is that current-mode operation combined with the proposed strategy leads to substantially wider dynamic range with lower power consumption.
Figs. 1 and 2: schematic of the fully-balanced current-mode Gm-C integrator (transconductors gm, integrating capacitors CI, bias currents IBIAS, bias voltage VB).
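Programming the transconductance by paralleling unit cells makes the integrator unity-gain frequency scale linearly with the number of enabled cells, f_u = n·g_m,unit/(2πC), provided the parasitic capacitance stays constant, as the unit-cell design aims to ensure. The unit transconductance, the capacitance and the 3-bit binary control word below are hypothetical values chosen only so that the settings span roughly 30-210 MHz; they are not the paper's design values.

```python
import math

GM_UNIT = 190e-6  # hypothetical unit-cell transconductance (S)
C_INT = 1e-12     # hypothetical integrating capacitance, including the constant parasitics (F)


def unity_gain_frequency(control_word: int, n_bits: int = 3) -> float:
    """fu = n * gm_unit / (2*pi*C), where n is the number of enabled unit cells.

    With binary weighting, bit b enables 2**b unit cells, so n equals the control word.
    """
    if not 1 <= control_word < 2 ** n_bits:
        raise ValueError("control word must enable at least one cell and fit in n_bits")
    n_cells = control_word
    return n_cells * GM_UNIT / (2 * math.pi * C_INT)
```

If the parasitics grew with n, the relation would no longer be linear, which is why keeping the total parasitic capacitance constant matters for a predictable programming range.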
This paper presents a mixed-signal ASIC for a Frequency-Modulated Differential Chaos Shift Keying (FM-DCSK) communication system [1][2], implemented in a 2P3M 0.35 µm CMOS technology. The prototype includes several programming capabilities so that it can serve as an experimental platform for evaluating the FM-DCSK modulation scheme. The operation of the integrated circuit is illustrated here for a data rate of 500 kb/s and a transmission bandwidth in the range of 17 MHz. Based on experimental results, the Bit Error Rate (BER) performance of the modulation scheme has been estimated for a wireless environment in the 2.4 GHz ISM band under different propagation conditions. Measured results confirm theoretical predictions.
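The principle behind DCSK can be sketched at baseband: each bit period carries a chaotic reference segment followed by the same segment (bit 1) or its inverse (bit 0), and the receiver correlates the two halves. The sketch below uses a Chebyshev map as a hypothetical chaos source and omits the FM stage, the channel and the analog circuitry of the actual FM-DCSK ASIC; segment length and initial condition are illustrative assumptions.

```python
def chebyshev_chaos(n: int, x0: float = 0.3):
    """Zero-mean chaotic samples in [-1, 1] from the Chebyshev map x -> 1 - 2x^2."""
    xs, x = [], x0
    for _ in range(n):
        x = 1.0 - 2.0 * x * x
        xs.append(x)
    return xs


def dcsk_modulate(bits, half_len: int = 16):
    chaos = chebyshev_chaos(len(bits) * half_len)
    signal = []
    for b, bit in enumerate(bits):
        ref = chaos[b * half_len:(b + 1) * half_len]
        sign = 1.0 if bit else -1.0
        signal += ref + [sign * s for s in ref]  # reference slot, then information slot
    return signal


def dcsk_demodulate(signal, half_len: int = 16):
    bits = []
    for start in range(0, len(signal), 2 * half_len):
        ref = signal[start:start + half_len]
        info = signal[start + half_len:start + 2 * half_len]
        corr = sum(r * s for r, s in zip(ref, info))  # differential correlation of the two slots
        bits.append(1 if corr > 0 else 0)
    return bits
```

Because the receiver only compares the two slots with each other, it needs no synchronized copy of the chaotic generator, which is the property that makes differential chaos schemes practical.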
A multifunctional approach to frequency synthesizers dedicated to next-generation multi-standard smart objects
Christophe Rougier¹, Jean-Baptiste Begueret¹, Hervé Lapuyade¹, Yann Deval¹ and Angelo Malvasi²
¹ IXL Laboratory, 33405 Talence, France, e-mail: rougier@ixl.fr
² ACCO, 21 bis rue d'Hennemont, 78100 St Germain-en-Laye, France, e-mail: angelo.malvasi@accoic.com
With the continuous growth of communication standards, microelectronics designers have to adapt their circuits to the needs of the telecommunications market. Modern transceivers rely on multi-standard frequency synthesizers to cover various standards with the same devices. Combining these devices with the high level of integration of recent technologies makes it possible to build circuits that process different standards while using a reasonable silicon area. The frequency synthesizer is one of the most fundamental cells in a telecommunication transceiver: it must synthesize the periodic signals required for both up-conversion in the transmitter and down-conversion in the receiver. Although multi-standard frequency synthesizers exist, each is dedicated to a single communication link (voice link, data link or positioning link). The emerging idea is to create architectures able to manage various standards for different communication links simultaneously on the same chip. As Fig. 1 shows, future smart systems will have to process at the same time standards such as GSM, DCS or PCS (voice link), Bluetooth or HiperLAN (data link) and GPS (positioning link). Consequently, these multi-standard systems require a complex frequency synthesizer that is both multi-standard and multifunctional. This paper therefore presents a new approach to the frequency synthesizer that provides multiple local oscillators for different communication links. The feasibility of this multifunctional and multi-standard structure is demonstrated through behavioral simulation results. Managing such systems is a real challenge: three different local oscillators are synthesized on the same silicon substrate, so parasitic coupling between them may occur.
A solution is proposed to ensure a well-controlled phase relationship between all the synthesized local oscillators. Next, the multifunctional Frequency Generation Unit (FGU), depicted in Fig. 2, which can provide at any time the wanted standard for a given communication link (phone, data or positioning), is presented.
Figs. 1 and 2: multifunctional frequency generation for the voice, data and positioning links (reference Fref, MCU control signal, I/Q local oscillators for Standards 1, 2 and 3).
In this paper we present a prototype of a local oscillator based on a Phase-Locked Loop (PLL) for the carrier frequency, combined with a Direct Digital Synthesis (DDS) system for channel tuning. The system is evaluated with MATLAB, and the DDS is implemented and evaluated in an FPGA. The system is flexible and reconfigurable, and it supports several types of digital modulation.
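The DDS part can be illustrated with a minimal phase-accumulator model: an N-bit accumulator advanced by a frequency tuning word (FTW) every clock cycle addresses a sine look-up table, giving f_out = FTW·f_clk/2^N with a channel resolution of f_clk/2^N. The accumulator width, table size and clock frequency used below are hypothetical, not the parameters of the FPGA implementation.

```python
import math

ACC_BITS = 32  # hypothetical phase-accumulator width
LUT_BITS = 10  # hypothetical sine look-up table address width
LUT = [math.sin(2 * math.pi * k / 2 ** LUT_BITS) for k in range(2 ** LUT_BITS)]


def tuning_word(f_out: float, f_clk: float) -> int:
    """Frequency tuning word giving f_out = FTW * f_clk / 2**ACC_BITS (rounded to the nearest step)."""
    return round(f_out * 2 ** ACC_BITS / f_clk)


def dds_samples(ftw: int, n: int):
    """Generate n output samples: accumulate phase, truncate it, look up the sine."""
    phase = 0
    out = []
    for _ in range(n):
        out.append(LUT[phase >> (ACC_BITS - LUT_BITS)])  # keep the LUT_BITS most significant bits
        phase = (phase + ftw) % 2 ** ACC_BITS            # accumulator wraps modulo 2**N
    return out
```

Changing channels only means loading a new tuning word, which is why a DDS suits fine channel tuning while the PLL handles the coarse carrier frequency.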
Session 7a
This paper reports a programmable Phase-Locked Loop (PLL) frequency synthesizer designed for a Mixed-Signal Built-In Self-Test (BIST) application. The synthesizer generates the signals required for the characterization of sine-wave signals needed by an approach reported elsewhere. The basic structure of a typical PLL has been modified and adapted to the intended application. The structure and operating modes of the different blocks are presented together with simulation results.
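In a typical integer-N synthesizer with a reference divider R and a feedback divider N, the locked output frequency is f_out = (N/R)·f_ref, so programming the two counters sets the generated test frequency. The brute-force search below is a hypothetical helper illustrating that relation; the counter ranges and reference frequency are assumptions, and the paper's modified PLL structure is not modeled.

```python
def best_dividers(f_target: float, f_ref: float, n_max: int = 1023, r_max: int = 63):
    """Pick (N, R, error) minimizing |f_ref * N / R - f_target| within hypothetical counter ranges."""
    best = None
    for r in range(1, r_max + 1):
        n = round(f_target * r / f_ref)  # best feedback count for this reference divider
        if not 1 <= n <= n_max:
            continue
        err = abs(f_ref * n / r - f_target)
        if best is None or err < best[2]:
            best = (n, r, err)
    return best
```

Larger counters give finer frequency steps but a lower comparison frequency f_ref/R, which slows the loop; that trade-off is one reason practical PLL designs are adapted to their application.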
Analog IC Design With A Library Of Parameterized Device Generators
Vincent Bourguet, Laurent de Lamarre and Marie-Minerve Louërat, University of Paris VI, LIP6-ASIM Laboratory, 4, Place Jussieu, 75252 Paris, France
E-mail: Vincent.Bourguet@lip6.fr
Here we present the CAIRO+ language, which allows the analog designer to create generators of analog functions. CAIRO+ is intended to help designers capture their knowledge, thus creating a library of analog functions. Complex hierarchical analog function generators are designed by reusing existing generators of simpler functions. These generators can be designed to be independent of the fabrication process, thus enabling process and specification migration. The CAIRO+ language, composed of predefined C++ functions, is a new answer to the problem of electrical and layout co-design of analog circuits. It inherits ideas from the CAIRO language concerning layout-aware issues [2, 1], but greatly enhances the communication between synthesis and layout throughout the hierarchy.
Designing a Module Generator
A module is an instance in the hierarchical netlist representing the circuit; it is created by a module generator. A module can instantiate other modules. The module tree is defined such that each node corresponds to one module. The leaf cell of the module tree is called a device. To predict the parasitics resulting from layout, we have chosen an approach using layout templates with layout device generators. The relative placement of instances inside a module is described by a container tree. A container is composed of abutted containers placed beside each other in a specific order; there are vertical and horizontal containers. The leaf cell of the container tree corresponds to a device. Devices consist of elementary components such as folded MOS transistors, capacitors and resistors, but also of sets of elementary components that have to be matched (e.g. differential pairs, current mirrors, capacitor matrices, matched resistors). The module tree represents the netlist template of the circuit, and the container tree represents the layout template of the same circuit.
To design a new module generator, the analog designer has to write the following four functions, corresponding to the four steps of our design flow:
1. Capture of Netlist and Layout Templates. In this step, functions create the netlist and the relative placement of unsized instances.
2. Design Space Exploration. In this step, specifications are propagated top-down in the module tree by dedicated functions; the result is a sized schematic.
3. Shape Function. In this step, the layout shape function of the sized schematic is computed. The shape function gives all the possible aspect ratios for layouts of a sized schematic. The shape function of a module is computed recursively, bottom-up, based on the container tree.
4. Layout Generation. Finally, given a geometrical constraint such as module height or aspect ratio, the feasible height of the module is selected by examining its shape function. With a recursive top-down approach, the actual shape of each device is selected. Then the relative placement is performed, followed by bottom-up hierarchical routing.
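Steps 3 and 4 can be illustrated on a toy container tree. The sketch is written under hypothetical assumptions and does not use the CAIRO+ API: each device offers a few (width, height) variants, a horizontal container abuts its children side by side (widths add, height is the maximum), a vertical container stacks them (heights add, width is the maximum), and dominated shapes are pruned so that only the Pareto front is kept. Layout generation then picks the narrowest shape that fits a height constraint.

```python
from itertools import product


def pareto(shapes):
    """Keep only non-dominated (width, height) pairs, sorted by width."""
    front, best_h = [], float("inf")
    for w, h in sorted(set(shapes)):
        if h < best_h:  # sorted by width, so a strictly smaller height means not dominated
            front.append((w, h))
            best_h = h
    return front


def shape_function(node):
    """Bottom-up shape function; node is ("device", [(w, h), ...]) or ("h" | "v", [children])."""
    kind, content = node
    if kind == "device":
        return pareto(content)
    child_shapes = [shape_function(child) for child in content]
    combined = []
    for combo in product(*child_shapes):
        if kind == "h":  # abut side by side
            combined.append((sum(w for w, _ in combo), max(h for _, h in combo)))
        else:            # stack vertically
            combined.append((max(w for w, _ in combo), sum(h for _, h in combo)))
    return pareto(combined)


def pick_shape(shapes, max_height):
    """Top-down choice: the narrowest shape whose height meets the constraint, or None."""
    feasible = [s for s in shapes if s[1] <= max_height]
    return min(feasible) if feasible else None


# hypothetical devices: a folded MOS with 1, 2 or 4 fingers and a capacitor with two layouts
mos = ("device", [(4, 8), (8, 4), (16, 2)])
cap = ("device", [(6, 6), (12, 3)])
tree = ("v", [("h", [mos, mos]), cap])
```

Enumerating every child combination is exponential in the number of children; pruning to the Pareto front at each level keeps the lists short, which is what makes a recursive bottom-up computation practical.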
REFERENCES
[1] Mohamed Dessouky, Marie-Minerve Louërat, and Jacky Porte, "Layout-Oriented Synthesis of High Performance Analog Circuits," Proc. DATE 2000, pp. 53-57, 2000.
[2] M. Dessouky, "Design for Reuse of Analog Circuits," Ph.D. Thesis, University of Paris VI, 2001.
The design time required in VLSI to implement a simple amplifier can be very long compared with the speed at which much more complex digital building blocks can be developed. The problem is the lack of widely available, useful automated tools for analog circuit design. Because of the complexity of the complete MOS transistor model, it is difficult to create useful parametric cell libraries for analog design. The goal of this work is to provide an exact method to calculate transistor sizes while keeping the formulation of the topology simple. In this method, cells are described mathematically with first-order transistor models and simple circuit equations, but a parameter update loop makes the solution accurate by recalculating the first-order model parameters with the complete transistor model. This allows the designer to easily specify different circuits and to create scripts that automatically size the transistors as a function of the given requirements while obtaining exact solutions. The proposed method is based on the convex optimization approach already used in GPCAD [1] and other tools [2], but it incorporates complete SPICE simulations in the iterative calculation of the first-order model parameters, thereby achieving more accurate results. The result is a set of scripts from which an automatically sizable cell library of several basic analog building blocks has been obtained. The development of one of them, the MOS inverter, is explained in detail as an example. The scripts allow the designer to quickly obtain analog circuit prototypes for mixed-signal VLSI systems. A complete operational amplifier, briefly described in the paper, has been developed and actually fabricated with these scripts, demonstrating that the proposed methodology saves much of the time required to design simple circuits and thereby lets the designer concentrate on higher-level tasks such as system specification or resource distribution.
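The parameter update loop can be sketched on a single transistor sized for a target transconductance. Two substitutions are assumptions made for illustration: the closed-form square-law solution stands in for the convex (geometric-programming) formulation, and a square-law model with hypothetical mobility degradation stands in for the complete SPICE model. Each pass sizes with the first-order model, evaluates the "complete" model, and re-extracts the first-order parameter k' until the two agree.

```python
import math

KP = 200e-6   # hypothetical first-order parameter k' = mu*Cox (A/V^2)
THETA = 0.5   # hypothetical mobility-degradation coefficient of the stand-in complete model (1/V)
L = 0.35e-6   # channel length (m)


def first_order_width(gm, i_d, kp):
    """Square-law sizing: gm = sqrt(2 kp (W/L) I_D)  =>  W = gm^2 L / (2 kp I_D)."""
    return gm ** 2 * L / (2 * kp * i_d)


def complete_model_gm(w, i_d):
    """Stand-in for a full SPICE evaluation: square law with mobility degradation (hypothetical)."""
    vov = math.sqrt(2 * i_d * L / (KP * w))
    return math.sqrt(2 * KP * (w / L) * i_d) / (1 + THETA * vov)


def size_with_update_loop(gm_target, i_d, tol=1e-9, max_iter=100):
    kp_eff = KP
    for _ in range(max_iter):
        w = first_order_width(gm_target, i_d, kp_eff)  # solve the simple formulation
        gm = complete_model_gm(w, i_d)                 # check it against the complete model
        if abs(gm - gm_target) <= tol * gm_target:
            return w
        kp_eff = gm ** 2 / (2 * (w / L) * i_d)         # re-extract the first-order parameter
    raise RuntimeError("parameter update loop did not converge")
```

With these stand-in models each pass multiplies the width by (gm_target/gm)², and the iteration contracts quickly because the degradation term is small; a real flow would replace complete_model_gm with a simulator call.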
[1] M. Hershenson, S. Boyd and T. Lee, "Optimal Design of a CMOS Op-Amp via Geometric Programming," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 20, pp. 1-21, January 2001.
[2] P. Mandal and V. Visvanathan, "CMOS Op-Amp Sizing Using a Geometric Programming Formulation," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 20, pp. 22-38, January 2001.
Switched-current (SI) class AB memory cells are known to be high-performance SI cells. However, designing such optimal cells is a tedious process that relies on designer experience. Automating the transistor sizing process is therefore an important step towards the rapid design of high-performance custom cells. In this paper, we present an approach to improve the performance of SI memory cells. It is based on a stochastic exploration of a carefully bounded set of parameter vectors and on a heuristic that optimises an objective function composed of a weighted sum of error and performance functions expressed in normalized units. For this purpose, we built mathematical models of the cell and of the non-idealities affecting it. We applied the proposed procedure to the design of optimal S²I grounded-gate class AB memory cells. Figure 1 illustrates the proposed heuristic. The optimisation program, written in C++, allowed us to reach high performance in terms of accuracy, SNR and speed. The results were confirmed by SPICE and CADENCE simulations. In a 0.35 µm CMOS process, the optimized cell reaches a dynamic range of 80 dB at a 16 MHz sampling frequency. When top priority is given to settling time, the proposed heuristic yields a settling time below 0.5 ns. The optimized cell will be used to design switched-current sigma-delta converters and programmable filters for radio frequency applications.
Figure 1: Proposed heuristic (stochastic exploration and simulation leading to the "optimal" parameters).
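The optimisation idea, random exploration of a bounded parameter space scored by a weighted sum of normalized error and performance terms, can be sketched as follows. The parameter names, bounds, weights and cost models are hypothetical stand-ins for the paper's cell models; only the overall structure is illustrated.

```python
import random

BOUNDS = {"w_n": (1.0, 20.0), "w_p": (2.0, 40.0), "i_bias": (10e-6, 200e-6)}  # hypothetical
WEIGHTS = {"error": 0.5, "settling": 0.3, "power": 0.2}                       # hypothetical


def performance(p):
    """Hypothetical normalized figures (dimensionless, lower is better)."""
    error = abs(p["w_p"] / p["w_n"] - 2.5) / 2.5                    # stand-in accuracy term
    settling = 50e-6 / p["i_bias"] * (p["w_n"] + p["w_p"]) / 30.0   # more current settles faster
    power = p["i_bias"] / 100e-6                                    # more current costs power
    return {"error": error, "settling": settling, "power": power}


def objective(p):
    """Weighted sum of the normalized terms."""
    perf = performance(p)
    return sum(WEIGHTS[k] * perf[k] for k in WEIGHTS)


def random_search(n_trials=5000, seed=0):
    """Uniform stochastic exploration of the bounded space, keeping the best candidate."""
    rng = random.Random(seed)
    best_p, best_cost = None, float("inf")
    for _ in range(n_trials):
        p = {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}
        cost = objective(p)
        if cost < best_cost:
            best_p, best_cost = p, cost
    return best_p, best_cost
```

Raising a weight (for example on settling time) shifts the optimum towards that figure at the expense of the others, which is how a priority such as "settling time first" can be expressed in this kind of objective.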
The Importance of Microwave Approach for High Frequency MOS Analog Designers
Gilles Petit¹, Richard Kielbasa², Vincent Petit³
¹ Service des Mesures, Supélec, Gif-sur-Yvette, France, e-mail: gilles.petit@supelec.fr
² Service des Mesures, Supélec, Gif-sur-Yvette, France, e-mail: richard.kielbasa@supelec.fr
³ Thales Airborne Systems, Élancourt, France, e-mail: vincentf.petit@fr.thalesgroup.com
Today, analog products reach the X band and above because of the growing need for low-cost, high-bandwidth multimedia products. To satisfy these new criteria, on-chip passive components, and especially inductors, are required. Integrated circuit designers therefore face new problems that typically belong to the radio-frequency (RF) area: they have to consider methods that until now were reserved to each frequency domain (low and high) and choose between them. In this paper, we present rapid methods to evaluate the Q-factor of inductors laid out both with the traditional analog approach and with coplanar waveguides (CPW). These methods are mainly derived from the work of S. S. Mohan et al. [1] and E. Yamashita et al. [2] on the computation of L values. We then use the equation $Q = \omega L / R$. Finally, using a particular Silicon-On-Sapphire technology, this paper shows that the classical analog approach of printed square spirals cannot always meet both technical (high quality factors for low inductance values) and industrial (small die area) requirements at a frequency of 10 GHz. CPW, a second solution borrowed from the microwave field, are thus studied and their benefits pointed out. The issues, solutions and limits presented in this paper can easily be extended to any kind of on-chip inductor and microstrip simply by changing values in the given equations. RF analog designers can also use the presented methods to choose quickly, early in the design flow, between microstrips and printed inductors. More precise electromagnetic simulation can then be performed for the final layout.
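A quick Q estimate of the kind described can be sketched with one of the expressions from Mohan et al., the modified Wheeler formula for a square spiral, L = K1·µ0·n²·d_avg/(1 + K2·ρ) with K1 = 2.34 and K2 = 2.75, followed by Q = ωL/R. Using only the DC trace resistance with an estimated trace length of 4·n·d_avg is a simplifying assumption (skin effect and substrate losses are ignored, so Q is overestimated), and the geometry and metal values are hypothetical.

```python
import math

MU0 = 4 * math.pi * 1e-7  # vacuum permeability (H/m)


def square_spiral_inductance(n_turns, d_out, d_in, k1=2.34, k2=2.75):
    """Modified Wheeler expression for a square spiral: L = K1*mu0*n^2*d_avg / (1 + K2*rho)."""
    d_avg = (d_out + d_in) / 2
    rho = (d_out - d_in) / (d_out + d_in)  # fill ratio
    return k1 * MU0 * n_turns ** 2 * d_avg / (1 + k2 * rho)


def dc_series_resistance(n_turns, d_out, d_in, width, thickness, resistivity):
    """Rough trace resistance, assuming a length of about 4*n*d_avg and no skin or substrate loss."""
    length = 4 * n_turns * (d_out + d_in) / 2
    return resistivity * length / (width * thickness)


def quality_factor(freq, inductance, resistance):
    """Q = omega * L / R."""
    return 2 * math.pi * freq * inductance / resistance


# hypothetical 3-turn spiral, 200 um outer / 100 um inner diameter, 10 um x 1 um aluminium trace
ind = square_spiral_inductance(3, 200e-6, 100e-6)
res = dc_series_resistance(3, 200e-6, 100e-6, 10e-6, 1e-6, 2.8e-8)
```

Such closed-form estimates are only meant for early comparisons between layout options; the final choice still calls for electromagnetic simulation, as the abstract notes.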
[1] S. S. Mohan et al., "Simple accurate expressions for planar spiral inductances," IEEE JSSC, pp. 1419-1424, Oct. 1999.
[2] E. Yamashita et al., "Analysis of microstrip-like transmission lines by nonuniform discretization of integral equations," IEEE Trans. MTT-24, pp. 195-200, 1976.
Session 7b
This paper introduces a novel design methodology for compact, high-performance and secure dual-rail primitives widely used in Quasi-Delay-Insensitive (QDI) circuits. The methodology is applied to basic QDI primitives on a 130 nm process. The performance and security properties of the resulting cells are then compared, using electrical simulations, with the implementations proposed in former works. Self-timed circuits appear to be a promising alternative for cryptology, since in the absence of a clock signal it is more difficult to correlate leakage syndromes with the data flowing in a secure design. However, such designs still exhibit significant timing and power consumption variations depending on the applied input data. Moreover, with a standard-cell approach, the QDI physical implementation of the required Boolean functions is suboptimal. With our novel design methodology, QDI circuits exhibit better security properties than in former works. This is explained by the equalization of the number of stages used to implement both rails without adding any extra gate.
Figure: dual-rail QDI primitive implementations (dual-rail inputs A0/A1 and B0/B1, outputs S0/S1).
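Dual-rail primitives rely on 1-of-2 encoding: every bit travels on two rails, and exactly one of them rises per evaluation, so the number of output transitions does not depend on the data. The sketch below checks that property for a generic DIMS-style (delay-insensitive minterm synthesis) AND gate. It is a textbook construction assumed for illustration, not the optimized, stage-balanced cells proposed in the paper, and it models only the combinational function, ignoring the C-elements and the handshake.

```python
# Dual-rail (1-of-2) encoding: (rail0, rail1); (0, 0) is the NULL/spacer state.
NULL = (0, 0)


def encode(bit: int):
    return (0, 1) if bit else (1, 0)


def decode(pair):
    if pair == NULL:
        return None
    if pair not in ((1, 0), (0, 1)):
        raise ValueError("illegal dual-rail code (1, 1)")
    return pair[1]


def dual_rail_and(a, b):
    """DIMS-style dual-rail AND: one minterm per input combination, ORed onto the proper output rail."""
    a0, a1 = a
    b0, b1 = b
    s1 = a1 & b1
    s0 = (a0 & b0) | (a0 & b1) | (a1 & b0)
    return (s0, s1)
```

Because one and only one output rail switches for every valid input, the switching activity, and hence a first-order view of the power signature, is the same for all data values; the paper's contribution addresses the remaining imbalance in the number of stages between the two rails.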
Summary
The efficiency of cell-based synthesis of high-performance circuits strongly depends on the content of the library. Great effort has been spent in library design on the optimal selection of logic gate drive strengths, but few justifications are available for the P/N width ratio of each cell. The relative merits of different cell libraries can be evaluated in terms of the area/power needed to achieve a particular delay for a specific circuit. Significant effort has therefore been devoted to supplying high-performance standard cell libraries, both to define the optimal content of the library and to determine the best selection of drives. A fluid cell approach is emerging, in which a cell generation tool creates a discrete library with 10 to 25 drive strengths and 1 to 4 different P/N transistor width ratios. The question arises whether an optimal value of the gate transistor ratio can be defined that allows a CMOS logic circuit to be implemented with the best delay/power trade-off. Recent work on P/N transistor width selection has been based on an asymmetric implementation of the gate rise and fall delays. Considering that, for an array of inverters, the minimum delay can be obtained using asymmetric edges, a first-order gate delay model has been used to determine an optimal transistor width ratio for each gate. This was obtained by minimizing the average of the rising and falling delays, yielding an optimal solution in which NAND gates are oversized and NOR gates undersized with respect to an inverter. In fact, on a logic path the falling and rising edges must be considered separately. In this case it can be shown that, considering only the critical edge, the fastest solution is obtained for an inverter implementation with balanced fall and rise delays.
Moreover, for gates, great attention must be paid to modeling the current of series transistor arrays and to the critical input to be considered. On a non-critical path, the minimum gate area solution can be obtained with unbalanced edges, resulting from identical N and P transistor sizes. We want to demonstrate here that, on a critical path, satisfying the performance constraint may result in transistor oversizing and extra power consumption if no care is given to balancing the gate rise and fall delays. In this paper we use an extension of the logical effort model to characterize the dissymmetry of gate delay. The delay model is built around logical effort, but with explicit consideration of the input ramp and Miller effects, and it explicitly captures the sensitivity of the delay to the gate structure and the P/N width ratio. We propose a method for determining the P/N transistor width ratio for implementing high-performance library cells. We derive an explicit expression of the P/N width ratio, which is shown to depend on the loading factor and on the gate structure. This P/N width ratio is shown to allow a minimum-area path implementation under a delay constraint. Validations have been obtained against SPICE simulations on a 0.18 µm process by comparing, on different benchmarks, the simulated delays obtained with different P/N width ratio strategies. We obtain clear evidence that imposing equal rise and fall gate delays on a logic path results in a high-performance implementation with the best area-delay trade-off.
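The asymmetric-versus-balanced trade-off can be seen with the textbook first-order model of an inverter driving an identical inverter: the load scales with 1 + β (β is the P/N width ratio), the pull-down strength with 1, and the pull-up strength with β/r, where r is the N/P mobility ratio. Minimizing the average of the rise and fall delays gives β = √r, whereas balancing them gives β = r. This classical model is assumed only for illustration; the paper's extended logical-effort model with input-ramp and Miller effects is not reproduced, and r = 2.5 in the usage below is a hypothetical value.

```python
import math


def inverter_delays(beta: float, r: float):
    """First-order (fall, rise) delays of an inverter driving an identical inverter.

    beta -- PMOS/NMOS width ratio; r -- NMOS/PMOS mobility ratio.
    """
    load = 1.0 + beta       # input capacitance of the next stage plus own drain capacitance
    t_fall = load / 1.0     # NMOS of unit strength
    t_rise = load * r / beta  # PMOS weaker by r, strengthened by its width beta
    return t_fall, t_rise


def scan(r: float, lo=0.5, hi=5.0, step=1e-3):
    """Grid search for the ratio minimizing the average delay and the ratio balancing the edges."""
    betas = [lo + k * step for k in range(int((hi - lo) / step) + 1)]
    best_avg = min(betas, key=lambda b: sum(inverter_delays(b, r)) / 2)
    best_balance = min(betas, key=lambda b: abs(inverter_delays(b, r)[0] - inverter_delays(b, r)[1]))
    return best_avg, best_balance


if __name__ == "__main__":
    avg_opt, bal_opt = scan(2.5)
    print(f"min-average ratio ~ {avg_opt:.3f} (sqrt(r) = {math.sqrt(2.5):.3f}), balanced ratio ~ {bal_opt:.3f}")
```

The average-delay optimum ignores which edge is critical; when only one edge matters along a path, as the abstract argues, the balanced choice avoids making the critical edge the slow one.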
One of the most difficult tasks in designing secure devices is protecting them against so-called side-channel attacks. These attacks exploit the correlation that can exist between the internal computation and the side-channel information leaked by the device. They are of particular concern because they are non-invasive (i.e., they do not compromise the integrity of the device) and can be set up quickly at relatively low cost. Among them, one of the most efficient methods is Differential Power Analysis (DPA), introduced by Paul Kocher in 1998. This attack exploits the dependence between the data processed and the profile of the current consumed by the chip. Several hardware countermeasures have been proposed against this attack, and asynchronous design technology has been presented as a good alternative for reducing the local current signature. In fact, as already reported, asynchronous circuits are likely to improve chip security. The properties of QDI asynchronous circuits are exploited, particularly the use of 1-of-N encoded data and the four-phase handshake protocol. This paper presents the first concrete results of Differential Power Analysis applied to secured Quasi Delay Insensitive (QDI) asynchronous logic. In this work we demonstrate, by measuring and quantifying them on real chips, the benefits brought by QDI asynchronous logic. To do so, three different DES circuits have been designed and fabricated: two in asynchronous technology and one in synchronous technology, used as a reference. The contribution of this paper is focused on two aspects: the definition of a DPA resistance criterion used to compare different designs, and the evaluation of QDI asynchronous logic as a countermeasure to DPA. The concrete results presented in this paper demonstrate that QDI asynchronous logic significantly improves DPA resistance. This study also enabled us to identify some limitations, i.e., residual sources of leakage, that will be addressed in future work.
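DPA, as introduced by Kocher, rests on a simple statistic: split the power traces according to a predicted data bit and subtract the two group averages; a peak reveals where the consumption depends on the data. The Python sketch below applies this difference-of-means test to synthetic traces. The leakage models are assumptions for illustration only: a single-rail wire leaks the bit value, while an idealized 1-of-2 (dual-rail) encoding switches exactly one rail per bit, so its consumption is data-independent. Real QDI circuits are not perfectly balanced, which is the residual leakage the abstract mentions, and this is not the DPA resistance criterion defined in the paper.

```python
import random

def difference_of_means(traces, bits):
    """Per-sample mean of the traces with bit 1 minus mean of the traces with bit 0."""
    n = len(traces[0])
    sums = {0: [0.0] * n, 1: [0.0] * n}
    counts = {0: 0, 1: 0}
    for trace, b in zip(traces, bits):
        counts[b] += 1
        for i, v in enumerate(trace):
            sums[b][i] += v
    return [sums[1][i] / counts[1] - sums[0][i] / counts[0] for i in range(n)]

def simulate(n_traces, dual_rail, leak_at=3, n_samples=8, seed=1):
    """Synthetic traces: Gaussian noise plus a data-dependent term at one sample.

    Single-rail (assumed model): one wire switches only when the bit is 1.
    Idealized dual-rail 1-of-2: exactly one of the two rails switches per bit,
    so the extra consumption is the same whatever the bit value.
    """
    rng = random.Random(seed)
    traces, bits = [], []
    for _ in range(n_traces):
        b = rng.randrange(2)
        trace = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        trace[leak_at] += 1.0 if dual_rail else float(b)
        traces.append(trace)
        bits.append(b)
    return traces, bits

for dual in (False, True):
    traces, bits = simulate(2000, dual)
    diff = difference_of_means(traces, bits)
    peak = max(range(len(diff)), key=lambda i: abs(diff[i]))
    label = "dual-rail" if dual else "single-rail"
    print(f"{label:11s} largest |difference| at sample {peak}: {diff[peak]:+.3f}")
```

The single-rail traces show a clear peak at the leaking sample, while the idealized dual-rail traces show only noise-level differences.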
Four-Phase Alternating Latches Clocking Scheme for CMOS Sequential Circuits
David G. M., Manuel J. B., Jorge J. Ch., Alejandro M., Paulino R. de Clavijo and Enrique O. A., Departamento de Tecnología Electrónica de la Universidad de Sevilla / Instituto de Microelectrónica de Sevilla, Centro Nacional de Microelectrónica, Spain {guerre,bellido,jjchico,amillan,pruiz,ostua}@dte.us.es
The evolution of VLSI digital circuit design makes it mandatory to pay special attention to the clocking scheme used to implement the system and to clock generation and distribution over the whole system. While gate size, and consequently gate delay, keeps shrinking, die size keeps growing. Since the delay of interconnection lines increases quadratically with line length, it becomes longer than the gate delay, and as a result clock skew increases significantly. The authors present a generalization of the Parallel Alternating Latches Clocking Scheme that uses separate clock signals to control the load and output-enable operations of the latches. This increases the operating speed of the system without losing clock-skew tolerance. In addition, because the clock frequency is half the data rate, switching noise and the power consumption of the clock distribution network are reduced.
The advent of SoC systems with several cores running in independent clock domains, based on the Globally Asynchronous Locally Synchronous (GALS) paradigm, makes transferring data across clock domains a very common problem. Several techniques have been used to solve it, but most of them are complex or require expensive elements such as DLLs or memories. In this paper a very simple unit that performs this transfer is presented. The unit has been demonstrated running on a physical FPGA implementation with 100% efficiency.
Asynchronous and synchronous circuits each have their own advantages and drawbacks. A new approach towards a controller architecture that retains some of the strong points of each is presented.
Asynchronous Wave Pipelining (AWP) performs the computation of independent state machines. The next-state and output combinational structure holds several data waves, each representing the computation of the respective state machine. Controllers with Asynchronous Wave Pipelines combine the reduced average latency of Huffman circuits, and their often reduced power consumption, with freedom from the state-coding constraints of asynchronous circuits. Unlike synchronous state machines, AWPs need neither global clock distribution nets nor storage elements. Data paths using wave pipelining require controllers with adequate performance, and an approach towards sequential circuits using Asynchronous Wave Pipelining was proposed. Constraints imposed by a gate structure supporting wave pipelining, and three conclusions on state machines using wave pipelining, have been stated. A solution following those statements, which solves the problem of synchronizing waves carrying either input data or state information, was proposed and tested on an example AWP containing waves of two independent state machines. The use of the hazard-free switching technique SRCMOS, together with the synchronization mechanism used within a sequential circuit, results in a race-free implementation of asynchronous state machines. The latter relies on the fact that a single local anisochronous signal is used to synchronize plesiochronous signals in the Huffman feedback loop.
Proposed structure of switching circuits.
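As a rough guide to how many data waves a combinational block can hold, the textbook constraint for synchronous wave pipelining is that successive waves be separated by at least the spread between the longest and shortest path delays plus a margin for setup, hold and timing uncertainty. The Python sketch below applies it; the delay values are hypothetical and the single lumped margin is a simplification. The AWP scheme described above synchronizes waves with a local anisochronous signal rather than a global clock, so this is only an order-of-magnitude illustration, not the paper's constraint set.

```python
import math

def min_wave_separation(d_max, d_min, t_margin):
    """Smallest safe interval between successive waves in a wave-pipelined block.

    Waves must not overtake each other, so the launch interval has to cover the
    spread between the longest and shortest path delays (d_max - d_min) plus a
    lumped margin for setup/hold and control-signal uncertainty.
    """
    return (d_max - d_min) + t_margin

def waves_in_flight(d_max, separation):
    """Number of waves simultaneously inside the logic when launched every `separation`."""
    return math.ceil(d_max / separation)

d_max, d_min, margin = 10.0, 7.0, 1.0   # ns, hypothetical path delays
t = min_wave_separation(d_max, d_min, margin)
print(f"wave separation {t} ns, {waves_in_flight(d_max, t)} waves in flight, "
      f"versus {d_max + margin} ns per result without wave pipelining")
```

Narrowing the gap between the longest and shortest paths is what allows more waves to coexist, which is why gate structures supporting wave pipelining aim at balanced path delays.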
Session 7c
J.M. Portal
L2MP-Polytech, UMR CNRS 6137, IMT Technopôle de Château-Gombert, 13451 Marseille Cedex 20, France, Jean-Michel.Portal@polytech.univ-mrs.fr, Tel: (33) 491054787
Flash memories have, over the last few years, become a very relevant choice for any application requiring non-volatile semiconductor memory. The objective of this paper is to study the impact of Flash cell geometry on the stored value. To do so, a Design of Experiments (DOE) approach has been used, giving the variations of the threshold voltages of a memory cell as a function of geometric parameters. The inputs of this DOE are transient and static electrical simulations of the Flash cell. The simulation model is based on a MOS Model 9 transistor coupled with the charge neutrality expression of the cell. The outputs of the DOE are a set of equations describing the evolution of the threshold voltage of a virgin, an erased and a written cell. The threshold voltage V_T equation has the following general form, where X_i and X_j are the geometric parameters and b_0, b_i, b_ii, b_ij are the model coefficients:
$$V_T = b_0 + \sum_i b_i X_i + \sum_i b_{ii} X_i^2 + \sum_{i<j} b_{ij} X_i X_j$$
The sensitivity of the threshold voltage to the geometric parameters is then discussed using the results obtained with this set of equations. As illustrated in Figure 1, this kind of equation makes it possible to study the sensitivity while taking into account the interactions between all the parameters.
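Figure 1. Threshold voltages V_T,virgin (V) and V_T,erase (V) as functions of the geometric parameters (axes include the oxide thickness Tox, in Å, and dimensions in µm).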
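Once the DOE coefficients are known, the sensitivity of V_T to each geometric parameter follows by differentiating the quadratic response surface: dV_T/dX_k = b_k + 2 b_kk X_k + the sum over j != k of b_kj X_j, so it depends on the values of the other parameters through the interaction terms. The Python sketch below evaluates the model and this gradient. The coefficient values, and the use of coded (normalized) factors in [-1, 1], are hypothetical; they are not the paper's fitted results.

```python
def vt_model(x, b0, b, bsq, bint):
    """Quadratic response surface
    V_T = b0 + sum_i b[i] x[i] + sum_i bsq[i] x[i]**2 + sum_{i<j} bint[(i, j)] x[i] x[j]."""
    vt = b0
    for i, xi in enumerate(x):
        vt += b[i] * xi + bsq[i] * xi * xi
    for (i, j), bij in bint.items():
        vt += bij * x[i] * x[j]
    return vt

def vt_sensitivity(x, b, bsq, bint):
    """Analytic gradient dV_T/dX_k = b_k + 2 b_kk X_k + sum_j b_kj X_j."""
    grad = [b[k] + 2.0 * bsq[k] * x[k] for k in range(len(x))]
    for (i, j), bij in bint.items():
        grad[i] += bij * x[j]
        grad[j] += bij * x[i]
    return grad

# Hypothetical coefficients (volts) for three coded geometric factors in [-1, 1].
B0 = 2.0
B = [0.30, -0.15, 0.05]
BSQ = [0.02, 0.01, 0.0]
BINT = {(0, 1): 0.04, (1, 2): -0.03}

x = [0.2, -0.5, 0.7]
print("V_T =", round(vt_model(x, B0, B, BSQ, BINT), 4))
print("dV_T/dX =", [round(g, 4) for g in vt_sensitivity(x, B, BSQ, BINT)])
```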
Modeling the Influence of Time Skew on Crosstalk-Induced Delay in Submicron CMOS Technologies
José Luis Rosselló and Jaume Segura, Physics Department, Universitat de les Illes Balears, Palma de Mallorca, Balears (Spain), email: {j.rossello;jaume.segura}@uib.es
The influence of time skew between adjacent signals on crosstalk delay is a complex nonlinear problem without an analytical solution, which can only be solved exactly using numerical procedures. Some previous works on crosstalk delay only take into account the worst case [1], since including a nonzero time skew makes the problem much more complex. Recent works consider the time skew between signals [2], but the gate model used (the traditional resistive model) is too simplistic and no closed-form expression for crosstalk is provided. The model presented in this work relates crosstalk delay to time skew using a charge-based description of the propagation delay of CMOS gates [3]. This charge-based propagation delay model has been found to be a very useful and accurate tool for the timing description of deep-submicron CMOS ICs. Crosstalk delay is modeled by computing the additional charge transferred through the circuit due to the coupling between gates. This additional charge is translated into an increase in the propagation delay (which grows as the charge to be transferred increases). The influence of time skew between the victim and aggressor inputs is translated into a variation of the additional crosstalk-induced charge, which in turn is accurately translated into a delay variation of the gates. The model provides an intuitive description of crosstalk delay and shows very good agreement with HSPICE simulations for a 0.18 µm technology.
[1] P. D. Gross, R. Arunachalam, K. Rajagopal and L. T. Pileggi, "Determination of worst-case aggressor alignment for delay calculation," in Proc. Int. Conf. Computer-Aided Design (ICCAD), 1998, pp. 212-219.
[2] W.-Y. Chen, S. K. Gupta and M. A. Breuer, "Analytical models for crosstalk excitation and propagation in VLSI circuits," IEEE Transactions on Computer-Aided Design, vol. 21, no. 10, pp. 1117-1131, October 2002.
[3] J. L. Rosselló and J. Segura, "An analytical charge-based compact delay model for submicron CMOS inverters," IEEE Transactions on Circuits and Systems I, vol. 51, no. 7, pp. 1301-1311, July 2004.
Due to the decrease in transistor size in CMOS circuits and the increase in line aspect ratio, on-chip interconnect resistance must be accounted for during timing analysis. This requires separating the path delay into gate and interconnect delay. Cells are usually characterized with tables or equations that represent delay and output slew (transition time) as a function of load capacitance and input slew. Interconnect resistance has been handled by determining an effective load capacitance, defined as the capacitance producing the same gate delay as the actual RC load. The most widely used interconnect delay metric is the Elmore delay applied to a lumped interconnect model. Despite its lack of accuracy, this metric has numerous advantages: it can be expressed as a closed-form equation, it is an upper bound on the delay and, above all, it is additive: the delay on a path is the sum of the delays at the different nodes of the path. The accuracy of this model is not sufficient for current processes, since it does not capture the sensitivity of the delay to the transition time of the edge driving the interconnect line, and it ignores the reduction in gate delay induced by the resistive shielding of downstream capacitance. Extending Elmore-based work, several authors have proposed metrics based on higher-order circuit moments. While more accurate, these approaches do not retain the simplicity of the Elmore delay. These metrics, which use multiple moments of the transfer function, completely lose the additive property of the simple Elmore delay. As a result, they cannot be used for optimizations such as buffer insertion or wire sizing. The role of an interconnect line is to propagate a signal from the output of a transmitter (line input driver) to the input of a receiver (line output driver). The delay across an interconnect line is obtained as the sum of the contributions of the input driver, the line and the output driver.
The important parameters for the corresponding path are the total delay between the transmitter input and the receiver output, together with the transition time of the signal at the receiver output. Accurately determining this output transition time is of fundamental importance, because it makes a non-negligible contribution to the delay of the subsequent gate. These parameter values depend on the design of the drivers, their loading and the structure of the wiring. The primary contribution of this work is a complete model of the resistance-capacitance (RC) effect on the delay of an interconnect link between two inverters. Starting from a previously developed deep-submicrometer model, in which the inverter (gate) is considered as a current generator, we have obtained an analytical expression for estimating the line loading effect on the transition time and the delay of the input and output drivers. This results in a simple but accurate closed-form equation for estimating the delay of RC interconnect, while preserving the complexity level and the additive property of the Elmore delay. We establish the limiting conditions for using a purely capacitive or an RC representation of the interconnect wire. Load shielding of the input inverter has been captured through the input drive/line transition time ratio, which can be used as an efficient metric for shielding effects. The fundamental idea behind this model is to define a metric characterizing the resistive shielding of line capacitance. The main difference from other approaches is that the delay and the transition time of each driver, considered as a current generator, are evaluated according to its structure, size and effective load.
Validation against transmission line simulations (ELDO) on a 130 nm process has demonstrated the potential of this model for estimating the impact of RC interconnect on circuit performance. Applications of this analytical model include driver selection and line repeater insertion.
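For reference, the classic Elmore metric that the abstract builds on can be computed on an RC ladder in a single pass: each node's delay is the previous node's delay plus that segment's resistance times all the capacitance downstream of it, which is the additive property discussed above. The Python sketch below (function names are illustrative) also shows that a uniform distributed line has a far-end Elmore delay approaching RC/2, which grows with the square of the line length. It implements only the plain Elmore metric, not the authors' current-generator driver model with resistive shielding.

```python
def elmore_delays(resistances, capacitances, r_driver=0.0, c_load=0.0):
    """Elmore delay at every node of an RC ladder.

    Segment i is a series resistance resistances[i] followed by a grounded
    capacitance capacitances[i]; c_load is added at the far end and r_driver
    models the driving gate as a resistor in front of the line.
    """
    caps = list(capacitances)
    caps[-1] += c_load
    # Capacitance downstream of each resistor (suffix sums).
    downstream, acc = [0.0] * len(caps), 0.0
    for i in reversed(range(len(caps))):
        acc += caps[i]
        downstream[i] = acc
    delays, t = [], r_driver * downstream[0]
    for r_i, c_down in zip(resistances, downstream):
        t += r_i * c_down   # additivity: node delay = previous node delay + R_i * C_downstream
        delays.append(t)
    return delays

def distributed_line_delay(r_per_len, c_per_len, length, segments=1000):
    """Far-end Elmore delay of a uniform line split into equal RC segments."""
    r_seg = r_per_len * length / segments
    c_seg = c_per_len * length / segments
    return elmore_delays([r_seg] * segments, [c_seg] * segments)[-1]

# Far-end delay tends to R*C/2 and quadruples when the length doubles.
print(distributed_line_delay(1.0, 1.0, 1.0), distributed_line_delay(1.0, 1.0, 2.0))
```

With N equal segments the far-end value is RC(N + 1)/(2N), so 1000 segments give 0.5005 RC.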
Abstract: The impact of a substrate with deep trenches on the DC and AC electrical characteristics of HBTs has been studied using physical simulation. A geometry-scalable SPICE model is proposed and compared with physical simulation results. The model is then implemented in a scalable HBT model based on HICUM level 0 (L0). The proposed model is applied to measurements and shows good agreement with the electrical characteristics as a function of the device emitter geometry.
Direct determination of some important MOSFET parameters from experimental measurements involves evaluating auxiliary functions that, in many methods, include derivatives. The numerical processes involved introduce a great deal of noise, which makes these parameters difficult to determine. Noise-reduction filtering allows the derivative plots to be cleaned. We have demonstrated a filtering method and given rules for finding the filter parameters. Results obtained with transistors from submicron technologies show that the method is quite insensitive to the filter parameters. The method has been successfully applied to extract two different key parameters (the threshold voltage, as depicted in Figure 1, and the saturation voltage, as shown in Figure 2) in devices from two different submicron technologies (ES2 0.7 µm and AMS 0.35 µm). The values extracted with this method compare well with those obtained using other methods not based on derivatives. Our procedure combines the advantage of being physically based with the noise immunity of the most recent extraction methods not based on derivatives, without requiring costly computational effort.
[Plot: AMS 0.35 µm technology, T = 300 K, W = 10 µm; G function (V^-1) versus VDS (V) for VGS = 0.99 V, 1.98 V and 2.97 V]
Figure 1. The position of the maximum determines the value of the threshold voltage (VTH). The figure shows unfiltered (crosses) and filtered (circles) second derivative of measured IDS vs VGS data.
Figure 2. The position of the maximum of the depicted function (G function) versus VDS determines the value of the saturation voltage (VDSAT).
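The second-derivative idea behind Figure 1 can be illustrated as follows: in the linear region the threshold voltage is located at the maximum of d²IDS/dVGS², and filtering is what makes that maximum usable on noisy data. In the Python sketch below, the smooth IDS(VGS) curve (a softplus whose second derivative peaks exactly at VT), the noise level and the centred moving-average filter are all assumptions chosen for illustration; the abstract does not specify the authors' filter or its parameter rules.

```python
import math
import random

def second_derivative(y, h):
    """Central second difference; element k corresponds to y[k + 1]."""
    return [(y[i + 1] - 2.0 * y[i] + y[i - 1]) / (h * h) for i in range(1, len(y) - 1)]

def moving_average(x, w):
    """Centred moving average of odd width w; element j corresponds to x[j + w // 2]."""
    half = w // 2
    return [sum(x[j - half:j + half + 1]) / w for j in range(half, len(x) - half)]

def extract_vth(vg, ids, w=11):
    """Threshold voltage as the VGS at the maximum of the filtered d2(IDS)/dVGS2."""
    h = vg[1] - vg[0]
    filtered = moving_average(second_derivative(ids, h), w)
    j = max(range(len(filtered)), key=filtered.__getitem__)
    return vg[j + w // 2 + 1]   # undo the index shifts of both steps

# Synthetic linear-region curve (assumed, arbitrary current units): softplus with its corner at VT, plus noise.
VT, S, H = 0.6, 0.05, 0.01
vg = [i * H for i in range(201)]                     # 0 V to 2 V in 10 mV steps
clean = [S * math.log1p(math.exp((v - VT) / S)) for v in vg]
rng = random.Random(0)
noisy = [c + rng.gauss(0.0, 3e-5) for c in clean]
print("unfiltered:", extract_vth(vg, noisy, w=1), "filtered:", extract_vth(vg, noisy))
```

Averaging the second difference over w points acts like a difference of slopes over a wider baseline, which is why it suppresses the noise that the raw second difference amplifies by 1/h².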
Impact of Deterministic Within-Die Variation on the Circuit Performance in Nanoscale Semiconductor Manufacturing
Munkang Choi, Seyed-Abdollah Aftabjahani, Cheng Jia, and Linda Milor, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia/U.S.A., gtg233c@mail.gatech.edu
As semiconductor technology advances into the nanoscale era and more functional blocks are added to systems on chip (SoC), the interface between circuit design and manufacturing is becoming blurred. An increasing number of features, traditionally ignored by designers, are influencing both circuit performance and yield. As a result, design tools need to incorporate new factors. One important source of circuit performance degradation is deterministic within-die variation, such as that caused by lithography and by chemical-mechanical polishing (CMP) of Cu interconnect. We have therefore established a methodology that accounts for systematic and deterministic variation from the proximity effect, lens aberrations, flare and CMP in circuit delay analysis, and that can reduce lithographic correction work. Our methodology labels each cell name with information on lens aberrations and flare. Neighborhood information relating to the proximity effect is reflected in the HSPICE files of cells by tags on the transistor names. Cell transistor netlists with the modified gate poly geometries are combined with interconnect RC netlists with modified metal geometries to generate revised cell HSPICE netlists, which are used to determine revised delay tables. The tool has been applied to analyze delays of ISCAS85 benchmark circuits in the presence of imperfect lithography and CMP variation. First, the relationship among speed, leakage current, the minimum CD and lithography imperfections was extracted. Once leakage current is controlled, delay is affected by lens aberrations and flare, which are less likely to average out, but not by the other effects. Second, interconnect systematic variation is revealed to have a much smaller impact on delay than gate poly variation, since variations in interconnect resistance and capacitance partially cancel each other. Third, delay variation caused by the underlayer effect is found to be comparable to that caused by the same-layer effect.
Session 7d
Linear regulators are mandatory in the power management of systems-on-chip for biomedical applications. The most commonly used biomedical signals, such as the EEG (electroencephalogram), ECG (electrocardiogram) or EMG (electromyogram), have amplitudes in the µV range and require analog front-ends that are incompatible with the high noise and ripple levels of typical switching power supplies. This paper presents a 0.8 V capacitor-free CMOS low-dropout regulator (LDO) for biomedical systems. Such low supply voltage operation is obtained by using a novel voltage reference, avoiding transistor stacking and keeping the circuit as simple as possible (see Fig. 1). This strategy allows low static current consumption, easy frequency compensation (as only two gain stages are present), reduced silicon area and no extra pins, since no external component is required for correct operation (the compensation capacitors have small values). The voltage reference is based on [1], but the use of a low-voltage current mirror results in a 0.2 V reduction in the minimum supply voltage. The proposed LDO has been implemented in a commercial 0.5 µm, three-metal, double-poly CMOS technology, occupying 0.127 mm². The maximum supply current is 7.5 µA, and the line/load regulation is better than 1% for an output current of 5 mA.
Figure 1. Schematic of the proposed LDO voltage regulator (Cc1 and Cc2 are integrated compensation capacitors).
[1] E. Vittoz and J. Fellrath, "CMOS analog integrated circuits based on weak inversion operation," IEEE J. Solid-State Circuits, vol. 12, pp. 224-231, June 1977.
We present a new configuration of CMOS subthreshold operational transconductance amplifier (OTA) for low-power, low-voltage and low-frequency applications. To overcome the narrow linear range caused by subthreshold operation (less than 15 mV for a 1% transconductance variation in a conventional differential pair), we employ floating-gate MOS (FGMOS) transistors [1], which allow low-voltage operation and provide a wider linear range with a lower gm. Furthermore, we implement a novel linearization technique based on cancellation of the cubic distortion term, which further extends the linear range. This technique can also be applied to a conventional MOS pair, but the use of FGMOS devices facilitates its implementation. The proposed linearized OTA with subthreshold FGMOS transistors is designed in a 0.8 µm CMOS process (AMS). For demonstration, a monolithic Gm-C, continuous-time, low-pass, second-order filter is built using the proposed linearized OTA. Simulation results of the filter show 76 dB linearity for a fully balanced input dynamic range of up to 1 Vpp at a 1.5 V supply voltage. For a tuning range between 10 and 100 Hz, the power consumption of the filter remains below 2 µW. Second-order effects, such as that of the FGMOS parasitic capacitance, are analyzed.
Figure. Schematic of the proposed linearized subthreshold FGMOS OTA (transistors M0-M9 and M'6-M'9, capacitors C1, C1/2 and C2, bias current Ib, inputs V+ and V-, output Iout).
[1] P. R. Gray, P. J. Hurst, S. H. Lewis and R. G. Meyer, Analysis and Design of Analog Integrated Circuits, 4th ed. New York: Wiley, 2001.
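The narrow linear range quoted in the abstract can be estimated from the standard subthreshold differential-pair model, in which the output current is proportional to tanh(κVd/(2UT)) and the normalized transconductance is sech²(κVd/(2UT)). The Python sketch below solves for the differential input at which gm has dropped by 1%. With the assumed values κ = 0.7 and UT = 25.85 mV it gives about ±7.4 mV, i.e. roughly 15 mV peak to peak, consistent with the figure in the abstract. The `attenuation` parameter is a hypothetical stand-in for the capacitive divider of an FGMOS input: it widens the range by 1/attenuation while reducing gm by the same factor. The cubic-term cancellation of the proposed OTA is not modeled.

```python
import math

def linear_range_half_width(tol=0.01, kappa=0.7, ut=0.02585, attenuation=1.0):
    """Differential input (V) at which a subthreshold pair's gm has fallen by `tol`.

    Normalized gm = sech^2(x) with x = kappa * attenuation * vd / (2 * ut).
    Solving sech^2(x) = 1 - tol gives cosh(x) = 1 / sqrt(1 - tol).
    `attenuation` is a hypothetical FGMOS capacitive-divider factor (1.0 = plain MOS pair).
    """
    x = math.acosh(1.0 / math.sqrt(1.0 - tol))
    return 2.0 * ut * x / (kappa * attenuation)

plain = linear_range_half_width()
fg = linear_range_half_width(attenuation=0.1)
print(f"plain pair: +/-{plain * 1e3:.2f} mV ({2 * plain * 1e3:.1f} mV peak to peak)")
print(f"FGMOS-like input, attenuation 0.1: +/-{fg * 1e3:.1f} mV, with gm 10 times lower")
```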
A Low Voltage I/O Interface for High Speed Buses in GaAs Technology
R. Esper-Chaín, F. Tobajas, R. Sarmiento, Instituto de Microelectrónica Aplicada, Universidad de Las Palmas de Gran Canaria, Campus Universitario de Tafira, 35017 Las Palmas, Spain, {esper,tobajas,roberto}@iuma.ulpgc.es
In intra-system interfaces, where both latency and bandwidth are important, high-speed buses have been adopted as the most effective solution. In this paper, low-voltage I/O circuits for high-speed buses in GaAs technology are presented. Operation above 600 Mbps has been demonstrated by experimental measurements performed on different test chips.
New Low-Voltage Class-AB CMOS Unity-Gain Buffer and Current Mirror
A. Torralba, R. G. Carvajal, M. Jiménez, F. Muñoz, J. Ramírez-Angulo
Class-AB circuits, which are able to handle currents orders of magnitude larger than their quiescent current, are good candidates for low-power analog design. This paper presents a new, simple, low-voltage class-AB unity-gain buffer based on the Flipped Voltage Follower cell. This buffer can be used in many applications, and a new low-voltage class-AB current mirror based on the proposed buffer is also presented. Simulation and experimental results are provided.
Milind-Subhash Sawant1, Shanta Thoutam1,4, Jaime Ramírez-Angulo1, Antonio J. López-Martín2 and Ramon G. Carvajal3; 1 Klipsch School of Electrical and Computer Eng., New Mexico State University, Las Cruces, NM; 2 Dept. of Electrical and Electronic Engineering, Public University of Navarra, Pamplona (Spain); 3 Escuela Superior de Ingenieros, Universidad de Sevilla (Spain); 4 Freescale Semiconductor Inc. (Motorola), Austin, TX
New Low-Voltage Class AB/AB CMOS Op-Amp with Rail-to-Rail Input/Output Swing
A new low-voltage CMOS class AB/AB fully differential op-amp with rail-to-rail input/output swing and a supply voltage of less than two VGS drops is presented. The scheme is based on combining floating-gate transistors with class AB input and output stages. The op-amp is characterized by very low static power consumption and enhanced slew rate. Moreover, the proposed op-amp does not suffer from the typical reliability problems related to initial charge trapped in the floating-gate devices. Simulation and experimental results in 0.5 µm CMOS technology verify the operation of the scheme with a 1.8 V single supply and close to rail-to-rail input and output swing.
Amplifiers (OTAs) is described. It is based on the combination of adaptive biasing techniques and
resistive local common-mode feedback (LCMFB), which provides increased dynamic current boosting and gain-bandwidth product (GBW). Various adaptive biasing schemes are combined with LCMFB, leading to the different class-AB OTA topologies shown in Figure 1. A 0.5 µm CMOS implementation of three different OTAs based on this technique shows slew-rate and GBW enhancement factors of up to 280 and 3.6, respectively, for an 80 pF load, compared to a conventional class-A OTA with the same quiescent currents and supply voltage, with little overhead in silicon area, noise and static power consumption. The circuits can find application in low-voltage, low-power switched-capacitor circuits and in buffers for testing mixed-signal circuits.
Figure 1. Class-AB OTA topologies (a)-(d) combining adaptive biasing with resistive LCMFB (transistors M1-M18, resistors R1-R4, bias currents IBIAS, inputs VIN+ and VIN-, output VOUT).
Session 8a
Continuous-time ΣΔ modulator with exponential feedback for reduced jitter sensitivity
J. San Pablo, D. Bisbal, L. Quintanilla, J. Arias, L. Enríquez, J. Barbolla, Department of E. y Electrónica, E.T.S.I. Telecomunicación, Campus Miguel Delibes, Universidad de Valladolid, 47011 Valladolid, Spain, jacinto@ele.uva.es
A current-mode continuous-time Sigma-Delta modulator with reduced jitter sensitivity has been analysed and designed in a 0.35 µm CMOS technology. The complete modulator has been implemented following a current-mode approach, and its characteristics include integrator blocks with inherently low input impedance without the need for feedback, good stability and no need for a common-mode circuit. To our knowledge, there is no previous report in the literature of CT ΣΔ modulators implemented completely in current mode. Timing error associated with clock jitter introduces noise into the in-band spectrum. The tolerable level of clock jitter decreases as the oversampling ratio increases, and eventually the jitter noise power exceeds the quantization noise power. In addition, clock jitter affects Return-to-Zero/Half-Return-to-Zero modulators, which show reduced sensitivity to loop delay and are therefore used to implement practical continuous-time Sigma-Delta modulators, more severely than modulators employing Non-Return-to-Zero feedback. For high-speed and/or high-accuracy converters, this represents a serious challenge to chip designers. Improved jitter rejection was achieved by using an exponentially decaying feedback current, generated by a switched-capacitor-based DAC (Fig. 1). Simulations show a substantial improvement in jitter rejection with respect to the conventional rectangular-feedback DAC. Functional and transistor-level simulations have been carried out and the corresponding results are presented. A dynamic range of 67 dB (a resolution of 10.8 bits) has been achieved for a second-order modulator with an oversampling ratio of 64 and a 1 MHz bandwidth (sampling frequency of 128 MHz). In addition to distortion, quantization noise, clock jitter, thermal noise and flicker noise have also been considered. The modulator consumes 2.9 mW at a supply voltage of 2.5 V.
Fig. 1. HRZ exponential DAC. Differential exponential feedback waveforms have been included.
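A back-of-envelope view of why an exponentially decaying feedback pulse is less jitter sensitive: a timing error dt at the edge where a pulse is cut off changes the delivered charge by the current flowing at that instant times dt. A rectangular pulse carries its full height at that edge, whereas an exponential pulse I0·exp(-t/τ) has decayed to I0·exp(-Tp/τ). For equal charge and equal window Tp, the ratio of the two charge errors reduces to x/(e^x - 1) with x = Tp/τ. The Python sketch below evaluates this ratio (only the cut-off edge is modeled, and the Tp/τ values are illustrative) and also checks the 10.8-bit figure quoted for a 67 dB dynamic range using the usual DR = 6.02·N + 1.76 dB relation.

```python
import math

def edge_charge_error_ratio(tp_over_tau):
    """Charge error per unit timing error at the cut-off edge of an exponential
    feedback pulse, relative to a rectangular pulse of equal charge and width.

    Rectangular pulse: height Q/Tp, so a displaced edge changes the charge by (Q/Tp)*dt.
    Exponential pulse I0*exp(-t/tau) on [0, Tp]: Q = I0*tau*(1 - exp(-Tp/tau)) and the
    current at the cut-off edge is I0*exp(-Tp/tau). The ratio is x/(e**x - 1), x = Tp/tau.
    """
    x = tp_over_tau
    return x / math.expm1(x)

def enob_from_dynamic_range(dr_db):
    """Equivalent resolution from a dynamic range, using DR = 6.02*N + 1.76 dB."""
    return (dr_db - 1.76) / 6.02

for x in (0.01, 1.0, 3.0, 5.0):
    print(f"Tp/tau = {x:4}: edge charge error is {edge_charge_error_ratio(x):.4f} of the rectangular case")
print(f"67 dB dynamic range -> {enob_from_dynamic_range(67):.1f} bits")
```

For Tp/τ = 5 the cut-off-edge charge error is about 3% of that of a rectangular pulse, while a very slow decay (Tp/τ close to 0) recovers the rectangular case.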
Institut de Microelectrònica de Barcelona-CNM, Spain; Barcelona Branch Office, Epson Europe Electronics GmbH
This paper presents a detailed study of the influence of the return-to-zero (RTZ) feedback code in continuous-time modulators, including a high-level model for simulation. The results derived from the analytical system analysis can be applied to design continuous-time modulators with an arbitrary feedback waveform, while the behavioural model described is a useful tool for speeding up high-level simulations when an RTZ code is used, increasing simulation speed by about a factor of ten. Simulation results on a fourth-order, log-domain modulator are presented to demonstrate the validity of the proposed models (Fig. 1).
Fig. 1. PSD obtained from the Matlab model with RTZ feedback code simulation (left) and with the proposed equivalent model (right).
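When the feedback pulse shape changes, the discrete-time equivalent of a continuous-time loop changes too, which is why results valid for arbitrary feedback waveforms are useful. The Python sketch below is a textbook impulse-invariance calculation, assuming ideal integrators 1/s^k and a DAC pulse confined to one sampling period (time normalized to Ts); it computes the sampled response of the loop filter to a single pulse by midpoint-rule integration, so any waveform can be plugged in. A half-period RZ pulse of double height delivers the same charge as an NRZ pulse, so a single integrator sees the same sample, but a double integrator does not (0.75 versus 0.5 at the next sample), and the loop coefficients must be re-derived. This is not the authors' behavioural model.

```python
import math

def sampled_pulse_response(pulse, order, n, steps=20000):
    """Output at t = n (n >= 1, time in units of Ts) of an ideal integrator chain 1/s**order
    driven by a single DAC pulse h(t) that is zero outside [0, 1).

    Uses the impulse response t**(order - 1) / (order - 1)! and midpoint-rule integration.
    """
    dt = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += pulse(t) * (n - t) ** (order - 1) / math.factorial(order - 1) * dt
    return total

def nrz(t):
    return 1.0                        # full-period pulse, unit charge

def rz(t):
    return 2.0 if t < 0.5 else 0.0    # half-period pulse, same unit charge

for order in (1, 2):
    print(order, round(sampled_pulse_response(nrz, order, 1), 6),
          round(sampled_pulse_response(rz, order, 1), 6))
```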
This paper evaluates two techniques for improving the linearity of the main feedback D/A converter in multi-bit continuous-time sigma-delta modulators (CTSDM). A Self-Calibrated Current Steering (SCCS) implementation of the D/A converter is compared with the use of a Data Weighted Averaging (DWA) algorithm for selecting uncalibrated D/A elements [1]. A test chip including the two solutions is presented and measurement results are compared. The test chip implements a 4th-order 4-bit CTSDM in 0.13 µm technology, with an analog bandwidth of 15 MHz and an OSR of 10. The circuit implementation of the DWA algorithm is composed of a shifting block and a pointer calculator. The SCCS D/A uses continuous dynamic background calibration of the current sources [2]. Fig. 1 shows the measured output power spectrum of the modulator using the SCCS D/A and the DWA D/A. The SCCS D/A is shown to achieve a better SNR: the DWA introduces some excess loop delay, which reduces the achievable dynamic range, and the latency of the shifting block is the main contributor to this added delay.
Figure 1. Measured FFT of the two modulators.
[1] S. R. Norsworthy, R. Schreier and G. C. Temes, Delta-Sigma Data Converters: Theory, Design and Simulation, Wiley-IEEE Press, 1996.
[2] D. W. J. Groeneveld, H. J. Schouwenaars, H. A. H. Termeer and C. A. A. Bastiaansen, "A self-calibration technique for monolithic high-resolution D/A converters," IEEE J. Solid-State Circuits, vol. 24, Dec. 1989.
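The DWA principle can be sketched in a few lines of Python: each code switches on that many unit elements starting at a rotating pointer, and the pointer then advances by the code modulo the number of elements, so every element is used almost equally often and element mismatch is first-order noise-shaped. Mapping the rotation to the "shifting block" and the modulo accumulator to the "pointer calculator" is our interpretation, and 15 unit elements for a 4-bit quantizer is an assumption; the sketch says nothing about the hardware latency that the measurements identify as the source of excess loop delay.

```python
def dwa_select(codes, n_elements):
    """Data Weighted Averaging: for each input code, switch on `code` unit elements
    starting at a rotating pointer, then advance the pointer by `code`."""
    pointer = 0
    selections = []
    for code in codes:
        if not 0 <= code <= n_elements:
            raise ValueError("code out of range")
        selections.append([(pointer + k) % n_elements for k in range(code)])
        pointer = (pointer + code) % n_elements   # modulo accumulator ("pointer calculator")
    return selections

print(dwa_select([3, 5, 9], 15))
```

Because consecutive selections wrap around the element array, after any sequence of codes the usage counts of the elements differ by at most one.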
Discrete Invariant Set Algorithm for Sigma Delta Modulators Dynamics Analysis
D. Camarero de la Rosa, VT. Nguyen, J.F. Naviner, P. Loumeau ENST Paris, 46, rue Barrault, 75634 Paris, Cedex 13, France. Email : camarero@enst.fr
This paper presents a new tool for studying ΣΔ (sigma-delta) modulator dynamics. By dynamics we mean the behavior of the integrator states that constitute these modulators. The invariant set concept has already been used to prove stability [1]. The algorithm presented here also uses the invariant set concept, but exploits it differently in order to predict state bounds as precisely as desired. The inputs studied are constant inputs with additive noise of finite maximal amplitude. The main idea is to approximate the input signal rather than the modulator: an input of real numbers with limited swing is approximated by a finite set of rational numbers spaced by a fixed distance. Under such an input, the only possible states are confined to a rational grid, provided the integrator gains are rational. This property is exploited, as described in the paper, to find an invariant set of states associated with the approximated input. As the spacing tends to 0, the approximated input becomes closer and closer to the real-valued input. The observed state bounds also appear to converge to a limit value: the state bounds actually associated with the desired input. The predicted bounds obtained in this way appear to be more accurate than others reported in the literature.
[1] R. Schreier, M. Goodson and B. Zhang, "An algorithm for computing convex positively invariant sets for delta-sigma modulators," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 44, no. 1, January 1997.
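The grid-and-closure idea can be illustrated on the simplest case. The Python sketch below assumes a first-order, single-bit modulator x[n+1] = x[n] + u[n] - v[n], with v[n] = +1 when x[n] >= 0 and -1 otherwise (an assumed convention), and an input drawn from a finite set of rationals spaced 1/16 apart (a constant 1/4 plus noise of amplitude up to 1/8; values chosen for illustration). Exact `Fraction` arithmetic keeps every state on the rational grid; a breadth-first search collects all reachable states, and the resulting set is invariant by construction. It is a toy version of the idea, not the paper's algorithm, which targets general modulators.

```python
from collections import deque
from fractions import Fraction

def reachable_states(inputs, x0=Fraction(0), max_states=100_000):
    """All states reachable by x[n+1] = x[n] + u[n] - v[n], v[n] = +1 if x[n] >= 0 else -1,
    when every u[n] is drawn from the finite set `inputs` of rationals.

    The returned set is closed under the dynamics for every allowed input,
    i.e. it is an invariant set containing x0.
    """
    seen = {x0}
    queue = deque([x0])
    while queue:
        x = queue.popleft()
        v = 1 if x >= 0 else -1
        for u in inputs:
            nxt = x + u - v
            if nxt not in seen:
                if len(seen) >= max_states:
                    raise RuntimeError("state set did not close within max_states")
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Constant input 1/4 plus bounded noise |e| <= 1/8, sampled on a 1/16 grid (illustrative values).
inputs = [Fraction(k, 16) for k in range(2, 7)]
states = reachable_states(inputs)
print(len(states), "states; bounds", min(states), "to", max(states))
```

Refining the grid (a smaller spacing between allowed input values) and re-running the search gives the sequence of bounds whose limit the abstract describes.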
Session 8b
Digital Test
Friday Nov. 26, 10h30-11h30, St Emilion Room. Chairs: Mar Martinez (U. de Cantabria), José Silva Matos (INESC-Porto)
This article proposes a technique for optimizing test patterns targeting Analogue and Mixed-Signal (AMS) cores in System-on-Chip (SoC) devices. To make the test of these cores compatible with the test of digital cores and with the use of low-cost testers, the analogue test patterns are digitally coded. They can then be scanned into the chip, where they are easily converted into the required analogue test patterns. A Computer-Aided Test (CAT) tool is used to optimize the digital coding of the desired analogue test signal. Several multi-objective optimization algorithms are considered for this task, including Monte Carlo, W.A.R.G.A. (Weighted Average Ranking Genetic Algorithm) and N.S.G.A. (Non-dominated Sorting Genetic Algorithm). The results obtained with these algorithms are illustrated and compared using different analogue test signals.
[Figure: plot with axes ADev and SFDR]
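The ranking step that gives N.S.G.A. its name can be sketched as a generic Pareto-front sort. This is not the authors' CAT tool: all objectives are assumed to be minimized, and the example objective pairs (an error measure and a negated SFDR) and the helper names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(points):
    """Split objective vectors into successive Pareto fronts (lists of indices).

    Illustrative helper: front 0 holds the non-dominated solutions, front 1
    those dominated only by front 0, and so on. NSGA-style algorithms use
    this rank as selection pressure.
    """
    dominated_by = [0] * len(points)       # how many solutions dominate i
    dominates_list = [[] for _ in points]  # solutions that i dominates
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if i == j:
                continue
            if dominates(p, q):
                dominates_list[i].append(j)
            elif dominates(q, p):
                dominated_by[i] += 1
    fronts = [[i for i, n in enumerate(dominated_by) if n == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominates_list[i]:
                dominated_by[j] -= 1
                if dominated_by[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]  # drop the final empty front


# Hypothetical candidate codings scored as (error, -SFDR in dB).
print(non_dominated_fronts([(1, -60), (2, -70), (2, -60), (3, -50)]))
```

A weighted-average ranking such as W.A.R.G.A. would instead collapse the objectives into one scalar; the Pareto sort keeps the trade-off visible.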
Improving the Efficiency of Arithmetic BIST by Combining Targeted and General Purpose Patterns
S. Manich, L. García, L. Balado, J. Rius, R. Rodríguez, J. Figueras. Universitat Politècnica de Catalunya, Diagonal 647, P9, 08028 Barcelona, SPAIN. manich@eel.upc.es
ARITHMETIC additive test pattern generators (AdTPGs) are now being proposed as an alternative to linear feedback shift registers (LFSRs) because of their lower area overhead. AdTPGs allow existing internal datapaths (Figure 1) to be reused to excite and observe potential faults in the circuit without an area penalty. As with LFSRs, the compactness of the information stored in memory and the test generation time needed to achieve a specified fault coverage (FC) have a significant impact on the required resources, and thus on the quality of the test. In this paper, a new approach to test preparation is proposed. It has been observed that combining two different and independent preparation methodologies, TPP (Target Purpose Pattern) and GPP (General Purpose Pattern), yields a better test than applying either of them separately. Each strategy has its own advantages and targets a different type of fault: TPP is suited to random-pattern-resistant (rpr) faults, while GPP is suited to non-rpr faults and generates patterns with a low memory impact. The paper shows how to combine these strategies to reinforce the efficiency of the complete test. ISCAS benchmark circuits have been selected to compare the results with other existing methodologies. Experiments show that the combined strategy, named LUCSAM+, improves on previously published results: for the same fault coverage, memory is reduced by 27% and the number of test vectors by 44% on average.
[Figure 1 diagram blocks: DATAPATH, Triplet k, Increment, Adder, Accumulator, Test Vectors]
Figure 1. Datapath structures in the IC are reused to implement AdTPGs. The area overhead impact is reduced.
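A minimal sketch of how an accumulator-based AdTPG expands a compact stored description into test vectors. It assumes that each stored triplet is (initial value, increment, number of vectors) applied to an n-bit adder/accumulator that wraps around; the abstract does not define the triplet fields, and the function names are hypothetical.

```python
def adtpg_vectors(seed, increment, count, width):
    """Generate `count` vectors from an n-bit accumulator: A <- (A + increment) mod 2**width.

    Hypothetical helper; the (seed, increment, count) triplet layout is an assumption.
    """
    mask = (1 << width) - 1
    acc = seed & mask
    vectors = []
    for _ in range(count):
        vectors.append(acc)
        acc = (acc + increment) & mask  # the reused datapath adder wraps around
    return vectors


def expand_triplets(triplets, width):
    """Concatenate the sequences of several stored triplets (memory holds only the triplets)."""
    out = []
    for seed, inc, count in triplets:
        out.extend(adtpg_vectors(seed, inc, count, width))
    return out


# 4-bit example: three stored numbers describe five test vectors.
print([format(v, "04b") for v in adtpg_vectors(seed=3, increment=7, count=5, width=4)])
```

Storing a triplet instead of the vectors it produces is what keeps the memory cost low; the trade-off is between the number of stored triplets and the number of vectors needed to reach the target fault coverage.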
Automatic Verification of RT-Level Microprocessor Cores Using Behavioral Specifications: a Case Study
E. Sanchez*, M. Sonza Reorda*, G. Squillero*, R. Velazco. *Politecnico di Torino, Dip. Automatica e Informatica, Torino, Italy, {edgar.sanchez, matteo.sonzareorda, giovanni.squillero}@polito.it. TIMA-CMP Laboratory, Grenoble, France, Raoul.Velazco@imag.fr
The massive diffusion of systems-on-a-chip (SoCs) and custom cores makes processor obsolescence an increasingly complex problem for industrial users of embedded computers. Obsolescence of electronic components affects many safety-critical applications that remain in service for years longer than originally anticipated. Current industrial practice is to replace some of the obsolete parts with more modern ones, or to redesign small portions of the SoCs or of the application-specific integrated circuits (ASICs) to meet the new requirements. Sometimes the obsolete processor is reimplemented in programmable logic devices such as FPGAs. However, verifying that the new version is equivalent to the original hardware is a harder task. The paper tackles this problem by proposing an almost fully automatic methodology to check the correctness of a customized microprocessor core against a reference model. A reference model can almost always be assumed to be available: it could be a high-level instruction set simulator (ISS), the FPGA implementation of the original core, or even the original design. The reference model is used as a black box to compare behavior, while the process is driven by knowledge of the customized version alone; the internal description of the original core is not exploited.
[Figure: verification loop linking GP (test program generation, fitness), the RT-Level Simulator, the Behavioral Simulator and a behavior Comparator]
The proposed methodology exploits an evolutionary technique called Genetic Programming (GP) to automatically generate assembly programs. These test programs are used twice: they are simulated both by a standard RT-level simulator and by an external tool acting as the reference model. The expected behavior is then compared with the behavior extracted from the RT-level simulator. The reference model has very few requirements: it does not even need to be cycle-accurate, and it is used as a mere black box. The approach was tested on a customized version of a 68HC11 from which a limited number of unnecessary functionalities were removed. The Motorola 68HC11 is a striking example of a small microprocessor core exploited in a wide range of custom applications, and several companies have modified the original design, implementing it in FPGAs to add specific features or remove unneeded ones. The proposed methodology was able to devise a compact set of 24 short test programs that uncovered 4 potential problems.
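The compare-against-a-black-box idea can be sketched in a few lines. This is not the authors' GP engine: plain random program generation stands in for the evolutionary search (a GP version would bias generation with a fitness value), and the three-instruction accumulator ISA, the injected discrepancy in `customized_core` and all names are hypothetical.

```python
import random

OPCODES = ("LDI", "ADD", "SHR")  # hypothetical 8-bit accumulator ISA


def reference_model(program):
    """Black-box reference (e.g. an ISS): only the final accumulator value is observed."""
    acc = 0
    for op, imm in program:
        if op == "LDI":
            acc = imm
        elif op == "ADD":
            acc = (acc + imm) & 0xFF  # wraps modulo 256
        elif op == "SHR":
            acc >>= 1
    return acc


def customized_core(program):
    """Hypothetical customized core whose ADD saturates instead of wrapping (injected discrepancy)."""
    acc = 0
    for op, imm in program:
        if op == "LDI":
            acc = imm
        elif op == "ADD":
            acc = min(acc + imm, 0xFF)
        elif op == "SHR":
            acc >>= 1
    return acc


def find_mismatch(trials=1000, length=6, seed=1):
    """Random programs stand in for GP; return the first program whose behaviors differ."""
    rng = random.Random(seed)
    for _ in range(trials):
        program = [(rng.choice(OPCODES), rng.randrange(256)) for _ in range(length)]
        if reference_model(program) != customized_core(program):
            return program
    return None


print(find_mismatch())
```

Only observable behavior is compared, which is why the reference model does not need to be cycle-accurate.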
Solving the State Justification Problem using MILP for RTL Specifications
H. Navarro, Juan A. Montiel-Nelson, J. Sosa & José C. García. IUMA, Institute for Applied Microelectronics, Integrated Systems Design Division, Department of Electronic Engineering and Automation, University of Las Palmas de Gran Canaria, Las Palmas, E-35017, Spain. {hnavarro, montiel, jsosa, jcgarcia}@iuma.ulpgc.es
THE satisfiability problem (SAT) for RTL descriptions has many direct applications in the electronic design automation (EDA) arena. Most industrial hardware verification tools use bit-level decision procedures, such as SAT or BDD-based techniques. Unfortunately, these approaches are not efficient enough, because they do not preserve the word-level information of the RTL design. The most recent approaches to the SAT problem [1] target RTL designs containing instances of word-level arithmetic blocks and bit-level Boolean logic. They transform the whole SAT problem into a mixed integer linear program (MILP) that must be solved externally by a MILP solver. A complete solution of the SAT problem for RTL descriptions must also support both word-level operators in the data flow and finite state machines (FSMs) in the control flow. Since the output of an FSM depends on its input values and its current state, satisfying an output condition of an FSM involves solving a state justification problem, i.e. finding a suitable sequence of states and input values. In such problems, there exists an optimal state sequence that requires a minimum number of clock cycles. This paper presents a new approach that automatically finds, in a single step, the optimum input sequence to apply to a given RTL description to reach a desired state. This is accomplished by a novel time-frame expansion method for state justification that guarantees an optimized solution and avoids performing time-frame expansions iteratively. Experimental results demonstrate that the proposed methodology can solve any state justification problem in one step for complex FSMs. For this purpose, a basic processor with a reduced set of arithmetic and logical instructions was selected as a generic FSM. The state justification problem then consists of choosing the best sequence of instructions to set a desired value in the accumulator, starting from an initial value of zero.
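For intuition, a minimum-length state justification can be found on a small example by an explicit breadth-first search over accumulator values, which corresponds to expanding one time frame per BFS level. This swaps in BFS for the paper's MILP formulation, and the two-instruction processor (`INC`, `SHL`), the 8-bit width and the function name are hypothetical.

```python
from collections import deque

# Hypothetical instruction set of a tiny 8-bit accumulator machine.
INSTRUCTIONS = {
    "INC": lambda acc: (acc + 1) & 0xFF,
    "SHL": lambda acc: (acc << 1) & 0xFF,
}


def justify(target, initial=0):
    """Return a shortest instruction sequence driving the accumulator from `initial` to `target`."""
    parent = {initial: None}  # state -> (previous state, instruction)
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if state == target:
            seq = []
            while parent[state] is not None:
                state, instr = parent[state]
                seq.append(instr)
            return seq[::-1]
        for name, step in INSTRUCTIONS.items():
            nxt = step(state)
            if nxt not in parent:
                parent[nxt] = (state, name)
                queue.append(nxt)
    return None  # target unreachable


print(justify(6))
```

Explicit enumeration only works for tiny state spaces; a word-level MILP encoding avoids listing states one by one.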
[1] Z. Zeng, P. Kalla and M. Ciesielski, "LPSAT: A Unified Approach to RTL Satisfiability," Proc. DATE'01, pp. 398-402, 2001.
Session 8c
IP-based Design
Friday Nov. 26, 10h30-11h30, Auditorium. Chairs: Patrick Garda (U. Paris 6), Juan Carlos Lopez (U. de Castilla-La Mancha)
implemented over FPGAs are becoming a problem with the expansion of this design strategy. In this paper, a new procedure is presented for hiding a digital signature to provide Intellectual Property Protection (IPP) of circuits based on the Residue Number System (RNS) and implemented on FPL devices. The aim is to protect the author's rights in the development and distribution of reusable modules by means of an electronic signature embedded in the FPGA design. The procedure is oriented to RNS-based circuits, but can easily be extended to any system implemented on FPGAs. As an example, a 128-bit signature identifying both the origin and the recipient of the design is embedded in a CIC-RNS filter. The proposed structure protects the filter without any penalty in circuit performance. The design examples were implemented on Xilinx Virtex-II family devices. Table 1 compares the CIC filter with the signed CIC filter proposed in this paper, both built in RNS-FPL. The table shows the speed grade, the number of SLICEs, the maximum frequency, the area increase and the speed reduction for the RNS-based filters studied. The results show that the area increase for the signed filter is 25 SLICEs (5.95%). This increase is, however, a fixed quantity: in larger RNS circuits, such as the wavelet transform, the CLBs added for signature embedding and extraction would represent a smaller percentage than in this example. In addition, the table shows that there is no performance penalty.
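Since the signed filters are RNS circuits, a short reminder of how RNS arithmetic works may help: numbers are held as independent residues, additions and multiplications are carried out channel by channel without inter-channel carries, and the Chinese Remainder Theorem recovers the integer. The moduli (7, 8, 9) are an arbitrary illustrative choice, since the abstract does not state the moduli of the CIC-RNS filter, and the signature embedding itself is not modelled.

```python
from math import prod

MODULI = (7, 8, 9)  # illustrative pairwise-coprime moduli; dynamic range M = 504


def to_rns(x, moduli=MODULI):
    """Forward conversion: one independent residue per channel."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition, no carries between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Chinese Remainder Theorem reconstruction into [0, M)."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        mi = big_m // m
        total += r * mi * pow(mi, -1, m)  # pow(x, -1, m): modular inverse (Python 3.8+)
    return total % big_m


print(from_rns(rns_add(to_rns(100), to_rns(200))))
print(from_rns(rns_mul(to_rns(20), to_rns(21))))
```

Results outside [0, M) wrap modulo M, so the moduli must be chosen so that the filter's dynamic range fits in M.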
Increasing system complexity and tight design deadlines driven by time-to-market imperatives mean that designers have to achieve the impossible on a daily basis. Today, virtual components can be designed to be portable from one architecture to another, thus protecting the investment made in making them reusable. This paper presents our work on a SoC prototyping methodology focused on hardware design. The methodology is based on the automatic integration of IPs (Intellectual Property blocks) modelled at a high level. We also describe our IP interconnection methodology, which allows rapid integration of communication between the functional IPs through a custom SystemC channel library. IP integration is performed by synthesizing standardized IP interfaces (VCI/OCP standards). This interface synthesis allows heterogeneous IPs to communicate in a plug-and-play fashion within the same system.
Cadence Design Systems, Valbonne, France, omedes@cadence.com; LIRMM, UMR CNRS/Université de Montpellier II, Montpellier, France, robert@lirmm.fr; ESEO, Angers, France, mohamed.ramdani@eseo.fr
ANY physical synthesis solution has limitations on the size of circuits that can be handled in a single run. Divide-and-conquer approaches, or hierarchical flows, have been introduced to overcome these limitations. In these flows, chips are subdivided into smaller blocks. To drive the optimization of these blocks, block I/O constraints have to be computed; this is the purpose of a block constraints budgeting algorithm. This paper introduces a new block constraints budgeting algorithm that speeds up timing closure in timing-driven hierarchical flows. Existing block budgeting approaches show two main issues. First, they do not take into account the design logic flexibility, that is, optimization possibilities such as buffering, resizing and restructuring. Second, they treat timing constraints (arrival and required times) and electrical constraints (drive strength, output load, etc.) separately, whereas these are closely linked. In the proposed approach, logic flexibility awareness is obtained using logical effort modeling [1] and some simple restructuring or rebalancing algorithms. Logical effort theory is also used to better correlate timing and electrical constraints. The proposed algorithm, called Flexibility Aware Budgeting (FAB), has been compared with some commonly used budgeting approaches (IMP_T and CPB) [2]. Experiments based on commercial EDA tools and real designs show up to a 55% reduction in hierarchical flow run time and good flow timing closure.

Table 1. Run time and memory gains, and final slack at the end of the first hierarchical flow iteration.
Approach   CPU gain   MEM gain   Final slack
CPB        66 %       37 %       -0.98 ns
IMP_T      66 %       35 %       -0.40 ns
FAB        60 %       31 %       -0.25 ns
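For reference, the standard logical-effort relations of Sutherland, Sproull and Harris that such budgeting can build on are summarized below; how FAB maps them onto block I/O constraints is described in the full paper, not here. For a path of N stages with logical efforts g_i, electrical efforts h_i, branching efforts b_i, parasitic delays p_i and path electrical effort H = C_out/C_in (delays in units of τ):

```latex
\begin{aligned}
d_i &= g_i h_i + p_i, \qquad
G = \prod_{i=1}^{N} g_i, \quad B = \prod_{i=1}^{N} b_i, \quad F = G\,B\,H,\\
\hat f &= F^{1/N}, \qquad
\hat D = N\,F^{1/N} + \sum_{i=1}^{N} p_i .
\end{aligned}
```

Here \hat f is the effort each stage should bear for minimum path delay \hat D, which is why a block boundary's drive strength and load (electrical constraints) directly shape its achievable arrival and required times (timing constraints).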
[1] I. Sutherland, B. Sproull, and D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann Publishers, 1999.
[2] See the full paper for the IMP_T and CPB descriptions.
This paper presents a new HW/SW co-design methodology for applications dealing with real-time video compression, based on the use of reconfigurable platforms and embedded processor cores (IPs) for SoC design. To select the right solution during system prototyping, we use the Matlab environment with FPGA toolboxes to model, design and verify system performance. Processor cores (either soft or hard IPs) execute tasks with real-time constraints, such as building transport streams and managing QoS through configuration parameters, while intensive real-time data processing tasks are implemented in special-purpose HW. A demonstrator based on real-time MPEG video compression has been designed and validated. It implements video coding according to the ISO/IEC 13818-2 / ITU-T H.262 standard (also known as MPEG-2 Video), for the Main Profile at Main Level. As a starting point, we use the reference descriptions of the video standard models (i.e. ITU-T H.262), written in C. The main advantage of a system-level description in C/C++ is that it allows good software code to be reused together with its verification environment. The HW/SW partition is made after a computational complexity analysis of the system in terms of the estimated number of operations per second and the number of clock cycles. This analysis determines whether a SW implementation of each algorithm on the NIOS soft core is viable. Intensive real-time data processing tasks are implemented in special-purpose HW using the Matlab environment with FPGA toolboxes.
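The partitioning criterion, required operations per second against what the soft core can sustain, reduces to simple arithmetic. In the sketch below, the 720×576 frame at 25 frames/s, the per-block operation counts, the cycles-per-operation figure and the 50 MHz clock are all hypothetical placeholders, not measurements from the paper; only the 4:2:0 chroma factor of 1.5 follows from the Main Profile format.

```python
def blocks_per_second(width, height, fps, chroma_factor=1.5):
    """8x8 blocks per second; chroma_factor=1.5 adds the two quarter-size 4:2:0 chroma planes."""
    return (width // 8) * (height // 8) * chroma_factor * fps


def fits_in_software(ops_per_block, cycles_per_op, clock_hz, width=720, height=576, fps=25):
    """Return (fits, required cycles/s) for a task costing `ops_per_block` operations per block.

    All parameters are hypothetical placeholders for a real profiling result.
    """
    required_cycles = blocks_per_second(width, height, fps) * ops_per_block * cycles_per_op
    return required_cycles <= clock_hz, required_cycles


# Hypothetical per-block costs: a light bookkeeping task vs. a transform-like kernel.
for task, ops in {"stream packing": 2, "8x8 transform": 1000}.items():
    ok, cycles = fits_in_software(ops, cycles_per_op=1.2, clock_hz=50e6)
    print(f"{task}: {cycles / 1e6:.1f} Mcycles/s -> {'SW' if ok else 'HW'}")
```

In practice the cycle budget would also have to leave headroom for the control tasks already running on the processor.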
Session 8d
THIS paper presents a closed-form expression for the number of transitions at the main nodes of a 2-level ripple-carry adder, the basic circuit in array multipliers, under random inputs. The different causes of transitions are considered and classified, and this classification is taken into account in the derivation of the expressions. This allows the energetic weight of each class of transition to be determined: if the correct energy is assigned to each class, the expression accurately evaluates the energy consumption of such structures. The effect of spurious transitions caused by unbalanced path delays is also evaluated in closed form, under the assumption of a given spurious-transition generation mechanism.
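A zero-delay simulation is a convenient way to sanity-check such closed-form counts. The sketch below counts the functional transitions on the sum and carry nodes of an n-bit ripple-carry adder between consecutive random input vectors. It deliberately ignores the spurious transitions caused by unbalanced delays, which the paper models separately, and the function names are hypothetical.

```python
import random


def ripple_nodes(a, b, width):
    """Sum and carry-out values of each full adder in an n-bit ripple-carry adder (carry-in 0)."""
    nodes = []
    carry = 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        s = x ^ y ^ carry
        carry = (x & y) | (carry & (x ^ y))
        nodes.extend((s, carry))
    return nodes


def count_transitions(prev, cur, width):
    """Zero-delay transitions between input vectors (a, b): nodes whose final value changed."""
    return sum(p != c for p, c in zip(ripple_nodes(*prev, width), ripple_nodes(*cur, width)))


def mean_transitions(width, samples=10_000, seed=0):
    """Average zero-delay transitions per input change for uniformly random operands."""
    rng = random.Random(seed)

    def draw():
        return rng.getrandbits(width), rng.getrandbits(width)

    prev, total = draw(), 0
    for _ in range(samples):
        cur = draw()
        total += count_transitions(prev, cur, width)
        prev = cur
    return total / samples


print(mean_transitions(8))
```

Comparing such counts with a timed (non-zero-delay) simulation would isolate the spurious component the paper evaluates in closed form.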
THE paper is devoted to the study of the contribution of crosstalk to dissipated power when two coupled lines make simultaneous or relatively delayed transitions. The importance of the delay, the line length and the relative driver strength is analyzed through electrical simulation. A typical coupled structure in a 0.18 µm CMOS technology has been considered (Fig. 1). Electromagnetic simulations (to obtain the parasitic interconnect parameters) and electrical simulations with HSPICE (to estimate power dissipation) have been performed. Opposite-direction transitions with zero delay produce the largest increase in energy, while same-direction transitions yield an energy saving. As a numerical example, for a 100 µm line the difference between the maximum and minimum energies is about 143 fJ, which represents a 54% reduction. This fact may be used, for instance, to drive algorithms for coding line transitions in data buses. The line-length dependence shows that for short lines (<50 µm) intrinsic device contributions (short-circuit, leakage, etc.) clearly dominate, whereas for longer lines (>100 µm) there is a region where the dissipation is mainly due to charging and discharging the lines, including the coupling capacitance (Fig. 2). This range of line lengths is quite typical in medium- to large-sized circuits; therefore, the crosstalk contribution must be seriously considered in power estimation tools if accurate predictions are to be obtained.
Fig. 1. Circuit used to model the structure of two coupled lines with inverter drivers.
Fig. 2. Energy dissipated in the driver transistors versus line length (opposite-direction transitions). Curves: nmos1, nmos2, pmos1, pmos2 and total; axes: Energy (fJ) versus Length (µm).
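A first-order way to see why opposite-direction transitions cost more: for a line switching with zero skew relative to its neighbour, the charge drawn through the coupling capacitance depends on the voltage swing across it (0, V or 2V), i.e. a Miller factor of 0, 1 or 2. The sketch assumes a rising line, lumped capacitances and ideal drivers, so it ignores the short-circuit and leakage contributions that the abstract notes dominate for short lines; the capacitance values and names are hypothetical.

```python
MILLER_FACTOR = {"same": 0, "quiet": 1, "opposite": 2}


def supply_energy(c_ground, c_coupling, vdd, neighbour):
    """Energy drawn from VDD by a rising line: (C_ground + m * C_coupling) * VDD**2.

    `neighbour` is "same" (rises simultaneously), "quiet" (does not switch)
    or "opposite" (falls simultaneously); m is the corresponding Miller factor.
    """
    return (c_ground + MILLER_FACTOR[neighbour] * c_coupling) * vdd ** 2


for case in ("same", "quiet", "opposite"):
    e = supply_energy(20e-15, 10e-15, 1.8, case)  # hypothetical 20 fF + 10 fF at 1.8 V
    print(f"{case:8s}: {e * 1e15:.1f} fJ")
```

The spread between the "same" and "opposite" cases is what bus-coding schemes can exploit, as suggested above.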
architectures based on circuit simulations. The proposed approach takes into account second-order effects, such as the effect of internal capacitances, not previously considered in analytical models. It also provides a set of design guidelines to lead the system designer through electronic design under low-power constraints from the very early stages. The power characterization method we follow is based on the following assumptions. First, the source of power dissipation in this model is the charging and discharging of capacitive loads caused by signal transitions. Therefore, only dynamic power consumption is considered, although this information could easily be extended with the static power information provided by Spice. Second, the energy consumed in every transition is proportional to the rise and fall times of the signal; thus, the experimental method concentrates on measuring these times. Our experimental methodology can be summarized as follows. First, a set of different memories is generated: to evaluate both SRAM and DRAM technologies, schematic representations of both kinds have been created. Next, test vectors are generated to exercise all the relevant logical transitions in the circuits to be evaluated. Then, the Spice simulator included in the Cadence environment is used to collect the results of the electrical simulations of these circuits. Finally, after careful analysis of these results, a set of guidelines is established to guide the circuit and system designer in the efficient design and use of the memory hierarchy under low-power constraints. In particular, this research work provides some important rules that the system designer should remember when planning the memory hierarchy: the underlying topology (rows and columns) has a strong impact on the power consumption of the device; for equal memory size, memory rows containing more than one data word can reduce energy consumption; and memories that split the data word across more than one row can show lower energy consumption.
However, in those cases, the energy dissipated in the bus during the extra access has to be evaluated. The memory architecture has to be considered from the very early design stages to make the energy reductions and the architecture adaptation feasible.
X. Michel, A. Verle, N. Azémard, P. Maurine, D. Auvergne. LIRMM, UMR CNRS/Université de Montpellier II (C5506), 161 rue Ada, 34392 Montpellier, France. {azemard, pmaurine, auvergne}@lirmm.fr
THE design of more and more complex, integrated and fast circuits requires managing the trade-off between speed, power and area. This can be achieved with circuit simulators and critical path analysis tools, iteratively modifying transistor sizes until all constraints are satisfied. More general speed-up techniques involve buffer insertion and logic transformation. Although these techniques can efficiently speed up combinational paths, they may have different impacts on the resulting power dissipation or area. Gate sizing is expensive in area (and power) and, because of the resulting capacitive loading, may slow down adjacent upstream paths. This implies complex and iterative timing verifications. Buffer insertion preserves path interaction but is only efficient for relatively highly loaded nodes. To manage these alternatives, it is necessary to evaluate and compare the performance of the different implementations. Without a robust indicator, selecting among all these techniques for the various gates of a library is NP-complex and requires more iterations, causing the processing time to explode. A reasonable selection of speed-up technique must be based on a characterization of the available speed on a critical path, on the determination of the critical nodes, and on a characterization of gate sensitivity to the sizing or buffering alternatives. Based on a realistic model of gate timing performance, the main contribution of this paper is to define different metrics for path characterization, transistor sizing and buffer insertion, to be used as efficient indicators of the sensitivity of logic gates to sizing and buffering. We propose a method for determining the minimum delay, Tmin, achievable on a path. Then we define, at gate level, the fan-out limit for buffer insertion, Flimit.
Flimit is used to determine the critical nodes of a path, and Tmin to select between the sizing and buffer insertion alternatives. We define a gate sensitivity factor "a" to distribute the delay constraint, allowing path optimization at provably minimum area cost. These metrics are used to define a general path optimization protocol, implemented in an optimization tool (POPS: Performance Optimization by Path Selection) based on an accurate representation of the physical abstraction of the layout. We developed this tool to facilitate the analysis and optimization of combinational circuit paths in submicron technologies. Validation on various benchmark circuits demonstrates the validity of the defined boundaries for selecting between the different optimization alternatives.
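POPS relies on the authors' own delay model, which the abstract does not give. As a stand-in, the classic logical-effort model can illustrate the same two questions, the minimum achievable path delay and whether buffering beats pure sizing. The sketch appends pairs of inverters (to preserve polarity) to a path and keeps the count that minimizes the logical-effort delay; the model, the inverter parasitic p_inv = 1 and the function names are assumptions, not the paper's Tmin or Flimit definitions.

```python
import math


def min_path_delay(g, p, b, electrical_effort):
    """Logical-effort minimum delay N*F**(1/N) + sum(p) in units of tau (gates optimally sized)."""
    n = len(g)
    path_effort = math.prod(g) * math.prod(b) * electrical_effort
    return n * path_effort ** (1 / n) + sum(p)


def best_buffering(g, p, b, electrical_effort, p_inv=1.0, max_pairs=4):
    """Try 0..max_pairs inverter pairs appended to the path; return (inverters added, delay)."""
    best = None
    for pairs in range(max_pairs + 1):
        k = 2 * pairs  # even count keeps the logic polarity unchanged
        delay = min_path_delay(
            list(g) + [1.0] * k, list(p) + [p_inv] * k, list(b) + [1.0] * k, electrical_effort
        )
        if best is None or delay < best[1]:
            best = (k, delay)
    return best


# One inverter driving a load 64x its input capacitance, then a lightly loaded case.
print(best_buffering(g=[1.0], p=[1.0], b=[1.0], electrical_effort=64.0))
print(best_buffering(g=[1.0], p=[1.0], b=[1.0], electrical_effort=2.0))
```

With a light load, sizing alone wins (no buffers), which mirrors the abstract's remark that buffer insertion only pays off on highly loaded nodes.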
Session 9a
In order to cope with the verification of increasingly complex systems, several Design-for-Verification (DFV) methodologies have been proposed. One of them, Assertion-based Verification (ABV), has recently emerged as the functional verification methodology capable of keeping pace with increasingly complex systems. This paper presents a static assertion checking technique for hardware behavioral models described with polynomials. The algorithm automatically generates vectors to detect violations of the assertion; if no counterexample is found, the description satisfies the assertion. The technique is based on a modified Interval Analysis (MODIA), and it reduces the verification effort because there is no need to explicitly unroll loops. To validate the proposed technique, a set of examples has been defined and run with the SMV tool and with the proposed Assertion Checker. The results are shown in Table I: SMV needs to unroll loops to handle them, whereas the proposed tool handles loops without unrolling.

TABLE I. Comparison with property checkers (cyclic descriptions).
Evaluated description   SMV      Assertion Checker   Number of iterations
Linear                  < 1 s    1 s                 10
Nonlinear               4.42 s   1 s                 5

The advantage of this method is its efficiency in handling data-dominated algorithms independently of the data range, and it can be computed directly over the Control Data Flow Graph. Its main disadvantage, however, is the explosion in the number of paths with the number of if-then-else structures. During cyclic description verification, the algorithm looks for input combinations that violate an assertion, taking all conditional paths into account; thus, memory consumption grows as the number of iterations increases. In future work, a depth-first search will be implemented to solve this problem. Additionally, heuristic metrics based on statistical probabilities will be used to choose the path with the highest probability of reaching a violation.
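The core idea of checking a polynomial assertion over whole input ranges can be sketched with plain interval arithmetic plus bisection. This is standard interval analysis, not the paper's modified MODIA variant; plain intervals are conservative (the enclosure of x*x - 3*x overestimates the true range), which is why bisection is needed. The single-input model and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        o = _as_interval(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    __radd__ = __add__

    def __sub__(self, other):
        o = _as_interval(other)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __rsub__(self, other):
        return _as_interval(other) - self

    def __mul__(self, other):
        o = _as_interval(other)
        products = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(products), max(products))

    __rmul__ = __mul__


def _as_interval(x):
    return x if isinstance(x, Interval) else Interval(x, x)


def check_assertion(f, lo, hi, bound, depth=12):
    """Check f(x) <= bound on [lo, hi]; returns ("holds", None), ("violated", x) or ("unknown", None)."""
    if f(Interval(lo, hi)).hi <= bound:  # the enclosure proves the assertion on this box
        return "holds", None
    mid = (lo + hi) / 2
    if f(mid) > bound:  # concrete counterexample vector
        return "violated", mid
    if depth == 0:
        return "unknown", None
    left = check_assertion(f, lo, mid, bound, depth - 1)
    if left[0] == "violated":
        return left
    right = check_assertion(f, mid, hi, bound, depth - 1)
    if right[0] == "violated":
        return right
    if left[0] == right[0] == "holds":
        return "holds", None
    return "unknown", None


def poly(x):
    """Hypothetical behavioral model y = x^2 - 3x (works on floats and Intervals)."""
    return x * x - 3 * x


print(check_assertion(poly, 0.0, 3.0, bound=1.0))
print(check_assertion(poly, 0.0, 4.0, bound=1.0))
```

Each bisection doubles the number of boxes to examine, a small-scale analogue of the path explosion the abstract mentions.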
A complete high-level synthesis algorithm especially suited to reducing power consumption in data-dominated applications is presented. It jointly performs the scheduling and allocation of behavioural specifications. In addition to classical low-power methods, it implements novel design strategies to reduce datapath power consumption. These new features comprise the successive transformation of the specification operations until a circuit implementation with minimum power consumption in the functional and storage units is reached. Circuit power consumption is minimized first by balancing the power consumed per cycle, and second by maximizing the bit-level reuse of datapath HW resources (which allows the maximum number of datapath FUs to be disabled in each cycle). The algorithm performs these transformations trying to balance the power consumed per cycle (due to the execution of the operations), allowing operations to execute only on functional units of the same width. To do so, some specification operations are successively transformed into sets of narrower ones, whose types and widths may differ from those of the original operation. As a consequence, some of the specification operations are executed over a set of not necessarily consecutive cycles and on a set of narrower functional units, linked by some glue logic that propagates partial results and carry signals as necessary. Experimental results on some datapath-intensive designs show significant improvements in both power consumption and area over conventional low-power scheduling algorithms, as shown in Table 1. In general, the power consumption and datapath area reductions grow with the specification heterogeneity (the number of different operation types and widths present in the specification divided by the number of operations).
Table 1. Power consumption estimations and area of some synthesized examples
Specification features                        | Power consumption (pF)     | Area (# equivalent gates)
# Ope.  # Adds.  # Mults.  # Widths  Latency  | Synopsys  Low Power  Ours  | Synopsys  Low Power  Ours
10      6        4         3         3        | 3714      2376       1298  | 328.75    332.9      248.4
20      12       8         4         5        | 5120      3190       1903  | 443.45    461.6      305.34
35      23       12        5         7        | 7668      4533       2884  | 645.23    590.67     429.3
50      40       10        7         10       | 9730      6089       3675  | 822.3     810.41     467.31
70      55       15        8         12       | 15101     9834       5693  | 1244.87   1203.5     598.56
85      60       25        8         20       | 18249     11845      7084  | 2003.67   1904.45    730.45
100     80       20        10        30       | 20395     12201      7893  | 2690.67   2547.23    984.12
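The transformation of a wide operation into narrower fragments linked by carry glue logic can be checked with a small functional model. The 16-bit width, the 8-bit fragments and the function name are illustrative assumptions; in the synthesized datapath, each fragment could run in a different cycle on a narrower adder.

```python
def fragmented_add(a, b, width=16, fragment=8):
    """Add two `width`-bit numbers as a chain of `fragment`-bit additions (hypothetical helper).

    Each loop iteration models one narrow functional unit; `carry` is the glue
    signal propagating the partial result to the next fragment. `width` is
    assumed to be a multiple of `fragment`. Returns (sum mod 2**width, carry-out).
    """
    mask = (1 << fragment) - 1
    result, carry = 0, 0
    for lsb in range(0, width, fragment):
        part = ((a >> lsb) & mask) + ((b >> lsb) & mask) + carry
        result |= (part & mask) << lsb
        carry = part >> fragment
    return result & ((1 << width) - 1), carry


print(fragmented_add(0x12FF, 0x0001))
```

Because the carry is the only dependency between fragments, the low fragment can be scheduled early and the high one later, which is what gives the scheduler freedom to balance per-cycle power.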
CONVENTIONAL scheduling algorithms usually adjust the clock cycle duration to the execution time of the longest operations. This results in large slack times wasted in those cycles that contain faster operations. To reduce this wasted time, multicycle and chaining techniques have been employed. The design technique presented in this paper goes one step further. For a fixed latency, performance is improved by selecting the minimum clock cycle duration, which is independent of the operation execution times. To adjust the arrival times of the computed results to the cycle duration, some specification operations are fragmented and every fragment is scheduled in a different (not necessarily consecutive) cycle. Also, the result bits of an operation are available to any successor in the cycle in which they are calculated, so the execution of an operation may start even if its predecessors have not yet finished. Additionally, the regularity of the operation chains scheduled in every cycle allows the design of new operators, which not only simplifies the allocation and binding phases but also produces more structured and smaller datapaths. Experimental results show encouraging improvements in performance: the cycle duration of the synthesized circuits is reduced by up to 85% (70% on average) with slight increases in datapath area. Figure 1 compares the cycle length and area of some circuits synthesized by our algorithm with those obtained by the force-directed scheduling algorithm with built-in chaining and multicycle features.
Figure 1. a) Cycle lengths (versus latency) of the schedules proposed by our algorithm and the force-directed one for some synthetic circuits; b) area (versus latency) of the implementations obtained from our schedule and the force-directed one.
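Why bit-level availability of results helps can be seen with a unit-delay model of chained ripple-carry adders: if each full adder takes one time unit and a successor may consume result bits as soon as they are produced, two chained n-bit additions finish after n + 1 units instead of 2n. The unit-delay model and the function names are assumptions for illustration, not the paper's timing model.

```python
def chained_ripple_delay(width, chain_length):
    """Arrival time of the last sum bit of `chain_length` chained ripple-carry adders.

    Unit-delay model (assumed): operands of the first adder arrive at t=0; each
    later adder adds a fresh operand (also ready at t=0) to the previous sum
    bits. Every full adder produces its outputs one unit after its latest input.
    """
    arrival = [0] * width  # arrival time of each bit of the running sum
    for _ in range(chain_length):
        carry_t = 0  # carry-in is a constant
        new = []
        for i in range(width):
            t = max(arrival[i], carry_t) + 1
            new.append(t)
            carry_t = t
        arrival = new
    return max(arrival)


def operation_level_delay(width, chain_length):
    """Same chain when each adder must wait for its predecessor to finish completely."""
    return chain_length * width


print(chained_ripple_delay(16, 2), operation_level_delay(16, 2))
```

In this model, each extra chained addition costs one unit rather than a full adder delay, which is the kind of slack a bit-level scheduler can exploit.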
Session 9b
Communications Systems
Friday Nov. 26, 14h00-15h15, Auditorium. Chairs: Armando Roy (U. de Zaragoza), Roberto Sarmiento (U. Las Palmas de Gran Canaria)
An Efficient Priority Queuing System for High Speed Network Processors with QoS Support
F. Tobajas, V. De Armas, N. Cruz, R. Esper-Chan, R. Arteaga and R. Sarmiento. Instituto Universitario de Microelectrónica Aplicada, IUMA, Dpto. de Ingeniería Electrónica y Automática, University of Las Palmas de Gran Canaria, 35017 Las Palmas de Gran Canaria, Spain. tobajas@iuma.ulpgc.es
THE growth and expansion of the Internet, which users rely on for communication, education, work and entertainment, require an increase in bandwidth that will continue in the coming years. The integration of new applications with different requirements forces switches and IP routers not only to absorb the bandwidth increase, but also to provide differentiated services. To provide QoS guarantees, priority queues are needed to store flows from different aggregates, while a packet scheduler determines the next packet to be transmitted according to a timestamp value. The feasibility of priority queues with many thousands of entries has critical implications for high-speed networks and the future Internet. In this paper, a high-performance sorting architecture designed to efficiently support any Packet Fair Queuing (PFQ) scheduling algorithm in high-speed switches or routers is presented. Its primary function is to quickly sort the timestamps of the packets stored in SRAM or other queue memory according to a predetermined algorithm: it controls how packets are pulled from the queues, locates the lowest timestamp and places the corresponding packet into the output stream. Although a network processor, or even a RISC control processor, could perform this function, this kind of sort is quite compute-intensive. The fast and scalable priority queuing system proposed in this paper is based on a pipelined priority heap and was designed as a core integrable into an ASIC. It can hold up to 65535 entries, which may be distributed evenly across 1, 2, 4 or 8 independent priority queues managed by a novel low-cost method, and it provides the address with the minimum timestamp in one clock cycle, regardless of heap occupancy or the number of independent priorities supported. The basic operations of the proposed priority queuing system allow new entries to be inserted, or the entry with the minimum timestamp value to be extracted from the priority queue.
Other operations include reading the entry with the minimum timestamp value without altering the priority queue, and performing an extraction and an insertion simultaneously. The implemented priority queuing system makes scheduling and QoS decisions at a rate close to 70 million per second, whereas each decision would take hundreds of reads, writes and sorting steps on an unassisted network processor, thus significantly boosting the performance of priority-queue-based scheduling algorithms.
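The queue operations listed above map naturally onto a binary min-heap. The software model below uses Python's `heapq` to mirror insert, extract, read-minimum and simultaneous extract-and-insert for 1, 2, 4 or 8 independent queues sharing an evenly split entry budget. It says nothing about the pipelined hardware heap itself, and the class and method names are hypothetical.

```python
import heapq


class MultiPriorityQueue:
    """Software model of 1, 2, 4 or 8 independent min-priority queues of (timestamp, address).

    Hypothetical class: the total capacity is split evenly across the queues.
    Extracting from an empty queue raises IndexError, as heapq does.
    """

    def __init__(self, n_queues=1, capacity=65535):
        if n_queues not in (1, 2, 4, 8):
            raise ValueError("n_queues must be 1, 2, 4 or 8")
        self.heaps = [[] for _ in range(n_queues)]
        self.per_queue = capacity // n_queues

    def insert(self, q, timestamp, address):
        if len(self.heaps[q]) >= self.per_queue:
            raise OverflowError(f"queue {q} is full")
        heapq.heappush(self.heaps[q], (timestamp, address))

    def peek(self, q):
        """Read the minimum-timestamp entry without altering the queue."""
        return self.heaps[q][0]

    def extract(self, q):
        """Remove and return the minimum-timestamp entry."""
        return heapq.heappop(self.heaps[q])

    def extract_and_insert(self, q, timestamp, address):
        """Extract the minimum and insert a new entry in a single operation."""
        return heapq.heapreplace(self.heaps[q], (timestamp, address))


pq = MultiPriorityQueue(n_queues=2)
pq.insert(0, 30, 0x10)
pq.insert(0, 10, 0x20)
pq.insert(1, 5, 0x30)
print(pq.peek(0))                          # (10, 32)
print(pq.extract_and_insert(0, 20, 0x40))  # (10, 32); queue 0 now holds timestamps 20 and 30
print(pq.extract(0))                       # (20, 64)
```

`heapreplace` pops before pushing, so it always returns the previous minimum, matching the extract-then-insert semantics described above.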
SINCE Turbo codes were introduced in 1993, they have spread widely owing to their spectacular performance, close to the Shannon theoretical limit. On the other hand, this kind of channel coding scheme requires a large area. To reduce this area, approximations to the original decoding algorithm (MAP) have been proposed, such as Log-MAP and Max-Log-MAP. Several standards, such as CDMA, UMTS and DVB-RCS, have adopted this kind of channel coding, making these algorithms even more interesting for researchers. DVB-RCS may well become a global satellite standard that allows all equipment manufacturers to focus on the same technical solution, thus providing a healthy and open competitive environment with enormous benefits for industry and users alike. In this paper, several improvements are presented for the DVB-RCS Max-Log-MAP VLSI design, although they can be applied to any other Max-Log-MAP-based system. Because the proposed architectural optimizations do not alter the original Max-Log-MAP algorithm, they keep the same BER performance. The first two proposed modifications concern the decoder processors and save a significant percentage of the area spent in classical designs. In the second part of the paper, a new way to perform the LLR calculation is presented, saving more than 30% of the area required in classical implementations through two optimizations that can be applied together or separately. Both modifications save a significant amount of area, and only the second one has a slight critical-path penalty.
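The difference between Log-MAP and Max-Log-MAP lies in the max* operation (the Jacobian logarithm) used in the forward, backward and LLR computations: Log-MAP evaluates it exactly, while Max-Log-MAP drops the correction term, removing a look-up table or exponential per operation. A minimal sketch follows; the function names are illustrative.

```python
import math


def max_star(a, b):
    """Jacobian logarithm used by Log-MAP: ln(e**a + e**b) = max(a, b) + ln(1 + e**-|a - b|)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_star_maxlog(a, b):
    """Max-Log-MAP approximation: the correction term is dropped."""
    return max(a, b)


for a, b in ((0.0, 0.0), (2.0, 1.5), (5.0, -3.0)):
    exact = math.log(math.exp(a) + math.exp(b))
    print(f"a={a}, b={b}: exact={exact:.4f} log-map={max_star(a, b):.4f} max-log={max_star_maxlog(a, b):.4f}")
```

The dropped term never exceeds ln 2 (reached when a = b), which is why Max-Log-MAP loses little BER performance while saving hardware.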
ITU-Compliant Macrocells for Dual Tone Multiple Frequency Transmission and Reception
Arturo Purroy Isidro Urriza
Department of Electronic and Communications Engineering, University of Zaragoza, c/ María de Luna 3, 50018 Zaragoza, Spain. {apurroy, urriza}@unizar.es
THE telephone network provides fast access to almost anywhere in any industrialized country. An easy interface must be offered to customers of domotic environments to manage the home automation system, and the phone push-button keypad may be an efficient solution. Inside the house, a controller device is needed. This device should be able to decode the DTMF symbols received from the user and interface with the home systems to perform the requested service. To keep solutions cheap, single-chip designs are preferred. In this paper we present the design of a digital DTMF transmitter and receiver modeled in a hardware description language (VHDL), which can easily be added to other designs in a single field-programmable gate array (FPGA). The efficiency and physical size of the final design have been the main goals of the research. Several available chips use analog circuitry to generate and decode DTMF signals. The advantages of a digital system include better accuracy, precision, stability, versatility and reprogrammability, as well as a lower chip count and thereby reduced board-space requirements. Several investigations have dealt with efficient DTMF detectors, aiming to simplify the filter bank. DTMF detectors typically consist of a signal analysis front end followed by a decision logic back end. The usual approach is an adaptable filtering architecture that implements multiple filters with the hardware of just one. Some DTMF detectors are based on the Goertzel algorithm or the nonuniform discrete Fourier transform (NDFT); the number of multiplications required and the long sample windows make their implementation inefficient. A very efficient solution was proposed by A. A. Deosthali, S. R. McCaslin and B. L. Evans: using adaptive notch filters and sophisticated decision logic, their detector meets the ITU standard when implemented on an 8-bit microcontroller.
Based on this algorithm, our design rearranges all the filters and signal processing units to take advantage of a dedicated hardware implementation. Because the original implementation targeted an 8-bit microcontroller, the algorithm had difficulty operating when more precision (16-bit data) was needed. This and other implementation problems have been solved in our design, which complies with the ITU specifications. Our design is the first published solution that provides VHDL code encapsulated in digital macrocells. This goal has added constraints to our design, since the size of a macrocell should be minimized.
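A DTMF transmitter emits, for each key, the sum of one low-group tone and one high-group tone from the standard frequency pairs. The Python sketch below models both sides for illustration, assuming an 8 kHz sampling rate and a 205-sample analysis window (common choices, not taken from the paper). Note that the receiver here uses the Goertzel algorithm mentioned above, not the adaptive-notch-filter detector the authors build on, and it omits the ITU decision checks (twist, signal-to-noise ratio, timing); all function names are hypothetical.

```python
import math
from typing import List, Tuple

ROWS = [697, 770, 852, 941]        # Hz, low-group tones
COLS = [1209, 1336, 1477, 1633]    # Hz, high-group tones
KEYS = ["123A", "456B", "789C", "*0#D"]

def key_tones(key: str) -> Tuple[int, int]:
    """Return the (low, high) tone pair for a keypad symbol."""
    for r, row in enumerate(KEYS):
        c = row.find(key)
        if c >= 0:
            return ROWS[r], COLS[c]
    raise ValueError(f"not a DTMF key: {key!r}")

def dtmf_samples(key: str, fs: int = 8000, n: int = 205) -> List[float]:
    """Transmitter: sum of the two tones, each with amplitude 0.5."""
    f_low, f_high = key_tones(key)
    return [0.5 * math.sin(2 * math.pi * f_low * i / fs)
            + 0.5 * math.sin(2 * math.pi * f_high * i / fs) for i in range(n)]

def goertzel_power(x: List[float], f: float, fs: int) -> float:
    """Squared magnitude of the spectral component at frequency f (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * f / fs)
    s1 = s2 = 0.0
    for sample in x:
        s0 = sample + coeff * s1 - s2   # second-order resonator recursion
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def detect_key(x: List[float], fs: int = 8000) -> str:
    """Receiver: strongest row tone and strongest column tone (no validity checks)."""
    row = max(range(4), key=lambda r: goertzel_power(x, ROWS[r], fs))
    col = max(range(4), key=lambda c: goertzel_power(x, COLS[c], fs))
    return KEYS[row][col]
```

For example, `detect_key(dtmf_samples("5"))` recovers `"5"` from a clean 770 Hz + 1336 Hz signal. The eight Goertzel evaluations, each needing one multiplication per sample over the whole window, illustrate the cost the authors avoid.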
A virtual output-queued (VOQ) switch is a particular case of input-queued switch in which VOQ(i, j) is the partition of input queue i that stores the packets directed towards output j. In a previous paper [1], a new maximal size matching algorithm for VOQ switches, namely Parallel Hierarchical Matching (PHM), was proposed. PHM is a distributed algorithm with the same gate complexity (O(N²), where N is the switch size in I/O ports) as parallel iterative maximal size matching algorithms such as iSLIP [2] and RDSRR. All these algorithms maximize instantaneous throughput in high-performance VOQ packet switches such as the Cisco 12000, the Lucent Cajun and the Nortel Versalar TSR45000. It has recently been shown that PHM is competitive with them under hot-spot, bursty and unbalanced traffic. The results in [1] suggest that PHM combines the advantages of previous sequential hierarchical matching algorithms (low hardware complexity) and of parallel iterative maximal matching algorithms (low number of iterations). In this paper, a PHM implementation is presented and compared to an efficient implementation of a parallel iterative maximal matching algorithm. A full PHM VOQ controller and the parallel iterative maximal matching arbiter in [2] have been implemented for switch sizes 2×2, 4×4, 8×8 and 16×16, using the AMS standard cell library for the 0.35 µm 3.3 V CMOS process. Table I shows the worst-case decision response times per iteration and the corresponding maximum operating frequencies.
TABLE I. PARALLEL ITERATIVE ALGORITHM VS. PHM: ITERATION RESPONSE TIMES
Switch size   2×2                 4×4                  8×8                  16×16
IMM           6.85 ns (146 MHz)   10.04 ns (100 MHz)   14.98 ns (67 MHz)    30.67 ns (33 MHz)
PHM           1.54 ns (649 MHz)   1.98 ns (505 MHz)    2.60 ns (385 MHz)    2.32 ns (431 MHz)
The results obtained clearly show that PHM is competitive with state-of-the-art VOQ schedulers in terms of delay and speed.
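PHM's internal structure is not detailed in this abstract, so the sketch below instead models McKeown's iSLIP, the parallel iterative maximal matching baseline it is compared against, to show what one "iteration" of request, grant and accept involves. It is a behavioral Python model under our own naming (`islip`, `grant_ptr`, `accept_ptr`), not the authors' arbiter; as in iSLIP, the round-robin pointers are updated only for matches made in the first iteration.

```python
from typing import List, Optional

def islip(requests: List[List[bool]], grant_ptr: List[int],
          accept_ptr: List[int], iterations: int) -> List[Optional[int]]:
    """One cell time of iSLIP. requests[i][j] is True if VOQ(i, j) is non-empty.
    Returns match[i] = output granted to input i (or None). Pointers are
    updated in place."""
    n = len(requests)
    match_in: List[Optional[int]] = [None] * n    # input -> output
    match_out: List[Optional[int]] = [None] * n   # output -> input
    for it in range(iterations):
        # Grant phase: each unmatched output grants the first requesting,
        # unmatched input at or after its round-robin grant pointer.
        grants: List[List[int]] = [[] for _ in range(n)]
        for j in range(n):
            if match_out[j] is not None:
                continue
            for k in range(n):
                i = (grant_ptr[j] + k) % n
                if match_in[i] is None and requests[i][j]:
                    grants[i].append(j)
                    break
        # Accept phase: each input accepts the first granting output at or
        # after its accept pointer.
        progress = False
        for i in range(n):
            if not grants[i]:
                continue
            j = min(grants[i], key=lambda out: (out - accept_ptr[i]) % n)
            match_in[i], match_out[j] = j, i
            progress = True
            if it == 0:  # pointer update only in the first iteration
                accept_ptr[i] = (j + 1) % n
                grant_ptr[j] = (i + 1) % n
        if not progress:  # no new match: the matching is maximal
            break
    return match_in
```

Each iteration adds at least one match while an unmatched input still requests an unmatched output, so N iterations always yield a maximal matching; this is why the per-iteration response times in Table I matter for the overall decision time.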
[1] R. Asorey-Cacheda, F. J. González-Castaño, C. López-Bravo, J. M. Pousada-Carballo and P. S. Rodríguez-Hernández, "On the Behavior of PHM Distributed Schedulers for Input Buffered Packet Switches," IEEE Transactions on Communications, vol. 51, no. 7, July 2003.
[2] N. McKeown, "iSLIP: A Scheduling Algorithm for Input-Queued Switches," IEEE/ACM Transactions on Networking, vol. 7, no. 2, 1999.
Session 9c
THIS paper presents a methodology for building SoC systems starting from XML specifications. The methodology includes a set of tools, written in Java, that generate the whole set of hardware (Verilog) and software (C) files required to synthesize and simulate an entire AMBA-based SoC. The tool can assist architectural exploration during model refinement and HW/SW partitioning, a critical step in speeding up the design process for new complex SoC systems. There are many approaches to building SoCs for specific applications; our tool helps designers build different architectures in a short period of time, which supports architectural exploration and verification both at simulation level and on prototyping platforms. The tool, called UltraWizard, starts from an XML file describing the bus architecture and IP interconnection and automatically generates the HDL files (Verilog) and SW files (.c) needed to synthesize and simulate an entire AMBA SoC. To show the benefits of the tool, an example has been mapped onto different architectural solutions generated by UltraWizard, which have been synthesized and tested on our prototyping platform. The test system outputs a sinusoidal waveform through a DAC (with a customizable FIFO depth). Samples are calculated by an ARM7 processor and sent to the DAC in three different ways:
1) AHB+APB: a simple architecture in which the CPU continuously polls the DAC and sends a new sample when the DAC is ready.
2) AHB+APB+IC: an interrupt controller is added to notify the CPU when the DAC is ready (i.e., its FIFO is empty).
3) AHB+APB+DMA: a DMA controller is used to reduce the CPU load. The DMA interrupts the processor every time a middle buffer is empty, and the DAC interrupts every time its FIFO is empty.
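The core idea, one XML description driving both the hardware top level and the software view of the same system, can be sketched in a few lines. UltraWizard's real XML schema and output files are not shown in the abstract, so the spec format below (`soc`, `ip`, `inst`, `module`, `bus`, `base`), the module names and the single `bus` port are all hypothetical; the sketch uses Python's standard `xml.etree.ElementTree`, whereas the paper's tool is written in Java.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical spec format, not UltraWizard's actual schema.
SPEC = """
<soc name="dac_test">
  <ip inst="u_cpu"    module="arm7_wrapper" bus="ahb"/>
  <ip inst="u_bridge" module="ahb2apb"      bus="ahb" base="0x80000000"/>
  <ip inst="u_dac"    module="dac_fifo"     bus="apb" base="0x80001000"/>
</soc>
"""

def generate(xml_text: str) -> Tuple[str, str]:
    """Return (Verilog top-level skeleton, C header with base addresses)."""
    root = ET.fromstring(xml_text)
    name = root.get("name")
    ips = root.findall("ip")
    buses = sorted({ip.get("bus") for ip in ips})

    # Hardware view: one placeholder net per bus, one instance per IP block.
    v: List[str] = [f"module {name}_top (input wire clk, input wire rst_n);"]
    for bus in buses:
        v.append(f"  wire {bus}_bus;  // placeholder for the {bus.upper()} signal bundle")
    for ip in ips:
        v.append(f"  {ip.get('module')} {ip.get('inst')} "
                 f"(.clk(clk), .rst_n(rst_n), .bus({ip.get('bus')}_bus));")
    v.append("endmodule")

    # Software view: memory-mapped base addresses for the driver code.
    h: List[str] = [f"/* Generated from the {name} spec */"]
    for ip in ips:
        base = ip.get("base")
        if base is not None:
            h.append(f"#define {ip.get('inst').upper()}_BASE {base}u")
    return "\n".join(v), "\n".join(h)
```

Changing the spec (for instance adding an interrupt controller or a DMA block for the second and third architectures) regenerates both views consistently, which is what makes fast architectural exploration possible.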
IT is widely agreed that the best way to increase design productivity is to reuse solutions that worked in the past. The concept of reuse, however, has a different meaning when applied to a software system than to a hardware one. In the former it is normally understood as the reuse of the model; in the latter it normally means the reuse of components. Working at the level of components makes it more difficult to find opportunities where a previous solution can be applied again, since components are often too specific. This scenario is quite similar to what happened with software development not so long ago. Improvements in quality and design productivity in the software domain have traditionally been based on raising the level of abstraction of system models, so that problems are considered in their own domain, without the disturbance of implementation details. One clear example is generic programming, where abstraction is achieved through parameterizable code that can be applied to any data structure. It is reasonable to expect the same kind of benefits if these mature software technologies could be applied to the design of complex hardware systems. The aim of this paper is to discuss how one of these software technologies (generic programming) can be applied to hardware by means of a partial port of the C++ Standard Template Library (STL). The STL is based on the cooperation of three kinds of elements (containers, iterators and algorithms) whose main purpose is to decouple algorithms from data structures, so that all elements can work with arbitrary data structures. A discussion of the minimal set of elements and their characteristics is presented, together with some clues about the implementation consequences. Finally, a very simple example is used to illustrate the benefits of using generic programming to model hardware systems.
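The container/iterator/algorithm split can be illustrated outside C++. The Python sketch below is only an analogy to the STL, where generators play the role of iterators: a hardware-flavored container (a fixed-depth circular FIFO) exposes an iterator that hides its wrap-around addressing, and a generic algorithm written once works unchanged on it and on an ordinary list. `CircularFifo` and `find_if` are hypothetical names chosen to echo hardware storage and `std::find_if`.

```python
from typing import Callable, Generic, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")

class CircularFifo(Generic[T]):
    """Container: fixed-depth FIFO stored in a circular buffer."""

    def __init__(self, depth: int) -> None:
        self._buf: List[Optional[T]] = [None] * depth
        self._head = 0
        self._count = 0

    def push(self, item: T) -> None:
        if self._count == len(self._buf):
            raise OverflowError("FIFO full")
        self._buf[(self._head + self._count) % len(self._buf)] = item
        self._count += 1

    def pop(self) -> T:
        if self._count == 0:
            raise IndexError("FIFO empty")
        item = self._buf[self._head]
        self._head = (self._head + 1) % len(self._buf)
        self._count -= 1
        return item  # type: ignore[return-value]

    def __iter__(self) -> Iterator[T]:
        # Iterator: yields items oldest-first, hiding the wrap-around addressing.
        for k in range(self._count):
            yield self._buf[(self._head + k) % len(self._buf)]  # type: ignore[misc]

def find_if(items: Iterable[T], pred: Callable[[T], bool]) -> Optional[T]:
    """Algorithm: first element satisfying pred, for any iterable container."""
    for x in items:
        if pred(x):
            return x
    return None
```

The algorithm never sees buffer indices, so the FIFO's storage (and, in hardware, its memory organization) can change without touching `find_if`, which is the decoupling the STL port aims for.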
System level design using SystemC: a case study of block turbo decoder
Erwan Piriou, Christophe Jégo, Patrick Adde and Michel Jézéquel. Electronics Department, GET/ENST Bretagne, CNRS TAMCIC, Brest, France, firstname.surname@enst-bretagne.fr
THE objective of this article is to present the results of a project in which a conventional design flow is replaced by a system-level design flow using the SystemC language. Traditional methods for designing hardware circuits for digital communication systems use an RTL specification. However, they suffer from heavy limitations that prevent them from efficiently addressing the algorithmic complexity and the high flexibility required by the various application profiles. SystemC 2.0 is a standard design and verification language that spans from concept to implementation. One of the primary goals of SystemC is to enable the modeling of systems that might be implemented in software, hardware or some combination of the two. A System on Chip may be defined as a complex circuit that integrates the major functional elements of a complete system. Recently, leading FPGA suppliers have developed a new type of programmable circuit: the System on a Programmable Chip. This technology integrates software and hardware resources into the same FPGA, where one or more processor cores available in the programmable circuit execute the software. This approach provides the flexibility to integrate memory, processors, peripherals and other intellectual property (IP) blocks into the same chip. This flow was successfully applied to the design of a (128,120,4)² block turbo decoder. Currently, the iterative process known as turbo coding is the most efficient channel coding technique for digital communications. Block turbo codes are an alternative to convolutional turbo codes. They are especially attractive for high-speed applications and offer a very good coding gain at high code rates. The functional blocks and the control unit of the elementary (128,120,4) BCH decoder were implemented as hardware modules and as a software module, respectively. For this reason, our architectural solution includes a Nios embedded processor. The (128,120,4)² block turbo decoder was integrated into an Altera Nios Development Kit, Stratix Professional Edition, which is based on the Stratix EP1S40 device.
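The parameters of the (128,120,4)² block turbo code follow directly from its component code: in a product code, the code length, the number of information bits and the minimum Hamming distance all multiply. A short worked check in Python (the `Code` type and `product` function are ours):

```python
from fractions import Fraction
from typing import NamedTuple

class Code(NamedTuple):
    n: int  # code length
    k: int  # information bits
    d: int  # minimum Hamming distance

def product(c1: Code, c2: Code) -> Code:
    """Product code: rows encoded with c1, columns with c2."""
    return Code(c1.n * c2.n, c1.k * c2.k, c1.d * c2.d)

bch = Code(128, 120, 4)
btc = product(bch, bch)
rate = Fraction(btc.k, btc.n)
print(btc, rate, float(rate))
```

This gives n = 16384, k = 14400, d = 16 and a code rate of 225/256 (about 0.88), consistent with the "high code rate" regime mentioned above. Since each row and each column is a (128,120,4) codeword, a block turbo decoder typically processes 128 rows and then 128 columns per iteration with the elementary decoder.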
THE complexity of digital electronic systems has increased considerably in recent years. Software development costs will overtake hardware development costs in future technology processes. In this paper we address this open issue. The main objective is to propose a new methodology that enables the reuse of SW code in simulation and prototyping, in order to reduce time-to-market and increase the reliability of the design (Fig. 1). This new methodology uses SystemC and is object-oriented. In this paper we apply our methodology to the case study of a high-speed LVDS interface DMA transmitter module developed at Hewlett-Packard BPO Spain.
[Fig. 1: Proposed design flow. Recoverable labels: Verilog stimulus, conversion, C++ stimulus; FM; CPU (compile); RTL; prototyping (FPGA / ASIC); FM testbench; FM SystemC testbench; simulator; electronic engineer; SW engineer.]
Comparing Design Flows for Structural System Level Specifications facing FPGA Platforms
D. Castells, M. Monton, R. Pla, D. Novo, A. Portero, O. Navas, J. Farr, L. Ribas, J. Carrabina. Cephis, Universitat Autònoma de Barcelona, Bellaterra, Spain, David.Castells@uab.es
SYSTEM-level design methodologies introduce new design flows that are complementary to the ones provided by existing HDL-based toolsets. As a result, a variety of tools and methodologies, driven by different actors in the microelectronics arena, are available for the design of complex microelectronic systems. This paper compares three system-level design methodologies, derived from MATLAB, SystemC and JHDL, together with the classical use of an HDL (in this case VHDL). A high-speed sorter, defined at structural level, is used as a common specification to test the different methods. Independent development teams, each experienced in one toolset, are given the same specification. Several development process indicators are selected in order to compare the productivity of teams and tools. The performance and area usage of the synthesized circuits are also measured and compared. Results are presented for the different development phases (as shown in Figure 1): simple design development, unit simulation test, final design development, complete simulation test, and synthesis and physical verification. The results show that, for this particular experiment, MATLAB and JHDL have been more productive than the other methods, especially SystemC. All methodologies produce circuits with similar area usage but different timing characteristics, VHDL being the most efficient.
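The abstract does not describe the internal structure of the sorter, so as an illustration of what a "structural-level" sorter specification can look like, the Python sketch below builds an odd-even transposition sorting network: a fixed list of compare-exchange stages whose comparators within a stage touch disjoint wires and can operate in parallel in hardware. The choice of network and all names are our assumptions.

```python
from typing import List, Sequence, Tuple

Stage = List[Tuple[int, int]]

def odd_even_transposition_network(n: int) -> List[Stage]:
    """n stages; stage s compares wires (i, i+1) for i = s%2, s%2+2, ...
    Comparators in one stage are independent (parallel in hardware)."""
    return [[(i, i + 1) for i in range(s % 2, n - 1, 2)] for s in range(n)]

def apply_network(network: List[Stage], values: Sequence[int]) -> List[int]:
    """Simulate the network: each comparator swaps its pair if out of order."""
    v = list(values)
    for stage in network:
        for i, j in stage:
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
    return v

net = odd_even_transposition_network(8)
print(apply_network(net, [5, 1, 7, 3, 8, 2, 6, 4]))
```

For 8 inputs this network uses 8 stages and 28 comparators (n(n-1)/2); structures such as bitonic networks trade a different comparator count for a shallower depth, which is the kind of area/timing trade-off the compared flows expose.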
Session 9d
Noise in Electronics
Friday Nov. 26, 14h00-15h15, Lacanau Room. Chairs: André Touboul (U. Bordeaux 1), Antonio Rubio (U. Politècnica de Catalunya)
THIS work compares state-of-the-art Y-parameter-based bipolar noise models using SiGe low-noise heterojunction bipolar transistors: the Spice model, the thermodynamic approach model (TDA), the correlated shot-noise model (CSN) and the TDA/CSN interpolation model. The minimum noise figure, Fmin, is compared using measured and simulated Y-parameters, and both are compared with Fmin measurements in the low-GHz frequency range (from dc up to 6 GHz). Simulated Y-parameters are extracted from the Spice small-signal equivalent circuit using Gummel-Poon expressions. Measured Y-parameters are obtained from Infineon Technologies. Plots of the minimum noise figure against frequency and against collector current are shown in Figs. 1 and 2. Using simulated Y-parameters, all models fit very well. However, using measured Y-parameters, good agreement is obtained only at low currents. The discrepancies between measured and modeled Fmin are discussed, and two possible improvements are suggested to correct the inaccuracy at large currents.
Fig. 1. Comparison of modeled and measured Fmin versus frequency (Ic = 20 mA, τn = 0.75 ps) with measured Y-parameters.
Fig. 2. Comparison of modeled and measured Fmin versus collector current (f = 6 GHz, τn = 0.75 ps) with simulated Y-parameters.
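To illustrate how Fmin follows from Y-parameters, the sketch below implements only the simplest, SPICE-like case: uncorrelated base (2qIB) and collector (2qIC) shot-noise currents placed at the input and output ports of the Y-parameter model. Converting them to input-referred sources gives vn = -ic/Y21 and in = ib + Y11·vn, hence Rn = 2qIC/(4kT|Y21|²), Gu = 2qIB/(4kT), Ycor = Y11, and Fmin = 1 + 2Rn(Gcor + Gopt) with Gopt = sqrt(Gu/Rn + Gcor²). It deliberately omits base-resistance thermal noise and the correlation terms of the TDA and CSN models compared in the paper; the function name is ours.

```python
import math

Q = 1.602176634e-19   # elementary charge (C)
K_B = 1.380649e-23    # Boltzmann constant (J/K)

def fmin_shot_noise(y11: complex, y21: complex, ib: float, ic: float,
                    temp: float = 290.0) -> float:
    """Minimum noise factor (linear) for uncorrelated base and collector
    shot noise in a Y-parameter two-port (simplified SPICE-like model)."""
    four_kt = 4.0 * K_B * temp
    # Input-referred sources: vn = -i_c/Y21, i_n = i_b + Y11*vn (noise currents).
    rn = 2.0 * Q * ic / (four_kt * abs(y21) ** 2)   # noise resistance Rn
    gu = 2.0 * Q * ib / four_kt                     # uncorrelated conductance Gu
    gcor = y11.real                                 # Ycor = Y11 in this model
    gopt = math.sqrt(gu / rn + gcor ** 2)           # optimum source conductance
    return 1.0 + 2.0 * rn * (gcor + gopt)

def to_db(f: float) -> float:
    """Noise factor to noise figure in dB."""
    return 10.0 * math.log10(f)
```

At low frequency, with Y11 ≈ gm/β and Y21 ≈ gm, this reduces to the textbook result Fmin = 1 + 1/β + sqrt(1/β + 1/β²), roughly 1 + 1/sqrt(β); feeding measured instead of simulated Y-parameters into such expressions is what exposes the model discrepancies discussed above.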
THE noise generated by digital circuits is one of the most limiting factors in implementing mixed-signal integrated circuits. The integration of digital and analog circuits on the same silicon die is conditioned and limited by the noise levels generated in the digital section, together with the increasingly demanding performance requirements of the analog and radio frequency (RF) sections. This work describes the most important spectral characteristics of digital noise, analyzing its relation to the current demanded by digital gates and to the resonant circuit network formed by the digital circuit parasitics together with the package and substrate. First, we analyze the spectral content and characteristics of the digital switching current waveform, deriving its influence on the digital noise power spectrum. This current source is modeled in the frequency and time domains by a generalized expression. The second relevant factor analyzed is the resonant circuit network, which determines a transfer function from the primary noise source to the node where the magnitude and effects of the digital noise are evaluated. This transfer function (actually a transimpedance) multiplied by the spectral content of the excitation source determines the overall characteristics of the digital noise power spectrum. The circuit transfer function acts as a filter that modifies the characteristic spectrum of the digital switching current, mainly near the package resonance frequency. In this work, analytical models that predict the main characteristics of the digital noise are presented. These models are applied to several examples, showing good agreement between the analytical expressions and simulation results.
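The "transimpedance times current spectrum" product described above can be sketched numerically. The sketch assumes a periodic triangular switching-current pulse (peak i_peak, base width t_w, one pulse per clock period) and a supply network made of a package R-L branch in parallel with the on-chip decoupling capacitance C; both are illustrative assumptions, not the generalized current expression or the network used in the paper, and the function names are hypothetical. It returns the amplitude of each clock harmonic of the supply noise, |Z(n·f_clk)|·|c_n|; the corresponding power is the square.

```python
import math
from typing import List, Tuple

def supply_impedance(f: float, r: float, l: float, c: float) -> complex:
    """Transimpedance seen by the switching current: package R-L branch in
    parallel with the on-chip decoupling capacitance C (requires f > 0)."""
    w = 2.0 * math.pi * f
    z_rl = complex(r, w * l)
    z_c = 1.0 / complex(0.0, w * c)
    return z_rl * z_c / (z_rl + z_c)

def triangle_harmonic(n: int, i_peak: float, t_w: float, t_clk: float) -> float:
    """Fourier-series coefficient c_n of a periodic triangular current pulse:
    (i_peak * t_w / (2 t_clk)) * sinc^2(n t_w / (2 t_clk))."""
    x = n * t_w / (2.0 * t_clk)
    sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    return i_peak * t_w / (2.0 * t_clk) * sinc ** 2

def noise_harmonics(n_max: int, f_clk: float, i_peak: float, t_w: float,
                    r: float, l: float, c: float) -> List[Tuple[float, float]]:
    """(frequency, |V_n|) for clock harmonics 1..n_max: |Z(n f_clk)| * |c_n|."""
    t_clk = 1.0 / f_clk
    return [(n * f_clk,
             abs(supply_impedance(n * f_clk, r, l, c))
             * abs(triangle_harmonic(n, i_peak, t_w, t_clk)))
            for n in range(1, n_max + 1)]
```

The current spectrum falls off as sinc² (with nulls at multiples of 2·t_clk/t_w), while |Z| peaks near f0 = 1/(2π·sqrt(LC)); harmonics close to this package resonance are therefore amplified, which is the filtering effect described above.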
On the Relation between Digital Circuitry Characteristics and Power Supply Noise Spectrum in Mixed-Signal CMOS IC
Miguel Ángel Méndez, José Luis González, Enrique Barajas, Diego Mateo and Antonio Rubio (mamendez, jlgonzalez, ebarajas, mateo@eel.upc.es, antonio.rubio@upc.es). Electronic Engineering Department, Universitat Politècnica de Catalunya, C/ Jordi Girona 1-3, Campus Nord, C4, 08034 Barcelona, SPAIN
DIGITAL power supply noise is a key issue in the design of mixed-signal and radio frequency (RF) integrated circuits (ICs). Implementing low-cost, high-performance communication systems on a single silicon die (System on Chip, SoC) raises concerns about switching noise coupling from the digital section into the sensitive analog parts of the circuit through the common substrate. In the present work we investigate the relation between the most relevant parameters of the power supply noise spectral density (PSD) and several digital circuit characteristics: clock frequency, power supply voltage, synthesis alternatives (which lead to different circuit topologies) and the technology used. To do so, we have carried out a statistical analysis over a benchmark circuit (the ALU181 from Motorola), which has been extended using mathematically described switching current waveforms. To validate the study, several measurements have been carried out on a test chip (see Figures 1 and 2) built in a 0.35 µm CMOS technology with a high-resistivity p substrate.
[Figure: measured power supply noise PSD (dBV/Hz); panel (a): Vdd = 3.3 V.]
From this study we have derived several relations between circuit characteristics and the power supply noise PSD, which can be used during the design of mixed-signal communication ICs to reduce the generation of digital noise and its effects on sensitive analog parts (for example, by taking the noise PSD into account during RF frequency planning).