Francisco Fons, Mariano Fons, Enrique Cantó
Departament d'Enginyeria Electronica, Electrica i Automatica
Universitat Rovira i Virgili, Tarragona, SPAIN
francisco.fons@estudiants.urv.cat
Abstract - In this paper, an efficient hardware-software architecture is proposed to cope with the implementation of an automatic fingerprint recognition system. Making use of the new paradigm arising from the advances in reconfigurable computing architectures applied to real-time applications of a compute-intensive nature, [...] A flexible field programmable gate array (FPGA) device allows developing the image [...] reconfigured and reused by several custom coprocessors during the different operation stages of the sequential biometric algorithm. The results reached with this technology reveal that a middle-range reconfigurable FPGA copes with both the real-time and the parallel compute-intensive demands of the fingerprint image enhancement process.
I. INTRODUCTION
For certain applications, custom computational hardware based on field programmable gate arrays (FPGAs) shows significant performance improvements over traditional processors. Moreover, an added value, low cost, is accentuated if these logic devices can be reconfigured while the application is in progress. This work contributes to identifying suitable application fields in which reconfigurable computing can be deployed as a cost-effective technological advantage. In this direction, image processing gathers some attractive features that fit this approach well: there, extensive parallel pixel-level computing is usually scheduled and performed sequentially, following a series of chained tasks where the output image of one processing loop is, at the same time, the input image of the next one. Key factors such as parallelism and the flexibility given by the time-multiplexing of hardware resources place reconfigurable logic devices in a good position to displace other cost-competitive solutions, such as microcontroller units [...]
development of an embedded automatic fingerprint authentication system (AFAS). The paper specifically focuses on one of the most computationally demanding stages of the fingerprint recognition algorithm: image enhancement. There, the digital image previously acquired by a specific fingerprint sensor [1] is processed to reach a quality level good enough to support the final fingerprint matching stage. The design flow targets a flexible hardware platform based on the Excalibur EPXA10 system-on-chip (SoC), a concurrent multiprocessor architecture composed of an ARM core and an Altera APEX20KE FPGA. Dedicated hardware coprocessors mapped on the FPGA are responsible for the most time-critical tasks of the biometric recognition algorithm while, in parallel, the ARM processor handles the program flow and takes charge of reconfiguring the FPGA hardware context on the fly, just between the execution of one task and the next [2]. After introducing its application scope and presenting the stages of the fingerprint recognition algorithm in section II, this paper pays special attention to the synthesis of specific reconfigurable hardware coprocessors, illustrated in section III, which accelerate the image processing until real-time performance is achieved. This real-time target, on the contrary, was unreachable when implementing the whole algorithm purely in software on the ARM processor, without FPGA resources, as shown by the experimental figures collected in section IV. The work concludes in section V by evaluating the cost-performance trade-off reached through our reconfigurable architecture oriented to embedded biometric applications such as personal digital assistants (PDAs) or smart cards, thus becoming the first research work that merges fingerprint biometrics and flexible hardware.
II. AUTOMATIC FINGERPRINT AUTHENTICATION

Fingerprint matching is one of the most popular and reliable biometric techniques used in automatic personal recognition. Broadly speaking, a sequence of computationally expensive image processing stages is performed, ending with a verdict on whether a user's fingerprint sample matches a fingerprint template. In an era of ever-increasing security concerns, both the industrial and the scientific communities have recently focused their attention and efforts on this technique: many research groups are nowadays developing accurate electronic computing systems capable of recognizing the identity of a person in real time and with good levels of trust. Conceptually, the image processing algorithm synthesized in an AFAS can be divided into four chained stages [3]: fingerprint image acquisition, in which a bitmap of the fingerprint is captured, typically at 8-bit grayscale and 500 dpi resolution; fingerprint image enhancement, a process usually required to improve the ridge-valley structure of the acquired image until an acceptable level of quality is reached; fingerprint feature extraction, which abstracts the fingerprint image into a kind of personal code composed of the minutiae (the set of ridge endings and ridge bifurcations) and the field orientation (ridge-valley directions); and finally, fingerprint matching. This is the definitive process, where the fingerprint sample acquired and processed in the previous stages is compared with an enrolled fingerprint template to obtain a similarity degree between both images. This resultant value determines whether both fingerprints come from the same finger, thus authenticating their owner. Despite this, at present, the efficient implementation of an embedded AFAS is still an open issue.

III. [...]
The combination of hardware acceleration and flexibility makes reconfigurable FPGAs key devices for implementing efficient computing systems. When interfaced to a microcontroller unit (MCU), they allow the most demanding processing tasks to be moved into dedicated HW coprocessors. Thus, the traditional way of implementing algorithms only in SW is becoming an increasingly obsolete practice, especially in the image processing field, where the current generation of FPGAs, with reconfigurable digital signal processing resources as well as embedded processors, is gradually attracting the interest of this market with powerful SoC platforms. In this direction, this work seeks to exploit the benefits of developing an embedded AFAS prototype able to achieve good levels of security at low cost under an efficient HW-SW architecture, as shown in Fig. 1. Our development platform is the Altera Excalibur EPXA10 SoC. This device integrates an ARM922T core processor, standard peripherals, programmable logic resources in a 1-Mgate APEX20KE FPGA, and both single-port and dual-port on-chip SRAM memory blocks shared by the ARM and the FPGA. All these devices are linked through two internal AMBA AHB interfaces [4]. In addition to the EPXA10 device, the development setup is equipped with an Atmel FingerChip FCD4B14 fingerprint sensor, which is directly connected to the FPGA by means of a dedicated HW controller in order to acquire the fingerprint images of the user. Next, external SDRAM memory, interfaced to an SDRAM controller present in the SoC, stores the fingerprint images and makes them accessible from both the MCU side (the embedded stripe composed of the ARM core, memory and peripherals) and the FPGA side. Another external memory available in our AFAS prototype is FLASH. This non-volatile memory stores several bitstreams corresponding to the different HW contexts that are sequentially downloaded into the FPGA while the biometric algorithm is in progress.
Fig. 1. System architecture of the reconfigurable fingerprint recognition processor.

For this purpose, the EPXA10 SoC integrates a configuration controller through which the MCU can handle the reconfiguration of the FPGA resources at any time during system operation. Thus, inside the FPGA, a dedicated arithmetic coprocessor is synthesized to perform the specific computation needed each time. This evolvable coprocessor is connected, through an AVALON controller, to the internal DP-SRAM accessible by both the ARM and the FPGA. This dual-port memory stores partial data computed in one task that may be accessed or shared by both processors in another task. Furthermore, we have implemented in the FPGA an AHB master controller to access the AHB-based devices, especially the SDRAM controller that connects to the external SDRAM. An AHB slave controller is also implemented in the FPGA. Through this controller, the MCU can configure some registers located in the FPGA that serve as parameters of the synthesized HW computer, for example the Y-X dimensions of the image or some threshold values used by the algorithm, which the MCU can set up just before starting a processing task. Finally, connected to all these HW interfaces, there is a flexible coprocessor that evolves in each phase of the recognition algorithm. Although it is customized in each of these phases, this coprocessor typically consists of internal dual-port RAM memory (LPM DPRAM) to store intermediate results and a specific arithmetic-logic unit (ALU) managed by a finite state machine (FSM). In order to identify the computationally most critical tasks, first of all we have developed all the recognition
[...] lets us obtain the execution load of each task of the software application, to find out where the processor spends most of its time. With this information, we can balance resources versus performance to reach an efficient embedded system implementation. Consequently, the most time-consuming tasks are processed in HW by specific coprocessors instead of SW in order to optimise the application scheduling. Following this idea, the rest of the section discusses the development of the different processing phases involved in the fingerprint image enhancement stage. It covers the set of computational tasks needed to convert the gray-scale fingerprint image captured by the sensor into a good-quality image with a well-defined structure of binarized ridges and valleys.

A. Image Segmentation

The image acquired by the fingerprint sensor usually consists of a foreground or region of interest, composed of the ridges and valleys of the fingertip skin, and the background, constituted by the rest of the area, which carries no information. The goal of the segmentation algorithm is to classify the whole fingerprint bitmap into two groups, valid pixels and invalid ones, so that the next processing phases focus only on the part of the image with useful and reliable information and ignore the rest of the image. Our segmentation algorithm splits the image into a grid of unitary blocks of 8x8 pixels. Each of these blocks is evaluated to decide whether it is discarded from the region to study or, otherwise, remains in it. For this, note that the ridges and valleys of the fingerprint produce a gray-level image with continuous and abrupt changes in pixel intensity, whereas the background holds its intensity constant. Thus, our algorithm uses the gradient operator to evaluate this rate of intensity change at each pixel [5]. The input image is convolved with the Sobel masks in both Y and X directions, considering a 5x5 neighborhood kernel, to determine the orthogonal gradients at each pixel. Once the gradients $G_y$ and $G_x$ are known for each pixel, the resultant magnitude of the gradient vector of each block, $G_{8\times8}$, is approximated by the sum of the absolute values of both directional components over the pixels of the 8x8 block, as formulated below:

$$G_{8\times8}(y,x)=\sum_{j=0}^{7}\sum_{i=0}^{7}\Big(\left|G_y(y+j,\,x+i)\right|+\left|G_x(y+j,\,x+i)\right|\Big)\tag{1}$$

The segmentation criterion (foreground = 0 / background = 1) is determined by comparing the gradient of each block against a parameterized threshold value:

$$\mathrm{SGMNT}_{8\times8}(y,x)=\begin{cases}0, & \text{if } G_{8\times8}(y,x)>\mathrm{SGMNT\_THRSHLD}\\ 1, & \text{otherwise}\end{cases}\tag{2}$$

Fig. 2 shows the HW processor that performs all this computation while the whole image is shifted through it from SDRAM, via the AHB master, until the segmentation result is stored in DP-SRAM via the AVALON interface. Four pixels of the image are processed in parallel every 11 clock cycles.

[Fig. 2. Surviving labels: KERNEL 5x5, PREADDERS, ABS, ADD, REGISTERS, SIGN BIT31]
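Equations (1) and (2) can be checked with a small software reference model before they are mapped to hardware. The sketch below is plain Python and is not the authors' coprocessor: the 5x5 Sobel coefficients (the separable pair built from [1, 4, 6, 4, 1] and [-1, -2, 0, 2, 1]), the clamped image borders and the helper names `gradients` and `segment` are all assumptions, since the text does not list them.

```python
"""Software reference model of the block segmentation of eqs. (1) and (2).

Assumptions (not stated in the paper): separable 5x5 Sobel masks built
from SMOOTH and DERIV below, clamped borders, and image dimensions that
are multiples of the block size.
"""

SMOOTH = [1, 4, 6, 4, 1]
DERIV = [-1, -2, 0, 2, 1]

# KX differentiates along X (columns), KY along Y (rows).
KX = [[s * d for d in DERIV] for s in SMOOTH]
KY = [[d * s for s in SMOOTH] for d in DERIV]


def gradients(img):
    """Return (gy, gx) at every pixel using a 5x5 neighbourhood.

    Correlation form (kernel not flipped); a global sign flip does not
    change the absolute values used by equation (1).
    """
    h, w = len(img), len(img[0])
    gy = [[0] * w for _ in range(h)]
    gx = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy = sx = 0
            for dj in range(-2, 3):
                yy = min(max(y + dj, 0), h - 1)  # clamp at the border
                for di in range(-2, 3):
                    xx = min(max(x + di, 0), w - 1)
                    p = img[yy][xx]
                    sy += KY[dj + 2][di + 2] * p
                    sx += KX[dj + 2][di + 2] * p
            gy[y][x] = sy
            gx[y][x] = sx
    return gy, gx


def segment(img, threshold, block=8):
    """Return the SGMNT map of eq. (2): 0 = foreground, 1 = background."""
    h, w = len(img), len(img[0])
    if h % block or w % block:
        raise ValueError("image size must be a multiple of the block size")
    gy, gx = gradients(img)
    sgmnt = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            # Equation (1): sum of |Gy| + |Gx| over the block.
            g = sum(abs(gy[y][x]) + abs(gx[y][x])
                    for y in range(by, by + block)
                    for x in range(bx, bx + block))
            row.append(0 if g > threshold else 1)
        sgmnt.append(row)
    return sgmnt


if __name__ == "__main__":
    # Vertical ridges in the left 8 columns, flat background elsewhere.
    img = [[(255 if (x // 2) % 2 == 0 else 0) if x < 8 else 128
            for x in range(24)] for _ in range(16)]
    # The middle blocks still see the ridge edge through the 5x5 kernel.
    print(segment(img, threshold=1000))  # -> [[0, 0, 1], [0, 0, 1]]
```

In hardware the threshold would be one of the parameters written by the MCU through the AHB slave registers; in this model it is simply a function argument.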
B. Image Normalization
This step is typically carried out to obtain a fingerprint image with a pre-specified mean and variance. It reduces the dynamic range of the gray scale between ridges and valleys in order to facilitate the next image processing steps. The mean and variance of an image with Y rows and X columns are defined as:

$$M=\frac{1}{YX}\sum_{j=0}^{Y-1}\sum_{i=0}^{X-1}p(j,i),\qquad VAR=\frac{1}{YX}\sum_{j=0}^{Y-1}\sum_{i=0}^{X-1}\big(p(j,i)-M\big)^2=\frac{1}{YX}\sum_{j=0}^{Y-1}\sum_{i=0}^{X-1}p(j,i)^2-M^2\tag{3}$$

and each pixel is normalized as:

$$N(y,x)=\begin{cases}M_0+\sqrt{\dfrac{VAR_0\,\big(p(y,x)-M\big)^2}{VAR}}, & \text{if } p(y,x)>M\\[2ex] M_0-\sqrt{\dfrac{VAR_0\,\big(p(y,x)-M\big)^2}{VAR}}, & \text{otherwise}\end{cases}\tag{4}$$

where M and VAR are the mean and variance of the segmented image (the sums run over its N foreground pixels, so YX is replaced by N), and $M_0$ and $VAR_0$ are the desired mean and variance values for it [6]. This normalization requires two processing loops over the whole image. The first loop can be performed at the same time as the image segmentation: once a whole 8x8 block is processed, if this block belongs to the foreground (SGMNT bit31 = 0), its 8x8 pixel values take part in the computation of the mean and variance. The normalization computer, depicted in Fig. 3, computes the three primitives of the segmented image (the number of pixels N, $\sum p(y,x)$ and $\sum p^2(y,x)$) required to obtain both mean and variance. With these primitives, formula (4) can be rewritten as a linear function to be processed in the second image processing loop:

$$N(y,x)=OFFSET+SLOPE\cdot p(y,x)\tag{5}$$

Since $\sqrt{(p(y,x)-M)^2}=|p(y,x)-M|$, both branches of (4) reduce to $M_0+\sqrt{VAR_0/VAR}\,\big(p(y,x)-M\big)$, so that $SLOPE=\sqrt{VAR_0/VAR}$ and $OFFSET=M_0-SLOPE\cdot M$. Both OFFSET and SLOPE are therefore functions of the previous primitives and are calculated by the MCU. Afterwards, they are downloaded as configurable parameters into the HW context that follows segmentation, where the image normalization and noise suppression tasks are performed. In that same image processing loop, just before the noise suppression task, the image is normalized by performing this linear computation. In this way, the two image loops of the normalization process, in contrast to what would happen in SW, are hidden in time behind other parallel processes, which reduces the system overhead.

Fig. 3. Image normalization coprocessor.

C. Image Noise Suppression

Often, during fingerprint acquisition, some regions of the captured bitmap are disturbed by noise. Nevertheless, many of these regions can be recovered by means of digital signal processing techniques. In our case, we make use of dyadic scale space (DSS) theory: conceptually, the acquired image is decomposed into a series of images that are processed in order to remove the noise at different scales and then composed again, preserving only the true information [7]. This process is performed in our algorithm by convolving the fingerprint image with a 13x13 symmetrical filter. In an image convolution, the value of a pixel p(y,x) is replaced by a value p'(y,x) that depends not only on its current value but also on its neighbors, where the neighborhood depth is defined by the dimensions of the convolution kernel used. This operation involves a loop of multiply-add operations applied at pixel level until the entire fingerprint image is covered. A relevant property of a mask with repeated coefficients, such as our symmetrical kernel, is that the input values sharing a common coefficient can be pre-added before any multiplication takes place. This distributive property is exploited to drastically reduce the hardware resources involved in the implementation of our 2D convolver. Thus, a pre-adder converts the initial kernel of 13x13 8-bit operands into an equivalent set of 28 operands of 11 bits. Another key aspect of the image convolution is an efficient mechanism to move the image pixel by pixel through our computer. For this, a parallel Y-shift and X-shift controller has been designed to displace the image in such a way that each pixel is piped from SDRAM into the HW computer only once per whole image processing loop. Therefore, when a pixel is piped in, all the computing operations that involve it, either as the central pixel of the kernel or as a neighbor of another central pixel, are performed at that moment thanks to a data buffer constituted by a kernel of registers, as depicted in Fig. 4. The 2D-convolver implementation is based on the Vector Multiplier strategy [8]. This HW implementation is much more efficient in area than the approach based on a chain of multipliers and adders. As a result, 24 clock cycles are required to process four pixels in parallel: they flow from SDRAM to the HW computer, where first the normalization and then the convolution with the DSS filter are performed, and finally the four newly processed pixels are sent back to SDRAM.
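The two arithmetic shortcuts of this HW context, the linear form (5) of the normalization and the pre-addition that shrinks the 13x13 kernel to 28 products, can be checked with a small software model. It is a sketch under stated assumptions, not the authors' design: "symmetrical" is read as invariant under horizontal, vertical and diagonal reflection (8-fold symmetry, which is exactly what yields 28 distinct coefficients for a 13x13 kernel), borders are clamped, and since the DSS coefficients are not listed, any 28 placeholder values can be plugged in. All function names are hypothetical.

```python
"""Model of HW context 2: normalization via eq. (5), then a 13x13
symmetric convolution computed with pre-adders.

Assumptions: 8-fold kernel symmetry, clamped borders, caller-supplied
coefficients (the DSS filter values are not given in the paper).
"""
import math

R = 6  # kernel radius of the 13x13 window


def linear_normalization(n, sum_p, sum_p2, m0, var0):
    """Return (OFFSET, SLOPE) of eq. (5) from the primitives N, sum p, sum p^2."""
    mean = sum_p / n
    var = sum_p2 / n - mean * mean  # VAR = mean of p^2 minus M^2, eq. (3)
    slope = math.sqrt(var0 / var)
    return m0 - slope * mean, slope


def coeff_class(dy, dx):
    """Symmetry class (a, b), 0 <= a <= b <= R, of a kernel offset.

    Under 8-fold symmetry the coefficient depends only on {|dy|, |dx|},
    so all offsets in one class share a single coefficient.
    """
    a, b = abs(dy), abs(dx)
    return (a, b) if a <= b else (b, a)


OFFSETS = [(dy, dx) for dy in range(-R, R + 1) for dx in range(-R, R + 1)]
CLASSES = sorted({coeff_class(dy, dx) for dy, dx in OFFSETS})
# Largest number of pixels that a single pre-adder must sum.
MAX_FANIN = max(sum(1 for dy, dx in OFFSETS if coeff_class(dy, dx) == c)
                for c in CLASSES)


def pixel(img, y, x):
    """Pixel value with clamped borders (border policy is an assumption)."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def convolve_direct(img, coeff, y, x):
    """Reference: 169 multiply-adds per output pixel."""
    return sum(coeff[coeff_class(dy, dx)] * pixel(img, y + dy, x + dx)
               for dy, dx in OFFSETS)


def convolve_preadded(img, coeff, y, x):
    """Pre-add pixels that share a coefficient, then do only 28 products."""
    pre = dict.fromkeys(CLASSES, 0)
    for dy, dx in OFFSETS:
        pre[coeff_class(dy, dx)] += pixel(img, y + dy, x + dx)
    return sum(coeff[c] * pre[c] for c in CLASSES)


if __name__ == "__main__":
    print(len(CLASSES), MAX_FANIN)  # -> 28 8
```

The largest pre-adder sums 8 pixels, and 8 x 255 = 2040 < 2^11, which is consistent with the 11-bit pre-added operands described above; the class count of 28 matches the number of operands quoted in the text.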
D. Field Orientation

The field orientation allows the definition of ridges and valleys to be improved by convolving the image with directional filters. To obtain the angle of ridges and valleys, our algorithm uses least mean square orientation estimation [9]. Once the gradients $G_y$ and $G_x$ are calculated for each pixel, the local orientation $\theta_{8\times8}$ of each block of 8x8 pixels is computed as:

$$\theta_{8\times8}(y,x)=\frac{1}{2}\tan^{-1}\left(\frac{\displaystyle\sum_{j=0}^{7}\sum_{i=0}^{7}2\,G_x(y+j,x+i)\,G_y(y+j,x+i)}{\displaystyle\sum_{j=0}^{7}\sum_{i=0}^{7}\Big(G_x^2(y+j,x+i)-G_y^2(y+j,x+i)\Big)}\right)\tag{6}$$

[...]

Finally, the resultant image is submitted to a new loop to smooth and redraw the shapes of both ridges and valleys. This refinement is performed by convolving the image with a new 7x7 filter designed to remove possible spurious or noisy pixels [6]. As in the previous binarization loop, and before the convolution is processed, the smoothing filter used as a template is adapted to the field orientation of each 8x8 block by rotating it by the angle specified for that region.
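Equation (6) reduces to two accumulators per block followed by a single arctangent. The plain-Python sketch below evaluates it for one 8x8 block; `block_orientation` is a hypothetical name, and the quadrant-aware `math.atan2` (which also handles a zero denominator) is our choice for the sketch rather than something the text specifies.

```python
"""Least-mean-square block orientation of equation (6)."""
import math


def block_orientation(gy, gx, by, bx, block=8):
    """Return theta_8x8 in radians, in (-pi/2, pi/2], for the block at (by, bx).

    gy, gx: per-pixel gradient maps (lists of lists), e.g. the ones already
    computed for segmentation. With Gx = cos(phi) and Gy = sin(phi) the
    result is phi (mod pi), i.e. the dominant gradient direction; the
    ridges run perpendicular to it.
    """
    num = 0.0  # accumulates 2 * Gx * Gy
    den = 0.0  # accumulates Gx^2 - Gy^2
    for j in range(block):
        for i in range(block):
            g_x = gx[by + j][bx + i]
            g_y = gy[by + j][bx + i]
            num += 2.0 * g_x * g_y
            den += g_x * g_x - g_y * g_y
    return 0.5 * math.atan2(num, den)


if __name__ == "__main__":
    ones = [[1.0] * 8 for _ in range(8)]
    zeros = [[0.0] * 8 for _ in range(8)]
    print(block_orientation(zeros, ones, 0, 0))  # gradient along X -> 0.0
```

Keeping the numerator and denominator as separate sums lets a hardware implementation accumulate them pixel by pixel and evaluate the arctangent only once per block.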
IV. PERFORMANCE EVALUATION

In a first approach, the fingerprint image enhancement is carried out only through SW tasks by the MCU of our SoC platform, shown in Fig. 6. The resulting execution times are very poor, so much so that this approach is not feasible for an embedded AFAS with real-time constraints. The second approach combines HW and SW developed in the VHDL and C languages: the image enhancement algorithm is now synthesized through a set of time-multiplexed HW coprocessors, sequentially downloaded into the FPGA by the MCU while the biometric phases are in progress. In this new scenario, the FPGA takes charge of the computing whereas the MCU handles the program flow and the reconfiguration stages. Moreover, before a HW context is reconfigured, all the relevant data are preserved in DP-SRAM/SDRAM, since the FPGA is held in reset while the MCU reconfigures it. Later, when the new FPGA context is operative, these data are recovered by both processors.

TABLE I. PROCESSING TIME

HW context   Fingerprint image enhancement processing task   SW-only    HW-SW
1            Segmentation + Normalization                     1720 ms    25 ms
2            Normalization + Noise Suppression                16600 ms   35 ms
3            Field Orientation                                1820 ms    25 ms
4            Binarization                                     24600 ms   40 ms
5            Smoothing                                        7040 ms    40 ms

The performance comparison collected in Table I shows the time spent processing the same 512x256-pixel fingerprint image through all the phases of the enhancement algorithm when only the MCU (purely SW solution) or both the MCU and the FPGA (HW-SW co-design) are working at 50 MHz. In the HW-SW approach, the performance of our AMBA AHB-based system is limited by its 32-bit data bandwidth for the transfers between ARM/FPGA and SDRAM. For this reason, our HW design places four 2D convolvers on the FPGA at the same time. Thus, four 8-bit pixels are processed in parallel, corresponding to the four 8-bit data cyclically transferred via the AHB bus. In addition to the HW processing time of each task, the reconfiguration between two consecutive HW contexts is carried out at 16 MHz through the configuration controller, and the time spent to reconfigure the 1-Mgate FPGA is 720 ms. Concerning the resources used, each one of the HW contexts placed on the [...] kbits of memory, i.e. 30% of the programmable resources available in the EPXA10 device. Therefore, both cost and reconfiguration time can be notably reduced by fitting the same design on a smaller 400-kgate Excalibur EPXA4 SoC.

Fig. 6. Development platform of our AFAS prototype.

V. CONCLUSIONS

As far as the authors know, this is the first research work that merges reconfigurable computing and biometrics to build an efficient and cost-effective embedded fingerprint computer able to perform the fingerprint recognition algorithm, a batch process composed of several image/signal processing stages, in real time. The achieved results prove that a set of custom, reconfigurable HW coprocessors mapped on an FPGA speeds up the algorithm, partitioned into sequential tasks and processed on the same silicon area, by about two orders of magnitude compared with a purely SW implementation on a general-purpose MCU. Although other existing HW/SW solutions already reached this acceleration rate [11], the novelty and key factor of our solution lies in the silicon saving that comes from HW time-multiplexing: those other HW-SW approaches, based on a static HW design of the entire algorithm with all the HW contexts placed on silicon all the time, are too expensive for most embedded applications in our cost-competitive real world.

REFERENCES

[1] M. Fons, F. Fons, N. Canyellas, E. Cantó and M. López, "Hardware-Software Co-design of an Automatic Fingerprint Acquisition System," [...]
[2] [...] Control, http://www.altera.com/
[3] D. Maltoni, D. Maio, A.K. Jain and S. Prabhakar, Handbook of Fingerprint Recognition, Springer Verlag, 2003.
[4] ARM Corp., AMBA Specification (Rev 2.0), http://www.arm.co
[5] D. Maio and D. Maltoni, "Direct Gray-Scale Minutiae Detection in Fingerprints," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, pp. 27-40, Jan. 1997.
[6] L. Hong, Y. Wan and A. Jain, "Fingerprint Image Enhancement: Algorithm and Performance Evaluation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 777-789, 1998.
[7] J. Cheng and J. Tian, "Fingerprint Enhancement with Dyadic Scale-Space," Pattern Recognition Letters 25, Elsevier, pp. 1273-1284, 2004.
[8] Atmel Corp., 3x3 Convolver with Run-Time Reconfigurable Vector Multiplier in Atmel AT6000 FPGAs, 1999, http://www.atmel.com/
[9] L. Hong, A. Jain, S. Pankanti and R. Bolle, "Fingerprint Enhancement," Proc. IEEE WACV, pp. 202-207, 1996.
[10] F. Fons, M. Fons, E. Cantó and M. López, "Trigonometric Computing Embedded in a Dynamically Reconfigurable CORDIC System-on-Chip," LNCS 3985, Springer Verlag, pp. 122-127, March 2006.
[11] P. Schaumont and I. Verbauwhede, "ThumbPod Puts Security under your Thumb," Xilinx Xcell Journal, October 2003 (Winter 2004).