
212
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 16, NO. 2, FEBRUARY 2008
In SSTA, one also encounters variables of the form $X = \max(\max(X_1, X_2), X_3)$, $X = \max(\max(\max(X_1, X_2), X_3), X_4)$, and so on. The exact distributions of these variables can also be calculated. For example, if $\max(X_1, X_2)$ and $X_3$ are assumed independent, then the pdf and the cdf of $X = \max(\max(X_1, X_2), X_3)$ are

$$f_X(x) = F_{\max(X_1, X_2)}(x)\, f_{X_3}(x) + f_{\max(X_1, X_2)}(x)\, F_{X_3}(x)$$

$$F_X(x) = F_{\max(X_1, X_2)}(x)\, F_{X_3}(x)$$

respectively. If $\max(X_1, X_2)$ and $X_3$ are not independent, then expressions similar to those in Sections II–IV could be obtained by assuming that $(X_1, X_2, X_3)$ follows the trivariate normal distribution.

FPGA Implementation(s) of a Scalable Encryption Algorithm

F. Macé, F.-X. Standaert, and J.-J. Quisquater
Abstract: SEA is a scalable encryption algorithm targeted at small embedded applications. It was initially designed for software implementations in controllers, smart cards, or processors. In this letter, we investigate its performance in recent field-programmable gate array (FPGA) devices. For this purpose, a loop architecture of the block cipher is presented. Beyond its low-cost performance, a significant advantage of the proposed architecture is its full flexibility for any parameter of the scalable encryption algorithm, obtained by taking advantage of generic VHDL coding. The letter also carefully describes the implementation details that keep the area requirements small. Finally, a comparative performance discussion of SEA with the Advanced Encryption Standard Rijndael and ICEBERG (a cipher designed for efficient FPGA implementations) is proposed. It illustrates the interest of platform/context-oriented block cipher design and, as far as SEA is concerned, its low area requirements and reasonable efficiency.

Index Terms: Block ciphers, constrained applications, field-programmable gate array (FPGA) implementations, modular design.
I. INTRODUCTION

Scalable encryption algorithm (SEA) is a parametric block cipher for resource-constrained systems (e.g., sensor networks, RFIDs) that was introduced in [1]. It was initially designed as a low-cost encryption/authentication routine (i.e., with small code size and memory) targeted at processors with a limited instruction set (i.e., AND, OR, XOR gates, word rotation, and modular addition). Additionally, and contrary to most block ciphers (e.g., the DES [2] and the AES Rijndael [3], [4]), the algorithm takes the plaintext, key, and bus sizes as parameters and, therefore, can be straightforwardly adapted to various implementation contexts and/or security requirements. Compared to older solutions for low-cost encryption such as the tiny encryption algorithm (TEA) [5] or Yuval's proposal [6], SEA also benefits from a stronger security analysis, derived from recent advances in block cipher design and cryptanalysis.

In practice, SEA has been shown to be an efficient solution for embedded software applications using microcontrollers, but its hardware performance has not yet been investigated. Consequently, and as a first step towards hardware performance analysis, this letter explores the features of a low-cost field-programmable gate array (FPGA) encryption/decryption core for SEA. In addition to the performance evaluation, we show that the algorithm's scalability can be turned into a fully generic VHDL design, so that any text, key, and bus size can be straightforwardly reimplemented with standard synthesis and implementation tools, without any modification of the hardware description.

In the rest of this letter, we first provide a brief description of the algorithm specifications. Then, we describe the details of our generic loop architecture and its implementation results. Finally, we discuss some illustrative comparisons of the hardware performance of SEA, the AES Rijndael, and ICEBERG (a cipher designed for efficient FPGA implementations) with respect to their design approach (e.g., flexible versus platform/context-oriented).
Manuscript received November 21, 2006. The work of F. Macé was supported by the FRIA Grant, Belgium. The work of F.-X. Standaert was supported by the Belgian Fund for Scientific Research. The authors are with the UCL Crypto Group, Laboratoire de Microélectronique, Université Catholique de Louvain, B-1348 Louvain-La-Neuve, Belgium (e-mail: francois.mace@uclouvain.be; fstandae@uclouvain.be; jean-jacques.quisquater@uclouvain.be). Digital Object Identifier 10.1109/TVLSI.2007.904139
1063-8210/$25.00 © 2007 IEEE
Fig. 1. Encrypt/decrypt round and key round.
II. ALGORITHM DESCRIPTION

A. Parameters and Definitions

$\mathrm{SEA}_{n,b}$ operates on various text, key, and word sizes. It is based on a Feistel structure with a variable number of rounds, and is defined with respect to the following parameters:
$n$: plaintext size, key size;
$b$: processor (or word) size;
$n_b = n/(2b)$: number of words per Feistel branch;
$n_r$: number of block cipher rounds.
The only constraint is that $n$ must be a multiple of $6b$ (see [1] for details). For example, using an 8-bit processor, we can derive a 96-bit block cipher, denoted $\mathrm{SEA}_{96,8}$.

Let $x$ be an $n/2$-bit vector. We consider the following two representations:
Bit representation: $x_b = x(n/2-1) \ldots x(2)\,x(1)\,x(0)$.
Word representation: $x_W = x_{n_b-1}\,x_{n_b-2} \ldots x_2\,x_1\,x_0$.

B. Basic Operations

Due to its simplicity constraints, $\mathrm{SEA}_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows:
1) bitwise XOR, $\oplus$;
2) addition mod $2^b$, $\boxplus$;
3) a 3-bit substitution box $S := \{0, 5, 6, 7, 4, 3, 1, 2\}$ that, for efficiency purposes, can be applied bitwise to any set of three words.

In addition, we use the following rotation operations.

Word rotation $R$, defined on $n_b$-word vectors:
$$R: x \rightarrow y = R(x) \Leftrightarrow y_{i+1} = x_i \ (0 \le i \le n_b - 2), \quad y_0 = x_{n_b-1}.$$

Bit rotation $r$, defined on $n_b$-word vectors:
$$r: x \rightarrow y = r(x) \Leftrightarrow y_{3i} = x_{3i}, \quad y_{3i+1} = x_{3i+1} \ggg 1, \quad y_{3i+2} = x_{3i+2} \lll 1,$$
where $0 \le i \le n_b/3 - 1$, and $\ggg$ and $\lll$ respectively represent the cyclic right and left shifts inside a word.

C. Round and Key Round

Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$, and key round $F_K$ are pictured in Fig. 1 and defined as
$$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r(S(R_i \boxplus K_i)), \quad L_{i+1} = R_i$$
$$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}\big(L_i \oplus r(S(R_i \boxplus K_i))\big), \quad L_{i+1} = R_i$$
$$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R\big(r(S(KR_i \boxplus C_i))\big), \quad KL_{i+1} = KR_i.$$

D. Complete Cipher

The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext $P$ under a key $K$ and produces a ciphertext $C$. $P$, $C$, and $K$ have a parametric bit size $n$. The operations within the cipher are performed on parametric $b$-bit words.

    C = SEA_{n,b}(P, K)
    {
      % initialization:
      L_0 & R_0 = P;
      KL_0 & KR_0 = K;

      % key scheduling:
      for i in 1 to ⌊nr/2⌋
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i));
      switch KL_{⌊nr/2⌋}, KR_{⌊nr/2⌋};
      for i in ⌈nr/2⌉ to nr - 1
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(nr - i));

      % encryption:
      for i in 1 to ⌈nr/2⌉
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1});
      for i in ⌈nr/2⌉ + 1 to nr
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1});

      % final:
      C = R_{nr} & L_{nr};
      switch KL_{nr-1}, KR_{nr-1};
    }
where & is the concatenation operator, $KR_{\lfloor n_r/2 \rfloor}$ is taken before the switch, and $C(i)$ is an $n_b$-word vector whose words are all 0 except the least significant word (LSW), which equals $i$. Decryption is exactly the same, using the decrypt round $F_D$.

III. IMPLEMENTATION OF A LOOP ARCHITECTURE

A. Description

The structure of our loop architecture for SEA is depicted in Fig. 2, with the round function on the left and the key schedule on the right. The resource-consuming blocks are the S-boxes and the mod $2^b$ adders; the Word Rotate and Bit Rotate blocks are implemented by simply swapping wires. In accordance with the specifications, the key schedule contains two multiplexors that swap the right and left parts of the round key halfway through the execution of the algorithm, under the control of the Switch command signal. The multiplexor controlled by HalfExec provides the round function with the right part of the round key during the first half of the execution and with its left part after the switch. To support both encryption and decryption, we finally added two multiplexors controlled by the Encrypt signal; the two routing paths cause some additional area consumption.

Fig. 2. Loop implementation of SEA.

The algorithm lends itself easily to a modular implementation, taking as its only mandatory parameters the plaintext and key size $n$ and the word length $b$. The number of rounds $n_r$ is an optional input that can be derived automatically from $n$ and $b$ according to the guidelines given in [1]. From the datapath description of Fig. 2, a scalable design can then be obtained straightforwardly by using generic VHDL coding. Particular care only has to be devoted to an efficient use of the mod $2^b$ adders in the key scheduling part.
In the round function, the mod $2^b$ adders are realized using $n_b$ $b$-bit adders working in parallel, without carry propagation between them. In the key schedule, however, the signal Const_i (provided by the control part) can only take values between 0 and $n_r/2$, so it may not be necessary to use $n_b$ adders. If $\log_2(n_r/2) \le b$, a single adder is sufficient; if $\log_2(n_r/2) > b$, then $\lceil \log_2(n_r/2)/b \rceil$ adders are required. In Section III-B, we detail the implementation results of this architecture for different parameters.

B. Implementation Results

Implementation results were extracted after place and route with the ISE 7.1i tool from Xilinx on a Virtex-4 xc4vlx25 device with speed grade -12. To illustrate the modularity of our architecture, we ran the design tool for different sets of parameters, with plaintext/key sizes $n$ ranging from 48 to 144 bits and word lengths of 4, 6, 7, 8, and 12 bits. For the control part, we used the recommended number of rounds $n_r = [3n/4 + 2(n/(2b) + b/2)]$.¹ The reported implementation costs cover both the operative and control parts. A summary of these results is presented in Table I, which gives the area requirements (in slices), the work frequency, and the throughput.

We observe that the obtained work frequencies are very close for all the implementations. Indeed, the critical path (through the key scheduling multiplexors, a mod $2^b$ adder, the round-function S-box, an XOR operator, and the multiplexor selecting between the encryption and decryption paths) is very similar for all our selected values of $n$ and $b$. For a given $n$, increasing $b$ decreases the number of rounds $n_r$ and, therefore, improves the throughput (since the work frequencies are close in all our examples).
¹ Plus 1 if this term is even, so that $n_r$ is odd.
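The sizing rules of this section can be summarized in a few lines of Python. This is a hypothetical helper sketch, not part of the authors' design flow. `num_rounds` evaluates the recommended $n_r$ with the +1 correction of the footnote, assuming the brackets denote rounding down when $3n/4$ is not an integer; `key_schedule_adders` applies the adder-count rule stated above for the key-schedule constant; and `throughput_mbps` assumes that the loop architecture completes one round per clock cycle with no extra cycles for loading data, so that one $n$-bit block is produced every $n_r$ cycles.

```python
import math


def num_rounds(n: int, b: int) -> int:
    """Recommended n_r = [3n/4 + 2(n/(2b) + b/2)], plus 1 if even.

    2(n/(2b) + b/2) = n/b + b is an integer because n is a multiple of 6b.
    Rounding 3n/4 down is an assumption; it only matters when n is not a
    multiple of 4, which can only happen for odd b.
    """
    nr = 3 * n // 4 + n // b + b
    return nr + 1 if nr % 2 == 0 else nr


def key_schedule_adders(n: int, b: int) -> int:
    """b-bit adders needed for KR_i + Const_i, with Const_i <= n_r/2."""
    bits = math.log2(num_rounds(n, b) / 2)
    return 1 if bits <= b else math.ceil(bits / b)


def throughput_mbps(n: int, b: int, f_mhz: float) -> float:
    """Assumed loop architecture: one round per cycle, n bits every n_r cycles."""
    return n * f_mhz / num_rounds(n, b)


# Usage with a hypothetical 200 MHz clock (not a value from Table I):
for n, b in [(48, 4), (96, 8), (144, 12)]:
    print(n, b, num_rounds(n, b), key_schedule_adders(n, b),
          round(throughput_mbps(n, b, 200.0), 1))
```

Under these assumptions, the round count reproduces the trend noted above: for $n = 144$, $n_r$ falls from 149 with $b = 4$ to 133 with $b = 12$, so at equal clock frequency the throughput rises accordingly.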
TABLE I IMPLEMENTATION RESULTS FOR SEA WITH DIFFERENT n AND b PARAMETERS
TABLE II IMPLEMENTATION RESULTS OF OTHER BLOCK CIPHERS
Similarly, for our set of parameters, increasing $b$ for a given $n$ generally decreases the area requirements in slices. These observations lead to the empirical conclusion that, as long as the $b$ parameter does not limit the work frequency, increasing the word size leads to the most efficient implementations in terms of both area and throughput.

C. Comparisons With Other Block Ciphers

For our comparative discussion, we report a few implementation results of the AES Rijndael in Table II. We selected the implementations in [7]–[9] because their design choices fit relatively well with those of the presented SEA architectures. In particular, these cores use neither RAM blocks nor loop unrolling. The first four cores all correspond to loop architectures with a 128-bit datapath. They have either no pipeline (Pipe0) or a three-stage pipeline (Pipe3) and use lookup-table (LUT)-based or distributed RAM-based S-boxes. The fifth referenced implementation [7] uses a 32-bit datapath and consequently reduces the area requirements at the cost of a lower throughput. Finally, [8] uses a 128-bit datapath with a pipelined composite field description of the S-box. Many other FPGA implementations of the AES can be found in the open literature, e.g., taking advantage of different datapath sizes, FPGA RAM blocks, pipelining, or unrolling techniques [10]–[13].

Additionally, we compared these results with those obtained for ICEBERG, a block cipher optimized for reconfigurable hardware devices. Details on the ICEBERG architecture and different possible implementation tradeoffs are discussed in [14]. The reported result corresponds to a single-round loop architecture without pipeline. Compared to the AES Rijndael, ICEBERG is built upon a combination of 4-bit operations that fit perfectly into the FPGA's LUTs, which by design results in a very good ratio between throughput and area.
The implementation results in Table II lead to the following observations. First, in terms of area requirements (for a datapath size equal to the block size), SEA generally exhibits the smallest cost. Measuring the area efficiency with the bits-per-slice metric leads to a similar conclusion. Of course, the area requirements of, e.g., the AES Rijndael could still be decreased by using smaller datapaths [15], and such a comparative table only serves as an indicator rather than a strict comparison. However, in the present case, these results clearly reflect the low-cost purpose of our implementations. By contrast, the throughput-per-area metric indicates that these low area requirements come with weak throughputs, mainly because of the high number of rounds in SEA. In this respect, it is interesting to compare SEA and ICEBERG, since their implementation results clearly illustrate their respective context/platform-oriented design approaches: SEA targets low-cost applications, while ICEBERG optimizes the throughput per slice. These numbers also confirm the differences between specialized algorithms and standard solutions. It should be stressed, in this respect, that the AES Rijndael still ranks relatively well in terms of hardware cost and throughput efficiency compared to the investigated specialized solutions. Note also that SEA was initially designed for low-cost software implementations. While these design criteria turned out to allow low-cost hardware implementations as well, a cipher targeted specifically at low-cost hardware would likely lead to even better solutions, e.g., [16].
Finally, it is also important to emphasize a number of advantages of SEA that cannot be found in other recent block ciphers, namely its simplicity, its scalability (reimplementing SEA for a new block size does not require rewriting the code), a good combination of encryption and decryption, and the ability to derive keys on the fly in both encryption and decryption.
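The simplicity, scalability, and on-the-fly key derivation listed above can be illustrated with a short software model. The Python sketch below follows the round definitions and the pseudo-C code of Section II; it is an illustrative, unoptimized model, not the authors' reference code or their VHDL design, and it has not been checked against official SEA test vectors. Several conventions are assumptions not fixed by this letter: each Feistel branch is a list of $n_b$ words with index 0 holding the least significant word $x_0$; in the bitsliced S-box, word $x_{3i}$ supplies the least significant bit of each 3-bit input and $x_{3i+2}$ the most significant one; and $C(i)$ writes $i$ in base $2^b$ across several words when $i \ge 2^b$. All function names are hypothetical.

```python
from typing import List, Tuple

Words = List[int]                 # one Feistel branch: n_b words, index 0 = LSW
SBOX = [0, 5, 6, 7, 4, 3, 1, 2]   # S := {0, 5, 6, 7, 4, 3, 1, 2}


def add(x: Words, y: Words, b: int) -> Words:
    """Word-wise addition mod 2^b (no carry between words)."""
    return [(xi + yi) % (1 << b) for xi, yi in zip(x, y)]


def xor(x: Words, y: Words) -> Words:
    return [xi ^ yi for xi, yi in zip(x, y)]


def sbox(x: Words, b: int) -> Words:
    """Apply S bitwise (bitsliced) to each group of three words."""
    y = list(x)
    for g in range(0, len(x), 3):
        out = [0, 0, 0]
        for j in range(b):
            # Assumed bit order: x_{3i} -> LSB, x_{3i+2} -> MSB of the S input.
            v = sum(((x[g + k] >> j) & 1) << k for k in range(3))
            for k in range(3):
                out[k] |= ((SBOX[v] >> k) & 1) << j
        y[g:g + 3] = out
    return y


def bit_rot(x: Words, b: int) -> Words:
    """r: keep x_{3i}, rotate x_{3i+1} right by 1 and x_{3i+2} left by 1."""
    mask = (1 << b) - 1
    y = list(x)
    for g in range(0, len(x), 3):
        y[g + 1] = ((x[g + 1] >> 1) | (x[g + 1] << (b - 1))) & mask
        y[g + 2] = ((x[g + 2] << 1) | (x[g + 2] >> (b - 1))) & mask
    return y


def word_rot(x: Words) -> Words:        # R: y_{i+1} = x_i, y_0 = x_{n_b-1}
    return [x[-1]] + x[:-1]


def word_rot_inv(x: Words) -> Words:    # R^{-1}
    return x[1:] + [x[0]]


def f(x: Words, k: Words, b: int) -> Words:   # r(S(x + k))
    return bit_rot(sbox(add(x, k, b), b), b)


def fe(l: Words, r: Words, k: Words, b: int) -> Tuple[Words, Words]:
    return r, xor(word_rot(l), f(r, k, b))


def fd(l: Words, r: Words, k: Words, b: int) -> Tuple[Words, Words]:
    return r, word_rot_inv(xor(l, f(r, k, b)))


def fk(kl: Words, kr: Words, c: Words, b: int) -> Tuple[Words, Words]:
    return kr, xor(kl, word_rot(f(kr, c, b)))


def const(i: int, nb: int, b: int) -> Words:
    """C(i): LSW = i (written in base 2^b if i >= 2^b, an assumption)."""
    return [(i >> (b * w)) % (1 << b) for w in range(nb)]


def round_keys(kl: Words, kr: Words, nr: int, b: int) -> List[Words]:
    """Round keys of rounds 1..nr, following the key-scheduling loops."""
    nb, h = len(kl), nr // 2
    keys = [kr]                                   # round 1 uses KR_0
    for i in range(1, h + 1):
        kl, kr = fk(kl, kr, const(i, nb, b), b)
        keys.append(kr)                           # KR_i, taken before the switch
    kl, kr = kr, kl                               # switch
    for i in range(h + 1, nr):
        kl, kr = fk(kl, kr, const(nr - i, nb, b), b)
        keys.append(kl)                           # KL_i, used by round i + 1
    return keys


def sea(block: Tuple[Words, Words], key: Tuple[Words, Words],
        nr: int, b: int, decrypt: bool = False) -> Tuple[Words, Words]:
    """Encrypt (F_E) or decrypt (F_D) one block P = L_0 & R_0."""
    l, r = block
    rnd = fd if decrypt else fe
    for k in round_keys(key[0], key[1], nr, b):
        l, r = rnd(l, r, k, b)
    return r, l                                   # C = R_nr & L_nr
```

The same functions serve every $(n, b)$ pair, since $n_b$ is taken from the length of the branches. Because $n_r$ is odd, `round_keys` returns a palindromic sequence ($k_i = k_{n_r+1-i}$), so decryption can run the identical key schedule forward with $F_D$. For instance, with $n_r = 93$, the recommended value for $\mathrm{SEA}_{96,8}$, `sea(sea(P, K, 93, 8), K, 93, 8, decrypt=True)` returns `P` for any 6-word branches in `P` and `K`.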
IV. CONCLUSION

This letter presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization, which comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Directions for further research include low-power ASIC implementations for RFIDs as well as further cryptanalysis efforts and security evaluations.
REFERENCES
[1] F.-X. Standaert, G. Piret, N. Gershenfeld, and J.-J. Quisquater, "SEA: A scalable encryption algorithm for small embedded applications," in Proc. CARDIS, 2006, pp. 222–236.
[2] Data Encryption Standard, FIPS PUB 46-3, Oct. 1999.
[3] J. Daemen and V. Rijmen, The Design of Rijndael. New York: Springer-Verlag, 2001.
[4] Advanced Encryption Standard, FIPS PUB 197, Nov. 2001.
[5] D. Wheeler and R. Needham, "TEA, a tiny encryption algorithm," in Proc. Fast Softw. Encryption (FSE), 1994, pp. 363–366.
[6] G. Yuval, "Reinventing the travois: Encryption/MAC in 30 ROM bytes," in Proc. Fast Softw. Encryption (FSE), 1997, pp. 205–209.
[7] N. Pramstaller and J. Wolkerstorfer, "A universal and efficient AES co-processor for field programmable logic arrays," in Proc. FPL, 2004, pp. 565–574.
[8] F.-X. Standaert, G. Rouvroy, J.-J. Quisquater, and J.-D. Legat, "Efficient implementation of Rijndael encryption in reconfigurable hardware: Improvements and design tradeoffs," in Proc. Cryptograph. Hardw. Embed. Syst. (CHES), 2003, pp. 334–350.
[9] J. Zambreno, D. Nguyen, and A. Choudhary, "Exploring area/delay tradeoffs in an AES FPGA implementation," in Proc. FPL, 2004, pp. 575–585.
[10] K. Gaj and P. Chodowiec, "Fast implementation and fair comparison of the final candidates for advanced encryption standard using field programmable gate arrays," in Proc. Topics Cryptol. (CT-RSA), 2001, pp. 84–99.
[11] G. P. Saggese, A. Mazzeo, N. Mazzocca, and A. G. M. Strollo, "An FPGA-based performance analysis of the unrolling, tiling, and pipelining of the AES algorithm," in Proc. FPL, 2003, pp. 292–302.
[12] A. J. Elbirt, W. Yip, B. Chetwynd, and C. Paar, "An FPGA implementation and performance evaluation of the AES block cipher candidate algorithm finalists," in Proc. AES Candidate Conf., 2000, pp. 13–27.
[13] K. Järvinen, M. Tommiska, and J. Skyttä, "Comparative survey of high-performance cryptographic algorithm implementations on FPGAs," IEE Proc. Inf. Security, vol. 152, pp. 3–12, Oct. 2005.
[14] F.-X. Standaert, G. Piret, G. Rouvroy, and J.-J. Quisquater, "FPGA implementations of the ICEBERG block cipher," in Proc. ITCC, 2005, pp. 556–561.
[15] M. Feldhofer, J. Wolkerstorfer, and V. Rijmen, "AES implementation on a grain of sand," IEE Proc. Inf. Security, vol. 152, pp. 13–20, Oct. 2005.
[16] D. Hong, J. Sung, S. Hong, J. Lim, S. Lee, B.-S. Koo, C. Lee, D. Chang, J. Lee, K. Jeong, J. Kim, and S. Chee, "HIGHT: A new block cipher suitable for low-resource devices," in Proc. Cryptograph. Hardw. Embed. Syst. (CHES), 2006, pp. 13–20.
