
A HARDWARE IMPLEMENTATION OF THE ADVANCED ENCRYPTION STANDARD (AES) ALGORITHM USING SYSTEMVERILOG

Bahram Hakhamaneshi
Computer Engineering Program
California State University, Sacramento
Sacramento, CA 95819 USA
hakhamab@ecs.csus.edu

Behnam S. Arad
Computer Science Department
California State University, Sacramento
Sacramento, CA 95819 USA
arad@ecs.csus.edu

Abstract

In this paper, a hardware implementation of the AES128 encryption algorithm is proposed. A unique feature of the proposed pipelined design is that the round keys, which are consumed during different iterations of encryption, are generated in parallel with the encryption process. This lowers the delay associated with each round of encryption and reduces the overall encryption delay of a plaintext block, which in turn increases the message encryption throughput. The proposed pipelined design was modeled and validated in the SystemVerilog hardware description language. It was synthesized using the Synopsys Design Compiler tool and the LSI_10K technology library. The synthesized gate-level netlist can operate at a frequency of 40 MHz. To estimate the speed gain of the hardware implementation, a virtual system was created using the Virtutech Simics [7] software to emulate the execution of a C program that implements the AES128 encryption in software. The Simics virtual system is based on Intel's x86 architecture with the 440BX chipset and has a 2 GHz Pentium 4 processor. The results indicate that the hardware implementation proposed in this paper is at least 60 times faster than the software implementation.

Keywords: AES, SystemVerilog, Encryption, Hardware Implementation, Simics, Synthesis

1 INTRODUCTION

In today's digital world, encryption is emerging as an integral part of all communication networks and information processing systems, protecting both stored and in-transit data. Encryption is the transformation of plaintext into unintelligible data, known as ciphertext, through an algorithm referred to as a cipher. Numerous encryption algorithms are in common use, but the U.S. government has adopted the Advanced Encryption Standard (AES) for use by Federal departments and agencies in protecting sensitive information. The AES is a computer security standard issued by the National Institute of Standards and Technology (NIST) intended for protecting electronic data. The NIST has published the specification of the AES in the Federal Information Processing Standards (FIPS) Publication 197. [1]

The AES can be implemented in either software or hardware. Software implementations of any encryption algorithm have several drawbacks, including the lack of CPU instructions operating on very large operands, word size mismatch on different operating systems, and less parallelism. In addition, software implementations do not meet the speed required by time-critical encryption applications. Hardware acceleration is the use of hardware to replace the software implementation of a task. Hardware implementation of encryption algorithms is therefore an important alternative, as it achieves much higher speed through higher levels of parallelism.

The AES is a subset of a larger encryption algorithm known as Rijndael, which was one of many proposals submitted to the NIST competing to become an encryption standard. In October 2000, the NIST announced the Rijndael algorithm as the winner, based on the best overall score in security, performance, efficiency, implementation capability and simplicity. [2]

The AES algorithm is a symmetric cipher, which uses a single secret key for both encryption and decryption. In addition, the AES is a block cipher, as it operates on fixed-length groups of bits (blocks) and applies the same set of transformations to each block. The AES cipher operates on blocks of 128 bits and uses cipher keys of 128, 192 or 256 bits in length. Although the original Rijndael encryption algorithm could process different block sizes and use several other cipher key lengths, the NIST did not adopt these additional features in the AES. [1]

All the AES cipher transformations are performed on a two-dimensional 4x4 array of bytes (called the State), which is populated with the plaintext at the beginning of the encryption process. The cipher then performs a set of four transformations (SubBytes, ShiftRows, MixColumns, and AddRoundKey) on the State for a certain number of rounds, Nr, which depends on the cipher key length. After the cipher operations are conducted on the State, the final value of the State represents the ciphertext. The AES cipher pseudo code is shown in Figure 1.
Cipher(PlainText, CipherKey, CipherText)
begin
    byte State[4,4]
    State = PlainText
    AddRoundKey(State, RoundKey[0])
    for round = 1 step 1 to Nr-1
        SubBytes(State)
        ShiftRows(State)
        MixColumns(State)
        AddRoundKey(State, RoundKey[round])
    end for
    SubBytes(State)
    ShiftRows(State)
    AddRoundKey(State, RoundKey[Nr])
    CipherText = State
end

Figure 1 AES Cipher

The AES algorithm requires four 32-bit round key words for each encryption round. That is a total of 4*(Nr + 1) round key words, including the initial set required for the first AddRoundKey transformation. All the round keys are derived from the cipher key itself. There is no restriction on cipher key selection, as no weak cipher key has been identified for the AES algorithm. [1]

In the rest of this paper, our proposed pipelined design of the AES128 encryption algorithm is introduced. For more details on the implementation and validation of the design, refer to [9].

2 AES128 PIPELINED DESIGN

A unique feature of the proposed design is that the encryption rounds are pipelined with the round key generation. While the root module performs an encryption iteration on the State using the round keys generated in the previous cycle, the AES128_Key_Expand module produces the next round's set of keys, to be used by the root module in the next encryption iteration. In other words, each AES encryption round n is pipelined with the key generation for round n+1. The round keys consist of four 32-bit words. The round key expansion requires a round constant value, which is provided by the AES128_Rcon leaf module.

The most important advantage of the proposed pipelined design is the lower delay of each encryption iteration, since the round keys for each iteration are present at the beginning of the iteration cycle. The lower delay means faster completion of each round of encryption. This reduces the overall encryption delay and allows the design to operate at higher clock frequencies. The higher clock frequency increases the message encryption rate (throughput), making this design suitable for time-critical encryption applications.

[Block diagram: AES128_Cipher_Top with inputs plaintext (128 b), cipherkey (128 b), ld, rst and clk, and outputs ciphertext (128 b) and done; below it in the hierarchy, AES128_Key_Expand and AES128_Rcon.]

Figure 2 Design Hierarchy

The state diagram of the root module implementing the AES128 cipher is shown in Figure 3. After leaving the Reset state, the root module waits for assertion of the Ld signal, which indicates that a valid set of plaintext and cipher key is available on the input ports. There are ten rounds of transformations, represented by the r1 to r10 states. The r0 state corresponds to the initial AddRoundKey transformation shown in Figure 1. The four cipher transformations are applied to the State in each encryption round in a single clock cycle.
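The control flow of the root module can be approximated by a small state machine model. The sketch below is an assumption-laden illustration: the state names follow the labels of the state diagram, but the transition out of r10 (back to the waiting state) is a guess, since the text does not specify it, and the function names are hypothetical.

```python
def next_state(state, rst, ld):
    """Return the state reached after one rising clock edge."""
    if rst:
        return "Reset"
    if state == "Reset":
        return "Load"                      # wait here for a valid input block
    if state == "Load":
        return "r0" if ld else "Load"      # Ld starts the encryption
    if state == "r10":
        return "Load"                      # ASSUMPTION: not stated in the text
    return f"r{int(state[1:]) + 1}"        # r0 -> r1 -> ... -> r10, one per clock

def done(state):
    """The done output is asserted only in the final round state."""
    return state == "r10"

def simulate(ld_sequence):
    """Apply one Ld value per clock edge, starting from Reset with rst low."""
    state, trace = "Reset", []
    for ld in ld_sequence:
        state = next_state(state, rst=False, ld=ld)
        trace.append((state, done(state)))
    return trace

# Two idle edges, one edge with Ld asserted, then ten edges for the rounds.
trace = simulate([False, False, True] + [False] * 10)
for state, is_done in trace:
    print(state, int(is_done))
```

In this model, ten clock edges separate r0 from r10, which is consistent with one encryption round per clock cycle.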

The proposed AES128 hardware model is a 3-level hierarchical design, as shown in Figure 2. The root module in the hierarchy implements the AES pseudo code displayed in Figure 1. It has two 128-bit inputs for receiving the cipher key and the plaintext. There is also a single-bit input signal, Ld, which indicates the availability of a new plaintext or cipher key block on the input ports. The completion of the encryption process is indicated by asserting the single-bit done output.

[State diagram: Reset, Load, and round states r0 through r10. Transition labels: rst, !rst, Ld, !Ld, clk.]

States      Outputs
R0 to R9    done=0, Ciphertext=z
R10         done=1, Ciphertext=<valid>

Figure 3 AES128 Cipher State Diagram

The round keys used by the root module are generated based on the state diagram shown in Figure 4. The AES128_Key_Expand and the AES128_Rcon modules are responsible for generating the round keys. The AES128_Key_Expand module generates four 32-bit round keys for each round of the encryption process by using the cipher key and the round constant value provided by the leaf module.
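For reference, the round constants defined by FIPS 197 [1] can be produced by repeated doubling in GF(2^8). The sketch below shows that derivation in Python; whether AES128_Rcon computes the values this way or stores them in a lookup table is not stated in this paper, and the function names are hypothetical.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b

def rcon_sequence(rounds=10):
    """Return the round constant byte for rounds 1..rounds (AES-128 uses 10)."""
    values, rc = [], 0x01
    for _ in range(rounds):
        values.append(rc)
        rc = xtime(rc)      # each constant is double the previous one
    return values

# The constant is XORed into the most significant byte of the first key word;
# the other three bytes of the round constant word are zero.
print(" ".join(f"{v:02x}" for v in rcon_sequence()))
# prints: 01 02 04 08 10 20 40 80 1b 36
```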
[State diagram: Reset and round key states r0 through r10. Transition labels: rst, !rst, Ld, !Ld, clk.]

Figure 4 AES128 Round Key Generation

The two state diagrams show that each encryption round is pipelined with the round key generation for the next cipher round. This lowers the delay of each encryption iteration, since the round keys for each cipher round are present at the beginning of the round. The lower delay in each encryption iteration means faster completion of each cipher round. This reduces the overall delay for encrypting a plaintext block and allows the design to operate at higher clock frequencies. The higher clock frequency increases the message encryption throughput.

The proposed pipelined design was modeled and validated in the SystemVerilog hardware description language. [3] The design was thoroughly validated by means of a test infrastructure that used several SystemVerilog-specific features, including Interface and Program. [4] The test infrastructure used Interface to enforce the synchronization and communication protocol between the design and the testbench. The SystemVerilog Program was used as part of the testbench to construct test objects and provide them to the design, while eliminating any potential race condition between the two. The testbench included Functional Coverage to measure the verification progress of the design features and to make sure the design is fully validated. Functional Coverage is collected by a single Cover Group that keeps track of the 128-bit plain_text and cipher_key stimuli. The design verification continues until Functional Coverage reaches 100%. [9] The Synopsys VCS and DVE tools were used for validating the design. The test infrastructure components and their interconnections are shown in Figure 5.

[Block diagram: AES128_Top contains the Clock Generator, AES128_Program, AES128_Interface, and AES128_Cipher_Top with its AES128_Key_Expand and AES128_Rcon modules.]

Figure 5 AES128 Test Infrastructure

3 IMPLEMENTATION RESULTS

In order to estimate the speed gain of the hardware implementation of the AES128 algorithm, the proposed design was synthesized to an optimized gate-level netlist and the operating frequency of the netlist was measured. Then, AES128 was implemented in the C language and ported to a virtual system to measure the speedup of the hardware model over the software implementation.

3.1 Design Synthesis Results

The Synopsys Design Compiler tool [6] was used to synthesize the pipelined design to an optimized gate-level netlist using the LSI_10K technology library. The worst combinational path delay in the gate-level netlist is 24.09 ns, which fits within the 25 ns period of a 40 MHz clock. Thus, the gate-level netlist is capable of operating at a maximum frequency of 40 MHz. In other words, the proposed model can encrypt a block of plaintext in 250 ns, after ten clock cycles.

3.2 AES128 Software Implementation Results

The C program implementing the AES128 encryption algorithm in software was run on a virtual system, and the statistics of the virtual system were gathered before and after encrypting a block of plaintext. The number of CPU cycles required on the virtual system to encrypt a block of plaintext was used to compare the efficiency of the software and hardware implementations.

The virtual system was created using the Virtutech Simics software. The Simics virtual system used in this project is based on Intel's x86 architecture with the 440BX chipset and has a 2 GHz Pentium 4 processor. [8] The section of the C program for which the virtual system statistics were gathered involves copying the plaintext block to the State, generating the round keys from the cipher key, and performing ten rounds of encryption on the State. [9]

The statistics gathered from the virtual system showed that it would take more than 30,000 CPU cycles to encrypt a block of plaintext, assuming one clock per instruction. This is shown in Figure 6. The results indicate that the hardware implementation proposed in this project is at least 60 times faster than the software implementation.

              # of CPU Instructions    CPU Cycles
Callback 1    11591439980005           11591439980005
Callback 2    11591440011742           11591440011742
Difference    31737                    31737
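The reported speedup can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in this paper (the 24.09 ns worst path, the 40 MHz clock, ten clock cycles per block, the two Simics callback counters, and the 2 GHz processor clock); the variable names are illustrative.

```python
WORST_PATH_NS = 24.09            # worst combinational path of the netlist
CLOCK_MHZ = 40                   # operating frequency of the netlist
HW_CYCLES_PER_BLOCK = 10         # clock cycles to encrypt one block in hardware
CALLBACK_1 = 11591439980005      # cycle counter before encryption
CALLBACK_2 = 11591440011742      # cycle counter after encryption
CPU_GHZ = 2.0                    # simulated Pentium 4 clock

clock_period_ns = 1000 / CLOCK_MHZ                  # 25 ns
assert clock_period_ns >= WORST_PATH_NS             # the worst path fits in one period

hw_latency_ns = HW_CYCLES_PER_BLOCK * clock_period_ns   # 250 ns per block
sw_cycles = CALLBACK_2 - CALLBACK_1                     # 31737 cycles
sw_latency_ns = sw_cycles / CPU_GHZ                     # cycles / (cycles per ns)

speedup = sw_latency_ns / hw_latency_ns
print(f"hardware: {hw_latency_ns:.0f} ns, software: {sw_latency_ns:.1f} ns, "
      f"speedup: {speedup:.1f}x")
```

The ratio comes out slightly above 63, which agrees with the claim of at least a 60-fold speedup.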

Figure 6 Software Implementation Statistics

4 CONCLUSION

In this paper, a pipelined design implementing the AES128 encryption standard was proposed. The proposed pipelined design reduces the delay associated with each round of encryption compared to a non-pipelined design. This increases the message encryption throughput. The proposed design was modeled and completely verified using the SystemVerilog hardware description language. The Synopsys Design Compiler tool was used to synthesize the proposed hardware model using the LSI_10K technology library. The optimized gate-level netlist could operate at 40 MHz. In order to get an estimate of the speed gain of the hardware implementation of the AES128 algorithm, a virtual target system was created using the Simics software to emulate the execution of a C program implementing the AES128 cipher in software. The results indicate that the hardware implementation proposed in this project is at least 60 times faster than the software implementation.

Acknowledgements

This project utilized an EDA tool set donated to CSUS by Synopsys, Inc. through its University Program. The final design was simulated on servers donated to CSUS by Intel Corporation based on grant number 56413.

5 REFERENCES

[1] National Institute of Standards and Technology, Federal Information Processing Standards Publication 197, 2001.
[2] Joan Daemen and Vincent Rijmen, The Design of Rijndael: AES - The Advanced Encryption Standard, Springer, 2002, ISBN 3-540-42580-2.
[3] Samir Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, Prentice Hall, 2003.
[4] Chris Spear, SystemVerilog for Verification, Springer, 2008.
[5] Stuart Sutherland, SystemVerilog for Design, Springer, 2006.
[6] Synopsys Inc., Synthesis Quick Reference, 2002.
[7] http://www.virtutech.com/whatissimics.html
[8] Virtutech Inc., Getting Started with Simics, 2009.
[9] Bahram Hakhamaneshi, A Hardware Implementation of the Advanced Encryption Standard (AES) Algorithm Using SystemVerilog, M.S. Project Report, Fall 2009.
