
Implementation of ECC on Reconfigurable FPGA Using Hard Processor System

Hasbi Asshidiq, Arif Sasongko, Yusuf Kurniawan
School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia
hasbi_asshidiq@students.itb.ac.id, asasongko@stei.itb.ac.id, yusufk@stei.itb.ac.id

Abstract-This study designs and implements Elliptic Curve Cryptography (ECC) on a reconfigurable Field Programmable Gate Array (FPGA). The implemented ECC is based on the binary polynomial Galois field GF(2^m), which has the advantage of being well suited to hardware implementation. The chosen ECC sizes are 163, 233, 283, and 409. The ECC program is implemented on the FPGA with a reconfiguration interface on the Hard Processor System (HPS). Both input and output are controlled from a PC. The ECC hardware is developed with a High-Level Synthesis (HLS) tool, which generates Hardware Description Language (HDL) files from C code. A program for configuring the FPGA is implemented on the HPS; it reconfigures the FPGA logic according to the given instructions by uploading the raw binary file (rbf) to the FPGA chip. The program is accessed directly from the PC over an Ethernet cable in a local network. The implementation is carried out on the FPGA DE10 Standard board, using Linux Console BSP as the operating system for the ARM processor in the HPS.

Keywords – ECC, FPGA, HPS, Reconfigurable.

978-1-5386-6670-8/18/$31.00 ©2018 IEEE

I. INTRODUCTION

Cryptography is a mechanism used to protect the confidentiality of data. It can be implemented in software or in hardware. Hardware cryptography has the advantage of faster processing, but the trade-off is a higher price and more difficult development. Hardware cryptography can be implemented, for example, on microcontroller-based boards and on programmable logic devices such as FPGAs. FPGAs offer higher implementation flexibility than other hardware, so they are widely used during development.

Elliptic curve cryptography (ECC) is a type of public key cryptography [1] discovered independently by Neal Koblitz [2] and Victor Miller [3]. Elliptic curves can be defined over three kinds of fields [1]: over the real numbers, which are infinite and less accurate to compute with; over prime Galois fields GF(p), which are best suited to software implementations; and over binary fields GF(2^m), which are best suited to hardware implementations [1]. In this study we calculate ECC scalar multiplication using Lopez-Dahab projective coordinates to minimize the use of inversion and multiplication [4]. Both operations are time-consuming, so the projective model reduces the time cost.

Several aspects are considered in this paper: computing speed, ease of development, and flexibility. The FPGA board serves as hardware that accelerates computation requested from the PC, and the use of projective coordinates is also intended to accelerate computation. Ease of development is addressed by using Autonomous and User Guided High-level synthesis (AUGH), an HLS tool developed by TIMA Laboratory, Université Grenoble Alpes [5]. AUGH generates VHDL files from C files, given constraint inputs such as resources and frequency. Moncef Amara and Amar Siad implemented ECC on an FPGA by writing VHDL by hand for GF(2^163) and GF(2^233) [14]. Writing VHDL by hand typically takes longer than programming with an HLS tool, which generates VHDL code from a given C file; development in C is faster than in a Hardware Description Language (HDL) such as VHDL. In addition, high-level models make it easier for designers with a systems background, rather than a circuits background, to be productive, following the trend toward an increasingly large number of integrated systems.

The reason for using the HPS to reconfigure the FPGA is flexibility in resource usage. The FPGA board is connected to the local network via an Ethernet cable, so it can be accessed from any computer on the local network and act as a shared accelerator resource. Malik Imran et al. implemented ECC on an FPGA [11]; in their work, the design configures one type of ECC through a USB cable. A USB connection cannot support using the FPGA as a shared peripheral.

In this work, several ECC variants (key sizes) are implemented on a single board. Because each ECC size requires different FPGA logic, a reconfiguration program is needed to switch between them. Reconfiguring the FPGA chip can be carried out by
running a program on the HPS included in the board. The HPS contains an embedded ARM processor running Linux, which serves as the environment for developing the program. The FPGA chip is reconfigured from the Linux Console BSP by applying device tree overlays (a dtbo file) that upload the Raw Binary File (rbf) to the FPGA chip.

II. LITERATURE REVIEWS

A. Elliptic Curve Cryptography over GF(2^m)

Elliptic Curve Cryptography (ECC) is a public key cryptography method based on the algebraic structure of elliptic curves over finite fields [1]. The security of ECC rests on the fact that point multiplication is easy to compute, whereas recovering the multiplier from the known starting point and the resulting point is infeasible [1]. The ECC public key is calculated from the private key and the base point as in the equation below [1].

$$k = Q \cdot P \qquad (1)$$

where k is the public key, Q is the private key, and P is the base (starting) point.

The size of the elliptic curve determines the difficulty of the problem. The key benefit promised by ECC is smaller key sizes, which reduce storage and transmission requirements: an elliptic curve can provide the same level of security as an RSA-based system with a much larger modulus. For example, a 256-bit elliptic curve public key provides security comparable to a 3072-bit RSA public key [13].

One of the standards that specifies parameters for implementing ECC is the Standard for Efficient Cryptography (SEC) [12]. Under this standard, an elliptic curve over the finite field GF(2^m) has the following parameters:

$$T = (m, f(x), a, b, G, n, h) \qquad (2)$$

These seven terms consist of an integer m that determines the finite field; an irreducible polynomial f(x) of degree m that determines the representation of the field; two elements a, b ∈ GF(2^m) that determine the elliptic curve E(F_{2^m}) given by the equation below [12];

$$E: y^2 + xy = x^3 + ax^2 + b \qquad (3)$$

a base point G; a prime n, which is the order of G; and an integer h, which is the cofactor. The field sizes m used in this work are prime, and the corresponding irreducible polynomials are shown in Table 2.1 [12].

$$m \in \{163, 233, 283, 409\} \qquad (4)$$

Table 2.1 Recommended Irreducible Polynomials Over GF(2^m)

Field          Reduction Polynomial
GF(2^163)      f(x) = x^163 + x^7 + x^6 + x^3 + 1
GF(2^233)      f(x) = x^233 + x^74 + 1
GF(2^283)      f(x) = x^283 + x^12 + x^7 + x^5 + 1
GF(2^409)      f(x) = x^409 + x^87 + 1

ECC consists of four operation layers [11], as follows:
• Layer 4: Protocols.
• Layer 3: Scalar multiplication.
• Layer 2: Point addition and point doubling.
• Layer 1: Finite field addition, multiplication, squaring, and inversion.

B. AUGH

AUGH is software that generates generic VHDL descriptions from input programs written in C. It is free and open-source software under the AGPLv3 license. Figure 2.1 shows the AUGH generation flow [9].

Figure 2.1 AUGH Generation Flow

As Figure 2.1 shows, AUGH receives as input the C files to be translated, together with constraints such as the target resource area and frequency. These constraints are set in the board plugin of AUGH.

The first generation step is C code parsing. The allocation step creates the main hardware components of the circuit (adders, registers, memory banks, etc.) and determines the minimum set of components needed. The scheduling step packs as many instructions as possible into each control stage of the design (the FSM states); this step optimizes the execution time of the circuit.

The mapping step assigns hardware components to each instruction or operation that the circuit must carry out. This step also creates all needed multiplexers and the FSM. After that, the size of the design can be evaluated and presented to the user. Frequency constraints are handled at this stage: if a control step has a delay
longer than the clock period, the corresponding FSM state is set to last more than one clock cycle. Resource constraints are handled by the Design Space Exploration (DSE) process. Pipelined components are not yet handled.

III. DESIGN AND IMPLEMENTATION

A. Overall Design

The FPGA DE10 Standard board was chosen because it includes an HPS on the board. The HPS contains an embedded ARM processor running Linux, which serves as the environment for developing the main reconfiguration program. The overall functionality of the design is shown in Figure 3.1, and the component interaction diagram corresponding to this functionality is shown in Figure 3.2.

Figure 3.1 Overall Functionality of Design

When the main program starts, it uploads the raw binary file (rbf) to configure the contents of the FPGA chip. The binary file contains the FPGA configuration that computes ECC scalar multiplication. Along with uploading the binary file, the program must access the device tree overlays from the ARM side, because the ARM processor recognizes the FPGA chip as a peripheral. After the binary file is uploaded, the program reads the input file into memory; this binary input file represents the private key. The main program then forwards the contents of this file to the FPGA chip to calculate the corresponding public key. When execution on the FPGA chip has completed, the results are placed in memory that is packaged together with the ARM processor in the HPS. The main program reads this memory after a certain time and forwards the results via Ethernet to the PC, which displays the obtained public key.

As Figure 3.2 shows, Ethernet is used for communication between the PC and the FPGA board in a local network. Note that the FPGA board must obtain an IP address to communicate over the local network. The program that configures the FPGA runs on the Linux Console BSP on the ARM processor.

Figure 3.2 Integrated Component Diagram

B. ECC Implementation

The ECC program was designed in C and translated into VHDL using AUGH as the HLS tool. All data are represented as arrays of 32-bit words. As explained earlier, the ECC implementation is based on GF(2^m), and ECC over this finite field has seven parameters: T = (m, f(x), a, b, G, n, h). The parameters follow the Standard for Efficient Cryptography (SEC) [12]. The four parameter sets implemented in this work are sect163k1, sect233k1, sect283k1, and sect409k1.

The meaning of these names can be explained using sect163k1 as an example. The prefix "sec" indicates that the parameters follow the SEC standard, "t" indicates a binary (polynomial) field GF(2^m), "163" is the field size m, "k" indicates a Koblitz curve, and "1" is the sequence number assigned by SEC. The design uses Koblitz curves because they allow an efficient implementation: on a Koblitz curve, the parameters are fixed constants, with a ∈ {0, 1} and b = 1. The irreducible polynomials f(x) are those listed in Table 2.1 (Section II.A). The other parameters, such as G, n, and h, also follow the SEC standard [12]. The algorithms used to construct the ECC are explained in Sections III.B.I to III.B.III. We implement ECC up to the third layer, scalar multiplication, whose main purpose is to calculate the public key from the private key according to the equation k = Q·P.
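As a concrete illustration of the 32-bit word representation and of the Layer 1 operations described in Section III.B.III, the C sketch below stores an element of the sect163k1 field GF(2^163) in six 32-bit words and implements addition, squaring, left-to-right comb multiplication, and fast reduction modulo f(x) = x^163 + x^7 + x^6 + x^3 + 1, following the word-level fast reduction for this polynomial described in [1]. It is a hypothetical sketch, not the authors' AUGH input: the type gf163, the function names, and the least-significant-word-first layout are assumptions, and squaring spreads bits with a loop instead of the lookup table cited in [7].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: word i holds the coefficients of x^(32i) .. x^(32i+31);
   only the low 3 bits of word 5 are used (degree <= 162). */
#define GF163_WORDS 6
typedef uint32_t gf163[GF163_WORDS];

/* Addition in GF(2^m): coefficient-wise XOR, no carries. */
static void gf163_add(gf163 c, const gf163 a, const gf163 b) {
    for (int i = 0; i < GF163_WORDS; i++) c[i] = a[i] ^ b[i];
}

/* Fast reduction of a polynomial of degree <= 324 held in t[0..10]
   (t is used as scratch), using x^163 = x^7 + x^6 + x^3 + 1. */
static void gf163_reduce(gf163 c, uint32_t t[11]) {
    for (int i = 10; i >= 6; i--) {        /* fold words 10..6 downwards */
        uint32_t w = t[i];
        t[i - 6] ^= w << 29;
        t[i - 5] ^= (w << 4) ^ (w << 3) ^ w ^ (w >> 3);
        t[i - 4] ^= (w >> 28) ^ (w >> 29);
    }
    uint32_t w = t[5] >> 3;                 /* coefficients of x^163 .. x^191 */
    t[0] ^= (w << 7) ^ (w << 6) ^ (w << 3) ^ w;
    t[1] ^= (w >> 25) ^ (w >> 26);
    t[5] &= 0x7u;
    memcpy(c, t, GF163_WORDS * sizeof(uint32_t));
}

/* Insert a 0 bit between consecutive bits of a 16-bit value. */
static uint32_t spread16(uint32_t h) {
    uint32_t r = 0;
    for (int i = 0; i < 16; i++) r |= ((h >> i) & 1u) << (2 * i);
    return r;
}

/* Squaring: a(x)^2 has the same bits at even positions, then reduce. */
static void gf163_sqr(gf163 c, const gf163 a) {
    assert(a[5] <= 0x7u);                   /* input must already be reduced */
    uint32_t t[2 * GF163_WORDS] = {0};
    for (int i = 0; i < GF163_WORDS; i++) {
        t[2 * i]     = spread16(a[i] & 0xFFFFu);
        t[2 * i + 1] = spread16(a[i] >> 16);
    }
    gf163_reduce(c, t);
}

/* Left-to-right comb multiplication with w = 32, then reduction. */
static void gf163_mul(gf163 c, const gf163 a, const gf163 b) {
    assert(a[5] <= 0x7u && b[5] <= 0x7u);
    uint32_t t[2 * GF163_WORDS] = {0};      /* T_0 .. T_{2s-1} */
    for (int j = 31; j >= 0; j--) {
        for (int i = 0; i < GF163_WORDS; i++)
            if ((a[i] >> j) & 1u)           /* jth bit of A_i: T{i} ^= B */
                for (int k = 0; k < GF163_WORDS; k++) t[k + i] ^= b[k];
        if (j != 0) {                       /* T <- x * T (one-bit left shift) */
            for (int k = 2 * GF163_WORDS - 1; k > 0; k--)
                t[k] = (t[k] << 1) | (t[k - 1] >> 31);
            t[0] <<= 1;
        }
    }
    gf163_reduce(c, t);
}
```

A quick consistency check for such a sketch is that squaring any field element 163 times returns the same element, because a^(2^m) = a for every a in GF(2^m); likewise x^81 · x^82 should reduce to x^7 + x^6 + x^3 + 1.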
B.I. Layer 3: Scalar Multiplication

The scalar multiplication implemented in this work uses the projective coordinates introduced by Lopez and Dahab [4]. The purpose of this algorithm is to minimize the cost of inversion and multiplication operations. The scalar multiplication algorithm is as follows:

Input: An integer k ≥ 0 and a point P = (x, y) ∈ E.
Output: Q = kP.
1. If k = 0 or x = 0 then output (0, 0) and stop.
2. Set k ← (k_{l-1} ... k_1 k_0)_2.
3. Set X1 ← x, Z1 ← 1, X2 ← x^4 + b, Z2 ← x^2.
4. For i from l − 2 downto 0 do
     If k_i = 1 then
       Madd(X1, Z1, X2, Z2), Mdouble(X2, Z2).
     Else
       Madd(X2, Z2, X1, Z1), Mdouble(X1, Z1).
5. Return (Q = Mxy(X1, Z1, X2, Z2)).

Madd and Mdouble are point addition and point doubling, which belong to Layer 2, while Mxy transforms the result from projective coordinates to affine coordinates.

B.II. Layer 2

The operations in this layer are not written out in detail because they are lengthy. Instead, Table 3.1 gives the number of finite field operations needed for each Point Doubling (Mdouble), Point Addition (Madd), and Retransformation (Mxy) operation [4].

Table 3.1 Number of finite field operations for each sub-operation of scalar multiplication

Operation    Add    Sqr    Mult    Inv
Madd         1      3      2       0
Mdouble      2      1      4       0
Mxy          6      1      10      1

B.III. Layer 1

This layer includes the finite field operations: addition, inversion, multiplication, and squaring.

• Addition

Addition is performed by XORing each pair of corresponding bits of the operands [1].

• Squaring

Squaring can be carried out using a lookup table [7]. It is performed by inserting a 0 between consecutive bits of the input, as follows:

$$a(z) = a_{m-1}z^{m-1} + \cdots + a_2z^2 + a_1z + a_0$$

$$a(z)^2 = a_{m-1}z^{2m-2} + \cdots + a_2z^4 + a_1z^2 + a_0$$

• Inversion

Inversion uses the Extended Euclidean Algorithm [6], as described below:

Input: a ∈ GF(2^m), a ≠ 0.
Output: a^{-1} mod f(x).
1. b ← 1, c ← 0, u ← a, v ← f.
2. While deg(u) ≠ 0 do
     j ← deg(u) − deg(v).
     If j < 0 then: u ↔ v, b ↔ c, j ← −j.
     u ← u + x^j v, b ← b + x^j c.
3. Return b.

• Multiplication

Multiplication uses the left-to-right comb method [8]; as stated in Section III.B, the word length w is 32 bits.

Input: a = (A_{s-1} ... A_0), b = (B_{s-1} ... B_0), and f = (F_{s-1} ... F_0).
Output: c = (C_{s-1} ... C_0) = ab mod f.
Set T_i ← 0, i = 0, ..., 2s − 1.
For j from w − 1 downto 0 do
  For i from 0 to s − 1 do
    If the jth bit of A_i is 1 then
      For k from 0 to s − 1 do
        Set T_{k+i} ← T_{k+i} ⊕ B_k.
  If j ≠ 0 then T ← xT.  // shift T
Set c ← T mod f.
Return (c).

• Reduction

Reduction is required whenever a field multiplication or squaring is performed. ECCs of different sizes require different reduction operations. This work uses NIST fast reduction for the polynomials f(x) listed in Table 2.1 [1].

IV. PERFORMANCE

The implementation is carried out on a Cyclone V FPGA clocked at 50 MHz. The FPGA reconfiguration time is shown in Table 4.1.

Table 4.1 FPGA chip reconfiguration time

ECC size    Binary file size (kb)    Configuration time (µs)
ECC 163     2303                     179
ECC 233     2384                     179
ECC 283     2440                     189
ECC 409     2748                     199
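The reconfiguration described in Section III.A applies a device tree overlay that makes the kernel load the rbf into the FPGA. The C sketch below shows one way a program on the HPS could do this and time it, which is one possible way to obtain figures like those in Table 4.1. It assumes a kernel that exposes the configfs device-tree overlay interface (as in Intel/Altera SoC FPGA Linux kernels), mounted at a directory such as /sys/kernel/config/device-tree/overlays, with the .dtbo and the rbf it names placed in /lib/firmware; the mount point, the overlay name "ecc", the file names, and the helper names are hypothetical, and the timing is only meaningful if the write returns after the FPGA has been programmed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: (re)apply a device tree overlay through configfs.
   overlays_dir: configfs overlay directory (mount point varies per image)
   name:         overlay item name, e.g. "ecc"
   dtbo:         overlay file name, looked up by the kernel in /lib/firmware
   Returns 0 on success, -1 on failure (errno describes the failing call). */
static int apply_overlay(const char *overlays_dir, const char *name, const char *dtbo) {
    char dir[512], attr[576];
    assert(overlays_dir != NULL && name != NULL && dtbo != NULL);
    if (snprintf(dir, sizeof dir, "%s/%s", overlays_dir, name) >= (int)sizeof dir)
        return -1;
    /* Remove a previously applied overlay (e.g. another ECC size), if any. */
    if (rmdir(dir) != 0 && errno != ENOENT)
        return -1;
    if (mkdir(dir, 0755) != 0)
        return -1;
    snprintf(attr, sizeof attr, "%s/path", dir);
    FILE *f = fopen(attr, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(dtbo, f) >= 0;
    if (fclose(f) != 0)                      /* buffered write is flushed here */
        ok = 0;
    return ok ? 0 : -1;
}

/* Wall-clock duration of apply_overlay in microseconds, or -1.0 on failure. */
static double timed_apply_overlay(const char *overlays_dir, const char *name, const char *dtbo) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (apply_overlay(overlays_dir, name, dtbo) != 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e6 + (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
}
```

On the board, a call such as timed_apply_overlay("/sys/kernel/config/device-tree/overlays", "ecc", "ecc163.dtbo") would switch to the 163-bit design; removing the overlay item first keeps only one ECC configuration active at a time.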
The algorithm is implemented around FIFO queues, so scalar multiplications can be carried out back to back, up to the FIFO depth. For each execution, the program first reconfigures the FPGA chip. A single program execution can process one key pair or several key pairs. Because only one reconfiguration is performed per execution, whether one input or many are given, the relative cost of the reconfiguration time decreases as more inputs are provided.

The following table shows the execution time of each finite field operation used in the ECC. The data were obtained from simulation in ModelSim with a 50 MHz clock.

Table 4.2 The execution time for each finite field operation in ECC

ECC size    Execution time (µs)
            Add     Sqr     Mult      Inv
163         0.44    3.63    54.14     436.46
233         0.54    4.88    82.3      682.4
283         0.6     5.38    96.78     578.6
409         0.94    7.76    164.68    469.88

Table 4.2 shows that the most time-consuming finite field operation is inversion, followed by multiplication. Projective coordinates are used to reduce the number of these operations.

Table 4.3 Scalar multiplication execution time and number of logic gates in ECC

ECC size    Execution time (ms)    Number of logic gates
163         23.59                  14226
233         50.92                  21255
283         75.44                  24245
409         185.74                 34714

Table 4.3 shows that as the ECC size increases, both the execution time and the number of logic gates increase. The number of logic gates results from the optimization performed by AUGH under constraints on the resources available on the board.

The comparisons with a hardware implementation written by hand and with a software implementation are shown in Table 4.4 and Table 4.5.

Table 4.4 Comparison to hardware implementation using RTL programming by hand (without HLS tool)

                        Chang Shu et al. [15]    Proposed Work
Field size              163        233           163        233
ALUT                    25763      35800         14186      23143
F (MHz)                 68.9       67.9          50         50
Execution time (µs)     48         89            23590      50920

According to Table 4.4, the hardware implementation written by hand is far faster than the proposed work, but the proposed design uses fewer logic resources.

Table 4.5 Comparison to software implementation (execution time, ms)

Platform             ECC 163    ECC 233    ECC 283    ECC 409
Intel 1 GHz [16]     3.6        6.4        9.7        19.8
Cyclone V 50 MHz     23.59      50.92      75.44      185.74

Compared with the software implementation, the proposed design is still slower, although it needs fewer clock cycles: for ECC 163, 23.59 ms at 50 MHz corresponds to about 1.18 × 10^6 cycles, versus 3.6 ms at 1 GHz, or 3.6 × 10^6 cycles. The proposed work has an advantage in power consumption: the 1 GHz Intel processor draws 30 W, whereas this design draws 1.92 W (estimated with the PowerPlay Power Analyzer tool). These data show that running the scalar multiplication algorithm on this design requires less power than the software implementation.

Another advantage of using HLS rather than designing ECC at RTL by hand is ease of development, as reflected in the number of lines of code.

Table 4.6 The comparison of the lines of code in the C program using AUGH and in the RTL program created by hand

Number of lines of code (C program using AUGH)    Number of lines of code (RTL by hand) [17]
±650                                              ±2340

V. CONCLUSION

A multi-algorithm implementation requires a reconfiguration program to support flexibility in altering the content of the FPGA chip. This program is executed on the HPS included in the DE10 Standard board, which is connected to the PC via a local network over Ethernet. In this design we implemented multi-algorithm Elliptic Curve Cryptography in four different sizes. All four use projective coordinates to reduce the time cost of field multiplication and inversion operations. In addition, the development process used AUGH as the HLS tool to create programs in C, which eases development at the cost of a longer execution time.

REFERENCES

[1] Hankerson, D., Menezes, A., Vanstone, S. (2004): Guide to Elliptic Curve Cryptography. New York: Springer.
[2] Koblitz, N. (1987): Elliptic Curve Cryptosystems. Mathematics of Computation, vol. 48, no. 177, pp. 203–209.
[3] Miller, V.S. (1986): Use of Elliptic Curves in Cryptography. Advances in Cryptology – CRYPTO '85, pp. 417–426.
[4] Lopez, J., Dahab, R.: Fast Multiplication on Elliptic Curves over GF(2^m) without Precomputation.
[5] Boucle, A.P., Muller, O., Rousseau, F. (2014): Fast and Standalone Design Space Exploration for High-Level Synthesis under Resource Constraints. Journal of Systems Architecture, vol. 60, pp. 79–93. TIMA Laboratory, CNRS/Grenoble-INP/UJF.
[6] Hankerson, D., Hernandez, J.L., Menezes, A.: Software Implementation of Elliptic Curve Cryptography over Binary Fields.
[7] Aranha, D.F., Dahab, R., Lopez, J., Oliveira, L.B. (2010): Efficient Implementation of Elliptic Curve Cryptography in Wireless Sensors. Advances in Mathematics of Communications.
[8] Lopez, J., Dahab, R.: High-Speed Software Multiplication in F_{2^m}.
[9] Boucle, A.P. (2017): AUGH User Guide.
[10] Stallings, W. (2014): Cryptography and Network Security, Principles and Practice, Fourth Edition.
[11] Imran, M., Rashid, M., Kashif, M.K.: Hardware Design and Implementation of Scalar Multiplication in Elliptic Curve Cryptography (ECC) over GF(2^163) on FPGA.
[12] Brown, D.R.L. (2010): Standard for Efficient Cryptography, SEC 2: Recommended Elliptic Curve Domain Parameters.
[13] Maletsky, K. (2015): RSA vs ECC Comparison for Embedded Systems.
[14] Amara, M., Siad, A. (2012): Hardware Implementation of Elliptic Curve Point Multiplication over GF(2^m) for ECC Protocols. International Journal for Information Security Research (IJISR). University of Paris 8, LAGA Laboratory.
[15] Shu, C., Gaj, K., El-Ghazawi, T. (2005): Low Latency Elliptic Curve Cryptography Accelerators for NIST Curves over Binary Fields.
[16] Weimerskirch, A., Stebila, D., Shantz, S.C. (2003): Generic GF(2^m) Arithmetic in Software and Its Application to ECC.
[17] Kadir, S.A.: Modification of Elliptic Curve Cryptography on FPGA Implementation for Side Channel Attack Countermeasure.
