12. $x_0 \leftarrow X_1/Z_1$,
    $y_0 \leftarrow x^{-1}\cdot(x + X_1/Z_1)\cdot\{(x + X_1/Z_1)(x + X_2/Z_2) + x^2 + y\} + y$.
13. Return $kP = (x_0, y_0)$.

      $T_3 \leftarrow (X_1 Z_1)^2$, $Z_2 \leftarrow Z_3$
   2. $X_2 \leftarrow T_1 T_2 + xZ_3$, $X_1 \leftarrow bZ_1^4 + X_1^4$, $Z_1 \leftarrow T_3$
      if ($l \neq 0$ and $k_l \neq k_{l-1}$) or ($l = 0$ and $k_i = 1$) then
         Swap($X_1$, $X_2$), Swap($Z_1$, $Z_2$)
      end if
   end for
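The loop body and the final coordinate conversion above can be illustrated with a small software sketch. It is not the hardware schedule described in this paper: it uses the textbook López-Dahab Montgomery ladder (one add and one double per key bit, without the swap trick), and it assumes the NIST reduction polynomial $x^{163} + x^7 + x^6 + x^3 + 1$. The helper names (`gf_mul`, `madd`, `mdouble`, `ld_point_mul`, `toy_curve_b`) are hypothetical, and the toy curve is built only so that an arbitrary point lies on it.

```python
# Sketch of Lopez-Dahab Montgomery-ladder point multiplication over GF(2^163).
# Assumptions (not taken from the paper): NIST reduction polynomial,
# Python ints used as bit vectors, no handling of the point at infinity.

M = 163
F_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^163)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:          # degree reached 163: reduce
            a ^= F_POLY
    return r


def gf_sqr(a):
    return gf_mul(a, a)


def gf_inv(a):
    """Inverse by Fermat: a^(2^163 - 2). Simple, not optimized."""
    t = a                                  # t = a^(2^1 - 1)
    for _ in range(M - 2):
        t = gf_mul(gf_sqr(t), a)           # t = a^(2^(k+1) - 1)
    return gf_sqr(t)                       # (a^(2^162 - 1))^2


def madd(X1, Z1, X2, Z2, x):
    """x-only addition of two points whose difference is P = (x, y)."""
    T1 = gf_mul(X1, Z2)
    T2 = gf_mul(X2, Z1)
    Z3 = gf_sqr(T1 ^ T2)
    X3 = gf_mul(T1, T2) ^ gf_mul(x, Z3)    # X <- T1*T2 + x*Z3
    return X3, Z3


def mdouble(X, Z, b):
    """x-only doubling: X <- b*Z^4 + X^4, Z <- (X*Z)^2."""
    X2, Z2 = gf_sqr(X), gf_sqr(Z)
    return gf_sqr(X2) ^ gf_mul(b, gf_sqr(Z2)), gf_mul(X2, Z2)


def ld_point_mul(k, x, y, b):
    """Return kP in affine coordinates for P = (x, y), k >= 1."""
    X1, Z1 = x, 1                          # P
    X2, Z2 = mdouble(x, 1, b)              # 2P
    for i in range(k.bit_length() - 2, -1, -1):
        if (k >> i) & 1:
            X1, Z1 = madd(X1, Z1, X2, Z2, x)
            X2, Z2 = mdouble(X2, Z2, b)
        else:
            X2, Z2 = madd(X1, Z1, X2, Z2, x)
            X1, Z1 = mdouble(X1, Z1, b)
    # Coordinate conversion (steps 12 and 13 above).
    x1 = gf_mul(X1, gf_inv(Z1))
    x2 = gf_mul(X2, gf_inv(Z2))
    t = x ^ x1
    inner = gf_mul(t, x ^ x2) ^ gf_sqr(x) ^ y
    y0 = gf_mul(gf_mul(gf_inv(x), t), inner) ^ y
    return x1, y0


def toy_curve_b(x, y, a):
    """Pick b so that (x, y) lies on y^2 + xy = x^3 + a*x^2 + b."""
    x2 = gf_sqr(x)
    return gf_sqr(y) ^ gf_mul(x, y) ^ gf_mul(x2, x) ^ gf_mul(a, x2)
```

For example, with `b = toy_curve_b(0x1234567, 0x89ABCDEF, 1)`, the call `ld_point_mul(5, 0x1234567, 0x89ABCDEF, b)` returns the affine coordinates of 5P. The sketch performs three field inversions in the conversion for readability; a hardware design would share them.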
results and directly used as input values for the next iteration, where $i \in \{1, 2\}$. Therefore, we can reduce the clock cycles needed for data fetch from the register file. The two buffer registers are used to adjust input timing. In other words, the temporary results $Z_1$ and $Z_2$ are computed at the first step; however, these results should be used at the same time as inputs of the next iteration with the temporary results $X_1$ and $X_2$. From Fig. 2, we can notice that $(l-1)\cdot 2$(clock cycles for multiplication) clock cycles are required for the main loop of the López-Dahab point multiplication.

2.3 Arithmetic unit for coordinate conversion

$$
2^s - 1 =
\begin{cases}
(2^{s/2} - 1)(2^{s/2} + 1), & \text{if } s \text{ is even},\\
2\,(2^{(s-1)/2} - 1)(2^{(s-1)/2} + 1) + 1, & \text{if } s \text{ is odd}.
\end{cases}
\qquad (4)
$$

The Itoh-Tsujii inversion scheme requires $\lfloor \log_2(m-1) \rfloor + H(m-1) - 1$ multiplications in $GF(2^m)$, where $H$ denotes the Hamming weight of the binary expansion of the given integer. For $m = 163$, we get $m - 1 = 162 = (10100010)_2$. Thus, the number of required multiplications is $7 + 3 - 1 = 9$. Therefore, we can compute the inverse of $A$ in $GF(2^{163})$ in the following order of the exponents.
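The multiplication count for Itoh-Tsujii inversion can be checked with a short software sketch. It assumes the NIST reduction polynomial $x^{163} + x^7 + x^6 + x^3 + 1$ and walks the bits of $m - 1 = 162$ from the most significant end, which yields the exponent chain 1, 2, 4, 5, 10, 20, 40, 80, 81, 162. This is one valid chain and may differ from the order used in the paper. The names `gf_mul`, `gf_sqr_n`, and `itoh_tsujii_inv` are hypothetical.

```python
# Sketch of Itoh-Tsujii inversion in GF(2^163), counting multiplications.
# Assumption (not from the paper): NIST reduction polynomial for GF(2^163).

M = 163
F_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^163)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:          # degree reached 163: reduce
            a ^= F_POLY
    return r


def gf_sqr_n(a, n):
    """Square a repeatedly n times, i.e. compute a^(2^n)."""
    for _ in range(n):
        a = gf_mul(a, a)
    return a


def itoh_tsujii_inv(a):
    """Return (a^-1, number of field multiplications used).

    Invariant: beta = a^(2^k - 1). Squarings are not counted,
    since they are cheap in binary fields.
    """
    beta, k, mults = a, 1, 0
    for bit in bin(M - 1)[3:]:             # bits of m-1 after the leading 1
        # Doubling step: a^(2^(2k) - 1) = (beta^(2^k)) * beta
        beta = gf_mul(gf_sqr_n(beta, k), beta)
        k *= 2
        mults += 1
        if bit == "1":
            # Increment step: a^(2^(k+1) - 1) = (beta^2) * a
            beta = gf_mul(gf_sqr_n(beta, 1), a)
            k += 1
            mults += 1
    # a^-1 = a^(2^m - 2) = (a^(2^(m-1) - 1))^2
    return gf_sqr_n(beta, 1), mults
```

Calling `itoh_tsujii_inv(A)` on a nonzero field element returns its inverse together with a multiplication count of 9, matching $\lfloor \log_2 162 \rfloor + H(162) - 1$.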
3 Elliptic curve cryptographic processor over $GF(2^{163})$

3.1 Datapath for elliptic curve point multiplication over $GF(2^{163})$

The new ECC processor for $GF(2^{163})$ proposed in this paper is shown in Fig. 4. It consists of eight main components: host interface (HI), data memory, register file, instruction memory, Control-1, Control-2, AU-1, and AU-2. The HI communicates with the host microprocessor: the host microprocessor transmits all parameters for $kP$ to the HI with a Start signal, and receives the $kP$ result and an End signal from the HI. The data memory consists of an 8×163-bit dual-port block RAM, and the instruction memory contains 13 microcode sequences of 11-bit words. For high-performance implementation of point doubling & addition, we add a 7 × 163-bit register file, which receives data from the HI and transmits temporary computation results ($X_1$, $X_2$, $Z_1$, $Z_2$) to the data memory. AU-1 is used for point doubling & addition and is controlled by Control-1. AU-2 performs the coordinate conversion in Algorithm 1. Control-2 receives operation codes from the instruction memory and generates control signals for AU-2, the data memory, and the HI. Table 1 gives the number of clock cycles required to perform elliptic curve point multiplication over $GF(2^{163})$, where we assumed word size $\omega = 55$, i.e., $L = 3$.

3.2 FPGA implementation and performance analysis

The ECC processor over $GF(2^{163})$ in Fig. 4 was coded in VHDL and synthesized with Synopsys FPGA Compiler II, with the Xilinx XC4VLX80 as the target device. Placement, routing, and timing analysis of the synthesized designs were accomplished using Xilinx's foundation software. We implemented the design on Libtron's SYS-Lab 5000 system-on-chip test board, which includes an Intel PXA272 microprocessor and a Xilinx XC4VLX80 FPGA device.

Performance comparisons with recently proposed architectures are given in Table 2. From Table 2, it is noted that the proposed architecture is the fastest design, including ASIC implementations. As a detailed comparison, our ECC processor is 4.8 times faster than the architecture in [1], which is the best FPGA implementation to date to the authors' knowledge. The proposed design, however, uses roughly 2 times more hardware resources than Shu's architecture, since one slice of Xilinx's XC4VLX80 device has 2 LUTs.
Figure 4. Architecture of ECC processor over $GF(2^{163})$
References