
43rd IEEE Conference on Decision and Control FrB09.3
December 14-17, 2004
Atlantis, Paradise Island, Bahamas

Efficient Implementation of PID Control Algorithm using FPGA Technology
Y.F. Chan M. Moallem W. Wang
ychan46@uwo.ca mmoallem@engga.uwo.ca wwang@eng.uwo.ca
Department of Electrical & Computer Engineering
University of Western Ontario
London, Ontario, N6A 5B9, Canada

Abstract: In this paper, an efficient design scheme for implementing the Proportional-Integral-Derivative (PID) controller using Field Programmable Gate Array (FPGA) technology is presented. The algorithm is implemented using a Distributed Arithmetic (DA)-based scheme in which a Look-Up Table (LUT) mechanism inside the FPGA is utilized. Two novel DA-based PID controllers are proposed for FPGA implementation. The implementation results show that the two DA methods require 13% and 4% of the logic devices, respectively, compared to the design using multipliers. Furthermore, the power consumption is reduced by about 40%. A design that is efficient in terms of power consumption and chip area while having adequate speed means that the FPGA chip can accommodate more controllers with low power consumption, resulting in a cost reduction of the controller hardware.

Keywords: FPGA design, embedded controllers, distributed arithmetic, power optimization, PID controller.

This research was supported in part by grants RGPIN227612 and RGPIN261527 from the Natural Sciences and Engineering Research Council of Canada (NSERC).
0-7803-8682-5/04/$20.00 ©2004 IEEE 4885

I. INTRODUCTION

The Proportional-Integral-Derivative (PID) controller is the most common type of controller used in dynamic systems [1]. An important feature of this controller is that it does not need a precise analytical model of the system being controlled. For this reason, PID controllers have been widely used in process control, manufacturing, robotics, automation, transportation, and, interestingly, in real-time scheduling of concurrent tasks in multi-tasking applications [1].

PID controllers are often combined with logic, sequential functions, and other blocks to implement complicated systems. Implementation of PID controllers has gone through several stages of evolution, from the early mechanical and pneumatic designs to microprocessor-based systems. Recently, Field Programmable Gate Arrays (FPGAs) have become an alternative solution for the realization of digital control systems, previously dominated by general-purpose microprocessors and application specific integrated circuits (ASICs) [1]-[3]. FPGA-based controllers offer advantages such as high-speed computation, complex functionality, real-time processing capabilities, and low power consumption. These are attractive features from the embedded systems design point of view [4].

Conventional implementations of FPGA-based controllers have not focused on optimal use of hardware resources. These designs usually require a large number of multipliers and adders and do not efficiently utilize the memory-rich characteristics of FPGAs [5]. An FPGA chip contains a large number of memory blocks, referred to as Look-Up Tables (LUTs), which can be utilized to implement efficient designs. In this work, we utilize the Distributed Arithmetic (DA) scheme [6], which is an efficient LUT design method and is very promising for the FPGA implementation of the PID controller. The organization of this paper is as follows. In Section II, an improved PID controller is considered and its implementation using the DA scheme is discussed. In Section III, the implementation results on a Xilinx FPGA chip are discussed. Comparisons are made between the proposed scheme and the design based on conventional methods.

II. PID CONTROLLER IMPLEMENTATION

The application of a PID controller in a feedback control system is shown in Fig. 1, where $u_c$ is the command signal, $y$ is the feedback signal, $e$ is the error signal, and $u$ is the control input. The simplest form of the PID control algorithm is given by

$$u(t) = K_p e + K_I \int_0^t e(\tau)\,d\tau + K_D \dot{e} \tag{1}$$

From a practical point of view, implementation of the above algorithm has certain limitations [4]. Firstly, actuator saturation can cause integrator wind-up, leading to a sluggish transient response. Secondly, the pure differentiation term amplifies noise, leading to a deterioration of the control command. Finally, the differentiation term acts on the error signal, and therefore takes the derivative of the command signal as well. This can lead to spikes in the command signal when, for example, a user changes the set-point abruptly. Following [4], a modified PID control algorithm that overcomes the above problems is given in the Laplace domain by
$$U(s) = K\left(b\,U_c(s) - Y(s) + \frac{1}{sT_i}\bigl(U_c(s) - Y(s)\bigr) - \frac{sT_d}{1 + sT_d/N}\,Y(s)\right) \tag{2}$$

where $K$, $b$, $T_i$, $T_d$, and $N$ are controller parameters, and $U(s)$, $U_c(s)$, and $Y(s)$ denote the Laplace transforms of $u$, $u_c$, and $y$, respectively. In order to implement the control algorithm using digital technology, equation (2) has to be discretized. Denoting the sampling period by $T$, and using backward differences to discretize the derivative term and forward differences for the integral term, one has

$$u(kT) = P(kT) + I(kT) + D(kT) \tag{3}$$

where $k$ denotes the $k$-th sampling instant and

$$\begin{aligned}
P(kT) &= K\bigl(b\,u_c(kT) - y(kT)\bigr) \\
I(kT) &= I((k-1)T) + \frac{KT}{T_i}\bigl(u_c((k-1)T) - y((k-1)T)\bigr) \\
D(kT) &= \frac{T_d}{T_d + NT}\,D((k-1)T) - \frac{K T_d N}{T_d + NT}\bigl(y(kT) - y((k-1)T)\bigr)
\end{aligned} \tag{4}$$

where $y(kT)$ is the output of the system at the current instant, $y((k-1)T)$ is the output of the system at the previous instant, $u_c(kT)$ is the desired output of the system, $I((k-1)T)$ is the integral term at the previous instant, $D((k-1)T)$ is the derivative term at the previous instant, $K$, $b$, $T_i$, $T_d$, $N$ are controller parameters, and $T$ is the sampling period.

Fig. 1. A PID-based feedback control system. [Block diagram: the set-point $u_c$ and the sensor feedback $y$ are differenced to form the error $e$, which drives the PID controller; the controller output $u$ drives the dynamic system, whose output $y$ is measured by the sensor.]

Having obtained the discretized control algorithm, the focus is now on its efficient implementation. The direct implementation of the above algorithm on an FPGA requires a total of 5 multipliers, 5 adders/subtractors, and 4 delay blocks. The $P(kT)$ term requires 2 multipliers and 1 adder/subtractor, the $I(kT)$ term requires 1 multiplier and 2 adders/subtractors, and the $D(kT)$ term requires 2 multipliers and 2 adders/subtractors. The $u_c((k-1)T)$, $y((k-1)T)$, $I((k-1)T)$ and $D((k-1)T)$ terms each require a delay block. Thus, a total of 4 delay blocks are required.

The multiplier-based design uses many multipliers and adders. Since an FPGA has a limited number of configurable logic blocks for these calculations, the multiplier-based design is not efficient for FPGA implementation. In order to reduce the number of multipliers and adders required, we apply the DA method [5], which replaces the multiplication operations with simple shift and add operations. This is discussed as follows.

A. Direct DA Implementation (DA-I)

Let us consider the controller terms given in (4). Assuming that $u_c(kT)$, $u_c((k-1)T)$, $y(kT)$, and $y((k-1)T)$ are $m$-bit numbers and $[j]$ denotes the $j$-th bit of a number, we have

$$P(kT) = \sum_{j=0}^{m-1}\bigl(Kb \times u_c(kT)[j] - K \times y(kT)[j]\bigr)\times 2^j \tag{5}$$

$$I(kT) = \sum_{j=0}^{m-1}\Bigl(I((k-1)T)[j] + \frac{KT}{T_i}\bigl(u_c((k-1)T)[j] - y((k-1)T)[j]\bigr)\Bigr)\times 2^j \tag{6}$$

$$D(kT) = \sum_{j=0}^{m-1}\Bigl(\frac{T_d}{T_d+NT}\,D((k-1)T)[j] - \frac{K T_d N}{T_d+NT}\bigl(y(kT)[j] - y((k-1)T)[j]\bigr)\Bigr)\times 2^j \tag{7}$$

The per-bit terms $Kb \times u_c(kT)[j] - K \times y(kT)[j]$, $I((k-1)T)[j] + \frac{KT}{T_i}\bigl(u_c((k-1)T)[j] - y((k-1)T)[j]\bigr)$ and $\frac{T_d}{T_d+NT}D((k-1)T)[j] - \frac{K T_d N}{T_d+NT}\bigl(y(kT)[j] - y((k-1)T)[j]\bigr)$ can be precomputed and stored in three LUTs, namely, $LUT_P$, $LUT_I$ and $LUT_D$. The contents of the three LUTs are shown in Tables I, II and III, respectively.

TABLE I
CONTENTS OF THE $LUT_P$

$u_c(kT)[j]$   $y(kT)[j]$   $LUT_P$
0              0            $0$
0              1            $-K$
1              0            $Kb$
1              1            $Kb - K$

TABLE II
CONTENTS OF THE $LUT_I$

$I((k-1)T)[j]$   $u_c((k-1)T)[j]$   $y((k-1)T)[j]$   $LUT_I$
0                0                  0                $0$
0                0                  1                $-KT/T_i$
0                1                  0                $KT/T_i$
0                1                  1                $0$
1                0                  0                $1$
1                0                  1                $1 - KT/T_i$
1                1                  0                $1 + KT/T_i$
1                1                  1                $1$

TABLE III
CONTENTS OF THE $LUT_D$

$D((k-1)T)[j]$   $y(kT)[j]$   $y((k-1)T)[j]$   $LUT_D$
0                0            0                $0$
0                0            1                $K T_d N/(T_d+NT)$
0                1            0                $-K T_d N/(T_d+NT)$
0                1            1                $0$
1                0            0                $T_d/(T_d+NT)$
1                0            1                $(KN+1)T_d/(T_d+NT)$
1                1            0                $(1-KN)T_d/(T_d+NT)$
1                1            1                $T_d/(T_d+NT)$

TABLE IV
CONTENTS OF THE $LUT_I$ (DA-II)

$u_c((k-1)T)[j]$   $y((k-1)T)[j]$   $LUT_I$
0                  0                $0$
0                  1                $-KT/T_i$
1                  0                $KT/T_i$
1                  1                $0$

TABLE V
CONTENTS OF THE $LUT_{D'}$ (DA-II)

$y(kT)[j]$   $y((k-1)T)[j]$   $LUT_{D'}$
0            0                $0$
0            1                $K T_d N/(T_d+NT)$
1            0                $-K T_d N/(T_d+NT)$
1            1                $0$

Using the three LUTs and the corresponding shift-add accumulators (ACCs), the $P(kT)$, $I(kT)$, and $D(kT)$ terms can be obtained in $m$ clock cycles. The main advantage of the DA expressions (5), (6) and (7) lies in their capability to compute the PID function using the LUT-rich FPGA.

Based on the above equations, the direct DA implementation of the PID controller, namely DA-I, is shown in Fig. 2. It consists of four delay blocks, three LUTs, three ACCs, and two adders. Delay blocks 1 and 2 are used to obtain $u_c((k-1)T)$ and $y((k-1)T)$, respectively. Delay blocks 3 and 4 are used to generate the terms $I((k-1)T)$ and $D((k-1)T)$, respectively. Three LUTs and three ACCs are used to provide the terms $P(kT)$, $I(kT)$, and $D(kT)$.
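
As an illustration of equation (5) and Table I, the following Python sketch evaluates the P(kT) term bit-serially, with one LUT_P access and one shift-add per clock cycle, and compares the result with the multiplier-based form in (4). It is a minimal sketch under simplifying assumptions: the operands are unsigned m-bit integers (with fixed-point 2's complement operands the sign bit would need separate treatment), floating-point values stand in for the fixed-point LUT words, and the function names and parameter values are hypothetical.

```python
def build_lut_p(K, b):
    """Contents of LUT_P (Table I), indexed by (command bit, feedback bit)."""
    return {(0, 0): 0.0, (0, 1): -K, (1, 0): K * b, (1, 1): K * b - K}


def p_term_da(uc, y, K, b, m):
    """Bit-serial evaluation of equation (5) for unsigned m-bit uc and y."""
    lut = build_lut_p(K, b)
    acc = 0.0
    for j in range(m):  # one LUT access and one shift-add per clock cycle
        bits = ((uc >> j) & 1, (y >> j) & 1)
        acc += lut[bits] * (1 << j)  # weight the LUT word by 2^j
    return acc


def p_term_direct(uc, y, K, b):
    """Multiplier-based form of the P term from equation (4)."""
    return K * (b * uc - y)


# Hypothetical parameter values, chosen only for illustration.
K, b, m = 2.0, 0.5, 13
uc, y = 4095, 1234
print(p_term_da(uc, y, K, b, m), p_term_direct(uc, y, K, b))  # 1627.0 1627.0
```

The loop body corresponds to what one LUT and one ACC do in a single clock cycle, which is why the term needs m cycles but no multiplier.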


The ACC consists of a shift register and an adder/subtractor. Finally, two adders produce the sum of $P(kT)$, $I(kT)$, and $D(kT)$. The throughput (speed) of this PID implementation is $(m+1)$ clock cycles, i.e., $m$ clock cycles to generate the result, and one more clock cycle to update the $I((k-1)T)$ and $D((k-1)T)$ terms. The latency is also $(m+1)$ clock cycles.

B. Improved DA Implementation (DA-II)

In order to improve the efficiency of the design, we apply a pipeline scheme to the direct DA implementation as follows.

For the $I(kT)$ term, equation (6) can be revised as

$$I(kT) = I((k-1)T) + \sum_{j=0}^{m-1}\frac{KT}{T_i}\bigl(u_c((k-1)T)[j] - y((k-1)T)[j]\bigr)\times 2^j \tag{8}$$

The term $I((k-1)T)$ can be incorporated into the ACC as follows. In the direct implementation above, the ACC's shift register is cleared to 0 after $m$ clock cycles. By removing the clear function of the shift register, the ACC keeps the previous value $I((k-1)T)$ and performs the addition. The contents of the new LUT, $LUT_I$, are shown in Table IV.

For the $D(kT)$ term, we revise equation (7) as

$$D(kT) = \frac{T_d}{T_d+NT}\,D'(kT) \tag{9}$$

where

$$D'(kT) = D((k-1)T) + KN \times \sum_{j=0}^{m-1}\bigl(-y(kT)[j] + y((k-1)T)[j]\bigr)\times 2^j \tag{10}$$

Following a similar approach to that used for $I(kT)$, $D'(kT)$ is implemented using a LUT and an ACC. The addition of $D((k-1)T)$ can be incorporated into the ACC as discussed before. The contents of the LUT, $LUT_{D'}$, are shown in Table V.

After calculating $I(kT)$ and $D'(kT)$ using $m$ clock cycles in the first pipeline stage, we have in the second stage

$$u(kT) = P(kT) + I(kT) + D(kT) = P(kT) + I(kT) + \frac{T_d}{T_d+NT}\,D'(kT) \tag{11}$$

$$u(kT) = \sum_{j=0}^{m-1}\Bigl(Kb \times u_c(kT)[j] - K \times y(kT)[j] + I(kT)[j] + \frac{T_d}{T_d+NT}\,D'(kT)[j]\Bigr)\times 2^j \tag{12}$$

The per-bit results $Kb \times u_c(kT)[j] - K \times y(kT)[j] + I(kT)[j] + \frac{T_d}{T_d+NT}D'(kT)[j]$ are computed a priori and stored in another LUT, $LUT_{PID}$, as shown in Table VI.
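Fig. 2. Architecture of the proposed DA-I PID controller. [Block diagram: the command and feedback samples feed three parallel branches, P ($LUT_P$ and ACC), I ($LUT_I$ and ACC) and D ($LUT_D$ and ACC), with delay blocks D1 to D4; the three ACC outputs are summed to give P+I+D.]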

$LUT_{PID}$ and the corresponding ACC will generate the PID output in $m$ clock cycles, in the second pipeline stage. This two-stage implementation of the DA-based PID controller, namely DA-II, is shown in Fig. 3. It requires three LUTs, three ACCs and two delay blocks, while DA-I requires two more adders and two more delay blocks. Thus, the hardware resources required by DA-II are less than those of DA-I.

DA-II needs two stages to accomplish one PID calculation in a pipeline. The first stage consists of two LUTs and two ACCs to calculate $I(kT)$ and $D'(kT)$, respectively. The second stage consists of one LUT and one ACC to calculate the summation of the PID function using the results of $I(kT)$ and $D'(kT)$ available from the first stage. These stages are pipelined so that when the second stage is performing the first calculation, the first stage is performing the next calculation. Thus, the throughput (speed) is only $m$ clock cycles. The two stages, each requiring $m$ clock cycles, introduce a latency of $2m$ clock cycles.

The performance, in terms of complexity and speed, of the proposed designs and the multiplier-based design is listed in Table VII. Compared to the multiplier-based PID controller, the two DA-based designs, DA-I and DA-II, utilize the memory-rich characteristics of the FPGA. The proposed design (DA-II) requires fewer adders/subtractors and fewer delay blocks than the direct DA implementation, i.e., DA-I. The speed (throughput) of the DA-II design is slightly higher than that of DA-I, but the latency is greater. Since the latency only occurs once during power-up, it is not of much concern in our control system. Thus, the DA-II design has improved characteristics compared to DA-I and is the most preferred design in the control system amongst the three designs.

III. FPGA IMPLEMENTATION RESULTS

The proposed DA-based PID controller DA-II is implemented using Xilinx Inc. FPGA technology and can be used as a general-purpose controller for different applications. The FPGA design flow is as follows. First, the controller was implemented using the Xilinx ISE Foundation tools [9] and simulated at the Register Transfer Level (RTL) to verify the correctness of the design. Using the Xilinx ISE Foundation tools, logic synthesis was carried out to optimize the design, and placement and routing were carried out automatically to generate the FPGA implementation file. Finally, the generated implementation file was downloaded to the FPGA development board for testing.

For the PID case, all the parameters were taken from [8] and the input is a 13-bit number. All the numbers in the PID expressions can be either negative or positive and are represented by fixed-point 2's complement numbers.

This PID controller is targeted to a Xilinx Spartan-II-E FPGA xc2s200e-FT256 -6. The simulation and testing are conducted on the FPGA board to validate the functions of the PID controller, as shown in Fig. 4.

The implementation results of the DA-II PID controller are shown in Table VIII. Using the proposed method, the DA-II PID controller uses 13 slices, 13 slice flip-flops and 3 block RAMs. For the purpose of comparison, the corresponding DA-I and multiplier-based designs are also implemented for the 13-bit input case.
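Fig. 3. Architecture of the proposed DA-II PID controller. [Block diagram: a first stage with an I branch (delayed command and feedback samples, $LUT_I$ and ACC) and a D branch (current and delayed feedback samples, $LUT_{D'}$ and ACC); a second stage with $LUT_{PID}$ and an ACC, fed by the command sample, the feedback sample and the first-stage results, producing P+I+D.]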

TABLE VI
CONTENTS OF THE $LUT_{PID}$

$u_c(kT)[j]$   $y(kT)[j]$   $I(kT)[j]$   $D'(kT)[j]$   $LUT_{PID}$
0              0            0            0             $0$
0              0            0            1             $T_d/(T_d+NT)$
0              0            1            0             $1$
0              0            1            1             $1 + T_d/(T_d+NT)$
0              1            0            0             $-K$
0              1            0            1             $T_d/(T_d+NT) - K$
0              1            1            0             $1 - K$
0              1            1            1             $1 - K + T_d/(T_d+NT)$
1              0            0            0             $Kb$
1              0            0            1             $Kb + T_d/(T_d+NT)$
1              0            1            0             $Kb + 1$
1              0            1            1             $Kb + 1 + T_d/(T_d+NT)$
1              1            0            0             $Kb - K$
1              1            0            1             $Kb - K + T_d/(T_d+NT)$
1              1            1            0             $Kb - K + 1$
1              1            1            1             $Kb - K + 1 + T_d/(T_d+NT)$
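
The 16 entries of Table VI follow a single pattern: each input bit, when set, contributes its coefficient from equation (12). The Python sketch below generates the table from the controller parameters and evaluates the second pipeline stage bit-serially. It is an illustrative sketch only: the function names and parameter values are hypothetical, the four operands are taken as unsigned m-bit integers, and floating point stands in for the fixed-point arithmetic of the hardware.

```python
from itertools import product


def build_lut_pid(K, b, Td, N, T):
    """LUT_PID contents (Table VI), indexed by (u_c, y, I, D') bits."""
    a = Td / (Td + N * T)  # coefficient of D'(kT) in equation (12)
    return {
        (u, y, i, d): K * b * u - K * y + i + a * d
        for u, y, i, d in product((0, 1), repeat=4)
    }


def pid_second_stage(uc, y, i_term, d_prime, lut, m):
    """Shift-add accumulation of equation (12) over m clock cycles."""
    acc = 0.0
    for j in range(m):
        bits = tuple((x >> j) & 1 for x in (uc, y, i_term, d_prime))
        acc += lut[bits] * (1 << j)
    return acc


# Hypothetical parameters; Td / (Td + N*T) = 0.5 for these values.
K, b, Td, N, T = 2.0, 0.5, 0.5, 8.0, 0.0625
lut = build_lut_pid(K, b, Td, N, T)
out = pid_second_stage(100, 60, 7, 4, lut, m=13)
print(out)  # -11.0, i.e. K*b*100 - K*60 + 7 + 0.5*4
```

Because all four operands share one LUT address, a single ACC replaces the two output adders of DA-I, at the cost of a LUT with 16 words instead of 4 or 8.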

TABLE VII
COMPLEXITY AND SPEED OF PID CONTROLLERS

PID controller     Complexity                                  Throughput           Latency
DA-II              3 LUTs, 3 ACCs, 2 delay blocks              m clock cycles       2m clock cycles
DA-I               3 LUTs, 3 ACCs, 4 delay blocks, 2 adders    m + 1 clock cycles   m + 1 clock cycles
Multiplier-based   5 multipliers, 4 delay blocks, 5 adders     1 clock cycle        1 clock cycle
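
To make the entries of Table VII concrete, the short Python sketch below evaluates the throughput and latency expressions for a given word length m and converts them to time for a given clock frequency. The helper name and the dictionary layout are hypothetical; the example values, m = 13 and a 50 MHz clock, are the ones used in the implementation section of this paper.

```python
def pid_timing(m, f_clk_hz):
    """Throughput and latency from Table VII as (cycles, seconds) pairs."""
    cycles = {
        "DA-II": {"throughput": m, "latency": 2 * m},
        "DA-I": {"throughput": m + 1, "latency": m + 1},
        "Multiplier-based": {"throughput": 1, "latency": 1},
    }
    return {
        design: {key: (n, n / f_clk_hz) for key, n in entry.items()}
        for design, entry in cycles.items()
    }


timing = pid_timing(m=13, f_clk_hz=50e6)
for design, entry in timing.items():
    print(design, entry)
```

For m = 13 this gives 13 and 26 cycles for DA-II and 14 cycles for DA-I, that is, one result every 260 ns and 280 ns respectively at 50 MHz.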

The implementation results of the DA-I and multiplier-based designs are also shown in Table VIII. The clock frequency of all three designs is 50 MHz.

It is seen from Table VIII that the DA-I design uses about 13% of the logic resources required by the design using multipliers. In addition, the power consumption of DA-I is reduced by about 38%. Due to the serial nature of the DA method, the DA-I PID controller needs 14 clock cycles while the design using multipliers needs 1 clock cycle. Since, in the control system considered, 14 cycles of the 50 MHz clock are fast enough, the tradeoff of speed for hardware resource and power savings is useful for the PID controller design.

DA-II offers further improvement over DA-I. In particular, the DA-II design uses only about 4% of the logic resources required by the design using multipliers. The power consumption of DA-II is reduced by about 40%. The DA-II PID controller needs 13 clock cycles to generate one result. The latency of DA-II is greater than that of DA-I since DA-II needs two stages of pipeline.
TABLE VIII
FPGA IMPLEMENTATION RESULTS OF PID CONTROLLERS (CLOCK FREQUENCY = 50 MHz)

PID controller     Complexity                                      Throughput   Latency     Power
DA-II              13 slices, 13 slice flip-flops, 3 block RAMs    13 cycles    26 cycles   8.97 mW
DA-I               42 slices, 52 slice flip-flops, 3 block RAMs    14 cycles    14 cycles   9.29 mW
Multiplier-based   321 slices, 104 slice flip-flops                1 cycle      1 cycle     15 mW
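
The relative figures implied by Table VIII can be checked with a few lines of arithmetic. The Python sketch below, whose variable names are illustrative, computes the slice count and the power of each DA design relative to the multiplier-based design, using only the values listed in the table.

```python
# Slice counts and power (mW) copied from Table VIII.
results = {
    "DA-II": {"slices": 13, "power_mw": 8.97},
    "DA-I": {"slices": 42, "power_mw": 9.29},
    "Multiplier-based": {"slices": 321, "power_mw": 15.0},
}

base = results["Multiplier-based"]
summary = {}
for design in ("DA-I", "DA-II"):
    r = results[design]
    summary[design] = {
        # Slices used, as a percentage of the multiplier-based design.
        "slice_pct": 100.0 * r["slices"] / base["slices"],
        # Power saved, as a percentage of the multiplier-based design.
        "power_saving_pct": 100.0 * (1.0 - r["power_mw"] / base["power_mw"]),
    }
    print(design, {k: round(v, 1) for k, v in summary[design].items()})
```

The slice ratios come out at about 13% for DA-I and 4% for DA-II, and the power savings fall in the 38% to 40% range.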

The latency only occurs once, during power-up. Afterwards, the outputs of the PID algorithm are generated every 13 clock cycles in the pipeline. Thus, the throughput of DA-II is 13 clock cycles, which is faster than that of the DA-I scheme.

A design that is efficient in terms of power consumption and chip area means that the FPGA chip can accommodate more controllers with adequate speed and low power consumption, resulting in a cost reduction of the controller hardware. For the implementation example considered here, one Xilinx FPGA chip, namely the Spartan-II-E FPGA xc2s200e-FT256 -6, can accommodate only 7 multiplier-based designs. However, the same chip can implement about 56 DA-I designs or 180 DA-II designs.

Fig. 4. Setup of the FPGA-based PID controller

IV. CONCLUSION

In this paper, two novel DA-based PID controllers have been proposed for FPGA implementation. By using the DA-based LUT scheme, the memory inside the FPGA has been utilized to provide an efficient design for PID controllers. The FPGA implementation results show that the two DA designs require only 13% and 4% of the logic devices, respectively, compared to the design using multipliers. Furthermore, the power consumption is reduced by about 40%. Future work will involve the implementation and integration of a DA-based PID controller into a complete control system consisting of analog and digital I/O.

V. ACKNOWLEDGMENT

The authors would like to thank Rumi Zhang for his great help in editing the paper and preparing the figures.

REFERENCES

[1] K. J. Astrom and B. Wittenmark, Computer Controlled Systems, Prentice Hall, New Jersey, USA, 1997.
[2] R. Chen, L. Chen and L. Chen, "System Design Consideration for Digital Wheelchair Controller," IEEE Trans. on Industrial Electronics, vol. 47, pp. 898-907, Aug. 2000.
[3] L. Samet, N. Masmoudi, M. W. Kharrat, and L. Kamoun, "A Digital PID Controller for Real-time and Multi-loop Control: A Comparative Study," IEEE Int. Conf. on Electronics, Circuits and Systems, vol. 1, pp. 291-296, Sept. 1998.
[4] B. Wittenmark, K. J. Astrom, and K.-E. Arzen, Computer Control: An Overview, Technical Report, Department of Automatic Control, Lund Institute of Technology, Sweden (www.control.lth.se/ kursdr/ifac.pdf), April 2003.
[5] W. Wolf, Computers as Components: Principles of Embedded Computing System Design, San Francisco, Morgan Kaufmann, 2001.
[6] A. White, "Application of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review," IEEE Acoustics, Speech, and Signal Processing Magazine, vol. 6, pp. 4-19, 1989.
[7] C. Lu, J. A. Stankovic, G. Tao, and S. H. Son, "Design and Evaluation of a Feedback Control EDF Scheduling Algorithm," 20th IEEE Real-Time Systems Symposium (RTSS 1999), Phoenix, AZ, December 1999.
[8] M. Moallem, "A Laboratory Testbed for Embedded Computer Control," in press, IEEE Trans. on Education.
[9] Xilinx ISE 6 Software Manuals, Xilinx Inc., California, USA (www.xilinx.com), 2003.
