Short-Term Prediction of Wind Farm Output Using the Recurrent Quadratic Volterra Model

Duehee Lee, Student Member, IEEE

Abstract—This paper presents a way to use the recurrent quadratic Volterra system to forecast wind power output. The recurrent quadratic Volterra system is a second-order polynomial equation that uses the output data as feedback recursively. The Volterra system is extracted from the weights of the Recurrent Neural Network (RNN). During this process, three innovative techniques are used. In order to build Volterra kernels from combinations of weights, the activation function is approximated by a high-order polynomial function using Lagrangian interpolation. Furthermore, the memory of the Volterra system is identified using the Partial Autocorrelation Function. After building the Volterra system, the 15- and 30-minute-ahead wind power output is forecasted with confidence intervals at the 95% confidence level. The confidence intervals are calculated using the multi-linear regression technique. The stability of the recurrent Volterra system is also considered by a heuristic method.

Index Terms—Volterra system, Recurrent Neural Network, Wind Power System, Short-Term Forecasting

I. INTRODUCTION

WIND power is so intermittent that utilities have to forecast the wind power output and adjust long-term or short-term generating plans to match the electricity load. For short-term plans, the spinning reserve capacity has to be ready to compensate for an insufficient amount of electricity by controlling the fast-start generators. For long-term plans, the unit commitment determines the on-off schedule of slow-start generators. Therefore, the efficiency of these plans depends on the accuracy of the wind power forecasts [1]. Accurate forecasts could reduce the amount of generation reserve by turning off unnecessary conventional generators [2]. In this paper, we deal with short-term wind power output forecasting.

The Recurrent Neural Network (RNN) is the representative short-term prediction technique for the nonlinear wind power output [3]. The RNN has been widely used to forecast wind speed, wind power output, and electric loads. Kariniotakis [4] developed recurrent high-order neural networks for short-term wind power prediction. Li [5] used the RNN in a different way, applying the extended Kalman filter for stable training of the RNN. Moreover, Barbounis [2], [6] forecasted the wind speed based on spatial information measured from three stations using the RNN. In addition, Charytoniuk [7] used the RNN to forecast the very short-term electricity load.

The RNN is a good tool for forecasting wind power output, but it has four limits. First, the RNN does not have specific parameters to measure the model changes with respect to changes in wind speed. As the wind speed changes, the power curve of the manufacturer and the wind turbine control change, so the forecasting models must also change. Second, since the RNN uses a non-polynomial activation function, it is hard to measure the nonlinearity of wind power output. Third, it is hard to check the stability of the RNN. Since the RNN uses the output as feedback, its asymptotic stability should be checked, because an unstable RNN cannot produce multi-step-ahead forecasts. Finally, the RNN can easily be over-fitted to the data.

These limits could be mitigated by using the Volterra system, which has previously been used to describe nonlinear systems [8]. Volterra kernels can express the nonlinearity and model changes as the wind speed varies. We can also find the dominant Volterra kernels using the genetic algorithm in [9], so we can further reduce the number of model parameters. Furthermore, it is easier to check the stability of the Volterra system than that of the RNN, since the Volterra kernels can be represented as a linear state-space model. The second method of Lyapunov can be used to check the stability of the Volterra system [10]. With regard to the last limit, the Volterra system is less over-fitted to the data than the RNN since it has fewer parameters than the RNN does.

The goal of this paper is to forecast wind power output using the recurrent quadratic Volterra system. In order to accomplish this goal, we implement several innovative methods. First, we extract Volterra kernels from the weights of the RNN and build the recurrent Volterra system [11], [12]. When extracting Volterra kernels, the Taylor series has usually been used to approximate the activation function of the RNN by a polynomial equation [11], [12]. However, the Taylor series might fail to generate a proper polynomial function unless the sum of weights is close to zero. In this paper, therefore, we use Lagrangian Interpolation (LI) to approximate the activation function. The memory of the Volterra system is identified as well. Second, the confidence intervals of the Volterra system are calculated using the multi-linear regression technique. Finally, the stability of the recurrent Volterra model is considered. Since there is no general way to ascertain the stability of a nonlinear system, a heuristic method is used to check the stability.

This paper is organized as follows. Section II classifies the wind power forecasting techniques and provides an overview of the Volterra system. In Section III, we pre-process the sampled data. The structure of the Volterra system is established, and the memory is identified in Section IV. Section V shows methods to extract Volterra kernels. In Section VI, the wind power outputs are forecasted with the CIs at the 95% confidence level, and the stability of the Volterra system is tested. Finally, the conclusion is discussed in Section VII.

D. Lee is with the Department of Electrical and Computer Engineering, the University of Texas at Austin, 2501 Speedway, Austin, TX, 78712, USA (e-mail: Keanu@mail.utexas.edu)
978-1-4577-1002-5/11/$26.00 ©2011 IEEE
II. OVERVIEW

In this section, the wind power forecasting models are introduced and classified with respect to various standards. In addition, an overview of the recurrent quadratic Volterra system is given, including the number of Volterra kernels.

A. Wind Power Forecasting Classifications

Wind forecasting models can be classified into regression and recurrent models according to the structure of the forecasting model. The recurrent model receives its output as feedback. The RNN and time-series models belong to this approach. Forecasting models in this category are usually used for short-term forecasting.

The regression model uses a function between the exogenous input and the output data. This model receives geographical and meteorological information, such as wind speed and temperature, as input. Even though the regression model can describe the relationship between the exogenous input and the wind power output, it cannot fully describe the dynamic characteristics of a wind power system. Furthermore, for the regression model, in order to forecast the wind power output, the input data should be forecasted before anything else. The input data is usually predicted by a Numerical Weather Prediction (NWP) model [13]. The time-series model, NN, and Kalman filter fall into this category, but the most powerful regression model is the power curve of wind turbine manufacturers [5].

In addition, wind forecasting models can be classified into stochastic and deterministic models with respect to the assumptions about the data [14]. In the stochastic model, the data is assumed to follow a stochastic process based on a probability distribution. The predicted values usually stay around the mean value. Since wind power output is not stationary in the mean, these stochastic models are not good at long-term prediction of wind power output; they are used instead for short-term prediction. Time-series models such as the autoregressive (AR) and moving average (MA) models belong to this category.

In the deterministic (non-stochastic) model, the data is decided by deterministic protocols. The Least Mean Square (LMS), Recursive Least Square (RLS), Volterra system, and NN are in this category. Many literature reviews of various forecasting models can be found in [15].

B. Overview of the Recurrent Quadratic Volterra System

In this section, the discrete-time recurrent Volterra system is introduced. The Volterra system can analyze information regarding system nonlinearities, such as order and memory, and it captures the output power patterns that can be used for short-term forecasting. The recurrent Volterra system is the Volterra system with feedback; the output of the system is used as an input again after passing through a time-delay filter. For simplicity, it is assumed that the system is causal and homogeneous. The truncated recurrent Volterra system V(P, M) with finite order P and finite memory M is defined as

$$\begin{aligned}
x(n) = {} & h_0 + \sum_{k_1=1}^{M} h_1(k_1)\,x(n-k_1) \\
& + \sum_{k_1=1}^{M}\sum_{k_2=1}^{M} h_2(k_1,k_2)\,x(n-k_1)\,x(n-k_2) + \cdots \\
& + \sum_{k_1=1}^{M}\cdots\sum_{k_P=1}^{M} h_P(k_1,\ldots,k_P)\,x(n-k_1)\cdots x(n-k_P) \\
& + e(n), \qquad n = M+1, M+2, \ldots
\end{aligned} \tag{1}$$

where $h_0$ is the constant term, which is zero in this case, and $h_i(k_1,\ldots,k_i)$ is the set of $i$th-order Volterra kernel coefficients, $i = 1,\ldots,P$. In order to reduce the computational complexity, the kernels are assumed to satisfy

$$h_P(k_1,\ldots,k_P) = 0 \quad \text{if } k_1 > \cdots > k_P. \tag{2}$$

Since the kernels $h_i$ can be assumed to be symmetric functions with respect to all permutations of the indices $k_1,\ldots,k_P$, one kernel per permutation class is enough to describe the Volterra system; the other kernels become zero. The input signal is $X_{n-1} = \{x(n-1), x(n-2), \ldots, x(n-M)\}$, and the output signal is $x(n)$. All signals and kernels are real numbers. Since it is difficult to separate the errors of the truncated Volterra model into natural system errors and higher-order-term errors, we assume that all errors originate from mismatches with the natural system. In other words, the higher-order terms are absorbed into the Volterra system errors.

Although the Volterra system has been considered a powerful model for analyzing nonlinear systems, the large number of kernels and their identification have been problematic. The number of Volterra kernels of each order can be calculated as a combination with repetition:

$$\frac{(M+P)!}{P!\,M!}. \tag{3}$$

The number of Volterra kernels increases rapidly as the order and memory increase. For example, the number of third-order Volterra kernels with a memory of 20 is 1,771. This is a huge and impractical number, so in our work the order is limited to two, and the memory is limited to 20.
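To make the structure of (1)–(3) concrete, the following minimal Python sketch implements a one-step prediction of the quadratic system V(2, M) and checks the kernel count of (3). The array shapes and function names are illustrative assumptions, not part of the original paper.

```python
import numpy as np
from math import comb

def kernel_count(M: int, P: int) -> int:
    """Number of Pth-order kernels with memory M, i.e. (M+P)!/(P! M!) from (3)."""
    return comb(M + P, P)

assert kernel_count(20, 3) == 1771  # third order, memory 20, as cited in the text

def volterra_one_step(x_past: np.ndarray, h0: float,
                      h1: np.ndarray, h2: np.ndarray) -> float:
    """One-step-ahead output of V(2, M) as in (1).

    x_past : [x(n-1), ..., x(n-M)], most recent sample first
    h1     : M linear kernels h1(k)
    h2     : (M, M) quadratic kernels; under the convention of (2) only the
             upper triangle (k1 <= k2) carries nonzero values
    """
    return h0 + h1 @ x_past + x_past @ h2 @ x_past
```

Because the system is recurrent, the same routine is applied repeatedly, feeding each prediction back into x_past; this is how the multi-step forecasts of Section VI are produced.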
III. PRE-PROCESSING

In this section, we introduce the data acquisition, the identification process, and the pre-processing. We use the wind power output sampled from May 21, 2007, 15:00, for 1,000 minutes at the Brazos wind farm in west Texas. The sampling period is one minute. The wind power output is shown in Fig. 1.

Fig. 1. Wind power output was sampled from the Brazos wind farm in Texas. Data was sampled on 5/21/07 at 15:00 for 1,000 minutes.

The data from the first 800 minutes is used to train the NN and validate the result. The validation data is used to stop the training early, before the NN is over-fitted to noise and gusts of wind. The data from the first 800 minutes is randomly divided into training data and validation data; random division prevents the NN from being over-fitted to the early data. Eighty percent of the data from the first 800 minutes is used to train the NN, and 20% is used to validate it. The data from the last 200 minutes is used to test the model and forecast future output power. The identification process is then as follows.

Step 1. Preprocess the wind power output data.
Step 2. Establish the NN and identify the memory.
Step 3. Adjust and approximate the activation function.
Step 4. Train the NN and extract the Volterra system.
Step 5. Forecast the wind power output.
Step 6. Find the confidence intervals and check the stability.

Data is preprocessed in order to increase the convergence speed and to extract the Volterra kernels accurately. Preprocessing consists of mean subtraction and downscaling, and these two steps are applied to the input and target data individually. The mean subtraction generates unbiased data and helps to extract unbiased Volterra kernels. All processes in this paper handle the mean-subtracted data. Then, the input and target data are downscaled individually so that they have the same magnitude. In order to make the input and target data have a similar magnitude, we do not normalize but downscale the data. Downscaling divides the data by its maximum absolute value without any further offset subtraction. In contrast, normalization maps the data into [−1, 1] and subtracts the minimum. Normalization requires subtracting the minimum from the input data, so the NN would be trained on minimum-subtracted data, and biased Volterra kernels would be extracted as a result. Therefore, the wind power output is downscaled, as sketched below.
IV. MODEL IDENTIFICATION

In this section, we identify the structure of the RNN and the memory. Identifying the RNN includes building the activation function and approximating it by a polynomial function using the LI.

The RNN is one of the dynamic neural networks, which use an output as an input after passing through a time delay. The RNN in this paper has three layers: one input layer, one hidden layer, and one output layer. The input layer has as many input neurons as the memory length, and the input neurons are in the tap-delayed form as shown in [11]. The hidden layer has an arbitrary number of neurons, which is heuristically decided by the complexity of the data or the number of Volterra kernels. Since the RNN cannot always find the optimum solution, we run the program many times per data set with different initial values. Hidden neurons receive the "net input", which is the sum of the input values multiplied by their corresponding weights [16], and use the hyperbolic tangent function tanh as the activation function. The output layer has only one neuron and uses a linear activation function. Fig. 2 shows the overview of the RNN structure. In Fig. 2, $Z^{-1}$ denotes a one-step time delay, and the number of hidden neurons is abbreviated.

Fig. 2. The recurrent neural network can be converted to the Volterra system through the Volterra kernel extraction process.
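As a rough illustration of this architecture (not the author's code), a tap-delayed forward pass with a tanh hidden layer and a linear output neuron can be sketched as below; the names W_in, b_in, and W_out are assumptions. The recurrence enters through x_taps, which is filled with previously produced outputs.

```python
import numpy as np

def rnn_forward(x_taps: np.ndarray, W_in: np.ndarray, b_in: np.ndarray,
                W_out: np.ndarray) -> float:
    """Forward pass of the tap-delayed RNN: M delayed outputs -> H hidden
    tanh neurons -> one linear output neuron, cf. (12)-(14) in Section V.

    x_taps : [x(n-1), ..., x(n-M)]  (fed back from previous outputs)
    W_in   : (H, M) input-to-hidden weights;  b_in : (H,) hidden biases
    W_out  : (H,) hidden-to-output weights
    """
    S = W_in @ x_taps + b_in   # net inputs of the hidden neurons
    I = np.tanh(S)             # hidden activations
    return float(W_out @ I)    # linear output neuron: x(n)
```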

A. Memory Identification

The memory is identified by using non-parametric model identification, which is used for AR model estimation, based on two facts. First, the number of nonlinear terms in the Volterra model depends on the memory of the linear terms, since the nonlinear terms are all possible combinations of the linear terms. That is, in order to have nonlinear terms of higher memory, the model should have linear terms of higher memory as well. Second, the nonlinear terms are extra terms beyond the linear model, and most models can be approximated by linear models, so their involvement affects the overall model structure less. Furthermore, since it is very hard to determine the sequence of the nonlinear terms, unlike the orderly arranged linear terms, the memory is decided by the number of linear terms. Therefore, we focus on analyzing the AR model of the given data in order to narrow down the pool of possible model structures to the few that are most likely to be a good fit.

If data sampled in the past generate the data sampled at time t, the past data and present data should be correlated. This correlation is generally detected through the autocorrelation function (ACF) and the PACF [17]. In this paper, the PACF is used to decide the memory of the Volterra system as an initial guess. The PACF of an AR model is close to zero after the memory term [18]. The PACF is defined in [18] as

$$\phi_{kk} = \frac{\begin{vmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{k-2} & \rho_1 \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{k-3} & \rho_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & \rho_1 & \rho_k \end{vmatrix}}{\begin{vmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{k-2} & \rho_{k-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{k-3} & \rho_{k-2} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & \rho_1 & 1 \end{vmatrix}} \tag{4}$$

where $\rho_k$ is the ACF. The $\rho_k$ between $X_t$ and $X_{t-k}$ is defined in [18] as

$$\rho_k = \frac{\mathrm{Cov}(X_t, X_{t-k})}{\sqrt{\mathrm{Var}(X_t)\,\mathrm{Var}(X_{t-k})}} \tag{5}$$

where $\mathrm{Cov}(X_t, X_{t-k})$ is the cross-covariance between $X_t$ and $X_{t-k}$, defined as

$$\gamma_k = \mathrm{Cov}(X_t, X_{t-k}) = E[(X_t - \mu)(X_{t-k} - \mu)] \tag{6}$$

where $\mu$ is the mean of $X_t$.

The PACF of the wind power output is shown in Fig. 3. According to Fig. 3, the initial guess of the memory is selected as five, since the correlation coefficients dampen as a sinusoidal wave within the confidence bounds after five terms. Around this initial guess, the pool of candidate Volterra models is built. The candidate models are V(2,3), V(2,4), V(2,5), V(2,6), and V(2,7). After comparing them in a later section, the most suitable model is selected.

Fig. 3. Sample partial autocorrelation function of wind power output is described with the confidence intervals at the 95% confidence level. The estimated memory is five, but it is discovered that the memory of six performs better.
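As a hedged sketch of this memory-identification step, the sample PACF of (4) can be estimated by solving the Yule-Walker equations for successive AR(k) fits; the helpers below are an illustrative assumption, not the paper's implementation (which reports an initial guess of five and a final memory of six).

```python
import numpy as np

def sample_acf(y: np.ndarray, max_lag: int) -> np.ndarray:
    """Sample autocorrelations rho_0..rho_max_lag, cf. (5)-(6)."""
    y = y - y.mean()
    denom = np.dot(y, y)
    return np.array([np.dot(y[k:], y[:len(y) - k]) / denom
                     for k in range(max_lag + 1)])

def sample_pacf(y: np.ndarray, max_lag: int) -> np.ndarray:
    """phi_kk of (4): the last Yule-Walker AR(k) coefficient, k = 1..max_lag."""
    rho = sample_acf(y, max_lag)
    pacf = []
    for k in range(1, max_lag + 1):
        R = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
        phi = np.linalg.solve(R, rho[1:k + 1])
        pacf.append(phi[-1])
    return np.array(pacf)

# Memory guess: first lag after which |pacf| stays inside the 95% band
# +/- 1.96/sqrt(N) (about 0.062 for the N = 1,000 samples used here).
```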
B. Activation Function

The activation function in the hidden layer is the hyperbolic tangent function tanh. As shown in Fig. 4, the tanh consists of an activation range and a saturation range. In the activation range the tanh is close to a linear function, and in the saturation range it is severely nonlinear.

Nonlinear signals generate large net inputs. If the net inputs are larger than the activation range, the activation function drives them to the saturated value, and the neurons in the hidden layer are saturated. A saturated RNN cannot learn the input signals well. Besides, it is hard to approximate the activation function in the saturation range, so it is also hard to extract Volterra kernels from a saturated RNN. Therefore, the net inputs should be located in the activation range. In order to avoid the saturation ranges of the activation function, the linear range should be wide enough to receive large net inputs.

The linear range can be expanded by transforming the activation function with constants a and b as

$$\varphi(x) = a \tanh(bx). \tag{7}$$

The constants a and b rely on the linear range, which depends on the maximum net inputs. Since the range of the net inputs is not known before training the NN, transforming the activation function and training the NN should be performed recursively. In this paper, $\varphi(x)$ satisfies $\varphi(-2) = -1$ and $\varphi(2) = 1$, and the constant a is set to 1. Then, the constant b is found by

$$b = \frac{1}{2\max(\mathbf{X})}\,\log\!\left(\frac{1 + \max(\mathbf{Y})/a}{1 - \max(\mathbf{Y})/a}\right) \tag{8}$$

Fig. 4 shows the activation function a tanh(bx) and its approximated polynomial function. In this case, the activation function between −6.15 and 6.15 is approximated. The activation function and its approximated polynomial function vary with respect to the complexity of the data.

Fig. 4. The tangent sigmoid function is shown. It is approximated by a polynomial equation through Lagrangian Interpolation. The approximated polynomial equation is also shown as a dotted line.

We approximate the tanh by a polynomial equation using the LI. The LI finds the polynomial equation of least degree that passes through the sampled data. Since the least degree is one less than the number of sampled points, we confine the number of samples to a practical value in order to confine the degree of the polynomial. The data are equally sampled within the linear range of the activation function. In this paper, the number of samples is 18, so the degree of the polynomial becomes 17. The approximating polynomial L(x) is defined as

$$L(x) = \sum_{i=1}^{O} f(x_i)\, l_i(x) \tag{9}$$

where f(x) is the target function, which is the tanh in this paper, and O is the number of samples. The $l_i(x)$ are the Lagrange basis polynomials, defined as

$$l_i(x) = \prod_{\substack{1 \le j \le O \\ j \ne i}} \frac{x - x_j}{x_i - x_j}. \tag{10}$$

It should be noted that j must differ from i and that the same point should not be sampled twice. The 17th-degree polynomial equation is shown in Fig. 4 as a dotted line.

V. EXTRACTION

In this section, we extract the Volterra kernels from the Lagrangian polynomial and post-process them. The structure of the extracted Volterra system is shown in Fig. 2. It is assumed that all activation functions have the same Lagrangian polynomial and that the number of hidden neurons is H. Furthermore, the memory is assumed to be M. The structure of the RNN is compared to the structure of the extracted Volterra system in Fig. 2. The coefficient vector of the 17th-degree Lagrangian polynomial, a, is defined as

$$\mathbf{a} = [a_{17}, a_{16}, \ldots, a_0]. \tag{11}$$

Then, the net input of the hth hidden neuron, $S_h$, is defined as

$$S_h = w_{h,1}x(t-1) + w_{h,2}x(t-2) + \cdots + w_{h,M-1}x(t-M+1) + w_{h,M}x(t-M) + b_1 \tag{12}$$

where $w_{h,m}$ is the weight of the mth delayed input in the hth neuron. Given a and $S_h$, the output of the hth polynomial, $I_h$, is defined as

$$I_h = \mathbf{a}^{T} \mathbf{S} \tag{13}$$

where $\mathbf{S} = [S_h^{17}, S_h^{16}, \ldots, S_h^{0}]$. The $I_h$ form the output vector of the hidden neurons, $\mathbf{I} = [I_H, I_{H-1}, \ldots, I_1]$. Then, after passing through the output layer, the output of the RNN at time stamp n, x(n), is calculated as

$$x(n) = \mathbf{W}^{T} \mathbf{I}(n) \tag{14}$$

where W is the weight vector of the output layer, defined as $\mathbf{W} = [W_H, W_{H-1}, \ldots, W_1]$.

Since we want to build the quadratic Volterra system, we should extract the linear and quadratic terms from (13). For the dth degree with d ≥ 2, $S_h^d$ can be expanded as

$$\begin{aligned}
S_h^d &= \left[w_{h,1}x(t-1) + w_{h,2}x(t-2) + \cdots + w_{h,M}x(t-M) + b_1\right]^d \\
&\approx \binom{d}{1} b_1^{d-1} \sum_{m=1}^{M} w_{h,m}\,x(t-m) + \binom{d}{2} b_1^{d-2} \sum_{m=1}^{M} \left[w_{h,m}\,x(t-m)\right]^2 \\
&\quad + 2\binom{d}{2} b_1^{d-2} \sum_{m=1}^{M-1}\sum_{l=m+1}^{M} w_{h,m}w_{h,l}\,x(t-m)\,x(t-l) + \text{H.O.T.}
\end{aligned} \tag{15}$$

where the constant term $b_1^d$ is absorbed into $h_0$. If we substitute S in (13) with the vector consisting of the extracted linear and second-degree terms, the coefficients of the inputs propagate from (13) through (14) and become the Volterra kernels.
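A minimal sketch of this extraction step is given below: the 17th-degree Lagrange fit of a·tanh(bx) and the collection of linear and quadratic coefficients from the expansion (15) into kernels h1 and h2. It assumes the weight layout of the RNN sketch after Fig. 2 and uses numpy's polynomial fit, which through n points of degree n−1 reproduces the Lagrange interpolating polynomial; it is an illustration, not the author's code.

```python
import numpy as np

def lagrange_poly_coeffs(a: float, b: float, x_max: float,
                         n_samples: int = 18) -> np.ndarray:
    """Fit phi(x) = a*tanh(b*x) on n_samples equally spaced points in
    [-x_max, x_max]; 18 points give the 17th-degree polynomial of (9)-(10).
    Returns coefficients [c_0, ..., c_17], lowest degree first.
    (High-degree polyfit is ill-conditioned; adequate for a sketch.)"""
    xs = np.linspace(-x_max, x_max, n_samples)
    return np.polyfit(xs, a * np.tanh(b * xs), n_samples - 1)[::-1]

def extract_kernels(W_in, b_in, W_out, coeffs):
    """Collect h1(k) and h2(k1,k2) of the downscaled data by propagating the
    linear and quadratic coefficients of each hidden polynomial through the
    output layer, following (13)-(15); constants are absorbed into h0.
    h2 is returned fully symmetric; folding it to the upper triangle
    (doubling off-diagonal entries) matches convention (2) and yields the
    same quadratic form."""
    H, M = W_in.shape
    h1, h2 = np.zeros(M), np.zeros((M, M))
    for h in range(H):
        lin = quad = 0.0
        for d, c in enumerate(coeffs):
            if d >= 1:                  # C(d,1) * b^(d-1) terms of (15)
                lin += c * d * b_in[h] ** (d - 1)
            if d >= 2:                  # C(d,2) * b^(d-2) terms of (15)
                quad += c * d * (d - 1) / 2 * b_in[h] ** (d - 2)
        h1 += W_out[h] * lin * W_in[h]
        h2 += W_out[h] * quad * np.outer(W_in[h], W_in[h])
    return h1, h2
```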
The data and Volterra kernels should be post-processed, because the extracted kernels are kernels of the downscaled data. Post-processing converts the downscaled data to the mean-subtracted data and the kernels of the downscaled data to kernels of the mean-subtracted data. The Volterra system of the downscaled data is

$$x'(n) = h_0' + \sum_{k_1=1}^{M} h_1'(k_1)\,x'(n-k_1) + \sum_{k_1=1}^{M}\sum_{k_2=1}^{M} h_2'(k_1,k_2)\,x'(n-k_1)\,x'(n-k_2) + e(n), \quad n = M+1, M+2, \ldots \tag{16}$$

where x'(n) is the downscaled data, defined as

$$x'(t) = \frac{x(t)}{F_x} \tag{17}$$

where $F_x$ is the downscaling factor and x(t) is the mean-subtracted data. If $x(t)/F_x$ from (17) is substituted for x'(t) in (16), (16) becomes

$$\frac{x(n)}{F_x} = h_0' + \sum_{k_1=1}^{M} \frac{h_1'(k_1)}{F_x}\,x(n-k_1) + \sum_{k_1=1}^{M}\sum_{k_2=1}^{M} \frac{h_2'(k_1,k_2)}{F_x^2}\,x(n-k_1)\,x(n-k_2) + e(n), \quad n = M+1, M+2, \ldots \tag{18}$$

Therefore, the Volterra kernels of the mean-subtracted data are

$$h_0 = h_0' F_x, \qquad h_1 = h_1', \qquad h_2 = \frac{h_2'}{F_x}. \tag{19}$$

Fig. 5. The Volterra kernels are extracted from the RNN: (a) the linear Volterra kernels; (b) the quadratic Volterra kernels.

The extracted linear and quadratic Volterra kernels of (19) are shown in Fig. 5(a) and Fig. 5(b), respectively. The range of the linear kernels is between 1 and −0.2. In contrast, the range of the quadratic kernels is between 0.02 and −0.05. It is clear that the linear kernels are dominant. In Fig. 5(a), the linear kernels tail off as the delay increases, which means that the effects or shocks of past inputs are decreasing. Fig. 5(b) shows that the kernels are symmetric to each other. The variance of the kernels with respect to the delays is not shown; the quadratic kernels might be distributed equally over the first and second delay fields. A sketch of this kernel rescaling follows.
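A one-line realization of the post-processing rule (19), hedged as an illustration that reuses the factor Fx returned by the preprocess sketch in Section III:

```python
def postprocess_kernels(h0_p: float, h1_p, h2_p, Fx: float):
    """Rescale kernels of the downscaled data to kernels of the
    mean-subtracted data, per (19): h0 = h0'*Fx, h1 = h1', h2 = h2'/Fx."""
    return h0_p * Fx, h1_p, h2_p / Fx
```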
VI. SIMULATION

In this section, among the Volterra candidate models V(2,3), V(2,4), V(2,5), V(2,6), and V(2,7), V(2,6) is selected as the most suitable model for the given wind power data. Furthermore, with six input neurons, 30 hidden neurons are enough to describe the dynamics of the wind power output. Therefore, the Volterra system V(2,6) is extracted from an RNN that uses six input neurons and 30 hidden neurons.

We train the RNN using the Conjugate Gradient Algorithm (CGA) [19] with the Fletcher-Reeves formula [20] and the Levenberg-Marquardt Algorithm (LMA) [21], alternating with respect to the complexity of the patterns or the amount of wind power data. The training performance of the CGA is shown in Fig. 6. The goal is 0.005, and the number of training epochs is 67. It can be seen that the performance on the validation data becomes worse than the performance on the training data after the optimal number of training epochs.

Fig. 6. The performance graph of the RNN is shown. The training algorithm is the Conjugate Gradient Algorithm. The goal is 0.005, and the number of training epochs is 67.

A. Confidence Interval

We calculate the CIs to quantify the forecasting errors, since point forecasting is meaningless under noisy circumstances. The CIs are built from the RNN based on the multi-linear regression technique [22]. This technique uses the forecast errors and the outputs of the hidden neurons. The RNN can be represented as a linear function of the outputs of the hidden neurons I, as shown in (14). Then, the CIs at instant step n are defined in [22] as

$$\hat{x}(n) - t_{N-H}(\alpha)\,s\sqrt{1 + \mathbf{I}'\mathbf{A}^{-1}\mathbf{I}} \;\le\; x(n) \;\le\; \hat{x}(n) + t_{N-H}(\alpha)\,s\sqrt{1 + \mathbf{I}'\mathbf{A}^{-1}\mathbf{I}} \tag{20}$$

where A is the decision matrix, which is decided by the training data and is defined in [22] as

$$\mathbf{A} = \sum_{n=1}^{N} \mathbf{I}(n) \times \mathbf{I}(n)'. \tag{21}$$

To get the CIs at the 95% confidence level, $t_{N-H}(\alpha)$ in (20) must follow the Student's t-distribution, and α becomes 0.05. The unbiased estimator of the prediction error variance, s, in (20) is defined in [23] as

$$s^2 = \frac{\sum_{n=1}^{N} \left(\hat{x}(n) - x(n)\right)^2}{N - H} \tag{22}$$

where $\hat{x}(n)$ is the forecast at time stamp n, x(n) is the observed value at the same time stamp, N is the number of training data sets, and H is the number of hidden neurons. In (20), the CI of the RNN is based on the observed data at instant step n, so the CI is updated at each step. Therefore, the CIs vary with respect to the observed real data. Furthermore, since the CIs also depend on the prediction error variance s, the CIs become wide as the forecasting errors increase. It can be seen that the CIs in Fig. 7(a) are narrower than the CIs in Fig. 7(b).

B. Forecasting

We predict the 15- and 30-minute-ahead forecasts using the V(2,6) Volterra system with CIs at the 95% confidence level. The 15- and 30-minute-ahead forecasts are shown in Fig. 7(a) and Fig. 7(b), respectively. In both figures, the forecasts from the Volterra system are represented by the solid line, the test data and training data are shown as the long and short dashed lines respectively, and the corresponding confidence intervals are described by narrow dashed lines.

Fig. 7. The wind power output is forecasted with confidence intervals at the 95% confidence level up to (a) 15 minutes ahead and (b) 30 minutes ahead.

The forecasting process is as follows. The Volterra system generates only a one-step-ahead forecast at a time, so it must calculate the multi-step-ahead forecast by recursively updating the forecasts up to the target time step, based only on the given observations $X_{n-1} = \{x(n-1), x(n-2), \ldots, x(n-M)\}$. The accuracy of the forecasts is measured by the Mean Square Error (MSE). The MSE (%) is defined as

$$MSE = \frac{1}{N}\sum_{n=1}^{N} \left|\frac{x(n) - \hat{x}(n)}{x(n)}\right|^2 \times 100 \tag{23}$$

The MSE for the 15- and 30-minute-ahead forecasts is 0.0371% and 0.0785%, respectively. Fig. 7 shows that the forecasting errors increase as the forecasting horizon increases.

The Volterra system forecasts well for both the training and test data. However, the Volterra system shows superior performance on the test data because it is smoother than the training data. In Fig. 7(a), for the validation data, most of the actual wind power output lies inside the CIs, and the forecasts follow the sudden decrease of the actual data from minute 200 to minute 350. In contrast, in Fig. 7(b), even though the CIs have been enlarged, some of the actual wind power output lies near the edge of the CIs, but still inside them. In addition, the forecasts in Fig. 7(b) do not follow the sudden decrease of the actual data during the same period. Therefore, despite a slight decrease in accuracy at the longer horizon, the wind power output is forecasted well using the Volterra system. A sketch of the recursive forecasting loop is given below.
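The recursive multi-step procedure and the MSE of (23) can be sketched as follows; the same loop, run out to 90 steps, is what the stability check in the next subsection relies on. The names are illustrative and reuse volterra_one_step from the sketch in Section II.

```python
import numpy as np

def forecast_multi_step(history: np.ndarray, steps: int,
                        h0: float, h1: np.ndarray, h2: np.ndarray) -> np.ndarray:
    """Recursive multi-step forecast: each one-step prediction is fed back
    as the most recent tap, using only the observations available at the
    forecast origin."""
    M = len(h1)
    taps = list(history[-M:][::-1])      # [x(n-1), ..., x(n-M)]
    out = []
    for _ in range(steps):               # e.g. 15, 30, or 90 steps
        x_next = volterra_one_step(np.array(taps), h0, h1, h2)
        out.append(x_next)
        taps = [x_next] + taps[:-1]      # shift the tap-delay line
    return np.array(out)

def mse_percent(actual: np.ndarray, forecast: np.ndarray) -> float:
    """MSE (%) as defined in (23)."""
    return float(np.mean(((actual - forecast) / actual) ** 2) * 100)
```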
C. Stability

There is no general method to prove the stability of a quadratic system [10]. Therefore, in this paper, the stability of the Volterra system is checked by calculating forecasts up to 90 steps ahead. If the system is still stable up to 90 steps, we consider the Volterra system stable. Furthermore, as the forecasting step l increases, the forecasts asymptotically approach the mean value of the wind power output. This can serve as evidence of stability, since the ARMA model also approaches the mean value as the forecasting step increases. This method is not sufficient to prove the stability of the general Volterra model, but it is adequate for forecasting the wind power output up to 30 minutes ahead.

The stability of the quadratic Volterra system might be proven using a modified version of Lyapunov's second theorem. In short, the Volterra system could be represented as a linear model if we consider each quadratic term as a single term. Then, the quadratic Volterra system is converted to a linear state-space model. The stability of the linear state-space model can be proven using Lyapunov's second theorem, but the theorem should be adjusted because the feedback is absorbed in the quadratic terms. On the other hand, the stability of the Volterra system might be secured by the linear terms alone.

VII. CONCLUSION

In this paper, the 15- and 30-minute-ahead wind power output is forecasted using the recurrent quadratic Volterra system. The Volterra system is established on the memory and the Volterra kernels. The memory of the Volterra system is identified using the PACF, and the Volterra kernels are extracted from the RNN after approximating the tanh by a polynomial function using the LI. The forecasts lie within the CIs, which are computed by the multi-linear regression technique. Furthermore, the stability of the Volterra system is heuristically checked by predicting long step-ahead forecasts.

A single Volterra system is not sufficient to identify the wind power system and forecast the wind power output, because the wind turbine is controlled by many factors, such as the pitch angle, the stall, and the rated wind speed. Therefore, in order to build a more accurate Volterra system that reflects the control factors mentioned above, piecewise Volterra systems should be identified in future studies. While coping with these limiting factors, we should also improve the techniques for finding the CIs of the Volterra model. We might directly find the CIs from the Volterra kernels instead of from the outputs of the hidden neurons. Furthermore, a full analysis of the stability of the recurrent quadratic Volterra system will be the subject of future research.

REFERENCES

[1] T. Barbounis, J. Theocharis, M. Alexiadis, and P. Dokopoulos, "Long-term wind speed and power forecasting using local recurrent neural network models," Energy Conversion, IEEE Transactions on, vol. 21, no. 1, pp. 273-284, 2006.
[2] T. Barbounis and J. Theocharis, "Locally recurrent neural networks for wind speed prediction using spatial correlation," Information Sciences, vol. 177, no. 24, pp. 5775-5797, 2007.
[3] G. Zhang, B. E. Patuwo, and M. Y. Hu, "Forecasting with artificial neural networks: The state of the art," International Journal of Forecasting, vol. 14, no. 1, pp. 35-62, 1998.
[4] G. Kariniotakis, G. Stavrakakis, and E. Nogaret, "Wind power forecasting using advanced neural networks models," Energy Conversion, IEEE Transactions on, vol. 11, no. 4, pp. 762-767, Dec. 1996.
[5] S. Li, "Wind power prediction using recurrent multilayer perceptron neural networks," in Power Engineering Society General Meeting, 2003, IEEE, vol. 4, 2003, p. 2330.
[6] T. Barbounis and J. Theocharis, "Locally recurrent neural networks for long-term wind speed and power prediction," Neurocomputing, vol. 69, no. 4-6, pp. 466-496, 2006.
[7] W. Charytoniuk and M.-S. Chen, "Very short-term load forecasting using artificial neural networks," Power Systems, IEEE Transactions on, vol. 15, no. 1, pp. 263-268, Feb. 2000.
[8] W. J. Rugh, Nonlinear System Theory: The Volterra/Wiener Approach. Baltimore and London: Johns Hopkins University Press, 1981.
[9] H. M. Abbas and M. M. Bayoumi, "Volterra system identification using adaptive genetic algorithms," Applied Soft Computing, vol. 5, no. 1, pp. 75-86, 2004.
[10] H. K. Khalil, Nonlinear Systems, 3rd ed. Upper Saddle River, NJ: Prentice-Hall, 2000.
[11] V. Marmarelis and X. Zhao, "Volterra models and three-layer perceptrons," Neural Networks, IEEE Transactions on, vol. 8, no. 6, pp. 1421-1433, Nov. 1997.
[12] J. Wray and G. G. R. Green, "Calculation of the Volterra kernels of non-linear dynamic systems using an artificial neural network," Biological Cybernetics, vol. 71, no. 3, July 1994.
[13] L. Landberg, G. Giebel, H. A. Nielsen, T. Nielsen, and H. Madsen, "Short-term prediction - an overview," Wind Energy, vol. 6, pp. 273-280, 2003.
[14] M. Negnevitsky, P. Mandal, and A. Srivastava, "An overview of forecasting problems and techniques in power systems," in Power & Energy Society General Meeting, 2009. PES '09. IEEE, 2009, pp. 1-4.
[15] G. Giebel, R. Brownsword, and G. Kariniotakis, "The state of the art in short-term prediction of wind power: A literature overview," EU Project ANEMOS, Tech. Rep. ENK5-CT-2002-00665, 2003. [Online]. Available: http://anemos.cma.fr
[16] J. Freeman and D. M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques, 1st ed. New York: Addison-Wesley, 1991.
[17] D. C. Montgomery, C. L. Jennings, and M. Kulahci, Introduction to Time Series Analysis and Forecasting. Wiley Series in Probability and Statistics, 2008.
[18] W. W. S. Wei, Time Series Analysis: Univariate and Multivariate Methods. Pearson Education, 2006.
[19] D. F. Shanno, Neural Networks for Control. Cambridge, MA: MIT Press, 1995.
[20] R. Fletcher, Practical Methods of Optimization, 2nd ed. New York: Wiley, 1987.
[21] M. Hagan and M. Menhaj, "Training feedforward networks with the Marquardt algorithm," Neural Networks, IEEE Transactions on, vol. 5, no. 6, pp. 989-993, Nov. 1994.
[22] A. da Silva and L. Moulin, "Confidence intervals for neural network based short-term load forecasting," Power Systems, IEEE Transactions on, vol. 15, no. 4, pp. 1191-1196, Nov. 2000.
[23] G. Yu, H. Qiu, D. Djurdjanovic, and J. Lee, "Feature signature prediction of a boring process using neural network modeling with confidence bounds," The International Journal of Advanced Manufacturing Technology, vol. 30, pp. 614-621, 2006.

Duehee Lee (S'10) was born in Daegu, Republic of Korea, in 1981. He received the B.S. in Electronic and Electrical Engineering in 2000 from Pohang University of Science and Technology, Pohang, Republic of Korea, and the M.S. from the Electrical and Computer Engineering Department of the University of Texas at Austin, Austin, TX, in 2009. He is currently pursuing the Ph.D. at the same university. His research interests are applying system identification techniques to the wind power system and data mining techniques to the smart grid.
