
Sarath Chandra Tumuluri


Contents
1. Introduction
2. Subset Data
3. Data Cleansing Operations
3.1 Data Smoothing
3.2 Data Transformations
4. Exponential Smoothing Method Decision Tree
4.1 Exponential Smoothing Decision Flowchart
5. Fitting Exponential Smoothing Model
5.1 Plots of Original and Exponential Smoothed Data
6. Forecast with Exponential Smoothing Model
6.1 Training Set Preparation
6.2 Forecast of March 2015 by Training the Model on the Previous Two Months
6.3 Analysing Errors in the Forecasted Data
6.4 Analysing ACF Plot and PACF Plot for the De-trended and De-seasonalized Data


List of Figures

1) Fig 1.0 Time Series Plot of Entire Data Set
2) Fig 1.1 Yearly Data (2015)
3) Fig 2.1 Comparison of January 2015 with January 2016
4) Fig 2.2 January 2015 Data
5) Fig 2.3 Weekly Data of January 2015
6) Fig 2.4 Comparison of January 2015 with January 2016
7) Fig 3.1 Evenly Spaced Data
8) Fig 3.2 Kalman Data Imputation on Weekly January 2015
9) Fig 3.3 Interpolation Data Imputation on Weekly January 2015
10) Fig 3.4 Mean Data Imputation on Weekly January 2015
11) Fig 3.5 Moving Average Smoothing on Weekly January 2015
12) Fig 3.6 ACF Plot for the Entire Data
13) Fig 3.7 Variogram Plot
14) Fig 3.8 Plot without Differencing
15) Fig 3.9 Plot after Differencing
16) Fig 3.2.1 Data after Performing Log Transformation
17) Fig 4.1.0 Heat Consumption Data for January 2015 (no trend, non-stationary)
18) Fig 4.1.1 Data with Trend
19) Figure 4.1.2 Flowchart on Exponential Smoothing
20) Fig 5.1.0 Model Fitted with Optimal Values Chosen by the Holt-Winters Method
21) Fig 5.1.1 Graph between alpha = 0.18 and alpha = 0.3, Keeping beta Constant
22) Fig 5.1.2 Analysing with Different alpha Values (A = Alpha)
23) Fig 6.1.0 Heat Consumption of August and September 2015
24) Fig 6.1.1 Heat Consumption Using Double Exponential Smoothing (Holt-Winters)
25) Fig 6.2.1 Single Exponential Method, lambda = 0.1305
26) Fig 6.2.2 Single Exponential Method, lambda = 0.3
27) Fig 6.2.3 Single Exponential Method, lambda = 0.09
28) Fig 6.2.4 Double Exponential, lambda = 0.1623294 (optimal)
29) Fig 6.2.5 Double Exponential, lambda = 0.0258 (below optimal)
30) Fig 6.2.6 lambda (Single Exponential) vs Sum of Squared Errors (SSE)
31) Fig 6.5.1 ACF of the Complete Heat Consumption Data
32) Fig 6.5.3 PACF Plot for the Complete Data


List of Tables

1) Table 6.4.1 Errors and Forecast Measures (Single Exponential Method)
2) Table 6.4.2 Errors and Forecast Measures (Double Exponential Method)


1. Introduction

1) This file contains data on the Wallace Library heat consumption per hour, in BTU (a traditional unit of energy equal to about 1055 joules), supplied as a Microsoft Excel macro-enabled worksheet.
2) Source of data: Wallace Library heat consumption, provided by Rochester Institute of Technology.
3) Exogenous variables: outside and inside temperatures; winter ventilation provided by mechanical and other systems; infiltration resulting from building construction and usage; and the heat required to raise the temperature of materials frequently brought into the heated space from outdoors.

Fig 1.0 Time Series Plot of Entire Data Set


The series is cyclic or seasonal and non-stationary (no natural mean over time; a trend is exhibited), with no atypical events.

2. Subset Data

Fig 1.1 Yearly Data (2015)

It can be seen from the graph that the heat consumption is higher from January to April, which is understandable as it is the winter season and outside temperatures drop to their lowest.


Fig 2.4 Comparison of January 2015 with January 2016

Comparing the data of January 2015 with January 2016 to check whether the months are correlated, it can be seen that the amount of heat consumed in 2016 is greater than in 2015. This can be related to the exogenous variable of outside temperature, which may have been lower in 2016 and so resulted in more heat consumption per hour (BTU/hr).


Fig 2.2 January 2015 Data

The data is subset to the month of January 2015 to check whether any trend or seasonality is present. The graph shows that heat consumption is high toward the end of January 2015.


Fig 2.3 Weekly data of January 2015

Taking a single week of January 2015 and exploring whether any trend or seasonality is shown, we can see a sudden jump in heat consumption starting on the 3rd day of the week, that is, within the first few days of that week of January 2015.


3. Data Cleansing Operations

Fig 3.1 Evenly Spaced Data

The data provided was verified and is evenly spaced, with no observations missing in between. To place the dates on the x-axis, the as.POSIXct function is used and the tick sequence is split into 3-month intervals, as can be seen in the graph above.
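A minimal R sketch of the evenly-spaced check and the 3-month date axis. The data frame heat, with columns Timestamp and BTU, is an illustrative stand-in built from synthetic values, not the actual worksheet:

## Synthetic hourly stand-in for the Wallace Library series (illustration only).
set.seed(1)
stamps <- seq(as.POSIXct("2015-01-01 00:00", tz = "UTC"),
              as.POSIXct("2016-06-30 23:00", tz = "UTC"), by = "hour")
heat <- data.frame(Timestamp = stamps,
                   BTU = 46000 + 25000 * sin(2 * pi * seq_along(stamps) / (24 * 365)) +
                     rnorm(length(stamps), sd = 5000))

## Evenly spaced check: every gap between consecutive timestamps should be 1 hour.
table(diff(heat$Timestamp))

## Time series plot with tick marks every 3 months, as in Fig 3.1.
plot(heat$Timestamp, heat$BTU, type = "l", xaxt = "n",
     xlab = "Date", ylab = "Heat consumption (BTU/hr)")
ticks <- seq(min(heat$Timestamp), max(heat$Timestamp), by = "3 months")
axis.POSIXct(1, at = ticks, format = "%b %Y")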

To exercise the data imputation techniques, some observations were intentionally removed, and several imputation methods were then tried on the gapped series to see how each one performs on this particular heat consumption data set. The imputation methods used are Kalman, interpolation, and moving average, as sketched below.
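A sketch of the three imputation methods using the imputeTS package (version 3.x function names). The vector week_jan is an illustrative week of data with gaps punched in on purpose, not the report's exact subset:

library(imputeTS)

## One week (168 hours) of illustrative data with some values removed on purpose.
set.seed(2)
week_jan <- 46000 + cumsum(rnorm(168, sd = 2000))
week_jan[c(30:35, 90, 120:124)] <- NA

filled_kalman <- na_kalman(week_jan)         # Kalman smoothing on a state-space model
filled_interp <- na_interpolation(week_jan)  # linear interpolation between neighbours
filled_ma     <- na_ma(week_jan, k = 4)      # weighted moving average of nearby points

## Overlay the three imputations on the gapped series.
plot(week_jan, type = "l", ylab = "BTU/hr", xlab = "Hour of week")
lines(filled_kalman, col = "red")
lines(filled_interp, col = "blue")
lines(filled_ma,     col = "darkgreen")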


Kalman Technique :

Fig 3.2 Kalman Data Imputation on Weekly January 2015


Interpolation Technique :

Fig 3.3 Interpolation Data Imputation on Weekly January 2015


Moving Average :

Fig 3.4 Mean Data Imputation on Weekly January 2015


3.1 Data Smoothing
Data smoothing was plotted using the rolling median and the rolling mean for the 2nd week of January 2015, and the smoothing produced by the two was compared, as can be seen in the graph.

Fig 3.5 Moving Average Smoothing on Weekly January 2015


Moving average for two months' worth of data

Fig 3.5 Moving Average Smoothing on Weekly January 2015


Moving Median for the same data

Fig 3.6 Moving Median Smoothing on Weekly January 2015

Discussion of the observations

A simple moving average of span 5 assigns a weight of 1/5 to each of the five most recent observations. It exhibits less variability and is easier to interpret and to analyse for any trend; however, it fails to remove the potential outliers. A sketch of the rolling mean and rolling median follows.
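A sketch of the span-5 rolling mean and rolling median with the zoo package. The btu vector is an illustrative stand-in for the week being smoothed:

library(zoo)

## Illustrative hourly values for the week being smoothed.
set.seed(3)
btu <- 46000 + cumsum(rnorm(168, sd = 2000))

## Span-5 moving average (weight 1/5 on each of the 5 most recent observations)
## and the corresponding span-5 rolling median.
sma5  <- rollmean(btu,   k = 5, fill = NA, align = "right")
rmed5 <- rollmedian(btu, k = 5, fill = NA, align = "right")

plot(btu, type = "l", col = "grey60", ylab = "BTU/hr", xlab = "Hour of week")
lines(sma5,  col = "blue")  # smoother, but outliers still pull the average
lines(rmed5, col = "red")   # more resistant to isolated spikes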

Sample mean and variance for the data:

Sample mean: 46467.42
Sample variance: 14952092370


ACF plot for lag k

Fig 3.6 ACF Plot for the entire data

The plot shows that the data is highly autocorrelated and forms a non-stationary random series.
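A one-line sketch of this check, reusing the illustrative heat data frame from the earlier sketch:

## Slowly decaying autocorrelations over many lags indicate a highly
## autocorrelated, non-stationary series.
acf(heat$BTU, lag.max = 50, main = "ACF of hourly heat consumption")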


Variogram

Fig 3.7 Plot of variogram for lags k = 1 to k = 12

Variograms are generally used to assess the stationarity of the data. From the plot above, it can be inferred that the data is non-stationary.
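A small illustrative implementation of the sample variogram, G(k) = Var(z[t+k] - z[t]) / Var(z[t+1] - z[t]); this follows the textbook definition and is not necessarily the code used for the figure:

## For a stationary series G(k) levels off as k grows; for a non-stationary one
## it keeps increasing.
variogram <- function(z, max_lag = 12) {
  d1 <- var(diff(z, lag = 1))
  sapply(seq_len(max_lag), function(k) var(diff(z, lag = k)) / d1)
}

G <- variogram(heat$BTU, max_lag = 12)
plot(1:12, G, type = "b", xlab = "Lag k", ylab = "G(k)")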


Differencing

Fig 3.8 Plot without Differencing


After Differencing
Fig 3.9 Plot after Differencing

After differencing, the series appears to have a constant mean and constant variance over time, which can be clearly seen in the graph.
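A sketch of the differencing step, again on the illustrative series:

## First difference removes the trend; the result should fluctuate around a
## roughly constant mean with roughly constant variance (Fig 3.9).
btu_diff <- diff(heat$BTU, differences = 1)
plot(btu_diff, type = "l", ylab = "Differenced BTU/hr")
abline(h = mean(btu_diff), col = "red", lty = 2)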


3.2 Data Transformations

For the data that I have, the variance changes considerably at the start of the year and in the middle of the year, which is heteroscedasticity.

To remove it, I applied the natural log transformation.

The other transformations that were available are the inverse transformation, the square root transformation, and the reciprocal square root transformation, as sketched below.
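A sketch of the transformations mentioned above (all of them assume strictly positive BTU values; the heat data frame is the illustrative one from the earlier sketches):

btu_log   <- log(heat$BTU)        # natural log, the transformation applied here
btu_sqrt  <- sqrt(heat$BTU)       # square root
btu_inv   <- 1 / heat$BTU         # inverse (reciprocal)
btu_rsqrt <- 1 / sqrt(heat$BTU)   # reciprocal square root

plot(heat$Timestamp, btu_log, type = "l",
     xlab = "Date", ylab = "log(BTU/hr)")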

Fig 3.2.1 Data after performing Log Transformation

We can infer from the graph that the variance is more consistent than before the transformation.


4. Exponential Smoothing Method Decision Tree

4.1 Exponential Smoothing Decision Flowchart

Firstly, exponential smoothing is used to remove the drawback of weighting all past observations equally: it gives more weight to the recent observations than to the older ones, as sketched below.
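This idea is the single exponential smoothing recursion S_t = alpha * x_t + (1 - alpha) * S_(t-1). A small R sketch of the textbook update (not the report's own code):

## Weights on past observations decay geometrically, so recent points count most.
ses <- function(x, alpha, s0 = x[1]) {
  s <- numeric(length(x))
  s[1] <- alpha * x[1] + (1 - alpha) * s0
  for (t in 2:length(x)) s[t] <- alpha * x[t] + (1 - alpha) * s[t - 1]
  s
}

ses(c(10, 12, 11, 15, 14), alpha = 0.2)   # toy example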

Fig 4.1.0 Heat Consumption data for Jan 2015 (no trend and non-stationary)

For data which is non-stationary with no trend, single exponential smoothing is used.

The main drawback of single exponential smoothing is that it lags behind a trend by a certain amount and does not estimate the trend properly.


Double exponential smoothing is an extension of single exponential smoothing, used for smoothing data that has a trend, as sketched below.
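A sketch of the level-and-trend (Holt) updates behind double exponential smoothing, again in the textbook form rather than the report's own code:

## alpha smooths the level, beta smooths the trend; the one-step-ahead fit is
## the previous level plus the previous trend.
holt_des <- function(x, alpha, beta) {
  n <- length(x)
  level <- numeric(n); trend <- numeric(n); fit <- rep(NA_real_, n)
  level[1] <- x[1]; trend[1] <- x[2] - x[1]
  for (t in 2:n) {
    fit[t]   <- level[t - 1] + trend[t - 1]
    level[t] <- alpha * x[t] + (1 - alpha) * (level[t - 1] + trend[t - 1])
    trend[t] <- beta * (level[t] - level[t - 1]) + (1 - beta) * trend[t - 1]
  }
  list(level = level, trend = trend, fitted = fit)
}

holt_des(c(10, 12, 13, 15, 18, 20), alpha = 0.2, beta = 0.1)$fitted   # toy example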

Fig 4.1.1 Data with trend.


Figure 4.1.2 Flowchart on exponential smoothing

Exponential Smoothing
  -> Is there a trend?
     No  -> Single Exponential Smoothing
     Yes -> Double Exponential Smoothing

As my data has both characteristics, non-stationarity as well as trend, I chose the double exponential model for my data.

However, I also forecast and evaluated the data with single exponential smoothing, to compare its performance.


5. Fitting Exponential Smoothing Model

5.1 Plots of Original and Exponential Smoothed Data

Fig 5.1.0 Model fitted with optimal values chosen by the Holt-Winters Method

This is the double exponential model fit with the optimal values of alpha and beta chosen by the Holt-Winters method, which are alpha = 0.1895906 and beta = 0.001607902. I have also chosen one alpha value above the one specified by the Holt-Winters method and one value below, keeping the beta value constant, to see whether either fits my data better than the model given by Holt-Winters. A sketch of these fits follows.
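A sketch of this fitting step with the base-R HoltWinters function. The time-series object is built from the illustrative heat data of the earlier sketches, and the fixed alpha values 0.10 and 0.30 stand in for the "one below, one above" comparison:

## Double exponential smoothing: gamma = FALSE drops the seasonal term, and
## alpha/beta are chosen by minimising the in-sample squared one-step error.
heat_ts <- ts(heat$BTU)
fit_opt <- HoltWinters(heat_ts, gamma = FALSE)
fit_opt$alpha; fit_opt$beta   # the report's optimal values were about 0.1896 and 0.0016

## Refit with alpha fixed below and above the optimum, beta held constant.
fit_lo <- HoltWinters(heat_ts, alpha = 0.10, beta = as.numeric(fit_opt$beta), gamma = FALSE)
fit_hi <- HoltWinters(heat_ts, alpha = 0.30, beta = as.numeric(fit_opt$beta), gamma = FALSE)

plot(fit_opt)                                  # observed vs fitted, optimal parameters
lines(fitted(fit_hi)[, "xhat"], col = "blue")
lines(fitted(fit_lo)[, "xhat"], col = "darkgreen")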


Fig 5.1.1 Graph between alpha = 0.18 and alpha = 0.3, keeping beta constant

Plotting the graphs for the two different alpha values shows that as the value of alpha increases, the fitted model moves further away from the original data and the smoothing is affected.

We then subset the data to see how the model parameters affect the fit and to get a better interpretation of the parameters.


Fig 5.1.2 Analysing with different alpha values (A = Alpha)

From this it can be inferred that when alpha is changed to values greater than or less than the optimum, keeping beta constant, the fitted model lags by a greater amount the further we move from the optimal alpha. So we can conclude that the value which best fits the original data with minimum lag is alpha = 0.18 and beta = 0.0016.


6. Forecast with Exponential Smoothing Model


6.1 Training Set Preparation

The initial idea was to forecast the October 2015 data using the August and September 2015 data as input.

Fig 6.1.0 Heat Consumption of August and September 2015

However, at this time of year the weather changes are very unpredictable: the very low consumption of August 2015 sees a sudden, large shift in heat consumption as the climate changes from normal weather to very cold temperatures. When this period was used as the training set, the residuals against the ground truth were very high.

As this data is very fluctuating and random, I chose to select the training data based on the season that is currently going on. So, if it is fall, it would be reasonable to give the data of June and July as the training set, or historic data set, based on which accurate predictions or forecasts can be made. Even if they contain potential outliers, it would not matter as much as it would if the data were chosen across the seasons of the year, for example trying to predict the heat consumption of spring given only historic fall data.

Keeping this in mind, I have chosen to predict the BTU/hr of March 2015, given the data of January and February 2015, as sketched below.
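A sketch of this forecast with HoltWinters and predict, continuing with the illustrative heat data frame; the months are selected from its Timestamp column:

## Train on January-February 2015, forecast one value per hour of March 2015.
is_train <- format(heat$Timestamp, "%Y-%m") %in% c("2015-01", "2015-02")
is_test  <- format(heat$Timestamp, "%Y-%m") == "2015-03"

train_ts <- ts(heat$BTU[is_train])
fit_des  <- HoltWinters(train_ts, gamma = FALSE)    # double exponential smoothing

h  <- sum(is_test)
fc <- predict(fit_des, n.ahead = h, prediction.interval = TRUE, level = 0.95)

plot(fit_des, fc)                      # fitted values plus the forecast band
actual_march <- heat$BTU[is_test]      # ground truth kept aside for the error analysis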

Fig 6.1.1 Heat Consumption using Double Exponential Smoothing (Holt-Winters)

As can be seen, the prediction intervals do not even lie within the range of the ground-truth BTU/hr consumption.


6.2 Forecast of March 2015 by Training the Model on the Previous Two Months

Before doing the forecast, I chose forecasting performance evaluators that are suitable for my data and give a correct estimate of my model's performance.
There are four important performance measures, listed below.
1) Mean Absolute Percent Error (MAPE)
2) Root Mean Squared Error (RMSE)
3) R squared
4) Mean Squared Error (MSE)

All of these measures describe how much the forecast value deviates from the original value.
R squared measures how well the model explains the given data, but it does not tell how well the model predicts the future.
Mean Absolute Percent Error has the disadvantage that if an actual value goes to zero it becomes infinite, which says nothing about the forecasting performance of the model.
Root Mean Squared Error can be used when the forecast and actual values are on the same scale, since RMSE is scale dependent.
As none of these problems apply to the current data set, except that R squared does not tell how well the model predicts, we calculate RMSE and MAPE for the current model, as sketched below.
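A sketch of the error measures, applied to the forecast object from the previous sketch (MAPE is reported in percent):

mape <- function(actual, pred) mean(abs((actual - pred) / actual)) * 100
mse  <- function(actual, pred) mean((actual - pred)^2)
rmse <- function(actual, pred) sqrt(mse(actual, pred))

pred_march <- fc[, "fit"]   # point forecasts; "upr"/"lwr" hold the interval bounds
c(MAPE = mape(actual_march, pred_march),
  MSE  = mse(actual_march, pred_march),
  RMSE = rmse(actual_march, pred_march))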


Now, after selecting the model performance measures, we forecast the data using both single and double exponential smoothing:
1) by obtaining the optimal parameters automatically from the Holt-Winters method;
2) by using the squared error as an evaluator to calculate the best possible parameters (a sketch of this grid search over the smoothing weight follows);
3) by changing the parameter values to values greater than and less than the optimal ones given by the Holt-Winters method.
The chosen performance evaluators then determine the best of the above three test cases, yielding the best forecasting method with optimal parameter values.
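A sketch of the grid search for the single exponential case: refit over a range of smoothing weights and record the in-sample sum of squared errors, as plotted in Fig 6.2.6 (train_ts is the training series from the earlier sketch):

alphas <- seq(0.01, 0.99, by = 0.01)
sse <- sapply(alphas, function(a)
  HoltWinters(train_ts, alpha = a, beta = FALSE, gamma = FALSE)$SSE)

plot(alphas, sse, type = "l", xlab = "lambda (alpha)", ylab = "Sum of squared errors")
alphas[which.min(sse)]   # smoothing weight with the smallest in-sample SSE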
Fig 6.2.1 Single Exponential Method, lambda = 0.1305

MAPE = 244.29
MSE = 73639472095


RMSE = 271365.9

Fig 6.2.2 Single Exponential Method, lambda = 0.3

MAPE = 217.164
MSE=75416239290
RMSE=274620.2

Fig 6.2.3 Single Exponential Method, lambda = 0.09


MAPE=25401
RMSE = 274620.2


Fig 6.2.4 Double Exponential, lambda = 0.1623294 (optimal)

MAPE = 168.4257
MSE=74445833884
RMSE = 272847.6


Fig 6.2.5 Double Exponential, lambda = 0.0258 (below optimal)

MAPE = 66.90126
MSE = 91220442069
RMSE = 302027.2

The model parameters were changed in two ways:

1) keeping the optimal lambda value given automatically by the Holt-Winters method as the reference point and analysing how the evaluators change as the lambda value changes;
2) plotting lambda values from 0 to 1 against the sum of squared errors and picking the lambda with the least sum of squared errors.


Fig 6.2.6 lambda (Single Exponential) vs Sum of Squared Errors (SSE)

From this we can see that lambda = 0.3 is the most nearly optimal value for the single exponential model.

After looking at the error values calculated by the different forecasting measures used, it can be inferred that double exponential smoothing with alpha = 0.025 and beta = 0.004976 gave the least errors and the best model fit.


6.3 Analysing Errors in the Forecasted Data

Table 6.4.1 Errors and Forecast Measures (Single Exponential Method)

lambda     MAPE        MSE            RMSE
0.09       25401       75416254248    274620.2
0.1305     244.29      73639451683    271365.9
0.3        217.164     75416254248    274620.2
0.74       7625        90590766289    300983

Table 6.4.2 Errors and Forecast Measures (Double Exponential Method)

lambda     MAPE        MSE            RMSE
0.025      66.90126    91220442069    302027
0.1        108.1917    75138649464    274114.3
0.3        217.164     75837008379    275385.2


Pattern of MAPE with Original Data


Histogram plot showing the frequency of MAPE errors


6.4 Analysing ACF Plot and PACF Plot for the De-trended and De-seasonalized Data

Fig 6.5.1 ACF of the complete heat consumption data


Fig 6.5.3 PACF Plot for the complete Data

Since the ACF shows a mixture of exponential decay and a damped sinusoid, and the PACF drops to zero after lag 11, the series can be interpreted as an autoregressive model of order p, where p might be 11 or 12. A sketch of this check on a differenced version of the series follows.
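A sketch of the ACF/PACF check on a first-differenced (de-trended) version of the illustrative series from the earlier sketches:

d_btu <- diff(heat$BTU)
op <- par(mfrow = c(1, 2))
acf(d_btu,  lag.max = 48, main = "ACF, differenced series")   # mixed decay pattern
pacf(d_btu, lag.max = 48, main = "PACF, differenced series")  # look for the cut-off lag
par(op)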

