1|Page
Sarath Chandra Tumuluri
Contents
1. Introduction
2. Subset Data
3. Data Cleansing Operations
3.1 Data Smoothing
3.2 Data Transformations
4. Exponential Smoothing Method Decision Tree
4.1 Exponential Smoothing Decision Flowchart
5. Fitting Exponential Smoothing Model
5.1 Plots of Original and Exponential Smoothed Data
6. Forecast with Exponential Smoothing Model
6.1 Training Set Preparation
6.2 Analysing Errors in the Forecasted Data
6.3 Analysing ACF and PACF Plots for the De-trended and De-seasonalized Data
List of Figures
List of Tables
1) Table 6.4.1 Table with Errors and forecast Measures (Single Exponential Method)
2) Table 6.4.2 Table with Errors and Forecast Measures (Double Exponential Method)
1. Introduction
1) This file contains hourly heat-consumption data for the Wallace Library, measured in
BTU (a traditional unit of energy equal to about 1,055 joules), supplied as a Microsoft
Excel macro-enabled worksheet.
2) Source of data: Wallace Library heat consumption, provided by Rochester Institute of
Technology.
3) Exogenous variables: outside and inside temperatures; winter ventilation provided by
mechanical and other systems; infiltration resulting from building construction and usage;
and the heat required to raise the temperature of materials frequently brought into the
heated space from outdoors.
The series is cyclic or seasonal and non-stationary (no natural mean over time; a trend is
exhibited), with no atypical events.
2. Subset Data
It can be seen from the graph that heat consumption is higher from January through April,
which is understandable, as it is the winter season and outside temperatures drop to their lowest.
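The month and season subsets used above can be sketched with partial-date indexing. The original analysis was done in R; the series below is a hypothetical stand-in, not the actual Wallace Library readings.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly BTU series covering 2015-2016.
rng = pd.date_range("2015-01-01", "2016-12-31 23:00", freq="h")
btu = pd.Series(np.random.default_rng(0).normal(2000, 300, len(rng)), index=rng)

jan_2015 = btu.loc["2015-01"]           # subset a single month
winter = btu.loc["2015-01":"2015-04"]   # January through April 2015
jan_2016 = btu.loc["2016-01"]           # for comparison against jan_2015
```

Partial-string indexing on a DatetimeIndex makes month-level comparisons (January 2015 vs January 2016) a one-liner each.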
Comparing the data of January 2015 with January 2016 to see whether there is any correlation
between the months, it can be seen that the amount of heat consumed in 2016 is greater than
in 2015. This can be related to the exogenous variable of outside temperature, which may have
been lower in 2016, resulting in more heat consumption per hour (BTU/hr).
The data is then subset to the month of January alone, to see whether any trend or seasonality
is followed. The graph shows that heat consumption is high at the end of January 2015.
Taking a single week of January 2015 and exploring whether any trend or seasonality is shown,
we can see a sudden jump in heat consumption starting on the third day of the week, within the
first few days of that week of January 2015.
The data provided was verified and can be seen to be evenly spaced, with no data missing in
between. To place dates on the x-axis, the as.POSIXct function is used, and the sequence is
split into 3-month intervals, as can be seen in the graph above.
Now, to test data imputation techniques, data was intentionally removed and several imputation
methods were tried, to see how each method performs on this particular heat-consumption data
set. The imputation methods used are Kalman smoothing, interpolation, and moving average.
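The imputation experiment can be sketched as follows. The report's analysis was done in R; this Python sketch covers the interpolation and moving-average methods on hypothetical data (a Kalman smoother needs a fitted state-space model, so it is omitted here).

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series with readings knocked out on purpose,
# mirroring the report's experiment.
rng = pd.date_range("2015-01-01", periods=24, freq="h")
btu = pd.Series(np.linspace(1000.0, 2300.0, 24), index=rng)
btu.iloc[[5, 6, 12]] = np.nan  # simulate missing readings

# Interpolation: fill gaps linearly between neighbouring observations.
filled_interp = btu.interpolate(method="linear")

# Moving average: fill each gap with the mean of a centred rolling window
# (min_periods=1 lets the window average over the non-missing values only).
rolling_mean = btu.rolling(window=5, center=True, min_periods=1).mean()
filled_ma = btu.fillna(rolling_mean)
```

Comparing each filled series against the values that were removed gives a direct measure of which imputation method suits this data set.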
Kalman Technique:
Interpolation Technique:
Moving Average Technique:
3.1 Data Smoothing
Data smoothing was performed using a rolling median and a rolling mean for the second week of
January 2015, and the smoothing produced by the two methods was compared, as can be seen in
the graph.
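The rolling-median versus rolling-mean comparison can be sketched like this; the week of data below is synthetic (a daily cycle plus noise), standing in for the second week of January 2015.

```python
import numpy as np
import pandas as pd

# Hypothetical noisy week of hourly data with a 24-hour cycle.
rng = pd.date_range("2015-01-08", periods=168, freq="h")
hours = np.arange(168)
noise = np.random.default_rng(1).normal(0, 100, 168)
btu = pd.Series(2000 + 500 * np.sin(2 * np.pi * hours / 24) + noise, index=rng)

smooth_mean = btu.rolling(window=7, center=True).mean()      # sensitive to spikes
smooth_median = btu.rolling(window=7, center=True).median()  # robust to spikes
```

The rolling median is the better choice when occasional extreme readings should not drag the smoothed curve; the rolling mean follows the overall level more smoothly.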
The graph shows that the data is very highly autocorrelated and is a non-stationary random
series.
Variogram
Differencing
Fig 3.8 Plot after Differencing
After differencing, the series appears to have a constant mean and constant variance over
time, as can be clearly seen in the graph.
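First differencing can be sketched as below; the trending series is hypothetical, but the operation is the same one applied to the heat data.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a linear trend; first differencing removes it.
rng = pd.date_range("2015-01-01", periods=100, freq="h")
trend = 10.0 * np.arange(100)
noise = np.random.default_rng(2).normal(0, 5, 100)
btu = pd.Series(1000 + trend + noise, index=rng)

# First difference: y_t - y_{t-1}. The result fluctuates around the
# trend slope instead of growing without bound.
diffed = btu.diff().dropna()
```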
For my data, the variance changes considerably at the start of the year and in the middle of
the year; this is heteroscedasticity.
The other transformation options available were the inverse, square-root, and reciprocal
square-root transformations.
We can infer from the graph that, after the transformation, the variance is more consistent
than before.
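The variance-stabilizing step can be sketched as follows. The report does not name the transformation it actually applied, so a log transform is assumed here, with the alternatives it mentions shown alongside; the two-level series is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical heteroscedastic series: the noise scales with the level,
# so the second half (level 1000) is far noisier than the first (level 100).
rng = np.random.default_rng(3)
level = np.repeat([100.0, 1000.0], 100)
btu = pd.Series(level * rng.lognormal(0.0, 0.2, 200))

log_btu = np.log(btu)     # assumed choice: equalises multiplicative noise
sqrt_btu = np.sqrt(btu)   # square-root alternative (milder)
inv_btu = 1.0 / btu       # inverse alternative (strongest)
```

After the log transform, the spread of the two halves is roughly equal, which is the "variance is more consistent" behaviour described above.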
Firstly, exponential smoothing is used to remove the drawback of weighting all past
observations equally: it gives more weight to recent observations than to older ones.
Fig 4.1.0 Heat Consumption data for Jan 2015 (no trend and non-stationary)
For data which is non-stationary with no trend, single exponential smoothing is used.
The main drawback of single exponential smoothing is that it lags behind any trend by a
certain amount and does not properly estimate it.
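The single exponential smoothing recursion described above can be written in a few lines (a minimal sketch, not the R routine the report used):

```python
def single_exponential_smoothing(series, alpha):
    """s_t = alpha * y_t + (1 - alpha) * s_{t-1}, seeded with the first value."""
    smoothed = [float(series[0])]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed
```

On a trending series the smoothed values stay behind the data, which is exactly the lag drawback noted above: a small alpha smooths heavily but lags more, a large alpha tracks closely but smooths little.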
Double exponential smoothing is an extension of single exponential smoothing, used for
smoothing data that has a trend.
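The extension adds a second recursion for the trend. A minimal sketch of Holt's double exponential smoothing (the trend initialisation from the first step is one common convention, not necessarily the one R uses):

```python
def double_exponential_smoothing(series, alpha, beta):
    """Holt's linear method: a level recursion plus a trend recursion."""
    level = float(series[0])
    trend = float(series[1] - series[0])  # initialise trend from the first step
    smoothed = [level]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        smoothed.append(level)
    return smoothed
```

Because the trend term is carried forward, a perfectly linear series is tracked without the lag that single exponential smoothing would show.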
Exponential smoothing decision flowchart: if the data shows a trend (Yes), use double
exponential smoothing; if not (No), use single exponential smoothing.
As my data has both the characteristics of non-stationarity and trend, I chose the double
exponential model for my data.
But I also tried to forecast and evaluate the data with single exponential smoothing, to
compare the performance.
This is the double exponential model fitted with the optimal values of alpha and beta chosen
by the Holt-Winters method: alpha = 0.1895906 and beta = 0.001607902. I have also chosen one
value above this alpha and one value below, keeping beta constant, to see whether either
model fits my data better than the one given by Holt-Winters.
Fig 5.1.1 Graph comparing alpha = 0.18 and alpha = 0.3, keeping beta constant
Plotting the graphs for two different alpha values shows that, as alpha increases, the model
moves further away from the original data and the smoothing is affected.
We subset the data to see how the parameters of the model affect it, and so get a better
interpretation of the model parameters.
From this it can be interpreted that, changing alpha to values greater or less than the
optimum while keeping beta constant, the fitted model lags by a greater amount each time we
move away from the optimal alpha.
So we can conclude that the optimal values, which best fit the original data with minimum
lag, are alpha = 0.18 and beta = 0.0016.
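One way to make this comparison numeric is to score each candidate alpha by its one-step-ahead squared error under Holt's method. The scoring choice is an assumption (the report judges fit visually by lag), and the trending, wavy series below is hypothetical:

```python
import math

def holt_one_step_sse(series, alpha, beta):
    """Sum of squared one-step-ahead errors for Holt's double exponential
    smoothing; a proxy for how far the fit lags behind the data."""
    level = float(series[0])
    trend = float(series[1] - series[0])
    sse = 0.0
    for y in series[1:]:
        forecast = level + trend       # one-step-ahead prediction
        sse += (y - forecast) ** 2
        prev_level = level
        level = alpha * y + (1 - alpha) * forecast
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return sse

# Hypothetical trending series standing in for the heat data; score the
# Holt-Winters optimum against one alpha below and one above, beta fixed.
series = [50.0 + 2.0 * t + 10.0 * math.sin(t / 3.0) for t in range(200)]
errors = {a: holt_one_step_sse(series, a, 0.0016) for a in (0.10, 0.19, 0.30)}
```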
The initial thought was to forecast the October 2015 data by giving the August and September
2015 data as input.
Fig 6.1.0 Heat consumption of August and September 2015
But at this time of year the weather changes are very unpredictable: the very low consumption
of August 2015 sees a sudden large shift in heat consumption as the climate changes from
normal weather to very cold temperatures. When I used this as the training set, I saw very
high residuals against the ground truth.
As this data is very fluctuating and random, I have chosen to select the training data based
on the season currently in progress. So, if it is fall, it would be reasonable to give the
data of June and July as the training or historic data set, based on which accurate
predictions or forecasts can be made.
Even though these months might contain potential outliers, they would not affect the forecast
as much as they
would if the data were chosen across a change of seasons, like trying to predict the heat
consumption of spring given only the historic fall data.
Keeping this in mind, I have chosen to predict the BTU/hr of March 2015, given the data of
January and February 2015.
It can be seen that the confidence intervals do not even lie within the range of the
ground-truth BTU/hr consumption.
6.2 Forecasted data of March 2015 by training the model on the previous two
months
All the measures try to explain how much the forecasted values deviate from the original
values.
R-squared measures how well the model explains the given data, but does not tell how well
the model predicts the future.
Mean Absolute Percent Error has the disadvantage that if an actual value goes to zero it
becomes infinite, which tells nothing about the forecasting performance of the model.
Root Mean Squared Error can be used when the forecasted and actual values are on the same
scale, since RMSE is scale-dependent.
As we do not have the problems mentioned above for the current data set, except that
R-squared does not tell how well the model predicts, we calculate RMSE and MAPE for the
current model.
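The two chosen error measures are simple enough to define directly (a plain sketch, not tied to the report's data):

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error: scale-dependent, in the units of the data."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((a - f) ** 2)))

def mape(actual, forecast):
    """Mean absolute percent error. Undefined when any actual value is
    zero -- the drawback noted above."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((a - f) / a)) * 100.0)
```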
Now, after selecting the model performance parameters, we forecast the data using both single
and double exponential smoothing:
First, by getting the optimal parameters automatically from the Holt-Winters method.
Second, by using the mean squared error as an evaluator to calculate the best possible
parameters.
Third, by changing the parameter values to be greater and less than the optimal values given
by the Holt-Winters method.
The chosen performance evaluators then determine the best of the above three test cases,
giving the best forecasting method with optimal parameter values.
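The second approach, an MSE-driven parameter search, can be sketched as a grid scan over alpha for single exponential smoothing; the series below is a hypothetical stand-in:

```python
import numpy as np

def ses_one_step_mse(series, alpha):
    """Mean squared one-step-ahead error of single exponential smoothing."""
    s = float(series[0])
    errors = []
    for y in series[1:]:
        errors.append((y - s) ** 2)        # previous smoothed value is the forecast
        s = alpha * y + (1 - alpha) * s    # then update with the new observation
    return float(np.mean(errors))

# Hypothetical stand-in series; scan a grid of alphas and keep the best.
rng = np.random.default_rng(4)
series = 100.0 + 10.0 * np.sin(np.arange(80) / 5.0) + rng.normal(0.0, 2.0, 80)
grid = np.arange(0.05, 1.00, 0.05)
best_alpha = min(grid, key=lambda a: ses_one_step_mse(series, a))
```

The same scan extends to double exponential smoothing by searching over (alpha, beta) pairs.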
Fig 6.2.1 Single Exponential Method, lambda = 0.1305
MAPE = 244029
MSE = 73639472095
RMSE = 271365.9
MAPE = 217.164
MSE = 75416239290
RMSE = 274620.2
MAPE = 25401
RMSE = 274620.2
MAPE = 168.4257
MSE = 74445833884
RMSE = 272847.6
MAPE = 66.90126
MSE = 91220442069
RMSE = 302027.2
From this we can see that lambda = 0.3 is the most optimal value for the model.
Table 6.4.1 Table with Errors and Forecast Measures (Single Exponential Method)
Table 6.4.2 Table with Errors and Forecast Measures (Double Exponential Method)
6.3 Analysing ACF and PACF Plots for the De-trended and De-seasonalized Data
As I can see a mixture of exponential decay and a damped sinusoid in the ACF, and the PACF
drops to 0 after lag 11, the series can be interpreted as a p-order autoregressive model,
where p might be 11 or 12.
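The sample ACF behind this reading is easy to compute by hand; the AR(1) series below is only an illustration of the exponential-decay shape (the PACF, which needs the Durbin-Levinson recursion or a library, is omitted from this sketch).

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function at lags 0..nlags."""
    x = np.asarray(x, float)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    return [float(np.dot(x[: len(x) - k], x[k:]) / denom)
            for k in range(nlags + 1)]

# AR(1) demo: its ACF decays roughly geometrically, the exponential-decay
# shape read off the plot above (the report hypothesises AR(11) or AR(12)
# for its own series).
rng = np.random.default_rng(5)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal()
rho = acf(x, 5)
```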