1|Page
Sarath Chandra Tumuluri
Contents
1. Introduction
2. Subset Data
3. Data Cleansing Operations
3.1 Data Smoothing
3.2 Data Transformations
4. Exponential Smoothing Method Decision Tree
4.1 Exponential Smoothing Decision Flowchart
5. Fitting Exponential Smoothing Model
5.1 Plots of Original and Exponential Smoothed Data
6. Forecast with Exponential Smoothing Model
6.1 Training Set Preparation
6.2 Analysing Errors in the Forecasted Data
6.3 Analysing ACF and PACF Plots for the De-trended and De-seasonalized Data
List of Figures
List of Tables
1) Table 6.4.1 Table with Errors and forecast Measures (Single Exponential Method)
2) Table 6.4.2 Table with Errors and Forecast Measures (Double Exponential Method)
1. Introduction
1) This file contains hourly heat-consumption data for the Wallace Library, measured in
BTU (a traditional unit of energy equal to about 1,055 joules), supplied as a Microsoft
Excel macro-enabled worksheet.
2) Source of data: Wallace Library heat consumption, provided by Rochester Institute of
Technology.
3) Exogenous variables: outside and inside temperatures; winter ventilation provided by
mechanical and other systems; infiltration resulting from building construction and usage;
and the heat required to raise the temperature of materials frequently brought into the
heated space from outdoors.
The series is cyclic or seasonal and non-stationary (no natural mean over time; a trend is
exhibited), with no atypical events.
2. Subset Data
It can be seen from the graph that heat consumption is higher from January through April,
which is understandable, as it is the winter season and outside temperatures drop to their lowest.
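The month and season subsets used above can be sketched with partial-date indexing. The original analysis was done in R; the series below is a hypothetical stand-in, not the actual Wallace Library readings.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly BTU series covering 2015-2016.
rng = pd.date_range("2015-01-01", "2016-12-31 23:00", freq="h")
btu = pd.Series(np.random.default_rng(0).normal(2000, 300, len(rng)), index=rng)

jan_2015 = btu.loc["2015-01"]           # subset a single month
winter = btu.loc["2015-01":"2015-04"]   # January through April 2015
jan_2016 = btu.loc["2016-01"]           # for comparison against jan_2015
```

Partial-string indexing on a DatetimeIndex makes month-level comparisons (January 2015 vs January 2016) a one-liner each.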
Comparing the data of January 2015 with January 2016 to see whether there is any correlation
between the months, it can be seen that the amount of heat consumed in 2016 is greater than
in 2015. This can be related to the exogenous variable of outside temperature, which may have
been lower in 2016, resulting in more heat consumption per hour (BTU/hr).
The data is then subset to the month of January alone, to see whether any trend or seasonality
is followed. The graph shows that heat consumption is high at the end of January 2015.
Taking a single week of January 2015 and exploring whether any trend or seasonality is shown,
we can see a sudden jump in heat consumption starting on the third day of the week, within the
first few days of that week of January 2015.
The data provided was verified and can be seen to be evenly spaced, with no data missing in
between. To place dates on the x-axis, the as.POSIXct function is used, and the sequence is
split into 3-month intervals, as can be seen in the graph above.
Now, to test data imputation techniques, data was intentionally removed and several imputation
methods were tried, to see how each method performs on this particular heat-consumption data
set. The imputation methods used are Kalman smoothing, interpolation, and moving average.
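The imputation experiment can be sketched as follows. The report's analysis was done in R; this Python sketch covers the interpolation and moving-average methods on hypothetical data (a Kalman smoother needs a fitted state-space model, so it is omitted here).

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series with readings knocked out on purpose,
# mirroring the report's experiment.
rng = pd.date_range("2015-01-01", periods=24, freq="h")
btu = pd.Series(np.linspace(1000.0, 2300.0, 24), index=rng)
btu.iloc[[5, 6, 12]] = np.nan  # simulate missing readings

# Interpolation: fill gaps linearly between neighbouring observations.
filled_interp = btu.interpolate(method="linear")

# Moving average: fill each gap with the mean of a centred rolling window
# (min_periods=1 lets the window average over the non-missing values only).
rolling_mean = btu.rolling(window=5, center=True, min_periods=1).mean()
filled_ma = btu.fillna(rolling_mean)
```

Comparing each filled series against the values that were removed gives a direct measure of which imputation method suits this data set.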
Kalman Technique:
Interpolation Technique:
Moving Average Technique:
3.1 Data Smoothing
Data smoothing was performed using a rolling median and a rolling mean for the second week of
January 2015, and the smoothing produced by the two methods was compared, as can be seen in
the graph.
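The rolling-median versus rolling-mean comparison can be sketched like this; the week of data below is synthetic (a daily cycle plus noise), standing in for the second week of January 2015.

```python
import numpy as np
import pandas as pd

# Hypothetical noisy week of hourly data with a 24-hour cycle.
rng = pd.date_range("2015-01-08", periods=168, freq="h")
hours = np.arange(168)
noise = np.random.default_rng(1).normal(0, 100, 168)
btu = pd.Series(2000 + 500 * np.sin(2 * np.pi * hours / 24) + noise, index=rng)

smooth_mean = btu.rolling(window=7, center=True).mean()      # sensitive to spikes
smooth_median = btu.rolling(window=7, center=True).median()  # robust to spikes
```

The rolling median is the better choice when occasional extreme readings should not drag the smoothed curve; the rolling mean follows the overall level more smoothly.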
The graph shows that the data is very highly autocorrelated and is a non-stationary random
series.
Variogram
Differencing
Fig 3.8 Plot after Differencing
After differencing, the series appears to have a constant mean and constant variance over
time, as can be clearly seen in the graph.
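First differencing can be sketched as below; the trending series is hypothetical, but the operation is the same one applied to the heat data.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a linear trend; first differencing removes it.
rng = pd.date_range("2015-01-01", periods=100, freq="h")
trend = 10.0 * np.arange(100)
noise = np.random.default_rng(2).normal(0, 5, 100)
btu = pd.Series(1000 + trend + noise, index=rng)

# First difference: y_t - y_{t-1}. The result fluctuates around the
# trend slope instead of growing without bound.
diffed = btu.diff().dropna()
```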
For my data, the variance changes considerably at the start of the year and in the middle of
the year; this is heteroscedasticity.
The other transformation options available were the inverse, square-root, and reciprocal
square-root transformations.
We can infer from the graph that, after the transformation, the variance is more consistent
than before.
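The variance-stabilizing step can be sketched as follows. The report does not name the transformation it actually applied, so a log transform is assumed here, with the alternatives it mentions shown alongside; the two-level series is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical heteroscedastic series: the noise scales with the level,
# so the second half (level 1000) is far noisier than the first (level 100).
rng = np.random.default_rng(3)
level = np.repeat([100.0, 1000.0], 100)
btu = pd.Series(level * rng.lognormal(0.0, 0.2, 200))

log_btu = np.log(btu)     # assumed choice: equalises multiplicative noise
sqrt_btu = np.sqrt(btu)   # square-root alternative (milder)
inv_btu = 1.0 / btu       # inverse alternative (strongest)
```

After the log transform, the spread of the two halves is roughly equal, which is the "variance is more consistent" behaviour described above.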
Firstly, exponential smoothing is used to remove the drawback of weighting all past
observations equally: it gives more weight to recent observations than to older ones.
Fig 4.1.0 Heat Consumption data for Jan 2015 (no trend and non-stationary)
For data which is non-stationary with no trend, single exponential smoothing is used.
The main drawback of single exponential smoothing is that it lags behind any trend by a
certain amount and does not properly estimate it.
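The single exponential smoothing recursion described above can be written in a few lines (a minimal sketch, not the R routine the report used):

```python
def single_exponential_smoothing(series, alpha):
    """s_t = alpha * y_t + (1 - alpha) * s_{t-1}, seeded with the first value."""
    smoothed = [float(series[0])]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed
```

On a trending series the smoothed values stay behind the data, which is exactly the lag drawback noted above: a small alpha smooths heavily but lags more, a large alpha tracks closely but smooths little.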
Double exponential smoothing is an extension of single exponential smoothing, used for
smoothing data that has a trend.
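The extension adds a second recursion for the trend. A minimal sketch of Holt's double exponential smoothing (the trend initialisation from the first step is one common convention, not necessarily the one R uses):

```python
def double_exponential_smoothing(series, alpha, beta):
    """Holt's linear method: a level recursion plus a trend recursion."""
    level = float(series[0])
    trend = float(series[1] - series[0])  # initialise trend from the first step
    smoothed = [level]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        smoothed.append(level)
    return smoothed
```

Because the trend term is carried forward, a perfectly linear series is tracked without the lag that single exponential smoothing would show.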
Exponential smoothing decision flowchart: if the data shows a trend (Yes), use double
exponential smoothing; if not (No), use single exponential smoothing.
As my data has both the characteristics of non-stationarity and trend, I chose the double
exponential model for my data.
But I also tried to forecast and evaluate the data with single exponential smoothing, to
compare the performance.
This is the double exponential model fitted with the optimal values of alpha and beta chosen
by the Holt-Winters method: alpha = 0.1895906 and beta = 0.001607902. I have also chosen one
value above this alpha and one value below, keeping beta constant, to see whether either
model fits my data better than the one given by Holt-Winters.
Fig 5.1.1 Graph comparing alpha = 0.18 and alpha = 0.3, keeping beta constant
Plotting the graphs for two different alpha values shows that, as alpha increases, the model
moves further away from the original data and the smoothing is affected.
We subset the data to see how the parameters of the model affect it, and so get a better
interpretation of the model parameters.
From this it can be interpreted that, changing alpha to values greater or less than the
optimum while keeping beta constant, the fitted model lags by a greater amount each time we
move away from the optimal alpha.
So we can conclude that the optimal values, which best fit the original data with minimum
lag, are alpha = 0.18 and beta = 0.0016.
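One way to make this comparison numeric is to score each candidate alpha by its one-step-ahead squared error under Holt's method. The scoring choice is an assumption (the report judges fit visually by lag), and the trending, wavy series below is hypothetical:

```python
import math

def holt_one_step_sse(series, alpha, beta):
    """Sum of squared one-step-ahead errors for Holt's double exponential
    smoothing; a proxy for how far the fit lags behind the data."""
    level = float(series[0])
    trend = float(series[1] - series[0])
    sse = 0.0
    for y in series[1:]:
        forecast = level + trend       # one-step-ahead prediction
        sse += (y - forecast) ** 2
        prev_level = level
        level = alpha * y + (1 - alpha) * forecast
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return sse

# Hypothetical trending series standing in for the heat data; score the
# Holt-Winters optimum against one alpha below and one above, beta fixed.
series = [50.0 + 2.0 * t + 10.0 * math.sin(t / 3.0) for t in range(200)]
errors = {a: holt_one_step_sse(series, a, 0.0016) for a in (0.10, 0.19, 0.30)}
```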
The initial thought was to forecast the October 2015 data by giving the August and September
2015 data as input.
Fig 6.1.0 Heat consumption of August and September 2015
But at this time of year the weather changes are very unpredictable: the very low consumption
of August 2015 sees a sudden large shift in heat consumption as the climate changes from
normal weather to very cold temperatures. When I used this as the training set, I saw very
high residuals against the ground truth.
As this data is very fluctuating and random, I have chosen to select the training data based
on the season currently in progress. So, if it is fall, it would be reasonable to give the
data of June and July as the training or historic data set, based on which accurate
predictions or forecasts can be made.
Even though these months might contain potential outliers, they would not affect the forecast
as much as they
would if the data were chosen across a change of seasons, like trying to predict the heat
consumption of spring given only the historic fall data.
Keeping this in mind, I have chosen to predict the BTU/hr of March 2015, given the data of
January and February 2015.
It can be seen that the confidence intervals do not even lie within the range of the
ground-truth BTU/hr consumption.
6.2 Forecasted data of March 2015 by training the model on the previous two
months
All the measures try to explain how much the forecasted values deviate from the original
values.
R-squared measures how well the model explains the given data, but does not tell how well
the model predicts the future.
Mean Absolute Percent Error has the disadvantage that if an actual value goes to zero it
becomes infinite, which tells nothing about the forecasting performance of the model.
Root Mean Squared Error can be used when the forecasted and actual values are on the same
scale, since RMSE is scale-dependent.
As we do not have the problems mentioned above for the current data set, except that
R-squared does not tell how well the model predicts, we calculate RMSE and MAPE for the
current model.
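The two chosen error measures are simple enough to define directly (a plain sketch, not tied to the report's data):

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error: scale-dependent, in the units of the data."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((a - f) ** 2)))

def mape(actual, forecast):
    """Mean absolute percent error. Undefined when any actual value is
    zero -- the drawback noted above."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((a - f) / a)) * 100.0)
```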
Now, after selecting the model performance parameters, we forecast the data using both single
and double exponential smoothing:
First, by getting the optimal parameters automatically from the Holt-Winters method.
Second, by using the mean squared error as an evaluator to calculate the best possible
parameters.
Third, by changing the parameter values to be greater and less than the optimal values given
by the Holt-Winters method.
The chosen performance evaluators then determine the best of the above three test cases,
giving the best forecasting method with optimal parameter values.
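The second approach, an MSE-driven parameter search, can be sketched as a grid scan over alpha for single exponential smoothing; the series below is a hypothetical stand-in:

```python
import numpy as np

def ses_one_step_mse(series, alpha):
    """Mean squared one-step-ahead error of single exponential smoothing."""
    s = float(series[0])
    errors = []
    for y in series[1:]:
        errors.append((y - s) ** 2)        # previous smoothed value is the forecast
        s = alpha * y + (1 - alpha) * s    # then update with the new observation
    return float(np.mean(errors))

# Hypothetical stand-in series; scan a grid of alphas and keep the best.
rng = np.random.default_rng(4)
series = 100.0 + 10.0 * np.sin(np.arange(80) / 5.0) + rng.normal(0.0, 2.0, 80)
grid = np.arange(0.05, 1.00, 0.05)
best_alpha = min(grid, key=lambda a: ses_one_step_mse(series, a))
```

The same scan extends to double exponential smoothing by searching over (alpha, beta) pairs.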
Fig 6.2.1 Single Exponential Method, lambda = 0.1305
MAPE = 244029
MSE = 73639472095
RMSE = 271365.9
MAPE = 217.164
MSE = 75416239290
RMSE = 274620.2
MAPE = 25401
RMSE = 274620.2
MAPE = 168.4257
MSE = 74445833884
RMSE = 272847.6
MAPE = 66.90126
MSE = 91220442069
RMSE = 302027.2
From this we can see that lambda = 0.3 is the most optimal value for the model.
Table 6.4.1 Table with Errors and Forecast Measures (Single Exponential Method)
Table 6.4.2 Table with Errors and Forecast Measures (Double Exponential Method)
6.3 Analysing ACF and PACF Plots for the De-trended and De-seasonalized Data
As I can see a mixture of exponential decay and a damped sinusoid in the ACF, and the PACF
drops to 0 after lag 11, the series can be interpreted as a p-order autoregressive model,
where p might be 11 or 12.
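The sample ACF behind this reading is easy to compute by hand; the AR(1) series below is only an illustration of the exponential-decay shape (the PACF, which needs the Durbin-Levinson recursion or a library, is omitted from this sketch).

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function at lags 0..nlags."""
    x = np.asarray(x, float)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    return [float(np.dot(x[: len(x) - k], x[k:]) / denom)
            for k in range(nlags + 1)]

# AR(1) demo: its ACF decays roughly geometrically, the exponential-decay
# shape read off the plot above (the report hypothesises AR(11) or AR(12)
# for its own series).
rng = np.random.default_rng(5)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal()
rho = acf(x, 5)
```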