STOCK PRICES
November 26, 2013
TABLE OF CONTENTS
I. Introduction
II.
III.
IV.
V.
VI. Conclusion
I. INTRODUCTION
Predicting future stock prices has been a compelling topic for quite some time, as having
an accurate vision of the stock market's future performance can help traders invest more suitably
to maximize financial profit. Perhaps the most common method used to perform stock price
forecasting is time series analysis. As we learned in class, time series analysis is a type of
statistical study on a series of sequential data points over a period of time, where the data points
are usually measured at uniform time intervals. Time series forecasting, then, takes the analysis
from the time series data and attempts to predict what the data will be in the near future, based on
what it has been in the past. This concept is especially important in the field of quantitative
finance because traders want to make wise moves at the right times to maximize their own
welfare. However, there are many factors that influence the fluctuation of the stock market, so
creating an accurate forecast based on time series analysis alone is challenging.
For this project, our team chose to model and forecast the stock prices of three
companies: Google, Apple, and Yahoo. We selected these specific companies because they all
operate in a similar branch of business: information technology. As most people know,
Google Inc. is a corporation that specializes in a variety of Internet-related products and services.
Similarly, Yahoo! Inc. is an Internet corporation that is globally known for its impressive range
of services. Finally, Apple Inc. is a corporation that designs, manufactures, and sells computer
software and personal computers. These three companies are very popular in the United States
today, and there is a large amount of information about them available online.
Our team's main goal for this project was to find good models for the stock prices of the
three companies described above to predict the future stock price. However, we also wanted to
implement a couple of new approaches to modeling that were not explicitly discussed in class
throughout the semester. As you will see, our results indicate that there is not one method that
gives the best result for each and every time series.
We used the first 252 observations from the entire dataset of each company to form the training set for each company. Similarly, we used the final 252 observations
from the entire dataset of each company to construct the testing set for each company.
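The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_train_test` and the placeholder values in `closes` are assumptions, not the team's actual code or data.

```python
def split_train_test(prices, n_train=252, n_test=252):
    """Return (training, testing): the first n_train and the last n_test observations."""
    if len(prices) < max(n_train, n_test):
        raise ValueError("series is shorter than the requested window")
    return list(prices[:n_train]), list(prices[-n_test:])


# Placeholder series of 504 values standing in for two years of closing prices.
closes = [float(i) for i in range(504)]
train, test = split_train_test(closes)
```

With 504 observations the two windows do not overlap, which matches the date ranges given in the text.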
For Google, Apple, and Yahoo, the entire dataset for each company contains the open,
high, low, close, and adjusted close stock prices on every trading day from October 24, 2011 to
October 24, 2013. The datasets also contain trading volume values on every trading day. To
achieve consistency, we used the close prices as a general measure of the stock price. By
definition, the closing price of a stock is the final price at which that stock is traded on a given
trading day. It represents the most up-to-date valuation of the stock until trading begins again on
the next trading day. In other words, the stock prices we used in our analysis represent the
closing prices.
Since trading days are never on weekends, the first 252 observations happen to run from
October 24, 2011 to October 22, 2012 and the final 252 prices run from October 23, 2012 to
October 24, 2013 for each company. As the next section shows, quite a bit of analysis was done
on each company's training set. In addition, the models obtained for each company are quite
different, reflecting the fact that our group chose to evenly divide the work among ourselves.
That is, we decided to split the work for this project among the group by each company. Our
reasoning for this is that we wanted a different mind working on each company so all the
results would not be repetitive. We were hoping to obtain a fresh perspective on the appropriate
model for each company, and we were also trying to be efficient when conducting our analysis.
[Figure: Google stock price over the full dataset; x-axis: Time (Days)]
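| Reduced model RSS | Full model RSS | N - r | s | F | FINV(5%, s, N - r) |
|---|---|---|---|---|---|
| RSS(2,1) = 22563 | RSS(4,3) = 21510 | 245 | 4 | 2.998430962 | 2.40848837 |
| RSS(4,3) = 21510 | RSS(6,5) = 21028 | 241 | 4 | 1.381039566 | 2.409100382 |
| RSS(6,5) = 21028 | RSS(8,7) = 20824 | 237 | 4 | 0.580436035 | 2.409733235 |
| RSS(4,3) = 21510 | RSS(5,4) = 21249 | 243 | 2 | 1.492376112 | 3.032969422 |
| RSS(3,2) = 22561 | RSS(4,3) = 21510 | 245 | 2 | 5.985471874 | 3.032662958 |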
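The F values in the table can be reproduced from the residual sums of squares. The sketch below assumes the usual model-comparison statistic F = ((A1 - A0) / s) / (A0 / (N - r)), where A1 and A0 are the RSS of the smaller and larger model; the function name `f_statistic` is hypothetical, and the critical value is taken from the table rather than computed.

```python
def f_statistic(rss_reduced, rss_full, s, dof):
    """F = ((A1 - A0) / s) / (A0 / dof) for comparing nested ARMA models."""
    return ((rss_reduced - rss_full) / s) / (rss_full / dof)


# ARMA(2,1) versus ARMA(4,3) for Google, using the values from the table.
f_21_vs_43 = f_statistic(22563, 21510, s=4, dof=245)

# The larger model is a significant improvement when F exceeds the critical value.
critical_value = 2.40848837  # FINV(5%, 4, 245) as reported in the table
arma43_is_better = f_21_vs_43 > critical_value
```

Here the statistic is about 2.998, which exceeds the critical value, so ARMA(4,3) is preferred over ARMA(2,1) in this comparison.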
These are the lambda values and the actual model we obtained:
$\lambda_1 = -0.9233$
$\lambda_2 = 1.0110 + 0.0472i$
$\lambda_3 = 1.0110 - 0.0472i$
$\lambda_4 = 0.9498$
[Figure: autocorrelations P(k) of the residuals versus Lag (k)]
The Portmanteau test resulted in a value of Q = 8.33, whereas the critical value is 30.14. Since Q is less than the critical value, our team concluded that the $a_t$'s are uncorrelated. Moreover, the Bartlett Band test confirms this result: notice in the above graph that the autocorrelations of the $a_t$'s are less in absolute value than $2/\sqrt{252} \approx 0.126$. Therefore, ARMA(4, 3) is the adequate model.
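Both adequacy checks can be sketched as follows. The report does not state which form of the Portmanteau statistic was used, so this sketch assumes the Box-Pierce form Q = N * sum of squared residual autocorrelations; the function names are hypothetical.

```python
import math


def autocorrelations(residuals, max_lag=20):
    """Sample autocorrelations of the residuals for lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def portmanteau_q(residuals, max_lag=20):
    """Box-Pierce statistic (assumed form): N times the sum of squared autocorrelations."""
    rho = autocorrelations(residuals, max_lag)
    return len(residuals) * sum(r * r for r in rho)


def bartlett_band(n):
    """Approximate 95% band for the autocorrelations of white noise."""
    return 2.0 / math.sqrt(n)


def outside_band(residuals, max_lag=20):
    """Lags whose autocorrelation falls outside the Bartlett band."""
    band = bartlett_band(len(residuals))
    rho = autocorrelations(residuals, max_lag)
    return [k + 1 for k, r in enumerate(rho) if abs(r) > band]
```

A model passes when Q is below the chi-square critical value and `outside_band` returns an empty list.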
The mean squared error (MSE) is 86.73 for the training part and 374.68 for the testing part. This may imply that the model overfits the data. Also, as seen above, there are lambda values with absolute value greater than one, so this model is unstable and non-stationary. If even a single lambda value has an absolute value greater than one, the system response can exceed any bound (that is, explode, given sufficient time). As a result, this non-stationarity needs to be eliminated. The complex roots are very close to one; therefore, the $(1 - 2B + B^2)$ seasonal operator was applied and a parsimonious model was calculated. We obtained the following model:
$(1 - 2B + B^2)(1 + 0.3485B - 0.5651B^2) X_t = (1 - 0.5671B - 0.9258B^2 + 0.575B^3) a_t$
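To see what the operator does, note that $(1 - 2B + B^2) X_t = X_t - 2X_{t-1} + X_{t-2}$. The sketch below applies it to a series and expands the left-hand side of the model into a single polynomial in B; the helper names are hypothetical and the code is illustrative only.

```python
def poly_multiply(p, q):
    """Multiply two polynomials in B given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def second_difference(x):
    """Apply (1 - 2B + B^2): x[t] - 2*x[t-1] + x[t-2] for t >= 2."""
    return [x[t] - 2 * x[t - 1] + x[t - 2] for t in range(2, len(x))]


difference_op = [1.0, -2.0, 1.0]      # 1 - 2B + B^2
ar_part = [1.0, 0.3485, -0.5651]      # 1 + 0.3485B - 0.5651B^2
full_ar = poly_multiply(difference_op, ar_part)
```

The expanded polynomial is $1 - 1.6515B - 0.2621B^2 + 1.4787B^3 - 0.5651B^4$, which is the autoregressive side of the model written without the factored operator.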
After eliminating non-stationarity, the new model gave us a better MSE for the testing
portion of the data; the MSE is 96.67 for the training part and 150.44 for the testing part. This is
the graph of the prediction for the testing part using ARIMA(2, 2, 3):
[Figure: Prediction versus Observed for the testing part using ARIMA(2, 2, 3); x-axis: Time (Days)]
On the other hand, when using the ARMA(n, n - 1) modeling strategy, we selected AR(1)
as the adequate model:
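| Reduced model RSS | Full model RSS | N - r | s | F | FINV(5%, s, N - r) |
|---|---|---|---|---|---|
| RSS(1,0) = 22932 | RSS(2,1) = 22563 | 249 | 2 | 2.036098923 | 3.032064916 |
| RSS(1,0) = 22932 | RSS(1,1) = 22900 | 250 | 1 | 0.349344978 | 3.878923701 |

$X_t - 0.9825 X_{t-1} = a_t$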
Furthermore, we found AR(1) to be adequate after applying the Portmanteau and the Bartlett
Band tests. The MSE is 91.36 for the training part and 162.61 for the testing part. Notice that
the MSE value for the testing part is higher than the MSE value we obtained for the
ARIMA(2, 2, 3) model.
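The MSE figures quoted for the AR(1) model come from one-step-ahead predictions. A minimal sketch, assuming the series has already been centralized and that each prediction uses the previous observed value; the function names and the toy series are hypothetical.

```python
def ar1_one_step_predictions(x, phi):
    """Predict x[t] as phi * x[t-1] for t = 1 .. len(x) - 1."""
    return [phi * x[t - 1] for t in range(1, len(x))]


def mean_squared_error(actual, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy centralized series (not real data) and the AR(1) coefficient from the text.
x = [2.0, 1.5, 1.9, 1.2]
predictions = ar1_one_step_predictions(x, phi=0.9825)
mse = mean_squared_error(x[1:], predictions)
```

The first observation has no prediction, so the MSE is computed over `x[1:]`.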
Another Approach
[Figure: Google stock price with fitted Trend; x-axis: Time (Days)]
The models discussed so far are based on the assumption that the mean and covariance are
independent of the time origin; this assumption implies that the mean is constant and the
autocovariance depends only on the lag. If we look at the graph above that shows
Google's stock prices, it is evident there is a trend: the behavior of Google's stock price
depends on the time origin. At this point, our team first tried to remove this non-stationary trend
and model the remaining data.
Our group decomposed the series into two parts. The first part represents the non-stationary trend by a deterministic function that depends on the time origin. The second part represents stochastic behavior that can be modeled using ARMA. To model the deterministic part, we applied linear regression formulae (the least squares estimation (LSE) method) to the centralized series of the training part and estimated the parameters $b_0$ and $b_1$:
$Y_t = b_0 + b_1 t + e_t$
$b_0 = -47.98$
$b_1 = 0.379$
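The least squares estimates of $b_0$ and $b_1$ have a closed form. The sketch below assumes the time index runs t = 1, 2, ..., n; the function name is hypothetical, and the check uses a synthetic straight line built from the reported coefficients rather than the real stock data.

```python
def fit_linear_trend(y):
    """Least squares fit of y[t] = b0 + b1 * t for t = 1 .. n. Returns (b0, b1)."""
    n = len(y)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return b0, b1


def detrend(y, b0, b1):
    """Residuals e[t] = y[t] - (b0 + b1 * t)."""
    return [yi - (b0 + b1 * ti) for ti, yi in enumerate(y, start=1)]


# Synthetic line using the reported coefficients, only to check the fit.
line = [-47.98 + 0.379 * t for t in range(1, 253)]
b0, b1 = fit_linear_trend(line)
residuals = detrend(line, b0, b1)
```

On real data the residuals returned by `detrend` are the series that is then modeled with ARMA.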
[Figure: Residuals after removing the linear trend; x-axis: Time (Days)]
After removing the deterministic trend, the residuals now have a constant zero mean. We
can model this stochastic part using ARMA models. Using the ARMA(2n, 2n - 1) modeling
strategy, we found AR(1) to be the adequate model.
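$X_t = 0.9727 X_{t-1} + a_t$

| Reduced model RSS | Full model RSS | N - r | s | F | FINV(5%, s, N - r) |
|---|---|---|---|---|---|
| RSS(2,1) = 22470 | RSS(4,3) = 21972 | 245 | 4 | 1.38824413 | 2.40848837 |
| RSS(1,0) = 22777 | RSS(2,1) = 22470 | 249 | 2 | 1.70100134 | 3.032064916 |
| RSS(1,0) = 22777 | RSS(1,1) = 22734 | 251 | 1 | 0.47475147 | 3.878773587 |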
The Portmanteau test resulted in a value of Q = 9.184, whereas the critical value was 30.14. Since the Q value is lower than the threshold value, it can be concluded that the $a_t$'s are uncorrelated. Therefore, AR(1) is the adequate model.
[Figure: autocorrelations P(k) of the residuals versus Lag (k)]
Finally, combining the deterministic and stochastic parts, the complete model can be adopted for
the Google stock prices as:
$Y_t = b_0 + b_1 t + X_t$
$Y_t = -47.98 + 0.379t + 0.9727 X_{t-1} + a_t$
where $X_t$ is the stationary part that follows AR(1).
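A one-step prediction from the combined model can be sketched as below. It assumes the stochastic part at time t - 1 is recovered as the observed value minus the trend at t - 1; the function name `predict_next` and the example inputs are hypothetical.

```python
def predict_next(y_prev, t, b0, b1, phi):
    """One-step prediction of Y_t from the observed Y_{t-1}.

    Trend part: b0 + b1 * t. Stochastic part: phi * X_{t-1},
    where X_{t-1} = Y_{t-1} - (b0 + b1 * (t - 1)).
    """
    x_prev = y_prev - (b0 + b1 * (t - 1))
    return b0 + b1 * t + phi * x_prev


# Coefficients reported in the text for the Google series.
B0, B1, PHI = -47.98, 0.379, 0.9727

# Example: the previous value sits 10 units above the trend line at t - 1 = 9.
y_prev = (B0 + B1 * 9) + 10.0
forecast = predict_next(y_prev, t=10, b0=B0, b1=B1, phi=PHI)
```

The forecast equals the trend value at t = 10 plus 0.9727 times the previous deviation, so a series above the trend is predicted to stay slightly above it.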
[Figure: Prediction, Observed, and Trend; x-axis: Time (Days)]
The mean squared error for the training set is 90.74, while the mean squared error for the
testing set is 151.8. As seen in the figure below, the trend line does not fit the series of the
testing part well and, consequently, prediction does not perform well enough for the testing part.
If the Google stock price data series is examined carefully, it can be seen that the stock
prices have a tendency to increase once they start increasing as seen in the rounded rectangles
above. From this realization, we tried removing the trend depicted by the green line in the figure
above and modeled the remaining residuals again.
$Y_t = b_0 + b_1 t + e_t$
$b_0 = -56.71$
$b_1 = 3.01$
After removing the deterministic trend, the stochastic part is modeled using the ARMA(2n, 2n - 1) modeling strategy. We selected ARMA(2, 1) as the adequate model and the Portmanteau and Bartlett Band tests confirmed its adequacy. Here is the model for the residuals:
$X_t = 0.255 X_{t-1} + 0.754 X_{t-2} + a_t + 0.8078 a_{t-1}$
$Y_t = b_0 + b_1 t + e_t$
$b_0 = -75.9$
$b_1 = 0.6$
[Figure: Prediction, Observed, and Trend; x-axis: Time (Days)]
Then, AR(1) was selected as the best model for the remaining stochastic part after removing the trend. Again, we confirmed the adequacy of this model by conducting both the Portmanteau and Bartlett Band tests. The complete model is:
$Y_t = -75.9 + 0.6t + 0.3372 X_{t-1} + a_t$
We also fitted an exponential trend to the data:
$Y_t = 553.331 \cdot (1.00101^t) + e_t$
The residuals that remained after removing the trend above from the actual data were centralized
and fitted to a model using the ARMA(2n, 2n - 1) modeling strategy. AR(1) was selected as the
adequate model and the Portmanteau and Bartlett Band tests approved its adequacy once more.
This is the model for the residuals:
$X_t = 0.9757 X_{t-1} + a_t$
Also, this is the complete model:
$Y_t = 553.331 \cdot (1.00101^t) + 0.9757 X_{t-1} + a_t$
The mean squared error for the training set is 90.93, while the mean squared error for the test set
is 137.58.
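The report does not say how the exponential trend was estimated. One common way, shown here purely as an assumption, is to fit a straight line to the logarithm of the series and transform back; the function name is hypothetical, and the check uses a synthetic curve built from the reported coefficients.

```python
import math


def fit_exponential_trend(y):
    """Fit y[t] ~ a * g**t (t = 1 .. n) by least squares on ln(y). Returns (a, g)."""
    n = len(y)
    t = list(range(1, n + 1))
    log_y = [math.log(v) for v in y]
    t_mean = sum(t) / n
    l_mean = sum(log_y) / n
    slope = (sum((ti - t_mean) * (li - l_mean) for ti, li in zip(t, log_y))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = l_mean - slope * t_mean
    return math.exp(intercept), math.exp(slope)


# Synthetic curve using the reported coefficients, only to check the fit.
curve = [553.331 * 1.00101 ** t for t in range(1, 253)]
a, g = fit_exponential_trend(curve)
```

Fitting in log space minimizes relative rather than absolute errors, so on noisy data the result can differ from a direct nonlinear least squares fit.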
[Figure: Trend, Prediction, and Observed for the testing part; x-axis: Time (Days)]
| Model | MSE for Training Part | MSE for Testing Part |
|---|---|---|
| ARMA(4,3) | 86.73 | 374.68 |
| ARIMA(2,2,3) | 96.67 | 150.44 |
| AR(1) | 91.36 | 162.61 |
| Deterministic Trend and AR(1) ($Y_t = -47.98 + 0.379t + 0.9727 X_{t-1} + a_t$) | 90.74 | 151.8 |
| Deterministic Trend and ARMA(2,1) ($Y_t = -56.71 + 3.01t + 0.255 X_{t-1} + 0.754 X_{t-2} + a_t + 0.8078 a_{t-1}$) | 93.4 | 153.23 |
| Deterministic Trend and AR(1) ($Y_t = -75.9 + 0.6t + 0.3372 X_{t-1} + a_t$) | 90.52 | 140.89 |
| Exponential Trend and AR(1) ($Y_t = 553.331 \cdot (1.00101^t) + 0.9757 X_{t-1} + a_t$) | 90.93 | 137.58 |
Based on these results, it can be concluded that the model with exponential trend fits the data
best since it gives the lowest MSE value for the testing portion.
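| Reduced model RSS | Full model RSS | N - r | F | FINV(5%, s, N - r) |
|---|---|---|---|---|
| RSS(1,0) = 21701 | RSS(2,1) = 21605 | 249 | 0.55 | 3.03 |
| RSS(2,1) = 21605 | RSS(4,3) = 20544 | 245 | 3.16 | 2.41 |
| RSS(4,3) = 20544 | RSS(6,5) = 19641 | 241 | 2.77 | 2.41 |
| RSS(6,5) = 19641 | RSS(8,7) = 18897 | 237 | 2.33 | 2.41 |
| RSS(6,5) = 19641 | RSS(7,6) = 18825 | 239 | 5.18 | 3.03 |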
Comparing each consecutive model using the F-test, the ARMA(2, 1) model is not a significant improvement over the AR(1) model (F = 0.55 is less than the critical value of 3.03), and thus the AR(1) model was adopted. This model is:
$X_t - 0.9933 X_{t-1} = a_t$
The Portmanteau test resulted in a Q value of 34.34, while the threshold in this case is 30.14. Since Q is greater than this threshold, it can be concluded that the $a_t$'s are correlated.
Furthermore, the Bartlett band with a time lag of 20 shows an autocorrelation that is too high for
a time lag of 10. Based on this observation, the AR(1) model does not fit the data well; hence,
the ARMA(2n, 2n - 1) modeling strategy was continued. This finally yielded an ARMA(7, 6)
model. This model is:
The Portmanteau test resulted in a Q value of 12.81, while the threshold in this case is 14.07. Since Q is less than this threshold, it can be concluded that the $a_t$'s are uncorrelated. The Bartlett band with time lag 20 confirms this conclusion, as all time lags have an autocorrelation less than $2/\sqrt{252} \approx 0.126$. The ARMA(7, 6) model fits the data well.
Next, we calculated the values of lambda. There are seven total roots:
| Lambda | Absolute value of Lambda | Period |
|---|---|---|
| -0.8210 ± 0.4919i | 0.9571 | 2.48 |
| 0.0449 ± 0.9414i | 0.9425 | 4.12 |
| 0.9767 | 0.9767 | |
| 0.8990 ± 0.0565i | 0.9007 | 13.86 |
One of these values is close to one, so a stochastic constant trend exists. The three pairs of
complex roots are each close to one in absolute value, which represents seasonality with periods
of 2.48, 4.12, and 13.86 respectively.
The obtained model was evaluated by calculating one-step ahead predictions for all the
data in the test set using the model we built based on the training set. The predictions and actual
stock prices of the second year are pictured below.
The mean squared error for the training set was 75.00, while the mean squared error for the test
set was much higher: 120.23. This might imply that the model overfits the data.
Our last effort was to remove the trend and seasonality from the data. Using the roots we
obtained earlier, we formed the following time series model:
Hence, $w_t$ is an MA(6) model, and by fitting this model using MATLAB, the following
(that is, the testing portion of the data) are pictured below.
The mean squared error (MSE) for the training set was 91.39, which is higher than the MSE of 75.00 for the ARMA(7, 6) model obtained earlier. However, the MSE for the test set, 109.18, was only slightly higher than for the training set and is in fact less than the test MSE of 120.23 for the previous model. Consequently, it seems that the ARMA(7, 6) model overfits the data, especially compared to the MA(6) model. The MA(6) model does not seem to overfit the data, since the MSE of the test set is only slightly higher than the MSE of the training set. Based on these results, it can be concluded that the best model is the MA(6) model applied to the data where trend and seasonality are removed.
The first 252 data points correspond to the training portion, wherein the data is used to fit a
forecast model. The latter 252 data points correspond to the testing portion, wherein the model is
used to forecast actual stock prices.
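Table 1

| RSS (reduced model) | ARMA Model | RSS | s | N - r | F | F Crit. (0.05, s, N - r) |
|---|---|---|---|---|---|---|
| 11.76 | (4,3) | 10.91 | 4 | 245 | 4.79 | 2.41 |
| 10.91 | (6,5) | 10.08 | 4 | 241 | 4.99 | 2.41 |
| 10.08 | (8,7) | 9.03 | 4 | 237 | 6.89 | 2.41 |
| 9.03 | (10,9) | 9.58 | 4 | 233 | -3.35 | 2.41 |
| 9.03 | (9,8) | 8.49 | 2 | 235 | 7.41 | 3.03 |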
As Table 1 shows, we decided to make one further comparison between the ARMA(8, 7) and ARMA(9, 8) models and ultimately chose the latter. To check the validity of the model, our team conducted the Bartlett Band test at $\alpha = 0.05$. However, the correlation at lag 15 lies outside of the band, so we concluded that the model is inadequate.
Therefore, we conducted the same test with ARMA(8, 7), but the correlation at lag 12 also lies outside of the band. Instead of trying the ARMA(6, 5) model, we considered the ARMA(7, 6) model, given that the ARMA(7, 6) model (with respect to the ARMA(6, 5) model) has a significant F value of 6.24, which is greater than the threshold of 3.03. The Bartlett Band test is illustrated in Exhibit 2 below:
[Exhibit 2: Bartlett Band; Correlations versus Lag]
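| Real | Imaginary | r | 2 × Real | Angle (rad) | Period |
|---|---|---|---|---|---|
| -0.781 | 0.497 | 0.925 | -1.561 | 2.575 | 2.440 |
| -0.781 | -0.497 | 0.925 | -1.561 | 2.575 | 2.440 |
| -0.099 | 0.890 | 0.896 | -0.198 | 1.682 | 3.736 |
| -0.099 | -0.890 | 0.896 | -0.198 | 1.682 | 3.736 |
| 0.856 | 0.465 | 0.974 | 1.711 | 0.498 | 12.624 |
| 0.856 | -0.465 | 0.974 | 1.711 | 0.498 | 12.624 |
| 0.825 | 0.000 | 0.825 | | | |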
The periodicities above show the periodic nature of the model, with the largest
period being 12.624. Smaller periods of 3.736 and 2.440 are also present in the model.
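The modulus and period of a complex root follow from its polar form: the modulus is the absolute value, and the period is $2\pi$ divided by the angle of the root. A short sketch with a hypothetical function name, checked against the first root listed for this model:

```python
import cmath
import math


def root_modulus_and_period(root):
    """Return (modulus, period) for a characteristic root.

    The period is 2*pi / |angle|; a positive real root has no oscillation,
    so its period is reported as None.
    """
    modulus = abs(root)
    angle = abs(cmath.phase(root))
    period = 2 * math.pi / angle if angle > 0 else None
    return modulus, period


r, period = root_modulus_and_period(complex(-0.781, 0.497))
real_r, real_period = root_modulus_and_period(complex(0.825, 0.0))
```

For the root -0.781 + 0.497i this gives a modulus of about 0.925 and a period of about 2.44 days, in agreement with the reported values.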
Results
The model replicated the actual observations during the first 252 days quite accurately with an
MSE of 0.038 (for training data) as shown in Exhibit 3 below:
[Exhibit 3: Actual versus model for the training data (days 0 to 250); MSE = 0.038]
[Figure: Actual versus model for the testing data (days 253 to 503); MSE = 2.209]
distributed with zero mean and a standard deviation of one. As a result, $E[dB] = 0$ and $E[dB^2] = dt$.
Brownian motion has the following properties:
1) Continuity: B(t) is a continuous-time process.
2) Markov property: B(t) only depends on the previous value.
3) Martingale property: $E(B_{n+1} \mid B_1, \ldots, B_n) = B_n$.
The stochastic behavior of a stock price $S_t$ follows the geometric Brownian motion process and can be written as $dS_t = \mu S_t \, dt + \sigma S_t \, dB_t$. The solution to this stochastic differential equation can be found by applying the famous Itô formula. It follows that $S_t = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right) t + \sigma B_t\right)$.
The stock volatility $\sigma$ measures the stability of the stock price and it can be computed by the following equations:
$\sigma = \sqrt{\frac{1}{n-1} \sum_{t=1}^{n} (u_t - \bar{u})^2}$
$u_t = \ln\left(\frac{S_t}{S_{t-1}}\right)$
According to the previous results and assumptions, the expected value $E(S_t)$ of the stock price at future time $t$ is given by $E(S_t) = S_0 \exp(y t)$, where the rate $y$ depends on the forecasting approach. In the first approach (denoted by "Forecast 1" in the graphs that complement this section), we forecasted future stock prices by using n-step ahead prediction and found that the prediction was satisfactory up to 20 days ahead. After that point, our model did not capture the actual stock price movements. For our second approach (denoted by "Forecast 2" in the associated graphs), we proposed a different way to update our forecast so that we could capture the change in the volatility of the stock market. The volatility in general is not constant and changes due to many factors. We updated the stock volatility plus the rate of return $y$ of the stock price. The updating process started by observing the oscillation in the stock price movement in the testing data and measuring its $y_t$ value. Then, we predicted the one step-ahead stock price and periodically updated it with the real value. The forecasted data from this approach was better and has a time series plot somewhat similar to the actual data.
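The volatility estimate from log returns, and a rolling version that lets it change over time, can be sketched as follows. The window-based update is an assumption about how "updating the volatility" might be done, not the team's documented procedure, and the function names are hypothetical.

```python
import math


def log_returns(prices):
    """u[t] = ln(S[t] / S[t-1]) for consecutive prices."""
    return [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]


def volatility(prices):
    """Sample standard deviation of the log returns (n - 1 in the denominator)."""
    u = log_returns(prices)
    n = len(u)
    u_bar = sum(u) / n
    return math.sqrt(sum((x - u_bar) ** 2 for x in u) / (n - 1))


def rolling_volatility(prices, window):
    """Volatility re-estimated from the most recent `window` returns at each step."""
    return [volatility(prices[i - window:i + 1])
            for i in range(window, len(prices))]


# Toy prices (not real data): steady 10% growth, then a drop.
prices = [100.0, 110.0, 121.0, 133.1, 120.0]
vol_all = volatility(prices)
vol_steps = rolling_volatility(prices, window=2)
```

With steady growth the log returns are identical and the estimated volatility is zero; the drop at the end raises it.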
Finally, our group used MATLAB to implement the Brownian motion forecasting
algorithm, and we then computed the future stock prices of the three companies. Brownian
motion produced more than 1,000 random walk paths of the stock price movements, and one of them was selected randomly and compared with the testing data. We observed that our Brownian motion model is an accurate predictor if the prediction step is less than a month. Brownian motion outcomes deviate slightly from the testing data when the prediction step is more than a month, and that deviation could come from the volatility of the stock price or the market rate of return. The Brownian motion generated different paths, and the selected path is not guaranteed to be the perfect one. The future path of the stock price should be selected with care and updated with any information available about the stock price behavior, similar to what was done in the first two approaches.
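The team's implementation was in MATLAB and is not reproduced in the report. The Python sketch below shows one way to generate geometric Brownian motion paths from the closed-form solution and pick one at random; the drift, volatility, and starting price are illustrative values, not estimates from the data, and the function name is hypothetical.

```python
import math
import random


def simulate_gbm_path(s0, mu, sigma, n_steps, dt=1.0, rng=None):
    """Simulate one path using S[t+dt] = S[t] * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z)."""
    rng = rng or random.Random()
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # standard normal increment
        step = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(step))
    return path


rng = random.Random(0)  # fixed seed so the run is repeatable
paths = [simulate_gbm_path(100.0, 0.0005, 0.02, 252, rng=rng) for _ in range(1000)]
chosen = rng.choice(paths)  # one randomly selected path, as described above
```

Because a single random path is arbitrary, averaging across paths or comparing several of them against the testing data would give a more stable picture than one draw.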
[Figures: Real, Forecast 1, Forecast 2, and Forecast 3 stock prices over the testing period for Google, Apple, and Yahoo]
VI. CONCLUSION
Based on our results, there was not much volatility for Yahoo, whereas with Google and
Apple, there was much more oscillation. Based on the Brownian motion analysis, it was easier
to capture the stock fluctuations for Yahoo than for Google. The variance for Apple is very high;
one reason for this variability might be that Apple did something very different as a company
in 2011 compared to its operations in 2012. Overall, our results for the three stocks are very
different. We have very different MSEs for the three individual companies, and we noted that
Yahoo's stock price is much easier to predict. In conclusion, our group enjoyed
working on this project together and having the opportunity to implement new procedures such
as deterministic trend, exponential trend, and Brownian motion to help us achieve our objective
of finding the best models for each company's stock price.