
MODELING & FORECASTING STOCK PRICES
November 26, 2013

IOE 565: Team #6


Abdullah Alshelahi
Caner Arslan
Wouter Hielckert
Colin Jones
Steve Kim

TABLE OF CONTENTS
I. Introduction
II. How We Collected Our Data
III. System Analysis: Procedures & Results
    Part A: Forecasting Google Stock Prices
    Part B: Forecasting Apple Stock Prices
    Part C: Forecasting Yahoo Stock Prices
IV. Another Approach: Forecasting Stock Prices Using Brownian Motion
V. Discussion & Significance of the Results
VI. Conclusion
VII. References

I. INTRODUCTION
Predicting future stock prices has been a compelling topic for quite some time, as having
an accurate vision of the stock market's future performance can help traders invest more suitably
to maximize financial profit. Perhaps the most common method used to perform stock price
forecasting is time series analysis. As we learned in class, time series analysis is a type of
statistical study on a series of sequential data points over a period of time, where the data points
are usually measured at uniform time intervals. Time series forecasting, then, takes the analysis
from the time series data and attempts to predict what the data will be in the near future, based on
what it has been in the past. This concept is especially important in the field of quantitative
finance because traders want to make wise moves at the right times to maximize their own
welfare. However, there are many factors that influence the fluctuation of the stock market, so
creating an accurate forecast based on time series analysis alone is challenging.
For this project, our team chose to model and forecast the stock prices of three
companies: Google, Apple, and Yahoo. We selected these specific companies because they all
operate in a similar branch of business: information technology. As most people know,
Google Inc. is a corporation that specializes in a variety of Internet-related products and services.
Similarly, Yahoo! Inc. is an Internet corporation that is globally known for its impressive range
of services. Finally, Apple Inc. is a corporation that designs, manufactures, and sells computer
software and personal computers. These three companies are very popular in the United States
today, and there is a large amount of information about them available online.
Our team's main goal for this project was to find good models for the stock prices of the
three companies described above to predict the future stock price. However, we also wanted to
implement a couple of new approaches to modeling that were not explicitly discussed in class
throughout the semester. As you will see, our results indicate that there is not one method that
gives the best result for each and every time series.

II. HOW WE COLLECTED OUR DATA


One of the first things our team needed to do was figure out how we were going to collect
the stock price data for Google, Apple, and Yahoo. Also, we had to agree on a fixed time
horizon we would consider when performing our analysis. After some discussion, we agreed to
analyze a two-year time horizon of daily stock prices from October 24, 2011 to October 24, 2013
for each of the three companies.
Once we finished that step, we knew we had to split the data for each company into parts.
Based on the discussions we had in class during the early stages of this project, we learned that
data partitioning is a necessary step in many predictive exercises. The basic idea is to separate
the entire dataset into a training set and a testing (or validation) set. Why do we need to split the
data into two parts? As we learned in class, we partition the data because we want to ensure that
our model does a good job of predicting the seen data. If our model fulfills our expectations,
then we have some level of confidence about the predictive power of the model when we are
presented with the unseen data. Therefore, to partition the stock price data for each of the
three companies, our group (arbitrarily) decided to choose the first half of the samples for the
training set and the remaining, more recent 50 percent of the samples for the testing set.
Next, our team extracted daily stock prices for Google, Apple, and Yahoo from the
Yahoo finance website: http://finance.yahoo.com/. For each company, the entire dataset consists
of 504 observations (prices). As discussed in the introduction, our team's main goal was to find
the best model possible for predicting each company's stock price. So, based on the
training-testing method described above, we used the first 252 observations from each company's whole
dataset to form the training set for each company. Similarly, we used the final 252 observations
from the entire dataset of each company to construct the testing set for each company.
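
The chronological split described above can be sketched in a few lines of Python. This is only an illustration: the function name `chronological_split` and the placeholder price series are assumptions made here, not part of the original analysis, which worked with the actual closing prices.

```python
def chronological_split(prices, n_train):
    """Split a time-ordered list into a training head and a testing tail."""
    if not 0 < n_train < len(prices):
        raise ValueError("n_train must leave at least one point in each set")
    return prices[:n_train], prices[n_train:]


# Placeholder data standing in for 504 daily closing prices (not real quotes).
closing_prices = [100.0 + 0.1 * day for day in range(504)]
train, test = chronological_split(closing_prices, n_train=252)
print(len(train), len(test))
```

Because the order of the observations is preserved, the testing set always lies strictly after the training set in time, which is what a forecasting exercise needs.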
For Google, Apple, and Yahoo, the entire dataset for each company contains the open,
high, low, close, and adjusted close stock prices on every trading day from October 24, 2011 to
October 24, 2013. The datasets also contain trading volume values on every trading day. To
achieve consistency, we used the close prices as a general measure of the stock price. By
definition, the closing price of a stock is the final price at which that stock is traded on a given
trading day. It represents the most up-to-date valuation of the stock until trading begins again on
the next trading day. In other words, the stock prices we used in our analysis represent the
closing prices.
Since trading days are never on weekends, the first 252 observations happen to run from
October 24, 2011 to October 22, 2012 and the final 252 prices run from October 23, 2012 to
October 24, 2013 for each company. As the next section shows, quite a bit of analysis was done
on each company's training set. In addition, the models obtained for each company are quite
different, reflecting the fact that our group chose to evenly divide the work among ourselves.
That is, we decided to split the work for this project among the group by company. Our
reasoning for this is that we wanted a different person working on each company so that the
results would not be repetitive. We were hoping to obtain a fresh perspective on the appropriate
model for each company, and we were also trying to be efficient when conducting our analysis.

III. SYSTEM ANALYSIS: PROCEDURES & RESULTS

Part A: Forecasting Google Stock Prices


As our team just established, we decided to split the stock price data for Google into two
parts. The first part (the first 252 prices) is used for training our model, and the second part
(the last 252 prices) is used for testing our model. The whole dataset (504 observations) for
Google is shown here:

[Figure: Daily Google Stock Prices (10/24/11 - 10/24/13). x-axis: Time (Days); y-axis: Stock Price ($); series: Google.]

Using the ARMA(2n, 2n - 1) modeling strategy, we determined ARMA(4, 3) to be the
adequate model for the training dataset. The F-tests were computed as follows:

Smaller model      Larger model       N - r   s   F             FINV(5%, s, N - r)
RSS(2,1) = 22563   RSS(4,3) = 21510   245     4   2.998430962   2.40848837
RSS(4,3) = 21510   RSS(6,5) = 21028   241     4   1.381039566   2.409100382
RSS(6,5) = 21028   RSS(8,7) = 20824   237     4   0.580436035   2.409733235
RSS(4,3) = 21510   RSS(5,4) = 21249   243     2   1.492376112   3.032969422
RSS(3,2) = 22561   RSS(4,3) = 21510   245     2   5.985471874   3.032662958
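
The F values in the table above are consistent with the usual nested-model statistic, F = ((RSS_small - RSS_large) / s) / (RSS_large / (N - r)). The short Python sketch below recomputes two of the rows; the function name `f_statistic` is a label chosen here for illustration, and the critical values are simply the tabulated ones rather than values computed by the code.

```python
def f_statistic(rss_small, rss_large, s, dof):
    """F statistic for comparing a smaller ARMA model with a larger one.

    rss_small: residual sum of squares of the smaller (restricted) model
    rss_large: residual sum of squares of the larger model
    s:         number of extra parameters in the larger model
    dof:       N - r, observations minus parameters of the larger model
    """
    return ((rss_small - rss_large) / s) / (rss_large / dof)


# ARMA(2,1) vs ARMA(4,3) for the Google training set (values from the table).
f_21_vs_43 = f_statistic(22563, 21510, s=4, dof=245)
# ARMA(4,3) vs ARMA(6,5).
f_43_vs_65 = f_statistic(21510, 21028, s=4, dof=241)

print(round(f_21_vs_43, 4))  # compare with the tabulated critical value 2.4085
print(round(f_43_vs_65, 4))  # compare with the tabulated critical value 2.4091
```

The first statistic exceeds its critical value, so ARMA(4, 3) is a significant improvement over ARMA(2, 1); the second does not, so moving on to ARMA(6, 5) is not justified.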

These are the lambda values and the actual model we obtained:

$$\lambda_1 = -0.9233, \qquad \lambda_2 = 1.0110 + 0.0472i, \qquad \lambda_3 = 1.0110 - 0.0472i, \qquad \lambda_4 = 0.9498$$

$$X_t - 2.048X_{t-1} + 0.2009X_{t-2} + 1.746X_{t-3} - 0.8983X_{t-4} = a_t - 1.039a_{t-1} - 0.9175a_{t-2} + 0.9596a_{t-3}$$
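
As a quick check on the roots listed above, the following Python sketch computes their moduli and tests the standard stability condition (every characteristic root strictly inside the unit circle). The helper name `is_stable` is an assumption introduced for this illustration.

```python
# Characteristic roots reported for the ARMA(4, 3) model.
roots = [
    complex(-0.9233, 0.0),
    complex(1.0110, 0.0472),
    complex(1.0110, -0.0472),
    complex(0.9498, 0.0),
]


def is_stable(char_roots):
    """An ARMA model is stable when every characteristic root has modulus < 1."""
    return all(abs(root) < 1.0 for root in char_roots)


moduli = [abs(root) for root in roots]
print([round(m, 4) for m in moduli])
print("stable:", is_stable(roots))

# The sum of the roots should reproduce the first autoregressive coefficient.
print(round(sum(roots).real, 3))
```

The complex pair has modulus slightly above one, which is why the model is treated as unstable, and the sum of the four roots agrees with the coefficient 2.048 on X_{t-1} up to rounding.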

[Figure: Bartlett Band Test. x-axis: Lag (k), 1 to 20; y-axis: P(k), from -0.15 to 0.15.]

The Portmanteau test resulted in a value of Q = 8.33, whereas the critical value is 30.14.
Since Q is less than the critical value, our team concluded that the $a_t$'s are uncorrelated.
Moreover, the Bartlett Band test confirms this result: notice in the above graph that the
autocorrelations of the $a_t$'s are all smaller in absolute value than $2/\sqrt{252} \approx 0.126$.
Therefore, ARMA(4, 3) is the adequate model.
The mean squared error (MSE) is 86.73 for the training part and 374.68 for the testing part. This
may imply that the model overfits the data. Also, as seen above, there are lambda values with
modulus greater than one, so this model is unstable and non-stationary. If a single $a_t$ is
injected into the system, the system response can exceed any bound (that is, explode, given
sufficient time). As a result, this non-stationarity needs to be eliminated. The complex roots are
very close to one; therefore, the operator $(1 - 2B + B^2) = (1 - B)^2$ was applied and a
parsimonious model was calculated. We arrived at this ARIMA(2, 2, 3) model:

$$(1 - 2B + B^2)(1 + 0.3485B - 0.5651B^2)X_t = (1 - 0.5671B - 0.9258B^2 + 0.575B^3)a_t$$

After eliminating non-stationarity, the new model gave us a better MSE for the testing
portion of the data; the MSE is 96.67 for the training part and 150.44 for the testing part. This is
the graph of the prediction for the testing part using ARIMA(2, 2, 3):

[Figure: Forecast of Google Stock Prices (Testing Part). x-axis: Time (Days); y-axis: Stock Price ($); series: Prediction, Observed.]
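
The operator (1 - 2B + B^2) in the ARIMA(2, 2, 3) model is the same as applying the first difference (1 - B) twice. The sketch below illustrates this on a made-up quadratic series; the function names are assumptions for this example and no stock data is used.

```python
def difference(series):
    """Apply (1 - B): each value minus the previous one."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def second_difference(series):
    """Apply (1 - 2B + B^2) = (1 - B)^2, i.e. x_t - 2*x_{t-1} + x_{t-2}."""
    return difference(difference(series))


# A quadratic trend becomes constant after two differences.
quadratic = [t * t for t in range(1, 7)]   # 1, 4, 9, 16, 25, 36
print(second_difference(quadratic))        # [2, 2, 2, 2]
```

Differencing twice removes a smoothly changing level and slope, which is the non-stationary behaviour the two roots near one were picking up.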

On the other hand, when using the ARMA(n, n - 1) modeling strategy, we selected AR(1)
as the adequate model:

Smaller model      Larger model       N - r   s   F             FINV(5%, s, N - r)
RSS(1,0) = 22932   RSS(2,1) = 22563   249     2   2.036098923   3.032064916
RSS(1,0) = 22932   RSS(1,1) = 22900   250     1   0.349344978   3.878923701

$$X_t - 0.9825X_{t-1} = a_t$$
Furthermore, we found AR(1) to be adequate after applying the Portmanteau and the Bartlett
Band tests. The MSE is 91.36 for the training part and 162.61 for the testing part. Notice that
the MSE value for the testing part is higher than the MSE value we obtained for the
ARIMA(2, 2, 3) model.
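
To make the MSE figures concrete, here is a minimal Python sketch of how a one-step-ahead MSE can be computed for an AR(1) model on a centralized series. It assumes the forecast for day t is the series mean plus phi times the previous day's deviation from that mean; the toy prices and the name `ar1_one_step_mse` are hypothetical, so the printed number is not comparable with the values reported above.

```python
def ar1_one_step_mse(series, phi, mean):
    """MSE of one-step-ahead AR(1) forecasts on a series.

    The forecast for time t uses the observed value at t - 1:
        forecast_t = mean + phi * (series[t - 1] - mean)
    """
    errors = [
        series[t] - (mean + phi * (series[t - 1] - mean))
        for t in range(1, len(series))
    ]
    return sum(e * e for e in errors) / len(errors)


# Toy prices (hypothetical, not the Google data), with the fitted phi = 0.9825.
toy_prices = [600.0, 604.0, 601.5, 607.0, 610.2]
toy_mean = sum(toy_prices) / len(toy_prices)
print(round(ar1_one_step_mse(toy_prices, phi=0.9825, mean=toy_mean), 3))
```

Running the same function on the training and testing halves separately gives the pair of MSE values that the comparison between models relies on.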
Another Approach

[Figure: Daily Google Stock Prices (10/24/11 - 10/24/13) with fitted trend line. x-axis: Time (Days); y-axis: Stock Price ($); series: Google, Trend.]

The models discussed so far are based on the assumption that the mean and covariance are
independent of the time origin; this assumption implies that the mean is constant and the
autocovariance depends only on the lag. If we look at the graph above that shows
Google's stock prices, it is evident there is a trend: the behavior of Google's stock price
depends on the time origin. At this point, our team first tried to remove this non-stationary trend
and model the remaining data.
Our group decomposed the series into two parts. The first part represents a non-stationary
trend by a deterministic function that depends on the time origin. The second part
represents stochastic behavior that can be modeled using ARMA. To model the deterministic
part, we applied linear regression formulae (the least squares estimation (LSE) method) to the
centralized series of the training part and estimated the parameters $b_0$ and $b_1$. Here, the
residuals ($e_t$) are the deviations from the trend line.

$$Y_t = b_0 + b_1 t + e_t, \qquad b_0 = -47.98, \qquad b_1 = 0.379$$
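
The least squares estimates of b0 and b1 can be reproduced with the textbook closed-form formulas. The Python sketch below is an illustration under the assumption that time is indexed t = 1, 2, ..., n; the function names are chosen here, and the input is a synthetic series that lies exactly on the reported trend line rather than the real centralized prices.

```python
def fit_linear_trend(values):
    """Least-squares estimates (b0, b1) for values[t-1] = b0 + b1 * t, t = 1..n."""
    n = len(values)
    times = range(1, n + 1)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return b0, b1


def detrend(values, b0, b1):
    """Residuals e_t = Y_t - (b0 + b1 * t)."""
    return [y - (b0 + b1 * t) for t, y in enumerate(values, start=1)]


# Synthetic series that follows the reported trend line exactly.
synthetic = [-47.98 + 0.379 * t for t in range(1, 253)]
b0, b1 = fit_linear_trend(synthetic)
print(round(b0, 2), round(b1, 3))
```

The residuals returned by `detrend` are the series that is subsequently modeled with ARMA.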

[Figure: Residuals after Removing Trend ($e_t = Y_t - (-47.98 + 0.379t)$). x-axis: Time (Days); y-axis: Residuals.]

After removing the deterministic trend, the residuals now have a constant zero mean. We
can model this stochastic part using ARMA models. Using the ARMA(2n, 2n - 1) modeling
strategy, we found AR(1) to be the adequate model.

$$X_t = 0.9727X_{t-1} + a_t$$

Smaller model      Larger model       s   N - r   F            FINV(5%, s, N - r)
RSS(2,1) = 22470   RSS(4,3) = 21972   4   245     1.38824413   2.40848837
RSS(1,0) = 22777   RSS(2,1) = 22470   2   249     1.70100134   3.032064916
RSS(1,0) = 22777   RSS(1,1) = 22734   1   251     0.47475147   3.878773587

The Portmanteau test resulted in a value of Q = 9.184, whereas the critical value was
30.14. Since the Q value is lower than the threshold value, it can be concluded that the
$a_t$'s are uncorrelated. Moreover, the Bartlett Band test confirms this result. Therefore,
AR(1) is adequate for the residuals.
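
The two adequacy checks used throughout this part can be sketched as follows. The report does not state which form of the Portmanteau statistic was used, so this example assumes the Box-Pierce form Q = N * sum of squared residual autocorrelations; the Bartlett band is taken as 2/sqrt(N), as in the text. All function names are illustrative, and the residuals are synthetic white noise rather than the fitted model's residuals.

```python
import math
import random


def autocorrelations(residuals, max_lag):
    """Sample autocorrelations rho_1..rho_max_lag of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    c0 = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def portmanteau_q(residuals, max_lag):
    """Box-Pierce form Q = N * sum(rho_k^2); an assumed form, see the lead-in."""
    rho = autocorrelations(residuals, max_lag)
    return len(residuals) * sum(r * r for r in rho)


def outside_bartlett_band(residuals, max_lag):
    """Lags whose autocorrelation exceeds the approximate band 2/sqrt(N)."""
    band = 2.0 / math.sqrt(len(residuals))
    rho = autocorrelations(residuals, max_lag)
    return [k for k, r in enumerate(rho, start=1) if abs(r) > band]


random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(252)]  # synthetic residuals
print(round(2.0 / math.sqrt(252), 3))
print(round(portmanteau_q(white_noise, 20), 2))
print(outside_bartlett_band(white_noise, 20))
```

A model is judged adequate when Q stays below the chi-square critical value and no lag (or at most an isolated one) falls outside the band.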

[Figure: Bartlett Band Test. x-axis: Lag (k), 1 to 20; y-axis: P(k), from -0.15 to 0.15.]

Finally, combining the deterministic and stochastic parts, the complete model can be adopted for
the Google stock prices as:

$$Y_t = b_0 + b_1 t + X_t$$
$$Y_t = -47.98 + 0.379t + 0.9727X_{t-1} + a_t$$

where $X_t$ is the stationary part that follows AR(1).
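
A one-step-ahead forecast from the combined model can be sketched as below. The sketch assumes that the previous stochastic value X_{t-1} is recovered by subtracting the trend from the previous observation and that the unknown shock a_t is replaced by its mean of zero; the function name and the input value 55.0 are made up for illustration.

```python
def trend_ar1_forecast(previous_price, t, b0, b1, phi):
    """One-step-ahead forecast of Y_t given the observed Y_{t-1}.

    X_{t-1} = Y_{t-1} - (b0 + b1 * (t - 1)) is the detrended previous value,
    and the forecast sets the unknown shock a_t to its mean of zero.
    """
    previous_residual = previous_price - (b0 + b1 * (t - 1))
    return b0 + b1 * t + phi * previous_residual


# Coefficients reported for the first Google trend model; the centralized
# price of 55.0 at t = 252 is a made-up input for illustration.
forecast = trend_ar1_forecast(55.0, t=253, b0=-47.98, b1=0.379, phi=0.9727)
print(round(forecast, 3))
```

If the previous observation lies exactly on the trend line, the forecast reduces to the trend value at time t.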

[Figure: Forecast of Google Stock Prices. x-axis: Time (Days); y-axis: Stock Price ($); series: Prediction, Observed, Trend.]

The mean squared error for the training set is 90.74, while the mean squared error for the
testing set is 151.8. As seen in the figure below, the trend line does not fit the series of the
testing part well and, consequently, prediction does not perform well enough for the testing part.


If the Google stock price data series is examined carefully, it can be seen that the stock
prices have a tendency to increase once they start increasing as seen in the rounded rectangles
above. From this realization, we tried removing the trend depicted by the green line in the figure
above and modeled the remaining residuals again.

$$Y_t = b_0 + b_1 t + e_t, \qquad b_0 = -56.71, \qquad b_1 = 3.01$$

After removing the deterministic trend, the stochastic part is modeled using the ARMA(2n, 2n - 1)
modeling strategy. We selected ARMA(2, 1) as the adequate model, and the Portmanteau and
Bartlett Band tests confirmed its adequacy. Here is the model for the residuals:

$$X_t = 0.255X_{t-1} + 0.754X_{t-2} + a_t + 0.8078a_{t-1}$$

Also, the complete model is:

$$Y_t = -56.71 + 3.01t + 0.255X_{t-1} + 0.754X_{t-2} + a_t + 0.8078a_{t-1}$$

The complete model has an MSE of 93.4 for the training set and an MSE of 153.23 for the test set.
Finally, our team considered a trend that fits the whole data. The deterministic trend
shown below was calculated by applying regression analysis formulae to the whole dataset (that
is, the dataset containing both the training and testing parts).

$$Y_t = b_0 + b_1 t + e_t, \qquad b_0 = -75.9, \qquad b_1 = 0.6$$

[Figure: Forecast of Google Stock Prices. x-axis: Time (Days); y-axis: Stock Price ($); series: Prediction, Observed, Trend.]

Then, AR(1) was selected as the best model for the remaining stochastic part after removing the
trend. Again, we confirmed the adequacy of this model by conducting both the Portmanteau and
Bartlett Band tests. The complete model is:

$$Y_t = -75.9 + 0.6t + 0.3372X_{t-1} + a_t$$


Fortunately, this model has a mean squared error of 90.52 for the training data and a mean
squared error of 140.89 for the testing data.
Finally, our team tried modeling after removing an exponential trend. The exponential trend
was calculated using Minitab for the whole dataset. The following equation is the exponential
trend for the Google stock price series:

$$Y_t = 553.331 \times (1.00101)^t + e_t$$

The residuals that remained after removing the trend above from the actual data were centralized
and fitted to a model using the ARMA(2n, 2n - 1) modeling strategy. AR(1) was selected as the
adequate model, and the Portmanteau and Bartlett Band tests confirmed its adequacy once more.
This is the model for the residuals:

$$X_t = 0.9757X_{t-1} + a_t$$

Also, this is the complete model:

$$Y_t = 553.331 \times (1.00101)^t + 0.9757X_{t-1} + a_t$$

The mean squared error for the training set is 90.93, while the mean squared error for the test set
is 137.58.
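
One common way to fit an exponential trend of the form a * b^t is ordinary least squares on the logarithm of the series. The report only says that Minitab was used, so the fitting procedure in this Python sketch is an assumption, as is the function name; the input is a synthetic series generated from the reported equation, not the stock prices.

```python
import math


def fit_exponential_trend(values):
    """Fit values[t-1] ~ a * b**t by least squares on log(values); needs values > 0."""
    n = len(values)
    times = range(1, n + 1)
    logs = [math.log(v) for v in values]
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    slope = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    intercept = l_mean - slope * t_mean
    return math.exp(intercept), math.exp(slope)


# Synthetic series built from the reported trend equation.
synthetic = [553.331 * 1.00101 ** t for t in range(1, 505)]
a, b = fit_exponential_trend(synthetic)
print(round(a, 3), round(b, 5))
```

A growth factor of 1.00101 per trading day corresponds to roughly 0.1 percent growth per day in the fitted trend.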


[Figure: Forecast of Google Stock Prices (Testing Part). x-axis: Time (Days); y-axis: Stock Price ($); series: Trend, Prediction, Observed.]
Model                                   MSE for Training Part   MSE for Testing Part
ARMA(4,3)                               86.73                   374.68
ARIMA(2,2,3)                            96.67                   150.44
AR(1)                                   91.36                   162.61
Deterministic Trend and AR(1)           90.74                   151.8
  ($Y_t = -47.98 + 0.379t + 0.9727X_{t-1} + a_t$)
Deterministic Trend and ARMA(2,1)       93.4                    153.23
  ($Y_t = -56.71 + 3.01t + 0.255X_{t-1} + 0.754X_{t-2} + a_t + 0.8078a_{t-1}$)
Deterministic Trend and AR(1)           90.52                   140.89
  ($Y_t = -75.9 + 0.6t + 0.3372X_{t-1} + a_t$)
Exponential Trend and AR(1)             90.93                   137.58
  ($Y_t = 553.331 \times (1.00101)^t + 0.9757X_{t-1} + a_t$)

Based on these results, it can be concluded that the model with exponential trend fits the data
best since it gives the lowest MSE value for the testing portion.


Part B: Forecasting Apple Stock Prices


Daily (closing) Apple stock price data from October 24, 2011 to October 24, 2013 was
split into two data sets of 252 data points each. Our team labeled the first half of the data the
training set and the second half of the data the test set (recall this is what we did for Google as
well). The training set was used to fit an ARMA model according to the ARMA(2n, 2n - 1)
modeling strategy, starting from the AR(1) model. The centralized dataset containing all 504
Apple stock prices as well as the F-test results are shown below:

Smaller model      Larger model       N - r   F      FINV(5%, s, N - r)
RSS(1,0) = 21701   RSS(2,1) = 21605   249     0.55   3.03
RSS(2,1) = 21605   RSS(4,3) = 20544   245     3.16   2.41
RSS(4,3) = 20544   RSS(6,5) = 19641   241     2.77   2.41
RSS(6,5) = 19641   RSS(8,7) = 18897   237     2.33   2.41
RSS(6,5) = 19641   RSS(7,6) = 18825   239     5.18   3.03

Comparing each consecutive model using the F-test, the ARMA(2, 1) model is not a significant
improvement over the AR(1) model (F = 0.55 < 3.03), and thus the AR(1) model was adopted first.
This model is:

$$X_t - 0.9933X_{t-1} = a_t$$

The Portmanteau test resulted in a Q value of 34.34, while the threshold in this case is 30.14.
Since Q is greater than this threshold, it can be concluded that the $a_t$'s are correlated.
Furthermore, the Bartlett band with a time lag of 20 shows an autocorrelation that is too high for
a time lag of 10. Based on this observation, the AR(1) model does not fit the data well; hence,
the ARMA(2n, 2n - 1) modeling strategy was continued. This finally yielded an ARMA(7, 6)
model. This model is:

$$X_t - 1.222X_{t-1} - 0.08271X_{t-2} - 0.0295X_{t-3} + 0.01842X_{t-4} - 0.03697X_{t-5} + 0.9984X_{t-6} - 0.6448X_{t-7}$$
$$= a_{t-2} - 0.4489a_{t-3} - 0.4088a_{t-4} - 0.2871a_{t-5} + 0.891a_{t-6}$$

The Portmanteau test resulted in a Q value of 12.81, while the threshold in this case is 14.07.
Since Q is less than this threshold, it can be concluded that the $a_t$'s are uncorrelated. The
Bartlett band with time lag 20 confirms this conclusion, as all time lags have an autocorrelation
less than $2/\sqrt{252} \approx 0.126$ in absolute value. The ARMA(7, 6) model fits the data well.

Next, we calculated the values of lambda. There are seven total roots:

Lambda                |Lambda|   Period
-0.8210 ± 0.4919i     0.9571     2.48
0.0449 ± 0.9414i      0.9425     4.12
0.9767                0.9767     -
0.8990 ± 0.0565i      0.9007     13.86

One of these values is close to one, so a stochastic constant trend exists. The three pairs of
complex roots are each close to one in absolute value, which represents seasonality with periods
of 2.48, 4.12, and 13.86 respectively.
The obtained model was evaluated by calculating one-step ahead predictions for all the
data in the test set using the model we built based on the training set. The predictions and actual
stock prices of the second year are pictured below.

The mean squared error for the training set was 75.00, while the mean squared error for the test
set was much higher: 120.23. This might imply that the model overfits the data.
Our last effort was to remove the trend and seasonality from the data. Using the roots we
obtained earlier, we formed the following time series model:

$$w_t = (1 - B)(1 + 1.6421B + B^2)(1 - 0.0898B + B^2)(1 - 1.7979B + B^2)X_t$$
$$= a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \theta_3 a_{t-3} - \theta_4 a_{t-4} - \theta_5 a_{t-5} - \theta_6 a_{t-6}$$

Hence, $w_t$ is an MA(6) model, and by fitting this model using MATLAB, the following
parameters were placed in the above equation:

$$w_t = a_t - 0.1757a_{t-1} + 0.03724a_{t-2} - 0.2669a_{t-3} + 0.04324a_{t-4} - 0.1618a_{t-5} + 0.9611a_{t-6}$$
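
The trend and seasonality operator applied to X_t above is a product of four polynomials in the backshift operator B. The Python sketch below multiplies them out and shows how the resulting filter would be applied to a series to obtain w_t; the helper names `poly_multiply` and `apply_operator` are assumptions made for this illustration.

```python
def poly_multiply(p, q):
    """Multiply two polynomials in B given as coefficient lists (constant first)."""
    product = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b
    return product


def apply_operator(coeffs, series):
    """w_t = sum_k coeffs[k] * x_{t-k}, for every t with enough history."""
    order = len(coeffs) - 1
    return [
        sum(c * series[t - k] for k, c in enumerate(coeffs))
        for t in range(order, len(series))
    ]


factors = [
    [1.0, -1.0],            # (1 - B): stochastic trend
    [1.0, 1.6421, 1.0],     # seasonal pair, period about 2.48
    [1.0, -0.0898, 1.0],    # seasonal pair, period about 4.12
    [1.0, -1.7979, 1.0],    # seasonal pair, period about 13.86
]
operator = [1.0]
for factor in factors:
    operator = poly_multiply(operator, factor)

print(len(operator) - 1)                  # degree of the combined operator
print([round(c, 4) for c in operator])
```

Because the combined operator has degree seven, the first seven observations are consumed as history and w_t is available only from the eighth observation onward.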


This model was again evaluated using one-step-ahead predictions for the test set. The
predictions, after converting them back to $X_t$, and actual stock prices during the second year
(that is, the testing portion of the data) are pictured below.

The mean squared error (MSE) for the training set was 91.39, which is higher than the MSE of
75.00 for the ARMA(7, 6) model obtained earlier. However, the MSE for the test set (109.18) was
only slightly higher than for the training set, and it is in fact less than the MSE of 120.23 for
the previous model. Consequently, it seems like the ARMA(7, 6) overfits the data,
especially compared to the MA(6) model. The MA(6) model does not seem to overfit the data,
since the MSE of the test set is only slightly higher than the MSE of the training set. Based on
these results, it can be concluded that the best model is the MA(6) model applied to the data
where trend and seasonality are removed.

Part C: Forecasting Yahoo Stock Prices


Finally, as our group did for Google and Apple in the previous two parts of this section,
we collected 504 daily Yahoo closing stock prices (from October 24, 2011 through October 24,
2013), all of which served to construct our entire dataset. Noting again that there are 252 trading
days in a year, we split the dataset into two series of equal length for purposes of our analysis.
Exhibit 1 below shows the directionality of the Yahoo stock prices.

[Figure: Daily Yahoo Stock Prices (10/24/11 - 10/24/13), with the first 252 trading days marked. x-axis: Trading Day Since 10/24/2011; y-axis: Stock Price ($).]

Exhibit 1: Yahoo Stock Prices

The first 252 data points correspond to the training portion, wherein the data is used to fit a
forecast model. The latter 252 data points correspond to the testing portion, wherein the model is
used to forecast actual stock prices.

Model Selection Method


Our team used the F-test approach for ARMA(2n, 2n -1) forecast model selection with
MATLAB. The table shown below shows the data output:

Table 1: Selection Details

ARMA Model   RSS     ARMA Model   RSS     s   N - r   F       F Crit. (0.05, s, N - r)
(2,1)        11.76   (4,3)        10.91   4   245     4.79    2.41
(4,3)        10.91   (6,5)        10.08   4   241     4.99    2.41
(6,5)        10.08   (8,7)        9.03    4   237     6.89    2.41
(8,7)        9.03    (10,9)       9.58    4   233     -3.35   2.41
(8,7)        9.03    (9,8)        8.49    2   235     7.41    3.03

As Table 1 shows, we decided to make one further comparison between the ARMA(8, 7) and
ARMA(9, 8) models and ultimately chose the latter. To check the validity of the model, our
team conducted the Bartlett Band test at $\alpha = 0.05$. However, the correlation at lag 15 lies
outside of the band, so we concluded that the model is inadequate.
Therefore, we conducted the same test with ARMA(8, 7), but the correlation at lag 12
also lies outside of the band. Instead of trying the ARMA(6, 5) model, we considered the
ARMA(7, 6) model, given that the ARMA(7, 6) model (with respect to the ARMA(6, 5) model)
has a non-trivial F value of 6.24, which is greater than the threshold of 3.03. The Bartlett Band
test is illustrated in Exhibit 2:

[Figure: Bartlett Band. x-axis: Lag; y-axis: Correlations, from -0.15 to 0.15.]

Exhibit 2: Bartlett Band for ARMA(7, 6)


Clearly, all correlations lie inside the bands, suggesting the model is adequate. The model
passed the Portmanteau test at a maximum lag of K = 20 with $Q = 10.255 < 30.144 = \chi^2(0.95, 19)$.
The equation for the ARMA(7, 6) model is shown below:

$$(1 - 0.7766B - 0.1345B^2 - 0.1987B^3 + 0.3479B^4 + 0.07447B^5 + 0.5085B^6 - 0.5369B^7)X_t$$
$$= (1 + 0.1624B - 0.04794B^2 - 0.2524B^3 + 0.1669B^4 + 0.1504B^5 + 0.8714B^6)a_t$$

This equation has the following characteristic roots:

Table 2: Characteristic Roots

Root   Real     Imaginary   r       2 x Real   Angle (rad)   Period
1      -0.781   0.497       0.925   -1.561     2.575         2.440
2      -0.781   -0.497      0.925   -1.561     2.575         2.440
3      -0.099   0.890       0.896   -0.198     1.682         3.736
4      -0.099   -0.890      0.896   -0.198     1.682         3.736
5      0.856    0.465       0.974   1.711      0.498         12.624
6      0.856    -0.465      0.974   1.711      0.498         12.624
7      0.825    0.000       0.825

The periodicities in Table 2 show the periodic nature of the model, with the largest
period being 12.624. Smaller periods of 3.736 and 2.440 are also present in the model.
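
The modulus, angle, and period columns of Table 2 are consistent with treating each complex root as r * exp(i * omega) and taking the period as 2 * pi / omega. The Python sketch below recomputes them from the rounded real and imaginary parts, so small differences in the last digit are expected; the function name `root_summary` is an assumption for this illustration.

```python
import cmath
import math


def root_summary(real, imag):
    """Modulus, angle (radians) and period 2*pi/|angle| of a characteristic root."""
    root = complex(real, imag)
    modulus = abs(root)
    angle = cmath.phase(root)
    period = 2 * math.pi / abs(angle) if angle != 0 else math.inf
    return modulus, angle, period


# The three upper-half-plane roots from Table 2.
for real, imag in [(-0.781, 0.497), (-0.099, 0.890), (0.856, 0.465)]:
    modulus, angle, period = root_summary(real, imag)
    print(round(modulus, 3), round(angle, 3), round(period, 2))
```

A real positive root such as 0.825 has a zero angle and therefore no finite period, which is why its period entry in the table is empty.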

Results
The model replicated the actual observations during the first 252 days quite accurately with an
MSE of 0.038 (for training data) as shown in Exhibit 3 below:

[Figure: Yahoo Training Data: Prediction vs. Actual (MSE = 0.038). x-axis: Trading Day Since 10/24/2011; y-axis: Stock Price ($); series: Prediction, Actual.]

Exhibit 3: Training Graphs


On the other hand, the 252-day, one-step-ahead forecast was less accurate, with a mean squared
error of 2.209, as shown in Exhibit 4:

[Figure: Yahoo Testing Data: Prediction vs. Actual (MSE = 2.209). x-axis: Trading Day Since 10/24/2011; y-axis: Yahoo Stock Price (US$); series: Prediction, Actual.]

Exhibit 4: Test Graphs


Note that the prediction diverges downward as time goes on. This is due to the fact that the
model was derived from the constant-trend stock prices of the first 252 days.

IV. ANOTHER APPROACH: FORECASTING STOCK PRICES USING BROWNIAN MOTION
Researchers have proposed several mathematical models to forecast future stock prices.
For our analysis, our team examined an additional tool besides ARMA modeling to forecast
Yahoo, Apple, and Google stock prices: Brownian motion. For a little background, Brownian
motion, also known as the Wiener process, was first introduced by Robert Brown to describe the
motion exhibited by particles immersed in a gas or liquid. This process also describes stock
price movements, although Benoit Mandelbrot, a mathematician, rejected its applicability.
Brownian motion can be formulated as a random walk with a drift: $B_t = \mu t + \sigma W_t$, where
$W_t$ is a random walk process. In addition, $W_t$ can be written as $W_t = Z\sqrt{t}$, where $Z$ is
normally distributed with zero mean and a standard deviation of one. As a result, in the driftless
case with unit variance ($\mu = 0$, $\sigma = 1$), $E[dB] = 0$ and $E[dB^2] = dt$.
In this driftless form, Brownian motion has the following properties:
1) Continuity: $B(t)$ is a continuous-time process.
2) Markov property: $B(t)$ only depends on the previous value.
3) Martingale property: $E(B_{n+1} \mid B_1, \ldots, B_n) = B_n$.
The stochastic behavior of a stock price $S_t$ follows the geometric Brownian motion process and
can be written as $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$. The solution to this stochastic differential
equation can be found by applying the famous Itô formula. It follows that

$$S_t = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t\right).$$
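
The closed-form solution of geometric Brownian motion lends itself to direct simulation on a daily grid. The sketch below is a minimal illustration using only the Python standard library; the parameter values are hypothetical daily figures chosen for the example, not estimates from the report's data, and the function name is an assumption.

```python
import math
import random


def simulate_gbm_path(s0, mu, sigma, n_steps, dt=1.0, rng=None):
    """Simulate S_t = S_0 * exp((mu - sigma^2 / 2) t + sigma W_t) on a time grid.

    Each step multiplies the price by exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z),
    with Z drawn from a standard normal distribution.
    """
    rng = rng or random.Random()
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        growth = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(growth))
    return path


# Hypothetical daily parameters, chosen only for illustration.
path = simulate_gbm_path(s0=100.0, mu=0.0005, sigma=0.02, n_steps=252,
                         rng=random.Random(42))
print(len(path), round(path[0], 2))
```

Because every step multiplies the previous price by a positive factor, simulated prices can never become negative, which is one reason the geometric form is preferred over plain Brownian motion for prices.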

The stock volatility $\sigma$ measures the stability of the stock price, and it can be computed by the
following equations:

$$\sigma = \sqrt{\frac{1}{n-1}\sum_{t=1}^{n}(u_t - \bar{u})^2}, \qquad u_t = \ln\frac{S_t}{S_{t-1}}$$
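
The volatility estimate is the sample standard deviation of the log returns. A minimal Python sketch follows, with made-up prices and function names chosen for illustration.

```python
import math


def log_returns(prices):
    """u_t = ln(S_t / S_{t-1}) for consecutive prices."""
    return [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]


def volatility(prices):
    """Sample standard deviation of the log returns (divisor n - 1)."""
    u = log_returns(prices)
    n = len(u)
    u_bar = sum(u) / n
    return math.sqrt(sum((x - u_bar) ** 2 for x in u) / (n - 1))


# Made-up closing prices, not taken from the datasets in this report.
prices = [100.0, 102.0, 101.0, 103.5, 104.0]
print(round(volatility(prices), 5))
```

With daily closing prices this yields a daily volatility; a price series that grows by the same percentage every day has zero volatility under this definition.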

The expected annual rate of return, or drift, is denoted by $\mu$ and is tied to the volatility
through a $\sigma^2/2$ term. According to the previous results and assumptions, the expected value
$E(S_t)$ of the stock price at future time $t$ is given by $E(S_t) = S_0 \exp(y t)$, where the rate
$y$ likewise involves a $\sigma^2/2$ term.

The Computational Results


In this section, our team discusses how we forecasted the future stock prices of the three
companies using three approaches applied to the same model, $E(S_t) = S_0 \exp(y t)$. We used
the historical data to obtain the rate $y$ (together with its $\sigma^2/2$ volatility term) by either
measuring the daily return or the total return, depending on the forecasting approach. In the
first approach (denoted by Forecast 1 in the graphs that complement this section), we forecasted
future stock prices by using n-step-ahead prediction and found that the prediction was
satisfactory up to about 20 days ahead. After that point, our model did not capture the actual
stock price movements. For our second approach (denoted by Forecast 2 in the associated
graphs), we proposed a different way to update our forecast so that we could capture the change
in the volatility of the stock market. The volatility in general is not constant and changes due to
many factors. We updated the stock volatility and the rate of return $y$ for every period in the
expected value equation of the stock price. The updating process started by observing the
oscillation in the stock price movement in the testing data and measuring its $y_t$ value. Then, we
predicted the one-step-ahead stock price and periodically updated it with the real value. The
forecasted data from this approach was better and has a time series plot somewhat similar to the
actual data.
Finally, for the third approach (Forecast 3), our group used MATLAB to implement the Brownian
motion forecasting algorithm, and we then computed the future stock prices of the three
companies. Brownian motion produced more than 1,000 random walk paths of the stock price
movements; one of them was selected randomly and compared with the testing data. We
observed that our Brownian motion model is an accurate predictor if the prediction step is less
than a month. Brownian motion outcomes deviate slightly from the testing data when the
prediction step is more than a month, and that deviation could come from the volatility of the
stock price or the market rate of return. The Brownian motion generated different paths, and the
selected path is not guaranteed to be the perfect one. The future path of the stock price should
be selected with care and updated with any information available about the stock price behavior,
similar to what was done in the first two approaches.
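
The difference between the n-step-ahead and the one-step-ahead forecasts can be sketched as follows. This is a simplified illustration: it keeps the growth rate fixed, whereas the second approach described above also updated the rate and volatility each period, and the prices, the rate, and the function names are all hypothetical.

```python
import math


def n_step_forecast(s0, rate, horizon):
    """Forecast 1 style: project S_0 * exp(rate * n) from a single starting price."""
    return [s0 * math.exp(rate * n) for n in range(1, horizon + 1)]


def rolling_one_step_forecast(observed, rate):
    """Forecast 2 style: predict each day from the previous observed price."""
    return [price * math.exp(rate) for price in observed[:-1]]


def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical test-period prices and daily growth rate, for illustration only.
observed = [100.0, 101.0, 99.5, 102.0, 103.0, 101.5]
rate = 0.002
forecast_1 = n_step_forecast(observed[0], rate, horizon=len(observed) - 1)
forecast_2 = rolling_one_step_forecast(observed, rate)
print(round(mse(observed[1:], forecast_1), 4))
print(round(mse(observed[1:], forecast_2), 4))
```

The n-step forecast is a smooth exponential curve that cannot follow day-to-day oscillations, while the rolling forecast is re-anchored to the latest observation at every step.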
[Figures: Brownian motion forecasts for Google, Apple, and Yahoo over the testing period. Each panel plots the Real series together with Forecast 1, Forecast 2, and Forecast 3 against time in days.]

As a reminder, in the above graphs, Forecast 1 represents n-step-ahead prediction,
Forecast 2 represents one-step-ahead prediction, and Forecast 3 shows the randomly selected
Brownian motion path. From these graphs, we can see that Forecast 3 has the least mean
squared error because its deviation from the real line, which is shown in blue, is the smallest. In
fact, you can barely see the blue line because the Brownian motion outcomes mimic the real data
so well!

V. DISCUSSION & SIGNIFICANCE OF THE RESULTS
In our analysis above, we assumed that the daily volatility and expected return of the first
year is equal to the daily volatility and expected return of the second year so that we would have
the same oscillation in the stock price movement. When we used the average sigma and mu, we
found out we could not carry out the calculations because in the expectation, we are ignoring the
random walk part in the sigma values. In other words, we are assuming sigma and mu are
constant over all days. Assuming the sigma and mu for each day in the training data is equal to
the sigma and mu in the testing data is better because we can capture all the movement in the
stock prices. Referring back to the graphs in the Brownian motion section of our report, a
constant mu and sigma will resemble the red line because if you take the expectation of mu and
sigma and just change the time, the outcome will always be an increasing or decreasing function.

Which Model Is Best for Each Company?


We found the MSE of each model for each company. We determined that for Google,
the lowest MSE was 137.58. For Google, the MSE for Brownian motion was 285.334.
Therefore, for Google, we should use the AR(1) model with exponential trend. For Apple, the
MA(6) model produced the lowest MSE value of 109.18, whereas the MSE for Brownian motion
was 154.7751064. Therefore, for Apple, we should use the MA(6) model. Lastly, for Yahoo,
the lowest MSE was 2.209, whereas the MSE for Brownian motion was 0.327270663. In this
case, we conclude that Brownian motion is the best option for Yahoo.


Significance of Our Results


We can use the models we developed for this project to predict future stock prices, and
we found that Yahoo has the lowest MSE, meaning that it is easier to predict Yahoo stock prices
under the existing conditions. This shows that some stocks are easier to predict than others; you
can make money more easily when a company's stock price is more predictable. In other words,
if a company experiences the same (stable) conditions for the relevant time horizon (two years in
our case), we are confident we can predict the stock prices in the third year, assuming the same
stable conditions hold.
An important conclusion we reached as a result of this project is that although there are
many modeling techniques available for stock price datasets, there is not one method that gives
the best results for each and every time series. Each time series has its own unique behavior that
presents many modeling challenges; these challenges need to be thoroughly examined before
selecting the best modeling technique that can be used to predict future patterns.


VI. CONCLUSION
Based on our results, there was not much volatility for Yahoo, whereas with Google and
Apple, there was much more oscillation. Based on the Brownian motion analysis, it was easier
to capture the stock fluctuations for Yahoo than for Google. The variance for Apple is very high;
one reason for this variability might be that Apple did something very different as a company
in 2011 compared to its operations in 2012. Overall, our results for the three stocks are very
different. We have very different MSEs for the three individual companies, and we noted that
Yahoo's stock price is much easier to predict. In conclusion, our group enjoyed
working on this project together and having the opportunity to implement new procedures such
as deterministic trend, exponential trend, and Brownian motion to help us achieve our objective
of finding the best models for each company's stock price.


VII. REFERENCES
[1] Baxter, Martin, and Andrew Rennie. Financial Calculus: An Introduction to
Derivative Pricing. Cambridge: Cambridge UP, 1996. Print.
[2] Beichelt, Frank. Stochastic Processes in Science, Engineering, and Finance.
Boca Raton: Chapman & Hall/CRC, 2006. Print.
[3] Fama, E. "Random Walks in Stock Market Prices." Financial Analysts
Journal, Vol. 51 (1): 1965. 1-6.
[4] Ladde, G.S. and L. Wu. "Development of Modified Geometric Brownian
Motion Models by using Stock Price Data and Basic Statistics," Vol. 71 (12):
15 Dec. 2009.
[5] Mun, Johnathan. Applied Risk Analysis: Moving beyond Uncertainty in
Business. Hoboken, NJ: Wiley, 2004. Print.
[6] Pandit, Sudhakar M., and Shien-Ming Wu. Time Series and System Analysis
with Applications. Malabar, FL: Krieger Pub., 2001. Print.
[7] Ross, Sheldon M. An Elementary Introduction to Mathematical Finance.
New York: Cambridge UP, 2011. 38-39. Print.
[8] Ross, Stephen A., Randolph Westerfield, and Bradford D. Jordan.
Fundamentals of Corporate Finance. 10th ed. New York, NY:
McGraw-Hill/Irwin, 2013. 401-02. Print.
