
MGSC 372 Final Review

April 2014
Question 1
A pension fund analyst is investigating pension packages for people working in three sectors:
Education, Government, and Industry. The study is conducted in five geographic regions:
Atlantic, Quebec, Ontario, Prairie, and BC. Four sample values are selected for each sector-
region combination.

1. What are the factors for this study?

Factor 1: Sector

Factor 2: Region

Specify the levels of each factor.

Factor 1: Level 1 = Education
          Level 2 = Government
          Level 3 = Industry

Factor 2: Level 1 = Atlantic
          Level 2 = Quebec
          Level 3 = Ontario
          Level 4 = Prairie
          Level 5 = BC

2. What is the total sample size?

3 sectors x 5 regions x 4 observations per combination = 3 x 5 x 4 = 60
3. What type of statistical model is most appropriate for this study?

Two-way ANOVA with replication

Minitab output is as follows:

Two-way ANOVA: Value versus Region, Sector

Source DF SS MS F P
Region 4 24583 6145.8 16.15 0.000
Sector 2 76471 38235.5 100.50 0.000
Interaction 8 5232 654.0 1.72 0.120
Error 45 17120 380.5
Total 59 123407

4. Test the hypothesis that there is a significant interaction between Region and Sector.

Ho: No interaction

H1: Interaction

TS: F = MS(Interaction)/MSE = 654.0/380.5 = 1.72

CV: F.05;8,45 ≈ 2.18

Conclusion: Since 1.72 < 2.18 (p = 0.120 > 0.05), do not reject Ho, i.e. there is no significant interaction.


5. Test the hypothesis that the main effect Sector is significant.

Ho: µE = µG = µI

H1: Not all µ are equal

TS: F = MS(Sector)/MSE = 38235.5/380.5 = 100.5

CV: F.05;2,45 ≈ 3.20

Conclusion: Since 100.5 > 3.20 (p = 0.000), reject Ho => Not all Sector means are equal
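As an arithmetic check on the two F tests above, the following Python sketch recomputes the F ratios from the SS and DF columns of the Minitab table. The critical values are simply the table values quoted in the solutions (they are not computed here), and the variable names are illustrative.

```python
# SS and DF copied from the Minitab two-way ANOVA table.
effects = {
    "Region": (4, 24583),
    "Sector": (2, 76471),
    "Interaction": (8, 5232),
}
error_df, error_ss = 45, 17120
mse = error_ss / error_df  # mean square error

# 5% critical values as quoted in the solutions (read from an F table).
critical = {"Interaction": 2.18, "Sector": 3.20}

# F ratio for each effect: MS(effect) / MSE.
f_stats = {source: (ss / df) / mse for source, (df, ss) in effects.items()}

# Compare each tested effect with its critical value.
decisions = {
    source: "reject Ho" if f_stats[source] > cv else "do not reject Ho"
    for source, cv in critical.items()
}

for source, f in f_stats.items():
    print(f"{source:12s} F = {f:.2f}")
print(decisions)
```

The recomputed ratios agree with the F column of the output to the precision shown there.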

The Minitab output for Individual 95% CIs is shown below:

Individual 95% CIs For Mean Based on Pooled StDev
Region Mean -+---------+---------+---------+--------
Atlantic 250.167 (---*----)
BC 268.500 (---*----)
Ontario 225.750 (---*----)
Prairie 245.000 (----*----)
Quebec 209.917 (----*----)
-+---------+---------+---------+--------
200 225 250 275

Individual 95% CIs For Mean Based on Pooled StDev
Sector Mean -------+---------+---------+---------+--
Education 197.0 (--*--)
Government 238.2 (--*--)
Industry 284.4 (--*--)
-------+---------+---------+---------+--
210 240 270 300

We see that Industry has the highest mean value (284.4), Government is second (238.2), and
Education has the lowest (197.0).
6. Although the interaction is not statistically significant at the 5% level, the analyst has decided
to inspect the interaction plot, and obtained the following result.

[Figure: Interaction Plot for Value (data means). x-axis: Region (Atlantic, BC, Ontario, Prairie, Quebec); y-axis: Mean (150 to 350); one line per Sector (Education, Government, Industry).]

Would you agree that there is no significant interaction? Explain. Identify any aspect of the
graph that might indicate some weak interaction.

The overall patterns are very similar across the three sectors, which is consistent with no
significant interaction. The one exception is the step from Ontario to Prairie:

- Government and Industry increase
- Education decreases
Question 2

Based on the scenario of Question 1, suppose the following data were collected:

A one-way ANOVA is conducted with the following results:

One-way ANOVA: Atlantic, Quebec, Ontario, Prairie, BC

Source DF SS MS F P
Factor 4 17158 4289 3.79 0.040
Error 10 11312 1131
Total 14 28470

S = 33.63 R-Sq = 60.27% R-Sq(adj) = 44.37%

Individual 95% CIs For Mean Based on Pooled StDev
Level N Mean StDev -----+---------+---------+---------+----
Atlantic 3 240.00 32.79 (--------*--------)
Quebec 3 168.67 3.51 (--------*-------)
Ontario 3 210.00 36.06 (--------*--------)
Prairie 3 231.67 45.37 (-------*--------)
BC 3 270.33 34.79 (--------*--------)
-----+---------+---------+---------+----
150 200 250 300

What can you conclude from the above output?

Since p = 0.040 < 0.05, reject Ho and conclude that not all 5 means are equal.

From the individual 95% CIs it appears that the only significant difference is between the mean
values for Quebec and BC.
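The summary figures in the one-way ANOVA output (F, S, R-Sq, R-Sq(adj)) all follow from the SS and DF columns. A minimal Python sketch of that arithmetic, with illustrative variable names:

```python
import math

# SS and DF copied from the one-way ANOVA table.
df_factor, ss_factor = 4, 17158
df_error, ss_error = 10, 11312
df_total, ss_total = 14, 28470

ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error

f_stat = ms_factor / ms_error                     # F = MS(Factor) / MSE
s = math.sqrt(ms_error)                           # pooled standard deviation
r_sq = 1 - ss_error / ss_total                    # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)   # R-Sq(adj)

print(f"F = {f_stat:.2f}, S = {s:.2f}")
print(f"R-Sq = {r_sq:.2%}, R-Sq(adj) = {r_sq_adj:.2%}")
```

The value of S computed here is the pooled standard deviation used for the individual confidence intervals in the output.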
Here is part of the Tukey post-hoc output:

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons

Individual confidence level = 99.18%

Quebec subtracted from:

Lower Center Upper ---------+---------+---------+---------+


Ontario -48.96 41.33 131.63 (--------*--------)
Prairie -27.29 63.00 153.29 (--------*--------)
BC 11.37 101.67 191.96 (--------*--------)
---------+---------+---------+---------+
-100 0 100 200

Explain how you can tell that the mean value for Quebec is less than the mean value for BC.

Because the Tukey CI for µBC - µQ is (11.37, 191.96), which lies entirely above zero. The mean
for BC is therefore greater than the mean for Quebec by at least 11.37 and at most 191.96.

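Construct a 98% CI for µBC - µQ

$$\mu_{BC} - \mu_{Q}: \quad (\bar{x}_{BC} - \bar{x}_{Q}) \pm t_{.01;10}\, s \sqrt{\frac{1}{n_{BC}} + \frac{1}{n_{Q}}}$$

$$= (270.33 - 168.67) \pm 2.764\,(33.63)\sqrt{\frac{1}{3} + \frac{1}{3}}$$

$$= 101.66 \pm 75.90$$

$$25.76 \le \mu_{BC} - \mu_{Q} \le 177.56$$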
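The 98% interval for µBC - µQ can be reproduced numerically. This is a minimal sketch using only the summary statistics from the one-way ANOVA output; the t value 2.764 is taken from a t table (as in the solution) rather than computed, and the variable names are illustrative.

```python
import math

# Summary statistics from the one-way ANOVA output.
mean_bc, mean_q = 270.33, 168.67
n_bc, n_q = 3, 3
s = 33.63        # pooled standard deviation (S in the output)
t_crit = 2.764   # t(.01; 10) from a t table, as used in the solution

diff = mean_bc - mean_q
half_width = t_crit * s * math.sqrt(1 / n_bc + 1 / n_q)
lower, upper = diff - half_width, diff + half_width

print(f"{diff:.2f} +/- {half_width:.2f} -> ({lower:.2f}, {upper:.2f})")
```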
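Construct a Bonferroni CI for µQuebec with a family error rate of 10%.

With α = 0.10 and k = 5 intervals (one per region), each tail has area α/2k = 0.01. The error degrees of freedom are nT − p = 15 − 5 = 10.

$$\mu_{Q} = 168.67 \pm t_{\alpha/2k;\; n_T - p} \sqrt{\frac{MSE}{3}}$$

$$= 168.67 \pm t_{.01;10} \sqrt{\frac{1131}{3}}$$

$$= 168.67 \pm 2.764\sqrt{377}$$

$$= 168.67 \pm 53.67$$

The Bonferroni CI with family confidence = 90% is $115 \le \mu_{Q} \le 222.34$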
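The Bonferroni calculation above can be sketched in Python as follows. The value k = 5 is an assumption inferred from the t.01 used in the solution (one interval per region), and the t value is again read from a table rather than computed.

```python
import math

alpha_family = 0.10   # family error rate
k = 5                 # number of intervals (assumed: one per region)
n_total, p = 15, 5    # total observations and number of groups
mse, n_q = 1131, 3    # MSE from the ANOVA table, Quebec sample size
mean_q = 168.67

tail_area = alpha_family / (2 * k)   # tail area for each individual interval
df_error = n_total - p               # error degrees of freedom
t_crit = 2.764                       # t(.01; 10) from a t table

half_width = t_crit * math.sqrt(mse / n_q)
lower, upper = mean_q - half_width, mean_q + half_width

print(f"tail area = {tail_area:.2f}, df = {df_error}")
print(f"{mean_q} +/- {half_width:.2f} -> ({lower:.2f}, {upper:.2f})")
```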


Question 3

The following table shows an extract of average monthly exchange rates from US to Canadian dollars
from Jan 2006 to December 2009:
[Figure: Time Series Plot of USD_CDN. x-axis: Index (1 to 45); y-axis: USD_CDN (0.95 to 1.30).]

[Figure: Trend Analysis Plot for USD_CDN, actual values and fits. x-axis: Index (1 to 45); y-axis: USD_CDN (0.95 to 1.30).]

Linear Trend Model
Yt = 1.1095 - 0.000214*t

Accuracy Measures
MAPE 5.77722
MAD 0.06345
MSD 0.00581

Comment on trend.

Very little trend. The slope is -0.000214 per month, which is negligible.

Comment on stationarity with regard to the mean.

Since the trend is negligible, we can consider the series to be stationary.

Seasonal Indices

Period Index
1 1.03452
2 1.03519
3 1.03885
4 1.01790
5 0.98301
6 0.97400
7 0.97778
8 0.97726
9 0.97427
10 0.98507
11 0.99257
12 1.00959
Accuracy Measures

MAPE 5.30149
MAD 0.05773
MSD 0.00507

Estimate forecasts for Jan 2010 and June 2010 using trend and seasonal effects only. Assume t =
1 in January 2006.

In Jan 2010, t = 49. Therefore, we are calculating forecast values for t = 49 (Jan 2010) and t
= 54 (June 2010)

Jan 2010: t = 49 => T49 = 1.1095 - .000214(49) = 1.099014


S1 = 1.03452

F49 = 1.099014 x 1.03452 = 1.13695

June 2010: t = 54 => T54 = 1.1095 - .000214(54) = 1.097944


S6 = 0.974

F54 = 1.097944 x 0.974 = 1.0694

Calculate a deseasonalized exchange rate for April 2007

April 2007 corresponds to t = 16.

Deseasonalized value for April 2007 = Y16/S4 = 1.13425/1.0179 = 1.1143
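The forecasts and the deseasonalized value above all use the same two ingredients: the fitted trend line and the seasonal index for the month. A minimal Python sketch, assuming the multiplicative form used in the solution (forecast = trend x seasonal index); the function names are illustrative:

```python
# Seasonal indices from the Minitab output (period 1 = January).
seasonal = [1.03452, 1.03519, 1.03885, 1.01790, 0.98301, 0.97400,
            0.97778, 0.97726, 0.97427, 0.98507, 0.99257, 1.00959]

def trend(t):
    """Fitted linear trend Yt = 1.1095 - 0.000214*t."""
    return 1.1095 - 0.000214 * t

def seasonal_index(t):
    """Seasonal index for time t, with t = 1 in January 2006."""
    return seasonal[(t - 1) % 12]

def forecast(t):
    """Forecast from trend and seasonal effects only."""
    return trend(t) * seasonal_index(t)

def deseasonalize(y, t):
    """Remove the seasonal effect from an observed value y at time t."""
    return y / seasonal_index(t)

print(f"Jan 2010  (t = 49): {forecast(49):.5f}")
print(f"June 2010 (t = 54): {forecast(54):.4f}")
print(f"April 2007 deseasonalized: {deseasonalize(1.13425, 16):.4f}")
```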

The ACF and PACF graphs for the USD -> CDN exchange rates are shown below:

[Figure: Autocorrelation Function for USD_CDN and Partial Autocorrelation Function for USD_CDN, each with 5% significance limits. x-axis: Lag (1 to 35); y-axis: correlation (-1.0 to 1.0).]
Comment on seasonality in the data.

Not much evidence of seasonal variation. (Because there are no spikes at seasonal lags)

Specify a potential ARIMA model.

Exponential decay in ACF

Significant spikes in PACF at lags 1 and 2. The spike at lag 5 is probably due to a random
shock since it is not at a seasonal lag and there is no obvious reason to expect that lag 5
would exert a major influence on the exchange rate data.

Therefore we suggest an ARIMA(2,0,0) model.

The ARIMA(2,0,0) model is shown below:

Final Estimates of Parameters

Type Coef SE Coef T P


AR 1 1.2983 0.1372 9.47 0.000
AR 2 -0.3988 0.1373 -2.90 0.006
Constant 0.111048 0.003992 27.82 0.000
Mean 1.10474 0.03971

Number of observations: 48
Residuals: SS = 0.0342418 (backforecasts excluded)
MS = 0.0007609 DF = 45

Write out the theoretical model.

$$Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t$$

Write out the estimated model.

$$\hat{Y}_t = 0.111048 + 1.2983\,Y_{t-1} - 0.3988\,Y_{t-2}$$


The last four months of data appear as follows:

Sep 2009 1.08176190 0.92441784


Oct 2009 1.05485238 0.94799995
Nov 2009 1.05957500 0.94377463
Dec 2009 1.05440000 0.94840668

Use this model to forecast values for Jan 2010.

Jan 2010: Y49 = .111048 + 1.2983Y48 - .3988Y47


= .111048 + 1.2983(1.0544) - .3988(1.059575)
= 1.057417
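The one-step-ahead forecast is just the estimated AR(2) equation applied to the two most recent observations. A minimal sketch, with illustrative names, using the first data column for November and December 2009:

```python
# Estimated ARIMA(2,0,0) coefficients from the Minitab output.
CONSTANT, PHI1, PHI2 = 0.111048, 1.2983, -0.3988

def ar2_forecast(y_prev1, y_prev2):
    """One-step-ahead forecast from Y(t-1) and Y(t-2)."""
    return CONSTANT + PHI1 * y_prev1 + PHI2 * y_prev2

nov_2009, dec_2009 = 1.059575, 1.0544   # Y47 and Y48
jan_2010 = ar2_forecast(dec_2009, nov_2009)

print(f"Forecast for Jan 2010 (t = 49): {jan_2010:.6f}")
```

Note the argument order: the most recent value (December) pairs with the lag 1 coefficient, and November with the lag 2 coefficient.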

Question 4

Consider the ARIMA model ARIMA(2,1,1)

a) Express this model using the Backshift Operator.

$$(1 - B)(1 - \phi_1 B - \phi_2 B^2)\,Y_t = \phi_0 + (1 - \theta_1 B)\,e_t$$

b) Consider the model ARIMA(1,0,0)(0,1,0)4.

1) Express this in backshift notation.

$$(1 - B^4)(1 - \phi_1 B)\,Y_t = \phi_0 + e_t$$

2) Express this in a form to forecast Yt based on lagged values of Yt.

$$Y_t = \phi_0 + \phi_1 Y_{t-1} + Y_{t-4} - \phi_1 Y_{t-5} + e_t$$
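The forecasting form in part 2 comes from multiplying out the two backshift polynomials and moving the lagged terms to the right-hand side. The sketch below checks that expansion numerically; `poly_mul` is a hypothetical helper written for this illustration, and the value phi1 = 0.5 is arbitrary.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists
    (index i holds the coefficient of B**i)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

phi1 = 0.5  # hypothetical value, for illustration only

seasonal_diff = [1, 0, 0, 0, -1]   # (1 - B^4)
ar1 = [1, -phi1]                   # (1 - phi1*B)

# Left-hand side operator: lhs[i] is the coefficient on Y(t-i).
lhs = poly_mul(seasonal_diff, ar1)

# Moving lags 1..5 to the right-hand side flips their signs.
rhs_lag_coefs = {lag: -c for lag, c in enumerate(lhs) if lag > 0 and c != 0}
print(rhs_lag_coefs)
```

The nonzero lags are 1, 4 and 5, with coefficients phi1, 1 and -phi1, matching the forecasting equation.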


Question 5

The following annual time series data is to be analyzed to develop a suitable forecasting model.

t Y
1 8.7776
2 21.2374
3 13.9845
4 20.3498
5 13.2213
6 21.9456
7 16.8978
8 18.6708
9 15.9082
10 21.0304
11 22.6019
12 22.3881
13 20.2390
14 26.3421
15 26.7765
16 25.9876
17 21.0126
18 22.5638
19 28.7220
20 27.7834

The time series plot follows:

[Figure: Time Series Plot of Y. x-axis: Index (2 to 20); y-axis: Y (10 to 30).]
Is the data set stationary?

No. The series trends upward over time, so its mean is not constant.

How would you make it stationary?

Take a first difference D1 = Yt – Yt-1
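Taking a first difference is a one-line operation on the data. The sketch below applies it to the 20 annual values from the table; the list name `y` and the printed summary are illustrative.

```python
# Annual series Y for t = 1..20, copied from the table.
y = [8.7776, 21.2374, 13.9845, 20.3498, 13.2213, 21.9456, 16.8978,
     18.6708, 15.9082, 21.0304, 22.6019, 22.3881, 20.2390, 26.3421,
     26.7765, 25.9876, 21.0126, 22.5638, 28.7220, 27.7834]

# First difference: D1(t) = Y(t) - Y(t-1), defined for t = 2..20.
d1 = [curr - prev for prev, curr in zip(y, y[1:])]

# The differences telescope, so their mean is (Y20 - Y1) / 19.
mean_d1 = sum(d1) / len(d1)

print(f"{len(d1)} differences, first = {d1[0]:.4f}, mean = {mean_d1:.4f}")
```

Differencing loses one observation, which is why the ARIMA output reports 19 observations after differencing.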

The time series plot of the first difference D1 is as follows:

[Figure: Time Series Plot of D1. x-axis: Index (2 to 20); y-axis: D1 (-10 to 15).]

Explain why this time series appears to be stationary with regard to the mean.

The differenced series shows no trend: it fluctuates around a roughly constant level.

A simple linear regression of Y on time (t) appears as follows:

Regression Analysis: Y versus t

The regression equation is
Y = 13.7 + 0.680 t

Predictor Coef SE Coef T P


Constant 13.681 1.550 8.83 0.000
t 0.6801 0.1294 5.26 0.000

S = 3.33663 R-Sq = 60.6% R-Sq(adj) = 58.4%

Analysis of Variance
Source DF SS MS F P
Regression 1 307.58 307.58 27.63 0.000
Residual Error 18 200.40 11.13
Total 19 507.98

Unusual Observations

Obs t Y Fit SE Fit Residual St Resid


2 2.0 21.237 15.041 1.329 6.196 2.02R

R denotes an observation with a large standardized residual.

Durbin-Watson statistic = 2.65415
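The key numbers in this regression output can be reproduced from the data table with ordinary least squares formulas. The following is a minimal sketch in plain Python (no statistics library), with illustrative variable names; it also computes the Durbin-Watson statistic from the residuals.

```python
# Annual series Y for t = 1..20, copied from the table.
y = [8.7776, 21.2374, 13.9845, 20.3498, 13.2213, 21.9456, 16.8978,
     18.6708, 15.9082, 21.0304, 22.6019, 22.3881, 20.2390, 26.3421,
     26.7765, 25.9876, 21.0126, 22.5638, 28.7220, 27.7834]
n = len(y)
t = list(range(1, n + 1))

t_bar = sum(t) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
sxx = sum((ti - t_bar) ** 2 for ti in t)
sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
slope = sxy / sxx
intercept = y_bar - slope * t_bar

# Residuals, error sum of squares and mean square error.
residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
sse = sum(e ** 2 for e in residuals)
mse = sse / (n - 2)

# Durbin-Watson: squared successive residual differences over SSE.
dw = sum((e1 - e0) ** 2 for e0, e1 in zip(residuals, residuals[1:])) / sse

print(f"Y = {intercept:.3f} + {slope:.4f} t")
print(f"SSE = {sse:.2f}, MSE = {mse:.2f}, DW = {dw:.5f}")
```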

[Figure: Residual Plots for Y (four-in-one): Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).]

Comment on the assumptions of the regression model. If you think any of the assumptions are
not satisfied, explain your reasoning.

The four-in-one plot shows that the assumptions of normality and homoscedasticity appear
to be satisfied. However, the “versus order” plot appears as if it may have a pattern
indicating first-order autocorrelation. We check this suspicion by examining the
Durbin-Watson statistic of 2.65415.

Looking up the critical values of the DW statistic for n = 20 and k = 1 we get DL,.05 = 1.20 and
DU,.05 = 1.41. Since DW = 2.65415 is greater than 2, we must look at the upper-tail values
4 – 1.41 = 2.59 and 4 – 1.20 = 2.80. Since DW lies between 2.59 and 2.80 it is in the inconclusive
region; therefore we cannot make any claim about the presence or absence of first-order negative
autocorrelation.
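The Durbin-Watson decision rule used above can be written as a small function. This is a sketch under the usual table-based convention; the function name and the wording of the returned labels are illustrative, and the treatment of values exactly on a boundary is an arbitrary choice.

```python
def durbin_watson_decision(dw, d_lower, d_upper):
    """Classify a Durbin-Watson statistic using table bounds dL and dU."""
    if dw < d_lower:
        return "positive autocorrelation"
    if dw < d_upper:
        return "inconclusive (positive side)"
    if dw <= 4 - d_upper:
        return "no evidence of first-order autocorrelation"
    if dw <= 4 - d_lower:
        return "inconclusive (negative side)"
    return "negative autocorrelation"

# Values from the regression output and the DW table (n = 20, k = 1).
result = durbin_watson_decision(2.65415, d_lower=1.20, d_upper=1.41)
print(result)
```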
Thus, the simple regression model above may be an acceptable forecasting model. We note
that the adjusted R2 = .584 and MSE = 11.13.

Let us now investigate some ARIMA models. Output for three models is shown. Discuss the
models and recommend one, justifying your conclusion.

The ACF and PACF functions for D1 are shown below:

[Figure: Autocorrelation Function for D1 and Partial Autocorrelation Function for D1, each with 5% significance limits. x-axis: Lag (1 to 6); y-axis: correlation (-1.0 to 1.0).]

We see that these graphs have a significant spike at lag 1, so an ARIMA(1,1,0) or
ARIMA(0,1,1) would be plausible models. Let’s investigate further.

ARIMA(1,1,0)

Final Estimates of Parameters

Type Coef SE Coef T P


AR 1 -0.8386 0.1462 -5.74 0.000
Constant 1.3981 0.8380 1.67 0.114

Differencing: 1 regular difference


Number of observations: Original series 20, after differencing 19
Residuals: SS = 226.610 (backforecasts excluded)
MS = 13.330 DF = 17

This model has a significant coefficient for the lag 1 variable Yt-1 (p = 0.000), but its MS value
of 13.33 is higher than the 11.13 of the SLR model.

ARIMA(0,1,1)

Type Coef SE Coef T P


MA 1 0.9458 0.2839 3.33 0.004
Constant 0.7029 0.1413 4.97 0.000

Differencing: 1 regular difference


Number of observations: Original series 20, after differencing 19
Residuals: SS = 178.990 (backforecasts excluded)
MS = 10.529 DF = 17
This model also has a significant lag 1 variable et-1 (p = 0.004), and the MS value has been
reduced to 10.529, better than the SLR model value of 11.13.

Finally, we will look at the mixed model ARIMA(1,1,1):

Final Estimates of Parameters

Type Coef SE Coef T P


AR 1 -0.4981 0.2103 -2.37 0.031
MA 1 0.9302 0.2233 4.17 0.001
Constant 0.98249 0.08066 12.18 0.000

Differencing: 1 regular difference


Number of observations: Original series 20, after differencing 19
Residuals: SS = 141.804 (backforecasts excluded)
MS = 8.863 DF = 16

Here we see that both the AR(1) term and the MA(1) term are significant, and the MS value has
dropped substantially to 8.863, the lowest of the models considered.

This is the model I recommend!
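The comparison above rests on the residual mean square, MS = SS / DF, for each candidate. A short Python sketch of that ranking, using the residual SS and DF from each output (the dictionary layout is illustrative):

```python
# (residual SS, residual DF) from each model's output.
models = {
    "SLR": (200.40, 18),
    "ARIMA(1,1,0)": (226.610, 17),
    "ARIMA(0,1,1)": (178.990, 17),
    "ARIMA(1,1,1)": (141.804, 16),
}

# Residual mean square for each model.
ms = {name: ss / df for name, (ss, df) in models.items()}
best = min(ms, key=ms.get)

for name, value in sorted(ms.items(), key=lambda kv: kv[1]):
    print(f"{name:13s} MS = {value:.3f}")
print("Smallest MS:", best)
```

This follows the comparison made in the text. Note that the SLR model is fitted to the original 20 observations while the ARIMA models are fitted to the 19 differenced values, so the ranking is a rough guide rather than a formal test.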
