
Statistics 372 Winter 2014 Midterm Solution

out of 42

1. Briefly answer each of the following unrelated questions.
a) State the three principles of statistical thinking [3 marks]

All work occurs in a system of interconnected processes
Variation exists in all processes
Understanding and reducing variation are the keys to success

b) Suppose that the behaviour of a process output can be described by a random variable $Y = X_1 + X_2$, where
$X_1$ and $X_2$ are independent random variables describing the behaviour of two inputs. For example, Y may
be the total time to complete two tasks which occur sequentially. We can describe the variation in the output
by its standard deviation, denoted $\sigma(Y)$. [5 marks]
i. Express $\sigma(Y)$ in terms of $\sigma(X_1)$ and $\sigma(X_2)$.
ii. Suppose $\sigma(X_1)$ is 30% of $\sigma(Y)$. What is the percentage reduction in $\sigma(Y)$ if we could hold $X_1$ fixed
and eliminate its contribution to the variation in the output?
iii. What are the consequences of the result in b) if you are trying to reduce variation in an output?

i) $\sigma(Y) = \sqrt{\sigma(X_1)^2 + \sigma(X_2)^2}$

ii) Set $\sigma(X_1) = 0.3\,\sigma(Y)$. Then $\sigma(Y) = \sqrt{0.09\,\sigma(Y)^2 + \sigma(X_2)^2}$. So $\sigma(X_2) = \sqrt{0.91}\,\sigma(Y)$.
If we could eliminate variation in $X_1$ the variation in Y would equal $\sigma(X_2)$, so it would be roughly 95%
($100 \times \sqrt{0.91}$) of what it was originally. Thus the maximum percentage reduction in $\sigma(Y)$ is 5%.
iii) Holding $X_1$ fixed would not help much. To reduce variation in Y substantially we need to focus on inputs
that have a large impact (like $X_2$ in the example).
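
The arithmetic in part ii) can be checked with a short script. This is only a sketch: the function names are illustrative, and it assumes, as the question does, that the inputs are independent so that variances add.

```python
import math

def remaining_sd_fraction(share):
    """Fraction of the output standard deviation left after removing an
    independent input whose standard deviation is `share` times that of Y."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    # Variances add for independent inputs, so the remaining variance
    # is (1 - share^2) times the original variance.
    return math.sqrt(1.0 - share ** 2)

def percent_reduction(share):
    """Percentage reduction in the output standard deviation."""
    return 100.0 * (1.0 - remaining_sd_fraction(share))

# The 30% case from the question, plus two larger shares for comparison.
for share in (0.3, 0.5, 0.9):
    print(f"input sd = {share:.0%} of output sd: "
          f"reduction of {percent_reduction(share):.1f}%")
```

Trying larger shares shows why part iii) recommends focusing on dominant inputs: the reduction grows slowly until the input accounts for most of the output variation.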

c) Suppose on a control chart we set the control limits at 3 sigma limits. Denote the probability of a false
alarm (i.e. a signal occurs on the chart even though the process remains stable) as p. [3 marks]
i. Find p when we assume the plotted statistic follows a Gaussian distribution with known mean and
standard deviation.
ii. Assuming a stable process and independence over time find the probability, in terms of p, that there are
no signals in 30 plotted statistics.

i) The process is assumed stable so assume the plotted statistic, denoted Y, follows $G(\mu, \sigma)$.
Then $p = \Pr(Y > \mu + 3\sigma) + \Pr(Y < \mu - 3\sigma) = \Pr(Z > 3) + \Pr(Z < -3)$, where
$Z \sim G(0, 1)$ and $Z = (Y - \mu)/\sigma$.
Using the mini standardized Gaussian table on the midterm cover,
$p = (1 - 0.99865) + (1 - 0.99865) = 0.0027$
ii) Since the chance of a false alarm is p, the chance of no signal for any time period is $1 - p$. Assuming
independence over time the chance of no signals in 30 time periods is $(1 - p)^{30}$. [Numerically,
$0.9973^{30} \approx 0.92$.]
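
Both quantities can be computed without a table. The sketch below uses the identity Pr(|Z| > k) = erfc(k/√2) for a standard Gaussian Z. The function names are illustrative, not part of the course material.

```python
import math

def false_alarm_probability(k=3.0):
    """Pr(|Z| > k) for a standard Gaussian Z, i.e. the chance that a single
    plotted point falls outside k-sigma limits when the process is stable."""
    return math.erfc(k / math.sqrt(2.0))

def prob_no_signal(p, n):
    """Chance of no signal in n independent plotted points."""
    return (1.0 - p) ** n

p = false_alarm_probability(3.0)
print(f"p = {p:.4f}")
print(f"Pr(no signal in 30 points) = {prob_no_signal(p, 30):.3f}")
```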


2. Unpaid Invoices Case
a) Camille, the manager of the accounts payable department at a large firm, is worried that last month over 5%
of invoices were unpaid. An individuals (X) control chart for the monthly percent unpaid for the most recent
23 months is given below. [This question was adapted from the excellent book Fourth Generation
Management by Brian Joiner.]


Based on these results Camille is considering the following five options:
i. Look at each invoice that was not paid and find out who worked on that invoice. Have the employee
involved go through further training.
ii. Figure out what was different last month compared to other months. Were there more invoices? Were
new or different services being paid for the first time? Were there new employees in the department?
iii. Dig up all the invoices that had to be reprocessed in the past few months and categorize the causes of
the problems. Look for patterns.
iv. For several weeks, have people working on each major step in the process keep track of how many
errors, and what kinds of errors, occur in their steps.
v. Change the accounts payable software.
In your answer be sure to comment on the applicability of each of the options. The table below should
summarize your views. [5 marks]
Option  Applicable?
i       no (tampering)
ii      no (special cause reaction)
iii     yes
iv      yes
v       no, possible solution but premature
The control chart suggests the accounts payable process is stable; that is, there is no evidence of special causes.
This suggests the results for the last month are not really unexpected. As such, her best options are iii and iv, as
these both represent common cause strategies. Option i is tampering: Camille needs to look for patterns across
all the data. Training people would be appropriate only if she found systematic differences between people who
were trained differently. Option ii is a special cause reaction; it assumes something was different or important
about the last data point, which the control chart shows is not the case. Option v is a potential reaction but is
premature. Camille's available data do not show that the software is the cause of the problem.

2. b) As an alternative to the X chart shown above, Camille could have plotted the same data using either an
exponentially weighted moving average (EWMA) chart or a p-chart of the proportion unpaid in each
month. Comment on these two alternatives relative to the X chart. [4 marks]

EWMA alternative:
An EWMA would be useful if you hope to detect sustained shifts in the average percent unpaid. If the shift is
sustained the EWMA would more quickly detect small changes than the X chart.
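
As a rough illustration of why an EWMA reacts to sustained shifts, the sketch below applies the usual recursion z_t = λ x_t + (1 − λ) z_{t−1}. The smoothing constant λ = 0.2 and the monthly values are assumptions chosen for illustration and are not Camille's data. The starting value is set to the centre line reported on the X chart.

```python
def ewma(values, lam, start):
    """Exponentially weighted moving average:
    z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = start."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1]")
    z = start
    smoothed = []
    for x in values:
        z = lam * x + (1.0 - lam) * z
        smoothed.append(z)
    return smoothed

# Hypothetical percent-unpaid values: stable near 2.9, then a sustained shift up.
hypothetical = [2.9, 3.1, 2.7, 2.8, 3.9, 4.1, 3.8, 4.0, 4.2]
for month, z in enumerate(ewma(hypothetical, lam=0.2, start=2.896), start=1):
    print(month, round(z, 3))
```

Because each plotted value pools information from earlier months, a small but persistent shift accumulates in the EWMA, whereas each point on the X chart is judged on its own.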

p-chart alternative:
To apply a p-chart for the proportion of unpaid invoices we need to assume a binomial distribution. For this to
work well the following 3 assumptions must be appropriate: each invoice is either unpaid or paid, the unpaid
rate is constant over time, and whether one invoice is unpaid is independent of all the others. In this context
these assumptions do not seem unreasonable.
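
Under the binomial assumption, 3 sigma p-chart limits take the standard form p̄ ± 3·sqrt(p̄(1 − p̄)/n). The sketch below evaluates them. The number of invoices per month, n, is not given in the question, so the value used here is purely hypothetical. The average proportion is taken from the centre line of the X chart.

```python
import math

def p_chart_limits(p_bar, n):
    """3 sigma control limits for a p-chart with average proportion p_bar
    and n items per subgroup, truncated to the interval [0, 1]."""
    half_width = 3.0 * math.sqrt(p_bar * (1.0 - p_bar) / n)
    return max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)

# p_bar from the chart centre line (2.896%); n = 1000 is an assumed value.
lcl, ucl = p_chart_limits(0.02896, 1000)
print(f"LCL = {100 * lcl:.2f}%, UCL = {100 * ucl:.2f}%")
```

Unlike the X chart, these limits depend on the monthly invoice count, so they narrow as n grows.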
[Figure: individuals (X) control chart of Percent Unpaid by Month (months 1 to 23); centre line $\bar{X} = 2.896$, UCL = 5.398, LCL = 0.393]
3. The second order moving average model, MA(2), is given by
$Y_t = \mu + A_t + \theta_1 A_{t-1} + \theta_2 A_{t-2}$, where $A_t, A_{t-1}, A_{t-2}, \ldots$
are independent random variables with $A_t \sim G(0, \sigma)$ for all t, and $\mu$, $\theta_1$, $\theta_2$ and $\sigma$ are
parameters.
a) For the MA(2) model derive [7 marks]
i) $E(Y_t)$
$E(Y_t) = E(\mu + A_t + \theta_1 A_{t-1} + \theta_2 A_{t-2}) = \mu$

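ii) $\mathrm{Var}(Y_t)$
$\mathrm{Var}(Y_t) = \mathrm{Var}(\mu + A_t + \theta_1 A_{t-1} + \theta_2 A_{t-2}) = \mathrm{Var}(A_t) + \theta_1^2 \mathrm{Var}(A_{t-1}) + \theta_2^2 \mathrm{Var}(A_{t-2})$
$= \sigma^2 (1 + \theta_1^2 + \theta_2^2)$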
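iii) $\mathrm{Cov}(Y_t, Y_{t-k})$ for k = 1, 2, ...
$\mathrm{Cov}(Y_t, Y_{t-1}) = \mathrm{Cov}(\mu + A_t + \theta_1 A_{t-1} + \theta_2 A_{t-2},\ \mu + A_{t-1} + \theta_1 A_{t-2} + \theta_2 A_{t-3})$
$= \theta_1 \mathrm{Var}(A_{t-1}) + \theta_1 \theta_2 \mathrm{Var}(A_{t-2}) = \sigma^2 (\theta_1 + \theta_1 \theta_2)$
$\mathrm{Cov}(Y_t, Y_{t-2}) = \mathrm{Cov}(\mu + A_t + \theta_1 A_{t-1} + \theta_2 A_{t-2},\ \mu + A_{t-2} + \theta_1 A_{t-3} + \theta_2 A_{t-4})$
$= \theta_2 \mathrm{Var}(A_{t-2}) = \theta_2 \sigma^2$
The covariances for larger k are zero.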
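iv) autocorrelation $\rho_k$, for k = 1, 2, ...
$\rho_k = \dfrac{\mathrm{Cov}(Y_t, Y_{t-k})}{\sqrt{\mathrm{Var}(Y_t)\,\mathrm{Var}(Y_{t-k})}} = \begin{cases} \dfrac{\theta_1 + \theta_1 \theta_2}{1 + \theta_1^2 + \theta_2^2} & \text{if } k = 1 \\ \dfrac{\theta_2}{1 + \theta_1^2 + \theta_2^2} & \text{if } k = 2 \\ 0 & \text{if } k \ge 3 \end{cases}$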
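
The derivation in part a) can be cross-checked numerically. The sketch below computes the autocovariances of a moving average model directly from its weights (1, θ1, θ2), using the fact that independent terms contribute only when the same noise variable appears in both Y_t and Y_{t−k}. The function name and the parameter values θ1 = 0.5, θ2 = 0.25 are assumptions for illustration.

```python
def ma2_acf(theta1, theta2, max_lag=4):
    """Autocorrelations rho_0..rho_max_lag of an MA(2) process, computed as
    gamma_k / gamma_0 with gamma_k proportional to sum_j w_j * w_{j+k},
    where w = (1, theta1, theta2). The noise variance cancels in the ratio."""
    w = [1.0, theta1, theta2]
    gamma = [
        # range() is empty when k exceeds the model order, giving gamma_k = 0.
        sum(w[j] * w[j + k] for j in range(len(w) - k))
        for k in range(max_lag + 1)
    ]
    return [g / gamma[0] for g in gamma]

theta1, theta2 = 0.5, 0.25  # hypothetical parameter values
rho = ma2_acf(theta1, theta2)
denominator = 1 + theta1 ** 2 + theta2 ** 2
print("lag 1:", rho[1], "closed form:", (theta1 + theta1 * theta2) / denominator)
print("lag 2:", rho[2], "closed form:", theta2 / denominator)
print("lag 3:", rho[3])
```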



b) Under suitable constraints on the model parameters $\theta_1$ and $\theta_2$ the series of random variables
$Y_1, Y_2, \ldots, Y_t$
defined by the MA(2) model is considered stationary. What does stationary mean in this context?
[2 marks]

Stationary implies the process mean and standard deviation (or variance) are constant and finite over time.

c) In part a) we considered autocorrelations. Another way to measure correlation over time is given by partial
autocorrelations. Explain, in words, what the lag 3 partial autocorrelation measures. [2 marks]

Partial autocorrelation at lag 3 measures the correlation between $y_t$ and $y_{t-3}$ taking into account (or
conditional on) the values $y_{t-1}$ and $y_{t-2}$.
4. In an application it was important to be able to predict the output (y) of a machine. A selection of the last
200 output values are given in the table below. In addition the series of 200 values is plotted along with the
corresponding ACF and PACF plots.

Data



a) Using the above plots clearly justify the choice of an ARIMA(2,0,0) model. [3 marks]

series appears stationary (mean and variance constant over time)
exponential decay in ACF, cut off after lag 2 in PACF

The fit of the ARIMA(2,0,0) from the software R is given below.
Coefficients:
         ar1     ar2  intercept
      0.6164  0.2127    10.9590
s.e.  0.0693  0.0695     0.6638
sigma^2 estimated as 2.73:  log likelihood = -384.75,  aic = 777.5

b) Given the data and the fitted AR(2) model, determine a numerical value for the fifth residual, i.e. $\hat{a}_5$.
[2 marks]
Based on the R output the fitted model is $\hat{y}_t = 10.96 + 0.616\,(y_{t-1} - 10.96) + 0.213\,(y_{t-2} - 10.96)$. So, plugging in
the observed values for $y_4$ and $y_3$ gives the fitted value $\hat{y}_5 = 12.868$. Then the residual
$\hat{a}_5 = y_5 - \hat{y}_5 = 0.2316$
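
The calculation can be reproduced as follows. This sketch uses the rounded coefficients quoted in the solution and the observed values at times 3, 4 and 5 from the data table, so the last digits can differ slightly from a calculation carried out at full precision. The function name is illustrative.

```python
MEAN, AR1, AR2 = 10.96, 0.616, 0.213  # rounded estimates from the R output

def one_step_fit(y_prev1, y_prev2):
    """Fitted value for y_t given y_{t-1} and y_{t-2} under the AR(2) model."""
    return MEAN + AR1 * (y_prev1 - MEAN) + AR2 * (y_prev2 - MEAN)

y = {3: 14.6, 4: 12.8, 5: 13.1}  # observed values from the data table
fitted_5 = one_step_fit(y[4], y[3])
residual_5 = y[5] - fitted_5
print(f"fitted value: {fitted_5:.3f}, residual: {residual_5:.3f}")
```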

c) Describe how you could use the model residuals to check the model assumptions. [2 marks]
We would look at the ACF and PACF plots of the residuals and hope to see no significant values (at least at low
lags). We could also check the Normality assumption using a QQ plot (just like we did for a regression model).
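
A minimal sketch of the autocorrelation check is given below. It computes the sample ACF of a residual series and flags lags whose value falls outside the approximate bounds ±2/√n, a common rule of thumb. The residual series used here is artificial, chosen to show strong autocorrelation, since the 200 residuals are not listed in the question.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]

def significant_lags(x, max_lag):
    """Lags whose sample autocorrelation lies outside +/- 2 / sqrt(n)."""
    bound = 2.0 / math.sqrt(len(x))
    acf = sample_acf(x, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]

# Artificial alternating residuals: clearly not independent over time.
artificial_residuals = [1.0, -1.0] * 10
print(sample_acf(artificial_residuals, 2))
print(significant_lags(artificial_residuals, 2))
```

For an adequate model we would hope the list of flagged lags is empty, or nearly so at low lags.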

d) Given the data and the fitted AR(2) model determine a numerical value for the two step ahead forecast, i.e.
$\hat{y}_{202}$. [3 marks]

Using the fitted model $\hat{y}_{202} = 10.96 + 0.616\,(\hat{y}_{201} - 10.96) + 0.213\,(y_{200} - 10.96)$. In this expression we replace
the not yet observed value $y_{201}$ with a prediction. We need to first determine $\hat{y}_{201}$.
We have $\hat{y}_{201} = 10.96 + 0.616\,(y_{200} - 10.96) + 0.213\,(y_{199} - 10.96) = 10.565$

So, $\hat{y}_{202} = 10.96 + 0.616\,(\hat{y}_{201} - 10.96) + 0.213\,(y_{200} - 10.96) = 10.746$
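
The same recursion extends to any forecast horizon: each forecast is fed back in place of the unobserved value. The sketch below uses the rounded coefficients from the solution and the last two observed values from the data table. The function name and its defaults are illustrative.

```python
def ar2_forecasts(y_last, y_second_last, steps,
                  mean=10.96, ar1=0.616, ar2=0.213):
    """Forecasts 1..steps ahead from an AR(2) model, replacing unobserved
    future values with their own forecasts."""
    prev1, prev2 = y_last, y_second_last
    forecasts = []
    for _ in range(steps):
        nxt = mean + ar1 * (prev1 - mean) + ar2 * (prev2 - mean)
        forecasts.append(nxt)
        # The new forecast becomes the most recent value for the next step.
        prev1, prev2 = nxt, prev1
    return forecasts

# y_200 = 11.1 and y_199 = 8.7 from the data table.
one_step, two_step = ar2_forecasts(11.1, 8.7, steps=2)
print(f"forecast for time 201: {one_step:.3f}")
print(f"forecast for time 202: {two_step:.3f}")
```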
Time    y
1       9.8
2       12.5
3       14.6
4       12.8
5       13.1
...
197     10.6
198     12.3
199     8.7
200     11.1
