
Jul 19, 2015


Mark Scheme


2616/01

June 2004

General Instructions

Some marks in the mark scheme are explicitly designated as M, A, B or E.

M marks (method) are for an attempt to use a correct method (not merely for stating the

method).

A marks (accuracy) are for accurate answers and can only be earned if corresponding M

mark(s) have been earned. Candidates are expected to give answers to a sensible level of

accuracy in the context of the problem in hand. The level of accuracy quoted in the mark

scheme will sometimes deliberately be greater than is required, when this facilitates marking.

B marks (explanation) are for explanation and/or interpretation. These will frequently be

subdividable depending on the thoroughness of the candidate's answer.

Follow-through marking should normally be used wherever possible; there will,

however, be an occasional designation of "c.a.o." for "correct answer only".

Full credit MUST be given when correct alternative methods of solution are used. If errors

occur in such methods, the marks awarded should correspond as nearly as possible to

equivalent work using the method in the mark scheme.

All queries about the marking should have been resolved at the standardising meeting.

Assistant Examiners should telephone the Principal Examiner (or Team Leader if

appropriate) if further queries arise during the marking.

Assistant Examiners may find it helpful to use shorthand symbols as follows:

FT

Follow-through marking

Correct work after error

Incorrect work after error

BOD

Benefit of doubt

NOS

Work of no value

Q1


(i)

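We want $\mu = E[aX + bY]$ M1

$= a\mu + b \cdot 2\mu$ 1

$\therefore 2b = 1 - a$, i.e. $b = \tfrac{1}{2}(1 - a)$

Then $\operatorname{Var}(T) = a^2\sigma^2 + \{\tfrac{1}{2}(1 - a)\}^2 \cdot 4\sigma^2$

$= \sigma^2\{a^2 + (1 - a)^2\}$

$= \sigma^2\{2a^2 - 2a + 1\}$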

(ii)

Consider $\dfrac{d}{da}(2a^2 - 2a + 1) = 0$

(42)

M1 Substitution of $b = \tfrac{1}{2}(1 - a)$ required

M1

i.e. $0 = 4a - 2$ 1

$\therefore a = \tfrac{1}{2}$ 1 Beware printed answer

Verification that this is a minimum (e.g. trivially by $\dfrac{d^2}{da^2}$)

$T = \tfrac{1}{2}X + \tfrac{1}{4}Y \sim N(\mu, \tfrac{1}{2}\sigma^2)$

award 1 if any two are correct

[Both $X$ and $\tfrac{1}{2}Y$ are unbiased (u.b.) for $\mu$ and both are Normally distributed, all of which

( )

E2

(iii)

$t = 7.48$ B1 FT if wrong

One-sided CI is given by

$7.48 - 1.645 \times \sqrt{\tfrac{1}{2} \times \tfrac{3}{10}}$ M1 M1 B1 M1

M1 (use of $\tfrac{1}{2}\sigma^2$ as Var(T))

$= 7.48 - 0.63(71)$

$= 6.84(29)$ A1 C.A.O.
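The lower bound in part (iii) can be checked numerically. The following is a minimal sketch, not part of the mark scheme: it assumes the known variance is 3 and the sample size is 10, as the fraction under the square root in the working suggests, and the function name is purely illustrative.

```python
import math

def lower_confidence_bound(estimate, variance, z):
    """One-sided lower bound: estimate - z * sqrt(variance)."""
    return estimate - z * math.sqrt(variance)

sigma_sq = 3    # assumed known variance (read off the working, not stated explicitly)
n = 10          # assumed sample size (read off the working)
t_obs = 7.48    # observed value of the estimator

# Var(T) = sigma^2 / 2 for one observation, so the mean of n values has variance sigma^2 / (2n)
var_estimate = 0.5 * sigma_sq / n
bound = lower_confidence_bound(t_obs, var_estimate, 1.645)
print(round(bound, 4))
```

Because the variance is treated as known, the Normal percentage point 1.645 is used rather than a t value.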

Q2

(i)

      A:  237  249  213  233  227  236
      B:  203  222  214  216  230

Ranks are

      A:  10  11   2   8   6   9
      B:   1   5   3   4   7

M1 for attempt

A1 if all correct

(Mann-Whitney is 5) 1

Refer to tables of Wilcoxon rank sum (or Mann-Whitney) statistics.

Lower 2½% tail is needed. 1

Value for (5, 6) is 18 (or 3 for Mann-Whitney).

Result is not significant. 1

Seems medians are the same. 1
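As a check on the ranking in part (i), here is a short sketch that computes the rank sum and the Mann-Whitney form for sample B. The helper name is hypothetical and the code assumes there are no tied observations, which holds for these data.

```python
def rank_sum(sample, other):
    """Sum of the ranks of `sample` within the combined data (assumes no ties)."""
    combined = sorted(sample + other)
    return sum(combined.index(value) + 1 for value in sample)

a = [237, 249, 213, 233, 227, 236]
b = [203, 222, 214, 216, 230]

w_b = rank_sum(b, a)                      # Wilcoxon rank sum for the smaller sample
u_b = w_b - len(b) * (len(b) + 1) // 2    # Mann-Whitney form of the same statistic
print(w_b, u_b)
```

The rank sum exceeds the tabulated lower-tail value of 18 for sample sizes (5, 6), which is why the result is not significant.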

(ii)

$n_1 = 6$, $n_2 = 5$

$\bar{x} = 232.5$, $s_{n-1}^2 = 143.1$ ($s_{n-1} = 11.9624$) [$s_n^2 = 119.25$, $s_n = 10.9202$]

$\bar{y} = 217.0$, $s_{n-1}^2 = 100.0$ ($s_{n-1} = 10.0$) [$s_n^2 = 80.0$, $s_n = 8.9443$]

Pooled $s^2 = \dfrac{5 \times 143.1 + 4 \times 100.0}{9} = 123.94$ A1 if correct [$\sqrt{123.94} = 11.1330$]

Test statistic is

$\dfrac{232.5 - 217.0\ (-0)}{\sqrt{123.94}\,\sqrt{\tfrac{1}{6} + \tfrac{1}{5}}} = \dfrac{15.5}{6.7414} = 2.29(92)$ M1 A1 FT reasonable attempt

Refer to $t_9$. 1 May be awarded even if test statistic is wrong. No FT if wrong.

Double-tailed 5% point is 2.262. 1 No FT if wrong

Significant, seems means differ. 1
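The pooled-variance calculation in part (ii) can be reproduced from the raw data. This is an illustrative sketch using only the standard library; the function name is an assumption, and `statistics.variance` uses the divisor n - 1.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Return (pooled variance, two-sample t statistic) assuming equal variances."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t_stat = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return pooled, t_stat

a = [237, 249, 213, 233, 227, 236]
b = [203, 222, 214, 216, 230]

pooled_var, t_stat = pooled_t(a, b)
print(round(pooled_var, 2), round(t_stat, 4))
```

The statistic is then compared with the t distribution on 6 + 5 - 2 = 9 degrees of freedom.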

(iii)

If the assumptions for the t test hold, it is the better procedure (more sensitive/powerful), E2

but if not it might be seriously misleading and the non-parametric procedure safer. E2 4

Q3

(i)

$H_1: \mu_D < 0$ (or $\mu_{\text{AFTER}} < \mu_{\text{BEFORE}}$)

1

1

[NOTE: candidate might of course define D as before − after; take care that H1 agrees]

of differences

1.

Differences are [as after − before; candidate might use before − after]

6

19

13

31

22

44

$\bar{d} = -12.4$, $s_{n-1} = 17.621$ ($s_{n-1}^2 = 310.49$)

11

14

A1 Accept sn = 16.716(5)

Test statistic is

$\dfrac{-12.4 - 0}{17.621/\sqrt{10}} = -2.22(535)$ A1

Refer to $t_9$

Lower single-tailed 5% point is $-1.833$ 1 Sign must agree with H1/test statistic, unless a clear argument based on modulus is used. No FT if wrong.

Significant

(ii)

14

CI is given by

$-12.4 \pm 2.262 \times \dfrac{17.621}{\sqrt{10}} = -12.4 \pm 12.60(4) = (-25.00(4),\ 0.20(4))$ M1 B1 M1 A1 c.a.o.

Zero out of 4 if not same distribution as for test. Same wrong distribution can score max M1 B0 M1 A0. Recovery to $t_9$ is OK.
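Both the paired test statistic and the two-sided interval depend only on the summary values quoted above. A small sketch follows; the function name is hypothetical and the critical value 2.262 is taken from the mark scheme rather than computed.

```python
import math

def paired_t_summary(d_bar, s, n, t_crit):
    """Return the t statistic for H0: mean difference = 0 and the two-sided CI."""
    standard_error = s / math.sqrt(n)
    t_stat = d_bar / standard_error
    interval = (d_bar - t_crit * standard_error, d_bar + t_crit * standard_error)
    return t_stat, interval

# Summary values from the working: mean difference, s_{n-1}, sample size, t_9 two-tailed 5% point
t_stat, ci = paired_t_summary(-12.4, 17.621, 10, 2.262)
print(round(t_stat, 4), tuple(round(end, 3) for end in ci))
```

The interval only just includes zero, whereas the one-sided test in part (i) is significant; the two use different percentage points of the same distribution.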

(iii)

Paired Wilcoxon 1 [allow sign test]

Q4

(i)


H1 : association between age and level of interest. B1

Observed $o_i$:

       49    145  |  194
      216    435  |  651
      -------------------
      265    580  |  845

Expected $e_i$:

       60.84    133.16
      204.16    446.84

A2 Award A1 if any one is correct. But deduct 1 if not at least 2 dp

$|o_i - e_i| = 11.84$, or 11.34 with Yates' correction

$X^2 = 3.99(71)$ with Yates A1 if Yates used

Upper 5% point is 3.84 1

Significant 1

Seems there is association 1*

Seems under-30s have less interest than would be expected,

and over-30s more, then if there were no association. 2*

* These 3 marks are not available if H0 and H1 are interchanged
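The 2×2 calculation in part (i), with and without Yates' correction, can be sketched as follows. The function name and the `yates` flag are illustrative choices, not part of the mark scheme.

```python
def chi_squared_2x2(table, yates=False):
    """Chi-squared statistic for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(table[i][j] - expected)
            if yates:
                diff -= 0.5   # continuity correction reduces |o - e| by 0.5
            statistic += diff ** 2 / expected
    return statistic

observed = [[49, 145], [216, 435]]
x2_yates = chi_squared_2x2(observed, yates=True)
x2_plain = chi_squared_2x2(observed)
print(round(x2_yates, 3), round(x2_plain, 3))
```

Either value exceeds the upper 5% point of 3.84 on 1 degree of freedom, so the conclusion does not depend on whether the correction is applied.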

(ii)

                                Level of interest
      Directly-elected mayor    Great    Little    Total
      Yes                        118      314       432
      No                          49      216       265
      Total                      167      530       697

M1 if all margins correctly add up from the individual values.

A1, A1, A1, A1 for each individual cell (118, 314, 49, 216).

(iii)

We do not [at least prima facie] have a random sample of 697 people who were classified over the 4 cells. The usual $\chi^2$ approach requires such an assumption. E2

Examiners Report

2616 Statistics 4

General Comments

Most candidates appeared to be well prepared for this examination and there was no

evidence that candidates had insufficient time to complete the paper. In fact, some

candidates gave full answers to all four questions.

As in previous years candidates performed much more strongly when carrying out

the numerical parts of questions than they did when discussing assumptions or

analysing results. The two most common examples of this weakness were firstly the

assumptions required for the various t-tests to be valid: many candidates were not

clear about whether parent populations, samples, means or data had to be normally

distributed or whether they were looking at one distribution, two distributions or the

difference between two distributions.

The second weakness was in the contextualisation of the results of a hypothesis test.

Many candidates did not make any statement beyond "reject $H_0$", whilst at the other

end of the scale, candidates were too definitive, making statements such as "reject

$H_0$, hence the median strength using process A is greater than the median strength

using process B".

Once again, Question 1 on estimation was by far the least popular question.

However most candidates who attempted question 1 scored well.

Q.1

Virtually all candidates knew what they had to do in part (i) and were able to

verify the value of b. Most were also able to calculate the variance of T,

although poor algebra let down some candidates.

In part (ii) most candidates used calculus to show that the variance was

minimised when a = 0.5, although some showed only that the variance had a

stationary value. A few candidates used a method involving completing the

square.

Candidates who got this far were almost all able to state the distribution of T

and explain why it was a better estimator of $\mu$ than either X or Y.

Most candidates who attempted part (iii) knew what they were doing but a

number failed to realise that $\operatorname{Var}(T) = \tfrac{1}{2}\sigma^2$ and a number also did not realise

that because the value of $\sigma^2$ was known, the normal distribution should be

used; indeed one candidate used t specifically because the sample was

small.

Q.2

This was the most popular question on the paper, being attempted by all but 2

candidates.

Part (i) was obviously familiar ground for most candidates and most scored

very well here. The method of choice for most candidates was to calculate the

Wilcoxon rank sum statistic, convert to the Mann-Whitney statistic and then

use the Mann-Whitney tables. Only a small minority of candidates calculated

a statistic (Wilcoxon or Mann-Whitney) and then moved directly to the

relevant statistical table. However, this part of the question was answered

better than any other part of the paper.

Part (ii) was not answered as well with many candidates not realising that

Normality of both underlying populations was required. The pooled variance

also caused some confusion with some candidates trying to pool standard

deviations, some adding variances and others being confused about the use

of $s_n^2$ and/or $s_{n-1}^2$.

Once a variance had been obtained, most candidates were then able to

calculate the test statistic correctly and compared it with the two-tailed value

of $t_9$.

In both parts (i) and (ii) a significant number of candidates were too definitive

in their interpretation of the rejection, or otherwise, of the null hypothesis.

Answers to part (iii) tended to be too vague with very few candidates

mentioning the fact that the t-test is a more powerful, or sensitive, test than

the non-parametric alternatives, as long as the assumptions are satisfied.

However, if the assumptions are not satisfied, results can be seriously

misleading.

Q.3

In part (i) many candidates lost a significant number of marks because they

did not carefully state their hypotheses or take sufficient care with the

distributional assumption. Hypotheses such as "the intensity remains the

same" and "the intensity reduces" were common. What is required are explicit

statements about either the mean of the population of differences, or about

the means of the populations before and after. In addition all terms used

should be defined. The required distributional assumption was the Normality

of the population of differences.

As with other questions, most candidates were able to carry out the

calculations competently and most used the correct value of t.

Part (ii) was very well done by the majority of candidates, although a few did

use the Normal distribution.

Virtually all candidates correctly named the paired Wilcoxon test in part (iii)

Q.4

score well.

In part (i) most candidates were able to state the hypotheses correctly,

although some got the hypotheses the wrong way round and some talked

about correlation.

Calculations were inevitably done correctly, but a few candidates only gave

the expected values to 1 decimal place or even to the nearest integer.

correction, but few actually did. Of those that did, some were unsure whether

to add or subtract 0.5.

Most candidates correctly used 1 degree of freedom for the test and were

able to give the correct critical value. A small minority used 2 or 3 degrees of

freedom.

results of the hypothesis test, with many candidates considering the

contributions to the $\chi^2$ statistic, or at the very least considering the

differences between observed and expected values.

Most candidates scored full marks in part (ii)

Candidates struggled with part (iii), with the most common suggestion being

about different sample sizes. The actual reason was that we do not have a

random sample of people who were classified over the 4 cells.

Mark Scheme

JANUARY 2005

SOLUTIONS

Question 1

(i)

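We have: $X_1 \sim \text{Poisson}(\lambda)$, $X_2 \sim \text{Poisson}(4\lambda)$, $X_3 \sim \text{Poisson}(10\lambda)$ M1 might be implicit in sequel

$\hat{\lambda} = \tfrac{1}{15}(X_1 + X_2 + X_3)$

$E(\hat{\lambda}) = \tfrac{1}{15}(\lambda + 4\lambda + 10\lambda)$ (attempt to find $E(\hat{\lambda})$); M1 for use of Poisson means

$= \lambda$, $\therefore \hat{\lambda}$ is unbiased A1

$\operatorname{Var}(\hat{\lambda}) = \tfrac{1}{15^2}\operatorname{Var}(X_1 + X_2 + X_3)$ M1 for any (reasonable) attempt to find Var

$= \tfrac{1}{15^2}(\lambda + 4\lambda + 10\lambda)$ M1 for use of Poisson variances

$= \dfrac{\lambda}{15}$ A1 - beware printed answer

8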

(ii)

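$Y \sim \text{Poisson}(10\lambda)$

$E(\tfrac{1}{10}\bar{Y}) = \tfrac{1}{10}E(\bar{Y}) = \tfrac{1}{10}E(Y) = \tfrac{1}{10} \cdot 10\lambda = \lambda$ M1 A1

i.e. unbiased 1

Now $\operatorname{Var}(\tfrac{1}{10}\bar{Y}) = \tfrac{1}{100}\operatorname{Var}(\bar{Y})$ M1

$= \tfrac{1}{100} \cdot \dfrac{\operatorname{Var}(Y)}{n}$ M1

$= \tfrac{1}{100} \cdot \dfrac{10\lambda}{n} = \dfrac{\lambda}{10n}$ M1, A1

7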

(iii)

$\dfrac{\lambda}{10n} < \dfrac{\lambda}{15}$ M1

for $n \ge 2$ E1

i.e. $\tfrac{1}{10}\bar{Y}$ is better E2

Allow 1 for $n > 1.5$

5
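The comparison in part (iii) comes down to the two variance expressions, one with divisor 15 and one with divisor 10n. A quick sketch with exact fractions is given below; the symbol name `lam` for the Poisson parameter and both function names are assumptions made for illustration.

```python
from fractions import Fraction

def var_three_counts(lam):
    """Variance of (X1 + X2 + X3) / 15, i.e. lam / 15."""
    return Fraction(lam) / 15

def var_scaled_mean(lam, n):
    """Variance of (mean of n values of Y) / 10, i.e. lam / (10 n)."""
    return Fraction(lam) / (10 * n)

# Smallest sample size for which the second estimator has the smaller variance
smallest_n = next(n for n in range(1, 100)
                  if var_scaled_mean(1, n) < var_three_counts(1))
print(smallest_n)
```

The parameter cancels from the inequality, so the answer does not depend on the value passed for `lam`.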


Question 2

(a)

$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ 1 if both correct. DO NOT allow $\bar{X}_1 = \bar{X}_2$ or similar. Allow verbal statement.

1 if $\mu_1$, $\mu_2$ are adequately defined in words (population mean times)

Test statistic is

$\dfrac{12.6 - 13.9}{\sqrt{\dfrac{(2.4)^2}{80} + \dfrac{(3.5)^2}{90}}} = \dfrac{-1.3}{\sqrt{0.2081}} = \dfrac{-1.3}{0.4562} = -2.84(97)$ M1 A1

Refer to $N(0, 1)$ 1 No FT if wrong

1 No FT if wrong

Significant

CI is given by

$-1.3 \pm 1.96 \times 0.4562 = -1.3 \pm 0.894 = (-2.194, -0.406)$ M1 B1 M1 A1

12
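The large-sample test and interval in part (a) use only the sample means, standard deviations and sizes. A minimal sketch is shown here; the function name is hypothetical and 1.96 is the Normal two-tailed 5% point used in the working.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Return (difference in means, its standard error, z statistic) for large samples."""
    diff = mean1 - mean2
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff, standard_error, diff / standard_error

diff, se, z = two_sample_z(12.6, 2.4, 80, 13.9, 3.5, 90)
ci = (diff - 1.96 * se, diff + 1.96 * se)
print(round(z, 4), tuple(round(end, 3) for end in ci))
```

The interval excludes zero, in agreement with the significant result of the two-tailed test.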

(b) MUST be Wilcoxon rank-sum test (or Mann-Whitney form

thereof).

[For bottom-up

rankings

W = 55, MW = 34

Use of Ranks M1

Ranks are: I

II

Upper 5% tail

10

11

13

W=55, MW = 34]

12

A1


Result is significant


Question 3

Differences (after − before):

6 11 22 5 1 4 28 2 7 3 9 8

(a)

Ranks of |d| are

10

M1

11

12

A1 FT if wrong

memory

7

(b)

Normality of differences 1

$\bar{d} = 7.5$, $s_{n-1} = 9.5299$ ($s_{n-1}^2 = 90.8182$) M1 for use of differences

B1 Accept $s_n = 9.1248$ ($s_n^2 = 83.25$) ONLY if correctly used in sequel

Test statistic (for $H_0: \mu_D = 0$ against $H_1: \mu_D > 0$) is

$\dfrac{7.5 - 0}{9.5299/\sqrt{12}} = 2.72(62)$ M1 A1

Refer to $t_{11}$ 1 No FT if wrong

Upper 5% pt is 1.796 1 No FT if wrong

Significant

memory

Look at differences

M1

M1, or for any other

relevant

display/discussion

of the data

integers], but the two large upper outliers cast doubt


Question 4

(i) H0:

H1:

type of destination)

association

2

(ii) Observed $O_i$:

      100     57     23  |  180
       21     14     13  |   48
       31     21     20  |   72
      ------------------------
      152     92     56  |  300

Expected $E_i$:

      91.20    55.20    33.60
      24.32    14.72     8.96
      36.48    22.08    13.44

A4 - deduct 1 per error. Must be to this level of accuracy

Contributions to $X^2$: M1

      0.8491    0.0587    3.3440
      0.4532    0.0352    1.8216
      0.8232    0.0528    3.2019

$X^2 = 10.63(985)$, awrt 10.64 A2 [give A1 if in (10.5, 10.8)]

Refer to $\chi^2_4$ 2 [or zero; FT if wrong, unless 300]

1

Significant 1

Seems there is association 1

ZERO if H0 and H1 are interchanged

12

(iii) The key feature is the behaviour of transmission when intended

destinations are universities. There are many more "more than one

attempt", and many more "not successful at all", transmissions than

would be expected if there were no association, and many fewer

"successful at first attempt" transmissions. There is little or no

suggestion of any other associations.

E6 (divisible)
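The discussion in part (iii) rests on which cells contribute most to the statistic. The sketch below computes every cell's contribution from the observed counts; the function name is illustrative, and the row and column order simply follows the order in which the counts appear in the working.

```python
def chi_squared_contributions(observed):
    """Per-cell contributions (O - E)^2 / E for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    contributions = []
    for i, row in enumerate(observed):
        cells = []
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            cells.append((count - expected) ** 2 / expected)
        contributions.append(cells)
    return contributions

observed = [[100, 57, 23], [21, 14, 13], [31, 21, 20]]
contributions = chi_squared_contributions(observed)
x2 = sum(sum(row) for row in contributions)
for row in contributions:
    print([round(cell, 4) for cell in row])
print(round(x2, 4))
```

The three largest contributions all lie in the third column, which is the pattern the commentary describes.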

Examiners Report

2616

Mark Scheme

June 2005


2616 Statistics 4

Q1

(i) $Y = \sum (X_i - \bar{X})^2$ B1

(ii) $E(Y) = (n - 1)\sigma^2$ B1

(iii) $T = kY$

$\operatorname{Var}(T) = 2k^2(n - 1)\sigma^4$

Bias $= E(T) - \sigma^2$

$= k(n - 1)\sigma^2 - \sigma^2$

M1

A1

M1

A1

If both correct.

A2

BEWARE printed answer.

M1

A1

Correct derivative.

A1

Isolate k.

A1

M1

Or other methods.

A1

B2

Answer not printed.

M1

support of only if, award SC B1.

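$\operatorname{MSE}(T) = 2k^2(n - 1)\sigma^4 + \{k(n - 1)\sigma^2 - \sigma^2\}^2$

$= \sigma^4\left[2k^2(n - 1) + \{k^2(n - 1)^2 - 2k(n - 1) + 1\}\right]$

$= \sigma^4[2(n - 1) + (n - 1)^2]k^2 - 2\sigma^4(n - 1)k + \sigma^4$

(iv) Consider $\dfrac{d\,\operatorname{MSE}(T)}{dk} = 0$

$\dfrac{d\,\operatorname{MSE}(T)}{dk} = \sigma^4[2(n - 1) + (n - 1)^2]\,2k - 2\sigma^4(n - 1) = 0$

$\Rightarrow k = \dfrac{n - 1}{2(n - 1) + (n - 1)^2} = \dfrac{1}{n + 1}$

Check minimum by considering $\dfrac{d^2\operatorname{MSE}(T)}{dk^2} = \sigma^4[2(n - 1) + (n - 1)^2]\,2 > 0$, $\therefore$ min

(v) With $k = \dfrac{1}{n + 1}$,

$\operatorname{MSE}(T) = \sigma^4\left[\dfrac{2(n - 1) + (n - 1)^2}{(n + 1)^2} - \dfrac{2(n - 1)}{n + 1} + 1\right]$

$= \dfrac{\sigma^4}{(n + 1)^2}\{2n - 2 + n^2 - 2n + 1 - 2n^2 + 2 + n^2 + 2n + 1\}$

$= \dfrac{\sigma^4}{(n + 1)^2}\{2n + 2\} = \dfrac{2\sigma^4}{n + 1}$

(vi) $k = \dfrac{1}{n - 1}$

In this case, $\operatorname{MSE}(T) = \operatorname{Var}(T) = \dfrac{2\sigma^4}{n - 1}$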
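The algebra in parts (iii) to (vi) can be checked with exact arithmetic. The sketch below works with the mean square error divided by the fourth power of the standard deviation, so that only k and n remain; the function name and the choice n = 10 are illustrative assumptions.

```python
from fractions import Fraction

def mse_over_sigma4(k, n):
    """MSE(T) / sigma^4 for T = kY: variance term plus squared bias term."""
    return 2 * k ** 2 * (n - 1) + (k * (n - 1) - 1) ** 2

n = 10
k_min = Fraction(1, n + 1)        # value of k that minimises the MSE
k_unbiased = Fraction(1, n - 1)   # value of k that makes T unbiased

mse_min = mse_over_sigma4(k_min, n)
mse_unbiased = mse_over_sigma4(k_unbiased, n)
print(mse_min, mse_unbiased)
```

For this n the biased estimator has the smaller mean square error (2/11 against 2/9), which is the point of comparing parts (v) and (vi).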

A1

M1

A1

(iii) this is not difficult.

4

20


Q2

(i)

(ii)

$H_0: \mu_A = \mu_B$

$H_1: \mu_A \ne \mu_B$

B1

processes A and B.

B1

Same variance.

B1

B1

other symbols, including, e.g.,

X A = X B or similar, unless they are

clearly and explicitly stated to be

population means. Allow statements in

words (see below).

For adequate verbal definitions of ,

. Must indicate mean; condone

average. Allow absence of

population if correct notation is

used, otherwise insist on population.

4

2

B1

Accept sns ONLY if correctly used in

sequel.

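$s_n^2 = 77.\dot{5}$, $s_n = 8.8066$

Pooled $s^2 = \dfrac{698 + 763.5}{15} = 97.43$

Test statistic is

$\dfrac{114.6667 - 123.75}{\sqrt{97.43\left(\tfrac{1}{9} + \tfrac{1}{8}\right)}} = \dfrac{-9.0833}{\sqrt{23.0051}} = \dfrac{-9.0833}{4.7964} = -1.89(38)$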

(iii)

(iv)

M1

A1

M1

M1

A1

$s_n^2 = 95.4375$, $s_n = 9.7692$

For any reasonable attempt at pooling

(and ft into test and CI).

If correct.

Overall structure. Allow c's pooled $s$.

$\sqrt{\tfrac{1}{9} + \tfrac{1}{8}}$

ft c's pooled $s^2$.

Refer to $t_{15}$.

Double tail 5% point is 2.131.

Not significant.

Seems mean strengths are the same for both

processes.

M1

A1

E1

E1

No ft from here if wrong.

ft only cs test statistic.

ft only cs test statistic. Expect reference

to means and context.

CI is given by $-9.0833 \pm 2.947 \times 4.7964$ M1

$= -9.0833 \pm 14.1349 = (-23.21(8),\ 5.05(2))$

B1

M1

A1

Must be cs ( x y ) ...

From t15.

Allow cs pooled s.

c.a.o. Must be written as an interval.

Wilcoxon

Rank sum test

B1

B1

10

2

20

Q3

(a)

$H_0: \mu_D = 0$ or $\mu_E = \mu_S$

$H_1: \mu_D \ne 0$ or $\mu_E \ne \mu_S$

B1

fertilizer population mean for Standard fertilizer.

B1

B1

Differences are

06 23 08 06 09 15 14 08 01 02

$\bar{d} = 0.46$, $s_{n-1} = 1.0668(75)$, $s_{n-1}^2 = 1.1382$

M1

B1

Test statistic is

$\dfrac{0.46 - 0}{1.0668(75)/\sqrt{10}}$

M1

other symbols, including, e.g.,

X E = X S or similar, unless they are

clearly and explicitly stated to be

population means. Allow statements in

words (see below).

For adequate verbal definition of .

Must indicate mean; condone

average. Allow absence of

population if correct notation is

used, otherwise insist on population.

Must be explicit about the population.

ONLY if correctly used in sequel.

Allow cs d and/or sn1.

Allow alternative: $0 \pm (\text{c's } 2.262) \times \dfrac{1.0668(75)}{\sqrt{10}}$ (= 0.7631) for subsequent comparison with $\bar{d}$.

(Or d

$= 1.36(35)$

(b)

A1

Refer to t9.

Double tail 5% point is 2.262.

Not significant.

Seems mean yield using experimental fertilizer is

same as for standard.

M1

A1

E1

E1

fertilizer.

For these yields,

$\bar{x} = 20.43$, $s_{n-1} = 4.0803$, $s_{n-1}^2 = 16.649$

B1

$20.43 - 1.833 \times \dfrac{4.0803}{\sqrt{10}}$

$= 20.43 - 2.36(51) = 18.06(49)$

In repeated sampling, lower confidence bounds

obtained in this way would fall below the true mean

on 95% of occasions.

$\pm$ (c's 2.262) $\times \dfrac{1.0668(75)}{\sqrt{10}}$ (= $-0.303$, 1.2231) for comparison with 0.)

c.a.o. (but ft from here if this is wrong.)

Use of D d scores M1A0, but

next 4 marks still available.

No ft from here if wrong.

No ft from here if wrong.

ft only cs test statistic.

ft only cs test statistic. Expect reference

to mean(s) and context.

B1

ONLY if correctly used in sequel.

M1

M1

B1

Mean. Allow cs x .

Minus.

From t9.

M1

A1

E2

lower bound rather than just the

confidence interval.

11

9

20

Q4

(a)

      Data                         29   32   34   38   40   46   51   52   59   63   71   95
      Difference from median 60   -31  -28  -26  -22  -20  -14   -9   -8   -1    3   11   35
      Rank of |diff|               11   10    9    8    7    6    4    3    1    2    5   12

M1

M1

A1

(b)

(i)

T = 2 + 5 + 12 = 19

B1

statistic.

Lower (or upper if 59 used) 2½% tail is needed.

Value for n = 12 is 13 (or 65 if 59 used).

Result is not significant.

No real evidence that median is not 60.
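The signed-rank calculation in part (a) can be reproduced from the twelve data values. This sketch assumes no observation equals the hypothesised median and no two absolute differences tie, which is the case here; the function name is hypothetical.

```python
def signed_rank_sums(data, median):
    """Return (sum of ranks of positive differences, sum for negative differences)."""
    differences = [value - median for value in data if value != median]
    ordered = sorted(differences, key=abs)   # rank 1 goes to the smallest |difference|
    positive = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    negative = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return positive, negative

data = [29, 32, 34, 38, 40, 46, 51, 52, 59, 63, 71, 95]
t_plus, t_minus = signed_rank_sums(data, 60)
print(t_plus, t_minus)
```

The smaller sum is above the tabulated value of 13 for n = 12, so the result is not significant.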

M1

M1

A1

E1

E1

No ft from here if wrong.

ft only cs test statistic.

ft only cs test statistic.

B1

B1

$= P(0.6593(4) < N(0, 1) \le 1.3919(4))$

$= 0.9180 - 0.7452 = 0.1728$
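The Normal probability above can be confirmed without tables. The following sketch builds the standard Normal distribution function from `math.erf`; the helper name is illustrative and the two standardised limits are taken from the working.

```python
import math

def standard_normal_cdf(z):
    """Phi(z) expressed through the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

lower, upper = 0.65934, 1.39194   # standardised class boundaries from the working
probability = standard_normal_cdf(upper) - standard_normal_cdf(lower)
print(round(probability, 4))
```

Multiplying such a probability by the total frequency gives the expected frequency for the class in the goodness-of-fit test.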

(ii)

(iii)

differences not used.

For ranks of |difference|.

All correct.

ft from here if ranks wrong.

Or 1 + 3 + 4 + 6 + 7 + 8 + 9 + 10 + 11

= 59

$X^2 = 5.6903 + 0.1946 + 18.3265 + 5.2024 + 8.9526 + 5.6195$

$= 43.98(59)$

M1

A1

This becomes $+\ 0.0769 + 21.7529$.

$X^2$ becomes 60.19(62). Then must have $\chi^2_4$ below.

Refer to $\chi^2_3$.

Extremely highly significant overwhelming

evidence that Normal model does not fit data.

M1

A1

ft only cs test statistic.

intervals, but the main points are that the modal class

is perhaps half an interval lower than expected, that

there are many fewer low values than expected, and

that there a lot of upper outliers.

E2

underlying distribution is not Normal could be

dangerous to use a t test.

E2

Normality (or at least of symmetry), we could not use

the t test for the mean as a proxy test for the median.

E1

20


2616 - Statistics 4

General Comments

There were 93 candidates from 20 centres (June 2004: 82 from 20). The overall

standard of the scripts seen was pleasing: many candidates were clearly well

prepared for this paper. Routine calculations were carried out well but the

candidates' ability to comment and interpret was a little disappointing at this level.

Question 1 was by far the least popular question with only about 15 candidates

attempting it. Every candidate attempted Question 2; Questions 3 and 4 were

equally popular.

1)

Estimation theory

Although this was the least popular question it seemed to have the highest mean

mark, with most of those attempting it scoring full or almost full marks. Those

who were prepared to try it were likely to be successful as long as their algebra

was up to the task. Sometimes the algebra arrived at the correct destination by

brute force rather than elegance.

There were just two places where marks seemed likely to be lost: part (iv) where

some neglected to verify that the required value of k did indeed give a minimum

and part (vi) where there was a temptation for some to use the converse argument.

2)

Two sample t test and confidence interval; the strengths of steel rods

This was the most popular question being attempted by all candidates. It was also

a very high scoring question: about half of the entry scored full or almost full

marks.

(i)

The hypotheses were usually stated correctly but there was rather less

care in providing verbal definitions of the population means. Similarly, the

required assumptions were sometimes less than ideal.

(ii)

Most candidates carried out the test competently. There was rarely any

problem over finding and using the pooled variance. The critical value was

almost always correct but on a number of occasions the conclusion was

badly expressed.

(iii)

As in part (ii) most candidates had little difficulty here. Just occasionally

the standard error (which had been correctly constructed in part (ii))

became pooled s

(iv)

3)

fertilizers

(a)

The hypotheses were usually stated correctly but candidates were not as careful about defining the symbol $\mu$. Nor were they sufficiently careful when it came to the distributional assumption.

However there were only a very few candidates who did not realise that they should carry out a paired test. The vast majority made good progress with the test itself, and only the final conclusion left room for improvement.

(b)

As above, most realised what to do here and the correct value for the lower bound was usually found. A small minority tried to construct the confidence interval using the information from the paired test. There was some uncertainty again with the distributional assumption.

The main area of difficulty was with the interpretation of the interval. Very many comments revealed a flawed understanding of a confidence interval to quite a worrying extent.

4)

Wilcoxon rank sum test for the median; Chi-squared test for goodness of fit; waiting times in an airport

(a)

This part of the question was almost always answered well. Many fully

correct solutions were seen.

(ii)

grouping) but relatively few were able to identify the correct Chi-squared

distribution to look up. Most of those who got this second aspect wrong

made no allowance for estimated parameters while a few thought that

there were 200 degrees of freedom. Hardly any commented on the fact

that the test statistic was significant at any level available to them in the

tables.

Disappointingly few candidates took the trouble to comment at all on the

reasons for the poor quality of fit.

(iii)

In this part of the question very few candidates realised that they could

refer back to the previous part for evidence that the assumption of

background Normality was not viable. They knew that Normality was

required, but often chose to look at the sample data in part (a), sometimes

with the aid of a dot plot. Hardly any candidates included in their

discussion the small sample size which might prompt the use of a t test.

No more than a handful of candidates picked up on the fact that a t test

examines the population mean whereas the Wilcoxon test in part (a)

examined the median.