
UNIVERSITY OF CALICUT

(Abstract)

B.Sc Programme in Statistics under Choice based Credit Semester System – Scheme and
Syllabus – implemented with effect from 2009 admission onwards – approved - Orders
issued.
-------------------------------------------------------------------------------------------------------------
GENERAL AND ACADEMIC BRANCH – I ‘J’ SECTION
No. GA. I/J2/2455/06 Dated, Calicut University. P.O., 25.06.2009
-------------------------------------------------------------------------------------------------------------
Read : 1. U.O. No. GAI/J2/3601/08 (Vol. II) dated 19.06.2009.
2. Minutes of meeting of the Board of Studies in Statistics (UG) held on
29.01.2009 and 30.04.2009
3. Item No.2. vii(a) of the minutes of the meeting of the Faculty of Science held
on 05.05.2009.
4. Item No.IIA (8) of the minutes of meeting of the Academic Council held on
14.05.2009.
ORDER
Choice based Credit Semester System and Grading has been introduced for the UG
Curriculum in the affiliated colleges of the University with effect from 2009 admission
onwards, and the Regulation for the same was implemented vide the University Order read
as 1st paper above.
Vide paper read as (2), the Board of Studies in Statistics (UG) approved the draft
regulation and the syllabi of B Sc Programme in Statistics prepared as per draft regulation
of Choice based Credit Semester System 2009.
The Faculty of Science vide paper read as 3rd above endorsed the minutes of the
Board of Studies in Statistics (UG).
The Academic Council, vide paper read as 4 above, approved the minutes of the
Faculty of Science.
Sanction has therefore been accorded for implementing the scheme & syllabus of the
B.Sc Programme in Statistics under Choice based Credit Semester System from 2009
admission onwards.
Orders are issued accordingly. Syllabus appended.

Sd/-
DEPUTY REGISTRAR (G&A I)
For REGISTRAR
To
The Principals of all affiliated colleges -
offering B.Sc Statistics programme

Copy to: PS to Vice-Chancellor / PA to PVC / PA to Registrar /
Controller of Examination / EX Sn / EGI / DR B Sc / Enquiry /
System Administrator with a request to upload in the University website /
Tabulation Section / GA I ‘A’, ‘F’, ‘G’ Sections / G&A II, III Branches

Forwarded / By order

SECTION OFFICER

SYLLABUS OF B.Sc. STATISTICS MAIN – SEMESTER SYSTEM
CCSSUG 2009 (2009 admission onwards)

Semester No. | Course Code | Course Title | Instructional hours/week | Credit | Exam hours | Ratio Ext:Int
1 | ST1B01 | METHODOLOGY OF STATISTICS, BASIC CALCULUS AND PROBABILITY THEORY | 4 | 4 | 3 | 3:1
2 | ST2B02 | PROBABILITY DISTRIBUTIONS | 4 | 4 | 3 | 3:1
3 | ST3B03 | STATISTICAL INFERENCE – I | 5 | 4 | 3 | 3:1
4 | ST4B04 | STATISTICAL INFERENCE – 2 | 5 | 4 | 3 | 3:1
5 | ST5B05 | MATHEMATICAL METHODS IN STATISTICS | 5 | 4 | 3 | 3:1
5 | ST5B06 | INFORMATICS AND NUMERICAL MATHEMATICS | 5 | 4 | 3 | 3:1
5 | ST5B07 | SAMPLE SURVEYS | 5 | 4 | 3 | 3:1
5 | ST5B08 | OPERATIONS RESEARCH AND STATISTICAL QUALITY CONTROL | 5 | 4 | 3 | 3:1
5 | | Open course offered by other faculties | 3 | 4 | 3 | 3:1
6 | ST6B09 | TIME SERIES AND INDEX NUMBERS | 5 | 4 | 3 | 3:1
6 | ST6B10 | DESIGN OF EXPERIMENTS | 5 | 4 | 3 | 3:1
6 | ST6B11 | POPULATION STUDIES AND ACTUARIAL SCIENCE | 5 | 4 | 3 | 3:1
6 | ST6B12(P) | PRACTICAL | 5 | 4 | 3 | 3:1*
5&6 | ST6B13(PR) | Project Work | 2+2 | 4 | |
6 | STB601(E01), STB601(E02), STB601(E03) | Elective offered by the parent department | 3 | 2 | 3 | 3:1

*For Practical paper the internal marks are based on the practical records

STATISTICS: Electives (B.Sc. Statistics Main)
CCSSUG 2009 (2009 admission onwards)

Semester No. | Course Code | Course Title | Instructional hours/week | Credit | Exam hours | Ratio Ext:Int
1 | ST6B01 | Probability Models and Risk Theory | 3 | 2 | 3 | 3:1
2 | ST6B02 | Stochastic Modeling | 3 | 2 | 3 | 3:1
3 | ST6B03 | Reliability Theory | 3 | 2 | 3 | 3:1

STATISTICS: Open Courses (Offered to other faculties)
CCSSUG 2009 (2009 admission onwards)

Semester No. | Course Code | Course Title | Instructional hours/week | Credit | Exam hours | Ratio Ext:Int
1 | ST5D01 | Economic Statistics | 3 | 4 | 3 | 3:1
2 | ST5D02 | Quality Control | 3 | 4 | 3 | 3:1
3 | ST5D03 | Basic Statistics | 3 | 4 | 3 | 3:1

Table showing the components and weightage for internal assessment
Components | Weight
Assignment | 1
Test paper | 2
Seminar | 1
Attendance | 1

There shall be two test papers and the average grade point is to be considered for
internal assessment.

Pattern of Question papers.

There shall be four parts, A, B, C and D, in all question papers except for course 12
(practical). Part A consists of 12 objective-type questions. Part B consists of 8 questions
to be answered in a word, phrase or sentence. Part C consists of 6 short-essay questions,
of which the student can attempt 4. Part D consists of 3 long-essay questions, of which
the student can attempt 2. In Part A the weightage per question is ¼; in Part B it is 1 per
question; in Part C it is 2 per question; and in Part D it is 4 per question. As far as
possible, the number of questions should be proportional to the modules.
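
As a rough check on the pattern above, the total weightage of a theory paper can be worked out from the number of questions a student answers in each part. The sketch below is illustrative only: the names `PATTERN`, `part_weights` and `total_weight` are hypothetical, and it assumes Part C carries weight 2 per question, as the model question papers indicate ("wt 2").

```python
from fractions import Fraction

# (questions answered, weight per question) for each part.
# Assumption: Part C carries weight 2 per question, as in the model papers.
PATTERN = {
    "A": (12, Fraction(1, 4)),  # all 12 objective questions, 1/4 each
    "B": (8, Fraction(1)),      # all 8 one-word/one-sentence questions
    "C": (4, Fraction(2)),      # 4 of the 6 short essays
    "D": (2, Fraction(4)),      # 2 of the 3 long essays
}

def part_weights(pattern):
    """Return the weight contributed by each part."""
    return {part: count * weight for part, (count, weight) in pattern.items()}

def total_weight(pattern):
    """Return the total weight of the paper."""
    return sum(part_weights(pattern).values())

weights = part_weights(PATTERN)
total = total_weight(PATTERN)
print({part: str(w) for part, w in weights.items()}, "total =", total)
```

Under these assumptions Part A contributes 3 and Parts B, C and D contribute 8 each.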

The practical paper consists of 6 questions, of which the student can attempt 4. Calculators
are permitted.
The internal assessment for the practicals shall be based on the average grade point of two
practical test papers and the practical record. The test papers shall have weight 1 each and
the record shall have weight 2.
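
One possible reading of the weighting rule above is a weighted average of grade points. The sketch below assumes that reading; the function name `weighted_grade_point`, the sample grade points and the 0 to 4 scale are all hypothetical and are not taken from the regulation.

```python
def weighted_grade_point(components):
    """Weighted average of (grade_point, weight) pairs."""
    total_weight = sum(weight for _, weight in components)
    if total_weight == 0:
        raise ValueError("total weight must be positive")
    return sum(grade * weight for grade, weight in components) / total_weight

# Hypothetical grade points on an assumed 0-4 scale:
# two practical test papers (weight 1 each) and the practical record (weight 2).
practical = [(3.0, 1), (4.0, 1), (3.5, 2)]
result = weighted_grade_point(practical)
print(result)
```

The same helper could be applied to the theory components (assignment 1, test paper 2, seminar 1, attendance 1) if the weighted-average reading also holds there.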

CORE COURSE I: METHODOLOGY OF STATISTICS,
BASIC CALCULUS AND PROBABILITY THEORY

Module 1. Meaning, scope and limitations of Statistics – collection of data,
conducting a statistical enquiry – preparation of questionnaire – primary and
secondary data – classification and tabulation – formation of frequency distribution
– diagrammatic and graphic presentation of data – population and sample –
advantages of sampling over census – methods of drawing random samples from a
finite population – fitting of straight line, parabola, exponential and logarithmic
curves using the principle of least squares.
17 hours

Module 2. Elements of Differential and Integral Calculus (definition and simple
examples only): Derivative of a function – relationship between continuity and
differentiability – derivatives of polynomial, exponential and logarithmic functions –
differentiation of sum, difference, product and quotient – function of a function rule –
second order derivative – sign of derivative – increasing and decreasing functions –
maxima and minima. Integration as the inverse operation of differentiation – indefinite
and definite integrals – simple examples – properties of integration – first and second
fundamental theorems of integral calculus – application of integration – area under a
curve – Beta and Gamma integrals – simple properties – function of two variables –
double integrals – evaluation of double integrals (application in statistics only) –
change of variable.
25 hours

Module 3. Probability concepts: Random experiment, sample space, event, classical
definition, axiomatic definition and relative frequency definition of probability.
Concept of probability measure. Addition and multiplication theorems (limited to
three events). Conditional probability and Bayes’ Theorem – numerical problems.
15 hours

Module 4. Random variables: Definition, probability distribution of a random
variable. Probability mass function (pmf), probability density function (pdf) and
(cumulative) distribution function (df) and their properties. Change of variables:
discrete and continuous cases (univariate case only). Simple problems.
15 hours

Books for reference

1. V.K. Rohatgi: An Introduction to Probability Theory and Mathematical
Statistics, Wiley Eastern.
2. S.C. Gupta and V.K. Kapoor: Fundamentals of Mathematical Statistics,
Sultan Chand and Sons.
3. Mood A.M., Graybill F.A. and Boes D.C.: Introduction to Theory of
Statistics, McGraw Hill.
4. Schaum’s Series: Calculus.
5. John E. Freund: Mathematical Statistics (Sixth Edition), Pearson
Education (India), New Delhi.

Model Question Paper

B.Sc. STATISTICS
I Semester
CORE COURSE I: METHODOLOGY OF
STATISTICS, BASIC CALCULUS AND PROBABILITY THEORY

Time: 3 Hrs

PART A
Answer all questions (each bunch of 4 carries weightage 1)

1. A frequency distribution is used to
(a) calculate mean only (b) represent data (c) summarize data (d) none
2. If u and v are functions of x, then $\frac{d(uv)}{dx}$ is
(a) $u\frac{dv}{dx} + v\frac{du}{dx}$, (b) $\frac{du}{dx}\cdot\frac{dv}{dx}$, (c) $u\frac{dv}{dx} - v\frac{du}{dx}$, (d) $\frac{du}{dx} + \frac{dv}{dx}$
3. If f(x) is an increasing function, then
(a) $\frac{df}{dx} = 0$, (b) $\frac{df}{dx} < 0$, (c) $\frac{df}{dx} \ne 0$, (d) $\frac{df}{dx} > 0$
4. Let $f(x) = 2x^3 + 1$. What is $\frac{d^2 f}{dx^2}$?
(a) $6x^2 + 1$, (b) $12x$, (c) $x$, (d) $2x^2 + 1$
5. What is $\iint xy\,dx\,dy$?
(a) $\frac{x^2 y^2}{2}$, (b) $\frac{x^2 y^2}{4}$, (c) $x^2 y^2$, (d) $4x^2 y^2$
6. Obtain the value of $\int \frac{1}{x}\,dx$.
(a) $\frac{-1}{x^2}$, (b) $\log x$, (c) $e^x$, (d) $\frac{1}{x^2}$
7. Sample space of a coin toss experiment is
(a){HT}, (b){H,T}, (c){HH, TH, HT, TT}, (d){H}
8. Which of the following is an axiom of probability.
(a) 0 < P (Ω) < 1, (b) P (Ω) = 1, (c) if A ⊂ B then P ( A) ≤ P ( B ),
(d) P ( A ∩ B ) = P ( A).P ( B )
9. If f ( x) is a probability density function, then
(a) ∫ f ( x) dx = 0 , (b) ∫ f ( x) dx = 1, (c) ∫ f ( x)dx < 1, (d) ∫ f ( x) dx > 0
11. If F(x) is a distribution function, then
(a) F(x) is increasing in x, (b) F(x) is constant, (c) F(x) is decreasing in x,
(d) F(x) = 1 for every x
12. If f(x) = x, 0 < x < 1, obtain F(x).
(a) $F(x) = x^2$, (b) $F(x) = \frac{x^2}{2}$, (c) $F(x) = x$, (d) $F(x) = 2x^2$

PART B
Answer all questions wt 1

13. Every differentiable function is …………
14. If $\frac{df}{dx_0} > 0$ and $\frac{d^2 f}{dx_0^2} < 0$, then $x_0$ is the point of …………
15. Integral of f(x)=2x+1 over (a,b) is ………….
16. The Beta function is…………..
17. Total probability of a random variable is ……..
18. If A and B are any two events, then P( A | B) is……….
19. If F(x) is a distribution function, its minimum value is…...and maximum value
is……..
20. The third axiom of probability is………

PART C
Answer any four questions wt 2

21. Explain the fitting of a straight line.
22. Evaluate $\int_1^2 (2x^2 + 1)\,dx$.
23. Evaluate $\iint xy(x^2 + y^2)\,dx\,dy$ over [(0,a),(0,b)].
24. State the three axioms of probability.
25. A continuous random variable X has the pdf given by f(x) = 2x, 0 < x < 1, and 0
elsewhere. Find F(x) and P(X < 1/2).
26. Given $f(x) = e^{-x}$, $x \ge 0$, find the pdf of y = −3x + 7.

PART D
Answer any two questions wt 4

27. What are the properties of the definite integral? Explain.

28. State and prove the addition theorem for two events. Explain what happens when A is
a subset of B.

29. A continuous random variable X has probability density function $f(x) = 3x^2$, $0 \le x \le 1$.
Find two numbers a and b such that (1) $P(X \le a) = P(X \ge a)$ and (2) $P(X \ge b) = 0.5$.

CORE COURSE II: PROBABILITY DISTRIBUTIONS

Module 1. Mathematical Expectations: Expectation of a random variable, moments,
relation between raw and central moments, moment generating function (mgf) and
15 hours
Module 2. Bivariate random variable: Definition (discrete and continuous type).
Joint probability mass function and probability density function, marginal and
conditional distributions, independence of random variables.
Bivariate moments: Definition of raw and central product moments, conditional
mean and conditional variance, covariance, correlation and regression coefficients.
Mean and variance of a random variable in terms of conditional mean and
conditional variance.
20 hours

Module 3. Standard Distributions: Discrete type – Bernoulli, Binomial, Poisson,
Geometric (definition, simple properties and applications), Discrete Uniform
(definition, mean, variance and mgf only). Continuous type – Rectangular,
Exponential, Gamma (definition, mean, variance and mgf only), Beta (definition,
mean and variance only), Normal (definition, simple properties and
applications), Lognormal, Pareto and Cauchy distributions (definition only).
25 hours

Module 4. Law of Large Numbers: Chebyshev’s inequality, convergence in
probability, Weak Law of Large Numbers, Bernoulli Law of Large Numbers.
12 hours

Books for reference

1. V.K. Rohatgi: An Introduction to Probability Theory and Mathematical
Statistics, Wiley Eastern.
2. S.C. Gupta and V.K. Kapoor: Fundamentals of Mathematical Statistics,
Sultan Chand and Sons.
3. Mood A.M., Graybill F.A. and Boes D.C.: Introduction to Theory of
Statistics, McGraw Hill.
4. John E. Freund: Mathematical Statistics (Sixth Edition), Pearson Education
(India), New Delhi.

Model Question Paper

B.Sc. Statistics (Main)


Core Course II Semester II
COURSE II: PROBABILITY DISTRIBUTIONS
Time: 3 Hrs
Part A
(Answer all questions; weight 1 for each bunch of 4 questions)
1. The probability of getting one head and one tail in the toss of two
unbiased coins simultaneously is
a) .25 b) .50 c) 1 d) .75
2. If x and y are two independent random variables with joint p.d.f. f(x,y),
then E(xy) =
a) E(x).E(y) b) E(x)/E(y) c) E(x) d) E(y)
3. E(x | y) is generally a function of
a) y b) x c) x and y d) none
4. $M_{x+y}(t)$, if x and y are independent r.v.s, is given by
a) $M_x(t) + M_y(t)$ b) $M_x(t)/M_y(t)$ c) $M_x(t) \cdot M_y(t)$ d) none
b) If V(x) = 1, then V(2x ± 3) is
a) 5 b) 13 c) 14 d) 1
c) $E(x-k)^2$ is minimum when
a) k < E(x) b) k = E(x) c) k > E(x) d) $k^2$ = E(x)

5. If x is a random variable having probability function f(x), then the function
$\sum e^{tx} f(x)$ is known as
a) moment generating function
b) probability generating function
c) probability distribution function
d) characteristic function

6. The skewness of a binomial distribution will be zero if
a) p < ½ b) p = ½ c) p > ½ d) p = q

7. The coefficient of variation of a Poisson distribution with mean 4 is
a) ¼ b) 2/4 c) 4 d) 2

8. X is normally distributed with zero mean and unit variance. The variance of
$x^2$ is
a) 0 b) 1 c) 2 d) 4

9. In a normal curve the area to the right of the point $x_1$ is 0.6 and to the left of the
point $x_2$ is 0.7. Which is the correct statement?
a) $x_1 > x_2$ b) $x_1 < x_2$ c) $x_1 = x_2$ d) none of them

10. For a normal distribution, Q.D., M.D. and S.D. are in the ratio
a) $\frac{4}{5} : \frac{2}{3} : 1$ b) $\frac{2}{3} : \frac{4}{5} : 1$ c) $1 : \frac{4}{5} : \frac{2}{3}$ d) $\frac{1}{2} : 1 : \frac{4}{5}$

11. If x is a continuous r.v. with mean µ and variance $\sigma^2$, then for any positive
number k, $P[\,|x - \mu| > k\sigma\,] \le \frac{1}{k^2}$ is known as
a) Liapunov’s inequality b) Tchebycheff’s inequality
c) Bienayme-Tchebycheff’s inequality d) Khinchin’s inequality

12. If x and y are two random variables such that their expectations exist and
P(x ≤ y) = 1, then
a) E(x) ≤ E(y) b) E(x) > E(y)
c) E(x) = E(y) d) None of the above

Part B (Answer all questions) Weight 1

13. The expected value of a random variable x exists if ……………
14. If x is a random variable, $E(x - \text{constant})^2$ is minimum when the constant is ……………
15. Name the discrete distribution for which the mean and variance have the same
value.
16. What is the third moment about the mean of a Poisson distribution if the
second moment about the origin is 12?
17. Identify the distribution (using the uniqueness property) if the moment
generating function of the distribution is $M_x(t) = (1 + e^t)^5/32$.
18. State the additive property of the Binomial distribution.
19. Write down the pdf of the exponential distribution and write down its first
raw moments.
20. What are the points of inflexion of a normal curve N(µ, σ)?
Part C
(Answer any 4 questions) Weight 2

21. If x and y are two independent random variables, show that
$V(ax + by) = a^2 V(x) + b^2 V(y)$.
22. x and y are independent random variables with means 10 and 20 and
variances 2 and 3 respectively. Find the mean and variance of 3x + 4y.
23. A symmetric die is thrown 600 times. Find the lower bound for the
probability of getting 80 to 120 sixes.
24. For a binomial distribution, the mean is 6 and the S.D. is 2. Write out all the
parameters of the distribution.
25. Show that for the normal distribution the points of inflexion lie at a distance
of ± σ from the mean, where σ is the S.D.
26. If x ~ N(30, 5), find the probability of |x − 30| > 5.

Part D
(Answer any 2 questions) Weight 4

27. Show that $E_y[E_x(X \mid Y)] = E(X)$.
28. Show that under certain conditions (to be stated) a Binomial distribution
tends to the Poisson distribution.
29. Fit a Poisson distribution to the following data.
Number of mistakes per page | 0 | 1 | 2 | 3 | 4 | Total
Number of pages | 109 | 65 | 22 | 3 | 1 | 200
CORE COURSE III: STATISTICAL INFERENCE – I

Module 1. Sampling Distributions: Random sample from a population
distribution, sampling distribution of a statistic, standard error, sampling
from a normal population, sampling distributions of the sample mean and
variance. Chi-square, Student’s t and F distributions – derivations,
properties, uses and interrelationships. Central Limit Theorem for
independent and identically distributed random variables (Lindeberg-Levy
form).
30 hours
Module 2. Theory of Estimation: Point estimation, desirable properties of a
good estimator: unbiasedness, consistency, sufficiency. Fisher-Neyman
factorization criterion (statement and application only), efficiency, Cramer-Rao
inequality.
25 hours
Module 3. Methods of estimation – method of moments, method of
maximum likelihood, method of least squares. Properties of estimators
obtained by these methods – concept of Bayesian estimation.
20 hours
Module 4. Interval Estimation: Large sample confidence intervals for mean,
equality of means, proportions, equality of proportions. Derivation of exact
confidence intervals for means, equality of means, variance and ratio of
variances based on Normal, t, chi-square and F distributions.
15 hours
Books for reference

1. V.K. Rohatgi: An Introduction to Probability Theory and Mathematical
Statistics, Wiley Eastern.
2. S.C. Gupta and V.K. Kapoor: Fundamentals of Mathematical Statistics,
Sultan Chand and Sons.
3. Mood A.M., Graybill F.A. and Boes D.C.: Introduction to Theory of
Statistics, McGraw Hill.
4. John E. Freund: Mathematical Statistics (Sixth Edition), Pearson Education
(India), New Delhi.

Model Question Paper

B.Sc. Statistics (Main) Statistical Inference-I

Time 3hrs Core Course III Semester II

Part A
Answer all questions; each bunch of 4 questions carries weight 1
1. The mean of a Chi-square distribution with n degrees of freedom is
(a) $2n$ (b) $n^2$ (c) $n$ (d) $n$
2. The relation between Student’s t and F distributions is
(a) $t^2_{(n)} = F_{(n,1)}$ (b) $t^2_{(n)} = F_{(1,n)}$ (c) $t^2_{(1)} = F_{(1,n)}$ (d) $t^2_{(n)} = F_{(1,1)}$
3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population $N(\mu, \sigma^2)$. Then the
distribution of $\frac{\sum (x_i - \bar{x})^2}{\sigma^2}$ is
(a) $\chi^2_{(n)}$ (b) $t_{(n)}$ (c) $\chi^2_{(n-1)}$ (d) $t_{(n-1)}$

4. Let $X_1, X_2, \ldots, X_n$ be a random sample from an infinite population, where
$s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$. The unbiased estimator for the population variance $\sigma^2$ is
(a) $\frac{1}{n-1}s^2$ (b) $\frac{1}{n}s^2$ (c) $\frac{n}{n-1}s^2$ (d) $\frac{n-1}{n}s^2$
5. If T is a consistent estimator of θ, then
(a) $T$ is a consistent estimator of $\theta^2$ (b) $T^2$ is a consistent estimator of $\theta$
(c) $T^2$ is a consistent estimator of $\theta^2$ (d) None of the above

6. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Bernoulli population. A sufficient
statistic for p is
(a) $\sum X_i$ (b) $\prod X_i$ (c) $\max(X_1, X_2, \ldots, X_n)$ (d) $\min(X_1, X_2, \ldots, X_n)$

7. Let $X_1, X_2, \ldots, X_n$ be a random sample from $U(0, \theta)$. The m.l.e. of θ is
(a) $\sum X_i$ (b) $\prod X_i$ (c) $\max(X_1, X_2, \ldots, X_n)$ (d) $\min(X_1, X_2, \ldots, X_n)$

8. The 95% confidence interval for the mean µ of a normal population $N(\mu, \sigma^2)$ with
known $\sigma^2$ is
(a) $\bar{x} \pm 2.33\frac{\sigma}{\sqrt{n}}$ (b) $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$ (c) $\bar{x} \pm 2.58\frac{\sigma}{\sqrt{n}}$ (d) $\bar{x} \pm 1.65\frac{\sigma}{\sqrt{n}}$
9. The mean difference between 9 paired observations is 15 and the standard deviation of
differences is 5. Then the value of the t statistic used in the paired t test is
(a) 27 (b) 9 (c) 3 (d) 0
10. A sample of 12 specimens taken from a normal population is expected to have a mean of
50 mg/cc. The sample has a mean of 64 mg/cc with a variance of 25. To test
$H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$, you will choose
(a) Z-test (b) t-test (c) $\chi^2$-test (d) F-test

11. A random sample of size 20 from a normal population gives a mean of 42 and a
variance of 25. Then the value of the $\chi^2$ statistic used for testing the significance of
the population variance is
(a) 7.81 (b) 15.62 (c) 51.20 (d) 14.36
12. If X > 1 is the critical region for testing $H_0: \theta = 2$ against $H_1: \theta = 1$ on the basis of a
single observation from the population $f(x, \theta) = \theta e^{-\theta x}$, $x > 0$, then the value of the type I
error is
(a) $e$ (b) $e^2$ (c) $e^{-2}$ (d) $e^{-1}$
Part B
Answer all questions; each question carries weightage 1
13. Let $X_1, X_2$ be a random sample of size 2 from $N(0,1)$. Then the distribution of
$\frac{(X_1 + X_2)^2}{(X_1 - X_2)^2}$ is -------------
14. $T_n$ is a consistent estimator for the parameter θ if ------------
15. Let $X_1, X_2, X_3$ be a random sample of size 3 from $N(\mu, \sigma^2)$. The efficiency of
$\frac{X_1 + 2X_2 + X_3}{4}$ relative to $\frac{X_1 + X_2 + X_3}{3}$ is ------------
16. Let $X_1, X_2, \ldots, X_n$ be a random sample from the population with pdf
$f(x, \theta) = \frac{1}{2}e^{-|x - \theta|}$. The m.l.e. of θ is ---------
17. The diameter of a cylindrical rod is assumed to be normally distributed with a variance
of 0.04 cm². A sample of 25 rods has a mean diameter of 4.5 cm. The 95% confidence interval
for the population mean is -----------
18. The power of a test is ----------
19. The degrees of freedom for chi-square in the case of a contingency table of order 4x3 is ---
20. In tossing a coin, let the probability of a head turning up be p. The hypotheses are
$H_0: p = 0.4$ against $H_1: p = 0.6$. $H_0$ is rejected if there are five or more heads in six
tosses. Then the probability of type I error is ----------

Part C
Answer any 4 questions; each question carries a weightage of 2
21. Obtain the distribution of the sample mean of a random sample $X_1, X_2, \ldots, X_n$ of size n
from $N(\mu, \sigma^2)$.

22. Define unbiased estimator. Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from
$B(1, p)$. Let $T = \sum X_i$. Show that $\frac{T(T-1)}{n(n-1)}$ is an unbiased estimator of $p^2$.

23. Define sufficient statistic. Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from
$U(0, \theta)$. Find a sufficient statistic for θ.

24. An oil company claims that less than 20% of all car owners have not tried its gasoline.
Test this claim at the 0.01 level of significance if a random check reveals that 22 out of
200 car owners have not tried the oil company’s gasoline.
25. In the comparison of two kinds of paint, a consumer testing service finds that four
1-gallon cans of one brand cover on the average 546 square feet with a standard deviation of
31 square feet, whereas four 1-gallon cans of another brand cover on the average 492
square feet with a standard deviation of 26 square feet. Assuming that the two populations
sampled are normal and have equal variance, test the hypothesis that on the average the
first kind of paint covers a greater area than the second.
26. Mention the advantages of non-parametric tests over parametric tests.

Part D
Answer any 2 questions; each question carries a weightage of 4
27. Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from $N(\mu, \sigma^2)$. Find the m.l.e.s of
$\mu$ and $\sigma^2$ and examine whether they are unbiased and consistent.

28. Explain interval estimation. Obtain the $100(1 - \alpha)\%$ confidence interval for the
parameter $\sigma^2$ of the normal population $N(\mu, \sigma^2)$.

29. State the Cramer-Rao inequality and give an example of its application.

CORE COURSE IV: STATISTICAL INFERENCE – 2
Module 1. Testing of Hypotheses: concept of testing hypotheses, simple and
composite hypotheses, null and alternative hypotheses, type I and type II
errors, critical region, level of significance and power of a test, most
powerful test, Neyman-Pearson theorem and its simple applications. Concept
of p value.
35 hours
Module 2. Large sample tests concerning mean, equality of means,
proportions, equality of proportions. Small sample tests based on the t
distribution for mean, equality of means and paired mean for paired data.
Tests based on the F distribution for ratio of variances. Tests based on the chi-square
distribution for variance, goodness of fit, independence of attributes
and homogeneity of proportions. Test for correlation coefficients – Z
transformation.
35 hours
Module 3. Non-parametric tests: Basic idea of distribution-free methods.
Kolmogorov-Smirnov test, one-sample and two-sample sign tests, Wilcoxon
matched pairs signed rank test, Kruskal-Wallis test and test for randomness
(run test).
20 hours
Books for reference

1. V.K. Rohatgi: An Introduction to Probability Theory and Mathematical
Statistics, Wiley Eastern.
2. Goon A.M., Gupta M.K. and Das Gupta: Fundamentals of Statistics, Vol. I,
The World Press, Calcutta.
3. S.C. Gupta and V.K. Kapoor: Fundamentals of Mathematical Statistics,
Sultan Chand and Sons.
4. Gibbons J.D.: Non parametric Methods for Quantitative Analysis, McGraw
Hill.
5. John E. Freund: Mathematical Statistics (Sixth Edition), Pearson Education
(India), New Delhi.

Model Question Paper

B. Sc. Statistics (Main)


IV Semester
COURSE IV: STATISTICAL INFERENCE – 2

Time: 3 Hrs

PartA
(Answer all questions)
(Contains 12 questions, 4 questions carry a weightage of 1)

1. In a chi-square contingency table with 3 rows and 5 columns, the d.f of chi-square
statistic is
a) 15
b) 24
c) 8
d) 7

2. The chi-square test statistic for a goodness of fit test is given by:
a) $\frac{O_i - E_i}{E_i}$
b) $\sum \frac{O_i - E_i}{E_i^2}$
c) $\sum \frac{(O_i - E_i)^2}{E_i^2}$
d) $\sum \frac{(O_i - E_i)^2}{E_i}$

3. In a Poisson goodness of fit test having ‘k’ sets of observed frequencies with estimated
value of λ , the chi-square statistic has d.f.
a) k-2
b) k
c) k-1
d) k-2

4 The basic assumption for a non-parametric test is:


a) The variable is continuous
b) The variable is discrete
c) The variable is normal
d) The variable is standard normal
5. The non-parametric equivalent test for a paired t-test is:
a) Signed Rank test
b) Rank sum test
c) Run test
d) Sign test

6. The test used to check the randomness of the collected set of symbols is:
a) Sign test
b) Rank sum test
c) Signed rank test
d) Run test

7 When there are 3 groups, each following normal distribution, and the null hypothesis is
concerned with the equality of means the test used is:
a) Chi square test
b) t-test for equality of means
c) Analysis of variance
d) none of the above

8. The mean of a Chi-square distribution with n degrees of freedom is
(a) $2n$ (b) $n^2$ (c) $n$ (d) $n$
9. The relation between Student’s t and F distributions is
(a) $t^2_{(n)} = F_{(n,1)}$ (b) $t^2_{(n)} = F_{(1,n)}$ (c) $t^2_{(1)} = F_{(1,n)}$ (d) $t^2_{(n)} = F_{(1,1)}$

10. The mean difference between 9 paired observations is 15 and the standard deviation of
differences is 5. Then the value of the t statistic used in the paired t test is
(a) 27 (b) 9 (c) 3 (d) 0
11. A sample of 12 specimens taken from a normal population is expected to have a mean of
50 mg/cc. The sample has a mean of 64 mg/cc with a variance of 25. To test
$H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$, you will choose
(a) Z-test (b) t-test (c) $\chi^2$-test (d) F-test

12. If X > 1 is the critical region for testing $H_0: \theta = 2$ against $H_1: \theta = 1$ on the basis of a
single observation from the population $f(x, \theta) = \theta e^{-\theta x}$, $x > 0$, then the value of the type I
error is
(a) $e$ (b) $e^2$ (c) $e^{-2}$ (d) $e^{-1}$

Part B (Answer all questions) Weightage 1

13. In chi-square test of independences of 2 attributes with 2 observations each, the d.f of
the test statistic is 1.

a) Say true or false.


b) Explain your answer.

14 In the case of sign test, the test statistic follows a binomial distribution.

a) Say true or false.


b) Explain your answer.

15 In χ 2 test of goodness of fit if the calculated value of χ 2 is zero, then it is a bad fit.

a) Say true or false.


b) Explain your answer.

c) Let $X_1, X_2$ be a random sample of size 2 from $N(0,1)$. Then the distribution of
$\frac{(X_1 + X_2)^2}{(X_1 - X_2)^2}$ is -------------

16. The power of a test is ----------
17. The degrees of freedom for chi-square in the case of a contingency table of order 4x3 is ---
18. In tossing a coin, let the probability of a head turning up be p. The hypotheses are
$H_0: p = 0.4$ against $H_1: p = 0.6$. $H_0$ is rejected if there are five or more heads in six
tosses. Then the probability of type I error is ----------
19. Define Type II error.
20. Write down the test statistic of the paired t test, naming the notations.

Part C (Answer 4 questions out of 6) weight 2

21. What is the null hypothesis for a chi-square test of homogeneity of proportions? Give
the layout of observations.

22. Mention the advantages of non-parametric tests over parametric tests.

23. Give an example of a paired t test. Give the test statistic and explain the notations.

24. An oil company claims that less than 20% of all car owners have not tried its gasoline.
Test this claim at the 0.01 level of significance if a random check reveals that 22 out of
200 car owners have not tried the oil company’s gasoline.
25. In the comparison of two kinds of paint, a consumer testing service finds that four
1-gallon cans of one brand cover on the average 546 square feet with a standard
deviation of 31 square feet, whereas four 1-gallon cans of another brand cover on the
average 492 square feet with a standard deviation of 26 square feet. Assuming that the two
populations sampled are normal and have equal variance, test the hypothesis that on the
average the first kind of paint covers a greater area than the second.
26. Mention the advantages of non-parametric tests over parametric tests.

Part D (Answer 2 questions out of 3) weight 4

27. A factory operates in three shifts. The factory manager feels that the quality of parts is
related to shifts. For this purpose he has collected the following data from the past
records of production.

No. of Parts
Shift | Good | Bad
Day | 900 | 130
Evening | 700 | 170
Night | 400 | 200

Test whether the quality of parts produced is independent of shifts.

28. Fifteen patient records from each of two hospitals were received and assigned a score
designed to measure level of care. The scores were as follows:

Hospital A: 99 85 73 98 83 88 99 80 74 91 80 94 94 98 80
Hospital B: 78 74 69 79 57 78 79 68 59 91 89 55 60 55 79

Use a proper non-parametric test to see whether the two populations are identical with
respect to the level of care.

29. Describe Kuder-Richardson’s method of assessing the reliability of a test.

CORE COURSE V: MATHEMATICAL METHODS IN STATISTICS

Module 1. Real valued functions: Limit, continuity and differentiability of
real valued functions of one variable. Uniform continuity, Rolle’s theorem,
Mean Value theorem and Taylor’s theorem – Maclaurin’s theorem – expansion
of a function as a power series – simple examples.
30 hours
Module 2. Riemann Integral: Definition, integrability of continuous
functions, monotonic functions. Properties of integrals. First mean value
theorem and fundamental theorem of integral calculus.
20 hours
Module 3. Complex Numbers: Analytic functions – Cauchy-Riemann
equations – Cauchy’s integral formula – Taylor and Laurent series
expansions – fundamental theorem of algebra – poles and singularities –
contour integration – simple problems.
40 hours
Books for reference

1. Malik S.C.: Principles of Real Analysis, New Age International.
2. Shanti Narayan: A Course of Mathematical Analysis.
3. Shanti Narayan: Elements of Real Analysis.
4. Rudin W.: Principles of Mathematical Analysis.
5. Kasana H.S.: Complex Variables, Prentice Hall.
6. Kreyszig: Engineering Mathematics.

Model Question Paper

B.Sc. STATISTICS
Semester III
Core Course V – Mathematical Methods
Part A
(Answer all questions) weight 1 for a bunch of 4 questions
1. The value of $\lim_{x \to 0} \frac{e^{1/x}}{1 + e^{1/x}}$ is
a) 0 b) 1 c) .2 d) does not exist
2. If $\lim_{x \to c} f(x)$ exists and $\lim_{x \to c} f(x) \ne f(c)$, then f(x) has
a) Discontinuity of first kind at x = c b) Discontinuity of second kind at x = c
c) Removable discontinuity at x = c d) None of these

3. If f(x) = 1 when x is irrational and −1 when x is rational, then

4. The function f(x) = $x^2$
a) Is not uniformly continuous on (-1,1)
b) Is uniformly continuous on (-1,1)
c) Has removable discontinuity at x = 0
d) Has discontinuity of first kind at x = 0
5. A function which is continuous on a ………………. interval is also uniformly
continuous on that interval
a) Open b) Closed c) Left open d) Right open
6. The function f(x) = |x| is
a) Differentiable at every point on R b) Differentiable on (-1,1)
c) Not differentiable on x > 0 d) Not differentiable at x = 0
7. The function defined by f(x) = $x \sin\frac{1}{x}$ for x ≠ 0 and f(x) = 0 for x = 0 is
a) Not continuous and derivable at x = 0
b) Derivable but not continuous at x = 0
c) Continuous but not derivable at x = 0
d) Continuous and derivable at x = 0
8. If f(x) is derivable at x = c and f(c) ≠ 0, then
a) $\frac{1}{f(x)}$ is not derivable at x = c b) $\frac{1}{f(x)}$ is derivable at x = c
c) $\frac{1}{f(x)}$ is not derivable at x ≠ c d) $\frac{1}{f(x)}$ is not derivable at x ≥ 0
9. The function defined by f(x) = 0 when x is rational and 1 when x is irrational
a) Is integrable on any interval on R
b) Is not integrable on any interval on R
c) Is integrable on (0, ∞)
d) Is not integrable on (0, ∞)
10. If f(x) is integrable on (a,b), then
a) |f(x)| is also integrable on (a,b)
b) |f(x)| is not integrable on (a,b)
c) |f(x)| is integrable on (a,b) only if a ≠ 0
d) Cannot say about the integrability of |f(x)| on (a,b)
11. If $\int_a^b f(x)\,dx = F(b) - F(a)$, then F(.) is called
a) Integral of f(x) b) Upper sum limit of f(x)
c) Primitive of f(x) d) Refinement of f(x)
12. If f(x) and g(x) are integrable on (a,b), then
a) f+g is integrable whereas f−g is not integrable on (a,b)
b) Both f+g and f−g are not integrable on (a,b)
c) Cannot say about the integrability of f+g and f−g on (a,b)

Part- B
( Answer all questions) weight 1
13. Define uniform continuity
14. State Rolle’s Theorem
15. Write Taylor’s series of f(x) in powers of (x − a)
16. What is meant by a partition of an interval?
17. When do you say that the integral of f(x) exists?
18. What do you mean by analytic functions?
19. State Cauchy’s integral formula.
20. Define contour

Part- C (Answer any 4 questions) weight 2


21. Discuss the continuity of
$f(x) = \begin{cases} -x^2, & x \le 0 \\ 5x - 4, & 0 < x \le 1 \\ 4x^2 - 3x, & 1 < x \le 2 \\ 3x + 4, & x > 2 \end{cases}$
22. Examine Lagrange’s Mean value theorem for $f(x) = 2x^2 - 7x + 10$ for 2 ≤ x ≤ 5
23. State and prove the first mean value theorem of integral calculus.
24. Evaluate $\int_1^2 (3x + 1)\,dx$ by partitioning the range into n subintervals of length 1/n
25. Examine whether $f(z) = (x^2 - y^2) + i(2xy)$ satisfies the C-R equations
26. Explain singularities of a complex function
Part- D (Answer any 2 questions) weight 4
27. State and prove Rolle’s Theorem. Verify Rolle’s Theorem for $f(x) = x^2 - 4x$ on (-2,2)
28. Obtain an infinite series expansion of log (1+x) and log (1-x) using Maclaurin’s
expansion.
29. State and prove Cauchy’s Integral formula.

CORE COURSE VI
INFORMATICS AND NUMERICAL MATHEMATICS

Module 1. Programming in C: Algorithm and flow charts – structure of C
programme – executing the C programme
10 hours
Module 2 Numeric constants, variables and data types. Arithmetic operators
and expressions – managing input/output operations
10 hours
Module 3. Conditional operators: Relational operators, loops, one –
dimensional and two dimensional arrays, logical operators and expressions
10 hours
Module 4. Functions: Library functions – mathematical functions – defining
and using functions.
10 hours
Module 5. Simple programmes – summation of series – solution of quadratic
equation – matrix addition and multiplication. Calculation of mean, median,
variance, covariance, correlation and regression coefficients.

20 hours
Module 6. Numerical Analysis: Operators E and Delta and their basic
properties. Divided differences. Interpolation formulae: Newton’s forward
and backward formulae, Lagrange’s formula, Newton’s divided difference
formula. Numerical Integration: Trapezoidal rule, Simpson’s 1/3rd and 3/8th
rules and Weddle’s rule
30 hours
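
The trapezoidal rule and Simpson’s 1/3rd rule listed under numerical integration can be sketched as follows. This is a minimal illustration written in Python rather than C purely for brevity; the function names and the test integrand $x^3$ on [0, 1] are assumptions chosen for the example.

```python
def trapezoidal(f, a, b, n):
    """Composite trapezoidal rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))  # end ordinates carry weight 1/2
    for i in range(1, n):
        total += f(a + i * h)    # interior ordinates carry weight 1
    return total * h


def simpson_one_third(f, a, b, n):
    """Composite Simpson's 1/3rd rule; n must be even."""
    if n % 2 != 0:
        raise ValueError("Simpson's 1/3rd rule needs an even number of subintervals")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # odd-numbered ordinates get weight 4, even-numbered ones weight 2
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return total * h / 3


def cube(x):
    """Test integrand: the integral of x**3 over [0, 1] is exactly 0.25."""
    return x ** 3


trap_value = trapezoidal(cube, 0.0, 1.0, 4)
simpson_value = simpson_one_third(cube, 0.0, 1.0, 4)
```

Simpson’s 1/3rd rule is exact for polynomials up to degree three, so it reproduces 0.25 here, while the trapezoidal rule with the same four subintervals overestimates slightly.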
Books for reference

1. B. Ramu : Computer Fundamentals, Wiley Eastern

2. V. Rajaram: Fundamentals of Computers, Prentice Hall

3. E. Balagurusamy : Programming in C, Tata McGraw Hill

4. V. Rajaram : Programming in C, Tata McGraw Hill

5. James Scarborough : Numerical Mathematical Analysis, Oxford IBH
Publishing Company

6. Milne – Thomson : Calculus of finite differences

Model Question Paper

B.Sc. STATISTICS
Semester III
Core Course VI – Informatics and Numerical Mathematics
Time 3hrs

Part A (Answer all questions)
Weight 1 for a bunch of 4 questions

1. In a C program, a comment line is provided by using
a. Characters */………*/ b. Characters **…………..**
c. Characters /*……….*/ d. Characters //………….//
2. Which is a valid C decimal integer constant
a. 46,711 b. 123.00 c. 0624 d. – 5126
3. Choose the correct variable name allowed in C
a. Sum – 1 b. Sum–1 c. Sum,1 d. Sum .1
4. Which of the following data types is not allowed in C
a. float b. double c. char d. real

5. In C language the meaning of scanf is
a. Reading function b. Writing function
c. Reading in a formatted way d. Writing in a formatted way
6. In C, all input/output functions are stored in
a. stdio.h b. conio.h c. stdlib.h d. ctype.h
7. The operator used in scanf along with the variable name
a. ? b. /t c. & d. 01

8. The expression (EI) && (EII) is true when
a. Both EI and EII are true b. EI is true but EII is not true
c. EII is true but EI is not true d. EI or EII is true
9. A = pow(x, y) returns

10. The relation between E and ∆ is
a. E = 1 − ∆ b. ∆ = E + 1 c. ∆ + E = 1 d. E − ∆ = 1
11. Which of the following formulae is independent of the difference table
a. Lagrange’s b. Newton’s c. Gauss’s d. Stirling’s
12. For applying Newton’s forward formula, the origin of the arguments should be shifted to
a. X0 b. Xn c. Middle of X0, X1, …, Xn d. Arbitrary point

Part- B (Answer all questions in a sentence/phrase) Weight 1

13. What is a symbolic constant used in C?
14. What is the use of getchar in C?
15. Construct an expression for finding the true status of (A×B)/(C×D) if A is positive
and C or D is non-zero.
16. Write down the statement for storing
i) Marks of 100 students of a school ii) a 3×7 matrix in proper syntax
17. Distinguish between branching and looping.
18. Write down Newton’s divided difference formula.
19. If $\int_a^b f(x)\,dx$ is to be evaluated by dividing the range into n equal parts, then h is?
20. Write down Weddle’s rule for integration.
Part- C (Answer any 4 questions )Weight 2
21. Explain briefly the structure of a C program
22. Write a program to output the A.M. of a set of 10 observations.
23. Construct a loop to find $S = \sum_{i=1}^{4} \sum_{j=-1}^{5} \sum_{k=1}^{6} (i^2 + j^2 + k^2)$ using the while
statement
24. Find the value of f(a) if f(x) is
x:     2    4    7    12
f(x): -4   16  196  1296 using Lagrange’s formula
25. In usual notations, show that i) $(1+\Delta)(1-\nabla) = 1$ ii) $E^{-1} = 1 - \nabla$
Part- D (Answer any 2 questions ) weight 4

27. Write a program in C language to obtain the variance, covariance and correlation coefficient
of bivariate data.

28. i) Write a program in C language to obtain the value of
$S = 1 + \dfrac{1}{1 \times 2} + \dfrac{1}{2 \times 3} + \dfrac{1}{3 \times 4} + \cdots$
ii) Derive Simpson’s 1/3rd rule of numerical integration.

29. Given
Weight (lbs) :    20-40  40-60  60-80  80-100  100-120
No. of Students :   25    120    100     70      30
Estimate i) the number of students having weight less than 32 lbs.
ii) the number of students having weight more than 105 lbs.

CORE COURSE VII: SAMPLE SURVEYS

1. Module 1. Census and sampling, principal steps in a sample survey –
probability sampling, judgement sampling, organisation and execution of
large scale sample surveys – sampling and non sampling errors, preparation of
questionnaire
20 hours
2. Module 2. Simple random sampling with and without replacement, methods
of collecting simple random samples – unbiased estimates of the population
mean and population total – their variances and estimates of these variances
–simple random sampling for proportions.
20 hours

3. Module 3. Stratified random sampling: Estimation of the population mean
and population total – proportional and Neyman allocation of sample sizes –
cost function – optimum allocation considering cost – comparison with
simple random sampling.
20 hours
4. Module 4. Systematic sampling: Linear and circular systematic sampling –
comparison of systematic sampling with simple random sampling.
10 hours

5. Module 5. Cluster sampling: Clusters with equal sizes – estimation of
population mean and total – comparison with simple random sampling – two
stage cluster sampling – estimate of the variance of the population mean.
20 hours
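
The variance of the sample mean under simple random sampling without replacement, $V(\bar{y}) = \dfrac{N-n}{N} \cdot \dfrac{S^2}{n}$, can be checked by brute force on a small population. The sketch below is a minimal Python illustration that uses the population 1, 2, …, 6 appearing in the model question paper; the sample size n = 2 and all variable names are assumptions made for the example.

```python
from itertools import combinations
from statistics import mean

population = [1, 2, 3, 4, 5, 6]
N = len(population)
n = 2  # assumed sample size for the illustration

pop_mean = mean(population)
# S^2 uses the divisor N - 1, sigma^2 uses the divisor N
S2 = sum((y - pop_mean) ** 2 for y in population) / (N - 1)
sigma2 = sum((y - pop_mean) ** 2 for y in population) / N

# Every possible srswor sample of size n is equally likely
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

# Variance of the sample mean by direct enumeration over all samples
var_by_enumeration = sum((m - pop_mean) ** 2 for m in sample_means) / len(samples)

# Variance of the sample mean from the textbook formula
var_by_formula = (N - n) / N * S2 / n
```

With these assumptions there are 15 possible samples, the mean of the sample means equals the population mean 3.5, and the two variances agree at 7/6.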

Books for reference

1. Murthy M.N.: Sampling Theory and Methods, Statistical Publishing Society,
Calcutta.
2. Cochran W.G.: Sampling Techniques, Wiley Eastern.
3. S.C. Gupta and V.K. Kapoor : Fundamentals of Mathematical Statistics,
Sultan Chand and Sons
4. Daroga Singh and F.S. Chaudhary: Theory and Analysis of Sample Survey
Designs, Wiley Eastern Limited.

Model Question Paper

B.Sc. STATISTICS
Semester IV
CORE COURSE VII SAMPLE SURVEYS

Part A (answer all questions)


Bunch of 4 questions has weight 1

1. Sampling is inevitable in the situation:


(a) Blood test of a person
(b) When the population is infinite
(c) Testing of life of dry battery cells
(d) all the above
2. Probability of drawing a unit at each selection remains same in:
(a) srswor (b) srswr
(c) both (a) and (b) (d) neither (a) nor (b)
3. Probability of including a specified unit in a sample of size n selected out of N units is:
(a) 1/n (b) 1/N
(c) n/N (d) N/n
4. In simple random sampling with replacement, the same
sampling unit may be included in the sample:
(a) only once (b) only twice
(c) more than once (d) none of the above
5. Greatest drawback of systematic sampling is that:
(a) One requires a large sample
(b) data are not easily accessible
(c) no single reliable formula for standard error of mean is available
(d) None of the above

6. Which of the following statements is correct?
(a) Systematic sample is superior to stratified random sample
(b) Simple random sample is inferior to systematic sample
(c) Stratified random sample is better than systematic sample
(d) none of the above
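7. In srswor the variance of the sample mean is
(a) $\dfrac{S^2}{n} \cdot \dfrac{N-n}{N}$ (b) $\dfrac{S^2}{N} \cdot \dfrac{N-n}{n}$ (c) $\dfrac{S^2}{n} \cdot \dfrac{N-n}{N^2}$ (d) $\dfrac{S^2}{n} \cdot \dfrac{n-N}{N}$

8. In srswor Var(p) is
(a) $\dfrac{PQ}{n} \cdot \dfrac{N-n}{N-1}$ (b) $\dfrac{PQ}{n} \cdot \dfrac{N-1}{N-n}$ (c) $\dfrac{PQ}{N} \cdot \dfrac{N-n}{N-1}$ (d) $\dfrac{PQ}{n} \cdot \dfrac{N-n}{N_1-1}$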
9. The total number of samples of size n = 2 from a population of N = 6 is:

(a) 12 (b) 15 (c)18 (d) 10

10. Consider a population of 6 units with values 1, 2, 3, 4, 5, 6. The
value of $S^2$ is:
(a) 3 (b) 3.5 (c) 4 (d) 5
11. Consider a population of 6 units with values 1, 2, 3, 4, 5, 6. The
value of $\sigma^2$ is:
(a) 2.9 (b) 2.8 (c) 2.7 (d) 3

12. Consider a population of 6 units with values 1, 2, 3, 4, 5, 6. The
variance of the sample mean is:
(a) 1.2 (b) 1.1 (c) 1.3 (d) 1

PART-B

Answer all questions weight 1

13. Optimum allocation is also known as………...


14. Cluster sampling helps to ……….cost of the survey.
15. Precision of estimates …………….by proper stratification.
16. Non sampling error arises due to ………..of data.
17. A sample of 30 students is to be drawn from a population consisting of 300 students belonging to
two colleges of strength 200 and 100 respectively. What are the values of n1 and n2 if we use
proportional allocation?
18. In systematic sampling with N = 40 and n = 4, what is k?
19. If the population consists of a linear trend, state the relationship of the variances of the sample means.
20. In cluster sampling the population is divided into…….

PART-C
(Answer any four questions) weight 2
21. Explain the concept of stratified sampling.
22. What is the difference between cluster and systematic sampling?
23. Derive the expression for the variance of the sample mean in srswor.
24. Show that the sample mean is an unbiased estimate of the population mean.
25. What are the advantages of sampling over census?
26. List out the simple random samples for the data given in question
PART – D
(Answer any two questions) weight 4

27. What are the principal steps in a sample survey?

28. Show that $V(\bar{y}_{sys}) = \dfrac{nk-1}{nk} \cdot \dfrac{S^2}{n} \, [1 + (n-1)\rho]$, where ρ is the intraclass correlation between
the units of the same systematic sample.
29. In question No. 17, if the means are 30 and 60 and the standard deviations are 10 and 40
respectively, obtain the variance of the sample mean and compare its efficiency with srswor.

CORE COURSE VIII: OPERATIONS RESEARCH AND STATISTICAL
QUALITY CONTROL

1. Module 1. Linear programming: Mathematical formulation of LPP,
Graphical and Simplex methods of solving LPP – duality in linear
programming
20 hours
2. Module 2. Transportation and Assignment problems: North – west corner
rule, row column and least cost method – Vogel’s approximation method.
Assignment problem – Hungarian algorithm of solution
20 hours
3. Module 3. General theory of control charts, causes of variations in quality,
control limits, sub grouping, summary of out-of-control criteria, charts for
attributes: np chart, p chart, c chart. Charts for variables: X bar chart, R chart
and sigma chart. Revised control charts. Applications and advantages.
25 hours
4. Module 4. Principles of acceptance sampling – Problems of lot acceptance,
stipulation of good and bad lots – producer’s and consumer’s risks, single
and double sampling plans, their OC functions, concepts of AQL, LTPD,
AOQL, Average amount of inspection and ASN function
25 hours
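
As an illustration of the control limits for a p chart, the following minimal Python sketch computes the 3-sigma limits $\bar{p} \pm 3\sqrt{\bar{p}(1-\bar{p})/n}$. The function name `p_chart_limits`, the average fraction defective 0.10 and the subgroup size 100 are hypothetical values chosen only for the example.

```python
import math


def p_chart_limits(p_bar, n):
    """Return (LCL, centre line, UCL) of a 3-sigma p chart.

    p_bar : average fraction defective over the subgroups
    n     : common subgroup size
    """
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    lcl = max(0.0, p_bar - 3 * sigma)  # a fraction cannot fall below 0
    ucl = min(1.0, p_bar + 3 * sigma)  # nor rise above 1
    return lcl, p_bar, ucl


# Hypothetical process: 10% defectives on average, subgroups of 100 items
lcl, centre, ucl = p_chart_limits(0.10, 100)
```

Under these assumed values the standard error is 0.03, so the limits come out at about 0.01 and 0.19; a subgroup fraction outside that band would signal an assignable cause.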

Books for reference

1. Gupta and Manmohan : Linear Programming, Sultan Chand and Sons

2. Hadley. G. : Linear Programming, Addison – Wesley

3. Taha: Operations Research, Macmillan

4. V.K. Kapoor: Operations Research, Sultan Chand & Sons

5. S.C. Gupta & V.K.Kapoor: Fundamentals of Applied Statistics, Sultan
Chand & Sons

Model Question Paper

B.Sc. STATISTICS
Semester IV
CORE COURSE VIII OPERATIONS RESEARCH AND
STATISTICAL QUALITY CONTROL
Part A
Time 3hrs Answer all questions (Weight 1 for bunch of 4)

1. In the graphical solution of an LPP, the optimum solution lies
a) in a convex set b) outside a convex set c) at an extreme point of the convex set d)
none of them.
2. Transportation problem is.
a) a lpp b) an assignment problem c) Quadratic programming problem d) Dynamic
programming problem.
3. VAM method is used to solve a
a) Assignment problem b) transportation problem c) usual lpp d) none of them

4. Dual of a dual is
a) slack b) surplus c) artificial d) primal.
5. In a control chart the manageable cause is
a) assignable cause b) random cause c) chance cause d) none of them
6. A control chart for fraction defectives is said to be in control if the points lie within
a) $\bar{X} \pm 3\sigma$ b) $p' \pm 3np'q'$ c) $p' \pm 3\sqrt{p'q'/n}$ d) $c \pm 3\sqrt{c}$
7. The spread of a process is given by
a) 3σ b) 6σ c)2σ d) 1.96σ
8. Upper control limit for R Chart is
a) $A_2\bar{R}$ b) $A_1\bar{R}$ c) $D_3\bar{R}$ d) $D_4\bar{R}$
9. Consumers risk is usually denoted by
a) µ b)∂ c) β d) α
10. The acceptance sampling plan is used for
a) Identifying good lots b) protecting the consumers interest c) protecting the producers
interest
d) All of the above
11. The Consumers risk usually fixed at
a) .05 b).01 c).95 d) .99
12. The OC curve gives
a) proportion of bad lots b) proportion of good lots c) discriminating power of the
sampling plan
d) none of them.
Part B ( answer all questions ,weight 1)
13. The inequality constraints are made equalities in an LPP using ---------- variables
14. If the objective of a maximization LPP can be increased infinitely, the problem is said to have
---------- solutions
15. A sampling plan in which we take a decision based on one sample only is called ----------
16. In a non degenerate transportation problem with m rows and n columns the number of
allocations will be ----------
17. Expand the term LTPD
18. The method used to solve an assignment problem is called ----------
19. An artificial variable is used for ----------
20. Chart used for number of defects is based on ---------- distribution
Part C ( answer 4 questions, Weight 2)
21. Define AOQ and LTPD.
22. Define the Linear programming problem.
23. What is double sampling plan?
24. Write the assignment problem as an lpp.
25. What are probability limits?

26. What is an unbalanced transportation problem?
Part D ( answer any 2 questions, weight 4)
27. Distinguish between double and single sampling plans.
28. Draw the OC curve of the single sampling plan showing the consumers and producers
risks.
29. Find the initial basic feasible solution of the following transportation problem. There
are four origins and three destinations. The availabilities are 9, 10, 8, 7 and the requirements
are 17, 10, 7 respectively.

A B C

D 2 3 2
E 1 3 4
F 2 3 1
G 2 4 3
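
The North-West corner rule named in the syllabus can be sketched on the data of question 29 as follows. This is a minimal Python illustration, not a prescribed solution method; the function name `north_west_corner` and the variable names are assumptions, and the rule gives only an initial basic feasible solution, not necessarily an optimal one.

```python
def north_west_corner(supply, demand):
    """Initial allocation by the North-West corner rule for a balanced problem."""
    supply = list(supply)  # work on copies so the inputs are left untouched
    demand = list(demand)
    allocation = [[0] * len(demand) for _ in supply]
    i = j = 0
    while i < len(supply) and j < len(demand):
        qty = min(supply[i], demand[j])  # ship as much as the cell allows
        allocation[i][j] = qty
        supply[i] -= qty
        demand[j] -= qty
        if supply[i] == 0:
            i += 1  # origin exhausted: move down one row
        else:
            j += 1  # destination satisfied: move right one column
    return allocation


# Unit costs from question 29: rows are origins D, E, F, G; columns are A, B, C
costs = [[2, 3, 2], [1, 3, 4], [2, 3, 1], [2, 4, 3]]
availabilities = [9, 10, 8, 7]
requirements = [17, 10, 7]

allocation = north_west_corner(availabilities, requirements)
total_cost = sum(costs[r][c] * allocation[r][c]
                 for r in range(len(costs)) for c in range(len(costs[0])))
```

With these data the rule allocates 9 to (D, A), 8 to (E, A), 2 to (E, B), 8 to (F, B) and 7 to (G, C), at a total cost of 77. Only five cells are occupied instead of m + n − 1 = 6, so this starting solution is degenerate.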

1. Grant E.L. : Statistical Quality Control, McGraw Hill

2. Duncan A.J.: Quality Control and Industrial Statistics, Taraporewala Sons

3. Montgomery D.C: Introduction to Statistical Quality Control, John Wiley &
Sons

CORE COURSE IX : TIME SERIES AND INDEX NUMBERS

1. Module 1. Time Series analysis: Economic time series, different
components, illustrations, additive and multiplicative models,
determination of trend, growth curves, analysis of seasonal fluctuations,
construction of seasonal indices.
25 hours

2. Module 2. Analysis of income and allied distributions – Pareto distribution,
graphical test, fitting of Pareto’s law, illustrations, log normal
distribution and properties. Lorenz curve, Gini’s coefficient.
20 hours

3. Module 3. Index Numbers: Meaning and definition – uses and types –
problems in the construction of index numbers – simple aggregate and
weighted aggregate index numbers. Test of consistency of index
numbers- factor reversal- time reversal test and unit test. Chain base
index numbers- Base shifting- splicing- and deflating of index numbers.
Consumer price index numbers- family budget enquiry- limitations of
index numbers.
30 hours
4. Module 4. Attitude Measurements and scales: Issues in attitude
measurements-scaling of attitude-Guttman scale-Semantic differential
scale-the Likert Scale- selection of appropriate scale- limitations of scales
15 hours
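
The weighted aggregate index numbers and the time reversal test in the index numbers module can be illustrated with a short Python sketch. The two-commodity price and quantity data below are hypothetical, and the function names are assumptions made for the example.

```python
import math


def laspeyres(p0, q0, p1):
    """Laspeyres price index: base-year quantities as weights."""
    return 100 * sum(a * b for a, b in zip(p1, q0)) / sum(a * b for a, b in zip(p0, q0))


def paasche(p0, p1, q1):
    """Paasche price index: current-year quantities as weights."""
    return 100 * sum(a * b for a, b in zip(p1, q1)) / sum(a * b for a, b in zip(p0, q1))


def fisher(p0, q0, p1, q1):
    """Fisher's ideal index: geometric mean of Laspeyres and Paasche."""
    return math.sqrt(laspeyres(p0, q0, p1) * paasche(p0, p1, q1))


# Hypothetical data for two commodities
base_prices, base_qty = [2, 3], [10, 5]
current_prices, current_qty = [3, 4], [8, 6]

L01 = laspeyres(base_prices, base_qty, current_prices)
P01 = paasche(base_prices, current_prices, current_qty)
F01 = fisher(base_prices, base_qty, current_prices, current_qty)

# Time reversal test: interchange base and current periods
F10 = fisher(current_prices, current_qty, base_prices, base_qty)
time_reversal_product = (F01 / 100) * (F10 / 100)
```

For these assumed data the Laspeyres index is about 142.9 and the Paasche index about 141.2, with Fisher’s index between them. The product of the forward and backward Fisher indices (as ratios) equals 1, which is the time reversal test.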

Books for reference


1. SC Gupta and V.K. Kapoor: Fundamentals of Applied Statistics, Sultan
Chand & Sons
2. Goon A.M., Gupta M.K. and Das Gupta: Fundamentals of Statistics Vol.II
The World Press, Calcutta.
3. Box, G.E.P. and G.M. Jenkins: Time Series Analysis, Holden –Day
4. Meister David: Behavioural Analysis and Measurement Methods, John
Wiley, New York
5. Luck D.J. et al: Marketing Research, Prentice Hall of India, New Delhi

Model Question Paper

B.Sc. STATISTICS
Semester IV
CORE COURSE- IX TIME SERIES AND INDEX NUMBERS
Part A
Time 3hrs Answer all questions (Weight 1 for bunch of 4)

1. Gini coefficient is a measure of

a) Statistical dispersion b) kurtosis c) skewness d) none

2. Lorenz curve is a graphical representation of

a) Cumulative distribution function b) probability density function c) income
distribution d) none.

3. ‘Business cycle’ is an example of

a) Trend b) Seasonal Variation c) Cyclic variation d) Random variation

4. In the method of semi-averages, the trend is assumed to be

a) Linear b) quadratic c) Exponential Growth d) None of these

5. Which of the method can be used for getting trend values for each given time point

a) Method of simple averages b) Method of moving averages

c) Method of least squares curve fitting d) All the above

6. Non- centered moving averages are due to

a) Odd period b) Even period

c) Odd number of time points d) Even number of time points

7. Seasonal variations are periodic due to

a) Man made customs, habits, rituals etc

b) Resulting due to Natural reasons

c) Resulting due to change in weather condition

d) Any force that operate regularly year after year

8. Seasonal variation is measured using

a) Seasonal Averages b) Seasonal Indices

c) Seasonal Relatives d) None of these

9. Monthly seasonal variation measures are adjusted to

a) 12 b) 120 c) 1200 d) None of these

10. A model of time- series explains the ……………….relation between value of variable and
time series components

a) Additive b) Multiplicative c) Mathematical d) None of these

11. If an Index Number $I_{01}$ = 112, then it means

a) 12 % growth from base to current year

b) 112 % growth from base to current year

c) 88 % depreciation from base to current year

d) 12 % depreciation from base to current year

12. Which of the following is called ideal Index Number

a) Laspeyres’ b) Paasche’s c) Fisher’s d) Kelly’s

Part- B (Answer all questions) weight 1

13. Give an example each for seasonal and cyclic variation in a time – series

14. Define period of Moving average.

15. Give any three examples of irregular variation affecting a Time- series data.

16. Name a seasonal variation measure.

17. Give the formulae for converting chain base into fixed base and fixed base into chain base
index numbers.

18. Why is base shifting necessary for index numbers?

19. Why are index numbers called economic barometers?

20. Give three major limitations of index numbers.

Part- C (Answer any six questions)weight 2

21. How is trend measured using moving averages?

22. Explain periodic variations in Time- Series with suitable examples.

23. Explain the Link Relative Method of measuring seasonal variation.

24. Explain the uses of Index Numbers.

25. With the help of an Index Number formula, explain Time and Factor Reversal Tests.

26. Explain the use of developing a Cost of Living Index Number.

Part- D (Answer any 2 Questions) weight 4

27. Given the following data related to yield of a crop in three different seasons.

Yield (Kg/10 cent plot)

Year Season 1 Season 2 Season 3

1990 12 19 17

1991 14 25 23

1992 13 27 20

1993 15 28 22

1994 17 31 24

i) If this trend is followed, what will be the expected yield in 1995?

ii) Does season influence the yield of the crop?

28. Briefly explain the use of Pareto distribution and its applications

29. Calculate the cost of Living Index Number for the data given below.

Rice

Year Season 1 Season 2 Season 3

Food 30 47 4

Fuel 8 12 1

Clothing 14 18 3

House Rent 22 15 2

Miscellaneous 25 30 1

CORE PAPER X: DESIGN OF EXPERIMENTS

1. Module 1. Linear Estimation – estimability of parametric functions and
BLUE – Gauss – Markov theorem – Linear hypothesis
25 hours
2. Module 2. Analysis of variance: One way and two way classification (with a
single observation per cell). Analysis of covariance with a single
observation per cell.
25 hours
3. Module 3. Principles of design – randomization – replication – local control.
Completely randomized design – randomized block design – Latin Square
design. Missing plot technique – comparison of efficiency.
25 hours
4. Module 4. Basic concepts of factorial experiments: $2^3$ factorial experiments –
Duncan’s multiple range test
15 hours
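
The analysis of variance for a randomized block design can be sketched numerically. The minimal Python illustration below uses the maize yields of four varieties in three replications from question 26 of the model question paper for this course; the variable names are assumptions, and the F ratio still has to be compared with the tabulated value to reach a conclusion.

```python
# Rows are varieties (treatments), columns are replications (blocks)
yields = [
    [21, 20, 19],
    [19, 18, 18],
    [18, 19, 19],
    [27, 25, 24],
]

t = len(yields)     # number of treatments
r = len(yields[0])  # number of blocks

grand_total = sum(sum(row) for row in yields)
correction = grand_total ** 2 / (t * r)  # correction factor G^2 / (rt)

total_ss = sum(y * y for row in yields for y in row) - correction
treatment_ss = sum(sum(row) ** 2 for row in yields) / r - correction
block_totals = [sum(row[j] for row in yields) for j in range(r)]
block_ss = sum(b * b for b in block_totals) / t - correction
error_ss = total_ss - treatment_ss - block_ss

treatment_df = t - 1
block_df = r - 1
error_df = (t - 1) * (r - 1)

treatment_ms = treatment_ss / treatment_df
error_ms = error_ss / error_df
f_treatments = treatment_ms / error_ms  # compare with F(treatment_df, error_df)
```

For these data the treatment sum of squares is about 94.92 on 3 degrees of freedom, the error sum of squares about 4.83 on 6 degrees of freedom, and the F ratio for varieties about 39.3.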
Books for reference

1. S.C. Gupta & V.K.Kapoor: Fundamentals of Applied Statistics, Sultan
Chand & Sons.

2. Federer: Experimental Design

3. M.N. Das & N. Giri: Design of Experiments, New Age International

4. D.D. Joshi: Linear Estimation and Design of Experiments, Wiley Eastern

5. Montgomery: Design of Experiments

Model Question Paper

B.Sc. STATISTICS
Semester IV

CORE COURSE- X DESIGN OF EXPERIMENTS

Part A
Time 3hrs Answer all questions (Weight 1 for bunch of 4)

1. Local control in experimental designs is meant to:
(a) increase the efficiency of the design
(b) reduce experimental error
(c) form homogeneous blocks
(d) all the above
2. Errors in a statistical model are always taken to be:
(a) independent (b) distributed as $N(0, \sigma^2)$
(c) both (a) and (b) (d) neither (a) nor (b)
3.A completely randomized design is also known as:
(a) unsystematic design (b) non-restrictional design
(c) Single block design (d) all the above
4.A randomized block design has:
(a) Two way classification (b) one way classification
(c) Three way classification (d) no classification
5. In the analysis of data of a randomized block design
with r blocks and s treatments the error degrees of freedom
are
(a) r (s-1) (b) s(r-1) (c) (r-1) (s-1) (d) none of the above
6. Error sum of squares in RBD as compared to CRD using the
same material is
(a) more (b) less (c) equal (d) not comparable
7. The ratio of the number of replications required in CRD and
RBD for the same amount of information is
(a) 6:4 (b) 10:6 (c) 10:8 (d) 6:10
8. In a randomized block design with 4 blocks and 5 treatments
having one missing value, the error degrees of freedom will be
(a) 12 (b) 11 (c) 10 (d) 9
9.A Latin square design controls
(a) two way variation (b) three way variation
(c) multi way variation (d) no variation
10.While analysing the data of a latin square, the error degrees
of freedom in analysis of variance is equal to
(a) (r-1)(r-1) (b) r(r-1)(r-2) (c) 2r-2 (d) 2r –r-1
11.Two types of effects measured in a factorial experiment are
(a) main and interaction effects (b) simple and complex effects
(c) both (a) and (b) (d) neither (a) nor (b)

12. If the responses for treatments in a factorial experiment with
factors A and B each at two levels from three replications are
0,0=18 1,0=17 0,1=25 1,1=30, the sum of squares for the
interaction AB is equal to:
(a) 4 (b) 3 (c) 6 (d) 675

PART-B
answer all questions (weight 1)
13. Write down Gauss Markov Linear model.
14. State the necessary and sufficient condition for estimability
of Parametric function.
15. What are the principles of experimental design?
16. Write the expression for estimating missing value in LSD.
17. If there are two missing values in a RBD with 4 blocks and
5 treatments, what will be the degrees of freedom of the error sum
of squares?
18. In a LSD with 4 treatments and error sum of squares 16,
find the mean error sum of squares.
19. Write the expression for the efficiency of LSD compared to CRD.
20. Given two factors A and B each at two levels, what is the
simple effect of B at the second level of A?

Part C

Answer any 4 questions (weight 2)

21. What is meant by analysis of variance of experimental data?
What are the assumptions used in it?
22. Give the analysis for completely randomized design.
23. Derive the expression for estimating one missing
observation in RBD.
24. Explain the efficiency of LSD compared to RBD.
25. How can we estimate the effects and calculate the sums of squares
in a factorial experiment?

26. Grain yield per plant (gms) of maize of four varieties in a
RBD were as tabulated below. Analyse the experimental data
and interpret the result.

Varieties Rep. I Rep. II Rep. III

1 21 20 19

2 19 18 18

3 18 19 19

4 27 25 24

PART – D
(Answer any two questions)Weight 4

27. State and prove Gauss – Markov theorem.


28. Derive the analysis of variance of RBD.
29. Estimate the missing value in the following Latin Square
Design and then set up the analysis of variance.

A 12   C 19   B 10   D 8
C 18   B 12   D 6    A –
B 22   D 10   A 5    C 21
D 12   A 7    C 27   B 17

CORE COURSE XI: POPULATION STUDIES AND ACTUARIAL SCIENCE

1. Module 1. Sources of vital statistics in India – functions of vital statistics.
Rates and ratios – mortality rates – crude, age specific and standard death rates
– fertility and reproduction rates – crude birth rates – general and specific
fertility rates – gross and net reproduction rates
20 hours

2. Module 2. Life Tables: Complete life tables and their characteristics –
abridged life tables and their characteristics – principal methods of construction of
abridged life tables – Reed-Merrell’s method
40 hours
3. Module 3. Fundamentals of insurance: Insurance defined – meaning of loss,
peril, hazard and proximate cause in insurance. Costs and benefits of
insurance to society – branches of insurance. Insurable loss exposures –
features of a loss that is ideal for insurance. Construction of mortality table –
computation of premium of life insurance for fixed duration and for the
whole life
30 hours
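
Crude and standardized death rates can be illustrated with the data of question 27 of the model question paper for this course, taking population A as the standard (direct standardization). The Python sketch below is a minimal illustration; the function and variable names are assumptions.

```python
# Age groups: under 10, 10-20, 20-40, 40-60, above 60
pop_a = [20000, 12000, 50000, 30000, 10000]
deaths_a = [600, 240, 1250, 1050, 500]
pop_b = [12000, 30000, 62000, 15000, 3000]
deaths_b = [372, 660, 1612, 525, 180]


def crude_death_rate(population, deaths):
    """Deaths per thousand of the whole population."""
    return 1000 * sum(deaths) / sum(population)


def standardized_death_rate(population, deaths, standard_population):
    """Direct standardization: apply age-specific rates to a standard population."""
    expected_deaths = sum(d / p * s
                          for p, d, s in zip(population, deaths, standard_population))
    return 1000 * expected_deaths / sum(standard_population)


cdr_a = crude_death_rate(pop_a, deaths_a)
cdr_b = crude_death_rate(pop_b, deaths_b)
# With A as the standard, A's standardized rate equals its crude rate
sdr_a = standardized_death_rate(pop_a, deaths_a, pop_a)
sdr_b = standardized_death_rate(pop_b, deaths_b, pop_a)
```

On these figures the crude death rates are about 29.8 per thousand for A and 27.5 per thousand for B, while B’s standardized rate is about 31.4 per thousand. The comparison reverses once the difference in age structure is removed.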

Books for reference

1. S.C. Gupta & V.K.Kapoor: Fundamentals of Applied Statistics, Sultan
Chand & Sons

2. Benjamin B: Health and Vital Statistics, Allen and Unwin

3. Mark S Dorfman : Introduction to Risk Management and Insurance, Prentice
Hall

4. C.D. Daykin, T. Pentikainen et al: Practical Risk Theory for Actuaries,
Chapman and Hall.

Model Question Paper
B.Sc. STATISTICS
Semester V
CORE COURSE- XI POPULATION STUDIES AND ACTUARIAL
SCIENCE
Part A
Time 3hrs Answer all questions (Weight 1 for bunch of 4)
1. Vital statistics is mainly concerned with

(a) births (b) deaths (c) marriages (d) all the above
2. Vital rates are customarily expressed as
(a) percentages (b) per thousand (c) per million (d) per ten thousand
3. The registration of births, deaths and marriages are
(a) a fancy of society (b) a part of medical research
(c) a legal document (d) all the above
4. The child bearing age in India is
(a) 20-24 years (b) 20-29 years (c) 13-49 years (d) 15- 49 years
5. The relation between N.R.R and G.R.R is
(a) N.R.R and G.R.R are usually equal (b) N.R.R can never exceed G.R.R
(c) N.R.R is generally greater than G.R.R (d) none of the above
6. Life-table has also been named as
(a) survival table (b) mortality table (c) life expectancy table (d) all the above
7. Normally a life-table is constructed for an age interval of
(a) five years (b) ten years (c) one year (d) 5-10 years
8. The central mortality rate $m_x$ in terms of $q_x$ is given by the formula
(a) $\dfrac{2q_x}{2 + q_x}$ (b) $\dfrac{2q_x}{2 - q_x}$ (c) $\dfrac{q_x}{2 + q_x}$ (d) $\dfrac{q_x}{2 - q_x}$
9. The payment received by the insurer is known as
(a) loss (b) cost (c) premium (d) benefit
10. _______ is a condition that increases the frequency or severity of loss.
(a) peril (b) hazard (c) risk (d) loss exposure
11. Uncertainty of loss is known as
(a) probability (b) hazard (c) loss exposure (d) risk
12. The cause of loss is defined as
(a) hazard (b) risk (c) peril (d) claim

SECTION B
(Answer all the questions) Weight 1

13. Death rate computed for a specified section of the population is known as ______.
14. The ratio of instantaneous rate of decrease in lx to the value of lx is
defined as _______.
15. The expectation of life at any age can be obtained from a ________.
16. Pearl’s Vital Index = ________
17. An abridged life table usually consists of ages at distance of ________ years.
18. ______ is a financial arrangement that redistributes the costs of unexpected losses.
19. The insured’s possibility of loss is called the insured’s _______.
20. If the covered peril is death, the contract is called _______.

SECTION C
(Answer any four questions) weight 2
21. What are the various uses of vital statistics?
22. What is expectation of life? Distinguish between ‘curtate expectation’ and ‘complete
expectation’ of life.
23. Define general fertility rate. Explain its merits and demerits.
24. What do you understand by an abridged life table?
25. Discuss the costs and benefits of insurance to society.
26. Explain life insurance and fire insurance.

SECTION D
(Answer any two questions) weight 4
27. Compute the crude and standardized death rates of the two populations A and B,
regarding A as standard population, from the following data:

Age-group              A                       B
(Years)       Population   Deaths     Population   Deaths
Under 10        20,000       600        12,000       372
10-20           12,000       240        30,000       660
20-40           50,000      1250        62,000      1612
40-60           30,000      1050        15,000       525
Above 60        10,000       500         3,000       180

28. Fill in the blanks in a portion of life table given below:

Age in years   lx       dx    px   qx   Lx   Tx          e°x
4              95,000   500   ?    ?    ?    4,850,300   ?
5              ?        400   ?    ?    ?    ?           ?

29. Explain the chief characteristics of an ideally insurable loss exposure.

CORE COURSE XII: PRACTICAL

1. Numerical questions from the following topics of the syllabi are to be asked
for external examination of this paper. The questions are to be evenly chosen
from these topics.

a. Small sample tests

b. Large Sample test

c. Construction of confidence intervals

d. Sample surveys

e. Design of Experiments

f. Construction of control charts

g. Linear Programming

h. Numerical Analysis

i. Time series

j. Index Numbers

2. The students have to maintain a practical record book. Numerical
examples of the following topics are to be done by the students of sixth
semester class under the supervision of teachers and to be recorded in the
record book. The valuation of the record shall be done internally.

a) Small sample tests

b) Large sample tests

c) Construction of Confidence Intervals

d) Numerical Analysis

e) Sample surveys

f) Design of Experiments

g) Construction of Control Charts

h) Linear Programming

i) Time Series

Model Question Paper

B.Sc. STATISTICS
Semester VI

CORE COURSE- XII PRACTICALS

Calculators are permitted

Time : Three Hours


(Answer any Four Questions)

1(a) Compute chain index numbers with 1981 prices as base from the following table
giving the average wholesale prices of the commodities A, B and C for the years
1986 to 1990. (Wt-1)
Commodity Average Whole Sale Price (Rs)
1986 1987 1988 1989 1990
A 20 16 28 35 21
B 25 30 24 36 45
C 20 25 30 24 30
(b) Calculate seasonal indices by the ratio to moving average method (Wt-1)
Year 1Qtr II Qtr III Qtr IV Qtr
1998 68 62 61 63
1999 65 58 66 61
2000 68 63 63 67

2(a) In a study it is reported that 60 out of a group of 1000 insured persons died within a
year. Examine whether this justifies the assumption that less than 4% only are
likely to die within a year, at 5% level of significance. (Wt-1)
(b) A sample of 200 boys who passed the SSLC examination has a mean mark of 50 with
standard deviation 5. The mean mark for a sample of 100 girls was found to be 48
with standard deviation 4. Does this indicate any significant difference between
the abilities of boys and girls, assuming that the standard deviations are the same,
at 5% level of significance? (Wt-1)
3(a) Ten accountants were given intensive coaching and two tests were conducted in a
month. The scores of tests 1 and 2 are given below. (Wt-1)
S. No. of accountant : 1  2  3  4  5  6  7  8  9  10
Marks in 1st test    : 50 42 51 42 60 41 70 55 62 38
Marks in 2nd test    : 62 40 61 52 68 51 64 63 72 50
Do the scores show an improvement from test 1 to test 2? Test at 5% level of
significance.

(b) A random sample of 10 observations from a normal population is obtained as follows:
23.6, 28.1, 21.0, 27.8, 19.2, 22.2, 25.0, 23.0, 26.0. Obtain (i) a point estimate of
the population mean (ii) a 99% confidence interval for the population mean.
(Wt-1)
4(a) A sample of 30 students is to be drawn from a population consisting of 300
students belonging to two colleges A and B. The means and standard deviations
are given below.
            Total No. of Students   Mean   Standard Deviation
College A   200                     300    10
College B   100                     60     40
Draw a sample using proportional allocation. Hence obtain the variance of the
estimate of the population mean and compare its efficiency with SRSWOR. (Wt-2)
5(a) An experiment was carried out on wheat with 3 treatments in 4 randomized blocks.
The plan and yield per plot are as follows.
              Blocks
1         2         3         4
A (8)     C (10)    A (6)     B (10)
C (12)    B (8)     B (9)     A (8)
B (10)    A (8)     C (10)    C (9)
Analyze the data and give your conclusions. (Wt-1)
(b) The following are the number of defects noted in the final inspection of 30 days of
woolen cloths:
0, 3, 1, 3, 2, 2, 1, 3, 5, 0, 2, 0, 0, 1, 2, 4, 3, 0, 0, 0,
1, 2, 4, 5, 0, 9, 4, 10, 3 and 6.
Draw a suitable control chart. (Wt-1)

6. Solve the following LPP using the simplex method. (Wt-2)

Maximize z = 5x1 + 3x2
Subject to 3x1 + 5x2 ≤ 15
5x1 + 2x2 ≤ 10
x1, x2 ≥ 0

PROJECT

The following guidelines may be followed for Project work

1. The project is offered in the fifth and sixth semesters of the degree course
and the duration of the project may spread over the complete year.

2. A project may be undertaken by a group of students; the maximum number
in a group shall not exceed five. However, the project report shall be
submitted by each student.

3. There shall be a teacher from the department to supervise the project and the
synopsis of the project should be approved by that teacher. The head of the
department shall arrange teachers for supervision of the project work.

4. As far as possible, topics for the project may be selected from the applied
branches of Statistics so that there is enough scope for applying and
demonstrating statistical skills learnt in the degree course.

The following books may be used to get an idea about projects and project
report writing.

1. C.R. Kothari: Introduction to Research Methodology, New Age International

2. P.L. Bhandarkar and T.S. Wilkinson: Methodology and Techniques in Social
Research, Himalaya Publishing House.

ELECTIVE SUBJECTS

The Statistics Department where the B.Sc. (Statistics) programme is offered
can select one of the following Elective subjects in the sixth semester.

A. ACTUARIAL SCIENCE
PROBABILITY MODELS AND RISK THEORY

Module 1. Individual risk model for a short time: Model for individual claim
random variables – Sums of independent random variables –
Approximation for the distribution of the sum – Application to
insurance 10hrs

Module 2. Collective risk models for a single period: The distribution of
aggregate claims – Selection of basic distributions – Properties of
compound Poisson distributions – Approximations to the distribution
of aggregate claims 15hrs

Module 3. Collective risk models over an extended period: Claims process – The
adjustment coefficient – Discrete time model – The first surplus below
the initial level – The maximal aggregate loss 15hrs

Module 4. Application of risk theory: Claim amount distributions –
Approximating the individual model – Stop-loss re-insurance – The
effect of re-insurance on the probability of ruin 14hrs

Books for study and reference:

Institute of Actuaries: ActEd study materials
McCutcheon, J.J. and Scott, William (1986): An Introduction to the Mathematics
of Finance
Butcher, M.V. and Nesbit, Cecil (1971): Mathematics of Compound Interest,
Ulrich’s Books
Neill, Alistair (1977): Life Contingencies, Heinemann
Bowers, Newton L. et al. (1997): Actuarial Mathematics, Society of
Actuaries, 2nd Ed.

Model Question Paper

Probability models And Risk Theory

Time: 3 Hrs

Part A
Choose the correct answer from the brackets
Each bunch of four questions carries a weightage of one

1. Let X be the number obtained when one true die is tossed. Let Y be the sum of the
numbers obtained when X true dice are then thrown. Calculate E[Y].
(a) 4/7 (b) 7/4 (c) 2/6 (d) 3/6
2. Under certain assumptions, the probability of ruin is
$\Psi(u) = 0.3e^{-2u} + 0.2e^{-4u} + 0.1e^{-7u}$, u > 0. Calculate θ.
(a) 2/3 (b) 1/3 (c) 1 (d) 1/2
3. Suppose that λ = 3, C = 1 and $p(x) = \frac{1}{3}e^{-3x} + \frac{16}{3}e^{-6x}$, x > 0. Calculate $p_1$.
(a) 3/27 (b) 6/27 (c) 4/27 (d) 5/27
4. Suppose that λ = 1, C = 10 and $p(x) = \frac{9x}{25}e^{-3x/5}$, x > 0. Calculate θ.
(a) 3 (b) 4 (c) 2 (d) 5
5. Suppose that the claim amount distribution is discrete with p(1) = 1/4 and
p(2) = 3/4. If R = log 2, calculate θ.
(a) $\frac{10}{7\log 2} - 1$ (b) $\frac{10}{7\log 2}$ (c) $\frac{10}{5\log 2} - 1$ (d) $\frac{10}{5\log 2}$
6. Suppose that $W_i$ assumes only the values 0 and +2 and that
P[W = 0] = p, P[W = 2] = q, where p + q = 1. Assume that C = 1, P>1/254
7. Consider an insurance portfolio that will produce 0, 1, 2 or 3 claims in a fixed
time period with probabilities 0.1, 0.3, 0.4 and 0.2 respectively. An individual
claim will be of amount 1, 2 or 3 with probabilities 0.5, 0.4 and 0.1
respectively. Calculate E[N].
(a) 1.7 (b) 2.7 (c) 2.8 (d) 1.6
8. Suppose that θ = 2/5 and $p(x) = \frac{3}{2}e^{-3x} + \frac{7}{2}e^{-7x}$, x > 0. Calculate γ.
(a) 2 (b) 3 (c) 4 (d) 2.5
9. If S has a compound Poisson distribution given by λ = 3, p(1) = 5/6, p(2) = 1/6,
calculate $f_S(x)$ for x = 0.
(a) 0.050 (b) 0.25 (c) 0.052 (d) 0.523
10. Consider an insurance portfolio that will produce 0, 1, 2 or 3 claims in a fixed
time period with probabilities 0.1, 0.3, 0.4 and 0.2 respectively. An individual
claim will be of amount 1, 2 or 3 with probabilities 0.5, 0.4 and 0.1
respectively. Calculate V[N].
(a) 0.8 (b) 0.028 (c) 0.08 (d) 0.285
11. Suppose that θ = 2/5 and $p(x) = \frac{3}{2}e^{-3x} + \frac{7}{2}e^{-7x}$, x > 0. Calculate R.
(a) 2.5 (b) 3.45 (c) 4.25 (d) 2.5

12. If S has a compound Poisson distribution given by λ = 3, p(1) = 5/6, p(2) = 1/6,
calculate $F_S(x)$ for x = 2.
(a) 0.354 (b) 0.258 (c) 0.520 (d) 0.545

PART B
Attempt all questions; each question carries a weightage of one

13. Assume that N has a geometric distribution; that is, the probability
function of N is given by
$P[N = n] = pq^n$, n = 0, 1, 2, …
where 0 < q < 1 and p = 1 − q. Determine $M_S(t)$ in terms of $M_X(t)$.
14. If S has a compound Poisson distribution specified by λ and p(x), show that the
distribution of $Z = \frac{S - \lambda p_1}{\sqrt{\lambda p_2}}$ converges to the standard normal distribution as λ → ∞.

15. Write an expression for the distribution of the surplus level at the first time, the
surplus falls below the initial level u, given that it does fall below u, if all
claims are of size 2?

16. Derive an expression for Ψ(u) if the Xi’s have an exponential claim amount
distribution?
17. Write an expression for the distribution of L, if the size of the individual claims
has an exponential distribution with parameter β?
18. Find the mean and variance of the Inverse Gaussian distribution, by using its
mgf
19. Derive an expression for R in the special case where the common distribution
of the $W_i$ is $N(\mu, \sigma^2)$.
20. Determine the adjustment coefficient if the claim amount distribution is
exponential with parameter β>0?

PART C
Attempt any four questions; each question carries a weightage of two
21. Assume that u(λ) is the gamma probability density function with parameters α
and β,
$u(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda}$, λ > 0,
where $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1} e^{-y}\,dy$. Show that the marginal distribution of N is negative
binomial with parameters r = α and $p = \frac{\beta}{1+\beta}$.
22. Prove that if $S_1, S_2, \dots, S_m$ are mutually independent random variables such
that $S_i$ has a compound Poisson distribution with parameter $\lambda_i$ and d.f. of
claim amount $P_i(x)$, i = 1, 2, …, m, then $S = S_1 + S_2 + \dots + S_m$ has a
compound Poisson distribution with $\lambda = \sum_{i=1}^{m} \lambda_i$ and $P(x) = \sum_{i=1}^{m} \frac{\lambda_i}{\lambda} P_i(x)$.
23. Assume that u(λ) is the inverse Gaussian p.d.f. with parameters α and β. Exhibit
the moment generating function of N, E[N] and V[N].

24. Find $E[I_d]$ if S ~ compound Poisson with parameter λi and individual
distribution p(x) is exponential with parameter θ.

25. Calculate the adjustment coefficient if all the claims are of size 1?

26. Calculate the probability of ruin in the case that the claim amount distribution is
exponential with parameter β

PART D
Attempt any two questions; each question carries a weightage of four

27. Consider a portfolio of 32 policies. For each policy the probability q of a claim
is 1/6 and B, the benefit amount given that there is a claim, has pdf
f(y) = 2(1 − y), 0 < y < 1
     = 0, elsewhere.
Let S be the total claims for the portfolio. Using a normal approximation, estimate
P[S > 4].

28. Prove that for a compound distribution where the probability distribution for N,
the number of claims, satisfies the condition
$\frac{P[N = n]}{P[N = n-1]} = a + \frac{b}{n}$, for n = 1, 2, …,
and where the distribution of claim amounts is restricted to the
positive integers,
$f_S(x) = \sum_{i=1}^{x} \left[a + \frac{bi}{x}\right] p(i)\, f_S(x - i)$, x = 1, 2, …,
with the starting value $f_S(0) = P[N = 0]$.

29. Given that θ = 2/5 and $p(x) = \frac{3}{2}e^{-3x} + \frac{7}{2}e^{-7x}$, x > 0, calculate Ψ(u), γ and R.
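
The recursion in Question 28 lends itself to a short numerical illustration. The sketch below assumes the compound Poisson case, for which a = 0 and b = λ, and uses the data of Part A, Questions 9 and 12 (λ = 3, p(1) = 5/6, p(2) = 1/6). The function name `compound_poisson_pmf` is hypothetical and is not taken from any of the listed books.

```python
import math


def compound_poisson_pmf(lam, claim_pmf, x_max):
    """Return [f_S(0), ..., f_S(x_max)] for a compound Poisson S.

    Uses the recursion of Question 28 with a = 0 and b = lam:
        f_S(x) = sum_{i=1}^{x} (lam * i / x) * p(i) * f_S(x - i),
    starting from f_S(0) = P[N = 0] = exp(-lam).
    claim_pmf maps positive integer claim sizes to probabilities.
    """
    f = [math.exp(-lam)]
    for x in range(1, x_max + 1):
        total = sum(i * claim_pmf.get(i, 0.0) * f[x - i] for i in range(1, x + 1))
        f.append(lam / x * total)
    return f


f = compound_poisson_pmf(3.0, {1: 5 / 6, 2: 1 / 6}, 2)
f_s0 = f[0]    # f_S(0)
F_s2 = sum(f)  # F_S(2) = f_S(0) + f_S(1) + f_S(2)

print(f"f_S(0) = {f_s0:.4f}")
print(f"F_S(2) = {F_s2:.4f}")
```

For these inputs the values come out close to 0.050 and 0.355, in line with option (a) in each of the two objective questions.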

B. STOCHASTIC MODELING
Module 1. Concept of mathematical modeling, definition, natural testing a

informal mathematical representations. 10hrs

Module 2. Concept of stochastic process, probability generating functions,
convolution, generating function of the sum of independent random variables.
Definition of stochastic process, classification, Markov chain, transition
probabilities, Chapman-Kolmogorov equations, transition probability matrix,
examples and computation.

30hrs

Module 3. First passage probabilities, classification of states, recurrent, transient

and ergodic states, stationary distribution, mean ergodic.

14hrs

Books for reference


1. V.K. Rohatgi: An introduction to probability Theory and Mathematical

Statistics. Wiley Eastern.

2. Ross.S.M.: An introduction to Probability theory and Stochastic Models

3. V.K.Rohatgi: Statistical Inference, Wiley Eastern

Model Question Paper

B.Sc. STATISTICS
Semester VI
STOCHASTIC MODELING

Time 3Hr PART A

Answer all questions (Weight 1 for a bunch of 4)

1. If X is a non-negative integer-valued random variable, then its pgf is
(a) $P(X = k)$ (b) $\sum_k P(X = k)$ (c) $\sum_k P(X = k)s^k$ (d) $P(X = k)s^k$

2. The convolution of two functions is
(a) $\int_0^x g(x - y) f(y)\,dy$ (b) $\int_0^x g(x + y) f(y)\,dy$ (c) $\int_0^x g(x - y) f(y - x)\,dy$
(d) $\int_0^x g(x) f(y)\,dy$

3. Number of telephone calls received in a switch board is a stochastic process of
(a) Discrete time (b) Continuous time (c) Discrete state & continuous time
(d) Discrete time & discrete state

4. If $\{X_n, n \ge 1\}$ is a Markov chain, then
(a) $P(X_n \mid X_{n-1}) = 0$ (b) $P(X_n \mid X_{n-1}) \ne 0$ (c) $P(X_n \mid X_{n-1}) \ge 0$
(d) $P(X_n \mid X_{n-1}) = P(X_n)$
5. State i is a return state if
(a) $P_{ij}^{(n)} > 0$ for some n ≥ 1 (b) $P_{ij}^{(n)} > 0$ for all n ≥ 1
(c) $P_{ij}^{(n)} = 0$ for all n ≥ 1 (d) $P_{ij} > 0$

6. State j is absorbing if
(a) $P_{jj}^{(n)} > 0$ for some n ≥ 1 (b) $f_{jj} = 1$ (c) $P_{jj} < 1$ (d) $f_{ij} = 1$
7. For an irreducible Markov chain, if one state is ergodic, then
(a) all states are ergodic (b) one more state is ergodic (c) no other state is
ergodic (d) none
8. For the following Markov chain with states 1, 2, 3,
$P = \begin{pmatrix} 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1 & 0 \end{pmatrix}$
the chain is
(a) transient (b) recurrent (c) absorbing (d) none of these
9. For the above Markov chain
(a) $P^2 = P$ (b) $P^3 = P$ (c) $P^4 = P$ (d) $P^2 = P^3$
10. For a Poisson process {N(t)}, $p_n(t)$ is
(a) independent of time (b) depends on t (c) depends on time length
(d) zero
11. Which of the following is incorrect for a Poisson process?
(a) Markovian (b) time homogeneous (c) independent (d) nonstationary
12. Interarrival distribution of a Poisson process is
(a) gamma (b) geometric (c) exponential (d) binomial

Part B.
Answer all questions Wt 1

13. Stochastic process is a sequence of …………………..


14. …………..is an example of discrete state stochastic process
15. Markov chain is a ………..time and …………………state stochastic process
16. A state j is recurrent if …………………………….
17. Chapman-Kolmogorov equation of a Markov chain is………………………………
18. A recurrent non-null and aperiodic state of a Markov chain is called…………………..
19. Poisson process has……………and …………………increments
20. For a Poisson process its mean value is………

Part C.
Answer any four questions Wt 2

21. Explain the classification of stochastic processes with suitable examples.
22. For the following Markov chain, compute $P(X_3 = 1, X_2 = 2, X_1 = 1, X_0 = 2)$
$P = \begin{pmatrix} 3/4 & 1/4 & 0 \\ 1/4 & 1/2 & 1/4 \\ 0 & 3/4 & 1/4 \end{pmatrix}$ with initial probability $P(X_0 = i) = 1/3$ for all i.
23. For the following Markov chain, check whether all states are ergodic or not.
$P = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 1/4 & 1/8 & 1/8 & 1/2 \end{pmatrix}$
24. Prove that in an irreducible chain, all the states are of the same type.
25. Define Poisson process.
26. Show that the sum of two independent Poisson processes is a Poisson process.

Part D.
Answer any two questions Wt 4

27. Prove that state j is persistent if $\sum_{n=0}^{\infty} p_{ij}^{(n)} = \infty$.
28. For the following Markov chain, show that state 1 is
ergodic, state 2 is recurrent and the chain is ergodic.
$P = \begin{pmatrix} 1/3 & 2/3 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 1/2 & 1/2 \end{pmatrix}$
29. Derive, for a Poisson process, $p_n(t) = \frac{e^{-\lambda t}(\lambda t)^n}{n!}$, n = 0, 1, …, using its postulates.
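
Questions 8 and 9 of Part A above turn on powers of a transition probability matrix, which by the Chapman-Kolmogorov equations give the n-step transition probabilities. A minimal sketch follows, using exact fractions and a hypothetical helper `mat_mul` written only for this illustration.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


half = Fraction(1, 2)

# Transition probability matrix of Part A, Question 8 (states 1, 2, 3).
P = [[0, 1, 0],
     [half, 0, half],
     [0, 1, 0]]

P2 = mat_mul(P, P)   # two-step transition probabilities
P3 = mat_mul(P2, P)  # three-step transition probabilities

print("P^2 == P:", P2 == P)
print("P^3 == P:", P3 == P)
```

Here the two-step matrix differs from P while the three-step matrix reproduces it, so the chain alternates between state 2 and the pair of states {1, 3}.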

C. RELIABILITY THEORY

Module 1. Structural properties of Coherent Systems: System of components –
series and parallel structure with example – dual structure function – coherent
structures – preservation of coherent system in terms of paths and cuts –
representations of bridge structure – times to failure – relative importance of
components – modules of coherent systems.
(20 hours)
Module 2. Reliability of Coherent systems: reliability of system of independent
components – some basic properties of system reliability – computing exact system
reliability – inclusion exclusion method – reliability importance of components.
(20 hours)
Module 3. Parametric distributions in reliability: A notion of ageing (IFR and
DFR only) with example – exponential distribution – Poisson distribution.
(14 hrs)

Books for Reference

1. R. E. Barlow and F. Proschan (1975): Statistical Theory of Reliability and Life Testing,
Holt, Rinehart and Winston
2. N. Ravichandran: Reliability Theory, Wiley Eastern

Model Question Paper

B.Sc. STATISTICS
Semester VI
ELECTIVE- RELIABILITY THEORY

Time 3Hr PART A

Answer all questions (Weight 1 for bunch of 4)

1. Structure function of a series system is
(a) $\phi(x) = \min(x_1, \dots, x_n)$, (b) $\phi(x) = \max(x_1, \dots, x_n)$, (c) $\phi(x) = (x_1 + x_n)$, (d) $\phi(x) = x_1 \cdots x_n$
2. A k-out-of-n system functions if
(a) all components function, (b) only one component functions, (c) at least k components
function, (d) at most k components function
3. If $\phi(1_i, x) = \phi(0_i, x)$ for all $(\cdot_i, x)$, then component i is
(a) relevant, (b) irrelevant, (c) coherent, (d) monotonic
4. If φ is the structure function, then its dual is
(a) $\phi^D(x) = 1 - \phi(x)$, (b) $\phi^D(x) = 1 - \phi(1 - x)$, (c) $\phi^D(x) = \phi(1 - x)$, (d) $\phi^D(x) = 1 + \phi(x)$
5. For a coherent system, which of the following statements is correct?
(a) a component may be relevant, (b) each of the components is relevant, (c) no component
is relevant, (d) at least two components are relevant
6. Which of the following is the reliability of a binary system?
(a) $E\phi(x)$, (b) $E\phi^2(x)$, (c) $1 - E\phi(x)$, (d) $E\phi(x) - 1$
7. Reliability of a three component series system is
(a) $(1 - p)^3$, (b) $p^3$, (c) $1 - (1 - p)^3$, (d) $p(1 - p)^2$
8. Let h(p) be the reliability function of a coherent structure. Then
(a) h(p) is increasing in $p_i$, (b) h(p) is decreasing in $p_i$, (c) constant in $p_i$,
(d) independent of $p_i$
9. Which of the following is true?
(a) $0 < I_h(j) \le 1$, (b) $0 < I_h(j) < 1$, (c) $1 < I_h(j) \le \infty$, (d) $0 < I_h(j) \le \infty$
10. Which of the following is a failure rate function?
(a) $\frac{f(t)}{F(t)}$, (b) $\frac{f(t)}{1 - F(t)}$, (c) $\frac{F(t)}{f(t)}$, (d) $\frac{1 - f(t)}{F(t)}$
11. Which distribution has constant failure rate?
(a) normal, (b) Poisson, (c) exponential, (d) lognormal
12. A process which has stationary independent increments is
(a) gamma process, (b) Poisson process, (c) exponential process, (d) geometric
process

PART B
Answer all questions (Weight 1)

13. In a coherent structure φ is………………….


14. A parallel system functions if …………component is functioning.
15.The structural importance of a component is……………
16. Reliability of a 2-out-of 3 system is………..
17. If p=0.5, then reliability of a 5 component series system is……….
18. If p=0.6, the reliability of a 2-out of 3 system is………
19. If λ = 1, then failure rate of exponential distribution is……..
20. A distribution having memory-less property is……..

PART C
Answer any four questions( weight 2)

21. Define coherent structure function.

22. Let φ(x) be the coherent structure of n components. Show that $\prod_{i=1}^{n} x_i \le \phi(x) \le \coprod_{i=1}^{n} x_i$.
23. Define relative importance of components.
24. Let h(p) be the reliability function of a coherent structure. Show that h(p) is increasing
in each $p_i$.
25. Explain inclusion exclusion method.
26. Define IFR and DFR results for exponential distribution?

PART D
Answer any two questions (weight 4)

27. Explain the role of Poisson distribution in reliability?


28. What is reliability importance of a component? How can we compute reliability
Importance in a system?
29. Establish the lack of memory property of exponential distribution? Check whether
failure rate function is increasing or decreasing or constant?

OPEN COURSES

A. ECONOMIC STATISTICS

Module 1. Time Series analysis: Economic time series, different components,
illustrations, additive and multiplicative models, determination of trend,
growth curves, analysis of seasonal fluctuations, construction of seasonal
indices.
24 hours

Module 2. Index Numbers: Meaning and definition – uses and types – problems
in the construction of index numbers – simple aggregate and weighted
aggregate index numbers. Tests of consistency of index numbers – factor
reversal, time reversal and unit tests. Chain base index numbers – base
shifting, splicing and deflating of index numbers. Consumer price index
numbers – family budget enquiry – limitations of index numbers.
30 hours
Books for reference
1. S.C. Gupta and V.K. Kapoor: Fundamentals of Applied Statistics,
Sultan Chand & Sons
2. Goon A.M., Gupta M.K. and Das Gupta: Fundamentals of Statistics
Vol. II, The World Press, Calcutta.

Model Question Paper

B.Sc. STATISTICS

Semester V
OPEN COURSE (ECONOMIC STATISTICS)
Time 3Hr

Part A
Answer all questions (Weight 1 for bunch of 4)

1. A component of a time – series in the following case is


a. The natural forces affecting the variable value
b. Systematic forces affecting the variable value
c. Manmade forces affecting the variable value
d. Any sort of force affecting the variable value
2. The rise in human population is an example of
a) Trend b) Seasonal Variation c) Cyclic variation d) Random variation
3. ‘Business cycle’ is an example of
a) Trend b) Seasonal Variation c) Cyclic variation d) Random variation
4. In the method of semi-averages, trend is assumed to be
a) Linear b) Quadratic c) Exponential growth d) None of these
5. Which of the following methods can be used for getting trend values for each given time point?
a) Method of simple averages b) Method of moving averages
c) Method of least squares curve fitting d) All the above
6. Non-centered moving averages are due to
a) Odd period b) Even period
c) Odd no. of time points d) Even no. of time points

7. Seasonal variations are periodic due to
a) Man made customs, habits, rituals etc
b) Resulting due to Natural reasons
c) Resulting due to change in weather condition
d) Any force that operate regularly year after year
8. Seasonal variation is measured using
a) Seasonal Averages b) Seasonal Indices
c) Seasonal Relatives d) None of these
9. A monthly seasonal variation measures are adjusted to
a) 12 b) 120 c) 1200 d) None of these
10. A model of time- series explains the ……………….relation between value of
variable and time series components
a) Additive b) Multiplicative c) Mathematical d) None of these

11. If an Index Number $I_{01}$ = 112, then it means
a) 12 % growth from base to current year
b) 112 % growth from base to current year
c) 88 % depreciation from base to current year
d) 12 % depreciation from base to current year
12. Which of the following is called the ideal Index Number?
a) Laspeyres' b) Paasche's c) Fisher's d) Kelly's
Part B
Answer all questions wt 1
13. Distinguish between seasonal and cyclic variation in a time series.
14. Define period of moving average.
15. Give any three examples of irregular variation affecting time series data.
16. How is seasonal variation measured?
17. Give the formula for converting chain base into fixed base and fixed base into chain
base Index Numbers.
18. Why is base shifting necessary for Index Numbers?
19. Why are Index Numbers called economic barometers?
20. Give three major limitations of Index Numbers.

Part- C (answer any 4 questions) weight 2
21. How is trend measured using moving averages?
22. Explain periodic variations in time series with suitable examples.
23. Explain the Link Relative Method of measuring seasonal variation.
24. Explain the uses of Index Numbers.
25. With the help of an Index Number formula, explain Time and Factor Reversal Tests.
26. Explain the concept behind developing the Cost of Living Index Number.
Part- D (Answer any 2 Questions) weight 4
27. Given the following data related to the yield of a crop in three different seasons.
        Yield (Kg/10 cent plot)
Year    Season 1   Season 2   Season 3
1990    12         19         17
1991    14         25         23
1992    13         27         20
1993    15         28         22
1994    17         31         24
i) If this trend is followed, what will be the expected yield in 1995?
ii) Does season influence yield of crop?
28. Briefly explain the problems in the construction of an Index Number.
29. Calculate the cost of Living Index Number for the data given below.
Rice
Year Season 1 Season 2 Season 3
Food 30 47 4
Fuel 8 12 1
Clothing 14 18 3
House Rent 22 15 2
Miscellaneous 25 30 1

B. QUALITY CONTROL

Module 1. General theory of control charts, causes of variations in quality,
control limits, sub grouping, summary of out-of-control criteria, charts for
attributes: np chart, p chart, c chart. Charts for variables: X bar chart, R chart
and sigma chart. Revised control charts. Applications and advantages.
30 hours
Module 2. Principles of acceptance sampling – problems of lot acceptance,
stipulation of good and bad lots – producer's and consumer's risks, single
and double sampling plans, their OC functions, concepts of AQL, LTPD,
AOQL, average amount of inspection and ASN function
24 hrs

Books for reference

1. Grant E.L.: Statistical Quality Control, McGraw Hill

2. Duncan A.J.: Quality Control and Industrial Statistics, Taraporewala and Sons

3. Montgomery D.C.: Introduction to Statistical Quality Control, John Wiley & Sons

Model Question Paper

B.Sc. STATISTICS

SEMESTER V -OPEN COURSE (QUALITY CONTROL)
Time 3Hr

Part A
Answer all questions (Weight 1 for bunch of 4)

1. The spread of a process is given by
a) 3σ b) 6σ c) 2σ d) 1.96σ
2. Upper control limit for R chart is
a) $A_2\bar{R}$ b) $A_1\bar{R}$ c) $D_3\bar{R}$ d) $D_4\bar{R}$
3. Consumer's risk is usually denoted by
a) µ b) ∂ c) β d) α
4. The acceptance sampling plan is used for
a) Identifying good lots b) Protecting the consumer's interest c) Protecting the producer's
interest d) All of the above
5. The Consumers risk usually fixed at
a) .05 b).01 c).95 d) .99
6. The OC curve gives
a) proportion of bad lots b) proportion of good lots c) discriminating power of the
sampling plan
d) none of them.
7. Number of breakdowns in an electric wire is studied using
a) R chart b) Sigma chart c) d chart d) c chart
8. The manageable cause of a process out of control is
a) assignable b) random c) unknown d) none
9. The quality of the lot after rectifying inspection will
a) not change b) change c) improve d) worsen.
10.Which of the following is an assignable cause.
a) Humidity b) Temperature d) Location c) Wear & tear.
11. To study the variation of a process involving costly items we use
a) R chart b) sigma chart c) p chart d) d chart.
12. The exact distribution used in acceptance sampling is
a) Binomial b) Poisson c) geometric d) hypergeometric.

Part B (answer all questions, weight 1)


13. A sampling plan in which we take a decision based on one sample only is called
--------------

14. Expand the term LTPD

15. Chart used for number of defects is based on ---------- distribution
16. The control limits used before the availability of sufficient data are called-----------
17. The tabled values corresponding to subgroup sizes are given in -------- table.
18. Expand the term AOQ
19. Give an example where there is only an upper specification limit.
20. Give an example where there is only a lower specification limit.

Part C ( answer 4 questions, Weight 2)


21. Define AOQ and LTPD.
22. What is double sampling plan?
23. What are probability limits?
24. What are rational subgroups?
25. What happens when the control limits are within the spread of the process?
26. What is AOQL.

Part D ( Answer any 2 questions, weight 4)


27. Distinguish between double and single sampling plans.
28. Draw the OC curve of the single sampling plan showing the consumers and
producers risks.
29. Describe the basis of a control chart.

C. BASIC STATISTICS
Module 1. Elements of sample surveys: Census and sampling, advantages, principal
steps in a sample survey, sampling and non sampling errors. Probability sampling,
judgement sampling and simple random sampling
15 hours

Module 2. Measures of central tendency: Mean, median, mode and their empirical
relationship. Weighted arithmetic mean- Dispersion: absolute and relative measures,
standard deviation and coefficient of variation

15 hours

Module 3. Fundamental characteristics of bivariate data: univariate and bivariate data,


scatter diagram, curve fitting, principle of least squares, fitting of straight line. Simple
correlation, Pearson’s correlation coefficient, limits of correlation coefficient,
Invariance of correlation coefficient under linear transformation.

19 hours

Module 4. Basic Probability: Random experiment, sample space, event, algebra of


events. Statistical regularity, frequency definition, classical definition and axiomatic
definition of probability- Addition theorem, conditional probability, multiplication
theorem and independence of events (limited to three events).

20hrs

Books for Reference:

1. S.C.Gupta: Fundamentals of Mathematical Statistics


2. D.C.Sancheti and V.K.Kapoor: Statistics (Theory, Methods and Application)

Model Question Paper

SEMESTER V- BASIC STATISTICS (OPEN COURSE)


Time: 3 Hrs

Section A
Answer all questions (Contains 12 questions, 4 Questions carry a weightage of 1)

1. When there are zeroes in the data we can not use


a) Median
b) Mode
c) Geometric mean
d) Arithmetic mean
2. The most suitable measure for an ordinal data is:
a) Median
b) Arithmetic mean
c) Combined mean
d) Mode

3. Mean of 20 values is 45. If one of these values is to be taken as 64 instead of 46, the
correct value of mean is:
a) 49.5
b) 45.9
c) 40.9
d) 42.9
4. The formula to find coefficient of variation is:
a) $\frac{\sigma}{\bar{X}} \times 100$ b) $\frac{\bar{X}}{\sigma} \times 100$
c) $\frac{\text{Median}}{\sigma} \times 100$ d) $\sigma \times 100$
5. Mean deviation from median is:
a) Equal to mean deviation from mean
b) Greater than mean deviation from mean
c) Less than mean deviation from mean
d) No relation
a) Leptokurtic curve
b) Mesokurtic curve

6. The value of the square of Karl Pearson’s coefficient of correlation lies between:

a) 0 and 1 b) -1 and 1

c) 0 and infinity d) No limits

7. Karl Pearson’s coefficient of correlation for the following set of observations (3,12), (5,6)
is:

a) Zero b) -1 c) +1 d) infinity

8. If the regression coefficient of Y on X is negative, the regression coefficient of X on Y


will be:

a) Negative b) Positive
c) Zero d) No relation

9. Mutually exclusive events other than null event and sure event are:

a) not independent
b) independent
c) no relation
d) independent under some conditions

10. The probability that India wins a cricket match against England is 1/3. If India and
England play 3 matches, what is the probability that India will lose all the three
matches?

a) 1/27 b) 1/3 c) 1/9 d) 8/27

11. What is the probability that a non leap year selected at random will have 53 Sundays?

a) 2/7 b) 0 c) 3/7 d) 1/7

12. For a discrete r.v. X, P(X > 0) = P(X < 0) and P(X = 0) = p. The variable takes the
following values: -2, -1, 0, 1, 2. What is the probability that X > 0?

a) Zero b) One c) (1-p)/2 d) 1-p

PART B (Answer all questions) weight 1

13. In the case of infinite population, sampling is better than census

a) Say true or false


b) Explain your answer

14. Sampling error occurs in census.

a) Say true or false

b) Explain your answer

15. Classical definition of probability can be used in the case of a sample space with
infinite outcomes.

a) Say true or false


b) Explain your answer

16. In the case of disjoint events A and B, P(A ∪ B) < P(A) + P(B).
a) Say true or false
b) Explain your answer

17. Getting a queen and getting a Jack while drawing cards from a deck of cards are
independent events.

a) Say true or false


b) Explain your answer

18. The correlation coefficient between X and Y is 0.85. Find the coefficient of
determination.

19. Zero correlation implies independence

a) Say true or false


b) Explain your answer

20. If P ( A ∪ B ) = 0.8, P( A) = P( B) = 0.5 , find P( A ∩ B ) .

PART C (Answer any 4 Questions) weight 2

21. Explain why A.M. is considered the best measure of central tendency.
22. Calculate quartile deviation for the following data:
26, 54, 33, 41, 94, 41, 54, 26, 93, 87, 81, 64, 68, 95.
23. The first of two sub-groups has 10 items with mean 15 and S.D. 3. If the whole group
has 250 items with mean 15.6 and S.D. 13.44, find the standard deviation of the
second subgroup.
24. If A and B are two independent events such that
P ( A c ) = 0.7, P ( B c ) = k , P ( A ∪ B ) = 0.8 , then find the value of k.

25. A and B stand in a ring with 12 other persons. Find the probability that A & B are
together.

26. Explain why in the case of two variables there are always two regression lines? When
do they coincide?

PART D ( Answer any 2 questions) Weight 4

27. State and prove addition theorem for two events? Explain what happens when A is

subset of B?

28. P (A) = 1/3, P(B) = 1/4, P(A∩B) = 1/11. Find the following probabilities.

a) Exactly one of the events A, B happens.


b) At least one of the events A, B happens.
c) None happens.
29. Explain the concept of rank correlation. When is it used?

STATISTICS: COMPLEMENTARY – I Syllabus for BSc.

CUCCSSUG 2009 (2009 admission onwards)

Semester   Course   Course Title                Instructional   Credit   Exam    Ratio
No.        Code                                 hours/week               hours   Ext:Int
1          ST1C01   PROBABILITY THEORY          4               3        3       3:1
2          ST2C02   PROBABILITY DISTRIBUTIONS   4               3        3       3:1
3          ST3C03   STATISTICAL INFERENCE       5               3        3       3:1
4          ST4C04   APPLIED STATISTICS          5               3        3       3:1

Pattern of Question papers.

There shall be 4 parts A, B, C and D in all the question papers*. Part A consists of 12
objective type questions. Part B consists of 8 questions to be answered in a word, phrase
or sentence. Part C consists of 6 questions of short essay type of which the student can
attempt 4. Part D consists of 3 questions of long essay type of which the student can
attempt 2. In Part A the weightage per question is ¼; for Part B the weightage is 1 per question.
For Part C the weightage is 2 per question and for Part D the weightage is 4 per question.
As far as possible the number of questions should be proportional to the modules.

*For Course 4, Applied Statistics, the distribution of questions should be as follows.

PATTERN OF QUESTIONS FOR COURSE IV

1. 12 objective type questions 6 theory + 6 problems weight 1/4

2. 8 short answer questions 4 theory + 4 problems weight 1

3. 6 short essay type question 3 problems + 3 theory weight 2

(Answer any 4 of this type and calculators are permitted)

Table showing the components and weightage for internal assessment

Components Weight
Assignment 1

Test paper 2

Seminar 1

Attendance 1

There shall be two test papers and the average grade point is to be considered for

internal assessment

Semester I

COURSE I : PROBABILITY THEORY

Module 1. Probability concepts: Random experiment, sample space, event,

classical definition, axiomatic definition and relative frequency definition

of probability, concept of probability measure. Addition and multiplication

theorem (limited to three events). Conditional probability and Bayes’

Theorem-numerical problems. 25 hours

Module 2. Random variables: Definition- probability distribution of a

random variable, probability mass function (pmf), probability density

function (pdf) and (cumulative) distribution function (df) and their

properties.

15 hours

Module 3. Mathematical Expectations: Expectation of a random variable,

moments, relation between raw and central moments, moment generating

function (mgf) and its properties. Measures of skewness and kurtosis in

terms of moments. Definition of characteristic function and its simple

properties.

20 hours

Module 4. Change of variables: Discrete and continuous cases (univariate

only), simple problems. 12 hours

Book for reference

1. V.K. Rohatgi: An Introduction to Probability Theory and Mathematical

Statistics, Wiley Eastern.

2. S.C. Gupta and V.K.Kapoor: Fundamentals of Mathematical Statistics,

Sultan Chand and sons

3. Mood A.M., Graybill. F.A and Boes D.C: Introduction to Theory of

Statistics McGraw Hill.

4. John E Freund : Mathematical Statistics (Sixth Edition), Pearson

Education (India),New Delhi.

Model Question Paper

Semester I
COMPLEMENTARY COURSE I
PROBABILITY THEORY
Time: 3 Hrs

Part-A
Answer all the questions weight 1 for bunch of 4

1. Cans of soft drinks cost $0.30 in a certain vending machine. What is the
expected value and variance of daily revenue (Y) from the machine, if X, the
number of cans sold per day has E(X) = 125, and Var(X) = 50 ?

(a) E(Y) = 37.5, Var(Y) = 50
(b) E(Y) = 37.5, Var(Y) = 4.5
(c) E(Y) = 37.5, Var(Y) = 15
(d) E(Y) = 37.5, Var(Y) = 15
(e) E(Y) = 125, Var(Y) = 4.5

Solution: b
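
As a check on Question 1, the scaling rules E(aX) = a E(X) and Var(aX) = a^2 Var(X) can be applied with a = 0.30, the price of one can. The short Python sketch below is only illustrative; the variable names are chosen here and are not part of the question paper.

```python
# Revenue Y = price * X, so E(Y) = price * E(X) and Var(Y) = price**2 * Var(X).
price = 0.30      # cost of one can, in dollars
mean_x = 125      # E(X): expected number of cans sold per day
var_x = 50        # Var(X)

mean_y = price * mean_x
var_y = price ** 2 * var_x

print(round(mean_y, 2), round(var_y, 2))
```

The variance scales with the square of the price, which is why it shrinks from 50 to a much smaller number while the mean scales linearly.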

2. A restaurant manager is considering a new location for her restaurant. The
projected annual cash flow for the new location is:

Annual Cash Flow   $10,000   $30,000   $70,000   $90,000   $100,000
Probability        0.10      0.15      0.50      0.15      ?

The expected cash flow for the new location is:

(a) $12,800
(b) $64,000
(c) $70,000
(d) $60,000
(e) $50,000

Solution: b
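
One way to see the stated answer to Question 2: the missing probability is fixed by the requirement that the probabilities sum to 1, and the expected value is the probability-weighted sum of the cash flows. The following Python sketch uses exact fractions; the list names are illustrative only.

```python
from fractions import Fraction

cash_flows = [10_000, 30_000, 70_000, 90_000, 100_000]
known_probs = [Fraction(10, 100), Fraction(15, 100), Fraction(50, 100), Fraction(15, 100)]

# The unknown probability is whatever is left so that the total is 1.
missing = 1 - sum(known_probs)
probs = known_probs + [missing]

expected = sum(c * p for c, p in zip(cash_flows, probs))
print(missing, expected)
```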

3. The probability that the Red River will flood in any given year has been estimated
from 200 years of historical data to be one in four. This means:
(a) The Red River will flood every four years.
(b) In the next 100 years, the Red River will flood exactly 25 times.
(c) In the last 100 years, the Red River flooded exactly 25 times.
(d) In the next 100 years, the Red River will flood about 25 times.
(e) in the next 100 years, it is very likely that the Red River will flood exactly 25
times.
Solution: d

4. The chances that you will be ticketed for illegal parking on campus are about 1/3.
During the last nine days, you have illegally parked every day and have NOT been
ticketed (you lucky person)! Today, on the 10th day, you again decide to park
illegally. The chances that you will be caught are:
(a) greater than 1/3 because you were not caught in the last nine days.
(b) less than 1/3 because you were not caught in the last nine days.
(c) still equal to 1/3 because the last nine days do not affect the probability.
(d) equal to 1/10 because you were not caught in the last nine days.
(e) equal to 9/10 because you were not caught in the last nine days.

Solution: c

5. The chance that a person will contract AIDS after a sexual contact with an infected
partner has been estimated to be 1/4. This means:
(a) A person will be infected after exactly 4 sexual contacts with infected partners.
(b) Of 1000 people having sexual contacts with infected partners, exactly 250 will
become infected.
(c) Of 200 people having sexual contacts with infected partners, about 50 will
become infected.
(d) In exactly 25% of all sexual contacts with infected partners, the infection will
spread.
(e) Of 20 people having sexual contact with infected partners it is very likely that
exactly 5 people will become infected.
Solution: c

6. A random variable Y has a probability distribution as follows:

Y      -1    0     1     2
P(y)   3C    2C    0.4   0.1

The value of the constant C is


a) 0.1
b) 0.15
c) 0.2
d) 0.25
e) 0.75
Solution a

7. A random variable X has a probability distribution as follows:
x          0    1    2     3
P[X = x]   2k   3k   13k   2k
The probability P[X < 2] is equal to
a) 0.9
b) 0.25
c) 0.65
d) 0.15
e) 0.75
Solution b
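
The constant in Question 7 follows from the condition that the probabilities sum to 1. The sketch below assumes the event is read as X < 2, a reading that agrees with the stated solution; the dictionary and variable names are illustrative.

```python
from fractions import Fraction

weights = {0: 2, 1: 3, 2: 13, 3: 2}   # coefficient of k for each value of X

k = Fraction(1, sum(weights.values()))          # probabilities must sum to 1
pmf = {x: w * k for x, w in weights.items()}

p_less_than_2 = sum(p for x, p in pmf.items() if x < 2)
print(k, p_less_than_2)
```

The same normalisation argument gives the constant C in Question 6.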
8. If A, B, C are any three events, the probability of at least one of them occurring is represented by
a) P[A ∪ B ∪ C]
b) P[AB ∪ AC ∪ BC]
c) P[A ∩ B ∩ C]
d) 1 − P[A ∪ B ∪ C]
e) P[A ∪ B ∪ C]
9. A continuous random variable X has p.d.f. f(x) = 3x^2, 0 ≤ x ≤ 1. If
P[X ≤ a] = P[X > a], then a is
a) 1/3
b) 2^(−1/3)
c) (1/2)^3
d) 1/3^3
e) 1/2
Solution: b
10. If F(x) is the distribution function of X, and if Y = F(x), then E(Y) is
a) 1/2
b) 1
c) y
d) 2
e) none of the above

11. For a continuous random variable with p.d.f. f(x) and distribution function F(x),
which of the following may not be true?
a) 0 ≤ f(x) ≤ 1
b) ∫_{−∞}^{∞} f(x) dx = 1
c) 0 ≤ F(x) ≤ 1
d) P[X = 0] = 0
e) F(∞) = 1
Solution: a
12. If the r-th raw moment of a random variable X is µ′_r = r!, the moment generating
function is
a) (1 − t)
b) t/(1 − t)
c) (1 − t)^(−1)
d) ln(1 − t)
e) None of these
Part-B
Answer all the questions ,weight 1

13 Define classical definition of probability


14 State the addition theorem of probability for 3 events.
15 Two coins are tossed one after the other until head appears. Write the sample
space
16. Let A and B be the possible outcomes of a random experiment and suppose
P(A) = 0.4, P(A ∪ B) = 0.7 and P(B) = p. For what choice of p are A and B
independent?
17. If f(x) = x/15 for x = 1, 2, 3, 4, 5 and f(x) = 0 elsewhere, find P(1/2 < X < 5/2 | X > 1).
18. Is the following a distribution function?
F(x) = 0 if x < −a; F(x) = (1/2)(x/a + 1) if −a ≤ x ≤ a; F(x) = 1 if x > a.

19. If φ_X(t) is the characteristic function of X, show that φ_X(−t) and φ_X(t) are
conjugate functions.
20 Define probability density function of a discrete random variable.

Part-C
Answer any four questions ,weight 2
21 State and prove addition and multiplication theorem of probability for two events.
22 From a vessel containing 3 white and 5 black balls, four balls are transferred in to
an empty vessel. From this vessel a ball is drawn and is found to be white. What
is the probability that out of four balls transferred, 3 are white and 1 is black.
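
Question 22 is a direct application of Bayes' theorem: the number of white balls transferred follows a hypergeometric distribution, and the chance of then drawing a white ball is the fraction of white balls in the second vessel. A minimal Python sketch with exact fractions follows; the function names `prior` and `likelihood` are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

WHITE, BLACK, MOVED = 3, 5, 4

def prior(k):
    """P(exactly k white balls among the 4 transferred), hypergeometric."""
    return Fraction(comb(WHITE, k) * comb(BLACK, MOVED - k), comb(WHITE + BLACK, MOVED))

def likelihood(k):
    """P(a white ball is drawn from the second vessel | k white transferred)."""
    return Fraction(k, MOVED)

# Total probability of drawing a white ball, then Bayes' theorem for k = 3.
evidence = sum(prior(k) * likelihood(k) for k in range(WHITE + 1))
posterior_3_white = prior(3) * likelihood(3) / evidence
print(evidence, posterior_3_white)
```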

23. Let X be a continuous random variable with p.d.f.
f(x) = kx for 0 ≤ x < 1; f(x) = k for 1 ≤ x < 2; f(x) = −kx + 3k for 2 ≤ x < 3; f(x) = 0 elsewhere.
(1) Find the constant k, (2) Determine the distribution function.
24. Define raw and central moments. Establish the relation between raw and central
moments of a random variable.
25. Find the measures of skewness and kurtosis based on moments for the following
p.d.f.: f(x) = (1/2) x^2 e^(−x), 0 < x < ∞.
26. State and prove Bayes' theorem.

Part-D
Answer any two questions, weight 4
27. The distance X (in thousands of km) which car owners get with a certain kind of tyre is
a random variable having probability density function
f(x) = (1/20) e^(−x/20) for x > 0; f(x) = 0 for x ≤ 0.
Find the probabilities that one of these tyres will last (1) at least 10000 km, (2)
anywhere from 16000 to 24000 km and (3) at least 30000 km. (4) Find the
expected distance in km the car owners get with the tyre.
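
For Question 27, reading the density as exponential with mean 20 (in thousands of km), every requested probability comes from the survival function P(X ≥ x) = e^(−x/20). The sketch below is a numerical check under that reading; the helper name `survival` is illustrative.

```python
import math

MEAN = 20.0  # mean of the exponential distribution, in thousands of km

def survival(x):
    """P(X >= x) for the exponential density (1/20) * exp(-x/20), x in thousands of km."""
    return math.exp(-x / MEAN)

p_at_least_10 = survival(10)
p_16_to_24 = survival(16) - survival(24)
p_at_least_30 = survival(30)
expected_km = MEAN * 1000   # the mean of an exponential distribution equals its scale

print(round(p_at_least_10, 4), round(p_16_to_24, 4), round(p_at_least_30, 4), expected_km)
```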
28 Explain axiomatic definition of probability
29 Explain the terms. (1) Random experiment, (2) Sample space, (3) Mutually
exclusive events, (4) Equally likely events. With example.

Semester II

COURSE II: PROBABILITY DISTRIBUTIONS

Module 1. Bivariate random variable: Definition (discrete and continuous

type), Joint probability mass function and probability density function,

marginal and conditional distributions, independence of random variables.

15hours

Module 2. Bivariate moments: Definition of raw and central product

moments, conditional mean and conditional variance, covariance,

correlation and regression coefficients. Mean and variance of a random

variable in terms of conditional mean and conditional variance

15 hours

Module 3. Standard Distributions: Discrete type-Bernoulli, Binomial,

Poisson distributions (definition, properties and applications)- Geometric

and Discrete Uniform ( definition, mean , variance and mgf only).

Continuous type – Normal (definition, properties and applications)-

Rectangular, Exponential, Gamma, Beta, ( definition mean, variance and

mgf only). Lognormal, Pareto and Cauchy Distributions(definition only)

30 hours

Module 4. Law of large Numbers: Chebychev’s inequality, convergence

in probability, Weak Law of Large Numbers for iid random variables,

Bernoulli Law of Large Numbers, Central Limit Theorem for independent

and identically distributed random variables (Lindeberg-Lévy form).

12 hours

Book for reference

1. V.K. Rohatgi: An Introduction to Probability Theory and Mathematical

Statistics, Wiley Eastern.

2. S.C. Gupta and V.K. Kapoor: Fundamentals of Mathematical Statistics,

Sultan Chand and Sons.

3. Mood A.M., Graybill. F.A and Boes D.C.: Introduction to Theory of

Statistics McGraw Hill

4. John E Freund : Mathematical Statistics (Sixth Edition), Pearson

Education (India), New Delhi.

Model Question Paper

Semester II
COMPLEMENTARY COURSE I
PROBABILITY DISTRIBUTIONS
Time 3hrs



Part A

(Answer all the questions. Choose the correct answer from the alternatives
given below each question). Weight 1 for a bunch of 4 questions

1. For two random variables x and y, the relation E (xy)= E(x) E(y) holds good.
a) if x and y are identical
b) for all x and y
c) if x and y are statistically independent
d) None of the above.
2. If V(x) = 1, then V(2x ± 3) is
a) 5 b) 13 c) 14 d) 1
3. E(x − k)^2 is minimum when
a) k < E(x) b) k = E(x) c) k > E(x) d) k^2 = E(x)
4. If x is a random variable having probability function f(x), then the function
Σ e^(itx) f(x), where i is the imaginary unit, is known as
a) moment generating function
b) probability generating function
c) probability distribution function
d) characteristic function
5. The skewness of a binomial distribution will be zero if
a) p < ½
b) p = ½

c) p > ½
d) p < q
6. The coefficient of variation of a Poisson distribution with mean 4 is
a) ¼ b) 2/4 c) 4 d) 2

7. X is normally distributed with zero mean and unit variance. The variance of
X^2 is
a) 0 b) 1 c) 2 d) 4
8. In a normal curve the area to the right of the point x1 is 0.6 and to the left of the
point x2 is 0.7. Which is the correct statement?
a) x1 > x2 b) x1 < x2 c) x1 = x2 d) none of them
9. For a normal distribution, Q.D., M.D. and S.D. are in the ratio
a) 4/5 : 2/3 : 1 b) 2/3 : 4/5 : 1 c) 1 : 4/5 : 2/3 d) 1/2 : 1 : 4/5
10. If x is a continuous r.v. with mean µ and variance σ^2, then for any positive
number k, P[|x − µ| > kσ] ≤ 1/k^2 is known as
a) Liapunov's inequality b) Tchebycheff's inequality
c) Bienaymé-Tchebycheff's inequality d) Khinchin's inequality
11. If x and y are two random variables such that their expectations exist and
P(x ≤ y) = 1, then
a) E(x) ≤ E(y) b) E(x) > E(y)
c) E(x) = E(y) d) None of the above
12. If x is a standard normal variate, then (1/2)x^2 is
a) Gamma variate with parameter 1/2
b) Normal variable
c) Poisson variable with parameter 1/2
d) Exponential variable with parameter 2

Part B
(Answer all the questions) Weight 1

13. Expected value of a random variable x exists if ……………


14. If x is a random variable, E(x − constant)^2 is minimum when the constant is

15. Name the discrete distribution for which mean and variance have the same
value.
16. What is the third moment about the mean of a Poisson distribution if the
second moment about the origin is 12?
17. Identify the distribution (using the uniqueness property) if the moment
generating function of the distribution
is M_x(t) = (1 + e^t)^5 / 32.
18. The relationship between Beta distributions of the first and second kind is ----
19. What is the characteristic function of a standard Cauchy distribution?
20. What are the points of inflexion of a normal curve N(µ, σ)?

Part C
(Answer any 4 questions) Weight 2

21. If x and y are two independent random variables, show that
V(ax + by) = a^2 V(x) + b^2 V(y).
22. x and y are independent random variables with means 10 and 20, and variances
2 and 3 respectively. Find the mean and variance of 3x + 4y.
23. A symmetric die is thrown 600 times. Find the lower bound for the probability
of getting 80 to 120 sixes.
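
Question 23 can be handled with Chebyshev's inequality, P(|X − µ| < c) ≥ 1 − σ^2/c^2, taking X to be the number of sixes in 600 throws (an assumption about the intended count) so that X is binomial with n = 600 and p = 1/6. A minimal Python sketch with exact fractions:

```python
from fractions import Fraction

n, p = 600, Fraction(1, 6)
mean = n * p                   # expected number of sixes
variance = n * p * (1 - p)     # binomial variance n * p * q

half_width = 20                # the range 80 to 120 is mean +/- 20
lower_bound = 1 - variance / half_width ** 2
print(mean, variance, lower_bound)
```

The bound is distribution-free, so it is weaker than the probability obtained from a normal approximation.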
24. For a binomial distribution, the mean is 6 and the S.D. is 2. Write out all the
parameters of the distribution.
25. Show that for the normal distribution the points of inflexion lie at a distance
of ± σ from the mean, where σ is the S.D.
26. If x ~ N(30, 5), find the probability of |x − 30| > 5.

Part D
(Answer any 2 questions) Weight 4

27. Write a note on the salient features of a normal distribution .


28. Show that under certain conditions (to be stated) a binomial distribution
tends to the Poisson distribution.
29. Fit a Poisson distribution to the following data.
Number of mistakes per page :   0     1    2    3   4   Total
Frequency                   :   109   65   22   3   1   200
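
Fitting a Poisson distribution to the data of Question 29 amounts to estimating the parameter by the sample mean and multiplying the Poisson probabilities by the total frequency. The Python sketch below shows the steps; the dictionary names are illustrative.

```python
import math

observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}   # mistakes per page -> frequency
total = sum(observed.values())

# The moment (and maximum likelihood) estimate of the Poisson parameter is the sample mean.
lam = sum(x * f for x, f in observed.items()) / total

# Expected frequency for each value: total * exp(-lam) * lam**x / x!
expected = {x: total * math.exp(-lam) * lam ** x / math.factorial(x) for x in observed}
for x in observed:
    print(x, observed[x], round(expected[x], 1))
```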

SEMESTER III

COURSE III : STATISTICAL INFERENCE

Module 1. Sampling Distributions: Random sample from a population

distribution, sampling distribution of a statistic, standard error, sampling

from a normal population, sampling distributions of the sample mean and

variance. Chi-square, student’s t and F distributions – derivations, simple

properties and inter relationships. 25 hours

Module 2. Theory of Estimation: Point estimation, desirable properties of

a good estimator, unbiasedness, consistency, sufficiency, statement of Fisher

Neyman factorization criterion, efficiency. Methods of estimation,

method of moments, method of maximum likelihood. Properties of estimators

obtained by these methods. 25 hours

Module 3. Interval Estimation: Interval estimates of mean, difference of

means, variance, proportions and difference of proportions Large and

small sample cases

10 hours

Module 4. Testing of Hypotheses: Concept of testing hypotheses, simple

and composite hypotheses, null and alternative hypotheses, type I and type

II errors, critical region, level of significance and power of a test.

Neyman-Pearson approach. Large sample tests concerning mean, equality

of means, proportions, equality of proportions. Small sample tests based

on t distribution for mean, equality of means and paired mean for paired

data. Tests based on F distribution for ratio of variances. Test based on

chi square-distribution for variance, goodness of fit and for independence

of attributes. 30 hours

Books for reference

1. V.K. Rohatgi: An Introduction to Probability Theory and Mathematical

Statistics, Wiley Eastern.

2. S.C. Gupta and V.K. Kapoor: Fundamentals of Mathematical Statistics,

Sultan Chand and Sons.

3. Mood A.M., Graybill. F.A. and Boes D.C.: Introduction to Theory of

Statistics McGraw Hill.

4. John E Freund: Mathematical Statistics (Sixth Edition), Pearson Education

(India),New Delhi.

Model Question Paper

Semester III
Time 3hrs
COMPLEMENTARY COURSE- I
STATISTICAL INFERENCE

Part A
Answer all questions ,4 questions carry weight 1
1. The mean of a Chi – square distribution with n degrees of freedom is

( a ) 2n ( b ) n 2 ( c ) n (d ) n
2. The relation between Student's t and F distributions is
(a) t_(n)^2 = F_(n,1) (b) t_(n)^2 = F_(1,n) (c) t_(1)^2 = F_(1,n) (d) t_(n)^2 = F_(1,1)
3. Let X1, X2, ..., Xn be a random sample from a normal population N(µ, σ^2). Then the
distribution of Σ(x_i − x̄)^2 / σ^2 is
(a) χ^2_(n) (b) t_(n) (c) χ^2_(n−1) (d) t_(n−1)

4. Let X1, X2, ..., Xn be a random sample from an infinite population, where
s^2 = (1/n) Σ(x_i − x̄)^2. The unbiased estimator for the population variance σ^2 is
(a) s^2/(n − 1) (b) s^2/n (c) n s^2/(n − 1) (d) (n − 1) s^2/n
5. If T is a consistent estimator of θ then
(a) T is a consistent estimator of θ^2 (b) T^2 is a consistent estimator of θ
(c) T^2 is a consistent estimator of θ^2 (d) None of the above
6. Let X 1 , X 2 ,..., X n be a random sample from a Bernoulli population. A sufficient
statistics for p is

( a ) ∑ X i ( b ) ∏ X i ( c ) Max( X1 , X 2 ,..., X n ) ( d ) Min( X 1 , X 2 ,..., X n )

7. Let X1, X2, ..., Xn be a random sample from U(0, θ). The m.l.e. of θ is
(a) ΣX_i (b) ΠX_i (c) Max(X1, X2, ..., Xn) (d) Min(X1, X2, ..., Xn)
8. The 95% confidence interval for the mean µ of a normal population N(µ, σ^2) with
known σ^2 is
(a) x̄ ± 2.33 σ/√n (b) x̄ ± 1.96 σ/√n (c) x̄ ± 2.58 σ/√n (d) x̄ ± 1.65 σ/√n
9. The mean difference between 9 paired observations is 15 and standard deviation of
differences is 5. Then the value of the t statistic used in paired t test is

( a ) 27 ( b ) 9 ( c ) 3 ( d ) 0
10. A sample of 12 specimens taken from a normal population is expected to have a
mean of 50 mg/cc. The sample has a mean of 64 mg/cc with a variance of 25. To test
H0: µ = µ0 against H1: µ ≠ µ0, you will choose

( a ) Z − test ( b ) t − test ( c ) χ 2 − test ( d ) F − test

11. A random sample of size 20 from a normal population gives a mean of 42 and a
variance of 25. Then the value of the χ^2 statistic used for testing the significance of
the population variance is

(a) 7.81 (b) 15.62 (c) 51.20 (d) 14.36

12. If X > 1 is the critical region for testing H0: θ = 2 against H1: θ = 1 on the basis of a
single observation from the population f(x, θ) = θ e^(−θx), x > 0, then the probability of type I
error is

(a) e (b) e^2 (c) e^(−2) (d) e^(−1)

Part B
Answer all questions, each question carries weightage 1
13. Let X1, X2 be a random sample of size 2 from N(0, 1). Then the distribution of
(X1 + X2)^2 / (X1 − X2)^2 is -------------
14. T_n is a consistent estimator for the parameter θ if ------------
15. Let X1, X2, X3 be a random sample of size 3 from N(µ, σ^2). The efficiency of
(X1 + 2X2 + X3)/4 relative to (X1 + X2 + X3)/3 is ------------
16. Let X1, X2, ..., Xn be a random sample from the population with pdf f(x, θ) = (1/2) e^(−|x − θ|).
The m.l.e. of θ is ---------
17. The diameter of a cylindrical rod is assumed to be normally distributed with a
variance of 0.04 cm. A sample of 25 rods has a mean diameter of 4.5 cm. The 95% confidence
interval for the population mean is -----------
18. The power of a test is ----------
19. The degrees of freedom for chi-square in the case of a contingency table of order 4x3 is ---
20. In tossing a coin, let the probability of a head turning up be p. The hypotheses are
H0: p = 0.4 against H1: p = 0.6. H0 is rejected if there are five or more heads in six
tosses. Then the probability of type I error is ----------
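
In Question 20 the type I error is the probability of landing in the rejection region (five or more heads in six tosses) when H0 is true, which is a binomial tail probability. The sketch below computes it, and also evaluates the same tail under H1 to give the power; the helper name `binom_tail` is illustrative.

```python
from math import comb

def binom_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

alpha = binom_tail(6, 0.4, 5)   # rejection region evaluated under H0: p = 0.4
power = binom_tail(6, 0.6, 5)   # same region evaluated under H1: p = 0.6
print(round(alpha, 5), round(power, 5))
```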

Part C
Answer any 4 questions, each question carries a weightage of 2
21. Obtain the distribution of the sample mean of a random sample X1, X2, ..., Xn of size n
from N(µ, σ^2).

22. Define unbiased estimator. Let X1, X2, ..., Xn be a random sample of size n from
B(1, p). Let T = ΣX_i.
Show that T(T − 1) / (n(n − 1)) is an unbiased estimator of p^2.
23. Define sufficient statistic. Let X1, X2, ..., Xn be a random sample of size n from
U(0, θ). Find a sufficient statistic for θ.

24. An oil company claims that less than 20% of all car owners have not tried its gasoline.
Test this claim at the 0.01 level of significance if a random check reveals that 22 out of
200 car owners have not tried the oil company's gasoline.
25. In the comparison of two kinds of paint, a consumer testing service finds that four 1-gallon
cans of one brand cover on the average 546 square feet with a standard deviation
of 31 square feet, whereas four 1-gallon cans of another brand cover on the average 492
square feet with a standard deviation of 26 square feet. Assuming that the two
populations sampled are normal and have equal variances, test the hypothesis that on the
average the first kind of paint covers a greater area than the second.
26. Mention the advantages of non-parametric tests over parametric test.

Part D
Answer any 2 questions, each question carries weight 4
27. Let X1, X2, ..., Xn be a random sample of size n from N(µ, σ^2). Find the m.l.e.s
of µ and σ^2 and examine whether they are unbiased and consistent.

28. Explain interval estimation. Obtain the 100(1 − α)% confidence interval for the
parameter σ^2 of the normal population N(µ, σ^2).

29. Use the data shown in the following table to test at the 0.01 level of significance
whether a person's ability in mathematics is independent of his or her interest in
statistics.

                                   Ability in Mathematics
                                   Low    Average    High
Interest in Statistics   Low       63     42         15
                         Average   58     61         31
                         High      14     47         29
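
The test of independence in Question 29 rests on the chi-square statistic Σ (O − E)^2 / E, where each expected frequency is (row total × column total) / grand total and the degrees of freedom are (rows − 1)(columns − 1). The Python sketch below carries out the arithmetic, taking rows as interest in statistics and columns as ability in mathematics; the variable names are illustrative.

```python
observed = [
    [63, 42, 15],   # interest in statistics: low
    [58, 61, 31],   # interest in statistics: average
    [14, 47, 29],   # interest in statistics: high
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_square, 2), dof)
```

The computed statistic is then compared with the tabulated chi-square value for the chosen level of significance and these degrees of freedom.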

SEMESTER IV

COURSE IV: APPLIED STATISTICS

Module 1. Univariate data: Skewness and kurtosis- Pearson’s and

Bowley’s coefficient of skewness- moment measures of skewness and

kurtosis

5 hours

Module 2. Analysis of bi-variate data: Curve fitting – fitting of straight

lines, parabola, power curve and exponential curve. Correlation-Pearson’s

correlation coefficient and rank correlation coefficient – partial and

multiple correlation- formula for calculation in 3 variable cases -Testing

the significance of observed simple correlation coefficient. Regression –

simple linear regression, the two regression lines, regression coefficients

and their properties.

30 hours

Module 3. Time series: Components of time series- measurement of trend

by fitting polynomials- computing moving averages- seasonal indices-

simple average- ratio to moving average

15 hours

Module 4. Statistical Quality control: Concept of statistical quality

control, assignable and chance causes, process control. Construction of

control charts, 3 sigma limits. Control chart for variables – X-bar chart and

R chart. Control chart for attributes-p chart, d chart and c chart

25 hours

Module 5. Analysis of variance: One way and two way classifications.

Null hypotheses, total, between and within sum of squares. Assumptions-

ANOVA table. 15 hours

Books for reference

1. Goon A.M., Gupta M.K and Das Gupta: Fundamentals of Statistics Vol.1

The World Press, Calcutta.

2. S.C Gupta and V.K. Kapoor: Fundamentals of Applied Statistics, Sultan

Chand and Sons

3. S.P Gupta: Statistical Methods

4. E.L.Grant: Statistical Quality Control

PATTERN OF QUESTIONS FOR COURSE IV

1. 12 objective type questions 6 theory + 6 problems weight 1/4

2. 8 short answer questions 4 theory + 4 problems weight 1

3. 6 short essay type question 3 problems + 3 theory weight 2

(answer any 4 of this type)

4. 3 long essay type question 1 problem + 2 theory weight 4

(answer any 2 questions) (all the 3 questions from course IV)

Model Question Paper

Semester IV
Time 3hrs
COMPLEMENTARY COURSE- I
APPLIED STATISTICS

Part A
Answer all questions (weight 1 for a bunch of 4 questions)
Calculators are permitted
1. If the coefficient of kurtosis is equal to 3 the distribution is called

( a ) platykurtic ( b ) leptokurtic ( c ) mesokurtic ( d ) none of the above


2. If ρ = 0 the lines of regression are .

( a ) coincident ( b ) parallel ( c ) perpendicular to each other ( d ) none of the above


3. The range of multiple correlation coefficient R is.

( a ) 0 to1 ( b ) 0 to ∞ ( c ) − 1to1 ( d ) − ∞ to ∞
4. The test statistic for testing the significance of ρ = 0, with usual notation, is
(a) t = r√(1 − r^2) / √(n − 2) (b) t = r√(n − 2) / (1 − r^2) (c) t = r√(n − 2) / √(1 − r^2) (d) t = r^2(1 − r^2) / (n − 2)

5. The long term regular movement in a time series is called .


( a ) Trend ( b ) Cyclic variation
( c ) Seasonal variation ( d ) Irregular variation
6. For the given five values 15,24,18,33,42,the three years moving averages are.

( a )19, 22,33 ( b )19, 25,33 ( c )19,30,31 ( d )19,30,33


7. Seasonal variation means the variations occurring within
( a ) a number of years ( b ) parts of a year
( c ) parts of a month ( d ) none of the above
8. Link relatives in a time series remove the influence of.
( a ) Trend ( b ) Cyclic variation
( c ) Seasonal variation ( d ) all the above

9. Analysis of variance utilises:

( a ) Z − test ( b ) t − test ( c ) χ 2 − test ( d ) F − test


10. The error degrees of freedom for two-way ANOVA with k rows and n columns is

( a ) k − 1 ( b ) n − 1 ( c )( k − 1)( n − 1) ( d ) nk − 1
11. The causes leading to vast variation in the specifications of a product are
( a ) random causes ( b ) assignable causes
( c ) non − traceable causes ( d ) all the above
12. The control charts for fraction defectives are known as

( a ) X − chart ( b ) R − chart ( c ) p − chart ( d ) c − chart


Part B
Answer all questions Weight 1
13. Karl Pearson's formula for the measure of skewness is -------------

14. Given two lines of regression, 3x − 4y + 8 = 0 and 4x − 3y = 1, the means of X
and Y are ------------

15. The formula for the multiple correlation coefficient R_2.13 in terms of the simple
correlation coefficients r12, r13 and r23 is ----------

16. Given the trend equation Y = 108 + 2.8X with 2000 as origin and yearly data from
2000 to 2002, the estimated trend value for 2005 is ---------

17. The moving average method estimates -----------

18. Equality of several normal population means can be tested by ----------

19. One or more points outside the control limits indicate that -------

20. Control limits for the mean, with usual notations, are ----------

Part C
Answer any 4 questions, weight 2

21. State and prove any two properties of regression coefficients.

22. Show that the correlation coefficient is independent of change of origin and scale.

23. A computer, while calculating the correlation coefficient between two variables X
and Y from 25 pairs of observations, obtained the following results:
n = 25, ΣX = 125, ΣX^2 = 650, ΣY = 100, ΣY^2 = 460, ΣXY = 508. It
was, however, later discovered at the time of checking that he had copied down
two pairs as (6,14) and (8,6) while the correct values were (8,12) and (6,8).
Obtain the correct value of the correlation coefficient.
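
The correction in Question 23 works on the sums rather than on the raw data: subtract the contribution of the two miscopied pairs from each sum, add the contribution of the correct pairs, and then apply the usual formula for r. A minimal Python sketch of these steps follows; the variable names are illustrative.

```python
import math

n = 25
sum_x, sum_x2 = 125, 650
sum_y, sum_y2 = 100, 460
sum_xy = 508

wrong_pairs = [(6, 14), (8, 6)]
right_pairs = [(8, 12), (6, 8)]

# Remove the contribution of the miscopied pairs, then add the correct ones.
for sign, pairs in ((-1, wrong_pairs), (+1, right_pairs)):
    for x, y in pairs:
        sum_x += sign * x
        sum_x2 += sign * x * x
        sum_y += sign * y
        sum_y2 += sign * y * y
        sum_xy += sign * x * y

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(sum_x, sum_x2, sum_y, sum_y2, sum_xy, round(r, 4))
```

Only ΣY^2 and ΣXY change here, because the X values of the two pairs were merely swapped and the Y values keep the same total.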

24. In a trivariate distribution r12 = 0.77, r13 = 0.72, r23 = 0.52. Find the partial correlation
coefficient r_12.3 and the multiple correlation coefficient R_1.23.
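
For Question 24, the standard three-variable formulas express the partial and multiple correlation coefficients in terms of the simple correlations. The sketch below evaluates them in the usual notation, with r12_3 for the partial correlation of variables 1 and 2 given 3, and R1_23 for the multiple correlation of variable 1 on variables 2 and 3; these Python names are chosen here for illustration.

```python
import math

r12, r13, r23 = 0.77, 0.72, 0.52

# Partial correlation between variables 1 and 2, holding variable 3 fixed.
r12_3 = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Multiple correlation of variable 1 on variables 2 and 3.
R1_23 = math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))

print(round(r12_3, 4), round(R1_23, 4))
```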

25. Explain the steps followed in the link relative method.

26. What do you understand by a 3-σ control chart? Obtain the 3-σ control limits for the
X-bar chart.

Part D
Answer any 2 questions , weight 4

27. The following are the cholesterol contents in milligrams per package that four
laboratories obtained for 6-ounce packages of three very similar diet foods.

               Diet food A   Diet food B   Diet food C
Laboratory 1   3.4           2.6           2.8
Laboratory 2   3.0           2.7           3.1
Laboratory 3   3.3           3.0           3.4
Laboratory 4   3.5           3.1           3.7

Perform a two way analysis of variance and test the null hypotheses concerning
the diet foods and laboratories at the 0.05 level of significance.
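
The two-way analysis of variance in Question 27 splits the total sum of squares into a part due to diet foods (columns), a part due to laboratories (rows) and an error part, and forms an F ratio for each factor. The following Python sketch performs the computation with one observation per cell; the variable names are illustrative.

```python
data = [
    [3.4, 2.6, 2.8],   # Laboratory 1: diet foods A, B, C
    [3.0, 2.7, 3.1],   # Laboratory 2
    [3.3, 3.0, 3.4],   # Laboratory 3
    [3.5, 3.1, 3.7],   # Laboratory 4
]
rows, cols = len(data), len(data[0])
grand_total = sum(map(sum, data))
correction = grand_total ** 2 / (rows * cols)   # correction factor T^2 / N

ss_total = sum(x * x for row in data for x in row) - correction
ss_labs = sum(sum(row) ** 2 for row in data) / cols - correction
ss_foods = sum(sum(col) ** 2 for col in zip(*data)) / rows - correction
ss_error = ss_total - ss_labs - ss_foods

df_labs, df_foods = rows - 1, cols - 1
df_error = df_labs * df_foods
ms_error = ss_error / df_error

f_foods = (ss_foods / df_foods) / ms_error
f_labs = (ss_labs / df_labs) / ms_error
print(round(f_foods, 2), round(f_labs, 2))
```

Each F ratio is compared with the tabulated F value at the 0.05 level for its own degrees of freedom and the error degrees of freedom.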
28. Calculate seasonal indices for the following time series by the ratio to moving
average method.

Year Quarter I Quarter II Quarter III Quarter IV


1995 65 58 56 61
1996 68 63 63 67
1997 70 59 56 52
1998 60 55 51 58

29. The net weight of a dry bleach product is to be monitored by X-bar and R
charts using a sample size of n = 5. Data for 12 preliminary samples are as follows.

Sample no.   X1     X2     X3     X4     X5
1            15.8   16.3   16.2   16.1   16.6
2            16.3   15.9   15.9   16.2   16.4
3            16.1   16.2   16.5   16.4   16.3
4            16.3   16.2   15.9   16.4   16.2
5            16.1   16.1   16.4   16.5   16.0
6            16.1   15.8   16.7   16.6   16.4
7            16.2   16.1   16.2   16.1   16.2
8            16.2   16.1   16.2   16.1   16.3
9            16.3   16.2   16.4   16.1   16.5
10           16.6   16.3   16.4   16.1   16.5
11           16.2   16.4   15.9   16.3   16.4
12           15.9   16.6   16.7   16.2   16.5

Set up X-bar and R control charts using these data. Does the process exhibit statistical
control?
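
The control limits asked for in Question 29 can be sketched as follows: the centre lines are the mean of the sample means and the mean range, the X-bar limits are the grand mean ± A2 times the mean range, and the R limits are D3 and D4 times the mean range. The constants A2 = 0.577, D3 = 0 and D4 = 2.114 used below are the commonly tabulated values for samples of size 5; they are not given in the question paper, and some tables round D4 slightly differently.

```python
samples = [
    [15.8, 16.3, 16.2, 16.1, 16.6], [16.3, 15.9, 15.9, 16.2, 16.4],
    [16.1, 16.2, 16.5, 16.4, 16.3], [16.3, 16.2, 15.9, 16.4, 16.2],
    [16.1, 16.1, 16.4, 16.5, 16.0], [16.1, 15.8, 16.7, 16.6, 16.4],
    [16.2, 16.1, 16.2, 16.1, 16.2], [16.2, 16.1, 16.2, 16.1, 16.3],
    [16.3, 16.2, 16.4, 16.1, 16.5], [16.6, 16.3, 16.4, 16.1, 16.5],
    [16.2, 16.4, 15.9, 16.3, 16.4], [15.9, 16.6, 16.7, 16.2, 16.5],
]
A2, D3, D4 = 0.577, 0.0, 2.114   # assumed tabulated constants for n = 5

means = [sum(s) / len(s) for s in samples]
ranges = [max(s) - min(s) for s in samples]
x_bar_bar = sum(means) / len(means)   # centre line of the X-bar chart
r_bar = sum(ranges) / len(ranges)     # centre line of the R chart

xbar_limits = (x_bar_bar - A2 * r_bar, x_bar_bar + A2 * r_bar)
r_limits = (D3 * r_bar, D4 * r_bar)

# Sample numbers whose mean or range falls outside the corresponding limits.
out_of_control = [
    i + 1
    for i, (m, r) in enumerate(zip(means, ranges))
    if not (xbar_limits[0] <= m <= xbar_limits[1]) or not (r_limits[0] <= r <= r_limits[1])
]
print(round(x_bar_bar, 3), round(r_bar, 3), xbar_limits, r_limits, out_of_control)
```

An empty `out_of_control` list means no sample mean or range falls outside its limits, which is the usual first check for statistical control.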

SYLLABUS OF COMPLEMENTARY II- ACTUARIAL SCIENCE

STATISTICS: COMPLEMENTARY – II
CUCCSSUG 2009 (2009 admission onwards)

Semester   Course   Course Title              Instructional   Credit   Exam    Ratio
No.        Code                               hours/week               hours   Ext:Int
1          AS1C01   FINANCIAL MATHEMATICS     4               3        3       3:1
2          AS2C02   FINANCIAL MATHEMATICS     4               3        3       3:1
3          AS3C03   LIFE CONTINGENCIES AND    5               3        3       3:1
                    PRINCIPLES OF INSURANCE
4          AS4C04   LIFE CONTINGENCIES AND    5               3        3       3:1
                    PRINCIPLES OF INSURANCE
Pattern of Question papers.

There shall be 4 parts A, B, C and D in the question papers. Part A consists of 12
objective type questions. Part B consists of 8 questions to be answered in a word, phrase
or sentence. Part C consists of 6 questions of short essay type of which the student can
attempt 4. Part D consists of 3 questions of long essay type of which the student can
attempt 2. In Part A the weightage per question is ¼; for Part B the weightage is 1 per question.
For Part C the weightage is 2 per question and for Part D the weightage is 4 per question.
As far as possible the number of questions should be proportional to the modules.

Table showing the components and weightage for internal assessment

Components Weight
Assignment 1

Test paper 2

Seminar 1

Attendance 1

There shall be two test papers and the average grade point is to be considered for
internal assessment
SEMESTER I
Course I

Financial mathematics

Module I: Rates of interest-Simple and Compound interest rates-Effective
rate of interest-Accumulation and Present value of a single
payment-Nominal rate of interest-Constant force of interest-
Relationship between these rates of interest-Accumulation and
Present value of a single payment using these rates of interest-
When the force of interest is a function of t,
δ(t): Definition of A(t1,t2), A(t), v(t1,t2) and v(t). Expressing
accumulation and present values of a single payment using these
symbols. 22hrs

Module II: Series of payments-Definition of annuity (Ex:-real life situation)-
Accumulation and present values of annuities with level payments
and where the payments and interest rates have same frequencies-
Definition and derivation –Definition of perpetuity and derivation-
Accumulation and present values of annuities where payments and
interest rates have different frequencies 22hrs

Module III: Increasing and decreasing annuities-Definition and derivation—


Annuities payable continuously-Annuities where payments are
increasing continuously and payable continuously-Definition and
derivation 10hrs

Module IV: Loan schedules-Purchase price of annuities net of tax-consumer


credit transaction 18hrs

Books for study and reference:

Institute of Actuaries Act Ed. Study materials


McCutcheon, J.J., Scott William (1986): An introduction to Mathematics
of Finance
Butcher, M.V., Nesbit, Cecil (1971): Mathematics of Compound Interest,
Ulrich's Books
Neill, Alistair (1977): Life Contingencies, Heinemann.
Bowers, Newton L. et al. (1997): Actuarial Mathematics, Society of
Actuaries, 2nd Ed.
Model Question Paper

Semester I
COMPLEMENTARY COURSE II
FINANCIAL MATHEMATICS
Time: 3 Hrs

Part A
Choose the correct answer from the brackets
A bunch of four questions carries a weightage of 1

1. An investor deposits £4000 in a bank account that pays simple interest at a rate
of 6% pa. After 8 years the balance will be ------------------
(a)5920 (b)4920 (c)3920 (d)3000
2. An investor deposits £4000 in a bank account that pays compound interest at a
rate of 6% pa. After 8 years the balance will be ------------------
(a)5920 (b)4920 (c)6375 (d)6000
3. An investor must make a payment of £5000 in 5 years' time. The investor wishes to
make provision for this payment by investing a single sum now in a deposit
account that pays 10% pa compound interest. How much should the initial
investment be?
(a)3105 (b)4105 (c)4000 (d)3000
4. An 8-month loan repayable by a single repayment is issued at a rate of
commercial discount of 15% pa. If the amount of the repayment is £1,00,000, how
much was initially lent to the borrower?
(a)80000 (b)90000 (c)100000 (d)75000
5. £80 is invested at time 5 and the accumulated amount at time 8 is £100. What is the
rate of interest?
(a)8.33% (b)8% (c)7% (d)7.33%
6. Find the value at time t=0 of $250 due at time t=6 and $600 due at time t=8, if
δ(t)=3% pa for all t
(a)680.79 (b)650 (c)675.25 (d)680
7. Calculate a25 at 13½%pa effective
(a)7.095 (b)7.25 (c)8.095 (d)8.75
8. A loan of £900 is repayable by equal monthly payments for 3 years, with interest
payable at 18½% pa effective. Calculate the amount of each monthly payment
(a)32.13 (b)31.13 (c)35.25 (d)30.75
9. Find R,if P=7892, l=5, i= 10% and n=10
(a)125.01 (b)123.25 (c)175 (d)150
10. Find P, if l=5, R=125, i=10% and n=20
(a)61.15 (b)65.25 (c)60.825 (d)62.13
11. Calculate numerical value for ā7 @7½%pa
(a)5.4928 (b)6.492 (c)7.25 (d)8.125
12. Calculate 5|ä8(3) @ 6%
(a) 3.8247 (b) 4.8247 (c) 5.25 (d) 6.875
Part B
Attempt all questions; each question carries a weightage of 1

13. Calculate v, assuming an effective annual rate of interest of 4%


14. An investor makes an initial investment of £5000 and is credited with £500
interest at the end of the year. What is the effective rate of interest and the value
of i?
15. £4600 is invested at time 0 and the proceeds at time 10 are £8200. Calculate
A(7,10) if A(0,9)=1.8, A(2,4)=1.1, A(2,7)=1.32, A(4,9)=1.45
16. Find the accumulated value if $1 is invested for 7years at an interest rate of
6.5%pa effective
17. Calculate the present value on 1-Sept-2002 of payments of £280 due on 1-Sept-
2004 and £360 due on 1-March-2005. Interest is 15%pa effective
18. Calculate a6(4) at 1½%pa, first without using the tables and then with the tables
19. Write down a formula for Lt, if the loan is repaid by level regular instalments, so
that Xt=X for all t
20. Write down a formula for m|ān in terms of ān

PART C
Attempt any four questions; each question carries a weightage of 2

21. Consider two non-overlapping time periods. Period 1 has length l time units and
period 2 has length m time units. If the effective period 1 interest rate is i, express
the equivalent effective period 2 interest rate in terms of i, l and m
22. If the force of interest is δ(t)=0.04,0<t<6 and δ(t)=0.2-0.02t, 6<t<9. Find the
accumulated value at time 8 of a payment of $400 at time 3
23. Find the accumulated value of a payment stream of 0.3+1.5t that is received
continuously from time 4 to time 8, during which time the force of interest is
0.01+0.05t
24. A motorist buys a car costing £5000 using a loan with a flat rate of interest of
10% and repayments at the end of each of the next 12 months. Calculate the loan
outstanding immediately after the second payment

25. A loan of $50000 is repayable by equal annual payments at the end of each of the
next 5 years; interest is 8% pa for the first 3 years and 12% pa thereafter. Calculate
the loan outstanding immediately after the second payment
26. Derive formulae for (Iä)n
a. Algebraically, and
b. By general reasoning, starting from the formula for (Ia)n
PART D
Attempt any two questions; each question carries a weightage of 4

27. A woman takes out a home improvement loan for £11000 over 5 years. She makes
monthly repayments in arrears and the bank charges an effective rate of interest of
6% pa
(a) What is the monthly repayment?
(b) How much interest does she pay in the 3rd year?
(c) How much capital is repaid in the 20th instalment?
28. An investor wishes to find the present value of a stream of property income
payments. She proposes to make the following assumptions:
• The level of current payment is £20,000, paid quarterly in advance
• Payments will remain fixed for each 5-year period. At the end of each
5-year period the payments will rise in line with total inflationary
growth over the previous 5 years
• Inflation is assumed to be constant at 3% pa
• The interest rate for the calculation is 12% pa effective
Find the present value of the income stream, assuming that the payments continue for 50
years
29. The force of interest is given by
δ(t) = { 0.04+0.002t , 0<t<10
         0.015t-0.08 , 10<t<12
         0.07 , t>12
Find an expression for the accumulation factor from time 0 to t
SEMESTER II

Course II Life contingencies

Module I: Survival distribution and Life tables:
Probability for the age at death-Life tables-The deterministic
survivorship group-Other life table functions-Assumptions for
fractional ages-Some analytical laws of mortality-Select and
ultimate life tables 25hrs

Module II: Multiple life functions: Joint life status-the last survivor status-
Probabilities and expectations-Insurance and annuity benefits-
Evaluation-Special mortality laws-Evaluation-Uniform distribution
of death-Simple contingent functions-Evaluation 10hrs

Module III: Evaluation of assurance:


Life assurance contracts-(whole, n-year term, n-year endowment,
deferred)-Insurance payable at the moment of death and insurance
payable at the end of year of death-Recursion equations-
Commutation functions 19hrs

Module IV: Life annuities: Single payment contingent on survival-Continuous
life annuities-Discrete life annuities-Life annuities with monthly
payment-Commutation function formulae for annuities with level
payments-Varying annuities-Recursion equations-Complete
annuities-immediate and apportionable annuities-due 18hrs
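
Illustrative note: the survival-distribution ideas of Module I (survival function S(x), the probability tpx and the force of mortality µ(x)) can be checked numerically. The sketch below is an assumption-laden illustration, not part of the prescribed material: the helper names are hypothetical, De Moivre's law S(x) = 1 - x/ω is used only as a convenient example, and µ(x) = -S'(x)/S(x) is approximated by a central difference.

```python
import math


def de_moivre(omega=100):
    """Survival function S(x) = 1 - x/omega for 0 <= x < omega (De Moivre's law)."""
    return lambda x: 1 - x / omega


def survival_prob(S, x, t):
    """t_p_x: probability that a life aged x survives t more years, S(x+t)/S(x)."""
    return S(x + t) / S(x)


def force_of_mortality(S, x, h=1e-5):
    """mu(x) = -S'(x)/S(x), with S'(x) approximated by a central difference."""
    return -(S(x + h) - S(x - h)) / (2 * h * S(x))


if __name__ == "__main__":
    S = de_moivre(100)
    print(force_of_mortality(S, 40))  # analytically 1/(100 - 40)

    # Square-root survival function, S(x) = (1 - x/100)^(1/2)
    S_sqrt = lambda x: math.sqrt(1 - x / 100)
    print(survival_prob(S_sqrt, 19, 17))
```

Any survival function supplied as a Python callable can be passed to the two helper functions in the same way.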

Books for study and reference:

Institute of Actuaries: ActEd study materials.
McCutcheon, J.J. and Scott, William (1986): An Introduction to the Mathematics
of Finance.
Butcher, M.V. and Nesbitt, Cecil (1971): Mathematics of Compound Interest,
Ulrich’s Books.
Neill, Alistair (1977): Life Contingencies, Heinemann.
Bowers, Newton L. et al. (1997): Actuarial Mathematics, Society of
Actuaries, 2nd Ed.
Model Question Paper

Semester II
COMPLEMENTARY COURSE II

LIFE CONTINGENCIES

Time: 3 Hrs

PART A

Choose the correct answer from the brackets


A bunch of four questions carries a weightage of 1

1. If S(x)=1-x/100, 0<x<100, calculate µ(x)
(a)1/(100-x) (b)x/100 (c)1/100 (d)3/100
2. If S(x) = [1-x/100]^{1/2}, 0<x<100, evaluate 17p19
(a)1/8 (b)8/9 (c)9/8 (d)5/8
3. Given that 25p25:50 = 0.2 and 15p25 = 0.9, calculate the probability that a person
aged 40 will survive to age 75
(a)1/2 (b)9/2 (c)1/9 (d)2/9
4. If µ(x)=1/(100-x) for 0<x<100, calculate e°40:50
(a)36.94 (b)18.06 (c)25.05 (d)32.30
5. In a mortality table known to follow Makeham’s law, you are given that
A=0.003 and C10=3. If e40:50=17, calculate xq401:50
(a)0.0123 (b)0.5632 (c)0.2755 (d)0.4835
6. Assume mortality is described by lx=100-x, 0<x<100 and that the force of
interest is δ=0.05. Calculate Ā401:50
(a)0.47890 (b)1.3567 (c)0.0523 (d)0.2378
7. If lx=100-x, 0<x<100 and i=0.05, calculate (IA)40
(a)5.5545 (b)2.5678 (c)4.2891 (d)6.7235
8. If Ax=0.25, Ax+20=0.40 and Ax:20=0.55, calculate Ax:201
(a)0.2 (b)0.3 (c)0.5 (d)0.125
9. On the basis of the illustrative life table with interest at the effective annual
rate of 6%, calculate the value of ä(12)25:40
(a)11.20 (b)15.038 (c)19.638 (d)25.32
10. Let Y be the PVRV for a continuous 10-year temporary life annuity of 1 pa
commencing at age 60. On the basis of your illustrative life table with uniform
distribution of deaths over each year of age and i=0.08, calculate the mean of Y
(a)6.4634 (b)2.5891 (c)8.7800 (d)5.3239
11. Use the illustrative life table with uniform distribution of deaths over each year
of age and i=0.07 to determine ä30:20
(a)11.415 (b)10.415 (c)9.897 (d)8.326
12. If S(x)=1-x/100, 0<x<100, calculate FX(x)
(a)x/100 (b)1/100 (c)x-100 (d)100
PART B
Attempt all questions; each question carries a weightage of 1

13. On the basis of the life table, evaluate the probability that (20) will
(a) live to 100
(b) die before 70
14. Explain complete expectation of life
15. Under the assumption of uniform distribution of deaths, show that
(a) e°x = ex + 1/2
(b) Var[T] = Var[K] + 1/12
16. The pdf of the future lifetime T for (x) is assumed to be
fT(t) = { 1/80 , 0<t<80
          0 , elsewhere
At a force of interest δ, calculate for Z, the PVRV for a whole life insurance
for unit amount issued to (x),
(a) The actuarial present value
(b) The variance
17. Explain Endowment life insurance at the moment of death
18. Compare the variances of the PVRV’s for the complete annuity - immediate
19. Prove that n/qx = (Ax:n - Ax)/d – nEx
20. Explain n-year temporary life annuity – due

PART C
Attempt any four questions; each question carries a weightage of 2

21. Assuming that future life times of (80) and (80)are independent, obtain an
expression in single life table functions for the probability that their
(a) First death occurs after 5 and before 10 years from now
(b) Last death occurs after 5 and before 10 years from now
22. Prove that nqx2y = nqx-nqx1y & nqx1x= ½nqxx
23. Using life tables, evaluate
(a) 2p[30] (b) 5p[30] (c) 1|q[31] (d) q[31]+1
24. Under the constant force of mortality assumption, are the random variables K and
S independent?
25. Assume that each of 100 independent lives
(i) Is age x
(ii) Is subjected to a constant force of mortality µ=0.04 and
(iii) Is insured for a death benefit amount of 10 units, payable at the
moment of death
The benefit payments are to be withdrawn from an investment fund earning δ=
0.06. Calculate the minimum amount at t=0, so that the probability is
approximately 0.95 that sufficient funds will be on hand to withdraw the benefit
payment at the death of each individual
26. Consider a 5-year deferred whole life insurance payable at the moment of death of
(x). The individual is subject to a constant force of mortality µ=0.04. For the
distribution of the PV of the benefit payment at δ=0 .10
(a) Calculate the expectation
(b) Calculate the variance

PART D
Attempt any two questions; each question carries a weightage of 4

27. Relationship between insurance payable at the moment of death and the end of
year of death
28. Under the assumptions of a constant force of mortality µ and of a constant force
of interest δ, evaluate
(a) āx=E[āT]
(b) Var[āT]
(c) The probability that āT exceeds āx
29. The future lifetimes T(x) and T(y) are independent and each has a distribution defined
by the pdf fX(t) = { 0.02(10-t) , 0<t<10
                     0 , elsewhere
(a) Determine the distribution function, survival function and force of
mortality
(b) Determine the joint pdf, joint distribution function and joint survival
function for T(x) and T(y)
(c) Determine the complete expectation for the joint life status T(x,y)
SEMESTER III
Course III

Life contingencies and Principles of insurance

Module I: Net premiums: Fully continuous premiums-Fully discrete
premiums-True m-thly payment premiums-Apportionable
premiums-Commutation functions-Accumulation-type benefits 20hrs

Module II: Fully continuous net premium reserves-Other formulas for fully
discrete net premium reserves-Reserves on semi-continuous basis-
Reserves based on apportionable or discounted continuous basis-
Recursive formulae for fully discrete basis-Reserves at fractional
duration-Allocation of the loss to the policy years-Differential
equation for fully continuous reserves 25hrs
Module III: Concept of risk-The concept of insurance-Classification of
insurance-Types of life insurance-Insurance Act-Fire, marine,
motor, engineering, aviation and agricultural insurance-Alternative
classification-Insurance of property, pecuniary interest, liability
and person-Distinction between life and general insurance-History
of general insurance in India. 25hrs

Module IV: The Economics of Insurance: Utility theory-Insurance and utility-
Elements of insurance-Optimal insurance-Multiple decrement
models 20hrs

Books for study and reference:

Institute of Actuaries: ActEd study materials.
McCutcheon, J.J. and Scott, William (1986): An Introduction to the Mathematics
of Finance.
Butcher, M.V. and Nesbitt, Cecil (1971): Mathematics of Compound Interest,
Ulrich’s Books.
Neill, Alistair (1977): Life Contingencies, Heinemann.
Bowers, Newton L. et al. (1997): Actuarial Mathematics, Society of
Actuaries, 2nd Ed.
Model Question Paper

Semester III
COMPLEMENTARY COURSE II

COURSE III– Life contingencies & Principles of insurance

Time: 3 Hrs
PART A
Choose the correct answer from the brackets
A bunch of four questions carries a weightage of 1


1. Given for a double decrement table, that q401(1) = 0.02 and q1(2)=0.04. Calculate
q40(1) to four decimals
(a) .0909 (b)0.0592 (c)0.0426 (d)0.3296
2. Let the loss random variable X have a pdf given by f(x)=0.1e^{-0.1x}, x>0. Calculate
E[X]
(a) 10 (b) 30 (c) 25 (d) 15
3. The loss random variable X has the pdf given by f(x)=1/100, 0<x<100. Calculate
V[X]
(a) 50 (b) 2500/3 (c) 250/3 (d) 45/3
4. If Px1:20(12) = 1.032 and Px:20 = 0.040, what is the value of Px:20(12)?
(a) 0.035 (b) 0.326 (c) 0.957 (d) 0.583
5. Using the illustrative life table, directly calculate P(2)[Ā50:20/ā(2)50:20]
(a) 0.0413 (b) 0.0328 (c) 0.191 (d) 0.0456
6. Using the illustrative life table and an interest rate of 6%, calculate the components of the
decomposition
1000 P50:20 = 1000(P501:20 + P50:201)

(a) 20.4106 (b) 6.3099 (c) 25.6458 (d) 11.5451


7. An ordinary life contract for a unit amount on a fully discrete basis is issued to a
person aged x with an annual premium of 0.048. Assume d=0.06, Ax=0.4 and
2Ax=0.2. Let L be the insurer’s loss function at issue of this policy. Calculate E[L]
(a) 0.1296 (b) -0.1296 (c) -0.08 (d) 0.08
8. Calculate the value of Px :n if nVx = 0.080 , Px = 0.024 and Px1:n = 0.2
1

(a) 0.008 (b) 0.8 (c)0.08 (d)0.0008


9. If 10V35 = 0.150 and 20V35 = 0.354 calculate 10V45.
(a) 0.252 (b) 0.240 (c) 0.232 (d)0.2
10. Assuming δ = 0.05 qx = 0.05 and a uniform distribution of death in each year of
age, calculate (ĪĀ)x1:1
(a) 0.01896 (b) 0.02418 (c) 0.2418 (d) 0.1896
11. If Px1:20(12) = 1.032 and Px:20 = 0.040, what is the value of Px:20(12)?
(a) 0.035 (b) 0.326 (c) 0.957 (d) 0.583
12. Using the illustrative life table, directly calculate P(2)[Ā50:20/ā(2)50:20]
(a) 0.0413 (b) 0.0328 (c) 0.191 (d) 0.0456
PART B
Attempt all questions; each question carries a weightage of 1

13. Determine an expression in actuarial present values and benefit premiums for the
Var[ kL / k(x) = k, k+1,……..] for a fully discrete n-year endowment insurance with a
unit benefit
14. A fully discrete whole life insurance with a unit benefit issued to (x) has its first years
benefit and the remaining benefit premiums are level and determined by the equivalence
principle
Determine formulas for
a. The first year benefit premium
b. The level benefit premium after the 1st year
15.Calculate P[Āx] and Var[L] with the assumptions that the force of mortality is a
constant µ=0.04 and the force of interest δ=0.06
16.Derive relationships among continuous benefit premiums using identities
17. A decision maker’s utility function is given by u(w)=-e^{-5w}. The decision maker has
two random economic prospects available. The outcome of the first has a normal
distribution with mean 5 and variance 2 and the outcome of the second has a normal
distribution with mean 6 and variance 2.5. Which prospect will be preferred?
18. Explain fully continuous benefit reserves in whole life insurance
19. Derive a general expression for 2Āx - (Āx)2/ (δāx)2 , where µx(t)=µ and δ is the force of
interest for t>0
20. Prove and interpret the formula Px:n = nPx + Px:n1(1-Ax+n)

PART C
Attempt any four questions; each question carries a weightage of 2

21. Show that kV{m}(Āx:n) = knV{m}(Āx) + (1- Āx+n) kV{m}x:n1


22. The normal benefit premiums for a fully discrete whole life insurance with a unit
benefit issued to (x) are Πj = Πwj where wj =(1+r)j , the rate r might be selected to
estimate the expected growth rate in the insured. Develop formulas for (a) Π
(b) hV when r=i
23. What do you mean by insurance? Explain the classification of insurance
24. What is utility? Explain its importance in insurance

25. The probability that a property will not be damaged in the next period is 0.75. The pdf
of a possible loss is given by f(x)=0.25(0.01)e^{-0.01x}, x>0. The owner of the property has a
utility function given by u(w)=-e^{-0.05w}. Calculate the expected loss and the maximum
insurance premium the property owner will pay for complete insurance

26. Distinguish between random survivorship group and deterministic survivorship group


PART D
Attempt any two questions; each question carries a weightage of 4

27. An insurer is planning to issue a policy to a life aged 0, whose curtate future
lifetime K is governed by the p.f. k|q0=0.2, k=0,1,2,3,4
The policy will pay 1 unit at the end of the year of death in exchange for the payment
of a premium P at the beginning of each year, provided the life survives. Find the
annual premium P as determined by:

a. Principle I: P will be the annual premium such that the insurer, using a
utility of wealth function u(x)=x, will be indifferent between accepting and
not accepting the risk

b. Principle II: P will be the annual premium such that the insurer, using a
utility of wealth function u(x)=-e^{-0.01x}, will be indifferent between
accepting and not accepting the risk
28. If k|qx = c(0.96)^{k+1}, k=0,1,2,..., where c=0.04/0.96 and i=0.06, calculate Px
and V[L]

29. On the basis of De Moivre’s law with lx=100-x and an interest rate of 6%,
calculate
(a) P(Ā35), (b) tV(Ā35) and (c) V[tL|T(x)>t], for t=0,10,20,...,60
SEMESTER IV

Course IV
Probability models and Risk theory

Module I: Individual risk model for a short time: Model for individual claim
random variables-Sums of independent random variable-
Approximation for the distribution of the sum-Application to
insurance 20hrs

Module II: Collective risk models for a single period: The distribution of
aggregate claims-Selection of basic distributions-Properties of
compound Poisson distributions –Approximations to the
distribution of aggregate claims 25hrs

Module III: Collective risk models over an extended period: Claims process-
The adjustment coefficient-Discrete time model-The first surplus
below the initial level-The maximal aggregate loss 20hrs

Module IV: Application of risk theory: Claim amount distributions-


Approximating the individual model-Stop-loss re-insurance-The
effect of re-insurance on the probability of ruin 25hrs
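
Illustrative note: for the collective risk model of Module II, the distribution of aggregate claims S under a compound Poisson model with positive-integer claim amounts can be computed by a recursive formula. The sketch below assumes the standard recursion fS(x) = (λ/x) ∑ i p(i) fS(x-i) with fS(0) = e^{-λ}; the function name and the dictionary representation of p(x) are hypothetical choices made for illustration.

```python
import math


def compound_poisson_pf(lam, claim_pf, x_max):
    """Return [f_S(0), ..., f_S(x_max)] for a compound Poisson distribution.

    lam      : Poisson parameter (expected number of claims)
    claim_pf : dict mapping positive integer claim amounts to probabilities
    x_max    : largest aggregate claim amount to evaluate
    """
    f = [math.exp(-lam)]  # f_S(0) = P[N = 0]
    for x in range(1, x_max + 1):
        # Recursion: f_S(x) = (lam / x) * sum_i i * p(i) * f_S(x - i)
        total = sum(i * p * f[x - i] for i, p in claim_pf.items() if i <= x)
        f.append(lam / x * total)
    return f


if __name__ == "__main__":
    f = compound_poisson_pf(3, {1: 5 / 6, 2: 1 / 6}, 2)
    print(f[0])    # f_S(0)
    print(sum(f))  # F_S(2)
```

The usage lines take λ = 3, p(1) = 5/6 and p(2) = 1/6, the values used in Part A of the model question paper for this course.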

Books for study and reference:

Institute of Actuaries: ActEd study materials.
McCutcheon, J.J. and Scott, William (1986): An Introduction to the Mathematics
of Finance.
Butcher, M.V. and Nesbitt, Cecil (1971): Mathematics of Compound Interest,
Ulrich’s Books.
Neill, Alistair (1977): Life Contingencies, Heinemann.
Bowers, Newton L. et al. (1997): Actuarial Mathematics, Society of
Actuaries, 2nd Ed.
Model Question Paper

Semester IV
COMPLEMENTARY COURSE II

COURSE IV – Probability models And Risk Theory

Time: 3 Hrs

Part A
Choose the correct answer from the brackets
A bunch of four questions carries a weightage of 1

1. Let X be the number obtained when one true die is tossed. Let Y be the sum of the
numbers obtained when X true dice are then thrown. Calculate E[Y]
(a) 4/7 (b)7/4 (c)2/6 (d)3/6
2. Under certain assumptions, the probability of ruin is
Ψ(u) = 0.3e^{-2u} + 0.2e^{-4u} + 0.1e^{-7u}, u > 0. Calculate θ
(a)2/3 (b)1/3 (c)1 (d)½
3. Suppose that λ = 3, C = 1 and P(x) = (1/3)e^{-3x} + (16/3)e^{-6x}, x > 0. Calculate P1
(a)3/27 (b)6/27 (c)4/27 (d)5/27
4. Suppose that λ = 1, C = 10 and P(x) = (9x/25)e^{-3x/5}, x > 0. Calculate θ
(a) 3 (b) 4 (c) 2 (d)5
5. Suppose that the claim amount distribution is discrete with P(1)=1/4 and
P(2)=3/4. If R = log 2, calculate θ
(a) 10/(7 log 2) - 1 (b) 10/(7 log 2) (c) 10/(5 log 2) - 1 (d) 10/(5 log 2)
6. Suppose that Wi assumes only, the value 0 and +2 and that
P[W=0]=p,P[W=2]=q,where p+q=1,Assume that C=1,P>1/254
7. Consider an insurance portfolio that will produce 0, 1, 2 or 3 claims in a fixed
time period with probabilities 0.1, 0.3, 0.4 and 0.2 respectively. An individual
claim will be of amount 1, 2 or 3 with probabilities 0.5, 0.4 and 0.1
respectively. Calculate E[N]
(a) 1.7 (b) 2.7 (c) 2.8 (d)1.6
8. Suppose that θ=2/5 and p(x) = (3/2)e^{-3x} + (7/2)e^{-7x}, x>0. Calculate γ
(a) 2 (b)3 (c)4 (d)2.5
9. If S has a compound Poisson distribution given by λ=3,p(1)= 5/6,p(2)=1/6,
Calculate fs(x) for x=0
(a) 0.050 (b) 0.25 (c) 0.052 (d) 0.523
10. Consider an insurance portfolio that will produce 0, 1, 2 or 3 claims in a fixed
time period with probabilities 0.1, 0.3, 0.4 and 0.2 respectively. An individual
claim will be of amount 1, 2 or 3 with probabilities 0.5, 0.4 and 0.1
respectively. Calculate V[N]
(a) 0.8 (b) 0.028 (c) 0.08 (d) 0.285
11. Suppose that θ=2/5 and p(x) = (3/2)e^{-3x} + (7/2)e^{-7x}, x>0. Calculate R
(a) 2.5 (b) 3.45 (c) 4.25 (d) 2.5
12. If S has a compound Poisson distribution given by λ=3,p(1)= 5/6,p(2)=1/6,
Calculate Fs(x) for x=2
(a) 0.354 (b) 0.258 (c) 0.520 (d) 0.545

PART B
Attempt all questions; each question carries a weightage of 1

13. Assume that N has a geometric distribution; that is, the probability function of N
is given by
P[N=n] = pq^n, n=0,1,2,...
where 0<q<1 and p=1-q. Determine MS(t) in terms of MX(t)
14. If S has a compound Poisson distribution specified by λ and p(x), show that the
distribution of Z = (S - λp1)/√(λp2) converges to the standard normal distribution as
λ→∞
15.Write an expression for the distribution of the surplus level at the first time, the
surplus falls below the initial level u, given that it does fall below u, if all claims
are of size 2?

16. Derive an expression for Ψ(u) if the Xi’s have an exponential claim amount
distribution?
17. Write an expression for the distribution of L, if the size of the individual claims
has an exponential distribution with parameter β?
18. Find the mean and variance of the Inverse Gaussian distribution, by using its
mgf
19. Derive an expression for R in the special case where the Wi’s common
distribution is N(µ, σ^2)
20. Determine the adjustment coefficient if the claim amount distribution is
exponential with parameter β>0?
PART C
Attempt any four questions; each question carries a weightage of 2

21. Assume that u(λ) is the gamma probability density function with parameters
α and β,
u(λ) = β^α λ^{α-1} e^{-βλ} / Γ(α), λ>0,
where Γ(α) = ∫_0^∞ y^{α-1} e^{-y} dy. Show that the marginal distribution of N is negative
binomial with parameters r = α and p = β/(1+β)

22. Prove that if S1, S2, ..., Sm are mutually independent random variables such
that Si has a compound Poisson distribution with parameter λi and d.f. of
claim amount Pi(x), i=1,2,...,m, then S = S1 + S2 + ... + Sm has a
compound Poisson distribution with λ = ∑_{i=1}^{m} λi and P(x) = ∑_{i=1}^{m} (λi/λ) Pi(x)
23. Assume that u(λ) is the inverse Gaussian pdf with parameters α and β. Exhibit
the moment generating function of N, E[N] and V[N]

24. Find E[Id] if S has a compound Poisson distribution with parameter λi and the individual
claim amount distribution p(x) is exponential with parameter θ

25. Calculate the adjustment coefficient if all the claims are of size 1?

26. Calculate the probability of ruin in the case that the claim amount distribution
is exponential with parameter β >0

PART D
Attempt any two questions; each question carries a weightage of 4

27. Consider a portfolio of 32 policies. For each policy the probability q of a claim
is 1/6 and B, the benefit amount given that there is a claim, has pdf
f(y) = { 2(1-y) , 0<y<1
         0 , elsewhere
Let S be the total claims for the portfolio. Using a normal approximation, estimate
P[S>4]

28. Prove that for a compound distribution where the probability distribution for N,
the number of claims, satisfies the condition
P[N=n]/P[N=n-1] = a + b/n, for n=1,2,...,
and where the distribution of claim amounts is restricted to the
positive integers,
fS(x) = ∑_{i=1}^{x} [a + bi/x] p(i) fS(x-i), x=1,2,...,
with the starting value fS(0) = P[N=0]

29. Given that θ = 2/5 and P(x) = (3/2)e^{-3x} + (7/2)e^{-7x}, x>0, calculate Ψ(u), γ and R
STATISTICS: COMPLEMENTARY – I Syllabus for BSc.

CUCCSSUG 2009 (2009 admission onwards)

SYLLABUS FOR BSc. ( GEOGRAPHY MAIN)

Semester  Course   Course Title                  Instructional  Credit  Exam   Ratio
No        Code                                   hours/week             hours  Ext:Int
1         SG1C01   STATISTICAL METHODS           4              3       3      3:1
2         SG2C02   Regression Analysis, Time     4              3       3      3:1
                   Series and Index Numbers
3         SG3C03   PROBABILITY                   5              3       3      3:1
4         SG4C04   TESTING OF HYPOTHESIS         5              3       3      3:1

Pattern of Question papers.

There shall be 4 parts A, B, C and D in all the question papers. Part A consists of 12
objective type questions. Part B consists of 8 questions to be answered in a word,
phrase or sentence. Part C consists of 6 questions of short essay type of which the
student can attempt 4. Part D consists of 3 questions of long essay type of which the
student can attempt 2. In Part A the weightage per question is ¼. For Part B the weightage
is 1/question. For Part C the weightage is 2/question and for Part D the weightage is
4/question. As far as possible the number of questions should be proportional to the
modules.
Table showing the components and weightage for internal assessment

Components Weight
Assignment 1

Test paper 2

Seminar 1

Attendance 1

There shall be two test papers and the average grade point is to be considered for
internal assessment.

Semester I

Course-I (STATISTICAL METHODS)


Module 1. Meaning, Scope and limitations of Statistics – collection of data,
conducting a statistical enquiry – preparation of questionnaire – primary and
secondary data – classification and tabulation – Formation of frequency
distribution – diagrammatic and graphic presentation of data – population and
sample –advantages of sampling over census – methods of drawing random
samples from a finite population. (Only a brief summary of the above topics is
intended to be given by the teacher. Detailed study is expected from the part of
students). 12hrs

Module 2. Measures of central tendency – Arithmetic mean – weighted
arithmetic mean, median, mode, geometric mean and harmonic mean, partition
values – quartiles – deciles and percentiles. 30hrs

Module 3. Measure of dispersion – relative and absolute measures of


dispersion, measures of dispersion – range – quartile deviation – mean
deviation-standard deviation – Lorenz curve – skewness and kurtosis.
30 hours
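
Illustrative note: the measures listed in Modules 2 and 3 can be computed with Python's standard `statistics` module. The sketch below is a minimal illustration; the helper names are hypothetical, and the population standard deviation (divisor n) is assumed for the coefficient of variation.

```python
import statistics


def coefficient_of_variation(data):
    """Coefficient of variation = (standard deviation / mean) x 100.

    Uses the population standard deviation (divisor n); this is an
    assumption of the sketch, not a requirement of the syllabus.
    """
    return statistics.pstdev(data) / statistics.mean(data) * 100


def corrected_mean(n, reported_mean, wrong_value, right_value):
    """Mean of n values after replacing one wrongly recorded value."""
    return (n * reported_mean - wrong_value + right_value) / n


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(statistics.mean(data))
    print(statistics.median(data))
    print(statistics.pstdev(data))
    print(coefficient_of_variation(data))
    # Mean of 20 values reported as 45, with 46 recorded in place of 64
    print(corrected_mean(20, 45, 46, 64))
```

The coefficient of variation is a relative measure of dispersion, so it can be used to compare the variability of data sets measured in different units.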
Model Question Paper

B.Sc.Geography (Main)
I Semester
COURSE I : (Complementary I)
STATISTICAL METHODS

Time: 3 Hrs Maximum Marks:

Section A
Answer all questions (Contains 12 questions, 4 Questions carry a weightage of 1)

1. The heights of 150 students are collected. The type of classification that is best
suited is
a) Qualitative
b) Quantitative
c) Geographical
d) Chronological
2. A frequency distribution in which the upper limits are not included in their
respective classes is called a
a) Continuous frequency distribution
b) Discrete frequency distribution
c) Raw data
d) Ungrouped frequency distribution
3. The class mark of a class is obtained by
a) upper limit - lower limit
b) upper limit + lower limit
c) (upper limit + lower limit)/2
d) (upper limit − lower limit)/2
4. When there are zeroes in the data we can not use
a) Median
b) Mode
c) Geometric mean
d) Arithmetic mean
5. The most suitable measure for an ordinal data is:
a) Median
b) Arithmetic mean
c) Combined mean
d) Mode
6. Mean of 20 values is 45. If one of these values is to be taken 64 instead of 46, the
correct value of mean is:
a) 49.5
b) 45.9
c) 40.9
d) 42.9
7. The formula to find coefficient of variation is:
a) (σ / X̄) × 100 b) (X̄ / σ) × 100
c) (Median / σ) × 100 d) σ × 100
8. Mean deviation from median is:
a) Equal to mean deviation from mean
b) Greater than mean deviation from mean
c) Less than mean deviation from mean
d) No relation
9. The 50th percentile is equal to:
a) 10th decile
b) 1st decile
c) 2nd decile
d) 5th decile
10. For a symmetric distribution median and mode = 10. The value of mean is:
a) Zero
b) 20
c) 10
d) 5
11. For a positively skewed data:
a) Mean = mode
b) Mean < mode
c) Mean > mode
d) (Mean – Mode)/2
12. A curve which is flatter than a normal curve is called
a) Skewed curve
b) Platykurtic curve
c) Leptokurtic curve
d) Mesokurtic curve
Section B (Contains 6 questions answer any 4) Weight-1
13. When there are open end classes, we use median as a measure of central tendency
(1) Say true or false
(2) Explain your answer
14. In the case of categorical data we can not use histogram
(1) Say true or false
(2) Explain your answer
15. Suppose that the standard deviation of a set of observations is 3. If from each
observation ‘3’ is subtracted, the new standard deviation is zero.
(1) Say true or false
(2) Explain your answer
16. If 25% of the items are less than 10 and 25% are more than 40 the coefficient of
quartile deviation is -------.
17. Karl Pearson’s coefficient of skewness of a distribution is 0.32 and its standard
deviation is 6.5. The mean is 29.6. The mode is -------.
18. Define harmonic mean of n observations.
19. Give an example of a primary data.
20. Give the empirical relationship between mean ,median and mode.
Section C
(4 Questions to be answered out of 6) Weight-2
21. Explain why A.M. is considered as the best measure of central tendency?
22. Calculate quartile deviation for the following data:-
26, 54, 33, 41, 94, 41, 54, 26, 93, 87, 81, 64, 68, 95.
23. The first of two sub-groups has 10 items with mean 15 and S.D. 3. If the whole
group has 250 items with mean 15.6 and S.D. 13.44, find the standard deviation
of the second sub-group.

24. Distinguish between absolute and relative measures of dispersion.


25. Explain the terms skewness and kurtosis.
26. The means of two samples of sizes 50 and 100 respectively are 54.1 and 50.3. The
standard deviations are 8 and 7. Obtain the mean and standard deviations of the
sample consisting of 150 observations by combining the two samples.

Section D (2 questions to be answered out of 3)

27. What is a Lorenz curve? Give its uses.
28. What is meant by classification? What are its bases?
29. Calculate mean deviation about median for the data given below.
Class: 5-10 10-15 15-20 20-25 25-30 30-35
Freq: 8 7 30 26 12 7
Semester II
Course-II Regression Analysis, Time Series and Index Numbers
Module 1. Fitting of curves of the form – linear, y=ab^x, y=ae^{bx} – correlation
analysis – concept of correlation – methods of studying correlation – scatter
diagram – Karl Pearson’s correlation coefficient – concept of rank correlation
and Spearman’s rank correlation coefficient – regression analysis – linear
regression – regression equations (concepts only – Derivations are beyond the
scope of this syllabus). 30hrs

Module 2. Index numbers, meaning and use of index numbers – simple and
weighted index numbers – price index numbers – Laspeyres’, Paasche’s,
Marshall-Edgeworth and Fisher’s index numbers – tests of a good index
number, chain base and fixed base index numbers – construction of cost of
living index number. 20hrs

Module 3. Time series analysis – components of time series – measurement of
secular trend: semi-average, moving average and least squares methods (linear
function only) – concept of seasonal and cyclical variation. 22hrs
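
Illustrative note: the moving average and least squares methods of measuring secular trend named in Module 3 can be sketched as follows. The function names are hypothetical, the data are made up for illustration, and the time variable X is assumed to be coded 0, 1, 2, ... from the first period (the origin).

```python
def moving_average(values, period):
    """Simple moving averages of the given period (no centring applied)."""
    return [
        sum(values[i:i + period]) / period
        for i in range(len(values) - period + 1)
    ]


def linear_trend(values):
    """Least squares straight line Y = a + bX, with X = 0, 1, ..., n-1.

    Returns the pair (a, b) obtained from the two normal equations.
    """
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


if __name__ == "__main__":
    sales = [12, 15, 18, 21, 30]  # illustrative yearly figures
    print(moving_average(sales, 3))
    a, b = linear_trend(sales)
    print(a, b)
    print(a + b * 6)  # estimated trend value six periods after the origin
```

An odd period keeps each average aligned with a middle time point; an even period would need a further centring step, which this sketch does not perform.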


Model Question Paper

B.Sc.Geography (Main)
Semester II
COURSE II : (Complementary I)

Regression Analysis, Time Series and Index Numbers


Part A
Answer all questions (A bunch of 4 carries weight 1)
Time 3hrs
1. If the coefficient of kurtosis is equal to 3, the distribution is called
a) leptokurtic b) mesokurtic c) platykurtic d) none
2. If ρ = 0, the lines of regression are --------.
3. The long-term regular movement in a time series is called
a) trend b) cyclical variation c) seasonal variation d) none
4. For the given five values 15, 24, 18, 33, 42, the three-year moving averages are
a) 19, 22, 23 b) 19, 25, 33 c) 19, 30, 31 d) 19, 30, 33
5. Seasonal variation means the variations occurring during
a) a year b) part of a year c) part of a month d) none
6. Non-centered moving averages are due to

a) Odd period b) Even period

c) Odd number of time points d) Even number of time points

7. Seasonal variations are periodic due to

a) Man made customs, habits, rituals etc

b) Resulting due to Natural reasons

c) Resulting due to change in weather condition

d) Any force that operates regularly year after year

8. Seasonal variation is measured using

a) Seasonal Averages b) Seasonal Indices

c) Seasonal Relatives d) None of these

9. Monthly seasonal variation measures are adjusted to

a) 12 b) 120 c) 1200 d) None of these

10. A model of a time series explains the ………………. relation between the value of

the variable and the time series components

a) Additive b) Multiplicative c) Mathematical d) None of these


11. If an index number I₀₁ = 112, then it means

a) 12 % growth from base to current year

b) 112 % growth from base to current year

c) 88 % depreciation from base to current year

d) 12 % depreciation from base to current year

12. Which of the following is called ideal Index Number

a) Laspeyres’ b) Paasche’s c) Fisher’s d) Kelly’s

Part B
Answer all questions Weight 1
13. Karl Pearson’s formula for the measure of skewness is -------------

14. Given two lines of regression as 3x − 4y + 8 = 0 and 4x − 3y = 1, the means of X
and Y are ------------

15. Write down the normal equations for fitting a straight line.

16. Given the trend equation Y = 108 + 2.8X with 2000 as origin and yearly data
from 2000 to 2002, the estimated trend value for 2005 is ---------

17. The formula for calculating the rank correlation coefficient is --------

18. Why is base shifting necessary for index numbers?

19. Why are index numbers called economic barometers?

20. Give three major limitations of index numbers.

Part C
Answer any 4 questions, weight 2

21. How is trend measured using moving averages?

22. Explain periodic variations in Time- Series with suitable examples.

23. Explain the Link Relative Method of measuring seasonal variation.

24. Explain the uses of index numbers.

25. With the help of an index number formula, explain the time reversal and factor reversal tests.

26. Explain the concept behind developing a cost of living index number.


Part D (Answer any 2 questions) Weight 4

27. The following data relate to the yield of a crop in three different seasons.

Yield (kg per 10-cent plot)

Year   Season 1   Season 2   Season 3
1990      12         19         17
1991      14         25         23
1992      13         27         20
1993      15         28         22
1994      17         31         24

i) If this trend is followed, what will be the expected yield in 1995?

ii) Does season influence the yield of the crop?

28. Briefly explain the problems in the construction of an Index Number.

29. Calculate the cost of Living Index Number for the data given below.

Rice

Year Season 1 Season 2 Season 3

Food 30 47 4

Fuel 8 12 1

Clothing 14 18 3

House Rent 22 15 2

Miscellaneous 25 30 1
Semester III

Course III-PROBABILITY

Module 1. Probability theory – concept of random experiment, sample point,
sample space and events – mathematical and statistical definitions of
probability, limitations, axiomatic approach to probability – addition and
multiplication theorems, concept of conditional probability, probability in
discrete sample space – numerical problems. 35 hours

Module 2. Random variable, definition of discrete and continuous type –
probability mass function, distribution function – mathematical expectation,
definition, numerical problems in the discrete case only. 25 hours

Module 3. One point, two point, Bernoulli, binomial, Poisson and normal
distributions – probability density function, properties – simple numerical
problems. 30 hours
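
The "simple numerical problems" of Module 3 usually reduce to evaluating a probability mass function. The sketch below gives the binomial and Poisson cases; the function names and the parameter values are hypothetical illustrations.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical example 1: exactly 2 heads in 4 tosses of a fair coin
p_two_heads = binomial_pmf(2, 4, 0.5)

# Hypothetical example 2: P(X > 1) for a Poisson variable with mean 2,
# found through the complement 1 - P(X = 0) - P(X = 1)
p_more_than_one = 1 - sum(poisson_pmf(k, 2) for k in range(2))

print(p_two_heads, round(p_more_than_one, 4))
```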
Model Question Paper

B.Sc.Geography (Main)
Semester III
COURSE III : (Complementary I)
PROBABILITY
Part A

(Answer all the questions. Choose the correct answer from the
alternatives given below each question). Bunch of 4 carries weight 1

1. The distribution function F_X(x) of a random variable X gives
a. P(X = x) b. P(X > x) c. P(X ≤ x) d. none of the above
2. If A and B are any two events, then
a. P(A ∪ B) + P(A ∩ B) > 1
b. P(A ∪ B) + P(A^c ∩ B^c) = 1
c. P(A) + P(A ∩ B) = P(A ∪ B)
d. A ∪ B and A^c ∪ B^c are mutually exclusive events.
3. A continuous random variable X has the distribution function F_X(x). The
range of variation of F_X(x) is
a. −∞ to +∞
b. 0 to ∞
c. the same as that of the random variable X
d. 0 to 1
4. Two events A and B are said to be independent if
a. P(A/B)= P (A) & P(B/A) = P (B)
b. P (A ∪ B) = P (A) + P (B)
c. P (A ∩ B) = P (A) P (B/A)
d. P (A ∩ B) = 0
5. A random variable x is said to be continuous if it takes
a. an infinite number of values
b. Finite number of values or a countably infinite numbers of values
c. A continuum of value
d. A finite number of values
6. There are 4 houses available and 4 applicants. The probability that all
the applicants apply for the same house is
a. 3/32 b) 1/16, c) 1/64, d) 1/44
7. A coin is tossed 3 times. The chance that head and tail show alternatively is
a. 1/8 b) 1/4, c) 3/8, d) 1/2
8. Let the distribution function of a r.v. X be
F(x) = 1 − e^(−2x), x ≥ 0,
= 0 otherwise.
Then the density function is
a) 1 − 2e^(−x), x > 0; 0 otherwise b) 2e^(−2x), x > 0; 0 otherwise
c) 1 − 2e^(−2x), x > 0; 0 otherwise d) e^(−2x), x > 0; 0 otherwise
9. The result in probability theory that takes account of the prior probabilities
of an event is
a) Addition theorem of probability
b) Multiplication theorem for dependent events
c) Bayes’ theorem
d) None of these
10. If x is a continuous random variable having the p.d.f. f(x) then
∫xf(x )dx is
a) >1 b) 0 c) 1 d) non negative
11. A and B are two events such that
P(A ∩ B) = 1/3, P(A^c ∩ B^c) = 1/6 and 2P(A) = P(B) = k. Then k is
a) 5/9 b) 7/9 c) 1/3 d) 2/3
12. The mean of the standard normal distribution is equal to
a) infinity b) zero c) unity d) none of these

Part B
(Answer all the questions) Weight-1

13. If A and B are any two events in a sample space S,
then P(A − B) = ……………….
14. If F is the distribution function of a r.v. X and if a < b, then
P(a < X ≤ b) = ……………………….
15. A and B are two independent events in a sample space S such that
P(A^c) = 0.7, P(B^c) = k and P(A ∪ B) = 0.8. Then the value of k is equal
to ……………
16. The -------- distribution has a mean greater than its variance.
17. If (X, Y) is a pair of continuous r.v.s with joint p.d.f. f(x, y), then
∫ y f(x, y) dy is the ------------------------
18. Write down the set-theoretic equivalent of the statement “neither A nor
B occurs”, where A and B are two events in a sample space S.
19. What is the probability of getting a spade or an ace from a pack of cards?
20. Define a probability space.
Part C
(Answer any 4 questions) Weight-2

21. Show that for any two events A and B in a sample space S
P ( A ∩ B) ≥ P (A) + P (B) -1
22. In a swimming race the odds that A will win are 2 to 3 and the odds that
B will win are 1 to 4. Find the probability that A or B wins the race.
23. State and prove the multiplication law of probability.
24. What are the properties of a distribution function.
25. For a Poisson distribution with parameter 3, find Pr(X > 2).
26. Examine whether f(x) as defined below is a p.d.f.
f(x) = 0 for x < 2
= (1/18)(3 + 2x) for 2 ≤ x < 4
= 0 for x > 4
Part D
(Answer any 2 questions) Weight-4

27. For any two events A and B in a sample space S, show that
P(A ∩ B) ≤ P(A) ≤ P(A ∪ B) ≤ P(A) + P(B)
28. Let X be a continuous random variable with p.d.f. given by
f(x) = ax for 0 ≤ x ≤ 1
= a for 1 ≤ x ≤ 2
= −ax + 3a for 2 ≤ x ≤ 3
= 0 otherwise
Determine the constant a and the distribution function F(x).
29. Describe the various definitions of probability, pointing out the
limitations of each.
Semester IV

Complementary I

Course-IV-TESTING OF HYPOTHESIS

Module 1. Testing of statistical hypotheses, large and small sample tests, basic
ideas of sampling distribution, test of mean, proportion, difference of means,
difference of proportions, tests of variance and correlation coefficient,
chi-square tests. 35 hours

Module 2. Non-parametric tests – advantages, sign test, run test, signed rank
test, rank-sum test, Kolmogorov–Smirnov goodness of fit test. 30 hours

Module 3. Analysis of variance: one-way and two-way classifications.
Null hypotheses, total, between and within sums of squares. ANOVA table.
Solution of problems using ANOVA tables. 25 hours
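
For the one-way classification in Module 3, the total sum of squares splits into a between-samples part and a within-samples (error) part, and the F ratio compares their mean squares. A minimal sketch follows; the function name `one_way_anova` and the three small groups are hypothetical.

```python
def one_way_anova(groups):
    """Return (SS between, SS within, SS total, F ratio) for a one-way layout."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1           # number of groups minus 1
    df_within = n_total - len(groups)      # observations minus number of groups
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, ss_between + ss_within, f_ratio

# Hypothetical observations from three groups
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
ss_b, ss_w, ss_t, f = one_way_anova(groups)
print(ss_b, ss_w, ss_t, f)
```

The computed F is referred to the F distribution with (k − 1, N − k) degrees of freedom, where k is the number of groups and N the total number of observations.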

Books for reference.


1. S.C. Gupta and V.K. Kapoor: Fundamentals of Mathematical
Statistics, Sultan Chand and Sons.
2. Mood A.M., Graybill F.A. and Boes D.C.: Introduction to Theory of
3. Gibbons J.D.: Nonparametric Methods for Quantitative Analysis,
McGraw Hill.
4. S.C. Gupta and V.K. Kapoor: Fundamentals of Applied Statistics, Sultan
Chand and Sons.
5. Box, G.E.P. and G.M. Jenkins: Time Series Analysis, Holden-Day.
Model Question Paper

B.Sc.Geography (Main)
Semester IV
COURSE IV : (Complementary I)

TESTING OF HYPOTHESIS
Part A
Time 3 hours Answer all questions

A bunch of 4 questions carries weight 1

1. Type II error is the error of


a) Rejecting H0 when H0 is true
b) Accepting H0 when H0 is true
c) Accepting H0 when H1 is true
d) Accepting H1 when H0 is true

2. In a paired t-test:
a) The sample sizes should be equal
b) The size of the first sample should be less than the size of the second
c) The size of the second sample should be less than the size of the first
d) Both sample sizes should be ≥ 50

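3. The test statistic for testing the significance of the correlation coefficient is

a) $\dfrac{r}{\sqrt{(1-r^2)/(n-3)}}$

b) $\dfrac{r\sqrt{1-r^2}}{\sqrt{n-2}}$

c) $\dfrac{r\sqrt{1-r^2}}{\sqrt{n-3}}$

d) $\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$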

4. In the test of equality of means of two normal populations with small samples of
sizes n1 and n2 taken from them, if the populations have equal but unknown
variance, the test statistic follows:
a) $t_{n_1+n_2-1}$ b) $t_{n_1+n_2}$ c) $t_{n_1+n_2/2}$ d) $t_{n_1+n_2-2}$
5. In a chi-square contingency table with 3 rows and 5 columns, the d.f of chi-square
statistic is
a) 15
b) 24
c) 8
d) 7

6. The chi-square test statistic for a goodness of fit test is given by:
a) $\dfrac{O_i - E_i}{E_i}$
b) $\sum \dfrac{O_i - E_i}{E_i^2}$
c) $\sum \dfrac{(O_i - E_i)^2}{E_i^2}$
d) $\sum \dfrac{(O_i - E_i)^2}{E_i}$

7. The basic assumption for a non-parametric test is:


a) The variable is continuous
b) The variable is discrete
c) The variable is normal
d) The variable is standard normal
8. The non-parametric equivalent test for a paired t-test is:
a) Signed Rank test
b) Rank sum test
c) Run test
d) Sign test

9. The test used to check the randomness of the collected set of symbols is:
a) Sign test
b) Rank sum test
c) Signed rank test
d) Run test

10. When there are 3 groups, each following normal distribution, and the null
hypothesis is concerned with the equality of means the test used is:
a) Chi square test
b) t-test for equality of means
c) Analysis of variance
d) none of the above
11. The test statistic in a two way ANOVA table follows:
a) Chi-square distribution
b) t-distribution
c) Normal distribution
d) F-distribution

12. In a one way ANOVA if the d.f of the total S.S is 13 and the d.f of the between
sample sum of squares is 6, the d.f of the error sum of squares is:

a) 7 b) 6 c) 19 d) 3

Part B weight 1( Answer all questions)

13. In a chi-square test of independence of 2 attributes with 2 observations each, the d.f.
of the test statistic is 1.

a) Say true or false.

b) Explain your answer.

14. In the case of the sign test, the test statistic follows a binomial distribution.

a) Say true or false.

b) Explain your answer.

15. In a one-way ANOVA, the total sum of squares of observations is 6212 and the
error sum of squares is 3272. The sum of squares between samples is 2900.

a) Say true or false.

b) Explain your answer.

16. In a χ² test of goodness of fit, if the calculated value of χ² is zero, then it is a bad
fit.

a) Say true or false.

b) Explain your answer.

17. Define power of a test.

18. A sample of size 12 is taken from a normal distribution. The sample variance is 1.8.
What is the value of the test statistic for the test with H0: σ² = 3?

19. The chi-square test is a ----------- test.

20. Define level of significance.


Part C weight-2 ( answer any 4 questions)

21. What is the null hypothesis for a chi-square test of homogeneity of proportions
and give the layout of observations.

22. Mention the advantages of non-parametric tests over parametric test.

23. What are the assumptions behind analysis of variance?

24. In a lot containing 1235 articles, 35 were found to be defective. Does the
hypothesis: The proportion of defective articles is less than 0.02 hold?

25. The value of the sample mean from a population which was assumed to have
mean 5 is 4. The sample size is 100 and the variance of the sample is 1. Is there a
significant difference between the sample mean and the population mean?
26. Explain paired t-test.
Part D Weight-4 ( Answer any 2 questions)

27. A factory operates in three shifts. The factory manager feels that the quality of
parts is related to shifts. For this purpose he has collected the following data
from the past records of production.

             No. of Parts
Shift        Good    Bad
Day           900    130
Evening       700    170
Night         400    200

Test whether the quality of parts produced is independent of shifts.

28. Fifteen patient records from each of two hospitals were received and
assigned a score designed to measure level of care. The scores were as
follows:

Hospital A: 99 85 73 98 83 88 99 80 74 91 80 94 94 98 80

Hospital B: 78 74 69 79 57 78 79 68 59 91 89 55 60 55 79
Use a proper non-parametric test to see whether the two populations are
identical with respect to the level of care.
29. The laboratories A and B carry out independent estimates of fat content in ice-
creams made by a firm. A sample is taken from each batch, halved and the
separate halves sent to the two laboratories. The fat contents obtained by the
laboratories are recorded below. (Fat contents in milligrams are given below)

Batch No. 1 2 3 4 5 6 7 8 9 10
Lab A 7 8 7 3 8 6 9 4 7 8
Lab B 9 8 8 4 7 7 9 6 6 6
Is there a significant difference between the mean fat content obtained by the two
laboratories A and B?
STATISTICS: COMPLEMENTARY – I
SYLLABUS FOR BSc. PSYCHOLOGY (MAIN)
CUCCSSUG 2009 (2009 admission onwards)

Semester  Course   Course Title                                     Instructional  Credit  Exam   Ratio
No.       Code                                                      hours/week             hours  Ext:Int

1         PS1C01   STATISTICAL METHODS                              4              3       3      3:1
2         PS2C02   REGRESSION ANALYSIS AND PROBABILITY              4              3       3      3:1
3         PS3C03   PROBABILITY DISTRIBUTIONS AND PARAMETRIC TESTS   5              3       3      3:1
4         PS4C04   NON PARAMETRIC TESTS AND ANALYSIS OF VARIANCE    5              3       3      3:1

Pattern of Question papers.

There shall be 4 parts, A, B, C and D, in all the question papers. Part A consists of 12 objective
type questions. Part B consists of 8 questions to be answered in a word, phrase or sentence.
Part C consists of 6 questions of short essay type, of which the student can attempt 4. Part D
consists of 3 questions of long essay type, of which the student can attempt 2. In Part A the
weightage per question is ¼; for Part B the weightage is 1 per question. For Part C the weightage is
2 per question and for Part D the weightage is 4 per question. As far as possible the number of
questions should be proportional to the modules.
Table showing the components and weightage for internal assessment

Components Weight

Assignment 1

Test paper 2

Seminar 1

Attendance 1

There shall be two test papers and the average grade point is to be considered for
internal assessment
Semester-I STATISTICAL METHODS
Module 1. Pre-requisites.
A basic idea about data, its collection and organization, planning of surveys and
diagrammatic representation of data is expected of the students.
Classification of data, frequency distribution, formation of a frequency distribution, Graphic
representation viz. Histogram, Frequency Curve, Polygon, Ogives and Pie Diagram. 20hr

Module 2. Measures of Central Tendency.


Mean, Median, Mode, Geometric Mean, Harmonic Mean, Combined Mean, Advantages and
disadvantages of each average. 20hrs
Module 3. Measures of Dispersion.
Range, Quartile Deviation, Mean Deviation, Standard Deviation, Combined Standard
Deviation, Percentiles, Deciles, Relative Measures of Dispersion, Coefficient of Variation.
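
The combined mean and combined standard deviation listed above can be obtained from the sizes, means and standard deviations of the two groups alone. The sketch below assumes standard deviations computed with divisor n (not n − 1); the function name `combine` and the two small groups are hypothetical.

```python
import math
import statistics

def combine(n1, m1, s1, n2, m2, s2):
    """Combined mean and S.D. of two groups (S.D.s use divisor n)."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    d1, d2 = m1 - mean, m2 - mean          # deviations of group means
    variance = (n1 * (s1 ** 2 + d1 ** 2) + n2 * (s2 ** 2 + d2 ** 2)) / n
    return mean, math.sqrt(variance)

# Hypothetical groups: [1, 3] has mean 2 and S.D. 1; [5, 7] has mean 6 and S.D. 1
mean, sd = combine(2, 2.0, 1.0, 2, 6.0, 1.0)

# Cross-check against the pooled raw data
pooled = [1, 3, 5, 7]
print(mean, sd, statistics.mean(pooled), statistics.pstdev(pooled))
```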
Module 4. Skewness and Kurtosis.
Pearson’s Coefficient of Skewness, Bowley’s Measure, Percentile Measure of
Kurtosis. 16hrs
Books for Study.
1. Gupta, S P (1988). Statistical Methods, Sultan Chand and Sons, New Delhi.
2. Gupta, S C and Kapoor, V K (2002). Fundamentals of Applied Statistics, Sultan
Chand and Sons, New Delhi.
3. Garrett, H E and Woodworth, R S (1996). Statistics in Psychology and Education,
Vakils, Feffer and Simons Ltd., Bombay.
Model Question Paper

B.Sc. Psychology
I Semester - Statistical Methods
COURSE I : Psychological Statistics (Complementary I)

Time: 3 Hrs

PART A
(Contains 12 questions, 4 Questions carry a weightage of 1)

1. The heights of 150 students are collected. The type of classification that is best suited is

a) Qualitative

b) Quantitative

c) Geographical

d) Chronological

2. A frequency distribution in which the upper limits are not included in their respective
classes is called a

a) Continuous frequency distribution

b) Discrete frequency distribution

c) Raw data

d) Ungrouped frequency distribution

3. The class mark of a class is obtained by

a) upper limit − lower limit

b) upper limit + lower limit

c) (upper limit + lower limit) / 2

d) (upper limit − lower limit) / 2
4. When there are zeroes in the data we can not use

a) Median

b) Mode

c) Geometric mean

d) Arithmetic mean

5. The most suitable measure for ordinal data is:

a) Median

b) Arithmetic mean

c) Combined mean

d) Mode

6. The mean of 20 values is 45. If one of these values should have been taken as 64 instead of 46,
the corrected value of the mean is:

a) 49.5

b) 45.9

c) 40.9

d) 42.9

7. The formula to find the coefficient of variation is:

a) $\dfrac{\sigma}{\bar{X}} \times 100$ b) $\dfrac{\bar{X}}{\sigma} \times 100$

c) $\dfrac{\text{Median}}{\sigma} \times 100$ d) $\sigma \times 100$

8. Mean deviation from median is:

a) Equal to mean deviation from mean

b) Greater than mean deviation from mean

c) Less than mean deviation from mean

d) No relation
9. The 50th percentile is equal to:

a) 10th decile

b) 1st decile

c) 2nd decile

d) 5th decile

10. For a symmetric distribution, median = mode = 10. The value of the mean is:

a) Zero

b) 20

c) 10

d) 5

11. For a positively skewed data:

a) Mean = mode

b) Mean < mode

c) Mean > mode

d) (Mean – Mode)/2

12. A curve which is flatter than a normal curve is called

a) Skewed curve

b) Platykurtic curve

c) Leptokurtic curve

d) Mesokurtic curve

PART B

(Contains 8 questions. Answer all questions, weight 1)

13. When there are open end classes, we use median as a measure of central tendency

(1) Say true or false

(2) Explain your answer


14. In the case of categorical data we can not use histogram

(1) Say true or false

(2) Explain your answer

15. Suppose that the standard deviation of a set of observation is 3. If from each observation
‘3’ is subtracted, the new standard deviation is zero.

(1) Say true or false

(2) Explain your answer

16. If 25% of the items are less than 10 and 25% are more than 40 the coefficient of quartile
deviation is -------.

17. Karl Pearson’s coefficient of skewness of a distribution is 0.32 and its standard deviation
is 6.5. The mean is 29.6. The mode is -------.

18. Define harmonic mean of n observations.

19. Give the empirical relationship between mean, median and mode.

20. Give an example where the mode is the appropriate measure of central tendency.

PART C

(4 Questions to be answered out of 6 weight 2)

21. Explain why A.M. is considered as the best measure of central tendency?

22. Calculate quartile deviation for the following data:-

26, 54, 33, 41, 94, 41, 54, 26, 93, 87, 81, 64, 68, 95.

23. The first of two sub-groups has 10 items with mean 15 and S.D. 3. If the whole group has
250 items with mean 15.6 and S.D. 13.44, find the standard deviation of the second
subgroup.

24. Distinguish between absolute and relative measures of dispersion.

25. Explain the terms skewness and kurtosis.

26. The means of two samples of sizes 50 and 100 respectively are 54.1 and 50.3. The
standard deviations are 8 and 7. Obtain the mean and standard deviations of the sample
consisting of 150 observations by combining the two samples.
PART D

(2 questions to be answered out of 3 weight 4)

27. Draw a histogram to the following data.

Classes:- 0-4 5-9 10-14 15-19 20-24 25-29 30-34 35-39

Freq: 13 15 19 20 23 25 28 13

28. What is meant by classification? What are its bases?

29. Calculate mean deviation about median for the data given below.

Class: 5-10 10-15 15-20 20-25 25-30 30-35

Freq: 8 7 30 26 12 7
COURSE II -SEMESTER-II
REGRESSION ANALYSIS AND PROBABILITY

Module 1. Correlation and Regression.


Meaning, Karl Pearson’s Coefficient of Correlation, Scatter Diagram, Calculation of
Correlation From a 2-way table, Interpretation of Correlation Coefficient, Rank Correlation,
Regression, Regression Equation, Identifying the Regression Lines. 20hrs
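
Karl Pearson's coefficient and the rank correlation coefficient from this module can be computed directly from their definitions. The sketch below is a minimal illustration; the function names are hypothetical, the Spearman version assumes no tied ranks, and the second data set is made up. The first data set is the pair of observations (3, 12), (5, 6) used in the model question paper.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank 1 for the smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

r = pearson_r([3, 5], [12, 6])                         # two points on a falling line
rho = spearman_rho([1, 2, 3, 4], [10, 30, 20, 40])     # hypothetical data
print(r, rho)
```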

Module 2. Multiple Correlation and Regression.


Partial and Multiple Correlation Coefficients, Multiple Regression Equation,
Interpretation of Multiple Regression Coefficients (three variable cases only). 16hrs

Module 3. Basic Probability.


Sets, Union, Intersection, Complement of Sets, Sample Space, Events, Classical,
Frequency and Axiomatic Approaches to Probability, Addition and Multiplication Theorems,
Independence of Events (Up-to three events). 20hrs

Module 4. Random Variables and Their Probability Distributions.


Discrete and Continuous Random Variables, Probability Mass Function, Distribution
Function of a Discrete Random Variable. 16hrs
Books for Study.
1. Gupta, S P (1988). Statistical Methods, Sultan Chand and Sons, New Delhi.
2. Gupta, S C and Kapoor, V K (2002). Fundamentals of Applied Statistics, Sultan
Chand and Sons, New Delhi.
3. Garrett, H E and Woodworth, R S (1996). Statistics in Psychology and Education,
Vakils, Feffer and Simons Ltd., Bombay.
Model Question Paper

B. Sc. Psychology
II Semester REGRESSION ANALYSIS AND PROBABILITY

COURSE II: Psychological Statistics (Complementary)

Time: 3 Hrs Maximum Marks

Part A

(Answer all questions; 4 questions carry a weightage of 1)

1. The value of the square of Karl Pearson’s coefficient of correlation lies between:

a) 0 and 1 b) -1 and 1

c) 0 and infinity d) No limits

2. Karl Pearson’s coefficient of correlation for the following set of observations (3, 12), (5, 6) is:

a) Zero b) -1 c) +1 d) infinity

3. If the regression coefficient of Y on X is negative, the regression coefficient of X on Y will be:

a) Negative b) Positive

c) Zero d) No relation

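4. The formula to find $r_{13.2}$ is:

a) $\dfrac{r_{23} - r_{13} r_{12}}{\sqrt{1 - r_{23}^2}}$ b) $\dfrac{r_{23} - r_{13} r_{12}}{\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13}^2}}$

c) $\dfrac{r_{13} - r_{23} r_{12}}{\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12}^2}}$ d) $\dfrac{r_{13} - r_{23} r_{12}}{\sqrt{1 - r_{13}^2}}$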
5. In a multiple regression equation of X3 on X1 and X2:

a) b13.2 = b31.2 b) b13.2 = b2.13

c) b13.2 = b3.12 d) b13.2 ≠ b13.2 , in general.

6. The partial correlation coefficient r23.1 measures:

a) The joint effect of X2, X3 on X1.

b) The joint effect of X2 and X3, keeping the effect of X1 constant.

c) It is the same as Karl Pearson’s correlation coefficient between X2 and X3.

d) The correlation between X2 and X3, keeping the effect of X1 constant.

7. Mutually exclusive events other than null event and sure event are:

a) not independent

b) independent

c) no relation

d) independent under some conditions

8. The probability that India wins a cricket match against England is 1/3. If India and England play 3
matches, what is the probability that India will lose all the three matches?

a) 1/27 b) 1/3 c) 1/9 d) 8/27

9. What is the probability that a non leap year selected at random will have 53 Sundays?

a) 2/7 b) 0 c) 3/7 d) 1/7

10. The probability mass function of a discrete r.v is: p(x) = cx/15, x = 1, 2, 3, 4, 5. The value of c is:

a) zero b) 15 c) 5 d) 1

11. The distribution function of a discrete r.v is:

a) constant

b) non-decreasing

c) non-increasing

d) never exists
12. For a discrete r.v. X, P(X > 0) = P(X < 0) and P(X = 0) = p. The variable takes the values −2,
−1, 0, 1, 2. What is the probability that X > 0?

a) Zero b) One c) (1 − p)/2 d) 1 − p

Part B

Answer all questions wt 1

13. Classical definition of probability can be used in the case of a sample space with infinite
outcomes.

a) Say true or false

b) Explain your answer

14. In the case of disjoint events A and B, P(A ∪ B) < P(A) + P(B).

a) Say true or false

b) Explain your answer

15. Getting a queen and getting a Jack while drawing cards from a deck of cards are independent
events.

a) Say true or false

b) Explain your answer

16. The correlation coefficient between X and Y is 0.85. Find the coefficient of determination.

17. Zero correlation implies independence

a) Say true or false

b) Explain your answer

18. If P ( A ∪ B ) = 0.8, P ( A) = P ( B ) = 0.5 , find P ( A ∩ B ) .

19. If the data is qualitative in nature the correlation is measured using-----------

20. Give the range of multiple correlation coefficients.


Part C

Answer any 4 questions weight 2

21. Give the axiomatic definition of probability. Mention one advantage of the definition.

22. If A and B are two independent events such that P ( A c ) = 0.7, P ( B c ) = k , P ( A ∪ B ) = 0.8 , then
find the value of k.

23. A and B stand in a ring with 12 other persons. Find the probability that A & B are together.

24. Explain briefly the concept of partial correlation with the help of an example.

25. Explain why in the case of two variables there are always two regression lines? When do they
coincide?

26. Define distribution function of a r.v X. what are its properties?

Part D

Answer any 2 questions wt 4

27. From a bag containing 5 red and 6 blue balls, 4 balls are taken at random. Find the probability
mass function of:

a) X, the number of blue balls.

b) Compute the probability that X is even.

c) Find the distribution function of X.

28. P(A) = 1/3, P(B) = 1/4, P(A∩B) = 1/11. Find the following probabilities.

a) Exactly one of the events A, B happens.

b) At least one of the events A, B happens.

c) None happens.

Q3. Explain the concept of rank correlation. (weightage 4)

29. Give an example to show that the correlation coefficient is a measure of linear correlation only.
Semester-III
Course III - PROBABILITY DISTRIBUTIONS AND PARAMETRIC TESTS

Module 1. Distribution Theory.


Binomial, Poisson and Normal Distributions, Mean and Variance (without
derivations), Numerical Problems, Fitting, Importance of Normal Distribution, Central Limit
Theorem. 25hrs

Module 2. Sampling Theory.


Methods of Sampling, Random and Non-random Sampling, Simple Random
Sampling, Stratified, Systematic and Cluster Sampling. 20hrs
Module 3. Testing of Hypotheses.
Fundamentals of Testing, Type-I & Type-II Errors, Critical Region, Level of
Significance, Power, p-value, Tests of Significance.
Large Sample Tests – Test of a Single Mean, Equality of Two Means, Test of a Single
Proportion, Equality of Two Proportions. 25hrs
Module 4. Small Sample Tests.
Test of a Single Mean, Paired and Unpaired t-Test, Chi-Square Test of Variance, F-
Test for the Equality of Variance, Tests of Correlation. 20hrs
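
For the paired t-test in this module, the statistic is the mean of the paired differences divided by its standard error, with n − 1 degrees of freedom. The following is a minimal sketch; the function name `paired_t` and the four pairs of scores are hypothetical.

```python
import math

def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired observations."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences, divisor n - 1
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1

# Hypothetical scores for the same four subjects on two occasions
first = [5, 7, 9, 6]
second = [4, 5, 8, 3]
t, df = paired_t(first, second)
print(round(t, 4), df)
```

The computed value is then compared with the tabled t value for the stated degrees of freedom at the chosen level of significance.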
Books for Study.
1. Gupta, S P (1988). Statistical Methods, Sultan Chand and Sons, New Delhi.
2. Gupta, S C and Kapoor, V K (2002). Fundamentals of Applied Statistics, Sultan
Chand and Sons, New Delhi.
3. Garrett, H E and Woodworth, R S (1996). Statistics in Psychology and Education,
Vakils, Feffer and Simons Ltd., Bombay.
Model Question Paper

B. Sc. Psychology
III Semester PROBABILITY DISTRIBUTIONS AND PARAMETRIC TESTS

COURSE III: Psychological Statistics (Complementary I)

Time: 3 Hrs Maximum Marks:

Part A (Contains 12 questions, 4 questions carry a weightage of 1)

1. The mean of a binomial distribution B ( n, p ) is 4 and variance = 3. The value of p is


a) Zero
b) 1/4
c) 3/4
d) One
2. The parameter of a Poisson distribution is 6. Its variance is:
a) Less than 6
b) Greater than 6
c) Equal to 6
d) No relation

3. For a standard normal distribution, the value of skewness is:


a) One
b) Zero
c) Three
d) Four

4. If a sample of size n is taken without replacement from a population with N units, the
probability of getting a sample is:
a) 1/n b) 1/N c) 1/NCn d) 1/2N
5. The test statistic that is used to check equality of variance of two normal populations when
two small samples are taken from them is:
a) standard normal
b) F
c) t
d) χ 2
6. A statistic is

a) Constant

b) Same as parameter

c) Varies from sample to sample

d) Computed from population values

7. Type II error is the error of

a) Rejecting H0 when H0 is true

b) Accepting H0 when H0 is true

c) Accepting H0 when H1 is true

d) Accepting H1 when H0 is true

8. In a paired t-test:

a) The sample sizes should be equal

b) The size of the first sample should be less than the size of the second

c) The size of the second sample should be less than the size of the first

d) Both sample sizes should be ≥ 50

9. The test statistic for testing the significance of the correlation coefficient is

a) $\dfrac{r}{\sqrt{(1-r^2)/(n-3)}}$

b) $\dfrac{r\sqrt{1-r^2}}{\sqrt{n-2}}$

c) $\dfrac{r\sqrt{1-r^2}}{\sqrt{n-3}}$

d) $\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$

10. In the test of equality of means of two normal populations with small samples of sizes n1
and n2 taken from them, if the populations have equal but unknown variance, the test
statistic follows:

a) $t_{n_1+n_2-1}$ b) $t_{n_1+n_2}$ c) $t_{n_1+n_2/2}$ d) $t_{n_1+n_2-2}$


11. The stratified sampling procedure is highly effective in:

a) heterogeneous population

b) homogeneous population

c) infinite population

d) always

12. The percentage of observations covered by the limits µ − 3σ and µ + 3σ in a normal
population is

a) 65% b) 99.7% c) 95.4% d) 90%

Part B (answer all the questions ) wt 1

13. The mean of a binomial distribution is less than variance.

a) Say true or false.

b) Explain your answer

14. If X ~ N(3, 4), P(X < 3) = P(X ≥ 3).

a) Say true or false

b) Explain your answer

15. In the case of an infinite population, sampling is better than census.

a) Say true or false

b) Explain your answer

16. Sampling error occurs in census.

a) Say true or false

b) Explain your answer

17. Define power of a test.

18. A sample of size 12 is taken from a normal distribution. The sample variance is 1.8. What
is the value of the test statistic for the test with H0: σ² = 3?
19. Define type II error.
20. Define the power of the test.
Part C (4 Questions to be answered out of 6) wt 2
21. What do you mean by standard error?
22. Explain paired t-test.
23. What are the advantages of systematic sampling compared to SRS.
24. A correlation coefficient 0.65 was observed in a sample of 50 bi-variate observations. Is
the value significant?
25. In a lot containing 1235 articles, 35 were found to be defective. Does the hypothesis: The
proportion of defective articles is less than 0.02 hold?
26. The value of the sample mean from a population which was assumed to have mean = 5 is
4. The sample size is 100 and the variance of the sample is 1. Is there significant
difference between sample mean and population mean?
Part D (2 Questions to be answered out of 3) wt4
27. Using the Poisson approximation to the binomial distribution, solve the following. If the
probability that an individual suffers a bad reaction from a particular infection is 0.001,
determine the probability that out of 2,000 individuals,

a) exactly 3 individuals

b) more than 2 individuals

will suffer from the bad reaction.

28. The laboratories A and B carry out independent estimates of fat content in ice-creams
made by a firm. A sample is taken from each batch, halved and the separate halves sent to
the two laboratories. The fat contents obtained by the laboratories are recorded below.
(Fat contents in milligrams are given below)

Batch No. 1 2 3 4 5 6 7 8 9 10

Lab A 7 8 7 3 8 6 9 4 7 8

Lab B 9 8 8 4 7 7 9 6 6 6

Is there a significant difference between the mean fat content obtained by the two laboratories
A and B?
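
Because each batch is halved and sent to both laboratories, a paired t-test is one natural reading of question 28. The sketch below assumes that reading and uses the tabulated two-sided 5% critical value for 9 degrees of freedom (2.262); the variable names are illustrative.

```python
import math

lab_a = [7, 8, 7, 3, 8, 6, 9, 4, 7, 8]
lab_b = [9, 8, 8, 4, 7, 7, 9, 6, 6, 6]

# Work on the within-batch differences (Lab A minus Lab B).
diffs = [a - b for a, b in zip(lab_a, lab_b)]
n = len(diffs)
mean_d = sum(diffs) / n
var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance of differences

t_stat = mean_d / math.sqrt(var_d / n)
T_CRIT_9DF = 2.262  # tabulated two-sided 5% point of t with n - 1 = 9 d.f.
significant = abs(t_stat) > T_CRIT_9DF

print(round(t_stat, 3), significant)
```

With these data the statistic is roughly -0.71, well inside the acceptance region, so the sketch does not indicate a significant difference at the 5% level.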

29. Explain cluster sampling with an example.


Semester-IV NON PARAMETRIC TESTS AND ANALYSIS OF VARIANCE
Course IV
Module 1. Chi-square Tests.
Chi-square Test of Goodness of Fit, Test of Independence of Attributes, Test of
Homogeneity of Proportions. 25hrs
Module 2. Non-Parametric Tests.
Sign Test, Wilcoxon’s Signed Rank Test, Wilcoxon’s Rank Sum Test, Run Test,
Kruskal-Wallis Test. 20hrs
Module 3. Analysis of Variance.
One-way and Two-way Classification with Single Observation Per Cell, Critical
Difference. 25hrs
Module 4. Preparation of Questionnaire, Scores and Scales of Measurement, Reliability and
Validity of Test Scores. 20hrs
Books for Study.
10. Gupta, S P (1988). Statistical Methods, Sultan Chand and Sons, New Delhi.
11. Gupta, S C and Kapoor, V K (2002). Fundamentals of Applied Statistics, Sultan
Chand and Sons, New Delhi.
12. Garrett, H E and Woodworth, R S (1996). Statistics in Psychology and Education,
Vakils, Feffer and Simons Ltd., Bombay.
Model Question Paper

B. Sc. Psychology

PROBABILITY DISTRIBUTION AND PARAMETRIC TESTS

COURSE IV: Psychological Statistics (Complementary I)

Time: 3 Hrs

Part A (Contains 12 questions, 4 questions carry a weightage of 1)

Answer all questions

Q.1. In a chi-square contingency table with 3 rows and 5 columns, the d.f of chi-square
statistic is

a) 15

b) 24

c) 8

2. The chi-square test statistic for a goodness of fit test is given by:

a) $\frac{O_i - E_i}{E_i}$

b) $\sum \frac{O_i - E_i}{E_i^2}$

c) $\sum \frac{(O_i - E_i)^2}{E_i^2}$

d) $\sum \frac{(O_i - E_i)^2}{E_i}$
3. In a Poisson goodness of fit test having ‘k’ sets of observed frequencies with estimated
value of λ , the chi-square statistic has d.f.

a) k-2

b) k

c) k-1

d) k-2

4 The basic assumption for a non-parametric test is:

a) The variable is continuous

b) The variable is discrete

c) The variable is normal

d) The variable is standard normal

5. The non-parametric equivalent test for a paired t-test is:

a) Signed Rank test

b) Rank sum test

c) Run test

d) Sign test

6. The test used to check the randomness of the collected set of symbols is:

a) Sign test

b) Rank sum test

c) Signed rank test

d) Run test
7 When there are 3 groups, each following normal distribution, and the null hypothesis is
concerned with the equality of means the test used is:

a) Chi square test

b) t-test for equality of means

c) Analysis of variance

d) none of the above

8. The test statistic in a two way ANOVA table follows:

a) Chi-square distribution

b) t-distribution

c) Normal distribution

d) F-distribution

9. Kruskal-Wallis test is the non-parametric equivalent of:

a) t-test

b) Normal test

c) Chi-square test

d) ANOVA

10. In a one way ANOVA if the d.f of the total S.S is 13 and the d.f of the
between sample sum of squares is 6, the d.f of the error sum of squares is:

a) 7 b) 6 c) 19

11. The mean value of a set of scores is 50 with S.D.=5. If the raw score of an individual is
55, his z-score is:

a) Zero b) -1 c) 50 d) +1

12. The reliability coefficient of a test of 50 items is 0.60. How much should it be lengthened
to raise the self correlation to 0.9?

a) 5 b) 6 c) 7 d)

Part B Answer all questions weight 1

13. In a chi-square test of independence of 2 attributes with 2 observations each, the d.f of the
test statistic is 1.

a) Say true or false.

b) Explain your answer.

14 In the case of sign test, the test statistic follows a binomial distribution.

a) Say true or false.

b) Explain your answer.

15. In a one-way ANOVA, the total sum of squares of observations is 6212 and the error
sum of squares is 3272. The sum of squares between samples is 2900.

a) Say true or false.

b) Explain your answer.

16. In a χ² test of goodness of fit, if the calculated value of χ² is zero, then it is a bad fit.

a) Say true or false.

b) Explain your answer.

17. Define standard scores.

18. In test re-test method Karl Pearson’s coefficient of correlation between two test scores is
0.9. What is the coefficient of reliability?

19. Is the χ² test a parametric or non-parametric test? Why?

20. Define randomization.


Part C (Answer 4 questions out of 6) weight 2

21. What is the null hypothesis for a chi-square test of homogeneity of proportions and give
the layout of observations.

22. Mention the advantages of non-parametric tests over parametric test.

23. What are the assumptions behind analysis of variance?

24. The reliability coefficient of a test of 50 items is 0.6. How much should the test be
lengthened to raise the self correlation to 0.9? What effect will the doubling of the test
length have upon the reliability coefficient?
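
Question 24 is commonly worked with the Spearman-Brown prophecy formula. The paper does not name the formula, so treat that choice as an assumption; the two function names below are illustrative.

```python
def spearman_brown(r_now, k):
    """Predicted reliability when a test is lengthened k times."""
    return k * r_now / (1 + (k - 1) * r_now)

def lengthening_factor(r_now, r_target):
    """Factor k by which the test must be lengthened to reach r_target."""
    return r_target * (1 - r_now) / (r_now * (1 - r_target))

k_needed = lengthening_factor(0.6, 0.9)   # how many times longer the test must be
r_doubled = spearman_brown(0.6, 2)        # reliability after doubling the length

print(round(k_needed, 2), round(r_doubled, 2))
```

Under this formula the 50-item test would need to be about 6 times as long (about 300 items), and doubling it would raise the reliability from 0.6 to about 0.75.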

25. A test of 50 items has reliability 0.7 and validity 0.5. If another 150 comparable items are
added to it what will be the validity?

26. In a one-way analysis of variance with three groups (samples) each consisting of 5
observations, the mean error sum of squares is 30.5. Calculate the critical difference. The
group means are 20, 25 and 26 respectively. Find which pairs show significant difference
if any.
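
For question 26, the critical difference can be sketched as t × sqrt(2·MSE/n). The sketch assumes a 5% two-sided level and the tabulated t value 2.179 for the 12 error degrees of freedom (3 groups of 5 observations); the group labels are placeholders.

```python
import math
from itertools import combinations

mse = 30.5                      # mean error sum of squares from the question
n_per_group = 5
group_means = {"group 1": 20, "group 2": 25, "group 3": 26}

df_error = len(group_means) * (n_per_group - 1)  # 3 * 4 = 12
T_CRIT_12DF = 2.179             # tabulated two-sided 5% point of t, 12 d.f. (assumed level)

critical_difference = T_CRIT_12DF * math.sqrt(2 * mse / n_per_group)

# A pair differs significantly when its mean difference exceeds the critical difference.
significant_pairs = [
    (a, b)
    for a, b in combinations(group_means, 2)
    if abs(group_means[a] - group_means[b]) > critical_difference
]

print(round(critical_difference, 2), significant_pairs)
```

With these inputs the critical difference is about 7.6, larger than every pairwise difference (5, 6 and 1), so no pair is flagged.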

Section D

(Answer 2 questions out of 3) weight 4

27. A factory operates in three shifts. The factory manager feels that quality of part is related
to shifts. For this purpose he has collected the following data from the past records of
production.

Shift      No. of Parts
           Good    Bad
Day        900     130
Evening    700     170
Night      400     200

Test whether the quality of parts produced is independent of shift
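
The test of independence in question 27 can be sketched with a small helper that builds expected counts from the row and column totals. The function name `chi_square_independence` and the 5% critical value 5.991 (2 d.f.) are assumptions for illustration.

```python
def chi_square_independence(table):
    """Return (statistic, d.f., expected counts) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Rows: Day, Evening, Night; columns: Good, Bad.
shifts = [[900, 130], [700, 170], [400, 200]]
statistic, df, expected = chi_square_independence(shifts)

CHI2_CRIT_2DF = 5.991  # tabulated 5% point of chi-square with 2 d.f.
print(round(statistic, 2), df, statistic > CHI2_CRIT_2DF)
```

The statistic is far above the critical value here, which points towards quality depending on shift.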


28. Fifteen patient records from each of two hospitals were received and assigned a score
designed to measure level of care. The scores were as follows:-

Hospital A: 99 85 73 98 83 88 99 80 74 91 80 94 94 98 80

Hospital B: 78 74 69 79 57 78 79 68 59 91 89 55 60 55 79

Use a proper non-parametric test to see whether the two populations are identical with respect
to the level of care.

29. Describe Kuder-Richardson’s method of assessing the reliability of a test.


Multiple Choice Questions
Analysis of Variance - Single factor completely
randomized design

The following questions refer to the following situation

Health and Welfare wishes to investigate if the tar content (milligrams)
varies among four brands of cigarettes. Three packs of each brand were selected,
and one cigarette from each pack was placed in a smoking machine to determine
the tar content. An Analysis of Variance was performed and here are the results
(some parts are hidden):

SOURCE DF SUM OF SQUARES MEAN SQUARE F VALUE PR > F


MODEL 3 ***.******** 116.00000000 ***** 0.0028
ERROR ** 80.00000000 ************
CORR. TOTAL ** 428.00000000

SOURCE DF TYPE III SS F VALUE PR > F


BRAND 3 348.00000000 ***** 0.0028

1. The value of the F-statistic for testing the equality of the means is:

(a) 4.35
(b) .0028
(c) 13.05
(d) 11.60
(e) 116.00

Solution: d
Past performance 1990 Apr - 75%
Past performance 1991 Feb - 63% (c-27%)
Past performance 1993 Feb - 84% (c-10%)
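
The hidden entries of the ANOVA table can be recovered from the visible ones. The sketch below assumes 12 observations in total (four brands, three packs each), as the situation describes; the variable names are illustrative.

```python
# Visible entries from the ANOVA output.
ss_total = 428.0
ss_error = 80.0
ms_model = 116.0
df_model = 3
n_observations = 4 * 3          # four brands, three packs each

ss_model = ss_total - ss_error              # hidden model sum of squares
df_total = n_observations - 1               # hidden corrected total d.f.
df_error = df_total - df_model              # hidden error d.f.
ms_error = ss_error / df_error              # hidden mean square error
f_value = ms_model / ms_error               # hidden F value

print(ss_model, df_error, ms_error, f_value)
```

The recovered model sum of squares agrees with the Type III SS printed for BRAND, which is a useful consistency check.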

2. The hypothesis would be rejected at α=0.05 if the test statistic is greater
than:

(a) 4.07
(b) 3.86
(c) 8.85
(d) 8.81
(e) 3.59

Solution: a
Past performance 1990 Apr - 79%
Past performance 1991 Feb - 61% (b-31%)
Past performance 1993 Feb - 86% (b-12%)

3. Which of the following is correct:

(a) Because the p-value is small, there is evidence that all the brands
differ from each other in the mean amount of tar present.
(b) Because the p-value is small, there is no evidence that any of the
brands differ in the mean tar content.
(c) Because the p-value is small, there is evidence that at least one brand
has a different mean tar content from the other brands.
(d) Because the p-value is small, there is no evidence that at least one
brand has a different mean tar content from the other brands.
(e) Because the p-value is small, there is evidence that all of the brands have
the same mean tar content.

Solution: c
Past performance 1993 Feb - 95%

Since the p-value is 0.0028 the hypothesis of equal means is rejected.
Consequently a multiple comparison procedure was performed. Here is a
portion of the output:

T TESTS (LSD) FOR VARIABLE: TAR


NOTE: THIS TEST CONTROLS THE TYPE I COMPARISONWISE ERROR RATE,
NOT THE EXPERIMENTWISE ERROR RATE
ALPHA=0.05 DF=* MSE=***
CRITICAL VALUE OF T=2.30600
LEAST SIGNIFICANT DIFFERENCE=5.9541

MEANS WITH THE SAME LETTER ARE NOT SIGNIFICANTLY DIFFERENT.

© 2006 Carl James Schwarz
T GROUPING MEAN N BRAND

A 122.000 3 Wheezer

B 112.000 3 Choker
B
B 110.000 3 Hacker
B
B 108.000 3 Killer

4. Which statement is not correct ?

(a) The comparison-wise error rate is the probability of a Type I error
in any comparison.
(b) The experiment-wise error rate is the probability of at least one Type
I error in all possible comparisons
(c) There is no evidence of a difference between the average tar content
of the Hacker and Killer brands.
(d) The Hacker brand appears to have lower mean tar content than the
Choker brand.
(e) Two sample means must differ by the Least Significant Difference
(5.9541) before the corresponding population means are declared dif-
ferent.

Solution: d
Past performance 1990 Apr - 62% (B-15%)
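
The Least Significant Difference in the output can be reproduced from the critical t value and the mean square error. The sketch assumes MSE = 10 on 8 degrees of freedom (error SS 80 divided by 12 - 4) and three cigarettes per brand.

```python
import math
from itertools import combinations

t_crit = 2.306        # critical value of t printed in the output
mse = 80.0 / 8        # error SS / error d.f., both inferred from the ANOVA table
n_per_brand = 3

lsd = t_crit * math.sqrt(mse * (1 / n_per_brand + 1 / n_per_brand))

means = {"Wheezer": 122.0, "Choker": 112.0, "Hacker": 110.0, "Killer": 108.0}
different_pairs = [
    (a, b)
    for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) > lsd
]

print(round(lsd, 4), different_pairs)
```

Only the pairs involving Wheezer exceed the LSD, which matches the letter grouping (A for Wheezer, B for the other three brands).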

5. The analyst now wishes to perform a new experiment to distinguish among
three different brands. She believes that the value of 4 is a good estimate
of the population standard deviation. What is the estimated sample size
to be 80% sure of detecting a 5 mg. difference in the mean tar content
when testing at α=0.05?

(a) 12 of each brand for a total of 36 cigarettes
(b) 12 cigarettes in total; four of each brand
(c) 14 of each brand for a total of 42 cigarettes
(d) 14 cigarettes in total; five cigarettes in two brands, four in the third
(e) 15 cigarettes in total; five of each of three brands.

Solution: c
Past performance 1990 Apr - 76% (E-13%)
Past performance 1991 Feb - 86%

6. Suppose the analyst wishes to repeat the experiment blocking by the type
of inhalation of smokers. Which of the following is NOT CORRECT about
a randomized complete block design?

(a) Every block is randomized separately from every other block.
(b) Every treatment must appear at least once in every block.
(c) Blocking is used to remove the effects of another factor (not of inter-
est) from the comparison of the levels of the primary factor.
(d) The ANOVA table will have another line in it for the contribution to
the variability from the blocks.
(e) Blocks should contain experimental units that are as different as
possible from each other.

Solution: e
Past performance 1990 Apr - 79%

The following questions refer to the following situation.


Some varieties of nematodes (round worms that live in soil and are fre-
quently so small that they are invisible to the naked eye) feed on the roots
of lawn grasses and crops such as strawberries and tomatoes. The pest,
which is particularly troublesome in warm climates, can be treated by the
application of nematocides. However, because of the size of the worms, it
is very difficult to count them directly. Hence, the yield of a crop is used
as a surrogate for the number of worms. Four brands of nematocides
are to be compared. Twelve plots of land of comparable fertility that were
suffering from nematodes were planted with a crop. Each nematocide was
applied to three plots; the assignment of the nematocide to the plot was
made at random. At harvest time, the yields of each plot were recorded
and part of the ANOVA table appears below:

Source df SS MS F-value
Nematocides * 3.456 * *
Error 8 1.200 *
Total 11 4.656

7. The value of the test statistic to test the hypothesis of no differences in
the mean yields among the four brands is:

(a) 23.04
(b) 2.89
(c) 3.46
(d) 1.20
(e) 7.68

Solution: e
Past performance 1990 Feb - 90%

8. The rejection criterion is (at α= 0.05):

(a) Reject H if F ∗ > 7.59
(b) Reject H if F ∗ > 3.59
(c) Reject H if F ∗ > 4.07
(d) Reject H if F ∗ > 2.60
(e) Reject H if F ∗ > 8.85

Solution: c
Past performance 1990 Feb - 92%

9. Suppose that based upon this experiment, the scientist wishes to be 80%
sure of detecting a difference of about 0.45 kg/plot in the average yield
among the four nematocides when testing at α=0.05. She decides to use
0.15 as an estimate of the population variance. Then:

(a) The required sample size is about 20 plots per nematocide for a total
of 80 plots.
(b) The required total sample size is 20 plots, i.e., 5 plots per nematocide.
(c) The required sample size is about 4 plots per nematocide for a total
of 16 plots.
(d) The required total sample size is 4 plots, i.e., 1 plot per nematocide.
(e) The required sample size cannot be determined because the individ-
ual population means are not known.

Solution: a
Past performance 1990 Feb - 40% (A-40%, C-46%)

10. What is the best reason for randomly assigning treatment levels to the
experimental units?

(a) Randomization makes the experiment easier to conduct because we
can apply the nematocides in any pattern rather than in a systematic
fashion.
(b) Randomization will tend to average out all other uncontrolled fac-
tors such as soil fertility so that they are not confounded with the
treatment effects.
(c) Randomization makes the analysis easier because the data can be
collected and entered into the computer in any order.

(d) Randomization is required by statistical consultants before they will
help you analyze the experiment.
(e) Randomization implies that it is not necessary to be careful during
the experiment, during data collection, and during data analysis.

Solution: b
Past performance 1990 Feb - 97%

11. A possible Type I error in this experiment would be to:

(a) Conclude that the mean yields of the four nematocides are equal
when in fact at least one is not equal.
(b) Conclude that the mean yields of the four nematocides are equal
when in fact they are equal.
(c) Conclude that the mean yields of the four nematocides are unequal
when in fact at least one is not equal.
(d) Conclude that the mean yields of the four nematocides are unequal
when in fact they are equal.
(e) Fail this exam because you used the osmosis method of studying.

Solution: d
Past performance 1990 Feb - 82%

The next 3 questions refer to the following situation.


Cuckoo birds lay their eggs in the nests of other species (the host species).
Can cuckoo birds modify their egg sizes according to the nest of the host
species? A sample of nests containing a cuckoo egg was found and the size
of the cuckoo egg in the host species nest was measured. The following
output was obtained:

12. Which is the null and alternate hypothesis?
(a) H: all sample means are equal;
A: at least one sample mean differs from the others.
(b) H: all host species have the same population mean cuckoo egg size;
A: at least one population mean differs from the others.
(c) H: all eggs are the same size;
A: at least one egg differs in size from the others.

(d) H: all host species are the same;
A: at least one host species is different from the others.
(e) H: all host species have the same size eggs;
A: at least one host species has different sized eggs from the others.
Solution: b
Past performance 2006 Dec - 38% (30%-a; 19%-c)

13. Which is CORRECT about this experiment.


(a) This is a paired experiment because all host species were measured
more than once.
(b) This experiment is unbalanced with unequal number of eggs mea-
sured from each host species.
(c) There is no need to carefully select a random sample of host species
nests because the sample size is large.
(d) The ANOVA method tests if the variances are equal across all
treatment groups.
(e) In the Analysis of Variance (ANOVA) method, the F-test can be
thought of as test of equal variances.
Solution: b - rats a typing error made the original have no answer

14. Which of the following is CORRECT?


(a) Because the p-value is small, there is very strong evidence that the
means are equal.
(b) The F -ratio of 10.4 tests if all the individual values are the same.
(c) Because some confidence diamonds do not overlap, there is evidence
that not all means are equal.
(d) The Tukey-Kramer output shows that all the means are different
from each other.
(e) The comparison circles show that the eggs from Wren nests are all a
different size than eggs from other host species.
Solution: c
Past performance 2006 Dec - 71% (20%-e)

Multiple Choice Questions
Analysis of Variance - general

1. Which of the following is not a necessary assumption underlying the use
of the Analysis of Variance technique?

(a) The samples are independent and randomly selected.
(b) The populations are normally distributed.
(c) The variances of the populations are the same.
(d) The means of the populations are equal.
(e) all of the above

Solution: d

Multiple Choice Questions
Analysis of Variance - Single factor randomized
complete block designs

The next three questions refer to the following situation:

An experiment was conducted to determine the effect of three methods of soil
preparation on the first year growth of slash pine seedlings. Four locations
(provincial forest areas) were selected, and each location was divided into three
plots. Three methods of soil preparation were used: no preparation, light
fertilization, and burning. One treatment was randomly assigned to the plots
within each location and all three treatments were applied at each location.
On each plot, the same number of seedlings was planted, and the average first
year growth for the seedlings on the plot was recorded. Two outputs appear
below - only one of which is a “correct” way of analyzing this data.

Source df SS MS F Prob
prep * 38 **.* **.* 0.1517
Error * 73 **.*
Total * 111

prep Mean
burn 12.0
fertilize 16.0
none 12.5

Source df SS MS F Prob
prep * 38.0 **.* **.* 0.0121
locn * 61.7 **.* **.* 0.0077
Error * 11.3 **.*
Total * 111

prep Mean
burn 12.0
fertilize 16.0
none 12.5

1. The value of the test statistic for testing the appropriate hypothesis is:

(a) 2.3
(b) 10.1
(c) 10.9
(d) 2.6
(e) 11.8

Solution: b
Past performance 1993 Apr - 60% (a-15%; e-10%)

2. The LSD for comparing two means is:

(a) 2.1
(b) 4.3
(c) 2.7
(d) 4.6
(e) 2.4

Solution: e
Past performance 1993 Apr - 33% (c-33%)
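
Questions 1 and 2 can be checked from the second (blocked) table. The sketch assumes that the blocked analysis is the appropriate one, with 3 treatments and 4 locations, and uses the tabulated two-sided 5% t value 2.447 for 6 error degrees of freedom.

```python
import math

ss_prep, ss_error = 38.0, 11.3
n_treatments, n_blocks = 3, 4

df_prep = n_treatments - 1                      # 2
df_error = (n_treatments - 1) * (n_blocks - 1)  # 6
ms_prep = ss_prep / df_prep
ms_error = ss_error / df_error

f_value = ms_prep / ms_error

T_CRIT_6DF = 2.447   # tabulated two-sided 5% point of t, 6 d.f.
# Each treatment mean is based on one plot per block, i.e. n_blocks observations.
lsd = T_CRIT_6DF * math.sqrt(2 * ms_error / n_blocks)

print(round(f_value, 1), round(lsd, 1))
```

Under these assumptions the F statistic is about 10.1 and the LSD about 2.4, in line with the stated solutions.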

3. Based upon the results of this study, we wish to be 80% confident of
detecting a difference between the “burn” and the “none” treatments in
the mean growth when testing at α=0.05. The required sample size for
each treatment is:

(a) > 17
(b) 4
(c) > 21
(d) 12
(e) 14

Solution: a
Past performance 1993 Apr - 25% (b-15%; c-42%; d-10%)

The next four questions refer to the following situation:


Every winter, tons of salt are dumped on Winnipeg streets. In the spring,
the salt washes into the soil where it can be very harmful to trees and grass.
To investigate this problem, an experimenter wishes to investigate the
effects of different salinity levels upon vegetation growth. Since different
areas of the city differ by soil type and other factors, she blocks by location
in the city. In each location, she administers six different levels of salinity
(15, 20, 30, 35, 45, 50 ppm). The output from SAS follows:

Sum of Mean
Source DF Squares Square F Value Pr > F
Model 8 891.05166667 111.38145833 13.73 0.0001
Error 15 121.67791667 8.11186111
Corr Total 23 1012.72958333

Source DF Type III SS Mean Square F Value Pr > F
TRT 5 664.43708333 132.88741667 16.38 0.0001
BLOCK 3 226.61458333 75.53819444 9.31 0.0010

T tests (LSD) for variable: BIOMASS


NOTE: This test controls the type I comparisonwise error rate not
the experimentwise error rate.

Alpha= 0.05 df= 15 MSE= 8.1118 Critical Value of T= 2.13


Least Significant Difference= 4.2926

T Grouping Mean N TRT


A 18.100 4 20
A
A 14.150 4 15

B 7.475 4 30
B
C B 6.000 4 35
C B
C B 5.775 4 45
C
C 3.075 4 50

4. The test statistic and rejection region (α=.05) are:

(a) F*= 16.38; Reject H if F ∗ > 2.90.

(b) F*= 9.31; Reject H if F ∗ > 8.71.
(c) F*= 13.73; Reject H if F ∗ > 2.64.
(d) F*= 9.31; Reject H if F ∗ > 3.29.
(e) F*= 16.38; Reject H if F ∗ > 4.62.

Solution: a
Past performance 1991 Apr - 55% (C-25%)

5. What statement is not correct ?

(a) The comparison-wise error rate is the probability of a Type I error
in any comparison.
(b) The experiment-wise error rate is the probability of at least one Type
I error in all possible comparisons
(c) According to the output, there is evidence of a difference between
15ppm and 35 ppm.
(d) Since the mean biomass at 30 ppm is not found to be different from
that at 35 ppm, and that at 35 ppm is not found to be different
from that at 50 ppm, there is no evidence of a difference in the mean
biomass between 30 ppm and 50 ppm.
(e) Two sample means must differ by the Least Significant Difference
(4.29) before the corresponding population means are declared dif-
ferent.

Solution: d
Past performance 1991 Apr - 95%

6. The results of this experiment were interesting but not conclusive. She
now wishes to detect differences when testing at α =.05. Which of the
following is not correct?

(a) We would need less than 5 blocks to be 80% sure of detecting a
difference of 9 in the biomass means.
(b) We would need more than 27 blocks to be 80% sure of detecting a
difference of 1.5 in the biomass means.
(c) We would need 27 blocks to be 80% sure of detecting a difference of
8 in the biomass means.
(d) We would need 13 blocks to be 80% sure of detecting a difference of
4.5 in the biomass means.
(e) We would need 8 blocks to be 80% sure of detecting a difference of
5.8 in the biomass means.

Solution: d
Past performance 1991 Apr - 71% (A-10%)

7. Which of the following is NOT CORRECT about a randomized complete
block experiment?
(a) Every block is randomized separately from every other block.
(b) Every treatment must appear at least once in every block.
(c) Blocking is used to remove the effects of another factor (not of inter-
est) from the comparison of levels of the primary factor.
(d) The ANOVA table will have another line in it for the contribution to
the variability from blocks.
(e) Blocks should contain experimental units that are as different as pos-
sible from each other.
Solution: e
Past performance 1991 Apr - 93%
Past performance 1998 Dec - 85%

Multiple Choice Questions
Chi-square tests for independence

The next set of questions refer to the following situation:

A survey was conducted to investigate the severity of rodent problems in egg
and poultry operations. A random sample of operators was selected, and the
operators were classified according to the type of operation and the extent of
the rodent population. A total of 78 egg operators and 53 turkey operators were
classified and the summary information is:

1. Which of the following is not correct?

(a) Operators who had both operations could not be used because this
type of analysis requires each unit to be counted in one and only one
cell.
(b) The null hypothesis is that the severity of the rodent problem is
independent of the type of operator.
(c) The alternate hypothesis is that the proportion of turkey operators
with mild, moderate, and severe rodent problems is different from the
proportion of egg operators with mild, moderate, and severe rodent
problems.
(d) A Type I error would be to conclude that the severity of rodent
problems is dependent upon the type of operator while, in fact, the
proportion of turkey operators with mild, moderate, and severe ro-
dent problems is the same as the proportion of egg operators with
mild, moderate, and severe rodent problems.
(e) A Type II error would be to conclude that the proportion of egg
operators with mild, moderate, or severe rodent problems is the same
as the proportion of turkey operators with mild, moderate, or severe
rodent problems when in fact they are independent.

Solution: e
Past performance 1993 Apr - 52% (a-10%; b-10%; c-14%; d-14%)
Past performance 1996 Dec - 61% (a-10%, d-12%)
Past performance 1998 Dec - 72%

2. The value of the test statistic is:


(a) about 5.99

(b) about 9.71
(c) about 6.81
(d) about 5.64
(e) about 8.60

Solution: d
Past performance 1993 Apr - 65% (a-14%; c-10%)
Past performance 1998 Dec - 99%

3. The expected count in the (egg, mild infestation) cell is:


(a) about 26.00
(b) about 33.33
(c) about 53.00
(d) about 31.55
(e) about 78.00
Solution: d
Past performance 1996 Dec - 71% (a-16%)
Past performance 1998 Dec - 87%

4. The approximate p-value is found to be:


(a) about .060
(b) about .014
(c) about .032
(d) about .008
(e) about .05
Solution: a
Past performance 1993 Apr - 48% (b-14%; c-16%; e-13%)
Past performance 1996 Dec - 89%
Past performance 1998 Dec - 96%

5. One reviewer of the study suggested that there may be a problem with the
study because results from small operators were pooled with the results
from large operators. Which of the following is NOT CORRECT?

(a) Simpson’s paradox occurs when conclusions from a pooled table differ
from the individual tables.
(b) Tables can be pooled when the underlying rates are equal among
tables.

(c) Simpson’s paradox occurs when tables with unequal row totals are
pooled.
(d) Inspection of the row or column percents will give a good clue if
Simpson’s paradox is likely to occur.
(e) Simpson’s paradox occurs when the pooled table gives no evidence
of an effect but the individual tables show evidence of an effect.
Solution: c
Past performance 1990 Dec - 68%
Past performance 1993 Apr - 32% (b-16%; d-22%; e-25%)
Past performance 1996 Dec - 65% (b-10%, d-10%)
Past performance 1998 Dec - 73% ( d-10%)

The next set of questions refer to the following situation


In the paper “Color Association of Male and Female Fourth-Grade School
Children” (J. Psych., 1988, 383-8), children were asked to indicate what
emotion they associated with the color red. The response and the sex of
the child are noted and summarized below. The first number in each cell
is the count, the second number is the row percent.

Frequency|
Row Pct |anger |happy |love |pain | Total
---------+--------+--------+--------+--------+
f | 27 | 19 | 39 | 17 | 102
| 26.47 | 18.63 | 38.24 | 16.67 |
---------+--------+--------+--------+--------+
m | 34 | 12 | 38 | 28 | 112
| 30.36 | 10.71 | 33.93 | 25.00 |
---------+--------+--------+--------+--------+
Total 61 31 77 45 214

Statistic DF Value Prob


------------------------------------------------------
Pearson Chi-Square * 4.629 *****
Likelihood Ratio Chi-Square * 4.661 *****
Mantel-Haenszel Chi-Square 1 0.307 *****

6. Under a suitable null hypothesis, the expected frequency for the cell cor-
responding to Anger and Males is:

(a) 15.9
(b) 55.7
(c) 30.4
(d) 31.9

(e) 29.1

Solution: d
Past performance 1991 Apr - 63% (C-17%, E-15%)
Past performance 1991 Dec - 84% (e-11%)
Past performance 1997 Aug - 87%
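
The expected frequency in question 6 and the Pearson statistic in the output can be reproduced from the table of counts. The helper name `chi_square_independence` below is illustrative, not something the source defines.

```python
def chi_square_independence(table):
    """Return (statistic, d.f., expected counts) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Rows: female, male; columns: anger, happy, love, pain.
counts = [[27, 19, 39, 17], [34, 12, 38, 28]]
statistic, df, expected = chi_square_independence(counts)

expected_anger_male = expected[1][0]   # row total 112 * column total 61 / 214
print(round(expected_anger_male, 1), round(statistic, 3), df)
```

The expected count is row total times column total divided by the table total, and the degrees of freedom are (2 - 1)(4 - 1) = 3.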

7. The null hypothesis will be rejected at α=0.05 if the test statistic exceeds:

(a) 3.84
(b) 5.99
(c) 7.81
(d) 9.49
(e) 14.07

Solution: c
Past performance 1991 Apr - 86%

8. The approximate p-value is:

(a) Between .100 and .900


(b) Between .050 and .100
(c) Between .025 and .050
(d) Between .010 and .025
(e) Between .005 and .010

Solution: a
Past performance 1991 Dec - 77% (e-11%)

9. Which of the following is NOT CORRECT?

(a) The children were cross-classified by sex and emotion associated with
red. Each child was counted in one and only one cell.
(b) The null hypothesis is that the type of emotion associated with red
is independent of the sex of the child.
(c) The null hypothesis is that the proportion of emotions associated
with red is the same for both sexes.
(d) All expected cell counts should be greater than five in order that
the distribution of the test statistic is an approximate chi-square
distribution.
(e) If we reject the null hypothesis then we have proven that the two
sexes associate red with emotions in different ways.

Solution: e
Past performance 1991 Apr - 76% (C-12%)
Past performance 1991 Dec - 77% (c-9%, d-12%)
Past performance 1993 Feb - 67% (d-16%)

10. Which of the following is not correct?

(a) A lower percentage of female students associate the emotion “anger”
with the color red than do male students.
(b) More students associate the color red with the emotion “love” than
with the emotion “anger”.
(c) Each student was classified by gender and by emotion association.
Each student was counted in one and only one cell.
(d) We will be unable to compute a correlation for this data because the
variables are not both interval or ratio in scale.
(e) We compute row or column percentages by dividing the cell count by
the table total (214).

Solution: e
Past performance 1993 Feb - 67% (d-16%)
Past performance 1996 Oct - 92%

11. A Type I error would be committed if:

(a) We conclude that the sex of the child and the emotion associated
with red are independent when in fact they are not independent.
(b) We conclude that the sex of the child and the emotion associated
with red are not independent when in fact they are not independent.
(c) We conclude that the proportion of emotions associated with red
differs between males and females when in fact they are the same.
(d) We conclude that the proportion of emotions associated with red is
the same for male and female when in fact they are the same.
(e) We fail to find any association between the color red and emotions
for either sex.

Solution: c
Past performance 1991 Apr - 76% (E-20%)
Past performance 1991 Dec - 84%
Past performance 1997 Aug - 76%

12. The null hypothesis is:


(a) emotional association with red is independent of gender

(b) gender is dependent upon the emotional association with red
(c) the probability of selecting an emotion with red is related to gender
(d) the number of children in each cell does not depend upon gender nor
upon emotion
(e) the color red is independent of the emotion associated with it and
with gender.
Solution: a
Past performance 1997 Aug - 74%

13. The test statistic and approximate p-value is:


(a) 4.661 .1983
(b) 4.661 .3966
(c) 4.629 .2011
(d) 4.629 .4022
(e) 4.629 .1006
Solution: b
Past performance 1997 Aug - 76%

14. Each person in a random sample of 50 was asked to state his/her sex and
preferred colour. The resulting frequencies are shown below.

Colour
Red Blue Green
Male 5 14 6
Sex Female 15 6 4

A chi-square test is used to test the null hypothesis that sex and preferred
colour are independent. Which of the following statements is a correct
decision about the null hypothesis?

(a) Reject at the 0.005 level.


(b) Reject at the 0.01 level but not at the 0.005 level.
(c) Reject at the 0.025 level but not at the 0.01 level.
(d) Reject at the 0.05 level but not at the 0.025 level.
(e) Accept at the 0.05 level.

Solution: not available
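
Although no solution is recorded for question 14, the statistic itself is easy to compute. The sketch below is only an illustration: it uses the fact that for 2 degrees of freedom the chi-square upper-tail probability is exp(-x/2), and the variable names are hypothetical.

```python
import math

# Rows: male, female; columns: red, blue, green.
observed = [[5, 14, 6], [15, 6, 4]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

statistic = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand_total
        statistic += (obs - exp) ** 2 / exp

df = (len(row_totals) - 1) * (len(col_totals) - 1)   # (2 - 1)(3 - 1) = 2
p_value = math.exp(-statistic / 2)                   # exact upper tail for 2 d.f. only

print(round(statistic, 2), df, round(p_value, 4))
```

With these counts the statistic is 8.6 and the p-value falls between 0.01 and 0.025.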

15. The following data were obtained from a company which manufactures
special plastic containers which are to hold a specified volume of hazardous
material. On each of the three 8 hour shifts workers are able to make 500
of the containers. Some containers do not meet specifications as required
by the company’s customer because they are too small, others because
they are too large.

Conformance to Specification
Shift Too Small Within Spec. Too Large
8am 36 452 12
4pm 24 443 33
midnight 12 438 50

If conformance to specifications is independent of shift, the expected number
of containers that meet specification on the 4pm shift is

(a) 166.7
(b) 443
(c) 33
(d) 444.3
(e) 500

Solution: not available
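
No solution is recorded for question 15 either. As an illustration, the expected count under independence is the row total times the column total divided by the grand total; the dictionary layout below is an assumption made for readability.

```python
# Counts per shift: (too small, within spec, too large).
counts = {
    "8am": (36, 452, 12),
    "4pm": (24, 443, 33),
    "midnight": (12, 438, 50),
}

grand_total = sum(sum(row) for row in counts.values())      # 1500 containers
row_total_4pm = sum(counts["4pm"])                          # 500 containers
col_total_within = sum(row[1] for row in counts.values())   # within-spec column total

expected_within_4pm = row_total_4pm * col_total_within / grand_total
print(round(expected_within_4pm, 1))
```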

16. Are all employees equally prone to having accidents? To investigate this
hypothesis, Parry (1985) looked at a light manufacturing plant and clas-
sified the accidents by type and by age of the employee.

Accident Type
Age Sprain Burn Cut
Under 25 | 9 17 5
25 or over | 61 13 12

A chi-square test gave a test-statistic of 20.78. If we test at α =.05:

(a) There appears to be no association between accident type and age.


(b) Age seems to be independent of accident type.
(c) Accident type does not seem to be independent of age.
(d) There appears to be a 20.78% correlation between accident type and
age.
(e) The proportion of sprain, cuts and burns seems to be similar for both
age classes.

2006
c Carl James Schwarz 8
Solution: c
Past performance 1989 Apr - 64%

17. A random sample of 100 members of a union are asked to respond to


two questions: Question 1. Are you happy with your financial situation?
Question 2. Do you approve of the Federal government’s economic poli-
cies? The responses are:

Question 1.
Yes No | Total
Question Yes 22 48 | 70
2 No 12 18 | 30
Total 34 66 | 100

To test the null hypothesis that response to Question 1 is independent of


response to Question 2 at 5% level, the expected frequency for the cell
(Yes,Yes) and the critical value of the associated test statistic are:

(a) 23.8 and 1.96 respectively


(b) 10.2 and 3.84 respectively
(c) 23.8 and 3.84 respectively
(d) 23.8 and 7.81 respectively
(e) 10.2 and 7.81 respectively

Solution: c

18. A survey was conducted to investigate whether alcohol consumption and


smoking are related. The following information was compiled for 600
individuals:

Smoker Non-smoker
Drinker 193 165
Non-drinker 89 153

Which of the following statements is true?

(a) The appropriate alternative hypothesis is A: Smoking and Alcohol


Consumption are independent.
(b) The appropriate null hypothesis is H: Smoking and Alcohol Con-
sumption are not independent.
(c) The calculated value of the test statistic is 3.84.
(d) The calculated value of the test statistic is 7.86.

2006
c Carl James Schwarz 9
(e) At level .01 we conclude that smoking and alcohol consumption are
related.

Solution: e

19. Doctors’ practices have been categorized as to being Urban, Rural, or


Intermediate. The number of doctors who prescribed tetracycline to at
least one patient under the age of 8 were recorded for each of these practice
areas. The results are:

Urban Intermediate Rural


Tetracycline 95 74 31
No tetracycline 126 84 30

If the county type of practice and the use of tetracycline are independent,
then the expected number of rural doctors who prescribe tetracycline is:

(a) 31.0
(b) 27.7
(c) 1.37
(d) 51%
(e) 62

Solution: b

20. For the problem outlined above, the critical value(table value) of the test
statistic when the level of significance is α =0.05, is:

(a) 0.1026
(b) 7.3778
(c) 5.9915
(d) 12.5916
(e) 7.8147

Solution: c

The next set of questions refer to the following situation:


A study was conducted to determine if the fatality rate depends on the
size of the automobile. The analysis of accidents is as follows (with some
values hidden):

2006
c Carl James Schwarz 10
DEATH SIZE
FREQUENCY| m | s | L | TOTAL
---------+--------+--------+--------+
no | 63 | 128 | 46 | 237
---------+--------+--------+--------+
yes | 26 | 95 | 16 | 137
---------+--------+--------+--------+
TOTAL 89 223 62 374

STATISTICS FOR TABLE OF DEATH BY SIZE


STATISTIC DF VALUE PROB
------------------------------------------------------
CHI-SQUARE * 8.663 *****
LIKELIHOOD RATIO CHI-SQUARE * 8.838 *****

21. Under a suitable null hypothesis, the expected frequency for the cell cor-
responding to fatal type of accident and small size automobile is:

(a) 81.68
(b) 67.00
(c) 61.43
(d) 63.41
(e) 59.72

Solution: a
Past performance 1990 Apr - 92%

22. Which of the following is NOT CORRECT?

(a) The accidents were cross-classified by size of automobile and fatality


status. Each accident was counted in one and only one cell.
(b) The null hypothesis is that the fatality status is independent of the
size of the automobile.
(c) The null hypothesis is that the proportion of fatality status is the
same for all three sizes of automobiles.
(d) All expected cell counts should be greater than five in order that
the distribution of the test statistic is an approximate chi-square
distribution.
(e) If we reject the null hypothesis than we have proven that the size of
the automobile affects the chances of a fatality.

Solution: e
Past performance 1990 Apr - 39% (B-12%, C-36%)
Past performance 1990 Dec - 20% ( 15% - c, 56% - d)

2006
c Carl James Schwarz 11
23. The null hypothesis will be rejected at α=0.05 if the test statistic exceeds:

(a) 12.59
(b) 7.81
(c) 5.99
(d) 3.84
(e) 9.49

Solution: c
Past performance 1990Apr - 79%

24. The approximate p-value is:

(a) less than .005


(b) between .005 and .010
(c) between .010 and .025
(d) between .025 and .050
(e) between .050 and .100

Solution: c
Past performance 1990 Dec - 78%
Past performance 1993 Apr - 80%

25. A controversial issue in sports is the use of the “instant replay” for making
decisions on plays that are extremely close or hard to call by an official.
A survey of players in each of four professional sports was conducted,
asking them if they felt “instant replays” should be used to decide close or
controversial calls. The results are as follows:

Use of Instant Replay


Favor Oppose
Football 22 2
Baseball 18 6
Basketball 15 26
Soccer 3 10

In testing to see whether opinion with respect to the use of instant replays
is independent of sport, a table of expected frequencies is found. In this
table, the expected number of professional baseball players opposing the
use of instant replays is equal to:

(a) 10.4
(b) 24.1

2006
c Carl James Schwarz 12
(c) 11.0
(d) 6.0
(e) 8.4

Solution: not available

26. Each person in a random sample of males and females was asked to state
his/her sex and preferred colour. The resulting frequencies are shown
below.

Colour
Red Blue Green
Male 3 11 6
Sex Female 17 11 2

Which of the following is FALSE?


(a) 55% of males prefer the colour blue.
(b) Of those who prefer the colour green, 75% are males.
(c) 44% of people surveyed prefer the colour blue.
(d) A higher percentage of males prefered the colour blue than females.
(e) 15% of people are males who prefer the colour red.
Solution: e
Past performance 2006 Oct - 76% (16%=d)

2006
c Carl James Schwarz 13
Multiple Choice Questions
Chi-square tests for independence

The next set of questions refer to the following situation:

A survey was conducted to investigate the severity of rodent problems in egg
and poultry operations. A random sample of operators was selected, and the
operators were classified according to the type of operation and the extent of
the rodent population. A total of 78 egg operators and 53 turkey operators were
classified and the summary information is:

1. Which of the following is not correct?

(a) Operators who had both operations could not be used because this
type of analysis requires each unit to be counted in one and only one
cell.
(b) The null hypothesis is that the severity of the rodent problem is
independent of the type of operator.
(c) The alternate hypothesis is that the proportion of turkey operators
with mild, moderate, and severe rodent problems is different from the
proportion of egg operators with mild, moderate, and severe rodent
problems.
(d) A Type I error would be to conclude that the severity of rodent
problems is dependent upon the type of operator while, in fact, the
proportion of turkey operators with mild, moderate, and severe ro-
dent problems is the same as the proportion of egg operators with
mild, moderate, and severe rodent problems.
(e) A Type II error would be to conclude that the proportion of egg
operators with mild, moderate, or severe rodent problems is the same
as the proportion of turkey operators with mild, moderate, or severe
rodent problems when in fact they are independent.

Solution: e
Past performance 1993 Apr - 52% (a-10%; b-10%; c-14%; d-14%)
Past performance 1996 Dec - 61% (a-10%, d-12%)
Past performance 1998 Dec - 72%

2. The value of the test statistic is:


(a) about 5.99

© 2006 Carl James Schwarz
(b) about 9.71
(c) about 6.81
(d) about 5.64
(e) about 8.60

Solution: d
Past performance 1993 Apr - 65% (a-14%; c-10%)
Past performance 1998 Dec - 99%

3. The expected count in the (egg, mild infestation) cell is:


(a) about 26.00
(b) about 33.33
(c) about 53.00
(d) about 31.55
(e) about 78.00
Solution: d
Past performance 1996 Dec - 71% (a-16%)
Past performance 1998 Dec - 87%

4. The approximate p-value is found to be:


(a) about .060
(b) about .014
(c) about .032
(d) about .008
(e) about .05
Solution: a
Past performance 1993 Apr - 48% (b-14%; c-16%; e-13%)
Past performance 1996 Dec - 89%
Past performance 1998 Dec - 96%

5. One reviewer of the study suggested that there may be a problem with the
study because results from small operators were pooled with the results
from large operators. Which of the following is NOT CORRECT?

(a) Simpson’s paradox occurs when conclusions from a pooled table differ
from the individual tables.
(b) Tables can be pooled when the underlying rates are equal among
tables.

(c) Simpson’s paradox occurs when tables with unequal row totals are
pooled.
(d) Inspection of the row or column percents will give a good clue if
Simpson’s paradox is likely to occur.
(e) Simpson’s paradox occurs when the pooled table gives no evidence
of an effect but the individual tables show evidence of an effect.
Solution: c
Past performance 1990 Dec - 68%
Past performance 1993 Apr - 32% (b-16%; d-22%; e-25%)
Past performance 1996 Dec - 65% (b-10%, d-10%)
Past performance 1998 Dec - 73% ( d-10%)

The next set of questions refer to the following situation


In the paper “Color Association of Male and Female Fourth-Grade School
Children” (J. Psych., 1988, 383-8), children were asked to indicate what
emotion they associated with the color red. The response and the sex of
the child are noted and summarized below. The first number in each cell
is the count, the second number is the row percent.

Frequency|
Row Pct |anger |happy |love |pain | Total
---------+--------+--------+--------+--------+
f | 27 | 19 | 39 | 17 | 102
| 26.47 | 18.63 | 38.24 | 16.67 |
---------+--------+--------+--------+--------+
m | 34 | 12 | 38 | 28 | 112
| 30.36 | 10.71 | 33.93 | 25.00 |
---------+--------+--------+--------+--------+
Total 61 31 77 45 214

Statistic                     DF    Value    Prob
------------------------------------------------------
Pearson Chi-Square             *    4.629    *****
Likelihood Ratio Chi-Square    *    4.661    *****
Mantel-Haenszel Chi-Square     1    0.307    *****

6. Under a suitable null hypothesis, the expected frequency for the cell cor-
responding to Anger and Males is:

(a) 15.9
(b) 55.7
(c) 30.4
(d) 31.9

(e) 29.1

Solution: d
Past performance 1991 Apr - 63% (C-17%, E-15%)
Past performance 1991 Dec - 84% (e-11%)
Past performance 1997 Aug - 87%
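
The expected count in Question 6 follows the usual rule for a test of independence: row total times column total, divided by the table total. The sketch below applies that rule to every cell of the table above and sums the Pearson contributions. The variable names are illustrative and do not come from the printed output.

```python
# Observed counts from the table above (rows: f, m; columns: anger, happy, love, pain).
observed = [
    [27, 19, 39, 17],
    [34, 12, 38, 28],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(round(expected[1][0], 1))   # expected count for the (m, anger) cell
print(round(chi_square, 3))       # Pearson chi-square statistic
```

The two printed values should agree with the 31.9 listed in Question 6 and with the Pearson Chi-Square line of the output.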

7. The null hypothesis will be rejected at α=0.05 if the test statistic exceeds:

(a) 3.84
(b) 5.99
(c) 7.81
(d) 9.49
(e) 14.07

Solution: c
Past performance 1991 Apr - 86%

8. The approximate p-value is:

(a) Between .100 and .900
(b) Between .050 and .100
(c) Between .025 and .050
(d) Between .010 and .025
(e) Between .005 and .010

Solution: a
Past performance 1991 Dec - 77% (e-11%)

9. Which of the following is NOT CORRECT?

(a) The children were cross-classified by sex and emotion associated with
red. Each child was counted in one and only one cell.
(b) The null hypothesis is that the type of emotion associated with red
is independent of the sex of the child.
(c) The null hypothesis is that the proportion of emotions associated
with red is the same for both sexes.
(d) All expected cell counts should be greater than five in order that
the distribution of the test statistic is an approximate chi-square
distribution.
(e) If we reject the null hypothesis then we have proven that the two
sexes associate red with emotions in different ways.

Solution: e
Past performance 1991 Apr - 76% (C-12%)
Past performance 1991 Dec - 77% (c-9%, d-12%)
Past performance 1993 Feb - 67% (d-16%)

10. Which of the following is not correct?

(a) A lower percentage of female students associate the emotion “anger”
with the color red than do male students.
(b) More students associate the color red with the emotion “love” than
with the emotion “anger”.
(c) Each student was classified by gender and by emotion association.
Each student was counted in one and only one cell.
(d) We will be unable to compute a correlation for this data because the
variables are not both interval or ratio in scale.
(e) We compute row or column percentages by dividing the cell count by
the table total (214).

Solution: e
Past performance 1993 Feb - 67% (d-16%)
Past performance 1996 Oct - 92%

11. A Type I error would be committed if:

(a) We conclude that the sex of the child and the emotion associated
with red are independent when in fact they are not independent.
(b) We conclude that the sex of the child and the emotion associated
with red are not independent when in fact they are not independent.
(c) We conclude that the proportion of emotions associated with red
differs between males and female when in fact they are the same.
(d) We conclude that the proportion of emotions associated with red is
the same for male and female when in fact they are the same.
(e) We fail to find any association between the color red and emotions
for either sex.

Solution: c
Past performance 1991 Apr - 76% (E-20%)
Past performance 1991 Dec - 84%
Past performance 1997 Aug - 76%

12. The null hypothesis is:


(a) emotional association with red is independent of gender

(b) gender is dependent upon the emotional association with red
(c) the probability of selecting an emotion with red is related to gender
(d) the number of children in each cell does not depend upon gender nor
upon emotion
(e) the color red is independent of the emotion associated with it and
with gender.
Solution: a
Past performance 1997 Aug - 74%

13. The test statistic and approximate p-value is:


(a) 4.661 .1983
(b) 4.661 .3966
(c) 4.629 .2011
(d) 4.629 .4022
(e) 4.629 .1006
Solution: c
Past performance 1997 Aug - 76%
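
The p-values hidden in the output can be recovered without tables, because the chi-square upper-tail probability has a closed form when the degrees of freedom are a whole number. The function below is a minimal sketch of that closed form using only the standard library; its name is an illustrative choice, not something taken from the original output.

```python
import math

def chi_square_upper_tail(x, df):
    """P(X > x) for a chi-square variable with a positive integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        return math.exp(-half) * sum(
            half ** k / math.factorial(k) for k in range(df // 2)
        )
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df - 1) // 2 + 1)
    )
    return tail

df = (2 - 1) * (4 - 1)   # (rows - 1) * (columns - 1) for the 2 x 4 table
p_pearson = chi_square_upper_tail(4.629, df)
p_likelihood_ratio = chi_square_upper_tail(4.661, df)
print(df, round(p_pearson, 4), round(p_likelihood_ratio, 4))
```

With 3 degrees of freedom, the Pearson statistic 4.629 pairs with a p-value near .2011 and the likelihood ratio statistic 4.661 with one near .1983. Both fall between .100 and .900.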

14. Each person in a random sample of 50 was asked to state his/her sex and
preferred colour. The resulting frequencies are shown below.

              Colour
Sex       Red   Blue   Green
Male        5     14       6
Female     15      6       4

A chi-square test is used to test the null hypothesis that sex and preferred
colour are independent. Which of the following statements is a correct
decision about the null hypothesis?

(a) Reject at the 0.005 level.
(b) Reject at the 0.01 level but not at the 0.005 level.
(c) Reject at the 0.025 level but not at the 0.01 level.
(d) Reject at the 0.05 level but not at the 0.025 level.
(e) Accept at the 0.05 level.

Solution: not available
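
Although no solution is recorded for Question 14, the decision can be checked directly from the table. The sketch below computes the Pearson statistic and, because a 2 x 3 table has 2 degrees of freedom, uses the closed form exp(-x/2) for the upper-tail probability. It is one way to verify the answer, not the author's worked solution.

```python
import math

# Observed counts from Question 14 (rows: male, female; columns: red, blue, green).
observed = [
    [5, 14, 6],
    [15, 6, 4],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(observed))
    for j in range(len(col_totals))
)
df = (len(observed) - 1) * (len(col_totals) - 1)

# With df = 2 the upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-chi_square / 2)

for alpha in (0.05, 0.025, 0.01, 0.005):
    print(alpha, "reject" if p_value < alpha else "do not reject")
```

Under these assumptions the statistic is 8.6 on 2 degrees of freedom, which points to rejecting at the 0.025 level but not at the 0.01 level.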

15. The following data were obtained from a company which manufactures
special plastic containers which are to hold a specified volume of hazardous
material. On each of the three 8 hour shifts workers are able to make 500
of the containers. Some containers do not meet specifications as required
by the company’s customer because they are too small, others because
they are too large.

             Conformance to Specification
Shift       Too Small   Within Spec.   Too Large
8am                36            452          12
4pm                24            443          33
midnight           12            438          50

If conformance to specifications is independent of shift, the expected number
of containers that meet specification on the 4pm shift is

(a) 166.7
(b) 443
(c) 33
(d) 444.3
(e) 500

Solution: not available
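
The expected count asked for in Question 15 can be checked with the same row-total-times-column-total rule. Because every shift produces the same number of containers, the expected count is simply the average of the "Within Spec." column. The sketch below is a quick check under that reading of the table.

```python
# Observed counts from Question 15 (columns: too small, within spec., too large).
observed = {
    "8am": [36, 452, 12],
    "4pm": [24, 443, 33],
    "midnight": [12, 438, 50],
}

row_total_4pm = sum(observed["4pm"])                          # containers made on the 4pm shift
within_spec_total = sum(row[1] for row in observed.values())  # column total
grand_total = sum(sum(row) for row in observed.values())

expected_4pm_within = row_total_4pm * within_spec_total / grand_total
print(round(expected_4pm_within, 1))
```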

16. Are all employees equally prone to having accidents? To investigate this
hypothesis, Parry (1985) looked at a light manufacturing plant and clas-
sified the accidents by type and by age of the employee.

                   Accident Type
Age           Sprain   Burn   Cut
Under 25    |      9     17     5
25 or over  |     61     13    12

A chi-square test gave a test-statistic of 20.78. If we test at α = .05:

(a) There appears to be no association between accident type and age.
(b) Age seems to be independent of accident type.
(c) Accident type does not seem to be independent of age.
(d) There appears to be a 20.78% correlation between accident type and
age.
(e) The proportion of sprain, cuts and burns seems to be similar for both
age classes.

Solution: c
Past performance 1989 Apr - 64%

17. A random sample of 100 members of a union are asked to respond to
two questions: Question 1. Are you happy with your financial situation?
Question 2. Do you approve of the Federal government’s economic policies?
The responses are:

                     Question 1
                     Yes    No  | Total
Question 2   Yes      22    48  |    70
             No       12    18  |    30
             Total    34    66  |   100

To test the null hypothesis that response to Question 1 is independent of
response to Question 2 at the 5% level, the expected frequency for the cell
(Yes,Yes) and the critical value of the associated test statistic are:

(a) 23.8 and 1.96 respectively
(b) 10.2 and 3.84 respectively
(c) 23.8 and 3.84 respectively
(d) 23.8 and 7.81 respectively
(e) 10.2 and 7.81 respectively

Solution: c
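
Both numbers in Question 17 can be reproduced in a few lines. The sketch below also shows why 1.96 appears among the options: a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so the 5% chi-square critical value is the square of the two-sided 5% normal critical value. The variable names are illustrative.

```python
from statistics import NormalDist

# Margins from the Question 17 table.
question2_yes_total = 70
question1_yes_total = 34
n = 100

expected_yes_yes = question2_yes_total * question1_yes_total / n

# A 2 x 2 table has (2 - 1) * (2 - 1) = 1 degree of freedom.
z = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-sided 5% normal critical value
critical_value = z ** 2                  # 5% chi-square critical value with 1 df

print(round(expected_yes_yes, 1), round(z, 2), round(critical_value, 2))
```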

18. A survey was conducted to investigate whether alcohol consumption and
smoking are related. The following information was compiled for 600
individuals:

               Smoker   Non-smoker
Drinker           193          165
Non-drinker        89          153

Which of the following statements is true?

(a) The appropriate alternative hypothesis is A: Smoking and Alcohol
Consumption are independent.
(b) The appropriate null hypothesis is H: Smoking and Alcohol Con-
sumption are not independent.
(c) The calculated value of the test statistic is 3.84.
(d) The calculated value of the test statistic is 7.86.

(e) At level .01 we conclude that smoking and alcohol consumption are
related.

Solution: e
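
Options (c) and (d) of Question 18 can be ruled out by computing the statistic. For a 2 x 2 table the Pearson statistic has a well-known shortcut form, and with 1 degree of freedom its upper-tail probability is erfc(sqrt(x/2)). The sketch below uses both; the single-letter cell names are an illustrative convention.

```python
import math

# Counts from Question 18: rows drinker / non-drinker, columns smoker / non-smoker.
a, b = 193, 165
c, d = 89, 153
n = a + b + c + d

# Shortcut form of the Pearson statistic for a 2 x 2 table.
chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With 1 df, P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(n, round(chi_square, 2), p_value < 0.01)
```

The statistic comes out near 17, well above both 3.84 and 7.86, and the p-value is far below .01.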

19. Doctors’ practices have been categorized as to being Urban, Rural, or
Intermediate. The number of doctors who prescribed tetracycline to at
least one patient under the age of 8 were recorded for each of these practice
areas. The results are:

                  Urban   Intermediate   Rural
Tetracycline         95             74      31
No tetracycline     126             84      30

If the county type of practice and the use of tetracycline are independent,
then the expected number of rural doctors who prescribe tetracycline is:

(a) 31.0
(b) 27.7
(c) 1.37
(d) 51%
(e) 62

Solution: b

20. For the problem outlined above, the critical value (table value) of the test
statistic when the level of significance is α = 0.05 is:

(a) 0.1026
(b) 7.3778
(c) 5.9915
(d) 12.5916
(e) 7.8147

Solution: c
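
Questions 19 and 20 can be checked together. The 2 x 3 table has (2 - 1)(3 - 1) = 2 degrees of freedom, and for 2 degrees of freedom the upper-tail probability is exp(-x/2), so a critical value is -2 ln(alpha). The sketch below uses that closed form; the helper name `critical_value_df2` is an illustrative choice.

```python
import math

# Observed counts from Question 19 (columns: urban, intermediate, rural).
tetracycline = [95, 74, 31]
no_tetracycline = [126, 84, 30]

n = sum(tetracycline) + sum(no_tetracycline)
rural_total = tetracycline[2] + no_tetracycline[2]
expected_rural_tetracycline = sum(tetracycline) * rural_total / n

df = (2 - 1) * (3 - 1)

def critical_value_df2(alpha):
    """Upper-alpha critical value of a chi-square variable with 2 df."""
    return -2 * math.log(alpha)

print(round(expected_rural_tetracycline, 1))
print(round(critical_value_df2(0.05), 4))    # upper 5% point
print(round(critical_value_df2(0.025), 4))   # upper 2.5% point
print(round(critical_value_df2(0.95), 4))    # lower 5% point
```

The last two printed values match two of the distractors in Question 20: they are the upper 2.5% and lower 5% points of the same distribution.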

The next set of questions refer to the following situation:


A study was conducted to determine if the fatality rate depends on the
size of the automobile. The analysis of accidents is as follows (with some
values hidden):

DEATH SIZE
FREQUENCY| m | s | L | TOTAL
---------+--------+--------+--------+
no | 63 | 128 | 46 | 237
---------+--------+--------+--------+
yes | 26 | 95 | 16 | 137
---------+--------+--------+--------+
TOTAL 89 223 62 374

STATISTICS FOR TABLE OF DEATH BY SIZE

STATISTIC                     DF    VALUE    PROB
------------------------------------------------------
CHI-SQUARE                     *    8.663    *****
LIKELIHOOD RATIO CHI-SQUARE    *    8.838    *****

21. Under a suitable null hypothesis, the expected frequency for the cell cor-
responding to fatal type of accident and small size automobile is:

(a) 81.68
(b) 67.00
(c) 61.43
(d) 63.41
(e) 59.72

Solution: a
Past performance 1990 Apr - 92%

22. Which of the following is NOT CORRECT?

(a) The accidents were cross-classified by size of automobile and fatality
status. Each accident was counted in one and only one cell.
(b) The null hypothesis is that the fatality status is independent of the
size of the automobile.
(c) The null hypothesis is that the proportion of fatality status is the
same for all three sizes of automobiles.
(d) All expected cell counts should be greater than five in order that
the distribution of the test statistic is an approximate chi-square
distribution.
(e) If we reject the null hypothesis then we have proven that the size of
the automobile affects the chances of a fatality.

Solution: e
Past performance 1990 Apr - 39% (B-12%, C-36%)
Past performance 1990 Dec - 20% ( 15% - c, 56% - d)

23. The null hypothesis will be rejected at α=0.05 if the test statistic exceeds:

(a) 12.59
(b) 7.81
(c) 5.99
(d) 3.84
(e) 9.49

Solution: c
Past performance 1990 Apr - 79%

24. The approximate p-value is:

(a) less than .005
(b) between .005 and .010
(c) between .010 and .025
(d) between .025 and .050
(e) between .050 and .100

Solution: c
Past performance 1990 Dec - 78%
Past performance 1993 Apr - 80%
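
The output for the DEATH by SIZE table reports two statistics. The sketch below recomputes both from the observed counts: the Pearson statistic sums (O - E)^2 / E and the likelihood ratio statistic sums 2 O ln(O / E), where O is an observed count and E the matching expected count. It assumes the standard definitions of both statistics, since the original output does not show its formulas.

```python
import math

# Observed counts from the DEATH by SIZE table (columns: m, s, L).
observed = [
    [63, 128, 46],   # no
    [26, 95, 16],    # yes
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

pearson = 0.0
likelihood_ratio = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        pearson += (count - expected) ** 2 / expected
        likelihood_ratio += 2 * count * math.log(count / expected)

# df = (2 - 1) * (3 - 1) = 2, so P(X > x) = exp(-x / 2).
p_value = math.exp(-pearson / 2)

print(round(pearson, 3), round(likelihood_ratio, 3), round(p_value, 4))
```

The recomputed statistics should match the 8.663 and 8.838 shown in the output, and the p-value lands between .010 and .025.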

25. A controversial issue in sports is the use of the “instant replay” for making
decisions on plays that are extremely close or hard to call by an official.
A survey of players in each of four professional sports was conducted,
asking them if they felt “instant replays” should be used to decide close or
controversial calls. The results are as follows:

              Use of Instant Replay
              Favor   Oppose
Football         22        2
Baseball         18        6
Basketball       15       26
Soccer            3       10

In testing to see whether opinion with respect to the use of instant replays
is independent of sport, a table of expected frequencies is found. In this
table, the expected number of professional baseball players opposing the
use of instant replays is equal to:

(a) 10.4
(b) 24.1

(c) 11.0
(d) 6.0
(e) 8.4

Solution: not available
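
No solution is recorded for Question 25, but the full table of expected counts is easy to build. The sketch below computes it and also reports the smallest expected count, which bears on the rule of thumb quoted in earlier questions that expected counts should exceed five. It is a check under the usual independence formula, not the author's worked answer.

```python
# Observed counts from Question 25 (columns: favor, oppose).
observed = {
    "Football": [22, 2],
    "Baseball": [18, 6],
    "Basketball": [15, 26],
    "Soccer": [3, 10],
}

col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(col_totals)

# Expected count for each cell: row total * column total / grand total.
expected = {
    sport: [sum(counts) * c / n for c in col_totals]
    for sport, counts in observed.items()
}

baseball_oppose = expected["Baseball"][1]
smallest_expected = min(min(row) for row in expected.values())

print(round(baseball_oppose, 1), round(smallest_expected, 2))
```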

26. Each person in a random sample of males and females was asked to state
his/her sex and preferred colour. The resulting frequencies are shown
below.

              Colour
Sex       Red   Blue   Green
Male        3     11       6
Female     17     11       2

Which of the following is FALSE?


(a) 55% of males prefer the colour blue.
(b) Of those who prefer the colour green, 75% are males.
(c) 44% of people surveyed prefer the colour blue.
(d) A higher percentage of males preferred the colour blue than females.
(e) 15% of people are males who prefer the colour red.
Solution: e
Past performance 2006 Oct - 76% (16%=d)
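
Each statement in Question 26 uses a different denominator: a row total, a column total, or the table total. The sketch below computes all five percentages so that the choice of denominator is explicit. The variable names are illustrative.

```python
# Observed counts from Question 26 (columns: red, blue, green).
male = [3, 11, 6]
female = [17, 11, 2]
n = sum(male) + sum(female)

pct_males_blue = 100 * male[1] / sum(male)               # row percent
pct_females_blue = 100 * female[1] / sum(female)         # row percent
pct_green_male = 100 * male[2] / (male[2] + female[2])   # column percent
pct_blue_overall = 100 * (male[1] + female[1]) / n       # marginal percent
pct_male_and_red = 100 * male[0] / n                     # percent of table total

print(pct_males_blue, round(pct_females_blue, 1), pct_green_male,
      pct_blue_overall, pct_male_and_red)
```

Only the last figure disagrees with the statement made about it: males who prefer red are 6% of those surveyed, not 15%.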

Multiple Choice Questions
Experimental and Survey Design

1. There is a positive association between the number of drownings and ice
cream sales. This is an example of an association likely caused by:
(a) coincidence
(b) cause and effect relationship
(c) confounding factor
(d) common cause
(e) none of the above
Solution: d
Past performance 1991 Oct - 31% (30% a, 25% c)
Past performance 1992 Oct - 55% (17% a; 17% b)
Past performance 2006 Oct - 70% (10% a; 15% c)

2. A new headache remedy was given to a group of 25 subjects who had
headaches. Four hours after taking the new remedy, 20 of the subjects
reported that their headaches had disappeared. From this information
you conclude:

(a) that the remedy is effective for the treatment of headaches.
(b) nothing, because the sample size is too small.
(c) nothing, because there is no control group for comparison.
(d) that the new treatment is better than aspirin.
(e) that the remedy is not effective for the treatment of headaches.

Solution: c
Past performance 1997 Jun - 99%
Past performance 1997 Aug - 99%

3. A nutritionist wants to study the effect of storage time (6, 12, and 18
months) on the amount of vitamin C present in freeze dried fruit when
stored for these lengths of time. Vitamin C is measured in milligrams per
100 milligrams of fruit. Six fruit packs were randomly assigned to each of
the three storage times. The treatment, experimental unit, and response
are respectively:

(a) a specific storage time, amount of vitamin C, a fruit pack
(b) a fruit pack, amount of vitamin C, a specific storage time
(c) random assignment, a fruit pack, amount of vitamin C
(d) a specific storage time, a fruit pack, amount of vitamin C
(e) a specific storage time, the nutritionist, amount of vitamin C

Solution: d
Past performance 1992 Dec - 92%
Past performance 1996 Dec - 97%

4. We wish to investigate if a new medicine is effective in reducing the length
and severity of the flu. We take the next 20 patients that come to the
walk-in clinic complaining of flu and, after a medical exam to verify that
the patients do have the flu, we give them the new medicine and tell them
about the new drug we are giving them. One week later, the patients are
contacted and 15 patients state the new remedy was helpful in reducing
the severity and length of the illness. Which of the following is NOT
CORRECT?
(a) This is a poor experiment because there is no control group. We do
not know how many would feel better in a week without treatment.
(b) This is a poor experiment because it is not double-blinded. The
patients may feel relief because they thought the drug should work.
(c) This is a poor experiment because a convenience sample was selected.
Patients who come to a walk-in clinic may have more severe flu
than people who do not.
(d) This is a poor experiment because we didn’t give the remedy to people
without the flu to assess its effect in a control group.
(e) This is a poor experiment because the sample size is likely to be too
small to detect anything but a gross improvement in measuring the
proportion of people reporting an improvement.
Solution: d
Past performance 1991 Feb - 63% (c-14%, e-13%)
Past performance 1991 Dec - 69% (e-20%)
Past performance 1993 Feb - 56%
Past performance 1996 Oct - 64% (28%-e)

Past performance 1996 Dec - 68% (28%-e)
Past performance 1998 Dec - 80% (15%-e)

5. A survey is to be undertaken of recent nursing graduates in order to compare
the starting salaries of women and men. For each graduate, three
variables are to be recorded (among others): sex, starting salary, and
area of specialization.

(a) Sex and starting salary are explanatory variables; area of specializa-
tion is a response variable.
(b) Sex is an explanatory variable; starting salary and area of specializa-
tion are response variables.
(c) Sex is an explanatory variable; starting salary is a response variable;
area of specialization is a possible confounding variable.
(d) Sex is a response variable; starting salary is an explanatory variable;
area of specialization is a possible confounding variable.
(e) Sex and area of specialization are response variables; starting salary
is an explanatory variable.

Solution: c
Past performance 1991 Dec - 74% (b-10%)
Past performance 1993 Apr - 99%

6. Which of the following is CORRECT?

(a) We do not need to randomize if our sample size is sufficiently large.
(b) A large sample size always ensures that our sample is representative
of the population.
(c) If all other things are equal, we need a larger sample size for a larger
population.
(d) In a properly chosen sample, an estimate will be less variable with a
large sample size and hence more precise.
(e) In random samples, the randomization ensures that we get precise
and accurate estimates.

Solution: d
Past performance 1992 Dec - 63% (30%e)
Past performance 1996 Dec - 89%
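
The link between sample size and precision in Question 6 can be illustrated with the standard error of a sample proportion, sqrt(p(1 - p)/n). The sketch below assumes, purely for illustration, a true proportion of 0.5 and simple random sampling from a population large enough that no finite population correction is needed. Neither assumption comes from the question.

```python
import math

# Assumed for illustration only: a true proportion of 0.5 and simple random sampling.
p = 0.5

def standard_error(n):
    """Standard error of a sample proportion from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

for n in (25, 100, 400):
    print(n, round(standard_error(n), 3))
```

Quadrupling the sample size halves the standard error, and the population size does not appear in the formula at all.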

7. An experimenter wishes to test whether or not two types of fish food (a
standard fish food and a new product) work equally well at producing fish
of equal weight after a 2-month feeding program. The experimenter has 2
identical fish tanks (1 & 2) to put fish in and is considering how to assign
the 40 tagged fish to the tanks. To properly assign the fish, one step would
be to:

(a) put all the odd tagged numbered fish in one tank, the even in the
other, and give the standard food type to the odd numbered ones
(b) obtain pairs of fish whose weights are virtually equal at the start of
the experiment and randomly assign one to the group tank 1, the
other to tank 2 with the feed assigned at random to the tanks.
(c) to proceed as in (b), but put the heavier of the pair into tank
2.
(d) assign the fish at random to the two tanks and give the standard feed
to tank 1.
(e) not to proceed as in (b) because using the initial weight in (b) is a
non-random process. Use the initial length of the fish instead.

Solution: d
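
Assigning fish to tanks at random, as several of the options above describe, can be done by shuffling the tag numbers and splitting the shuffled list in half. The sketch below is one way to do that; the seed and the even 20/20 split are assumptions made for illustration, since the question does not specify either.

```python
import random

rng = random.Random(2006)          # arbitrary seed so the sketch is repeatable

fish_tags = list(range(1, 41))     # the 40 tagged fish
rng.shuffle(fish_tags)

tank_1 = sorted(fish_tags[:20])    # first 20 tags in the shuffled order
tank_2 = sorted(fish_tags[20:])    # remaining 20 tags

print(tank_1)
print(tank_2)
```

Unlike the odd/even split in option (a), every fish here has the same chance of ending up in either tank.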

8. A researcher wishes to compare the effects of 2 fertilizers on the yield of
a soybean crop. She has 20 plots of land available and she decides to use
a paired experiment – using 10 pairs of plots. Thus, she will:

(a) use a table of random numbers to divide the 20 plots into 10 pairs
and then, for each pair, flip a coin to assign the fertilizers to the 2
plots.
(b) subjectively divide the 20 plots into 10 pairs (making the plots within
a block as similar as possible) and then, for each pair, flip a coin to
assign the fertilizers to the 2 plots.
(c) use a table of random numbers to divide the 20 plots into 10 pairs
and then use the table of random numbers a second time to decide
upon the fertilizer to be applied to each pair.
(d) flip a coin to divide the 20 plots into 10 pairs and then, for each pair,
use a table of random numbers to assign the fertilizers to the 2 plots.
(e) use a table of random numbers to assign the 2 fertilizers to the 20
plots and then use the table of random numbers a second time to
place the plots into 10 pairs.

Solution: b
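
The paired design in Question 8 has two distinct steps: forming pairs of similar plots, which is a matter of judgment, and randomizing the fertilizers within each pair. The sketch below covers only the second step. The plot names, the pairing, and the labels "A" and "B" are hypothetical placeholders.

```python
import random

rng = random.Random(8)   # arbitrary seed so the sketch is repeatable

# Hypothetical pairing: plots judged most alike are listed together.
pairs = [(f"plot{2 * i + 1}", f"plot{2 * i + 2}") for i in range(10)]

assignment = {}
for first, second in pairs:
    # The coin flip decides which plot of the pair receives fertilizer A.
    if rng.random() < 0.5:
        assignment[first], assignment[second] = "A", "B"
    else:
        assignment[first], assignment[second] = "B", "A"

for first, second in pairs:
    print(first, assignment[first], second, assignment[second])
```

Every pair receives both fertilizers, so differences between pairs of plots cannot be mistaken for a fertilizer effect.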

9. A student wishes to examine the effect of wing width and wing length on
the length of flight of a paper airplane. There are 4 different models of
airplanes. Which of the following is NOT correct?

(a) A factor (such as wing width) is an experimental variable under con-
trol of the experimenter.
(b) The order of flights was randomized to remove the influence of any
other variables upon the flight distance of each flight.
(c) It would be better to make four copies of each model of plane to give
some feel for the plane-to-plane variations. Flying a single copy four
times gives information about the internal variation.
(d) Interaction between two factors means that the effect of a factor at
one level depends on the level of the second factor.
(e) Planned experiments (where randomization can take place) provide some of
the strongest evidence when trying to establish a causal relationship.
Solution: b (randomization does not remove influences; it makes them
equal in all groups)
Past performance 1996 Nov - 8% (41%-c; 18%-d; 30%-e)

10. An experiment was conducted where you flew paper airplanes after mod-
ifying wing depth and wing length. There were four different models of
airplane. One design consideration was the choice between
flying each plane four times or making four copies of each model, each of
which is flown once. Which of the following is NOT correct?
(a) Flying multiple copies of each model (i.e. separate planes of each
model) could give information on variability in flight due to fabrica-
tion effects (i.e. how you made the plane).
(b) Flying a single copy of each model four times could give information
on variability in flight due to changes in initial launch conditions.
(c) The differences in flight length among the different models gives in-
formation on the “effects” of the design factors - wing depth and wing
length.
(d) The response variable is flight length; the explanatory variables are
wing depth and wing width.
(e) Interaction between the effects of wing depth and wing width implies
that the effects of wing depth are the same for all wing widths.
Solution: e
Past performance 1997 Jul - 83%

11. An experiment was designed to investigate the effect of
the amount of water and seed variety upon subsequent growth of plants.
Each plant was potted in a clay pot, and a measured amount of water
was given weekly. The height of the plant at the end of the experiment
was measured. Which of the following is not correct?

(a) The response variable is the plant height.
(b) The explanatory variables are the amount of water and seed variety.
(c) Randomization was used to eliminate the effect of other possible fac-
tors upon the growth of the plants.
(d) A possible uncontrollable factor in this experiment is any nutrients
that might be present in the clay pots.
(e) Designed experiments give the best evidence of “cause-and-effect” re-
lationships.
Solution: c - randomization does not remove influences - makes them
equal in all groups
Past performance 1997 Jun - 54% (11%-b; 19%-d; 15%-e)

12. A survey was conducted by visiting a student parking lot to estimate the
proportion of cars that were red. Which of the following is NOT correct?
(a) If the sampled stall was empty, we can simply choose another stall, at
random, to take its place because it is not likely that the stall being
vacant is related to a car being red.
(b) The sample would be representative of the population if 100 cars were
chosen, regardless of whether randomization was used.
(c) Even though a random sample was taken from cars in the parking
lot, the sample may not be representative of the cars driven by SFU
students because the decision to park in B-lot is self-selected.
(d) If another sample of cars was chosen, it is likely that a different
proportion of cars that are red would be obtained.
(e) The confidence interval computed gave a 95% confidence interval for
the true proportion of cars that were red in the population of cars
that park in B-lot (assuming that the sample was selected using the
3 R’s).
Solution: b
Past performance 1997 Jun - 91%

13. A survey was done to estimate the proportion of cars that are red and are
Japanese made in the City of Vancouver by taking a random sample of
size 25 from a student parking lot at Simon Fraser University. Which of
the following is NOT CORRECT:
(a) This sample may not be representative of the cars in Vancouver be-
cause mainly students park at SFU.
(b) If the particular stall is vacant, we can simply select another stall at
random because it is unlikely that a stall being vacant is related to the
color or manufacturer of the car.

(c) It would be dangerous to simply select the first 25 stalls in the lot
closest to the Applied Science Building because there are a number
of stalls reserved for service vehicles whose primary color is white.
(d) Different students obtained different answers for their sample propor-
tions. This is an example of a sampling distribution for an estimator.
(e) The margin of error will depend upon the total number of cars in the
lot when we did the sample.
Solution: e
Past performance 1998 Nov - 76%

14. Discriminant analysis is a statistical technique that attempts to find a
set of variables to allow you to distinguish among groups, e.g. in one of the
assignments, you tried to distinguish among authors based on sentence
length and other statistics. Which of the following is NOT CORRECT?
(a) We needed to adjust some variables to a “per 100 word basis” or to
a “per sentence basis” to adjust for the different number of words in
the texts where authorship is known.
(b) Potentially useful variables are selected by finding variables whose
distributions are as similar as possible for all the authors.
(c) Another example of this method might be a bank making a decision
on granting a student a loan based on characteristics such as grade
point average, past credit history, etc.
(d) We looked at many pairs of plots to find the pair of variables that
gave the best separation between the two authors.
(e) Because of natural variability, errors can always be made. However,
the goal of this analysis is to minimize the costs of misclassification.
Solution: b
Past performance 1998 Nov - 80%

15. An experiment was conducted where you tried to distinguish among
authors based on sentence length and other statistics. Which of the fol-
lowing is NOT correct?
(a) We needed to adjust some variables to a “per 100 word basis” to
adjust for the different number of words on a page.
(b) This was a simplified form of discriminant analysis where, in general,
one wishes to distinguish among groups of objects based on charac-
teristics observed.
(c) Another example of this method might be a bank making a decision
on granting a student a loan based on characteristics such as grade
point average, past credit history, etc.

(d) The polygon plot is a way of “enclosing” typical values of the statistics
for each author.
(e) Potentially useful variables are selected by finding variables whose
distributions are as similar as possible for all the authors.
Solution: e
Past performance 1997 Jul - 71% (20%-c)

16. An experiment was conducted where you analyzed the results of the plant
growth experiment after you manipulated the amount of water and seed
variety. Which of the following is correct?
(a) We randomized the plants to plots to eliminate any effect of hidden
variables.
(b) We could determine the best combination of water and seed variety
by examining the difference in the plant height in the final week of
the experiment.
(c) The variability in growth among plants of the same variety that
received the same amount of water was constant over time.
(d) The growth of a particular plant in week 3 is likely to be independent
(unrelated) of the growth of the same plant in week 2.
(e) The growth of the plants was linear over time.
Solution: b
Past performance 1997 Jul - 39% (30%-a; 11%-c; 11%-d; 7%-e)

17. The following numbers are extracted from a table of random digits:

38683 50279 38224 09844 13578 28251 12708 24684

A scientist will be measuring the total amount of woody debris in a random
sample of sites selected without replacement from a population of 45 sites.
The sites are labeled 01, 02, ..., 45 and she starts at the beginning of the
line of random digits and takes consecutive pairs of digits. Which of the
following is correct?
(a) Her sample is 38, 25, 02, 38, 22
(b) Her sample is 38, 68, 35, 02, 22
(c) Her sample is 38, 35, 27, 28, 08
(d) Her sample is 38, 65, 35, 02, 79
(e) Her sample is 38, 35, 02, 22, 40
Solution: e

18. We wish to draw a sample of size 5 without replacement from a population
of 50 households. Suppose the households are numbered 01, 02, . . . , 50, and
suppose that the relevant line of the random number table is:

Digits 11362 35692 96237 90842 46843 62719 64049 17823.

Then the households selected are:


(a) households 11 13 36 62 73
(b) households 11 36 23 08 42
(c) households 11 36 23 23 08
(d) households 11 36 23 56 92
(e) households 11 35 96 90 46
Solution: b
Past performance 1998 Dec - 50% (19% c; 27% d)
Note that (c) is WITH replacement; (d) uses pairs corresponding to house
numbers not in the range 1..50
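
The selection rule used in questions 17 and 18 (read consecutive pairs of digits, skip labels outside the population, skip repeats) can be sketched in a few lines. The function name `sample_from_digits` is illustrative and not part of the original question bank; a sample size of 5 is assumed for question 17 because every answer option lists five sites.

```python
def sample_from_digits(digit_line, population_size, sample_size):
    """Pick labels 01..population_size from consecutive digit pairs,
    sampling without replacement."""
    digits = "".join(ch for ch in digit_line if ch.isdigit())
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        # Skip labels outside the population (including 00) and repeats.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen


q17 = sample_from_digits("38683 50279 38224 09844 13578 28251 12708 24684", 45, 5)
q18 = sample_from_digits("11362 35692 96237 90842 46843 62719 64049 17823", 50, 5)
print(q17)  # [38, 35, 2, 22, 40]
print(q18)  # [11, 36, 23, 8, 42]
```

Dropping the `label not in chosen` check would reproduce the with-replacement answer (c) of question 18.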

19. An experiment to measure the effect of giving growth hormones to girls
affected by Turner’s Syndrome was carried out recently in Vancouver. All
34 girls in the study were given the growth hormone and their heights
were measured at the time the hormone was given and again one year
later. No measurements were made on their final adult heights. Which of
the following is NOT a problem with this experiment:
(a) there was no blinding
(b) there was no control group
(c) nonresponse bias
(d) there was insufficient attention to the placebo effect
(e) Because final heights were not measured, it would be impossible to
tell if the hormone affected final height or only accelerated growth
and made no difference to final height.
Solution: c
Past performance 1998 Oct - 71%

20. Which of the following statements is FALSE?

(a) Nonresponse can cause bias in surveys because non-respondents often
tend to behave differently from people who respond.
(b) Non-sampling errors are often bigger than the random sampling er-
rors in surveys.

(c) Slight changes in the wording of questions can make a measurable
difference to survey results.
(d) People will sometimes answer a question differently for different in-
terviewers.
(e) Sophisticated statistical methods can always correct the results if the
population you are sampling from is different from the population of
interest, e.g. due to under-coverage.
Solution: e
Past performance 1998 Oct - 87%

21. A properly conducted random survey selected 1000 Canadians (from a
total population of about 30 million) and 1000 Americans (from a total
population of about 300 million). Which of the following is FALSE?
(a) Randomization ensures that both samples are representative of their
respective populations.
(b) The precision is determined by the ratio of the sample size to the
total population size.
(c) A smaller proportion of the American population has been chosen.
Therefore, a particular person has a smaller chance of being selected
in America than in Canada.
(d) A potential stratification variable for both countries could be location
- eastern, middle, or western continental.
(e) Random digit dialing to select people for the survey could induce
biases in the results if the characteristic of interest for the survey is
related to income.
Solution: b - because precision is determined mainly by sample size
Past performance 1998 Oct - 54% (25% c)
Past performance 2006 Nov - 67% (19% c)
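
A quick numerical check of why (b) is false: under simple random sampling without replacement, the standard error of a sample proportion depends on the population size only through the finite population correction, which is essentially 1 for both countries. The sketch below assumes a true proportion of 0.5 purely for illustration; the function name is hypothetical.

```python
import math


def se_proportion(p, n, N):
    """Standard error of a sample proportion for a simple random sample
    of size n drawn without replacement from a population of size N."""
    fpc = (N - n) / (N - 1)  # finite population correction
    return math.sqrt(p * (1 - p) / n * fpc)


se_canada = se_proportion(0.5, 1000, 30_000_000)
se_usa = se_proportion(0.5, 1000, 300_000_000)
print(se_canada, se_usa)
```

Both values are about 0.0158, so the two samples of 1000 are almost equally precise even though the sampling fractions differ by a factor of ten.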

22. An experiment was conducted by the Schwarz family to look at the yield
of popcorn (total grams that popped when 15 g of popcorn were heated)
when two variables were changed: the type of popcorn (gourmet or plain)
and the amount of oil (little or lots). A profile plot of the results is
below:

Which of the following is NOT CORRECT:
(a) Because the lines are not parallel, there appears to be evidence of
interaction between the two variables.
(b) The two explanatory factors are the amount of oil and the type of
popcorn. The response variable is the yield of popcorn.
(c) The difference in yield between gourmet and plain popcorn is esti-
mated to increase by about 6 g when lots of oil were used.
(d) There was little change in the yield for plain popcorn when either
little or lots of oil were used.
(e) An interaction would exist if the increase in yield from going from
little to lots of oil were the same for both types of popcorn.
Solution: e
Past performance 1998 Nov - 63% (16% a; 13% c)

23. A recent survey by a large-circulation Canadian magazine on the
contribution of universities to the economy was circulated to 394 people who
the magazine decided “are the most likely to know how important are uni-
versities to the Canadian economy”. The main problem with using these
results to draw conclusions about the general public’s perception is:
(a) selection bias
(b) insufficient attention to the placebo effect
(c) no control group
(d) non-response bias
(e) interviewer bias

Solution: a
Past performance 1998 Dec - 90%

24. In Assignment 2, you investigated the effect of different paper weights on
the distance origami frogs jumped. Which of the following is FALSE?
(a) This experiment had pseudo-replication because each frog was tested
multiple times.
(b) A better experiment would require us to make multiple copies of each
frog from each paper weight.
(c) Because the stiffer paper is harder to fold, a better experiment would
use a larger sheet of the stiffer paper while making a frog.
(d) A proper experiment could use 10 replicate frogs of the lighter weight
paper and only 5 replicate frogs of the stiffer paper in a completely
random order.
(e) It would be a poor experiment if two people made the frogs jump
with person A using the light weight frogs and person B using the
heavier weight frogs.
Solution: c - note that (d) is not false: there is actually nothing wrong
with an unbalanced design as long as proper randomization is used. In more
advanced classes you will see that the design with the best power and the
smallest se for the estimated difference has equal sample sizes, but this
does not invalidate the experiment.
Past performance 2006 Oct - 47% (45%-d)
Past performance 2006 Dec - 67% (25%-d)

25. In class, we performed a randomized response survey to estimate the
proportion of the class who used marijuana in the last year. Each student
obtained a random digit between 0 and 9 (inclusive). Of those who received
the digits 0, 1, 2, 3, or 4, these students answered the question on mari-
juana usage. Of those who received the digits 5, 6, 7, 8, 9, these students
answered the question if their favorite person’s birthday was in January
to June (inclusive). We obtained a total of 150 yes and 250 no responses.
Which of the following is FALSE?
(a) We estimate that about 25% of students have used marijuana in the
last year.
(b) About 50% of people have birthdays in January-June (inclusive)
(c) Of the 150 yeses, about 66%=100/150 of these had favorite people
with birthdays in January-June (inclusive).
(d) Of people with birthdays in January-June, we estimate that about
25% used marijuana in the last year.

(e) About 37%=150/400 said yes to having used marijuana in the last
year.
Solution: e
Past performance 2006 Oct - 71% (12%-d)
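
The arithmetic behind option (a) can be sketched as follows. It assumes, as the question does, that half the students answer the sensitive question (digits 0 to 4) and that about half of all birthdays fall in January to June; the function and parameter names are illustrative only.

```python
def randomized_response_estimate(yes, total, p_sensitive, p_yes_innocuous):
    """Solve P(yes) = p_sensitive * pi + (1 - p_sensitive) * p_yes_innocuous
    for pi, the proportion answering yes to the sensitive question."""
    p_yes = yes / total
    return (p_yes - (1 - p_sensitive) * p_yes_innocuous) / p_sensitive


pi_hat = randomized_response_estimate(
    yes=150, total=400, p_sensitive=0.5, p_yes_innocuous=0.5
)
print(pi_hat)  # 0.25
```

The raw yes rate, 150/400, mixes the two questions, which is why option (e) is the false statement.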

26. Recall in one assignment, you conducted a two factor experiment to com-
pare the flying distances of paper airplanes. One factor was wing length
with two levels; the second factor was wing depth, also with two levels.
Which of the following is CORRECT?
(a) A good experiment would fly all four copies of the different airplanes
in sequential order.
(b) A good experiment would control for the person launching the planes
by having the same person do all the launches.
(c) A good experiment would make a single copy of each treatment com-
bination and test each copy 10 times.
(d) A good experiment would examine the effect of paper weight on flying
by making all planes of the same weight of paper.
(e) A good experiment would order the planes by weight while running
the experiment.
Solution: b
Past performance 2006 Nov - 70%; 12% choose (c); 14% choose (d)

27. Recall in one assignment you surveyed cars in a parking lot to estimate
the proportion that were red or the proportion that were from a Japanese
manufacturer. Which of the following is NOT CORRECT?
(a) A convenience sample of the cars closest to the Applied Science build-
ing may give a biased estimate of the proportion of cars which are
from a Japanese manufacturer.
(b) Different students may get different answers for the proportion of
cars that are red.
(c) The sample proportion of cars that are red is an unbiased estimate of
the population proportion if the sampling is a simple random sample.
(d) A sample of 100 cars in a convenience sample is always better than
a sample of 20 cars from a proper random sample.
(e) A sample of 100 cars from a proper random sample will give more
precise estimates of the proportion of cars that are red than a sample
of 20 cars from a proper random sample.
Solution: d
Past performance 2006 Nov - 92%

28. Consider an experiment to investigate the efficacy of different insecticides
in controlling pests and their effects on subsequent yield. What is the best
reason for randomly assigning treatment levels (spraying or not spraying)
to the experimental units (farms)?
(a) Randomization makes the experiment easier to conduct because we
can apply the insecticide in any pattern rather than in a systematic
fashion.
(b) Randomization makes the analysis easier because the data can be
collected and entered into the computer in any order.
(c) Randomization is required by statistical consultants before they will
help you analyze the experiment.
(d) Randomization implies that it is not necessary to be careful during
the experiment, during data collection, and during data analysis.
(e) Randomization will tend to average out all other uncontrolled fac-
tors such as soil fertility so that they are not confounded with the
treatment effects.
Solution: e
Past performance 1990 Feb - 97%
Past performance 1993 Feb - 98%
Past performance 1996 Dec - 100%
Past performance 2006 Dec - 99%

Multiple Choice Questions
Inference - Paired samples on means

1. Which of the following is INCORRECT about the use of a paired
experiment?
(a) The object of pairing (or blocking) is to account for the effect of
possible other factors (such as fertility of soils).
(b) The analysis of paired data starts by finding the difference between
the values of the pair. The order of the difference (as long as it is
consistent) is unimportant.
(c) It is crucial to recognize pairing. If pairing is not recognized, the
results will not be as accurate and precise as possible.
(d) The degrees of freedom is equal to the number of pairs - 1.
(e) Because pairing is beneficial, we can pair all data by matching the
smallest value of each sample, the second smallest value of each sam-
ple, the third smallest value of each sample, etc.
Solution: e
Past performance 1990 Dec - 65%
Past performance 1992 Dec - 93%

2. Trace metals in drinking water wells affect the flavor of the water and un-
usually high concentrations can pose a health hazard. Furthermore, the
water in a well may vary in the concentration of the trace metals depending
upon from where it is drawn. In the paper, “Trace Metals of South Indian
River Region” (Environmental Studies, 1982, 62-6), trace metal concen-
trations (mg/L) on zinc were found from water drawn from the bottom
and the top of each of 6 wells. The data follows:

Location   Bottom   Top
1          .430     .415
2          .266     .238
3          .567     .390
4          .531     .410
5          .707     .605
6          .716     .609

A 95% confidence interval for the mean difference in the zinc concentrations
in this area between water drawn from the bottom and top of wells
(bottom - top) is:

(a) .0917 ± 2.57(.061)
(b) .0917 ± 2.45(.061)
(c) .0917 ± 2.57(.025)
(d) .0917 ± 2.45(.025)
(e) .0917 ± 2.20(.025)

Solution: c
Past performance 1990 Dec - 64%
Past performance 1992 Dec - 75% (20%a)
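
The paired analysis can be checked with a short sketch using only the Python standard library. The differences are taken as bottom minus top, and the critical value 2.571 is the tabled 0.975 quantile of t with 5 degrees of freedom, entered by hand rather than computed.

```python
import math
import statistics

bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716]
top = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609]

# Paired analysis: work with the within-well differences only.
diffs = [b - t for b, t in zip(bottom, top)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
se_diff = statistics.stdev(diffs) / math.sqrt(n)  # se of the mean difference

t_crit = 2.571  # t table, 0.975 quantile, n - 1 = 5 df
lower = mean_diff - t_crit * se_diff
upper = mean_diff + t_crit * se_diff
print(round(mean_diff, 4), round(se_diff, 3))  # 0.0917 0.025
```

The value .061 in options (a) and (b) is the standard deviation of the differences, not the standard error of their mean.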

Multiple Choice Questions
Inference - Single sample on means

1. What is a statistical inference?

(a) A decision, estimate, prediction, or generalization about the
population based on information contained in a sample.
(b) A statement made about a sample based on the measurements in
that sample.
(c) A set of data selected from a larger set of data.
(d) A decision, estimate, prediction or generalization about a sample based
on information contained in a population.
(e) A set of data that characterizes some phenomenon.

Solution: a

2. Which of the following statements about confidence intervals is
INCORRECT?

(a) If we keep the sample size fixed, the confidence interval gets wider as
we increase the confidence coefficient.
(b) A confidence interval for a mean always contains the sample mean.
(c) If we keep the confidence coefficient fixed, the confidence interval gets
narrower as we increase the sample size.
(d) If the population standard deviation increases, the confidence interval
decreases in width.
(e) If the confidence intervals for two means do not overlap very much,
there is evidence that the two population means are different.

Solution: d
Past performance 1990 Dec - 72%
Past performance 1996 Nov - 76%

3. You have measured the systolic blood pressure of a random sample of 25
employees of a company. A 95% confidence interval for the mean systolic
blood pressure for the employees is computed to be (122,138). Which of
the following statements gives a valid interpretation of this interval?
(a) About 95% of the sample of employees have a systolic blood pressure
between 122 and 138.
(b) About 95% of the employees in the company have a systolic blood
pressure between 122 and 138.
(c) If the sampling procedure were repeated many times, then approx-
imately 95% of the resulting confidence intervals would contain the
mean systolic blood pressure for employees in the company.
(d) If the sampling procedure were repeated many times, then approxi-
mately 95% of the sample means would be between 122 and 138.
(e) The probability that the sample mean falls between 122 and 138 is
equal to 0.95.
Solution: c
Past performance 1997 Aug - 40% (40%-d; 15%-e)
Past performance 1998 Nov - 57% (15%-d; 15%-b)
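
The repeated-sampling interpretation in option (c) can be illustrated by simulation. The sketch below is a simplification: it assumes a normal population with a known standard deviation (so a z-interval is used rather than the t-interval the question would call for), and the true mean of 130 and standard deviation of 20 are invented values, not taken from the question.

```python
import math
import random
from statistics import NormalDist, mean


def coverage(true_mean=130.0, sigma=20.0, n=25, reps=2000, level=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = mean([rng.gauss(true_mean, sigma) for _ in range(n)])
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / reps


print(coverage())
```

The printed fraction should be close to 0.95. It describes the procedure over many samples, not the individual employees and not any single computed interval.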

4. The government claims that students earn an average of $4500 during
their summer break from studies. A random sample of students gave a
sample average of $3975 and a 95% confidence interval was found to be
($3525 < µ < $4425). This interval is interpreted to mean that:

(a) if the study were to be repeated many times, there is a 95% prob-
ability that the true average summer earnings is not $4500 as the
government claims.
(b) because our specific confidence interval does not contain the value
$4500 there is a 95% probability that the true average summer earn-
ings is not $4500.
(c) if we were to repeat our survey many times, then about 95% of all
the confidence intervals will contain the value $4500.
(d) if we repeat our survey many times, then about 95% of our confi-
dence intervals will contain the true value of the average earnings of
students.
(e) there is a 95% probability that the true average earnings are between
$3525 and $4425 for all students.

Solution: d

5. Does playing music to dairy cattle increase their milk production? An
experiment was conducted where a group of dairy cattle was divided into
two groups. Music was played to one group; the control group did not
have music played. The average increase in production was 2.5 L/cow over
the time period in question. A 95% confidence interval for the difference
(treatment-control) in the mean production was computed to be (1.5,3.5)
L/cow. This means:

(a) 95% of the cows increased their production by between 1.5 and 3.5
L.
(b) We are 95% confident that the average increase in production in the
sample is 2.5 L/cow.
(c) Because the confidence interval does not contain zero, we are 95%
confident that there was no effect of playing music.
(d) We don’t know the true increase in production, but we are 95% con-
fident that the increase in the mean production is in this interval.
(e) Because the confidence interval does not include zero, we are 95% con-
fident that the true increase in production for all cows is 2.5 L/cow.

Solution: d
Past performance 1992 Dec - 76% (10%e)
Past performance 1996 Dec - 86%

6. An experiment was conducted to estimate the mean yield of a new variety
of oats. A sample of 20 plots gave a mean yield of 2.9 t/hectare, and a
95% confidence interval of (2.48, 3.32) t/ha. This means:

(a) We are sure the true mean yield of this new variety is between 2.48
and 3.32 t/ha.
(b) We are 95% confident that the true mean yield of this variety is 2.9
t/ha.
(c) About 95% of the yields of the new variety will be between 2.48 and
3.32 t/ha.
(d) We are 95% confident that the true mean yield of this variety is
between 2.48 and 3.32 t/ha.
(e) We are 95% confident that the mean yield of 2.9 t/hectare is between
2.48 and 3.32 t/ha.

Solution: d
Past performance 1990 Dec - 87%

7. A 95 percent confidence interval for the mean time taken to process new
insurance policies is (11, 12) days. This interval can be interpreted to
mean that:

(a) only 5 percent of all policies take less than 11 or more than 12 days
to process
(b) only 5 percent of all policies take between 11 and 12 days to process
(c) about 95 out of every 100 such intervals constructed from random
samples of the same size will contain the population mean processing
time
(d) the probability is .95 that all policies take between 11 and 12 days
to process
(e) none of the above

Solution: c

8. The diameters of ball bearings are known to be normally distributed with
unknown mean and variance. A random sample of size 25 gave a mean
2.5 cm. The 95% confidence interval had length 4 cm. Then

(a) The sample variance is 4.86.
(b) The sample variance is 26.03.
(c) The population variance is 4.84.
(d) The population variance is 23.47.
(e) The sample variance is 23.47.

Solution: e - use t with 24 df rather than z=1.96 because σ is unknown
and the sample is “small”
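
A minimal sketch of the back-calculation: the interval length equals 2·t·s/√n, so s can be recovered from the length. The critical value 2.064 is the tabled 0.975 quantile of t with 24 degrees of freedom, entered by hand.

```python
import math

n = 25
ci_length = 4.0
t_crit = 2.064  # t table, 0.975 quantile, 24 df

# length = 2 * t * s / sqrt(n)  =>  s = length * sqrt(n) / (2 * t)
s = ci_length * math.sqrt(n) / (2 * t_crit)
sample_variance = s ** 2
print(round(s, 3), round(sample_variance, 2))  # 4.845 23.47
```

Note that the values near 4.85 in options (a) and (c) are the sample standard deviation rather than the variance.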

9. A turkey producer knows from previous experience that profits are maxi-
mized by selling turkeys when their average weight is 12 kilograms. Before
determining whether to put all their full grown turkeys on the market this
month, the producer wishes to estimate their mean weight. Prior knowl-
edge indicates that turkey weights have a standard deviation of around 1.5
kilograms. The number of turkeys that must be sampled in order to esti-
mate their true mean weight to within 0.5 kilograms with 95% confidence
is:
(a) 35
(b) 5
(c) 65
(d) 10
(e) 150

Solution: a
Past performance 1992 Dec - 85%
Past performance 1998 Nov - 85%
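
The sample-size calculation follows from requiring z·σ/√n to be no larger than the desired margin, then rounding up. The sketch below uses the normal quantile from the standard library; the function name is illustrative.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin (sigma assumed known)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin) ** 2)


print(sample_size_for_mean(sigma=1.5, margin=0.5))  # 35
```

When a question specifies the total WIDTH of the interval rather than the margin of error, pass half the width as `margin`.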

10. A random sample of 4 Herefords, each with a frame size of three (on a
one-to-seven scale), gave a sample mean weight of 452 kg and a sample
standard deviation of 12 kg. A 95% confidence interval for the average
weight of all Herefords of this frame size is (using an “exact” confidence
interval):
(a) (435.3, 468.7)
(b) (432.9, 471.1)
(c) (440.2, 463.8)
(d) (428.5, 475.5)
(e) (436.6, 467.4)
Solution: b
Past performance 1990 Dec - 75%
Past performance 1997 Jul - 75%

11. Referring to the previous question, about how many animals should be
sampled (in total) in order to be 95% confident of determining the true
mean weight WITHIN 2 kg?
(a) 140
(b) 170
(c) 550
(d) 100
(e) 190
Solution: a
Past performance 1990 Dec - 72%
Past performance 1997 Jul - 60%

12. The average yield of grain on 9 randomly picked experimental plots of a
farm was found to be 150 bushels. If the yield in bushels per plot in
previous studies was found to be approximately normally distributed with
a variance of 400 bushels², a 98% confidence interval for the mean yield
is:

(a) (136.9, 163.1)
(b) (144.8, 155.2)

(c) (132.8, 167.2)
(d) (134.5, 165.5)
(e) (145.7, 154.4)

Solution: d
Past performance 1989 Dec - 61% ( 22% -b)
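
Because the variance is treated as known from previous studies, a z-interval applies. A minimal sketch (the function name is illustrative):

```python
import math
from statistics import NormalDist


def z_interval(xbar, variance, n, confidence):
    """Confidence interval for a mean when the population variance is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(variance / n)
    return xbar - margin, xbar + margin


lower, upper = z_interval(xbar=150, variance=400, n=9, confidence=0.98)
print(round(lower, 1), round(upper, 1))  # 134.5 165.5
```

The popular wrong answer (b) corresponds to dividing the standard deviation by n = 9 instead of by √n = 3.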

13. An analyst, using a random sample of n = 500 families, obtained a 90
percent confidence interval for mean monthly family income for a large
population: ($600, $800). If the analyst had used a 99 percent confidence
coefficient instead, the confidence interval would be:

(a) narrower and would involve a larger risk of being incorrect
(b) wider and would involve a smaller risk of being incorrect
(c) narrower and would involve a smaller risk of being incorrect
(d) wider and would involve a larger risk of being incorrect
(e) wider but it cannot be determined whether the risk of being incorrect
would be larger or smaller

Solution: b

14. A horticulturist wishes to estimate the mean growth of seedlings in a large
timber plot last year. A random sample of n = 100 seedlings is selected
and the one-year growth for each is measured. The sample results are: X̄
= 5.62 cm and s = 2.50 cm. The 95 percent confidence interval for the
mean growth is:

(a) (3.12, 8.12)
(b) (4.98, 6.26)
(c) (5.13, 6.11)
(d) (5.37, 5.87)
(e) (5.57, 5.67)

Solution: c
Past performance 1989 Dec - 93%

15. In an investigation on toxins produced by molds that infect corn crops, a
biochemist prepares extracts of the mold culture and then measures the
amount of the toxic substance per gram of solution. From six preparations
of the mold culture the following observations on toxic substances (mg)
are obtained:

1.2, .8, .6, 1.1, 1.2, .8.

A 95% confidence interval for the mean amount of toxic substances is:

(a) .95 ± 2.57 (.10)
(b) .95 ± 1.96 (.10)
(c) .95 ± 2.57 (.25)
(d) .95 ± 1.96 (.25)
(e) .95 ± 2.02 (.10)

Solution: a
Past performance 1989 Dec - 57% (18% - c, 11% -b,d)
Past performance 1990 Dec - 58% (14% - c, 14% - b)
Past performance 1990 Dec - 63% (11% - d, 21% - c)

16. The effect of acid rain upon the yield of crops is of concern in many places.
In order to determine baseline yields, a sample of 13 fields was selected,
and the yield of barley (g/400 m²) was determined. The output from SAS
appears below:

                                    QUANTILES(DEF=4)             EXTREMES
N        13        SUM WGTS  13        100% MAX  392    99%  392    LOW   HIGH
MEAN     220.231   SUM       2863      75% Q3    234    95%  392    161   225
STD DEV  58.5721   VAR       3430.69   50% MED   221    90%  330    168   232
SKEW     2.21591   KURT      6.61979   25% Q1    174    10%  163    169   236
USS      671689    CSS       41168.3   0% MIN    161    5%   161    179   239
CV       26.5958   STD MEAN  16.245                     1%   161    205   392

A 95% confidence interval for the mean yield is:

(a) 220.2 ± 1.96(58.6)
(b) 220.2 ± 1.96(16.2)
(c) 220.2 ± 2.18(58.6)
(d) 220.2 ± 2.18(16.2)
(e) 220.2 ± 2.16(16.2)

Solution: d
Past performance 1989 Dec - 60% (25% - b)

17. The effect of salinity upon the growth of grasses is of concern in many
places where excess irrigation is causing salt to rise to the surface. In
order to determine baseline yields, a sample of 24 fields was selected, and
the biomass of grasses in a standard sized plot was measured (kg). The
output from SAS appears below:

                                    QUANTILES(DEF=4)             EXTREMES
N         24       SUM WGTS  24        100% MAX  22.6    99%  22.6    LOW   HIGH
MEAN      9.09     SUM       218.3     75% Q3    11.45   95%  22.52   0.7   15.1
STD DEV   6.64     VARIANCE  44.0      50% MED   8.15    90%  21.8    1     19.8
SKEWNE    0.924    KURTO     -0.0209   25% Q1    3.775   10%  1.6     2.2   21.3
USS       2998     CSS       1012.73   0% MIN    0.7     5%   0.77    2.2   22.3
CV        72       STD MEAN  1.35                        1%   0.7     2.8   22.6
T:MEAN=0  6.7153   PROb>|T|  0.0001    RANGE     21.9

A 95% confidence interval for the mean yield is:

(a) 9.09 ± 1.9600(1.35)
(b) 9.09 ± 2.0639(1.35)
(c) 9.09 ± 2.0639(6.64)
(d) 9.09 ± 2.0687(1.35)
(e) 9.09 ± 2.0687(6.64)

Solution: d
Past performance 1990 Dec - 65%
Past performance 1996 Nov - 82%

18. An electrical firm which manufactures a certain type of bulb wants to
estimate its mean life. Assuming that the life of the light bulb is normally
distributed and that the standard deviation is known to be 40 hours, how
many bulbs should be tested so that we can be 90 percent confident that
the estimate of the mean will not differ from the true mean life by more
than 10 hours?

(a) 7
(b) 44
(c) 8
(d) 62
(e) 87

Solution: b
Past performance 1989 Dec - 70%

19. A study conducted by an airline showed that a random sample of nine
of its passengers disembarking at the Winnipeg airport took an average
of 24.1 minutes to claim their luggage. From a previous survey, the airline
was willing to assume that time to claim luggage is normally distributed
with a variance of 18 min². A 95% confidence interval for the mean time to
claim one’s luggage has endpoints:

(a) 24.1 ± 8.32
(b) 24.1 ± 3.92
(c) 24.1 ± 2.77
(d) 24.1 ± 3.26
(e) 24.1 ± 9.78

Solution: c

20. Consider the following graph of the mean yield of barley in 1980, 1984,
and 1988 along with a 95% confidence interval.

Which of the following is INCORRECT?
(a) Since the confidence intervals for 1984 and 1980 have considerable
overlap, there is little evidence that the sample means differ.
(b) Since the confidence intervals for 1988 and 1980 do not overlap, there
is good evidence that their respective population means differ.
(c) The sample mean for 1984 is about 195 g/400 m².
(d) The sample mean for 1988 is less than the sample mean for 1984.
(e) The estimate of the population mean in 1988 is more precise than
that for 1980 because the confidence interval for 1988 is narrower
than that for 1980.
Solution: a
Past performance 1989 Dec - 30% (41% - e, 20% - b)
Past performance 1990 Dec - 49% (39% - e)
Past performance 1990 Dec - 40% (25% - b, 32% -e)
Past performance 1991 Dec - 79% (13%-b)
Past performance 1996 Nov - 25%

21. A researcher in biochemistry is attempting to summarize the results of an
experiment. The experiment involved measuring enzyme activity under a
variety of conditions. The analysis has yielded the following statistics:

n 10
Median 157.00
Mean 163.50
Variance 45.29
Std. Deviation 6.73
Range 38.00

A 95% confidence interval for the mean enzyme activity is:

(a) (161.4, 165.6)
(b) (154.9, 159.1)
(c) (158.8, 168.2)
(d) (158.7, 168.3)
(e) (152.2, 161.8)

Solution: d
Past performance 1991 Dec - 95%

22. The United States Golf Association (USGA) tests new brands of golf balls
to assure that they meet USGA specifications. One test involves measuring
the average distance traveled when the ball is hit by a machine called
“Iron Byron”. Past tests have indicated that the standard deviation of the
distances “Iron Byron” hits golf balls is 10 meters. How many golf balls
should be hit by “Iron Byron” in order to estimate the mean distance for
a new brand with a 90% confidence interval of WIDTH 2 meters?

(a) 17
(b) 9
(c) 384
(d) 68
(e) 271

Solution: e
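
A minimal sketch of the sample-size calculation behind question 22, assuming the usual normal-theory formula n = (z σ / m)², where m is the half-width of the interval. The function name `sample_size_for_mean` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n so that z * sigma / sqrt(n) does not exceed margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


# A WIDTH of 2 meters corresponds to a margin (half-width) of 1 meter.
n = sample_size_for_mean(sigma=10, margin=1.0, confidence=0.90)
print(n)  # 271
```

Note that the question gives the full width of the interval, so the margin passed to the function is half of it.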

23. A student is interested in estimating the average number of showers per
week taken by college students. Based on a preliminary sample he believes
that σ² is close to 2.1. How large a sample is needed if his estimate is to
be within 0.3 with probability 0.95?

(a) 183
(b) 253
(c) 64
(d) 359
(e) 90

Solution: e

24. Recently, a price war has developed among retailers selling Brand X denim
jeans. A major chain buyer wishes to estimate the mean price of these
jeans during this period to compare it to the normal selling price of $20.00.
A random sample of 7 major retailers produces a mean retail price of
$13.50 with a standard deviation of $3.50. An 80% confidence interval for
the true mean retail price of Brand X jeans during the price war is:

(a) (10.93, 16.07)
(b) (8.46, 18.54)
(c) (11.81, 15.19)
(d) (10.00, 17.00)
(e) (11.60, 15.40)

Solution: not available

25. A very simple interval estimator for µ is (Y − 2se → Y + 2se). Which of
the following statements is/are true if the sample size, n, is “large”?

(a) This interval will contain the true value of µ approximately 95 times
out of one hundred.
(b) This interval is an approximate 95% confidence interval for µ
(c) This interval is too narrow to be a useful interval estimator for µ.
(d) This interval will contain the true value of µ 997 times out of 1000.
(e) Both (a) and (b) are true.

Solution: e

26. An engineer is investigating the strength of a new type of fastener. The
only information she has right now is that the strength of a similar fastener
has a standard deviation of 35. Assuming that the new fasteners have the
same standard deviation, how many fasteners should she test so that she
can be 99% confident that the sample mean will be within ±10 of the true
mean strength? Choose the answer that is closest to your computed value.

(a) 15
(b) 30
(c) 50
(d) 80
(e) 325

Solution: d - Note that if you use a multiplier of 3 for a 99% c.i. you will
get an answer near 110.
The exact multiplier for a 99% confidence interval is 2.57 (look for the
99.5th percentile on a normal curve), which gives you an answer of 81.

27. Auditor A is faced with a population of 1,000 accounts (Population A). He
is going to select a random sample of 30 accounts from Population A and
he is going to use the average amount owing in these sampled accounts as
an estimate of the average amount owing in Population A. Auditor B is
faced with a population of 10,000 accounts (Population B). He is going to
select a random sample of 30 accounts from Population B and he is going
to use the average amount owing in these sample accounts as an estimate
of the average amount owing in Population B. Other things being equal:

(a) Auditor A’s estimate will be about 10 times more accurate than
Auditor B’s estimate.
(b) Auditor B’s estimate will be about 10 times more accurate than Au-
ditor A’s estimate.
(c) Auditor A’s estimate will be about 3.16 times more accurate than
Auditor B’s estimate.
(d) Auditor B’s estimate will be about 3.16 times more accurate than
Auditor A’s estimate.
(e) the accuracy of the two estimates will be about the same.

Solution: e
Past performance 1991 Dec - 95%

28. You wish to estimate µ, the average lifetime of a particular type of battery.
You are planning to select n batteries of this type and to operate them
continuously until they fail. You have some feeling that the standard
deviation of the lifetimes should be around 20 hours, and you wish your
estimate of µ to be within 1 hour of µ with probability 0.95. How many
batteries should you select?

(a) 1537
(b) 784
(c) 40
(d) 77
(e) 1083

Solution: a - The exact answer of 1537 is found using the exact multiplier
of 1.96 = 97.5th percentile of the normal curve rather than the approximate
multiplier of 2.

29. A statistical procedure to estimate the mean shell thickness of eggs from
chickens contaminated with PCBs obtains a point estimate of 0.70 mm
and an estimated standard error of .05 mm. This means:
(a) The standard deviation of actual shell thickness in the sample was
.05 mm.
(b) We are 95% confident that the sample mean shell thickness is accurate
to within .05 mm.
(c) An estimate of the standard deviation of the sample mean shell thick-
ness over repeated samples is .05 mm
(d) The standard deviation of the population mean over all eggs is about
.05 mm.
(e) An approximate 95% confidence interval for the sample mean shell
thickness is .70mm ± .10mm.
Solution: c - note that e refers to “sample mean”
Past performance 1996 Dec - 34% (13%-d; 45%-e)

Multiple Choice Questions
Inference - Single sample on proportions

1. A statistician selects a random sample of 200 seeds from a large shipment
of a certain variety of tomato seeds and tests the sample for percentage
germination. If 155 of the 200 seeds germinate, then a 95% confidence
interval for p, the population proportion of seeds that germinate is:

(a) (.726, .824)
(b) (.717, .833)
(c) (.706, .844)
(d) (.713, .844)
(e) (.726, .833)

Solution: b
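
The large-sample interval used in question 1 can be sketched as follows. This assumes the usual normal approximation p̂ ± z·sqrt(p̂(1 − p̂)/n); the function name `proportion_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_interval(155, 200)
print(f"({low:.3f}, {high:.3f})")  # (0.717, 0.833)
```

The same function applied to 480 successes out of 800 gives roughly (0.566, 0.634).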

2. Some scientists believe that a new drug would benefit about half of all peo-
ple with a certain blood disorder. To estimate the proportion of patients
who would benefit from taking the drug, the scientists will administer it to
a random sample of patients who have the blood disorder. What sample
size is needed so that the 95% confidence interval will have a width of
0.06?

(a) 748
(b) 1,068
(c) 1,503
(d) 2,056
(e) 2,401

Solution: b
Past performance 1989 Dec - 74%

3. In a random sample of 800 Winnipeg automobile owners, it was found
that 480 would like to see the size of the cars reduced. A 95% confidence
interval for the proportion of all Winnipeg car owners who would like to
see smaller cars is:

(a) (0.566, 0.634)
(b) (0.572, 0.628)
(c) (0.532, 0.667)
(d) (0.555, 0.645)
(e) (0.560, 0.630)

Solution: a
Past performance 1991 Dec - 92%

4. A random sample of 900 individuals has been selected from a large pop-
ulation. It was found that 180 are regular users of vitamins. Thus, the
proportion of the regular users of vitamins in the population is estimated
to be 0.20. An estimate of the standard error of this estimate is:

(a) 0.1600
(b) 0.0002
(c) 0.4000
(d) 0.0133
(e) 0.0267

Solution: d
Past performance 1996 Dec - 86%

5. A Gallup poll of 1089 adults found 326 supported the policies of a particu-
lar political party. A 95% confidence interval for the true level of support
in the entire Canadian population is:

(a) (.270, .330)
(b) (.299, .300)
(c) (.285, .313)
(d) (.267, .332)
(e) (.273, .327)

Solution: e
Past performance 1989 Dec - 81%
Past performance 1990 Dec - 68%
Past performance 1992 Dec - 77% (12%a)
Past performance 1993 Apr - 80% (a-10%)

6. Refer to the previous question. What sample size would be needed in
order to be 95% confident that the true level of support is within .01 of
the estimated proportion, assuming that the previous poll provides us with
a reasonable estimate of the true support?

(a) 5047
(b) 9604
(c) 1089
(d) 3458
(e) 8068

Solution: e

7. A Gallup poll of a sample of 1089 Canadians (total population of 26,000,000)
found that about 80% favoured capital punishment. A Gallup poll of a
sample of 1089 Americans (total population of 260,000,000) also found
that 80% favoured capital punishment. Which of the following statements
is TRUE?
(a) The Canadian poll is much more accurate because a larger proportion
of the total population was surveyed.
(b) The American poll is more accurate because they have a larger total
population.
(c) Both polls are almost equally precise because they have the same
sample size and the two populations are relatively large.
(d) You cannot compare the precision of the two polls because we do not
know the confidence coefficient used.
(e) Both polls are equally precise because in both polls 871 of respondents
favoured capital punishment.
Solution: c
Past performance 1989 Dec - 88%
Past performance 1990 Dec - 81%
Past performance 1992 Dec - 77% (18%e)
Past performance 1993 Apr - 88%
Past performance 1996 Dec - 92%
Past performance 1998 Dec - 87%

8. A marketing research organization wishes to estimate the proportion of
television viewers who watch a particular prime-time comedy on May 24th.
The proportion is thought to be about .30 . What is the least number of
viewers that should be randomly selected to ensure that a 95% confidence
interval for the true proportion of viewers will have a WIDTH of .06 or
less ?

(a) 225
(b) 1068
(c) 267
(d) 897
(e) 683

Solution: d

9. A quality control engineer wants to estimate the fraction of defective bulbs
in a large lot of lightbulbs. From past experience, he feels that the actual
fraction of defective bulbs should be somewhere around 0.2 . How large
a sample should be taken if he wants to estimate the true fraction within
.02 using a 95% confidence interval?

(a) 6147
(b) 24587
(c) 38416
(d) 4330
(e) 1537

Solution: e

10. A research analyst for an energy conservation group is interested in the
proportion of air conditioners that have an energy efficiency ratio of at
least 8. He takes a random sample of 400 owners of air conditioners and
finds that 240 own air conditioners with energy efficiency ratio of at least
8. The width of the 95% confidence interval of the true proportion of air
conditioners that have an energy efficiency ratio of at least 8 is:
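(a) $1.96\sqrt{\frac{\frac{240}{400}\left(1-\frac{240}{400}\right)}{400}}$
(b) $1.645\sqrt{\frac{\frac{240}{400}\left(1-\frac{240}{400}\right)}{400}}$
(c) $2(1.96)\sqrt{\frac{\frac{240}{400}\left(1-\frac{240}{400}\right)}{400}}$
(d) $2(1.645)\sqrt{\frac{\frac{240}{400}\left(1-\frac{240}{400}\right)}{400}}$
(e) $\frac{240}{400} \pm 1.96\sqrt{\frac{\frac{240}{400}\left(1-\frac{240}{400}\right)}{400}}$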

Solution: c

11. Many television viewers express doubts about the validity of certain com-
mercials. In an attempt to answer their critics, the Timex Corporation
wishes to estimate the proportion of consumers who believe what is shown
in Timex television commercials. Let p represent the true proportion of
consumers who believe what is shown in Timex television commercials. If
Timex has no prior information regarding the true value of p, how many
consumers should be included in their sample so that they will be 85%
confident that their estimate is within 0.03 of the true value of p ?

(a) 400
(b) 12
(c) 576
(d) 384
(e) 544

Solution: not available

12. The 3-M company started a new recreation program for its employees in
the hope that a little recreation would improve an employee’s performance
at work. To determine whether the high cost of the program is justified,
the president of the company wishes to estimate the proportion of the
employees who participate in the recreational activities. In a random
sample of 200 employees, 60 were found to regularly participate in the
recreation program. A 95% confidence interval for the true proportion of
3-M employees who participate in the new recreation program is:

(a) (0.231, 0.369)
(b) (0.298, 0.302)
(c) (0.267, 0.333)
(d) (0.247, 0.353)
(e) (0.237, 0.364)

Solution: e

13. A random sample of married people were asked “Would you remarry your
spouse if you were given the opportunity for a second time?”; Of the
150 people surveyed, 127 of them said that they would do so. Find a
95% confidence interval for the proportion of married people who would
remarry their spouse.

(a) 0.847 ± 0.002
(b) 0.847 ± 0.029
(c) 0.847 ± 0.048
(d) 0.847 ± 0.058
(e) 0.847 ± 0.113

Solution: d
Past performance 1990 Dec - 83%

14. A music buff wants to estimate the percentage of students at the University
of Manitoba who believe that Elvis is still alive. How many students should
he include in a random sample if he wants a 90% confidence interval that
is less than 10 percentage points wide? Choose the sample size that is
closest to your solution

(a) 68
(b) 97
(c) 269
(d) 385
(e) 1022

Solution: c - The multiplier for a 90% confidence interval = 1.645 =
95th percentile of a normal curve (why?).
As well, the WIDTH is .10, which gives a plus/minus size of .05. Because
the actual proportion is not known, use .5. This gives
n = 1.645² (.5)(.5)/.05² = 270.
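
The sample-size calculations for proportions in this set all follow the same pattern, which can be sketched in Python. The function name `sample_size_for_proportion` is hypothetical; the default guess of 0.5 is the conservative choice used when nothing is known about p.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence, p_guess=0.5):
    """Smallest n so that z * sqrt(p(1-p)/n) does not exceed margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)


# Question 14: 90% interval, width .10, so the margin is .05; p unknown.
print(sample_size_for_proportion(margin=0.05, confidence=0.90))  # 271
# Question 8: 95% interval, width .06 (margin .03), p thought to be .30.
print(sample_size_for_proportion(margin=0.03, confidence=0.95, p_guess=0.30))  # 897
```

Because the function rounds up, it returns 271 for question 14 where the worked solution reports 270; either way the closest listed option is 269.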

15. You would like to estimate the percentage of “regular users of vitamins”
in a large population and you would like your estimate to be accurate to
within 4 percentage points, 19 times out of 20. Approximately how large
should your sample size be?

(a) 600
(b) 2400
(c) 400
(d) 1000
(e) 150

Solution: a
Past performance 1990 Dec - 37% (14% - b, 14% -c, 27% - c)
Past performance 1992 Dec - 78% (13%-b)

16. In order for the confidence interval in the previous question to be valid:

(a) we must assume that we have a random sample from a normal pop-
ulation.
(b) we must assume that we have a random sample from some population
(but it need not be a normal population because of the Central Limit
Theorem).
(c) we must assume that the population is normal (but we do not require
a random sample because of the Central Limit Theorem).
(d) we do not need to assume that the population is normal nor that the
sample is random (because of the Central Limit Theorem).
(e) we must assume that we have a random sample from a dichotomous
population.

Solution: b - the Wonderful CLT (it will change your life) strikes again.

17. A political poll of Canadians was conducted to investigate their opinions
on gun control. Each person was asked if they were in favor of gun control
or not in favor of gun control - non respondents were removed from the
results. The survey found that 25% of people contacted were not in favor
of gun control laws. These results were accurate to within 3 percentage
points, 19 times out of 20. Which of the following is NOT CORRECT?
(a) The 95% confidence interval is approximately from (22% to 28%).
(b) We are 95% confident that the true proportion of people not in favor
is within 3 percentage points of 25%.
(c) In approximately 95% of polls on this issue, the confidence interval
will include 25%.
(d) If another poll of similar size were taken, the percentage of people
IN FAVOR of gun control would likely range from 72% to 78%.
(e) A properly designed poll of the same size in the United States would
have the same margin of error.
Solution: c
Past performance 1998 Nov - 25% (10% a; 15% b; 33% d; 14% e)

18. A 95% confidence interval for p the proportion of Canadian beer drinkers
who prefer Lion Red was found to be (0.236 to 0.282). Which of the
following is correct?
(a) About 95% of beer drinkers have between a 23.6% and a 28.2% chance
of drinking Lion Red.
(b) There is a 95% probability that the sample proportion lies between
0.236 and 0.282.
(c) If a second sample was taken, there is a 95% chance that its confidence
interval would contain 0.25.
(d) This confidence interval indicates that we would likely reject the hy-
pothesis H: p=0.25.
(e) we are reasonably certain that the true proportion of beer drinkers
who prefer Lion Red is between 24% and 28%.
Solution: e
Past performance 1998 Dec - 71% (15% c)

19. Refer to the previous question. Suppose that the same poll was repeated
in the United States (whose population is 10 times larger than Canada),
but in this new poll, four times the number of people were interviewed.
The resulting 95% confidence intervals will be:
(a) about 1/2 as wide as the Canadian interval
(b) about 1/4 as wide as the Canadian interval
(c) about 1/10 as wide as the Canadian interval
(d) about 4/10 times as wide as the Canadian interval
(e) the same size as the Canadian interval
Solution: a
Past performance 1998 Dec - 38% (30% b; 20% e)
If you increase the sample size by a factor of x, the width of the c.i.
decreases by a factor of sqrt(x).
The easiest way to see this is to simply compute the two se.
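
The square-root rule in the note above can be illustrated by computing the two widths directly. The Canadian sample size is not given in the question, so `n_canada = 1400` is a purely hypothetical value; the point estimate 0.259 is the midpoint of the reported interval.

```python
import math


def interval_width(p_hat, n, z=1.96):
    """Full width of the normal-approximation interval for a proportion."""
    return 2 * z * math.sqrt(p_hat * (1 - p_hat) / n)


n_canada = 1400  # hypothetical; the source does not state the sample size
ratio = interval_width(0.259, 4 * n_canada) / interval_width(0.259, n_canada)
print(ratio)  # about 0.5
```

The ratio does not depend on the value chosen for `n_canada` or on the population size, only on the factor by which the sample size changes.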

20. Suppose that we wish to estimate the proportion of Canadians who ac-
tually understand the Constitution of Canada. What is the approximate
number of Canadians who need to be sampled so that the 95% confidence
interval has a width of 2 percentage points?
(a) about 500
(b) about 1,000
(c) about 2,500
(d) about 5,000
(e) about 10,000
Solution: e
Past performance 1998 Dec - 42% (15% b; 28% c)

Multiple Choice Questions
Inference - Two independent samples on means

1. In a study of iron deficiency among infants, random samples of infants
following different feeding programs were compared. One group contained
breast-fed infants, while the children in another group were fed by a stan-
dard baby formula without any iron supplements. Here are summary
results of blood hemoglobin levels at 12 months of age.

Group          Sample Size   Sample Mean   Sample Std. Deviation
Breast-fed          8           13.3              1.7
Formula-fed        10           12.4              1.8

A 98% confidence interval for the mean difference in hemoglobin level
between the two populations of infants is:

(a) 0.9 ± 1.94
(b) 0.9 ± 2.08
(c) 0.9 ± 2.13
(d) 0.9 ± 2.15
(e) 0.9 ± 1.63

Solution: d
Past performance 1989 Dec - 64% (14% a,c)
Past performance 1990 Dec - 73%

2. A study was conducted to investigate the effectiveness of a new drug for
treating Stage 4 AIDS patients. A group of AIDS patients was randomly
divided into two groups. One group received the new drug; the other
group received a placebo. The difference in mean subsequent survival
(those with drugs - those without drugs) was found to be 1.04 years and
a 95% confidence interval was found to be 1.04 ± 2.37 years. Based upon
this information:
(a) We can conclude that the drug was effective because those taking the
drug lived, on average, 1.04 years longer.
(b) We can conclude that the drug was ineffective because those taking
the drug lived, on average, 1.04 years less.
(c) We can conclude that there is no evidence the drug was effective
because the 95% confidence interval covers zero.
(d) We can conclude that there is evidence the drug was effective because
the 95% confidence interval does not cover zero.
(e) We can make no conclusions because we do not know the sample size
nor the actual mean survival of each group.
Solution: c
Past performance 1990 Dec - 79%
Past performance 1998 Dec - 77%
Past performance 2006 Dec - 85%

3. Samples of hamburger were selected from two different outlets of a large
supermarket to measure the percentage of fat present in the meat, with
the following summary data.

           Outlet 1   Outlet 2
n              5         10
mean         10.3       10.7    percent
std. dev      1.6        2.3    percent

It is reasonable to believe that both outlets have the same variability.
Hence, the pooled standard deviation is:

(a) 1.95
(b) 2.08
(c) 4.38
(d) 2.09
(e) 2.11

Solution: e
Past performance 1989 Dec - 72%

4. The degrees of freedom of the pooled estimate in the previous question is:

(a) 15
(b) 13
(c) 7.5
(d) 5
(e) 10

Solution: b
Past performance 1989 Dec - 90%
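
The pooled standard deviation and its degrees of freedom in questions 3 and 4, and the pooled interval in question 1, can be checked with the sketch below. The function names are hypothetical, and the multiplier 2.583 is an assumption taken from a standard t table (99th percentile with 16 degrees of freedom).

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation and its degrees of freedom (n1 + n2 - 2)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    return math.sqrt(pooled_var), df


def pooled_margin(s1, n1, s2, n2, t_crit):
    """Margin of error for the difference in means under equal variances."""
    sp, _ = pooled_sd(s1, n1, s2, n2)
    return t_crit * sp * math.sqrt(1 / n1 + 1 / n2)


# Questions 3 and 4: hamburger fat at two outlets.
sp, df = pooled_sd(1.6, 5, 2.3, 10)
print(round(sp, 2), df)  # 2.11 13

# Question 1: hemoglobin levels, 98% interval, t with 16 df (tabled value assumed).
margin = pooled_margin(1.7, 8, 1.8, 10, t_crit=2.583)
print(round(margin, 2))  # 2.15
```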

5. A study was conducted to estimate the effectiveness of doing assignments
in an introductory statistics course. Students in one section taught by
instructor A received no assignments. Students in another section taught
by instructor B, received assignments. The final grade of each student was
recorded. A 95% confidence interval for the difference in the mean grades
(Section A - Section B) was computed to be −3.5 ± 1.8. This means:

(a) There is evidence that doing assignments improves the average grade
because the difference in the population means is less than zero.
(b) There is little evidence that doing assignments improves the average
grade because the 95% confidence interval does not cover 0.
(c) There is evidence that doing assignments improves the average grade
because the 95% confidence interval does not cover 0.
(d) There is evidence that doing assignments does not improve the aver-
age grade because the 95% confidence interval does not cover 0.
(e) There is little evidence that doing assignments does not improve the
average grade because the 95% confidence interval does cover 0.

Solution: c
Past performance 1989 Dec - 73%

6. Popular wisdom is that eating pre-sweetened cereal tends to increase the
number of dental caries (cavities) in children. A sample of children was
(with parental consent) entered into a study and followed for several years.
Each child was classified as a sweetened-cereal lover or a non-sweetened
cereal lover. At the end of the study, the amount of tooth damage was
measured. Here is the summary data:

Group           n     mean    std. dev
Sugar Bombed   10     6.41      5.0
No sugar       15     5.20     15.0

An approximate 95% confidence interval for the difference in the mean
tooth damage is:

(a) $(6.41 - 5.20) \pm 2.26\sqrt{\frac{5}{10} + \frac{15}{15}}$
(b) $(6.41 - 5.20) \pm 2.26\sqrt{\frac{25}{10} + \frac{225}{15}}$
(c) $(6.41 - 5.20) \pm 1.96\sqrt{\frac{25}{10} + \frac{225}{15}}$
(d) $(6.41 - 5.20) \pm 2.07\sqrt{\frac{146}{10} + \frac{146}{15}}$
(e) $(6.41 - 5.20) \pm 1.96\sqrt{\frac{146}{10} + \frac{146}{15}}$

Solution: b
Past performance 1990 Dec - 55%

7. An experiment was conducted to compare the efficacies of two drugs in
the prevention of tapeworms in the stomachs of a new breed of sheep.
Samples of size 5 and 8 from each breed were given the drug and the two
sample means were 28.6 and 40.0 worms/sheep. From previous studies, it
is known that the variances in the two groups are 198 and 232, respectively,
and that the number of worms in the stomachs has an approximate normal
distribution. A 95% confidence interval for the difference in the mean
number of worms per sheep is:

(a) −11.4 ± 18.6
(b) 11.4 ± 18.2
(c) −11.4 ± 17.9
(d) 11.4 ± 16.2
(e) −11.4 ± 16.6

Solution: d
Past performance 1989 Dec - 43% (27% -a)

8. A researcher wants to see if birds that build larger nests lay larger eggs.
She selects two random samples of nests: one of small nests and the other
of large nests. She weighs one egg from each nest. The data are summa-
rized below.

                   small nests   large nests
sample size            60           159
sample mean (g)       37.2          35.6
sample variance       24.7          39.0

A 95% confidence interval for the difference between the average mass of
eggs in small and large nests is:

(a) 1.6 ± 1.33 = (0.27, 2.93)
(b) 1.6 ± 1.48 = (0.12, 3.08)
(c) 1.6 ± 1.59 = (0.01, 3.19)
(d) 1.6 ± 1.76 = (−0.16, 3.36)
(e) 1.6 ± 7.31 = (−5.71, 8.91)

Solution: c
Past performance 1992 Dec - 82%

9. Refer to the previous question. We wish to be 95% confident of being
within 1.0 g of the true value. What approximate sample size is
needed for each group?

(a) 240
(b) 60
(c) 8000
(d) 2000
(e) 125

Solution: a
Past performance 1992 Dec - 79%
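
Questions 8 and 9 can be checked with the large-sample (unpooled) formula for a difference in means. This is a sketch under the assumption that the sample variances stand in for the population variances; both function names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def difference_margin(var1, n1, var2, n2, z=Z95):
    """Margin of error for a difference in means with separate variances."""
    return z * math.sqrt(var1 / n1 + var2 / n2)


def per_group_size(var1, var2, margin, z=Z95):
    """Equal per-group n so that z * sqrt((var1 + var2) / n) <= margin."""
    return math.ceil(z**2 * (var1 + var2) / margin**2)


margin = difference_margin(24.7, 60, 39.0, 159)
print(round(margin, 2))  # 1.59

n_each = per_group_size(24.7, 39.0, margin=1.0)
print(n_each)  # 245
```

The computed per-group size of 245 is closest to the listed answer of 240.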

The following 2 questions refer to the following situation.

A researcher wants to see if birds that build larger nests lay larger eggs.
She selects two random samples of nests: one of small nests and the other
of large nests. She measures one egg from each nest. The data are sum-
marized below.

10. Refer to the 95% confidence interval circled on the output. This means:
(a) We are 95% confident that the sample mean egg size in large nests is
between 37 and 40 mm if the survey was repeated.
(b) If the survey was repeated, we are 95% confident that egg sizes in
large nests are between 37 and 40 mm.
(c) We are 95% confident that nests will have large eggs between 37
and 40 mm if the survey was repeated.
(d) We are 95% confident that the true mean egg size for large nests is
between 37 and 40 mm.
(e) We are 95% confident that repeated surveys will have population
means between 37 and 40 mm.
Solution: d
Past performance 2006 Dec - 61% (19%-a; 12%-b)

11. Which of the following is NOT CORRECT?
(a) Because the 95% confidence interval for the difference in means in-
cludes zero, there is no evidence of a difference in the mean egg size.
(b) Because the one-sided p-value is .18, there is no evidence of a differ-
ence in mean egg sizes.
(c) Because the confidence intervals for the two groups have a great deal
of overlap, there is no evidence of a difference in the mean egg size.
(d) Because the individual values of the egg sizes for the two groups
have a great deal of overlap, there is no evidence of a difference in
the means.
(e) Because the 95% confidence intervals for the mean egg sizes are
approximately equal in width, the two estimates are about equally
precise.
Solution: d
Past performance 2006 Dec - 58% (14%-a; 15%-b; 19%-d)

Multiple Choice Questions
Inference - Two independent samples on
proportions

1. Two surveys were conducted before and after the recent Autopac rate
increases to find the proportion of voters who state they would vote for
the current government. The results were as follows:

                                Week 1   Week 2   Total over both weeks
No. surveyed                      400      600          1000
No. in favor of current gov’t     150      150           300

An approximate 95% confidence interval for the change in support is:

(a) $(.375 - .250) \pm 1.96\sqrt{\frac{(.375)(.625)}{400} + \frac{(.250)(.750)}{600}}$
(b) $(.375 - .250) \pm 1.96\sqrt{\frac{(.375)(.625)}{1000} + \frac{(.250)(.750)}{1000}}$
(c) $(.375 - .250) \pm 1.96\sqrt{\frac{(.300)(.700)}{400} + \frac{(.300)(.700)}{600}}$
(d) $(.375 - .250) \pm 1.96\sqrt{\frac{(.300)(.700)}{1000} + \frac{(.300)(.700)}{1000}}$
(e) $(.375 - .250) \pm 1.96\sqrt{\frac{(.375)(.625)}{500} + \frac{(.250)(.750)}{500}}$

Solution: a
Past performance 1992 Dec - 97%

2. The above confidence intervals are of the order of ±6 percentage points. What
sample size for each poll would be needed so that we are 95% confident
of being within 2 percentage points of the true difference assuming that
the above proportions are reasonable estimates of the proportions in the
population?

(a) 6,000
(b) 1,000
(c) 15,000
(d) 2,000
(e) 4,000

Solution: e
Past performance 1992 Dec - 73%

3. Two surgical procedures are widely used to treat a certain type of cancer.
To compare the success rates of the two procedures, a random sample
from each type of procedure is obtained, and the number of patients with
no reoccurrence of the disease after 1 year was recorded. Here are the data.

               n     No reoccurrence
Procedure A   100          78
Procedure B   120         102

A 95% confidence interval for the difference in success rates is:

(a) .07 ± .053
(b) .07 ± .0054
(c) .07 ± .103
(d) .07 ± .115
(e) .07 ± .059

Solution: c
Past performance 1989 Dec - 78%
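
The interval for a difference in two proportions in question 3 can be sketched as below, assuming the usual unpooled normal approximation. The function name `two_proportion_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Return (difference, margin) for p1 - p2 using separate standard errors."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2, z * se


# Procedure B (102 of 120) minus Procedure A (78 of 100).
diff, margin = two_proportion_interval(102, 120, 78, 100)
print(round(diff, 2), round(margin, 3))  # 0.07 0.103
```

Passing `confidence=0.90` switches the multiplier to about 1.645, as needed for intervals at the 90% level.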

4. There may be a cure for male pattern baldness (at least millions of males
hope there will be) using the blood pressure drug Minoxidil. A group of
males was randomly assigned to two groups. One group received topi-
cal applications of the drug; the other group received applications of an
identical looking placebo. The summary data are:

                   Sample Size   Number with New Hair Growth
Minoxidil group        310                 100
Placebo group          100                  25

A 95% confidence interval for the difference in the proportion of males
showing new hair growth is:

(a) .073 ± .152
(b) .073 ± .048
(c) .073 ± .024
(d) .073 ± .051
(e) .073 ± .099

Solution: e

5. A new insect spray, type A, is to be compared with a spray, Type B,
that is currently in use. Two rooms of equal size are sprayed with the
same amount of spray, one room with Type A and the other with Type
B. Two hundred insects are released into each room, and after one hour
the numbers of dead insects are counted. The results are given in the
following table:

SPRAY A SPRAY B
Total number of insects 200 200
Total number of dead insects 140 100

A 90% confidence interval for the difference in the rates of kill for the two
sprays, is:
(a) $.2 \pm 1.645\sqrt{\frac{.46}{200}}$
(b) $.2 \pm 1.645\sqrt{\frac{.48}{200}}$
(c) $.2 \pm 1.96\sqrt{\frac{.46}{200}}$
(d) $.2 \pm 1.96\sqrt{\frac{.48}{200}}$
(e) $.2 \pm 2.326\sqrt{\frac{.48}{200}}$

Solution: a
Past performance 1990 Dec - 78%

6. Two vaccines against measles are being tested. It is important to know
the difference in success rate very accurately, i.e. to be 95% sure that the
estimated difference is within 0.01 of the true difference. If both vaccines
are expected to have an approximate success rate of 80%, then the required
sample size for each group is obtained by solving:
(a) $.01 = 1.96\sqrt{\frac{.8(.2)}{n} + \frac{.8(.2)}{n}}$
(b) $.02 = 1.96\sqrt{\frac{.8(.2)}{n} + \frac{.8(.2)}{n}}$
(c) $.01 = 1.96\sqrt{\frac{.5(.5)}{n} + \frac{.5(.5)}{n}}$
(d) $.02 = 1.96\sqrt{\frac{.5(.5)}{n} + \frac{.5(.5)}{n}}$

(e) none of the above

Solution: a
Past performance 1989 Dec - 80%

7. Two vaccines against measles are being tested. It is important to know
the difference in success rate very accurately, i.e. to be 95% sure that the
estimated difference is within 0.01 of the true difference. If both vaccines
are expected to have an approximate success rate of 80%, then the required
sample size is:

(a) about 750 in each group for a total of 1500 people.
(b) about 1500 in each group for a total of 3000 people.
(c) about 3000 in each group for a total of 6000 people.
(d) about 6000 in each group for a total of 12000 people.
(e) about 12000 in each group for a total of 24000 people.

Solution: e
Past performance 1990 Dec - 32% ( 12% - b, 14% - c, 36% - d, 31% - e)
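
Solving the equation from the previous question for n can be sketched as follows. The function name `per_group_size` is hypothetical, and the calculation assumes equal sample sizes in the two groups.

```python
import math
from statistics import NormalDist


def per_group_size(p1, p2, margin, confidence=0.95):
    """Equal per-group n so the margin for p1 - p2 does not exceed `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / margin**2)


n_vaccine = per_group_size(0.80, 0.80, margin=0.01)
print(n_vaccine)  # 12293, i.e. about 12000 in each group
```

The same function with p1 = .375, p2 = .250 and a margin of .02 gives about 4,050 per poll, in line with the answer to question 2 of this set.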

Multiple Choice Questions
Probability - Binomial

1 Probability - Binomial distribution


1. A random sample of 15 people is taken from a population in which 40%
favour a particular political stand. What is the probability that exactly 6
individuals in the sample favour this political stand?
(a) 0.4000
(b) 0.5000
(c) 0.4000
(d) 0.2066
(e) 0.0041
Solution: d
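
The probability in question 1 comes straight from the binomial formula P(X = k) = C(n, k) p^k (1 − p)^(n − k). A minimal sketch, with the hypothetical function name `binomial_pmf`:

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


prob = binomial_pmf(6, 15, 0.40)
print(round(prob, 4))  # 0.2066
```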

2. Experience has shown that a certain lie detector will show a positive read-
ing (indicates a lie) 10% of the time when a person is telling the truth and
95% of the time when a person is lying. Suppose that a random sample of
5 suspects is subjected to a lie detector test regarding a recent one-person
crime. Then the probability of observing no positive reading if all suspects
plead innocent and are telling the truth is
(a) 0.409
(b) 0.735
(c) 0.00001
(d) 0.591
(e) 0.99999
Solution: d

3. It has been estimated that about 30% of frozen chicken contain enough
salmonella bacteria to cause illness if improperly cooked. A consumer
purchases 12 frozen chickens. What is the probability that the consumer
will have more than 6 contaminated chickens?
(a) .961
(b) .118
(c) .882
(d) .039
(e) .079
Solution: d
Past performance 1989 Dec - 74%
Past performance 1990 Oct - 68%
Past performance 1992 Oct - 93%
Past performance 1997 Aug - 91%

4. Refer to the previous question. Suppose that a supermarket buys 1000
frozen chickens from a supplier. Find an approximate 95% interval for the
number of frozen chickens that may be contaminated.
(a) (90, 510)
(b) (285, 315)
(c) (0, 730)
(d) (270, 330)
(e) (255, 345)
Solution: d
Past performance 1990 Oct - 74%
Past performance 1997 Aug - 81% (13%-b)

5. Which of the following is NOT an assumption of the Binomial distribution?
(a) All trials must be identical.
(b) All trials must be independent.
(c) Each trial must be classified as a success or a failure.
(d) The number of successes in the trials is counted.
(e) The probability of success is equal to .5 in all trials.
Solution: e
Past performance 1990 Oct - 84%
Past performance 1996 Nov - 97%

6. It has been estimated that as many as 70% of the fish caught in certain
areas of the Great Lakes have liver cancer due to the pollutants present.
Find an approximate 95% range for the number of fish with liver cancer
present in a sample of 130 fish.
(a) (80, 102)
(b) (86, 97)
(c) (63, 119)
(d) (36, 146)
(e) (75, 107)
Solution: a
Past performance 1989 Dec - 83%
Past performance 1991 Oct - 56% (11%d, 20% e)
Past performance 1992 Oct - 78%
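
The approximate 95% range in question 6 is the mean plus or minus two standard deviations of the binomial count. A short sketch, assuming the normal approximation is adequate for these sample sizes; the function name `approx_95_range` is hypothetical.

```python
import math


def approx_95_range(n, p):
    """Mean +/- 2 standard deviations for X ~ Binomial(n, p)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean - 2 * sd, mean + 2 * sd


low, high = approx_95_range(130, 0.70)
print(f"({low:.2f}, {high:.2f})")  # (80.55, 101.45)
```

Rounded outward this is the range (80, 102). For 1000 chickens with p = .30 the same function gives roughly (271, 329).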

7. In a triangle test a tester is presented with three food samples, two of
which are alike, and is asked to pick out the odd one by testing. If a tester
has no well developed sense and can pick the odd one only by chance,
what is the probability that in five trials he will make four or more correct
decisions?
(a) 11/243
(b) 1/243
(c) 10/243
(d) 233/243
(e) 232/243
Solution: a

8. The probability that a certain machine will produce a defective item is
1/4. If a random sample of 6 items is taken from the output of this
machine, what is the probability that there will be 5 or more defectives in
the sample?
(a) 1/4096
(b) 3/4096
(c) 4/4096
(d) 18/4096
(e) 19/4096
Solution: e

9. The probability that a certain machine will produce a defective item is
0.20. If a random sample of 6 items is taken from the output of this
machine, what is the probability that there will be 5 or more defectives in
the sample?
(a) .0001
(b) .0154
(c) .0015
(d) .2458
(e) .0016

Solution: e

10. Suppose 60% of a herd of cattle is infected with a particular disease. Let Y
= the number of non-diseased cattle in a sample of size 5. The distribution
of Y is
(a) binomial with n = 5 and p = 0.6
(b) binomial with n = 5 and p = 0.4
(c) binomial with n = 5 and p = 0.5
(d) the same as the distribution of X, the number of infected cattle.
(e) Poisson with λ = .6
Solution: b

11. Fifteen percent of new residential central air conditioning units installed
by a supplier need additional adjustments requiring a service call. Assume
that a recent sample of seven such units constitutes a Bernoulli process.
Interest centers on X, the number of units among these seven that need
additional adjustments. The mean and variance of X are, respectively
(a) .15; .85
(b) .15; 1.05
(c) .15; .8925
(d) 1.05; .1275
(e) 1.05; .8925
Solution: e - remember variance = (std dev) squared
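
For a binomial count the mean is np and the variance is np(1 - p). A minimal check for the air conditioning question, with n = 7 and p = 0.15 taken from the question text:

```python
n, p = 7, 0.15

mean = n * p                  # expected number of units needing adjustment
variance = n * p * (1 - p)    # binomial variance
std_dev = variance ** 0.5     # standard deviation, the square root of the variance

print(f"mean = {mean:.2f}, variance = {variance:.4f}, sd = {std_dev:.4f}")
```

This gives a mean of 1.05 and a variance of 0.8925.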

12. If you buy one ticket in the Provincial Lottery, then the probability that
you will win a prize is 0.11. If you buy one ticket each month for five
months, what is the probability that you will win at least one prize?
(a) 0.55
(b) 0.50
(c) 0.44
(d) 0.45
(e) 0.56
Solution: c
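
"At least one" questions are usually easiest through the complement: one minus the probability of no prize in any month. A small sketch, assuming the five monthly draws are independent:

```python
p_win = 0.11     # probability of a prize with one ticket
months = 5

p_no_prize = (1 - p_win) ** months   # lose in every one of the five months
p_at_least_one = 1 - p_no_prize

print(f"P(at least one prize) = {p_at_least_one:.4f}")
```

The result rounds to 0.44, not 5 x 0.11 = 0.55, because the months in which a prize is won can overlap.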

13. Suppose that the probability that a cross between two varieties will express
a particular gene is 0.20. What is the probability that in 8 progeny plants,
two or fewer plants will express the gene?
(a) .2936
(b) .3355
(c) .1678
(d) .6291
(e) .7969
Solution: e
Past performance 1989 Oct - 95%

14. Refer to the previous question. Suppose that 120 crosses are bred. Find
a likely 95% range for the number of progeny that will express the gene.
(a) 24 ± 19.2
(b) 24 ± 4.4
(c) 24 ± 8.8
(d) 24 ± 4.9
(e) 24 ± 9.8
Solution: c
Past performance 1989 Oct - 65%

15. Seventeen people have been exposed to a particular disease. Each one
independently has a 40% chance of contracting the disease. A hospital
has the capacity to handle 10 cases of the disease. What is the probability
that the hospital’s capacity will be exceeded?
(a) .965
(b) .035
(c) .989
(d) .011
(e) .736
Solution: b
Past performance 1991 Oct - 75%
Past performance 1993 Feb - 59% (c-14%; d-14%)
Past performance 1993 Apr - 70%
Past performance 1996 Nov - 90%
Past performance 1998 Nov - 88%
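
The hospital's capacity is exceeded when more than 10 of the 17 exposed people contract the disease, so the quantity needed is an upper binomial tail. The sketch below sums the binomial probabilities directly; the function names are illustrative.

```python
from math import comb

def binomial_pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def prob_more_than(n, p, k):
    """P(X > k) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(n, p, x) for x in range(k + 1, n + 1))

# 17 people exposed, 40% chance each, capacity for 10 cases
p_exceeded = prob_more_than(17, 0.40, 10)
print(f"P(more than 10 cases) = {p_exceeded:.4f}")
```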

16. Refer to the previous problem. Planners need to have enough beds avail-
able to handle a proportion of all outbreaks. Suppose a typical outbreak
has 100 people exposed, each with a 40% chance of coming down with the
disease. Which of the following is NOT correct?
(a) This experiment satisfies the assumptions of a binomial distribution.
(b) About 95% of the time, between 30 and 50 people will contract the
disease.
(c) Almost all of the time, between 25 and 55 people will contract the
disease.
(d) On average, about 40 people will contract the disease.
(e) Almost all of the time, fewer than 40 people will be infected.
Solution: e
Past performance 1993 Feb - 73% (d-13%)
Past performance 1996 Nov - 80% (d- 8%)
Past performance 1998 Nov - 87%

17. There are 10 patients on the Neo-Natal Ward of a local hospital who are
monitored by 2 staff members. If the probability (at any one time) of a
patient requiring emergency attention by a staff member is .3, assuming
the patients behave independently, what is the probability at any
one time that there will not be sufficient staff to attend all emergencies?
(a) .3828
(b) .3000
(c) .0900
(d) .9100
(e) .6172
Solution: e

18. A newborn baby whose Apgar score is over 6 is classified as normal and
this happens in 80% of births. As a quality control check, an auditor
examined the records of 100 births. He would be suspicious if the number
of normal births in the sample of 100 births fell above the upper limit of
a “95%-normal-range”. What is this upper limit?
(a) 112
(b) 72
(c) 88
(d) 8
(e) none of these
Solution: c
Past performance ???? 73% (18% -e)

19. Refer to the previous question. Babies that have Apgar scores of 6 or lower
require more expensive medical care. What is the probability that in the
next 10 births, 3 or more babies will have Apgar scores of 6 or lower?
(a) .2013
(b) .3222
(c) .9999
(d) .0001
(e) .1536
Solution: b
Past performance ???? 48% (19%-c; 11%-d; 14%-e)

20. Newsweek in 1989 reported that 60% of young children have blood lead
levels that could impair their neurological development. Assuming that a
class in a school is a random sample from the population of all children at
risk, the probability that at least 5 children out of 10 in a sample taken
from a school may have a blood level that may impair development is:
(a) about .25
(b) about .20
(c) about .84
(d) about .16
(e) about .64
Solution: c
Past performance 1998 Dec - 80%

21. Refer to the previous problem. The total number of children in the school
is about 400. In order to estimate the cost of treating all the children at
one school, the health board wishes to be reasonably sure of the upper
limit on the number of children affected. This upper limit is:
(a) about 260
(b) about 350
(c) about 240
(d) about 400
(e) about 250
Solution: a
Past performance 1998 Dec - 72% (15% c)

22. Consider 8 blood donors chosen randomly from a population. The prob-
ability that the donor has type A blood is .40. Which of the following is
CORRECT?
(a) The probability of 1 or fewer donors having type A blood is about
.11.
(b) The probability of 7 or more donors NOT having type A blood is
about .0087.
(c) The probability of exactly 5 donors having type A blood is about .28.
(d) The probability of exactly 5 donors NOT having type A blood is
about .12.
(e) The probability that between 3 and 5 donors (inclusive) will have
type A blood is about .37.
Solution: a
Past performance 2006 Nov - 84%
Past performance 2006 Dec - 79%

23. Consider 100 blood donors chosen randomly from a population where the
probability of type A is 0.40. What is the approximate probability that
at least 43 donors will have type A blood?
(a) about .43
(b) about .62
(c) about .73
(d) about .27
(e) about .38
Solution: d
Past performance 2006 Nov - 64%
Past performance 2006 Dec - 58% (27%-c)
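
A sketch of the normal approximation for the 100-donor question, using the standard library's NormalDist. It computes the tail both without and with a continuity correction; the keyed answer of about .27 appears to correspond to the uncorrected version, which is an inference from the numbers rather than something the answer key states.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.40
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

p_plain = 1 - approx.cdf(43)        # P(X >= 43), no continuity correction
p_corrected = 1 - approx.cdf(42.5)  # P(X >= 43), with continuity correction

print(f"without correction: {p_plain:.3f}")
print(f"with correction:    {p_corrected:.3f}")
```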

Multiple Choice Questions
Probability - Expected Value

1. Cans of soft drinks cost $0.30 in a certain vending machine. What is the
expected value and variance of daily revenue (Y) from the machine, if X,
the number of cans sold per day has E(X) = 125, and Var(X) = 50?
(a) E(Y) = 37.5, Var(Y) = 50
(b) E(Y) = 37.5, Var(Y) = 4.5
(c) E(Y) = 37.5, Var(Y) = 15
(d) E(Y) = 37.5, Var(Y) = 15
(e) E(Y) = 125, Var(Y) = 4.5
Solution: b - remember variance = (std dev)^2
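
Revenue is a constant multiple of sales, Y = 0.30 X, so E(Y) = 0.30 E(X) and Var(Y) = 0.30^2 Var(X). A minimal check of the arithmetic (variable names are illustrative):

```python
price = 0.30             # dollars per can
mean_x, var_x = 125, 50  # E(X) and Var(X) for cans sold per day

mean_y = price * mean_x      # E(aX) = a E(X)
var_y = price**2 * var_x     # Var(aX) = a^2 Var(X)

print(f"E(Y) = {mean_y:.2f}, Var(Y) = {var_y:.2f}")
```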

2. A crop insurance company establishes the following loss table based upon
previous claims:

percent loss | 0    25   50   100
probability  | .90  .05  .02  ????

If they write a policy that pays a maximum of $150/hectare, their expected
loss in $/hectare is approximately:
(a) 5.2
(b) 7.9
(c) 4.5
(d) 37.5
(e) 25.0
Solution: b
Past performance 1990 Oct - 57%
Past performance 1992 Oct - 92%
Past performance 2006 Nov - 68%
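
Two steps are needed for the crop insurance question: recover the missing probability (the probabilities must sum to 1), then weight each payout by its probability. The sketch assumes the payout is proportional to the percent loss, up to the $150 maximum; that reading is an assumption, since the question does not spell it out.

```python
percent_loss = [0, 25, 50, 100]
known_probs = [0.90, 0.05, 0.02]

missing_prob = 1 - sum(known_probs)      # probability of a 100% loss
probs = known_probs + [missing_prob]

max_payout = 150                          # dollars per hectare
payouts = [max_payout * pct / 100 for pct in percent_loss]

expected_loss = sum(pay * pr for pay, pr in zip(payouts, probs))
print(f"missing probability = {missing_prob:.2f}")
print(f"expected loss = ${expected_loss:.3f} per hectare")
```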

3. A rock concert producer has scheduled an outdoor concert. If it is warm
that day, she expects to make a $20,000 profit. If it is cool that day, she
expects to make a $5,000 profit. If it is very cold that day, she expects to
suffer a $12,000 loss. Based upon historical records, the weather office has
estimated the chances of a warm day to be .60; the chances of a cool day
to be .25. What is the producer’s expected profit?
(a) $5,000
(b) $13,000
(c) $15,050
(d) $13,250
(e) $11,450
Solution: e
Past performance 1989 Apr - 92%
Past performance 1997 Aug - 93%

4. A restaurant manager is considering a new location for her restaurant.
The projected annual cash flow for the new location is:

Annual Cash Flow | $10,000  $30,000  $70,000  $90,000  $100,000
Probability      | 0.10     0.15     0.50     0.15     ?

The expected cash flow for the new location is:
(a) $12,800
(b) $64,000
(c) $70,000
(d) $60,000
(e) $50,000
Solution: b
Past performance 1997 Jul - 99%

5. An insurance company has estimated the following cost probabilities for
the next year on a particular model of car:

cost | $0   $500  $1000  $2000
prob | .60  .05   .13    ????

The expected cost to the insurance company is (approximately):
(a) $155
(b) $595
(c) $875
(d) $645
(e) $495

Solution: b
Past performance 1989 Oct - 91%
Past performance 1991 Oct - 90%
Past performance 1993 Feb - 96%
Past performance 1996 Dec - 96%

6. Before planting a crop for the next year, a producer does a risk assess-
ment. According to her assessment, she concludes that there are three
possible net outcomes: a $7,000 gain, a $4,000 gain, or a $10,000 loss with
probabilities 0.55, 0.20 and 0.25 respectively. The expected profit is:
(a) $3,850
(b) $0
(c) $2,150
(d) $2,500
(e) $800
Solution: c
Past performance 1992 Dec - 97%

7. A business evaluates a proposed venture as follows. It stands to make a
profit of $10,000 with probability 3/20, to make a profit of $5,000 with
probability 9/20, to break even with probability 1/4 and to lose $5,000
with probability 3/20. The expected profit in dollars is:
(a) 1,500
(b) 0
(c) 3,000
(d) 3,250
(e) -1,500
Solution: c
Past performance 1989 Dec - 96%

8. The average length of stay in a hospital is useful for planning purposes.
Suppose that the following is the distribution of the length of stay in a
hospital after a minor operation:

Days | 2    3    4    5    6
Prob | .05  .20  .40  .20  ?

The average length of stay is:
(a) .15
(b) .17
(c) 3.3
(d) 4.0
(e) 4.2

Solution: e
Past performance 1993 Apr - 74% (a-13%)
Past performance 1996 Dec - 92%
Past performance 1998 Dec - 95%

9. An insurance company issues a policy on a small boat under the following
conditions: The replacement cost ($5000) will be paid for a total loss. If
it is not a total loss, but the damage is more than $2000, then $1500 will
be paid. Nothing will be paid for damage costing $2000 or less and of
course nothing is paid out if there is no damage. The company estimates
the probability of the first three events as .02, .10, and .30 respectively.
The amount the company should charge if it wishes to make a profit of
$50 above the expected amount paid out in a year is:
(a) $250
(b) $201
(c) $300
(d) $1200
(e) $165
Solution: c
Past performance 1998 Nov - 77%
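
The premium in the boat question is the expected payout plus the desired $50 profit. A small sketch, assuming the three listed probabilities apply to total loss, damage over $2000, and damage of $2000 or less, with the remaining probability going to no damage:

```python
# (payout in dollars, probability) for the three events named in the question
events = [(5000, 0.02), (1500, 0.10), (0, 0.30)]

p_no_damage = 1 - sum(p for _, p in events)   # remaining probability, pays nothing
expected_payout = sum(amount * p for amount, p in events)
premium = expected_payout + 50                # target profit of $50

print(f"P(no damage) = {p_no_damage:.2f}")
print(f"expected payout = ${expected_payout:.2f}, premium = ${premium:.2f}")
```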

Multiple Choice Questions
Probability - General

1. The probability that the Red River will flood in any given year has been
estimated from 200 years of historical data to be one in four. This means:
(a) The Red River will flood every four years.
(b) In the next 100 years, the Red River will flood exactly 25 times.
(c) In the last 100 years, the Red River flooded exactly 25 times.
(d) In the next 100 years, the Red River will flood about 25 times.
(e) In the next 100 years, it is very likely that the Red River will flood
exactly 25 times.
Solution: d
Past performance 1989 Oct - 90%
Past performance 1990 Dec - 99%

2. The chances that you will be ticketed for illegal parking on campus are about
1/3. During the last nine days, you have illegally parked every day and
have NOT been ticketed (you lucky person)! Today, on the 10th day, you
again decide to park illegally. The chances that you will be caught are:
(a) greater than 1/3 because you were not caught in the last nine days.
(b) less than 1/3 because you were not caught in the last nine days.
(c) still equal to 1/3 because the last nine days do not affect the proba-
bility.
(d) equal to 1/10 because you were not caught in the last nine days.
(e) equal to 9/10 because you were not caught in the last nine days.
Solution: c
Past performance 1989 Oct - 96%

3. The chance that a person will contract AIDS after a sexual contact with
an infected partner has been estimated to be 1/4. This means:
(a) A person will be infected after exactly 4 sexual contacts with infected
partners.
(b) Of 1000 people having sexual contacts with infected partners, exactly
250 will become infected.
(c) Of 200 people having sexual contacts with infected partners, about
50 will become infected.
(d) In exactly 25% of all sexual contacts with infected partners, the in-
fection will spread.
(e) Of 20 people having sexual contacts with infected partners, it is very
likely that exactly 5 people will become infected.

Solution: c
Past performance 1989 Dec - 88%
Past performance 1990 Oct - 94%
Past performance 1991 Oct - 95%

4. A random variable Y has the following distribution:

Y | -1 0 1 2
P(Y)| 3C 2C 0.4 0.1

The value of the constant C is:

(a) 0.10
(b) 0.15
(c) 0.20
(d) 0.25
(e) 0.75
Solution: a

5. A random variable X has a probability distribution as follows:

x      | 0   1   2    3
P(X=x) | 2k  3k  13k  2k

Then Pr(X < 2.0) is equal to:
(a) .90
(b) .25
(c) .65
(d) .15
(e) 1.00
Solution: b
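
The constant k is fixed by the requirement that the probabilities sum to 1. A minimal sketch using exact fractions; the dictionary of weights simply restates the table in the question.

```python
from fractions import Fraction

# Multiples of k attached to each value in the table
weights = {0: 2, 1: 3, 2: 13, 3: 2}

k = Fraction(1, sum(weights.values()))          # 20k = 1
pmf = {x: w * k for x, w in weights.items()}

p_less_than_2 = sum(prob for x, prob in pmf.items() if x < 2)
print(f"k = {k}, Pr(X < 2) = {p_less_than_2}")
```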

6. Suppose that the allele for tallness (T) is dominant over shortness (t); that
for Yellow (Y) is dominant over green (y); and that for roundness (W) is
dominant over wrinkled (w). Suppose we cross two plants with genotypes
TTYyWw and TtYyWw. The probability of a Tall, Yellow, Round plant
is:
(a) 9/16
(b) 3/32
(c) 1/16
(d) 9/32
(e) 3/16
Solution: a
Past performance 1992 Oct - 78%
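
The cross can be worked gene by gene and the results multiplied, assuming the three genes assort independently (an assumption the question leaves implicit). The sketch enumerates the four equally likely allele pairings for each gene; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def prob_dominant(parent1, parent2):
    """P(offspring shows the dominant trait) for one gene.

    Each parent is a two-letter genotype; an uppercase letter is dominant.
    """
    offspring = list(product(parent1, parent2))   # one allele from each parent
    dominant = sum(1 for a, b in offspring if a.isupper() or b.isupper())
    return Fraction(dominant, len(offspring))

p_tall = prob_dominant("TT", "Tt")
p_yellow = prob_dominant("Yy", "Yy")
p_round = prob_dominant("Ww", "Ww")

p_all = p_tall * p_yellow * p_round
print(p_tall, p_yellow, p_round, p_all)  # 1 3/4 3/4 9/16
```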

7. It has been estimated that about 20% of people between the ages of 18
and 25 have used marijuana in the last year. Which of the following is
CORRECT about this statement?

(a) Five people of this age group were randomly selected. This means
that exactly one of them must have used marijuana in the last year.
(b) Twenty people were randomly selected from this age group. Eighteen
of them used marijuana in the last year. The next person selected at
random will have a lower probability of using marijuana.
(c) Ten people were randomly selected from this age group. None of
them have used marijuana in the last year. The next person selected
must have a higher probability of using marijuana in the last year.
(d) A thousand people from this age group were randomly selected. It is
not unusual to find that 217 of them have used marijuana in the last
year.
(e) A million people from this age group were randomly selected. There
must be exactly 200,000 of them that have used marijuana in the last
year.
Solution: d
Past performance 2006 Nov - 91%

The following two questions refer to the following situation.
All human blood can be “ABO” typed as belonging to one of A, B, O, or
AB types. The actual distribution varies slightly among different groups
of people, but for a randomly chosen person from North America, the
following are the approximate probabilities:
Blood type  | O    A    B    AB
Probability | .45  .40  .11  .04
8. Consider an accident victim with type B blood. She can only receive a
transfusion from a person with type B or type O blood. What is the
probability that a randomly chosen person will be a suitable donor?
(a) about .11
(b) about .04
(c) about .15
(d) about .45
(e) about .56
Solution: e
Past performance 2006 Nov - 96%

9. What is the probability that both people in a couple will have the SAME
blood type if matings are random with respect to blood type, i.e. one
partner’s blood type does not influence the blood type of the other partner?
(a) about .21
(b) about .16
(c) about .002
(d) about .01
(e) about .38
Solution: e
Past performance 2006 Nov - 73%
Past performance 2006 Dec - 85%
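
Both blood type questions reduce to simple rules: add probabilities of mutually exclusive types for the donor question, and sum the squared probabilities for the matching couple, assuming the two partners' types are independent as the question states. A minimal sketch:

```python
blood = {"O": 0.45, "A": 0.40, "B": 0.11, "AB": 0.04}

# Suitable donor for a type B patient: type B or type O (mutually exclusive)
p_suitable = blood["B"] + blood["O"]

# Same type for both partners: P(both O) + P(both A) + P(both B) + P(both AB)
p_same = sum(p**2 for p in blood.values())

print(f"P(suitable donor) = {p_suitable:.2f}")
print(f"P(same blood type) = {p_same:.4f}")
```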

Multiple Choice Questions
Normal approximations to discrete distributions

1. The National Broomball League claims to have a balanced league; that is,
for any given game each team has an equal chance of winning or losing with
no ties. Assuming the claim is true, what is the approximate probability
that a given team will lose more than 61 games out of the 100 played?
(a) 0.0500
(b) 0.4918
(c) 0.0107
(d) 0.0082
(e) 0.0164
Solution: c

2. The probability of getting a parking ticket when not paying for a 2-hour
period is 0.3. What is the probability of getting at least 60 tickets if you
park on 250 occasions for a 2-hour period and don’t pay?
(a) 0.016
(b) 0.019
(c) 0.98
(d) 0.93
(e) 0.072
Solution: c

3. A professional basketball player sinks 80% of his foul shots, in the long
run. If he gets 100 tries during a season, then the probability that he sinks
between 75 and 90 shots (inclusive) is approximately equal to:
(a) Pr(−1.25 ≤ Z ≤ 2.5)
(b) Pr(−1.125 ≤ Z ≤ 2.625)
(c) Pr(−1.125 ≤ Z ≤ 2.375)
(d) Pr(−1.375 ≤ Z ≤ 2.375)
(e) Pr(−1.375 ≤ Z ≤ 2.625)
Solution: e
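
With a continuity correction, "between 75 and 90 inclusive" becomes the interval from 74.5 to 90.5 on the normal scale. The sketch below computes the resulting z bounds for the foul shot question; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.80
mean = n * p
sd = sqrt(n * p * (1 - p))

# Widen the inclusive integer range by 0.5 on each side
z_low = (75 - 0.5 - mean) / sd
z_high = (90 + 0.5 - mean) / sd

prob = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)
print(f"Pr({z_low:.3f} <= Z <= {z_high:.3f}) = {prob:.4f}")
```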

4. Suppose that at the University of Manitoba, 30% of the students live in
apartments. If 200 students are randomly selected, then the probability that
the number of them living in apartments will be between 50 and 75 inclusive
is:

(a) .9167
(b) .9298
(c) .9390
(d) .9268
(e) .9208

Solution: c

5. If X has a binomial distribution with n = 400 and p = .4, the approximate
probability of the event {155 < X < 175} is:

(a) 0.6552
(b) 0.6429
(c) 0.6078
(d) 0.6201
(e) 0.6320
Solution: c

6. If in the previous question we change the interval to {155 ≤ X ≤ 175},
the approximate probability is:

(a) 0.4
(b) larger than that in the previous question
(c) smaller than that in the previous question
(d) equal to that in the previous question
(e) may be smaller or larger than that in the previous question
Solution: b
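
The difference between the strict and the inclusive interval shows up in the continuity correction: {155 < X < 175} covers the integers 156 to 174, while {155 ≤ X ≤ 175} covers 155 to 175. A sketch with a hypothetical helper that takes inclusive integer bounds:

```python
from math import sqrt
from statistics import NormalDist

def approx_between(n, p, low, high):
    """Normal approximation to P(low <= X <= high) for integer bounds,
    with a continuity correction of 0.5 on each side."""
    approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return approx.cdf(high + 0.5) - approx.cdf(low - 0.5)

strict = approx_between(400, 0.4, 156, 174)      # 155 < X < 175
inclusive = approx_between(400, 0.4, 155, 175)   # 155 <= X <= 175

print(f"strict: {strict:.4f}, inclusive: {inclusive:.4f}")
```

The strict value differs slightly from the keyed .6078 in the fourth decimal because the key appears to be based on z values rounded to two decimals for table lookup.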

© 2008 Carl James Schwarz

7. Companies are interested in the demographics of those who listen to the
radio programs they sponsor. A radio station has determined that only
20% of listeners phoning in to a morning talk program are male. During
a particular week, 200 calls are received by this program. What is the
approximate probability that at least 50 of the callers are male?
(a) .0466
(b) .0212
(c) .1168
(d) .1402
(e) Not within ± .01 of any of the above.
Solution: a

8. The unemployment rate in a certain city is 8.5%. A random sample of 100
people from the labour force is drawn. Find the approximate probability
that the sample contains at least ten unemployed people.
(a) .3879
(b) .3245
(c) .3419
(d) .2946
(e) .3594
Solution: e

9. A politician has targeted 100 homes to visit during a week. From past
experience, 50 percent of the households answer the bell and invite him
in. Of these, 80 percent will agree with his policies. The approximate
probability that the politician will get support from at least 45 households
during a week is:
(a) 0.1991
(b) 0.3212
(c) 0.8643
(d) 0.1376
(e) 0.1788
Solution: e

10. People who have been in contact with a carrier of a disease have a 40%
chance of contracting the disease. Suppose that the carrier of the disease
may have infected a school with 500 people. Find the approximate
probability that at least 215 people will contract the disease.
(a) .09
(b) .91
(c) between .05 and .34
(d) 1.37
(e) between 2.5% and 17%
Solution: a
Past performance 1993 Apr - 40% (b-22%, c-22%)

Multiple Choice Questions
Probability - Normal distribution

1. One of the side effects of flooding a lake in northern boreal forest areas
(e.g. for a hydro-electric project) is that mercury is leached from the soil,
enters the food chain, and eventually contaminates the fish. The concen-
tration in fish will vary among individual fish because of differences in
eating patterns, movements around the lake, etc. Suppose that the
concentrations of mercury in individual fish follow an approximate normal
distribution with a mean of 0.25 ppm and a standard deviation of 0.08
ppm. Fish are safe to eat if the mercury level is below 0.30 ppm. What
proportion of fish are safe to eat?
(a) 63%
(b) 23%
(c) 73%
(d) 27%
(e) 37%
Solution: c
Past performance 1992 Dec - 45% (16%a, 22%b, 15%d)
Past performance 1993 Apr - 57% (a-17%; d-17%)
Past performance 1996 Nov - 93%
Past performance 1997 Aug - 84%
Past performance 2006 Dec - 91%

2. Refer to the previous question. The Department of Fisheries and Oceans
wishes to know the mercury level of the top 20% of the fish. The appropriate
percentile and mercury level for this lake is:
(a) 20th percentile has a value of −0.84 ppm
(b) 20th percentile has a value of 0.18 ppm
(c) 80th percentile has a value of 0.32 ppm
(d) 80th percentile has a value of 0.84 ppm
(e) 20th percentile has a value of 0.07 ppm
Solution: c
Past performance 1992 Dec - 46% (28%-b, 15%-d)
Past performance 1997 Aug - 77% (13%-d)
Past performance 2006 Dec - 84% (11%-c)
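
The two mercury questions above use the normal distribution in opposite directions: the first goes from a value to a proportion (the cumulative distribution function), the second from a proportion to a value (its inverse). A minimal sketch with the standard library's NormalDist:

```python
from statistics import NormalDist

mercury = NormalDist(mu=0.25, sigma=0.08)   # ppm

# Proportion of fish below the 0.30 ppm safety limit
p_safe = mercury.cdf(0.30)

# Mercury level that cuts off the top 20% of fish, i.e. the 80th percentile
top_20_cutoff = mercury.inv_cdf(0.80)

print(f"proportion safe to eat = {p_safe:.3f}")
print(f"80th percentile = {top_20_cutoff:.3f} ppm")
```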

3. The following graph is a normal probability plot for the amount of rainfall
in acre-feet obtained from 26 randomly selected clouds that were seeded
with silver oxide:

(a) The data appear to show exponential growth; that is, the amount
of rainfall increases exponentially as the amount of silver oxide in-
creases.
(b) The pattern suggests that the measurement is not normally dis-
tributed.
(c) A least squares regression line should be fitted to the rainfall variable.
(d) It can be expected that the histogram of rainfall amount will look
like the normal curve.
(e) The shape of the curve suggests that rainfall is caused by seeding the
clouds with silver oxide.

Solution: not available

4. Marks on a Chemistry test follow a normal distribution with a mean of
65 and a standard deviation of 12. Approximately what percentage of the
students have scores below 50?
(a) 11%
(b) 89%
(c) 15%
(d) 18%
(e) 39%
Solution: a

5. Refer to the preceding question. What is the approximate 90th percentile
of the mark distribution?
(a) 80
(b) 90
(c) 85
(d) 75
(e) 95
Solution: a

6. The marks on a statistics test are normally distributed with a mean of 62
and a variance of 225. If the instructor wishes to assign B’s or higher to
the top 30% of the students in the class, what mark is required to get a
B or higher?
(a) 68.7
(b) 71.5
(c) 73.2
(d) 74.6
(e) 69.9
Solution: e
Past performance 1989 Dec - 50% (25% -d, 10% -b,c)
Past performance 1991 Oct - 67% (10% c, 14% d)
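
The question gives a variance, not a standard deviation, so the first step is to take the square root. The cutoff for the top 30% is then the 70th percentile. A short sketch:

```python
from math import sqrt
from statistics import NormalDist

mean_mark = 62
sd_mark = sqrt(225)     # variance 225 gives a standard deviation of 15

marks = NormalDist(mean_mark, sd_mark)
b_cutoff = marks.inv_cdf(0.70)   # top 30% lies above the 70th percentile

print(f"mark required for a B or higher = {b_cutoff:.1f}")
```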

7. The grade point averages of students at the University of Manitoba are
approximately normally distributed with mean equal to 2.4 and standard
deviation equal to 0.8. What fraction of the students will possess a grade
point average in excess of 3.0 ?
(a) .7500
(b) .6000
(c) .2734
(d) .2500
(e) .2266
Solution: e
Past performance 1989 Dec - 52% (18% c,d)
Past performance 1989 Apr - 50% (C-23%, D-18%)
Past performance 1991 Dec - 80% (c-13%)

8. In some courses (but certainly not in an intro stats course!), students are
graded on a “normal curve”. For example, students within ± 0.5 standard
deviations of the mean receive a C; between 0.5 and 1.0 standard
deviations above the mean receive a C+; between 1.0 and 1.5 standard
deviations above the mean receive a B; between 1.5 and 2.0 standard
deviations above the mean receive a B+, etc. The class average in an exam
was 60 with a standard deviation of 10. The bounds for a B grade and the
percentage of students who will receive a B grade if the marks are actually
normally distributed are:
(a) (65, 75), 24.17%
(b) (70, 75), 18.38%
(c) (70, 75), 9.19%
(d) (65, 75), 12.08%
(e) (70, 75), 6.68%
Solution: c
Past performance 1997 Jul - 85%
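
The B band runs from 1.0 to 1.5 standard deviations above the mean, so its bounds and its share of the class follow directly from the mean and standard deviation given. A minimal sketch:

```python
from statistics import NormalDist

exam = NormalDist(mu=60, sigma=10)

# B grade: between 1.0 and 1.5 standard deviations above the mean
b_low = exam.mean + 1.0 * exam.stdev
b_high = exam.mean + 1.5 * exam.stdev

b_share = exam.cdf(b_high) - exam.cdf(b_low)
print(f"B bounds = ({b_low:.0f}, {b_high:.0f}), share = {b_share:.2%}")
```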

Refer to the previous question. Another instructor decides that the lower
B cutoff should be the 70th percentile. The lower cutoff for a B grade is:
(a) 70
(b) 65
(c) 60
(d) 75
(e) 80
Solution: b
Past performance 1997 Jul - 71% (14%-a)

9. The diameters of steel disks produced in a plant are normally distributed
with a mean of 2.5 cm and standard deviation of .02 cm. The probability
that a disk picked at random has a diameter greater than 2.54 cm is about:
(a) .5080
(b) .2000
(c) .1587
(d) .0228
(e) .4920
Solution: d

10. Suppose the test scores of 600 students are normally distributed with a
mean of 76 and standard deviation of 8. The number of students scoring
between 70 and 82 is:
(a) 272
(b) 164
(c) 260
(d) 136
(e) 328
Solution: e

11. Bolts that are used in the construction of an electric transformer are sup-
posed to be 0.060 inches in diameter, and any bolt with diameter less than
0.058 inches or greater than 0.062 inches must be scrapped. The machine
that makes these bolts is set to produce bolts of 0.060 inches in diameter,
but it actually produces bolts with diameters following a normal distribu-
tion with µ = 0.060 inches and σ = 0.001 inches. The proportion of bolts
that must be scrapped is equal to:
(a) 0.0456
(b) 0.0228
(c) 0.9772
(d) 0.3333
(e) 0.1667
Solution: a

12. The cost of treatment per patient for a certain medical problem was mod-
eled by one insurance company as a normal random variable with mean
$775 and standard deviation $150. What is the probability that the treat-
ment cost of a patient is less than $1,000, based on this model?

(a) .5000
(b) .6826
(c) .8531
(d) .9332
(e) Cannot be computed without knowledge of additional parameters
Solution: d

13. The time that a skier takes on a downhill course has a normal distribution
with a mean of 12.3 minutes and standard deviation of 0.4 minutes. The
probability that on a random run the skier takes between 12.1 and 12.5
minutes is:
(a) 0.1915
(b) 0.3830
(c) 0.3085
(d) 0.6170
(e) 0.6826
Solution: b

14. It is known that the resistance of carbon resistors is normally distributed
with µ = 1200 ohms and σ = 120 ohms. What proportion of the resistors
have resistances that differ from the mean resistance by more than 120
ohms?
(a) 0.9544
(b) 0.3413
(c) 0.1587
(d) 0.6826
(e) 0.3174
Solution: e

15. The time required to assemble an electronic component is normally
distributed with a mean of 12 minutes and a standard deviation of 1.5 min.
Find the probability that a particular assembly takes more than 14.25
minutes.
(a) .9332
(b) .0668
(c) .3413
(d) .4332
(e) .1587
Solution: b

16. Heights of males are approximately normally distributed with a mean of
170 cm and a standard deviation of 8 cm. What fraction of males are
taller than 176 cm?
(a) .7500
(b) .6000
(c) .2734
(d) .2500
(e) .2266
Solution: e
Past performance 1990 Oct - 68%
Past performance 1993 Feb - 87%
Past performance 1998 Dec - 92%

17. The height of an adult male is known to be normally distributed with
mean of 175 cm and standard deviation 6 cm. The 20th percentile of the
distribution of heights is:
(a) 175
(b) 179
(c) 170
(d) 172
(e) 174
Solution: c

18. The heights of students at a college are normally distributed with a mean
of 175 cm and a standard deviation of 6 cm. One might expect in a sample
of 1000 students that the number with heights less than 163 cm is:
(a) 997
(b) 23
(c) 477
(d) 228
(e) 456
Solution: b
Past performance 1991 Oct - 62% (12% c, 20% d)
Past performance 1996 Dec - 83% (11% d)
Past performance 2006 Nov - 84%

19. The height of an adult male is known to be normally distributed with a
mean of 69 inches and a standard deviation of 2.5 inches. The height of
the doorway such that 96 percent of the adult males can pass through it
without having to bend is:
(a) 1.8
(b) about 65
(c) about 74
(d) about 80
(e) about 58
Solution: c
Past performance 2006 Nov - 96%

20. The distribution of weights in a large group is approximately normally
distributed. The mean is 80 kg. and approximately 68% of the weights
are between 70 and 90 kg. The standard deviation of the distribution of
weights is equal to:
(a) 20
(b) 5
(c) 40
(d) 50
(e) 10
Solution: e

21. The distribution of weights of a large group of high school students is
normally distributed with µ = 55 kg and σ = 5 kg. Which of the following
is true?
(a) About 16 percent of the students will be over 60 kg.
(b) About 2.5 percent will be below 45 kg.
(c) Half of them can be expected to weigh less than 55 kg.
(d) About 5 percent will weigh more than 63 kg.
(e) All the above are true.
Solution: e

22. The daily milk production of Guernsey cows is approximately normally
distributed with a mean of 35 kg/day and a standard deviation of 6 kg/day.
The probability that a day's production for a single animal will be less than
28 kg. is approximately:

(a) .41
(b) .09
(c) .38
(d) .12
(e) .62
Solution: d
Past performance 1990 Dec - 66%

23. Refer to the previous question. The producer is concerned when the milk
production of a cow falls below the 5th percentile because the animal
may be ill. The 5th percentile (in kg) of the daily milk production is
approximately:
(a) 1.645
(b) -1.645
(c) 33.36
(d) 25.13
(e) 44.87
Solution: d
Past performance 1990 Dec - 64%

24. Which of the following is NOT CORRECT about a standard normal dis-
tribution?
(a) P (0 ≤ Z ≤ 1.50) = .4332
(b) P (Z ≤ −1.0) = .1587
(c) P (Z ≥ 2.0) = .0228
(d) P (Z ≤ 1.5) = .9332
(e) P (Z ≥ −2.5) = .4938
Solution: e
Past performance 1989 Dec - 78%
Past performance 1990 Oct - 76%

25. The measurement of the width of the index finger of a human right hand
is a normally distributed variable with a mean of 6 cm. and a standard
deviation of 0.5 cm. What is the probability that the finger width of a
randomly selected person will be between 5 cm. and 7.5 cm.?
(a) .9759

(b) .0241
(c) .9500
(d) 1.000
(e) not within ± 0.001 of these

Solution: a

26. Lice are a pesky problem for school-aged children and are unrelated to
cleanliness. The lifetimes of lice that have fallen off the scalp onto bedding
are approximately normally distributed with a mean of 2.2 days and a
standard deviation of 0.4 days. We would expect that approximately 90%
of the lice would die within:
(a) about 2.6 days
(b) about 3.9 days
(c) about 2.5 days
(d) about 2.7 days
(e) about 3.0 days
Solution: d
Past performance 1998 Nov - 67% (23% e)

Multiple Choice Questions
Probability - Poisson

1 Probability - Poisson distribution


1. It is sometimes possible to obtain approximate probabilities associated
with values of a random variable by using the probability distribution of a
different random variable. For example, binomial probabilities using the
Poisson probability function, binomial probabilities using the normal etc.
In order for the Poisson to give “good” approximate values for binomial
probabilities we must have the condition(s) that:
(a) the population size is large relative to the sample size.
(b) the sample size is large
(c) the probability, p, is small and the sample size is large
(d) the probability, p, is close to .5 and the sample size is large
(e) the probability, p, is close to .5 and the population size is large
Solution: c

2. Suppose flaws (cracks, chips, specks, etc.) occur on the surface of glass
with density of 3 per square metre. What is the probability of there being
exactly 4 flaws on a sheet of glass of area 0.5 square metre?
(a) 0.047
(b) 0.168
(c) 0.981
(d) 0.815
(e) 0.647
Solution: a
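
The key step in this question is scaling the rate to the area examined before applying the Poisson formula P(X = k) = e^(−λ) λ^k / k!. A short sketch of that calculation follows; the helper name `poisson_pmf` is an illustrative choice, not something defined in the source.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson count with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

# 3 flaws per square metre, sheet of 0.5 square metre -> expected 1.5 flaws.
expected_flaws = 3 * 0.5
p_exactly_4 = poisson_pmf(4, expected_flaws)

print(f"P(X = 4) = {p_exactly_4:.3f}")
# prints: P(X = 4) = 0.047
```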


3. The rate at which a particular defect occurs in lengths of plastic film being
produced by a stable manufacturing process is 4.2 defects per 75 metre
length. A random sample of the film is selected and it was found that the
length of the film in the sample was 25 metres. What is the probability
that there will be at most 2 defects found in the sample?
(a) .2102
(b) .2417
(c) .8335
(d) .1323
(e) .1665
Solution: c
Past performance 1997 Jul - 86%
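
For "at most" questions the individual Poisson probabilities are summed after the rate has been rescaled to the sampled length. A minimal sketch, with `poisson_cdf` as a hypothetical helper name:

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson count with mean lam."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

# 4.2 defects per 75 m, sample of 25 m -> expected 1.4 defects.
expected_defects = 4.2 * 25 / 75
p_at_most_2 = poisson_cdf(2, expected_defects)

print(f"mean = {expected_defects:.1f}, P(X <= 2) = {p_at_most_2:.4f}")
```

The result agrees with option (c), .8335, to four decimal places.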

Refer to the previous question. The manufacturer decides to examine a
larger amount of film. She selects 1000 m of film. If there were no change
in the defect rate from the old process, what would be the number of
defects seen in approximately 95% of such examinations?
(a) (49 to 63)
(b) (34 to 78)
(c) (62 to 98)
(d) (41 to 71)
(e) (71 to 89)
Solution: d
Past performance 1997 Jul - 67% (21% - a)
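
The approximate 95% range here rests on two facts: a Poisson count has variance equal to its mean, and for a large mean roughly 95% of counts fall within two standard deviations of it. A sketch of that arithmetic, with illustrative variable names:

```python
from math import sqrt

# 4.2 defects per 75 m scaled up to 1000 m of film.
mean_defects = 4.2 / 75 * 1000
sd_defects = sqrt(mean_defects)          # Poisson: variance equals the mean

low = mean_defects - 2 * sd_defects
high = mean_defects + 2 * sd_defects

print(f"mean = {mean_defects:.0f}, sd = {sd_defects:.2f}, range = ({low:.0f}, {high:.0f})")
```

The mean is 56 and the standard deviation is about 7.5, giving a range of roughly 41 to 71.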

4. The number of traffic accidents per week in a small city has a Poisson
distribution with mean equal to 1.3. What is the probability of at least
two accidents in 2 weeks?
(a) 0.2510
(b) 0.3732
(c) 0.5184
(d) 0.7326
(e) 0.4816
Solution: d

5. The number of traffic accidents per week in a small city has Poisson dis-
tribution with mean equal to 3. What is the probability of at least one
accident in 2 weeks?


(a) 0.0174
(b) 0.9502
(c) 0.9975
(d) 0.1991
(e) 0.0025
Solution: c

6. Significant birth defects occur at a rate of about 4 per 1000 births in human
populations. After a nuclear accident, there were 10 defects observed in
the next 1500 births. Find the probability of observing at least 10 defects
in this sample if the rate had not changed after the accident.
(a) .008
(b) .003
(c) .041
(d) .084
(e) .042
Solution: d
Past performance 1990 Oct - 58%
Past performance 1991 Dec - 66% (c-17%)
Past performance 1996 Nov - 79% (c-12%)
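
An upper-tail probability such as "at least 10" is most easily found as one minus the cumulative probability up to 9. A minimal sketch under the question's assumption that the rate is unchanged; the helper name `poisson_cdf` is illustrative.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson count with mean lam."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

# 4 defects per 1000 births, 1500 births -> expected 6 defects.
expected_defects = 4 / 1000 * 1500
p_at_least_10 = 1 - poisson_cdf(9, expected_defects)

print(f"P(X >= 10) = {p_at_least_10:.3f}")
# prints: P(X >= 10) = 0.084
```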

7. Refer to the previous question. An approximate 95% interval for the
number of defects that would occur in 1500 births (assuming that the rate
has not changed) is:

(a) (4, 8)
(b) (2, 10)
(c) (2, 6)
(d) (0, 8)
(e) (0, 12)
Solution: b
Past performance 1990 Oct - 78%
Past performance 1996 Dec - 77% (10%-a)

8. In a certain communications system, there is an average of 1 transmission
error per 10 seconds. Let the distribution of transmission errors be
Poisson. What is the probability of more than 1 error in a communication
one-half minute in duration?


(a) 0.950
(b) 0.262
(c) 0.738
(d) 0.199
(e) 0.801
Solution: e

9. Bacteria in hamburger are distributed throughout the meat. Suppose
that a large batch of hamburger has an average contamination of 0.3
bacteria/gram. Then the probability that a 10 gram sample will contain one
or fewer bacteria is:

(a) .2222
(b) .7408
(c) .9603
(d) .1494
(e) .1992

Solution: e
Past performance 1989 Oct - 89%
Past performance 1991 Oct - 84%
Past performance 1997 Aug - 92%

10. Refer to the previous question. A 95% range for the likely number of
bacteria present in a 100 g sample is:
(a) 30 ± 30.0
(b) 30 ± 5.5
(c) 30 ± 11.0
(d) 30 ± 16.4
(e) 30 ± 2.8
Solution: c
Past performance 1989 Oct - 77%
Past performance 1991 Oct - 71% (19% b)
Past performance 1997 Aug - 85%

11. The number of bacteria in a drop of water from a lake has a Poisson
distribution with an average of 0.5 bacteria/drop. A small dish containing
four drops of water from the lake is placed under a microscope. The
probability of observing at most one bacteria in the sample is


(a) 0.910
(b) 0.406
(c) 0.271
(d) 0.135
(e) 0.303
Solution: b
Past performance 1989 Dec - 75%
Past performance 1992 Oct - 82%
Past performance 2006 Dec - 74% (11%-a;)

12. Refer to the previous question. An approximate 95% range for the number
of bacteria present in 400 drops of water is:
(a) (171,229)
(b) (361,439)
(c) (185,215)
(d) (157,243)
(e) (0,400)
Solution: a
Past performance 1989 Dec - 70%
Past performance 1992 Oct - 87%
Past performance 2006 Dec - 75% (16%-c)

13. Which of the following is NOT applicable to a Poisson Distribution?


(a) It is used to compute the probability of rare events.
(b) Every event is independent of every other event.
(c) It is parameterized by the sample size and the probability that an
event will occur.
(d) The theoretical range for the number of events that could occur is
0,1,2,3, ...
(e) In order to compute the parameter value, we need to know the stan-
dardized rate and the sample size.
Solution: c
Past performance 1996 Nov - 56% (25%-d; 14%-e)

14. In a biological cell the average number of genes that will change into
mutant genes, when treated radioactively, is 2.4. Assuming a Poisson
probability distribution, find the probability that there are at most 3 mutant
genes in a biological cell after the radioactive treatment.


(a) .2090
(b) .7576
(c) .5697
(d) .7787
(e) 1.000
Solution: d

15. The number of telephone calls that pass through a switchboard has a
Poisson distribution with mean equal to 2 per minute. The probability
that no telephone calls pass through the switch board in two consecutive
minutes is:
(a) 0.2707
(b) 0.0517
(c) 0.0183
(d) 0.0366
(e) 0.1353
Solution: c

16. The distribution of phone calls arriving in one minute periods at a switch-
board is assumed to be Poisson with the parameter λ. During 100 periods,
the following distribution was obtained:

# (calls)     0    1    2    3    4 or more
Frequency    30   43   21    6    0

An estimate for λ based on this data set is:


(a) 1.00
(b) 1.03
(c) 1.04
(d) 1.33
(e) 1.37
Solution: b
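
The natural estimate of λ is the sample mean number of calls per period, which can be computed from the frequency table as a weighted average. A short sketch, assuming the "4 or more" cell (frequency 0) contributes nothing; the dictionary name is illustrative.

```python
# Observed frequencies from the table: calls per one-minute period -> number of periods.
frequency = {0: 30, 1: 43, 2: 21, 3: 6}

n_periods = sum(frequency.values())
total_calls = sum(calls * count for calls, count in frequency.items())
lambda_hat = total_calls / n_periods

print(f"{total_calls} calls in {n_periods} periods, estimate = {lambda_hat:.2f}")
# prints: 103 calls in 100 periods, estimate = 1.03
```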

17. A can company reports that the number of breakdowns per 8-hour shift
on its machine-operated assembly line follows a Poisson distribution with
a mean of 1.5. Assuming that the machine operates independently across
shifts, what is the probability of no breakdowns during three consecutive
8-hour shifts?


(a) .0744
(b) .0498
(c) .6065
(d) .2231
(e) .0111
Solution: e

18. A fisherman arrives at his favorite fishing spot. From past experience
he knows that the number of fish he catches per hour follows a Poisson
distribution at 0.5 fish/hour. The probability that he catches at least 3
fish in four hours is:
(a) .0126
(b) .0144
(c) .1804
(d) .3233
(e) .8571
Solution: d

19. The number of arrivals per hour at an automatic teller machine is Poisson
distributed with a mean of 3.5 arrivals/hour. What is the probability that
more than three arrivals occur in an hour?
(a) .3209
(b) .4633
(c) .5367
(d) .6791
(e) .7246
Solution: b

20. The marketing manager of a company has noted that she usually receives
10 complaint calls during a week (consisting of five working days), and
that the calls occur at random. Let us suppose that the number of calls
during a week follows the Poisson distribution. The probability that she
gets five such calls in one day is:
(a) .0361
(b) .0378


(c) .9834
(d) .2000
(e) .5
Solution: a

21. Cataracts are a very rare birth defect. In Canada, they occur at a rate
of approximately 3 babies in every 100,000 births. In 1989, there were
approximately 57,000 births in BC. The probability that more than 5
babies will be born with cataracts is approximately:
(a) about .1080
(b) about .0295
(c) about .0216
(d) about .0080
(e) about .0839
Solution: d
Past performance 1998 Nov - 78% (13% a)
Past performance 2006 Nov - 82% (10% b)

22. The number of deaths due to stroke in the Vancouver region each year
varies randomly with a mean of about 555 deaths per year. Assuming
that the number of deaths has an approximate Poisson distribution, then
the probability that there will be at least 600 deaths due to stroke in any
one year is:
(a) about 1%
(b) about 32%
(c) about 16%
(d) about 5%
(e) about 2.5%
Solution: e
Past performance 1998 Nov - 41% (10% a; 14% b; 20% c; 15% d)
Past performance 2006 Nov - 84%
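
With a mean this large, the Poisson count is close to normal with standard deviation equal to the square root of the mean, so the question reduces to asking how many standard deviations 600 lies above 555. The sketch below is a rough normal approximation without a continuity correction, which is an assumption made here for simplicity.

```python
from math import sqrt
from statistics import NormalDist

mean_deaths = 555
sd_deaths = sqrt(mean_deaths)                 # Poisson: sd is sqrt(mean), about 23.6

z = (600 - mean_deaths) / sd_deaths           # about 1.9 standard deviations
approx = NormalDist(mu=mean_deaths, sigma=sd_deaths)
p_at_least_600 = 1 - approx.cdf(600)

print(f"z = {z:.2f}, approximate P(X >= 600) = {p_at_least_600:.3f}")
```

Since 600 is roughly two standard deviations above the mean, the upper-tail probability is close to 2.5%.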

23. The number of babies born with a particular severe eye defect each year
varies randomly, but at a rate of about 30/10,000 live births. Last year
there were about 15,000 live births. The approximate probability that
there will be more than 58 babies born with this eye defect is:
(a) about 16%


(b) about 5%
(c) about 1%
(d) about 0.5%
(e) about 2.5%

Solution: e
Past performance 1998 Dec - 65% (12% d)

Multiple Choice Questions
Correlation

1. A research study has reported that there is a correlation of r = −0.59
between the eye color (brown, green, blue) of an experimental animal and
the amount of nicotine that is fatal to the animal when consumed. This
indicates:
(a) nicotine is less harmful to one eye color than the others.
(b) the lethal dose of nicotine goes down as the eye color of the animal
changes.
(c) one must always consider the eye color of animals in making state-
ments about the effect of nicotine consumption.
(d) the researchers need to do further study to explain the causes of this
negative correlation.
(e) the researchers need to take a course in statistics because correlation
is not an appropriate measure of association in this situation.
Solution: e - correlation cannot be computed with nominal variables
Past performance 1997 Jun - 98%

2. If the correlation between body weight and annual income were high and
positive, we could conclude that:

(a) high incomes cause people to eat more food.


(b) low incomes cause people to eat less food.
(c) high income people tend to spend a greater proportion of their income
on food than low income people, on average.
(d) high income people tend to be heavier than low income people, on
average.
(e) high incomes cause people to gain weight.

Solution: d
Past performance 1991 Dec - 70% (c-25%)
Past performance 1993 Apr - 75% (c-25%)

3. A study found a correlation of r = −0.61 between the sex of a worker and
his or her income. You conclude that:

(a) women earn more than men on average.


(b) women earn less than men on average.
(c) an arithmetic mistake was made; this is not a possible value of r.
(d) this is nonsense because r makes no sense here.
(e) the correlation of −0.61 is not meaningful here because the relation-
ship between sex and income is likely nonlinear.

Solution: d
Past performance 1993 Feb - 60% (e-33%)

4. A study examined the relationship between the sepal length and sepal
width for two varieties of an exotic tropical plant. Varieties A and B are
represented by x’s and o’s, respectively, in the following plot:

Which of the following statements is FALSE?

(a) Considering variety A alone, there is a negative correlation between
sepal length and sepal width.
(b) Considering variety B alone, the least squares regression line for pre-
dicting sepal length from sepal width has a negative slope.
(c) Considering both varieties together, there is a positive correlation
between sepal length and sepal width.
(d) Considering each variety separately, there is a positive correlation
between sepal length and sepal width.
(e) Considering both varieties together, the least squares regression line
for predicting sepal length from sepal width has a positive slope.

Solution: d

5. From tax records, it is relatively easy to determine the amount of liquor
consumed per capita and the number of cigarettes consumed per capita
for each of the 10 provinces of Canada. These are plotted on a scatter
plot and a high positive correlation is found. Which of the following is
correct?

(a) This implies that heavy smoking causes people to drink more.
(b) This implies that heavy drinking causes people to smoke more.
(c) We cannot conclude cause and effect, but this also implies that there
is a high positive correlation between cigarette smoking and alcohol
consumption for individuals.
(d) This could be an example of a correlation caused by a common cause
because both activities are highly correlated with average family in-
come and average income varies widely among the provinces.
(e) We cannot conclude cause and effect, but this also implies that the
same individuals both smoke and consume liquor.

Solution: d
Past performance 1993 Feb - 44% (c-44%; e-10%)

6. The correlation coefficient provides:

(a) a measure of the extent to which changes in one variable cause
changes in another variable.
(b) a measure of the strength of the linear association between two cat-
egorical variables.
(c) a measure of the strength of the association (not necessarily linear)
between two categorical variables.
(d) a measure of the strength of the linear association between two quan-
titative variables.
(e) a measure of the strength of the linear association between a quanti-
tative variable and a categorical variable.

Solution: d

7. On May 11th, 50 randomly selected subjects had their systolic blood pres-
sure (SBP) recorded twice – the first time at about 9:00 a.m. and the
second time at about 2:00 p.m. If one were to examine the relationship
between the morning and afternoon readings, then one might expect:

(a) the correlation to be near zero, as the morning and afternoon readings
should be independent of one another.

(b) the correlation to be high and positive, as those with relatively high
readings in the morning will tend to have relatively high readings in
the afternoon.
(c) the correlation to be high and negative, as those with relatively high
readings in the morning will tend to have relatively low readings in
the afternoon.
(d) the correlation to be near zero, as correlation measures the strength
of the linear association.
(e) the correlation to be near zero, as blood pressure readings should
follow approximately a normal distribution.

Solution: b
Past performance 1996 Dec - 62% (23%-d)
Past performance 1998 Oct - 68%

8. Men tend to marry women who are slightly younger than themselves.
Suppose that every man married a woman who was exactly .5 of a year
younger than themselves. Which of the following is CORRECT?
(a) The correlation is −.5.
(b) The correlation is .5.
(c) The correlation is 1.
(d) The correlation is −1.
(e) The correlation is 0
Solution: c - Draw a scatterplot of various aged men and their wives
Past performance 2006 Oct - 75% (10%-e)
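
The scatterplot suggested in the solution can be backed up numerically: when every wife's age is the husband's age minus a constant, the points fall exactly on a line with positive slope. A minimal sketch follows; the ages are hypothetical values chosen for illustration, and `pearson_r` is an illustrative helper.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

husband_ages = [24, 31, 38, 45, 52, 67]            # hypothetical ages
wife_ages = [age - 0.5 for age in husband_ages]    # each wife exactly 0.5 years younger

r = pearson_r(husband_ages, wife_ages)
print(f"r = {r:.3f}")
```

The size of the constant shift (0.5 years) does not affect r; only the exact linear relationship matters.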

Multiple Choice Questions
Least squares

1. Given that we have collected pairs of observations on two variables X
and Y, we would consider fitting a straight line with X as an explanatory
variable if:

(a) the change in Y is an additive constant.


(b) the change in Y is a constant for each unit change in X
(c) the change in Y is a fixed percent of Y
(d) the change in Y is exponential
(e) none of the above

Solution: b

2. The least squares regression line is the line:


(a) which is determined by use of a function of the distance between the
observed Y's and the predicted Y's.
(b) which has the smallest sum of the squared residuals of any line
through the data values.
(c) for which the sum of the residuals about the line is zero.
(d) which has all of the above properties
(e) which has none of the above properties.
Solution: b

3. The following information was obtained from the manager of a city water
department for predicting the consumption of water (in gallons) from the
size of household:

Household size (x)    Water used (y)
         2                  650
         7                 1200
         9                 1300
         4                  430
        12                 1400
         6                  900
         9                 1800
         3                  640
         3                  793
         2                  925

Here are the summary statistics:
ΣX = 57,
ΣY = 10,038,
ΣX² = 433,
ΣY² = 11,641,474,
ΣXY = 67,669

The equation of the least squares regression for water consumption on
household size is given by:

(a) Ŷ = 97053.7 + 96.692X
(b) Ŷ = 999.220 + 0.803X
(c) Ŷ = −1.0028 + 0.0067X
(d) Ŷ = 452.66 + 96.692X
(e) Ŷ = 1003.8 − 96.692X

Solution: not available
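
Although no key is given, the least squares line can be worked out from the data in the table using the standard formulas slope = (ΣXY − ΣXΣY/n) / (ΣX² − (ΣX)²/n) and intercept = Ȳ − slope·X̄. The sketch below is only an arithmetic check with illustrative variable names; it is not taken from the source.

```python
# Data from the table in question 3.
household_size = [2, 7, 9, 4, 12, 6, 9, 3, 3, 2]
water_used = [650, 1200, 1300, 430, 1400, 900, 1800, 640, 793, 925]

n = len(household_size)
sum_x = sum(household_size)
sum_y = sum(water_used)
sum_xx = sum(x * x for x in household_size)
sum_xy = sum(x * y for x, y in zip(household_size, water_used))

# Least squares slope and intercept from the summary sums.
slope = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
intercept = sum_y / n - slope * sum_x / n

print(f"slope = {slope:.3f}, intercept = {intercept:.2f}")
```

The sums reproduce the summary statistics listed in the question, and the fitted coefficients come out at about 96.692 and 452.66, which corresponds to option (d).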

4. For children between the ages of 18 months and 29 months, there is
approximately a linear relationship between “height” and “age”. The relationship
can be represented by: Ŷ = 64.93 + 0.63X, where Y represents height
(in centimetres) and X represents age (in months). Joseph is 22.5 months
old and is 80 centimetres tall. What is Joseph’s residual?

(a) 79.1
(b) -0.9
(c) +0.9
(d) 56.6
(e) 64.93

Solution: c
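
A residual is the observed value minus the value predicted by the fitted line, so its sign tells whether the observation lies above or below the line. A small sketch of the calculation for Joseph, with illustrative names:

```python
def predicted_height(age_months):
    """Fitted line from the question: height (cm) as a function of age (months)."""
    return 64.93 + 0.63 * age_months

observed = 80.0
predicted = predicted_height(22.5)
residual = observed - predicted      # observed minus predicted

print(f"predicted = {predicted:.3f} cm, residual = {residual:+.3f} cm")
```

The predicted height is about 79.1 cm, so the residual is about +0.9 cm: Joseph is slightly taller than the line predicts.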

5. For children, there is approximately a linear relationship between “height”
and “age”. One child was measured monthly. Her height was 75 cm at 3
years of age and 85 cm when she was measured 18 months later. A
least-squares line was fit to her data. The slope of this line is approximately:

(a) 0.55 cm/m


(b) 10 cm/m
(c) 25 cm/m
(d) 1.57 cm/m
(e) 2.1 cm/m

Solution: a
Past performance 1993 Feb - 72% (b-16%)
Past performance 1996 Oct - 96%

6. There is an approximate linear relationship between the height of females
and their age (from 5 to 18 years) described by:

height = 50.3 + 6.01(age)

where height is measured in cm and age in years. Which of the following
is not correct?

(a) The estimated slope is 6.01 which implies that children increase by
about 6 cm for each year they grow older.
(b) The estimated height of a child who is 10 years old is about 110 cm.
(c) The estimated intercept is 50.3 cm which implies that children reach
this height when they are 50.3/6.01=8.4 years old.
(d) The average height of children when they are 5 years old is about
50% of the average height when they are 18 years old.
(e) My niece is about 8 years old and is about 115 cm tall. She is taller
than average.

Solution: c
Past performance 1993 Apr - 83%
Past performance 1997 Jun - 96%

7. A study was conducted to examine the quality of fish after seven days in
ice storage. For this study:

Y = measurement of fish quality (on a 10 point scale with 10 = BEST.)
X = # of hours after being caught that the fish were packed in ice.

The sample linear regression line is: Ŷ = 8.5 − .5X. From this we can say
that:

(a) A one hour delay in packing the fish in ice decreases the estimated
quality by .5
(b) A one hour delay in packing the fish in ice increases the estimated
quality by .5
(c) If the estimated quality increases by 1 then the fish have been packed
in ice one hour sooner.
(d) If the estimated quality increases by 1 the fish have been packed in
ice two hours later.
(e) Can’t really say until we see a plot of the data.

Solution: a

8. The yield of a grain, Y (t/ha), appears to be linearly related to the amount
of fertilizer applied, X (kg/ha). An experiment was conducted by applying
different amounts of fertilizer (0 to 10 kg/ha) to plots of land and
measuring the resulting yields. The following estimated regression line
was obtained:

estimated yield = 4.85 + .05(fertilizer)

Which of the following is not correct?

(a) If no fertilizer was used, the yield is estimated to be 4.85 t/ha.


(b) If fertilizer is applied at 10 kg/ha, the estimated yield is 5.35 t/ha.
(c) For every additional kg/ha of fertilizer applied, the yield is estimated
to increase 0.05 t/ha.
(d) To obtain an estimated yield of 5.2 t/ha., you need to apply 7.0 kg/ha
of fertilizer.
(e) If the current level of fertilizer is changed from 7.0 to 9.0 kg/ha, the
yield is estimated to increase by 0.20 t/ha.

Solution: e
Past performance 1991 Apr - 96%

The following three questions refer to the following situation:


Growth hormones are often used to increase the weight gain of chickens.
In an experiment using 15 chickens, five different doses of growth hormone
(0, .2, .4, .8, and 1.0 mg/kg) were injected into chickens (three for each

dose) and the subsequent weight gain was recorded. An experimenter
plots the data and finds that a linear relationship appears to hold. The
output from SAS follows:

SOURCE             DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PR > F
MODEL               1        78.4083         78.4083        8.11      .0137
ERROR              13       125.7410          9.6723
CORRECTED TOTAL    14       204.1493

                           T FOR H0:                   STD ERROR OF
PARAMETER     ESTIMATE    PARAMETER=0     PR > |T|       ESTIMATE
INTERCEPT       3.7816        3.23         0.0066         1.1705
DOSE            4.0416        2.85         0.0137         1.4195

9. The fitted regression line is:

(a) Ŷ = 4.04 + 3.78X
(b) Ŷ = 3.23 + 2.85X
(c) Ŷ = 2.85 + 3.23X
(d) Ŷ = 3.78 + 4.04X
(e) Ŷ = 1.17 + 1.42X

Solution: d
Past performance 1989 Apr - 83%
Past performance 1990 Dec - 97%
Past performance 1996 Dec - 84%

10. A 95% confidence interval for the slope is:

(a) 4.04 ± 1.96(1.42)


(b) 4.04 ± 1.77(1.42)
(c) 4.04 ± 2.16(1.42)
(d) 3.78 ± 1.77(1.17)
(e) 3.78 ± 2.16(1.17)

Solution: c
Past performance 1989 Apr - 50% (A-32%)
Past performance 1990 Dec - 90%
Past performance 1996 Dec - 86%
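
The interval has the form estimate ± t × standard error, where the t multiplier uses the error degrees of freedom (13 here), not the normal value 1.96. The sketch below plugs in the estimates from the SAS output; the critical value 2.160 is taken from a standard t table for 13 degrees of freedom and is hard-coded as an assumption rather than computed.

```python
slope = 4.0416        # DOSE estimate from the output
se_slope = 1.4195     # its standard error
t_crit = 2.160        # 0.975 quantile of t with 13 df (from a t table, assumed)

margin = t_crit * se_slope
ci_low, ci_high = slope - margin, slope + margin

print(f"95% CI for the slope: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval runs from roughly 0.98 to 7.11 and excludes zero, consistent with the two-sided p-value of .0137 in the output.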

11. It is suspected that weight gain should increase with dose. An appropriate
null and alternate hypothesis to test the slope, the test statistic, and the
p-value are:

(a) H: β1 = 0 A: β1 ≠ 0; T* = 2.85; p-value = .0069
(b) H: β0 = 0 A: β0 ≠ 0; T* = 3.23; p-value = .0066
(c) H: β1 = 0 A: β1 > 0; T* = 2.85; p-value = .0137
(d) H: β0 = 0 A: β0 > 0; T* = 3.23; p-value = .0033
(e) H: β1 = 0 A: β1 > 0; T* = 2.85; p-value = .0069

Solution: e
Past performance 1989 Apr - 49% (C-31%)
Past performance 1996 Dec - 82%

The following three questions refer to the following situation:


Growth hormones are often used to increase the weight gain of chickens.
In an experiment using 15 chickens, five different doses of growth hormone
(0, .2, .4, .8, and 1.0 mg/kg) were injected into chickens (three for each
dose) and the subsequent weight gain was recorded. An experimenter
plots the data and finds that a linear relationship appears to hold. The
output from JMP follows:

12. The fitted regression line is:


(a) Ŷ = 4.55 + .617X
(b) Ŷ = 4.83 + 4.55X
(c) Ŷ = 4.83 + 1.02X
(d) Ŷ = 4.55 + 4.75X
(e) Ŷ = 4.55 + 4.83X
Solution: e
Past performance 1996 Dec - 84%

13. An approximate 95% confidence interval for the slope is:


(a) 4.55 ± .617
(b) 4.83 ± 2.03
(c) 4.83 ± 1.02
(d) 4.55 ± 1.33
(e) 4.83 ± 4.75

Solution: b
Past performance 1996 Dec - 86%

14. It is suspected that the weight gain should increase with dose. An appro-
priate null and alternate hypothesis to test the slope, the test statistic,
and the p-value are:
(a) H: β1 = 0, A: β1 ≠ 0; T* = 7.37; p-value < .0001.
(b) H: β0 = 0, A: β0 ≠ 0; T* = 4.75; p-value = .0004.
(c) H: b1 = 0 A: b1 > 0 T* = 7.37; p-value = .0002.
(d) H: b0 = 0 A: b0 > 0 T* = 4.75; p-value = .0002.
(e) H: β1 = 0, A: β1 > 0; T* = 4.75; p-value = .0002.
Solution: e
Past performance 1996 Dec - 82%

15. A botanist investigates the relationship between Y, the heights of seedlings
(in inches), and X, the number of weeks after planting. The summary
data are: n = 6, X̄ = 4.67, Ȳ = 9.467, ΣX² = 154, ΣY² = 696.54,
ΣXY = 325.9. The fitted regression line for seedling height on the number
of weeks after planting is:
(a) Ŷ = 2.8 + 2.62X
(b) Ŷ = −2.8 + 2.62X
(c) Ŷ = 2.62 + 2.8X
(d) Ŷ = 9.5 + 2.62X
(e) Ŷ = 2.62X
Solution: b
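
When the summary gives means rather than totals, the slope can be computed as (ΣXY − n·X̄·Ȳ) / (ΣX² − n·X̄²) and the intercept as Ȳ − slope·X̄. The sketch below assumes that the values 4.67 and 9.467 in the question are the sample means of X and Y, which is the reading that reproduces the listed options.

```python
n = 6
mean_x, mean_y = 4.67, 9.467      # assumed to be the sample means of X and Y
sum_xx = 154.0                    # sum of X squared
sum_xy = 325.9                    # sum of X times Y

sxy = sum_xy - n * mean_x * mean_y    # corrected sum of cross-products
sxx = sum_xx - n * mean_x ** 2        # corrected sum of squares for X

slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The slope is about 2.62 and the intercept about −2.8 after rounding, so the negative intercept is what separates option (b) from option (a).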

16. Refer to the previous question. If the number of weeks after planting
ranged from 2 to 8, what is the predicted height for a seedling after 12
weeks?

(a) Should not be determined because the relationship between Y and
X may not be linear beyond 8 weeks.
(b) 9.467
(c) 24.804
(d) 28.584
(e) 31.284

Solution: a

17. A research group was interested in predicting the number of bus riders per
capita in census districts. They felt that the ridership per capita, Y, could
be predicted using the average income, X, for the census district. A sample
of 29 census districts was taken and the observations on the samples were
used to obtain n = 29, Ȳ = 62.1429, X̄ = 3452.178; Σ(X − X̄)(Y − Ȳ) = 189,312.0;
Σ(X − X̄)² = 19,910,691.0; Σ(Y − Ȳ)² = 13,369.381;
MSE = 428.5. Based on this data, a 98% confidence interval for β1 is:

(a) .0095 ± (2.473)(20.7894)


(b) .0095 ± (2.33)(.0046)
(c) .0095 ± (2.33)(20.7894)
(d) .0095 ± (2.467)(.0046)
(e) .0095 ± (2.473)(.0046)

Solution: e
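
Each piece of the interval follows from the summary values: the slope is Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)², its standard error is sqrt(MSE / Σ(X − X̄)²), and the multiplier is a t value with n − 2 = 27 degrees of freedom. In the sketch below the critical value 2.473 is taken from a t table (0.99 quantile, 27 df) and hard-coded as an assumption.

```python
from math import sqrt

n = 29
sxy = 189_312.0          # sum of (X - mean X)(Y - mean Y)
sxx = 19_910_691.0       # sum of (X - mean X) squared
mse = 428.5

slope = sxy / sxx
se_slope = sqrt(mse / sxx)
df = n - 2               # 27 degrees of freedom for simple linear regression
t_crit = 2.473           # 0.99 quantile of t with 27 df (from a t table, assumed)

margin = t_crit * se_slope
print(f"slope = {slope:.4f}, se = {se_slope:.4f}, 98% CI = {slope:.4f} ± {margin:.4f}")
```

The standard error .0046 and the t multiplier 2.473 (rather than the normal value 2.33) are what distinguish option (e) from the others.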

The following five questions refer to the following situation:


The effects of a toxic pollutant upon fish was examined by placing fish in
a two liter solution of water with various concentrations of the pollutant.
The time (in minutes) until the fish showed distress was recorded at which
time the fish were removed from the container. A total of 18 different
experiments were performed. Note that the pollutant is measured on a
logarithmic scale where a change of one unit represents an increase of 10
fold in the pollution concentration. A preliminary plot of the data showed
that the relationship of time vs. log(pollution) was approximately linear.
The output appears below:

SOURCE         DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PR > F
MODEL           1      2.21459712      2.21459712      5.49      0.0324
ERROR          16      6.45556062      0.40347254
CORR. TOTAL    17      8.67015774

                           T FOR H0:                   STD ERROR OF
PARAMETER     ESTIMATE    PARAMETER=0     PR > |T|       ESTIMATE
INTERCEPT       7.5641        3.82         0.0015          1.978
LOGPOLLUT      -1.0269       -2.34         0.0324          0.438

18. The fitted regression line is:

(a) Ŷ = −1.03 + 7.56X
(b) Ŷ = 7.56 − 1.03X
(c) Ŷ = 3.28 − 2.34X
(d) Ŷ = 7.56 − 10.27X
(e) Ŷ = −1.03 + 75.64X

Solution: b
Past performance 1990 Apr - 89%
Past performance 1991 Dec - 93%

19. A 95% confidence interval for the slope is:

(a) 7.56 ± 1.96(1.978)


(b) −1.03 ± 1.96(0.438)
(c) 7.56 ± 2.1098(1.978)
(d) −1.03 ± 2.1098(.438)
(e) −1.03 ± 2.1199(.438)

Solution: e
Past performance 1990 Apr - 72%(D-14%)
Past performance 1991 Dec - 88%

20. An appropriate null and alternate hypothesis to test the slope, the test
statistic, and the p-value are:

(a) H: β1 = 0 A: β1 ≠ 0; T* = -2.34; p-value = .0324
(b) H: β0 = 0 A: β0 ≠ 0; T* = 3.82; p-value = .0007
(c) H: β1 = 0 A: β1 < 0; T* = -2.34; p-value = .0324
(d) H: β0 = 0 A: β0 ≠ 0; T* = 3.82; p-value = .0015
(e) H: β1 = 0 A: β1 < 0; T* = -2.34; p-value = .0162

Solution: e
Past performance 1990 Apr - 48% (A-24%, C-18%)

21. Removed because badly worded.


22. A similar experiment was performed using a second pollutant. The
estimated regression line is found to be Ŷ = 27.63 − 2.03X. Which of the
following is NOT CORRECT?

(a) If the concentration of the pollutant is increased 100 times (represented
by an increase of 2 on the logarithmic scale), the average time
to distress decreases by 4.06 minutes.

(b) In order to obtain an estimated time to distress of 25 minutes, the
log(concentration) of the pollutant should be 1.30.
(c) A ten-fold increase in pollution (represented by an increase of one
unit on the log scale) decreases the time to distress by 20.3 minutes.
(d) It would be inadvisable to extrapolate the line outside of the
observed values of the pollutant concentration.
(e) The method of least squares is often used to obtain the estimates of
the slope and intercept.

Solution: c
Past performance 1990 Apr - 70% (A-11%, B-12%)
Past performance 1991 Dec - 56% (a-17%, b-17%)
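
The numerical claims in options (a) to (c) can be checked by evaluating the fitted line at a few points on the log scale. A minimal sketch, with `time_to_distress` as an illustrative helper name:

```python
def time_to_distress(log_conc):
    """Fitted line from question 22: estimated minutes to distress."""
    return 27.63 - 2.03 * log_conc

# Option (a): a 100-fold increase is +2 on the log scale.
change_100_fold = time_to_distress(2) - time_to_distress(0)

# Option (b): solve 25 = 27.63 - 2.03 * x for x.
log_conc_for_25 = (27.63 - 25) / 2.03

# Option (c): a 10-fold increase is +1 on the log scale.
change_10_fold = time_to_distress(1) - time_to_distress(0)

print(f"(a) {change_100_fold:.2f} min, (b) x = {log_conc_for_25:.2f}, (c) {change_10_fold:.2f} min")
```

A ten-fold increase changes the estimated time by the slope, about −2.03 minutes, not −20.3, which is why option (c) is the incorrect statement.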

23. A similar experiment was performed using a third pollutant. A
scatterplot and the fitted regression line are shown below:

Which of the following is the best description of this plot?

(a) Ŷ = 20 - 2X; r = -0.6
(b) Ŷ = 20 - 4X; r = -0.6
(c) Ŷ = 20 - 2X; r = -0.9
(d) Ŷ = 20 - 4X; r = -0.9
(e) Ŷ = 20 - 2X; r = -0.3

Solution: a
Past performance 1990 Apr - 32% (B-12%, C-28%, E-23%)
Past performance 1991 Dec - 38% (b-13%, c-31%, e-11%)

The next five questions refer to the following situation:


One concern about the depletion of the ozone layer is that the increase
in UV light will decrease crop yields. An experiment was conducted in a
greenhouse where soybean plants were exposed to varying levels of UV
light, measured in Dobson units. At the end of the experiment the
yield (kg) was measured. A regression analysis was performed with the
following results:
Here is some output:

24. The least squares regression line is the line:


(a) which minimizes the sum of the squared differences between the ac-
tual UV values and the predicted UV values.
(b) which minimizes the sum of the squared residuals between the actual
yield and the predicted yield.
(c) which minimizes the sum of the squared differences between the actual
yield and the predicted UV.
(d) which minimizes the sum of the squared residuals between the actual
UV reading and the predicted UV reading.
(e) which minimizes the total variation in the data.
Solution: b
Past performance 1993 Apr - 36% (a-14%; c-25%; e-18%)
Past performance 1997 Aug - 60% (a-15%; d-15%)
Past performance 2006 Oct - 60% (c-15%; e-10%)

25. Which of the following is correct?


(a) If the UV reading is increased by 1 Dobson unit, the yield is expected
to increase by .0463 kg.
(b) If the yield increases by 1 kg, the UV reading is expected to decline
by .0463 Dobson units.
(c) The estimated yield is 3.98 kg when the UV reading is 0 Dobson
units.
(d) The predicted yield is 4.3 kg when the UV reading is 20 Dobson units.
(e) The t-ratios are used to test if the estimated slope is different from
zero.
Solution: c
Past performance 1993 Apr - xx% (b-42%; e-10%)
Past performance 1997 Aug - xx% (b-14%)
Past performance 2006 Oct - 86% (b-14%)

26. A 95% confidence interval for the slope will be centered on the estimated
slope and:

(a) ±0.011
(b) ±0.108
(c) ±0.054
(d) ±0.046
(e) ±0.021

Solution: e
Past performance 1993 Apr - 37% (a-18%; c-18%; d-20%)
Past performance 1997 Aug - 87%

27. The null and alternate hypotheses for a test of the slope, the test statistic,
and the p-value are:

(a) H: β1 = 0; A: β1 ≠ 0; T* = -4.31; p-value = .0008.
(b) H: β0 = 0; A: β0 < 0; T* = -74.01; p-value < .0001.
(c) H: β1 = 0; A: β1 < 0; T* = -4.31; p-value = .0004.
(d) H: β̂1 = 0; A: β̂1 < 0; T* = -4.31; p-value = .0004.
(e) H: β̂1 = 0; A: β̂1 ≠ 0; T* = -4.31; p-value = .0008.

Solution: c
Past performance 1993 Apr - 72% (d-18%)
Past performance 1997 Aug - 74% (d-18%)

28. A 95% confidence interval for the mean yield when the UV reading is 20
Dobson units is:
(a) 3.3 ± 0.86
(b) 3.3 ± 2.12
(c) 3.3 ± 0.40
(d) 3.3 ± 0.98
(e) 3.3 ± 0.71
Solution: a
Past performance 1993 Apr - 23% (b-25%; c-22%; d-21%; e-10%)

29. Another experiment was conducted where the plants were sprayed with a
chemical that acts like a sun-screen. The following plot was obtained:

The estimated slope and intercept are:


(a) 0.06 1.10
(b) 1.10 0.06
(c) 0.10 0.06
(d) 0.06 0.10
(e) 0.10 1.10
Solution: d - note that the intercept is the value of Y when X = 0, but
the vertical axis does not occur at X = 0 in the above graph.
Past performance 1993 Apr - 11% (a-65%; d-10%; e-10%)
Past performance 2006 Oct - 47% (a-44%)

30. Consider the following scatter plot:

Which of the following provides the most reasonable approximation to the
least squares regression line?
(a) Ŷ = 50 + 10X
(b) Ŷ = 50 + X
(c) Ŷ = 10 + 50X
(d) Ŷ = 1 + 50X
(e) Ŷ = 10 + X
Solution: a
Past performance 1990 Dec - 80%

31. In simple linear regression the model that is being assumed relates the
dependent variable, Y, to the independent variable, X, according to the
following relationship: Yi = β0 + β1 Xi + εi, i = 1, 2, ..., n. For setting
up confidence interval statements for the parameter β1 based on the least
squares estimates, it is necessary to make the following assumption(s)
about the εi’s:

(a) they have expectation 0
(b) they are normally distributed
(c) they have a common variance, σ²
(d) all of the above.

(e) least squares is purely a mathematical technique so no assumptions
are required.

Solution: not available

32. A marine biologist wants to test the effect of water temperature on the
average dive duration for sea otters. Several otters are available for an
experiment. The biologist collects the following data:

Otter   Water Temp (C)   Dive Duration (sec)
              X                   Y
J2            4                  63
J1            8                  75
B7            8                  84
B9           12                  91
M3           12                 101
D4           16                 110
B8           20                 115

The summary statistics are: ΣX = 80, ΣY = 639, ΣX² = 1088,
ΣY² = 60457, ΣXY = 7888.
The least squares regression line is equal to:
(a) Ŷ = 3.4 + 52X
(b) Ŷ = 8.4 + 7.3X
(c) Ŷ = 4.7 + 21X
(d) Ŷ = 53 + 3.4X
(e) Ŷ = 50 − 3.3X
Solution: not available
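Since no solution is recorded for question 32, the following sketch computes the least squares line directly from the seven otter observations using the usual formulas slope = Sxy / Sxx and intercept = mean(Y) − slope × mean(X). It is only a check of the arithmetic; the variable names are illustrative.

```python
# Data from the table in question 32.
temps = [4, 8, 8, 12, 12, 16, 20]             # X: water temperature (C)
durations = [63, 75, 84, 91, 101, 110, 115]   # Y: dive duration (sec)

n = len(temps)
sum_x = sum(temps)
sum_y = sum(durations)
sum_x2 = sum(x * x for x in temps)
sum_xy = sum(x * y for x, y in zip(temps, durations))

# Corrected sums of squares and cross-products.
sxx = sum_x2 - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

slope = sxy / sxx
intercept = sum_y / n - slope * sum_x / n

print(f"sums: {sum_x}, {sum_y}, {sum_x2}, {sum_xy}")
print(f"fitted line: Y-hat = {intercept:.1f} + {slope:.2f} X")
```

The sums reproduce the summary statistics quoted in the question, and the fitted line comes out close to Ŷ = 53 + 3.4X, which matches option (d).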

33. An experiment was performed where students examined a set of circles.
For each circle they guessed the actual area, and then measured the actual
area. The scatter-plot had the guessed areas on the vertical axis and the
actual areas on the horizontal axis. A line was fit to these data
points. One student’s fitted line was Guessed area = 5 + .65 Actual area.
Which of the following is not correct?
(a) The student guessed that a circle has an area of 125 mm². A better
guess would be 86 mm².
(b) The slope in the above equation indicates that, on average, a student
increases her guess by only .65 mm² for every 1 mm² increase in
actual area.

(c) “Calibration” refers to the process where the relationship between the
guessed and real areas is used to correct future guesses.
(d) If the fitted regression line tends to fall below the “45° line”, then this
student tends to underestimate real areas.
(e) The fitted straight line was fit using “least squares”. This line mini-
mizes the sum of the square of the deviations between the actual and
predicted values.
Solution: a
Past performance 1997 Jun - 76%

34. A regression of the amount of calories in a serving of breakfast cereal
vs. the amount of fat gave the following results: Calories = 97.1053 +
9.6525 Fat. Which of the following is FALSE?

(a) It is estimated that for every additional gram of fat in the cereal, the
number of calories increases by about 9.
(b) It is estimated that in cereals with no fat, the total amount of calories
is about 97.
(c) If a cereal has 2 g of fat, then it is estimated that the total number
of calories is about 115.
(d) If a cereal has about 145 calories, then this equation indicates that
it has about 5 grams of fat.
(e) One cereal has 140 calories and 5 g of fat. Its residual is about 5 cal.

Solution: e - Residuals are computed as Observed (140) − Predicted (145) = −5
Past performance 1998 Oct - 55% (12% a; 13% b; 16% e)
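The sign convention for residuals (observed minus predicted) is what makes option (e) false. A minimal sketch using the fitted equation quoted in question 34, with an illustrative function name:

```python
def predicted_calories(fat_g):
    """Fitted line from question 34: Calories = 97.1053 + 9.6525 * Fat."""
    return 97.1053 + 9.6525 * fat_g


observed = 140          # calories for the cereal in option (e)
fat = 5                 # grams of fat for that cereal
residual = observed - predicted_calories(fat)   # observed minus predicted

print(f"predicted at 2 g fat: {predicted_calories(2):.1f}")
print(f"predicted at 5 g fat: {predicted_calories(fat):.1f}")
print(f"residual for the 140-calorie cereal: {residual:.1f}")
```

The residual is negative because the cereal has fewer calories than the line predicts for 5 g of fat.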

35. A selection of cereals was sampled and the number of calories was plotted
against the number of grams of protein with the following results:

Which of the following is NOT CORRECT?


(a) The 95% confidence interval for the number of calories per gram of
protein indicates that the known true value of 4 cal/gram may be
consistent with the data.

(b) It is estimated that cereals with no protein would have just over 100
calories/serving.
(c) The observed regression line is Y = 106.0 + .339(protein)
(d) One plausible reason that the confidence interval for the slope is so
wide is that confounding variables may cloud the relationship be-
tween calories and grams of protein.
(e) The standard error for the slope indicates how much the calories may
vary among different cereals in the sample.
Solution: e
Past performance 1998 Nov - 53% (15% a)

The following three questions are based upon the following:


Fitness can be measured by the rate of oxygen consumption during exer-
cises with more fit people having higher rates. Unfortunately, this mea-
surement is quite costly to obtain, and so an experiment was done to see
if this measurement could be predicted from the time it takes (in minutes)
to run 1500 m. The following output from JMP was obtained - the M and
F refer to males and females respectively.

36. Which of the following is NOT CORRECT?
(a) We are about 95% confident that the slope for this data is between
-4.0 and -2.5.
(b) The fitted regression line is approximately Ŷ = 82.42 − 3.31(runtime)
(c) There is good evidence that there is a relationship between oxygen
consumption and the run time.
(d) A person who runs 1500 m in 10 minutes would have an estimated
oxygen consumption rate of about 50.
(e) The se of .36 measures how much the estimated slope would vary if
another sample of people were measured.
Solution: a
Past performance 1998 Dec - 39% (16% c; 39% e)

37. Which of the following is correct?


(a) The most relevant null hypothesis is that the estimated change in
oxygen consumption for people who take an additional minute to
run 1500 m is 0.
(b) The most relevant null hypothesis is: H: β1 = −3.31.
(c) The most relevant null hypothesis is that there is no relationship
between the oxygen consumption rate and the time to run 1500 m
among all people.
(d) The most relevant null hypothesis is that we are 95% confident that
the slope is between -4.04 and -2.57.
(e) The most relevant null hypothesis is that we haven’t a clue what this
question is about.
Solution: c
Past performance 1998 Dec - 68% (18% a; 10% b)

38. In the above graph, both males and females appear to have the same
relationship. However, this is, in general, not true. If the relationship
for each group was not the same, then which of the following is NOT
CORRECT?
(a) The slope for the combined data could be substantially different than
either group’s slope.
(b) The intercept for the combined data could be substantially different
than either group’s intercept.
(c) The sample correlation in the combined group could be substantially
different than either group’s correlation.

(d) The combined results may be influenced by a lurking variable, in this
case gender.
(e) The median oxygen consumption for the combined group will be the
average of the medians of each group.
Solution: e
Past performance 1998 Dec - 82%

Multiple Choice Questions
Regression, Correlation, Trends

1. The best way to recognize whether or not a variable is growing
exponentially over time is by:

(a) plotting the variable against time and looking for a straight-line
pattern.
(b) calculating the least squares regression line of the variable against
time and examining the residuals.
(c) plotting the logarithm of the variable against time and looking for a
straight-line pattern.
(d) smoothing the time series by running medians of three or five.
(e) smoothing the scatter plot by median trace.

Solution: c

2. When looking at a sequence of monthly postal revenue data, we note


that the revenue is consistently highest in December. The high December
revenue is an illustration of:
(a) trend
(b) seasonal variation
(c) irregular fluctuations
(d) a cycle
(e) ??????
Solution: not available

3. The following data come from a time series of yearly sales of equipment
by a large manufacturer:

Year         1968  1969  1970  1971  1972  1973  1974
Units Sold    330   241   200   499   322   500   601

In order to smooth this series a running median of 3 is calculated. The
smoothed series for the years 1969 to 1973 respectively is:

(a) 200 200 200 322 322


(b) 330 499 499 500 601
(c) 241 241 322 499 500
(d) 257 313 340 440 474
(e) not enough information is given for us to determine the values.

Solution: not available
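A running median of 3 replaces each interior value by the median of itself and its two neighbours, so the first and last years have no smoothed value. The sketch below applies that definition to the sales series in question 3; the function name is illustrative.

```python
from statistics import median


def running_median3(values):
    """Median of each consecutive window of three values (interior points only)."""
    return [median(values[i - 1:i + 2]) for i in range(1, len(values) - 1)]


years = [1968, 1969, 1970, 1971, 1972, 1973, 1974]
units_sold = [330, 241, 200, 499, 322, 500, 601]

smoothed = running_median3(units_sold)

# Pair each smoothed value with the year at the centre of its window.
for year, value in zip(years[1:-1], smoothed):
    print(year, value)
```

For this series the smoothed values for 1969 to 1973 are 241, 241, 322, 499 and 500, which matches option (c).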

4. The following plot is the net sales (billions of dollars) for Eastman Kodak
Ltd. for the years 1970 through 1989 (1970 is coded as 0):

This plot is the graph of a(n) ______ and it shows that there is
a(n) ______ pattern in the data.

(a) experiment, exponential growth


(b) data set, stem and leaf
(c) linear model, correlation
(d) time series, trend
(e) regression model, multiple variable

Solution: not available

5. The potential growth of Gypsy moths, and the world-wide production of


oil in the last 100 years, can both be described as being:

(a) almost linear

(b) well represented by a straight line.
(c) approximately exponential growth.
(d) difficult to determine without detailed statistical analysis.
(e) regular with large residuals.

Solution: not available

Multiple Choice Questions
Sampling Distributions

1. The Gallup Poll has decided to increase the size of its random sample of
Canadian voters from about 1500 people to about 4000 people. The effect
of this increase is to:
(a) reduce the bias of the estimate.
(b) increase the standard error of the estimate.
(c) reduce the variability of the estimate.
(d) increase the confidence interval width for the parameter.
(e) have no effect because the population size is the same.
Solution: c
Past performance 1992 Dec - 65% (11%a, 16%e)
Past performance 1997 Jul - 92%

2. An airplane is only allowed a gross passenger weight of 8000 kg. If the


weights of passengers traveling by air between Toronto and Vancouver
have a mean of 78 kg and a standard deviation of 7 kg, the approximate
probability that the combined weight of 100 passengers will exceed 8,000
kg is:
(a) 0.4978
(b) 0.3987
(c) 0.1103
(d) 0.0044
(e) .0022
Solution: e
Past performance 1996 Nov - 84% (10%-b)
Past performance 1997 Aug - 73% (18%-b)
Past performance 1998 Nov - 85%
Past performance 1998 Dec - 88%
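Questions of this type rest on the fact that a total of n independent observations has mean n·µ and standard deviation √n·σ, after which a normal tail area is looked up. A minimal sketch, using only the standard library and illustrative function names:

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def prob_total_exceeds(limit, n, mean, sd):
    """Approximate P(sum of n independent observations > limit)."""
    total_mean = n * mean
    total_sd = math.sqrt(n) * sd
    z = (limit - total_mean) / total_sd
    return normal_upper_tail(z)


# Question 2: 100 passengers, mean 78 kg, sd 7 kg, limit 8000 kg.
p_overweight = prob_total_exceeds(8000, n=100, mean=78, sd=7)
print(f"P(total weight > 8000 kg) = {p_overweight:.4f}")
```

The result is about 0.002, in line with option (e); small differences from the tabulated answer come from rounding z before using a normal table. The same function can be reused for the other "total" questions in this set by changing the arguments.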

3. Government regulations indicate that the total weight of cargo in a certain
kind of airplane cannot exceed 330 kg. On a particular day a plane is
loaded with 100 boxes of goods. If the weight distribution for individual
boxes is normal with mean 3.2 kg and standard deviation 0.7 kg, what is
the probability that the regulations will NOT be met:
(a) 1.5%
(b) 92%
(c) 8%
(d) 15%
(e) 85%
Solution: c
Past performance 1997 Jul - 75%
Past performance 2006 Nov - 78%

4. The time required to assemble an electronic component is normally
distributed with a mean of 12 minutes and a standard deviation of 1.5 min.
Find the probability that the time required to assemble all nine components
(i.e. the total assembly time) is greater than 117 minutes.

(a) .2514
(b) .2486
(c) .4772
(d) .0228
(e) .0013

Solution: d

5. A wholesale distributor has found that the amount of a customer’s order


is a normal random variable with a mean of $200 and a standard deviation
of $50. What is the probability that the total amount in a random sample
of 20 orders is greater than $4500?

(a) .1915
(b) .0125
(c) .3085
(d) .0228
(e) .4875

Solution: b

6. A random sample of 100 observations is to be drawn from a population
with a mean of 40 and a standard deviation of 25. The probability that
the mean of the sample will exceed 45 is:

(a) 0.4772
(b) 0.4207
(c) 0.0793
(d) 0.0228
(e) not possible to compute, based on the information provided.

Solution: d

7. Which of the following statements is INCORRECT about the sampling


distribution of the sample mean:

(a) The standard error of the sample mean will decrease as the sample
size increases.
(b) The standard error of the sample mean is a measure of the variability
of the sample mean among repeated samples.
(c) The sample mean is unbiased for the true (unknown) population
mean.
(d) The sampling distribution shows how the sample mean will vary
among repeated samples.
(e) The sampling distribution shows how the sample was distributed
around the sample mean.

Solution: e
Past performance 1990 Dec - 40% (c-18%, d-24%)
Past performance 1991 Dec - 41% (a-10%, c-27%, d-18%)

8. The sample mean is an unbiased estimator for the population mean. This
means:

(a) The sample mean always equals the population mean.


(b) The average sample mean, over all possible samples, equals the pop-
ulation mean.
(c) The sample mean is always very close to the population mean.
(d) The sample mean will only vary a little from the population mean.
(e) The sample mean has a normal distribution.

Solution: b
Past performance 1989 Dec - 77%

9. Which of the following statements is NOT CORRECT?

(a) In a proper random sampling, every element of the population has a


known (and often equal) chance of being selected.
(b) The precision of a sample mean or sample proportion depends only
upon the sample size (and not the population size) in a proper ran-
dom sample.
(c) Convenience sampling often leads to biases in estimates because the
sample is often not representative of the population.
(d) If a sample of 1,000,000 families is randomly selected from all of
Canada (with about 8,000,000 families) and the average family in-
come is computed, then the true value of the family income for all
families in Canada is known.
(e) The sampling distribution of the sample mean describes how the
sample mean will vary among repeated samples.

Solution: d
Past performance 1989 Dec - 92%
Past performance 1990 Dec - 90%

10. The sampling distribution of X̄ refers to:

(a) the distribution of the various sample sizes which might be used in a
given study
(b) the distribution of the different possible values of the sample mean
together with their respective probabilities of occurrence
(c) the distribution of the values of the items in the population
(d) the distribution of the values of the items actually selected in a given
sample
(e) none of the above

Solution: b

11. The average monthly mortgage payment for recent home buyers in
Winnipeg is µ = $732, with standard deviation of σ = $421. A random sample
of 125 recent home buyers is selected. The approximate probability that
their average monthly mortgage payment will be more than $782 is:

(a) 0.9082
(b) 0.4522
(c) 0.4082
(d) 0.0478

(e) 0.0918

Solution: e

12. Cans of salmon have a nominal net weight of 250 g. However, due to
variation in the canning process, the actual net weight has an approximate
normal distribution with a mean of 255 g and a standard deviation of 10
g. According to Consumer Affairs, a sample of 16 tins should have less
than a 5% chance that the mean weight is less than 250 g. What is the
actual probability that a sample of 16 tins will have a mean weight less
than 250 g?

(a) .1915
(b) .3085
(c) .0228
(d) .4772
(e) .0500

Solution: c
Past performance 1993 Apr - 58% (b-32%)
Past performance 1996 Nov - 77% (b-19%)

13. The Central Limit Theorem states that:

(a) if n is large then the distribution of the sample can be approximated


closely by a normal curve
(b) if n is large, and if the population is normal, then the variance of the
sample mean must be small.
(c) if n is large, then the sampling distribution of the sample mean can
be approximated closely by a normal curve
(d) if n is large, and if the population is normal, then the sampling
distribution of the sample mean can be approximated closely by a
normal curve
(e) if n is large, then the variance of the sample must be small.

Solution: c
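The statement in option (c) can be illustrated by simulation. The sketch below draws repeated samples from a strongly skewed population and checks how often the sample mean falls within 1.96 standard errors of the population mean, as a normal approximation would predict. The exponential population, the sample size of 50, the number of repetitions and the seed are all assumptions chosen for illustration.

```python
import math
import random


def simulate_sample_means(n, reps, seed=0):
    """Sample means of `reps` samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


n = 50
means = simulate_sample_means(n=n, reps=5000)

# For Exponential(1), the population mean and standard deviation are both 1.
pop_mean, pop_sd = 1.0, 1.0
standard_error = pop_sd / math.sqrt(n)

within = sum(abs(m - pop_mean) <= 1.96 * standard_error for m in means) / len(means)
print(f"average of sample means: {sum(means) / len(means):.3f}")
print(f"fraction within 1.96 standard errors: {within:.3f}")
```

If the normal approximation is adequate, the reported fraction should be close to 0.95 even though the population itself is far from normal.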

14. A random sample of size n = 30 is taken from a population of size N = 300.


Which statement is generally correct?
(a) µ is an estimate of X̄; σ is an estimate of s.
(b) X̄ is an estimate of µ; s is an estimate of σ.

(c) µ is an estimate of X̄; s is an estimate of the standard deviation of
the sample mean.
(d) X̄ is an estimate of µ; s is an estimate of the standard deviation of
the sample mean.
(e) X̄ is an estimate of µ; s is the standard error of the sample mean.

Solution: b

15. The central limit theorem tells us that the sampling distribution of X̄ is
approximately normal. Which of the following conditions are necessary
for the theorem to be valid:

(a) The sample size has to be large.


(b) We have to be sampling from a normal population.
(c) The population has to be symmetric.
(d) Population variance has to be small
(e) Both A and C.

Solution: a

16. The Central Limit Theorem is important in Statistics because it allows us


to use the normal distribution to make inferences concerning the popula-
tion mean:

(a) provided that the population is normally distributed and the sample
size is reasonably large.
(b) provided that the population is normally distributed (for any sample
size).
(c) provided that the sample size is reasonably large (for any population).
(d) provided that the population is normally distributed and the popu-
lation variance is known (for any sample size).
(e) provided that the population size is reasonably large (whether the
population distribution is known or not).

Solution: c

17. The Central Limit Theorem is important in Statistics because:

(a) it tells us that large samples do not need to be selected.


(b) it guarantees that, when it applies, the samples that are drawn are
always randomly selected.

(c) it enables reasonably accurate probabilities to be determined for
events involving the sample average when the sample size is large
regardless of the distribution of the variable
(d) it tells us that if several samples have produced sample averages
which seem to be different than expected, the next sample average
will likely be close to its expected value.
(e) it is the basis for much of the theory that has been developed in the
area of discrete random variables and their probability distributions.
Solution: c

18. One class decided to estimate the proportion of cars that are red in a
parking lot. They took a random sample of the cars in the closest parking
lot to the class. Which of the following is NOT correct?
(a) Even though the sample was a random sample of cars in the parking
lot, the sample may not be representative of the population of cars
driven by SFU students because the decision to park in B-lot is a
self-selected sample.
(b) If another sample of cars was taken, it is likely that a different propor-
tion for Japanese made cars would be found. The set of all possible
values for the proportion is known as the sampling distribution.
(c) The confidence interval computed refers to the proportion of cars in
the sample that were red.
(d) The sample was a simple random sample from cars parked. This
means that every car in the lot had an equal chance of being selected.
(e) A convenience sample could be chosen by selecting the first 25 cars
in the parking lot that are closest to the Applied Science Building.
Solution: c
Past performance 1996 Nov - 82%

19. Recall in one assignment you surveyed cars in a parking lot to estimate
the proportion that were red or the proportion that were from a Japanese
manufacturer. Which of the following is NOT CORRECT?
(a) A convenience sample of the cars closest to the Applied Science build-
ing may give a biased estimate of the proportion of cars which are
from a Japanese manufacturer.
(b) Different students may get different answers for the proportion of
cars that are red.
(c) The sample proportion of cars that are red is an unbiased estimate of
the population proportion if the sampling is a simple random sample.

(d) A sample of 100 cars in a convenience sample is always better than
a sample of 20 cars from a proper random sample.
(e) A sample of 100 cars from a proper random sample will give more
precise estimates of the proportion of cars that are red than a sample
of 20 cars from a proper random sample.

Solution: d
Past performance 2006 Nov - 92%

20. Which statement is NOT CORRECT?


(a) The sample standard deviation measures variability of our sample
values.
(b) A larger sample will give answers that vary less from the true value
than smaller samples (assuming both are properly chosen).
(c) The sampling distribution describes how our estimate (answer) will
vary if a new sample is taken.
(d) The standard error measures how much our estimate (answer) may
vary if a new sample of the same size is chosen using the same sam-
pling method.
(e) A large sample size always gives unbiased estimators regardless of
how the sample is chosen.
Solution: e
Past performance 2006 Nov - 93%

Multiple Choice Questions
Hypothesis Testing - Introduction

1 Testing - Introduction
1. To determine the reliability of experts used in interpreting the results of
polygraph examinations in criminal investigations, 280 cases were studied.
The results were:

                           True Status
                        Innocent   Guilty
Examiner’s   Innocent      131        15
Decision     Guilty          9       125

If the hypotheses were H: suspect is innocent vs A: suspect is guilty, then
we could estimate the probability of making a type II error as:

(a) 15/280
(b) 9/280
(c) 15/140
(d) 9/140
(e) 15/146

Solution: c
The second column percentage is the probability that the examiner
concludes a person is or is not guilty given the person is guilty. This is what is
required for a Type II error, i.e. conditional upon the person really being
guilty.
Past performance 1993 Feb - 13% (a-65%; e-13%)
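The key point in this question is the denominator: error rates are conditional on the true status, so they are column proportions rather than proportions of all 280 cases. A minimal sketch using the counts from the table (the dictionary layout and names are illustrative):

```python
# counts[(examiner_decision, true_status)] from the polygraph table.
counts = {
    ("innocent", "innocent"): 131,
    ("innocent", "guilty"): 15,
    ("guilty", "innocent"): 9,
    ("guilty", "guilty"): 125,
}


def column_total(true_status):
    """Number of cases with the given true status."""
    return sum(c for (_, truth), c in counts.items() if truth == true_status)


# H: suspect is innocent. A Type II error is failing to reject H when the
# suspect is really guilty; a Type I error is rejecting H when the suspect
# is really innocent.
type_ii_rate = counts[("innocent", "guilty")] / column_total("guilty")
type_i_rate = counts[("guilty", "innocent")] / column_total("innocent")

print(f"estimated P(Type II error) = 15/140 = {type_ii_rate:.3f}")
print(f"estimated P(Type I error)  =  9/140 = {type_i_rate:.3f}")
```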

2. In hypothesis testing, β is the probability of committing an error of Type
II. The power of the test, 1 − β, is then:

(a) the probability of rejecting H0 when HA is true


(b) the probability of failing to reject H0 when HA is true


(c) the probability of failing to reject H0 when H0 is true
(d) the probability of rejecting H0 when H0 is true
(e) the probability of failing to reject H0 .

Solution: a

3. In a statistical test of hypothesis, what happens to the rejection region


when α, the level of significance, is reduced?

(a) The answer depends on the value of β.


(b) The rejection region is reduced in size.
(c) The rejection region is increased in size.
(d) The rejection region is unaltered.
(e) The answer depends on the form of the alternative hypothesis.

Solution: b

4. During the pre-flight check, Pilot Jones discovers a minor problem - a
warning light indicates that the fuel gauge may be broken. If Jones decides
to check the fuel level by hand, it will delay the flight by 45 minutes. If
Jones decides to ignore the warning, the aircraft may run out of fuel before
it gets to Gimli. In this situation, what would be:

i) the appropriate null hypothesis? and;


ii) a type I error?

(a) Null Hypothesis: assume that the warning can be ignored.


Type I error: decide to check the fuel by hand when there is in fact
enough fuel.
(b) Null Hypothesis: assume that the warning can be ignored.
Type I error: decide to ignore the warning when there is in fact not
enough fuel.
(c) Null Hypothesis: assume that the fuel should be checked by hand.
Type I error: decide to ignore the warning when there is in fact not
enough fuel.
(d) Null Hypothesis: assume that the fuel should be checked by hand.
Type I error: decide to check the fuel by hand when there is in fact
enough fuel.
(e) Null Hypothesis: assume that the aircraft is already late.
Type I error: taking a commercial flight to Gimli in the first place.


Solution: a - treat the warning light as the “data”

5. Which of the following is not correct?

(a) The probability of a Type I error is controlled by the selection of the


α level.
(b) The probability of a Type II error is controlled by the sample size.
(c) The power of a test depends upon the sample size and the distance
between the null and alternate hypothesis.
(d) The p-value measures the probability that the null hypothesis is true.
(e) The rejection region is controlled by the α level and the alternate
hypothesis.

Solution: d
Past performance 1991 Apr - 55%

6. In testing statistical hypotheses, which of the following statements is false?

(a) The critical region is the values of the test statistic for which we
reject the null hypothesis.
(b) The level of significance is the probability of type I error.
(c) For testing H0 : µ = µ0 , HA : µ > µ0 , we reject H0 for high values of
the sample mean X̄.
(d) In testing H0 : µ = µ0 , HA : µ ≠ µ0 , the critical region is two sided.
(e) The p-value measures the probability that the null hypothesis is true.

Solution: e

7. Since α = probability of Type I error, then 1 − α =


(a) Probability of rejecting H0 when H0 is true.
(b) Probability of not rejecting H0 when H0 is true.
(c) Probability of not rejecting H0 when HA is true.
(d) Probability of rejecting H0 when HA is true
(e) 1 − β.
Solution: b

8. Consider the following table in reference to the testing of a null hypothesis:


              H0 true    H0 false
Accept H0       (1)        (2)
Reject H0       (3)        (4)

Which of the following is incorrect?

(a) Entries (1) and (4) are correct decisions.


(b) The P(making entry (2)) is controlled by the sample size for a given
α level.
(c) A Type I error occurs if entry (3) occurs.
(d) Power refers to P(entry (4))
(e) A Type II error occurs when entry (1) is made.

Solution: e
Past performance 1991 Feb - 66% (a-12%, c-12%)

9. In a hypothesis testing problem:

(a) the null hypothesis will not be rejected unless the data are not un-
usual (given that the hypothesis is true).
(b) the null hypothesis will not be rejected unless the p-value indicates
the data are very unusual (given that the hypothesis is true).
(c) the null hypothesis will not be rejected only if the probability of
observing the data provide convincing evidence that it is true.
(d) the null hypothesis is also called the research hypothesis; the alter-
native hypothesis often represents the status quo.
(e) the null hypothesis is the hypothesis that we would like to prove; the
alternative hypothesis is also called the research hypothesis.

Solution: b
Past performance 1993 Apr - 59% (c-26%; e-10%)
Past performance 1997 Aug - 93%

10. A research biologist has carried out an experiment on a random sample


of 15 experimental plots in a field. Following the collection of data, a
test of significance was conducted under appropriate null and alternative
hypotheses and the P-value was determined to be approximately .03. This
indicates that:

(a) this result is statistically significant at the .01 level.


(b) the probability of being wrong in this situation is only .03.
(c) there is some reason to believe that the null hypothesis is incorrect.


(d) if this experiment were repeated 3 per cent of the time we would get
this same result.
(e) the sample is so small that little confidence can be placed on the
result.

Solution: c
Past performance 1996 Dec - 82%
Past performance 1998 Nov - 80%

11. In a statistical test for the equality of a mean, such as H0 : µ = 10, if
α = 0.05,
(a) 95% of the time we will make an incorrect inference
(b) 5% of the time we will say that there is a real difference when there
is no difference
(c) 5% of the time we will say that there is no real difference when there
is a difference
(d) 95% of the time the null hypothesis will be correct
(e) 5% of the time we will make a correct inference
Solution: b
Note that (b) is a Type I error; (c) is a Type II error.
The α level controls the Type I error rate.

12. Which of the following statements is correct?


(a) An extremely small p-value indicates that the actual data differs
markedly from that expected if the null hypothesis were true.
(b) The p-value measures the probability that the hypothesis is true.
(c) The p-value measures the probability of making a Type II error.
(d) The larger the p-value, the stronger the evidence against the null
hypothesis
(e) A large p-value indicates that the data is consistent with the alter-
native hypothesis.
Solution: a
Past performance 1998 Dec - 87%

Multiple Choice Questions
Hypothesis Testing - Multinomial proportions
from a single sample

The next 5 questions refer to the following situation:

There are extensive breeding programs for salmon on the West Coast of
Canada to enhance the salmon fishery. One question of interest is whether
inbreeding affects subsequent fitness of the fish. An experiment was conducted
where released salmon were classified as unrelated if the parents were unrelated,
half-sibs if one of the parents was in common, and full-sibs if both parents
were in common. In one release, 25% of the fish were half-sibs, 40% were
unrelated, and 35% were full-sibs. Of 237 returning adult salmon, 45% were
unrelated, 25% were full-sibs, and 30% were half-sibs.

Here is some output from JMP:

1. The null hypothesis is:

(a) The return rate is independent of the relatedness of the fish.


(b) The return rate is dependent upon the relatedness of the fish.
(c) The return rates are 45%, 25%, and 30% for unrelated, full-sibs, and
half-sibs respectively.

(d) The return rates are 40%, 35%, and 25% for unrelated, full-sibs, and
half-sibs respectively.
(e) The release percentages are different from the return percentages.

Solution: d
(d) is preferred over (a) because the hypothesis of independence
is only applicable when there are two classification variables. Here
there is only one variable, the sibship. Also, the proportions that should
return when H is true are known exactly. In a contingency table
analysis, you test if the proportions are the same for all the groups,
but the actual proportions are unknown.
Past performance 1993 Apr - 33% (a-54%)
Past performance 1997 Aug - 82% (a-11%)

2. The value of the test-statistic is:

(a) 13.1
(b) 4.5
(c) 5.4
(d) 10.8
(e) 6.0

Solution: d
Past performance 1993 Apr - 73% (b-10%; c-10%)

3. The p-value is:


(a) < .005
(b) between .005 and .01
(c) between .01 and .02
(d) between .02 and .05
(e) > .05
Solution: a
Past performance 1993 Apr - 62% (c-14%; e-13%)

4. The expected number of half-sibs is:


(a) .29958
(b) 71
(c) 10.81
(d) 25%
(e) 59.25
Solution: e
Past performance 1997 Aug - 84%

5. The p-value is:


(a) .0034
(b) .0068
(c) .0090
(d) .0045
(e) .0022
Solution: d
Past performance 1997 Aug - 89%

The next two questions refer to the following situation:


The paper “Linkage Studies of the Tomato” (Trans. Royal Canad. Inst.
(1931)) reported the accompanying data on phenotypes resulting from
crossing tall cut-leaf tomatoes with dwarf potato-leaf tomatoes. We wish
to investigate if the frequencies below are consistent with the Mendelian
laws which state the phenotypes should occur in the ratio 9:3:3:1.

Phenotype    Tall, cut-leaf   Tall, potato-leaf   Dwarf, cut-leaf   Dwarf, potato-leaf
Frequency    926              288                 293               104

6. The computed test statistic is:

(a) 7.81
(b) 5.99
(c) 1.18
(d) 1.47
(e) 964.01

Solution: d
Past performance 1991 Apr - 90%

7. The hypothesis would be rejected at α = 0.05 if the test statistic exceeds:

(a) 7.81
(b) 5.99
(c) 3.84
(d) 9.49
(e) 11.07

Solution: a
Past performance 1991 Apr - 94%
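
The two tomato questions above can be checked numerically. The following is a minimal Python sketch and not part of the original question set; the function name `chi_square_gof` is illustrative. It computes the Pearson goodness-of-fit statistic, the sum of (observed - expected)^2 / expected, for the 9:3:3:1 ratio and compares it with the tabled critical value 7.81 for 3 degrees of freedom.

```python
def chi_square_gof(observed, ratios):
    """Pearson chi-square statistic for counts against hypothesized ratios."""
    total = sum(observed)
    ratio_sum = sum(ratios)
    # Expected count in each cell if the hypothesized ratios hold.
    expected = [total * r / ratio_sum for r in ratios]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Phenotype frequencies from the table above and the Mendelian 9:3:3:1 ratio.
tomato_counts = [926, 288, 293, 104]
statistic = chi_square_gof(tomato_counts, [9, 3, 3, 1])

# 7.81 is the tabled 0.05 critical value for 4 - 1 = 3 degrees of freedom.
reject = statistic > 7.81
print(round(statistic, 2), reject)
```

The statistic comes out near 1.47, well below 7.81, so the data are consistent with the 9:3:3:1 ratio.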

8. A die was rolled 30 times with the results shown below.

Number of spots | 1 2 3 4 5 6
Frequency | 1 4 9 9 2 5

If a chi-square goodness of fit test is used to test the hypothesis that the
die is fair at a significance level of α = 0.05, then the value of the chi-square
statistic and the decision reached are:

(a) 11.6; reject hypothesis


(b) 11.6; accept hypothesis
(c) 22.1; reject hypothesis
(d) 22.1; accept hypothesis
(e) 42.0; reject hypothesis

Solution: a

9. On a particularly difficult multiple-choice question having five choices,
the instructor suspected that all 300 students who answered the question
simply picked an answer at random. The distribution of students’ answers
to the question is as follows:

answer Frequency
A 68
B 53
C 61
D 75
E 43

It is desired to conduct a test involving the hypotheses H0 : p1 = p2 =
p3 = p4 = p5 = .2 and H1 : not all pi = .2, where pi denotes the probability
of choosing answer i. The value of the test statistic is:

(a) 11.60
(b) 10.47
(c) 190.76
(d) 310.47
(e) 48

Solution: b
Past performance 1989 Apr - 87%

10. The following table gives the number of wins for each of the first four post
positions at Assiniboine Downs for 80 races during the 1978 horse-racing
season.

Post Position 1 2 3 4
Number of wins 24 17 19 20

For testing the hypothesis that the probability of winning is the same for
all four post positions, the calculated value of the test statistic is:

(a) 26.00
(b) 1.25
(c) 1.30
(d) 0.40
(e) 20.00

Solution: c

The next two questions refer to the following situation:


A recent estimate by a large distributor of gasoline claims that 60% of all
cars stopping at their service stations chose unleaded gas and that super
unleaded and regular were each selected 20% of the time. In order to check
the validity of these proportions, a study was conducted of cars stopping
at the distributor’s service stations in a large city. The results were as
follows:

Gasoline Selected
Regular   Unleaded   Super Unleaded
51        261        88

11. The expected cell counts assuming the distributor’s claim is correct are:

(a) 100, 200, 100


(b) 51, 261, 88
(c) 80, 240, 80
(d) 133, 133, 133
(e) 20%, 60%, 20%

Solution: c

12. If α=0.05, then the value of the appropriate test statistic and the critical
value respectively are:

(a) 21.75, 5.99


(b) 13.15, 5.99
(c) 21.75, 7.81
(d) 13.15, 7.81
(e) 13.15, 7.38

Solution: b
Past performance 1990 Apr - 82%

The next three questions refer to the following situation:


A recent estimate by a large distributor of gasoline claims that 60% of all
cars stopping at their service stations chose unleaded gas and that super
unleaded and regular were each selected 20% of the time. In order to check
the validity of these proportions, a study was conducted of cars stopping
at the distributor’s service stations in a large city. The results were as
follows:

Gasoline Selected
Regular   Unleaded   Super Unleaded
51        261        88

Here is some output from JMP

13. The null hypothesis is:


(a) pregular = .333; punleaded = .333; psuper = .333
(b) pregular = .200; punleaded = .600; psuper = .200
(c) p̂regular = .200; p̂unleaded = .600; p̂super = .200
(d) gasoline selected is independent of the type of car
(e) the probability of each type of gasoline is equal

Solution: b
(d) is not valid because there is no classification by type of car in this
survey
Past performance 1996 Dec - 71% (12%-c)

14. The expected cell counts assuming the distributor’s claim is correct are:
(a) 100, 200, 100
(b) 51, 261, 88
(c) 80, 240, 80
(d) 133, 133, 133
(e) 20%, 60%, 20%
Solution: c
Past performance 1996 Dec - 93%

15. The value of the appropriate test statistic and approximate p-value,
respectively, are:

(a) 14.64, .0007
(b) 13.15, .0014
(c) 14.64, .00035
(d) 13.15, .0028
(e) 13.15, .0007

Solution: b
Past performance 1996 Dec - 73% (15%-d)
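
The gasoline answers above can be reproduced with a short calculation. This is a minimal Python sketch, not the JMP output referred to in the questions; the variable names are illustrative. It relies on the fact that, for 2 degrees of freedom, the chi-square upper-tail probability has the closed form exp(-x/2), so no statistical library is needed.

```python
import math

observed = [51, 261, 88]           # regular, unleaded, super unleaded
claimed = [0.20, 0.60, 0.20]       # distributor's claimed proportions

n = sum(observed)
# Expected counts if the distributor's claim is correct.
expected = [n * p for p in claimed]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With 3 categories there are 3 - 1 = 2 degrees of freedom, and the
# chi-square upper tail for 2 df is exactly exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(expected, round(statistic, 2), round(p_value, 4))
```

The expected counts are 80, 240 and 80, the statistic is about 13.15, and the p-value is about .0014. The closed form applies only to 2 degrees of freedom; other cases need tables or a library.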

16. A company operates a production line producing a large number of
manufactured parts in three shifts of 8 hours each. The following table provides
data obtained from a sample of 162 manufactured parts not conforming
to specifications:

                 Shift 1   Shift 2   Shift 3   Total
Non-conforming   50        44        68        162

A test of the hypothesis that the nonconforming parts are uniformly
distributed among the three shifts can be based upon which of the following
values of the test statistic?
(a) 5.78 with 3 degrees of freedom.
(b) 5.78 with 2 degrees of freedom.
(c) 5.48 with 2 degrees of freedom.
(d) 5.48 with 3 degrees of freedom.
(e) 5.48 with 1 degree of freedom.
Solution: b

The following 2 questions refer to the following situation:


An experiment in chicken breeding results in offspring having either very
curly, slightly curly, or normal feathers. If this is the result of a single gene
system, then the proportions of offspring in the three phenotypes should
be 0.25, 0.50, and 0.25 respectively. In one such experiment, 93 chickens
were born. Here is some JMP output (with some values hidden):

17. The null hypothesis is:


(a) H: pn = ps = pv
(b) The phenotypes are independent of the type of feather.
(c) H: pn = 0.25, ps = 0.50, pv = 0.25
(d) H: pn = 0.215, ps = 0.538, pv = 0.247
(e) The observed proportions of the three feather types occur with
probabilities of 0.25, 0.50, and 0.25 respectively.
Solution: c
Past performance 1998 Dec - 85%

18. Which of the following is correct?


(a) An approximate 95% confidence interval for the proportion of birds
with normal feathers is (17% → 26%).
(b) The test statistic is 0.72 and the p-value is .6975/2 or about .35.
(c) The p-value is not small. Consequently, we know that the null hy-
pothesis is true, i.e. it is a single gene system.
(d) Each of the individual confidence intervals includes the hypothesized
value. Hence there is no evidence against the single gene hypothesis.
(e) The se measures how much the population proportion could vary if
a new experiment was done.
Solution: d
(c) is not correct because you NEVER know the truth.
(e) is not correct, because the POPULATION proportion is fixed. The se
measures how much the SAMPLE proportion varies.
Past performance 1998 Dec - 57% (14% c; 15% b)

The next 3 questions refer to the following situation.


Are babies considerate of their mothers? A study of 700 births at a local
hospital classified births as falling on weekends or weekdays. Are babies
born equally on all days of the week? Here is some output (some parts
hidden):

19. What is the null hypothesis being tested?
(a) H : pweekend = .50; pweekday = .50
(b) H : µweekend = .22; µweekday = .78
(c) H : µweekend = 2/7; µweekday = 5/7
(d) H : pweekend = .22; pweekday = .78
(e) H : pweekend = 2/7; pweekday = 5/7
Solution: e.
Past performance 2006 Dec - 56% (26% c)

20. Estimate the expected number of births on weekends if the hypothesis
were true:
(a) 156
(b) 200
(c) 544
(d) 500
(e) 100

Solution: b
Past performance 2006 Dec - 82%

21. The test statistic is 13.6 with a p-value that is very small. Which is
CORRECT?
(a) There is strong evidence that the proportion of births on weekends
is different from 2/7.
(b) There is strong evidence that the mean number of births is the same
between weekends and weekdays.
(c) There is strong evidence that the mean number of births differs be-
tween weekends and weekdays.
(d) There is strong evidence that the proportion of births on weekends
is different from that on weekdays.
(e) There is strong evidence that the proportion of births on weekends
is the same as that on weekdays.
Solution: a
Past performance 2006 Dec - 45% (19% c; 31% d)

Multiple Choice Questions
Hypothesis Testing - Population means from
paired experiments

1. A physician wants to compare the blood pressures of six patients before
and after treatment with a drug. The blood pressures are as follows:

Patient   Before Drug   After Drug
1         168           171
2         171           170
3         182           180
4         167           173
5         174           178
6         170           172

The physician wants to use a parametric procedure to test if there is a
significant change in blood pressure before and after taking the drug
at the 0.05 level of significance. The absolute value of the test statistic and
the absolute critical value of the test are, respectively:

(a) 1.6151 and 1.956


(b) 1.6151 and 2.571
(c) 0.7192 and 1.96
(d) 0.7192 and 1.812
(e) 0.7192 and 2.228

Solution: not available

2. The infamous researcher, Dr. Gnirips, claims to have found a drug that
causes people to grow taller. The coach of the Basketball team at Brandon
University has expressed interest but demands evidence. Ten people are
randomly selected from students at Brandon, their heights measured, the
drug administered, and 2 hours later their heights remeasured. The results
were as follows:

Person      1   2   3   4   5   6   7   8   9   10
Pre-Drug    68  69  74  78  70  66  71  70  71  65
Post-Drug   70  69  75  78  73  69  72  73  72  66

Using the proper test statistic, an appropriate decision rule for the
hypotheses H: Drug has no effect versus A: Drug increases height (at α =
.05) will be

(a) Reject H0 if the test statistic is > 1.96


(b) Reject H0 if the test statistic is > 1.645
(c) Reject H0 if the test statistic is > 1.83
(d) Reject H0 if the test statistic is > 1.73
(e) Reject H0 if the test statistic is > 2.10

Solution: not available

3. A group of 10 men were given a special diet for two weeks to test weight
loss in pounds. The observed data was:

Man Weight before diet Weight after diet


1 181 178
2 171 172
3 190 185
4 187 184
5 210 201
6 202 201
7 166 160
8 173 168
9 183 180
10 184 179

To determine if the data provide sufficient evidence to indicate the special
diet leads to a weight loss, the appropriate test procedure is either:

(a) two sample t-test or Wilcoxon Rank Sum test


(b) paired t-test or Wilcoxon Signed Rank test
(c) paired t-test or Wilcoxon Rank Sum test
(d) two sample t-test or Sign test
(e) two sample t-test or paired t-test

Solution: not available

4. A manufacturer wished to compare the wearing qualities of two different
types of automobile tires, A and B, and he had 5 cars available for use in
an experiment. To make the comparison, one tire of Type A and one of
Type B were mounted on the rear wheels of each of the five automobiles.
(For each car, a coin was flipped to decide which tire would be mounted on
the left side and which would be mounted on the right.). The automobiles
were then operated for a specified number of miles and the amount of wear
was recorded for each tire. These measurements appear below:

Automobile Tire A Tire B


1 10.6 10.2
2 9.8 9.4
3 12.3 11.8
4 9.7 9.1
5 8.8 8.3

An appropriate parametric procedure is to be used for testing the null
hypothesis that there is no difference in the average wear for the two
types of tires. The absolute value of the test statistic calculated from the
data is:

(a) 12.83
(b) 0.57
(c) 8.35
(d) 10.72
(e) 9.45

Solution: not available
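
For the tire data in question 4, the paired t statistic can be worked out directly from the within-car differences. The sketch below is an illustration in Python under the usual paired t-test formula, t = mean(d) / (sd(d) / sqrt(n)); it is not taken from the original solutions, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Wear measurements from the table above, one pair per automobile.
tire_a = [10.6, 9.8, 12.3, 9.7, 8.8]
tire_b = [10.2, 9.4, 11.8, 9.1, 8.3]

# The paired analysis works only with the within-car differences.
diffs = [a - b for a, b in zip(tire_a, tire_b)]
n = len(diffs)

# stdev() uses the n - 1 divisor, as the t-test requires.
standard_error = stdev(diffs) / math.sqrt(n)
t_statistic = mean(diffs) / standard_error
print(round(abs(t_statistic), 2))
```

Under these assumptions the absolute statistic is about 12.83 on n - 1 = 4 degrees of freedom. It is large because the differences (0.4 to 0.6) vary very little from car to car.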

5. A marine biologist wants to test the effect of water temperature on the
average dive duration for sea otters. Five otters are available for an
experiment and each otter is observed diving in both warm and cold water
(with the order being random). The biologist collects the following data:

         Dive Duration (sec.)
Otter    Warm Water   Cold Water
J2       97           92
B7       65           60
M3       75           77
D4       103          43
B8       90           81

Test for any difference in the length of dives using a non-parametric
procedure:

(a) Rank-sum procedure, Wcold = 25; p-value > .111
(b) Rank-sum procedure, Wcold = 25; p-value > .222
(c) Signed-rank procedure, W− = 1; p-value = .062
(d) Signed-rank procedure, W− = 1; p-value = .124
(e) Sign-test, S = 4; p-value = .187

Solution: d
Past performance 1991 Apr - 38% (C-52%)
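
The signed-rank answer for the otter data can be verified by brute force. The following Python sketch is an illustration only; the helper `midranks` is a hypothetical name, and the code assumes no zero differences (true for this data; zeros would have to be dropped first). It ranks the absolute differences, averaging tied ranks, sums the ranks of the negative differences, and obtains an exact p-value by enumerating all 2^n equally likely sign patterns under the null hypothesis.

```python
from itertools import product

def midranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # lowest rank among the ties
        count = ordered.count(v)          # number of tied values
        ranks.append(first + (count - 1) / 2)
    return ranks

warm = [97, 65, 75, 103, 90]
cold = [92, 60, 77, 43, 81]
diffs = [w - c for w, c in zip(warm, cold)]

ranks = midranks([abs(d) for d in diffs])
w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

# Under H0 every sign pattern is equally likely; enumerate all 2**n of them
# and count those whose negative-rank sum is no larger than the observed one.
patterns = list(product([False, True], repeat=len(ranks)))
as_extreme = sum(
    1 for signs in patterns
    if sum(r for r, neg in zip(ranks, signs) if neg) <= w_minus
)
one_sided = as_extreme / len(patterns)
two_sided = min(1.0, 2 * one_sided)
print(w_minus, one_sided, two_sided)
```

The enumeration gives W− = 1 with a one-sided p-value of 2/32 = .0625 and a two-sided p-value of .125, which matches option (d) up to the rounding used in the tables. The question asks for "any difference", so the two-sided value applies, which is why (c) is incorrect.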

6. A paired difference experiment is conducted to compare the starting salaries
of male and female college graduates who find jobs. Pairs are formed by
choosing a male and a female with the same major and similar grade-point
averages. Suppose a random sample of 5 pairs is taken and the starting salaries
(in thousands) are as follows:

Pair 1 2 3 4 5
Male 25.9 20.0 28.7 13.5 18.8
Female 24.9 18.5 27.7 13.0 17.8

To test whether the mean starting salary for males is less than that of
females with α= 0.05, the absolute value of the test statistic is:

(a) 1
(b) 0.125
(c) 0.3535
(d) 5.658
(e) 6.3246

Solution: not available

The next two questions refer to the following situation:


The average height of children is believed to have increased in the last
50 years due to better nutrition and better health services. To examine
this hypothesis, measurement of the heights (in centimeters) of 10 pairs
of mothers and their eldest adult daughters yielded the following results:

Pair   Mother   Daughter      Pair   Mother   Daughter
1      178.2    178.2         6      166.6    172.8
2      173.4    168.6         7      157.4    152.0
3      163.0    164.2         8      176.4    176.4
4      152.2    157.4         9      162.0    159.4
5      155.8    165.2         10     165.1    159.0

7. Consider the differences computed by taking the mother’s height - the
daughter’s height. The value of the Signed-Rank test statistic is:

(a) 36
(b) 19
(c) 16
(d) 6
(e) 20

Solution: c
Past performance 1990 Apr - 61%

8. No longer used
The next three questions refer to the following situation:
All of us non-smokers can rejoice - the mosaic tobacco virus that affects
and injures tobacco plants is spreading! Meanwhile, a tobacco company is
investigating if a new treatment is effective in reducing the damage caused
by the virus. Eleven plants were randomly chosen. On each plant, one
leaf was randomly selected, and one half of the leaf (randomly chosen)
was coated with the treatment - the other half was left untouched (con-
trol). After two weeks, the amount of damage to each half of the leaf was
assessed. The output from SAS follows:

VARIABLE        N_USED   MEAN      MEDIAN   SD        MIN   MAX
1ST: CONTROL    11       15.7273   13       9.1224    5     36
2ND: TRT        11       13.3636   12       10.0725   2     32
1ST-2ND: DIFF   11       2.36364   3        3.32484   -6    6

NORMALITY | PAIRED T | SIGN:#+ #- #0 | OBS DELETED FOR |


OF DIFF | | 9 1 1 | MISSING VALUES |
| DF | 2-TAIL P(BINOMIAL) | |
W | A | 10 | .0215 | PREP1 0 |
W=.8525 | T | SIGN RNK:SUM R+ R- | PREP2 0 |
PR<W | PR>A | 2.358 | 45.5 9.5 | BOTH 0 |
.05<P<=.10 | PR>|T| | 2-TAIL P(TABLES) | OTHER 0 |
| .0401 | .05<P<=.10 | TOTAL 0 |

9. What is the best reason for performing a paired experiment rather than a
two-independent-sample experiment?

(a) It is easier to do because we need fewer experimental units and each
unit receives more than one treatment.
(b) It allows us to remove variation in the results caused by other factors
because we can compare both treatments within the same experi-
mental unit.
(c) The computer program is more accurate because we work only with
the differences.
(d) It requires fewer assumptions because we are only interested in the
difference between treatments
(e) It allows us to do more experiments because we use each experimental
unit twice.

Solution: b
Past performance 1991 Feb - 98%
Past performance 1997 Aug - 95%

10. What is the rejection region (α=.05) and p-value for the paired t-test?
(a) Reject if T* > 1.812; p-value = .040
(b) Reject if T* > 1.812; p-value = .020
(c) Reject if T* > 2.358; p-value = .040
(d) Reject if T* > 2.358; p-value = .020
(e) Reject if T* > 1.645; p-value = .020
Solution: b
Past performance 1991 Feb - 56% (a-13%, e-20%)

11. Suppose that a supervisor, believing that the assumption of normality in
the differences is suspect, wishes to perform a non-parametric test. What
is the test statistic and the exact p-value (using tables) for the signed-rank
test?

(a) R+ =45.5; p-value =.032


(b) R+ =45.5; p-value =.016
(c) R+ =45.5; p-value =.064
(d) R+ =45.5; p-value =.0215
(e) R+ =45.5; p-value =.0107

Solution: a

12. A group of 10 men were put on a weight reduction diet. The weights
before (b) and after (a) the diet were measured on each individual. The
differences di = ai − bi were analyzed, yielding the following results.

(The values are not given.)

We wish to test if the diet has reduced the average weight. The test
statistic and critical value (α=.05) are:

(a) -1.04, 1.812
(b) -1.04, 2.228
(c) -.095, 1.812
(d) -2.45, 1.812
(e) -2.45, 2.228

Solution: not available

13. A manufacturer wished to compare the wearing qualities of two different
types of automobile tires, A and B. To make the comparison, a tire of type
A and one of type B were randomly assigned and mounted on the rear
wheels of each of five automobiles. The automobiles were then operated
for a specified number of miles and the amount of wear was recorded for
each tire. These measurements appear below:

Automobile Tire A Tire B


1 10.6 10.2
2 9.8 9.4
3 12.3 11.8
4 9.7 9.1
5 8.8 8.3

The absolute value of the test statistic calculated from the data for testing
the null hypothesis that there is no difference in the average wear for the
two types of tires is:

(a) 12.83
(b) 5.7
(c) 8.35
(d) 10.72
(e) 9.45

Solution: a

14. A statistics professor would like to determine whether students in his class
showed improved performance on the final examination as compared to the
mid-term examination. A random sample of 4 students selected from a
large class revealed the following mid-term and final scores:

Student     #1   #2   #3   #4
Mid-term    70   62   57   68
Final       80   79   87   88

Making the appropriate assumptions, the value of the test statistic is:

(a) 19.25/8.30
(b) 19.25/(8.30/2)
(c) 19.25/√(28.295/4 + 28.295/4)
(d) 19.25/√(34.92/4 + 21.67/4)
(e) 19.25/(2/8.30)

Solution: b

15. A sample of 8 patients had their lung capacity measured before and after
a certain treatment with the following results:

Patient Before After


1 750 850
2 860 880
3 950 930
4 830 860
5 750 800
6 680 740
7 720 760
8 810 800

The Sign Test is used to test the hypothesis that the treatment provides
no increase in lung capacity. The probability, under H0 , of obtaining the
observed result or a more extreme one (i.e. the p-value or observed level
of significance) is:

(a) .0352
(b) .1094
(c) .0498
(d) .1445
(e) .2980

Solution: d
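
The sign-test p-value in question 15 is a binomial tail probability and can be checked exactly. This is a minimal Python sketch, not part of the original solutions; variable names are illustrative. It counts the patients whose lung capacity increased and computes P(X >= that count) for X ~ Binomial(n, 0.5).

```python
from math import comb

before = [750, 860, 950, 830, 750, 680, 720, 810]
after = [850, 880, 930, 860, 800, 740, 760, 800]

diffs = [a - b for a, b in zip(after, before)]
increases = sum(1 for d in diffs if d > 0)
n = sum(1 for d in diffs if d != 0)      # zero differences would be dropped

# One-sided p-value: P(X >= increases) for X ~ Binomial(n, 0.5).
p_value = sum(comb(n, k) for k in range(increases, n + 1)) / 2 ** n
print(increases, n, round(p_value, 4))
```

Six of the eight patients show an increase, so the p-value is (28 + 8 + 1)/256 = .1445.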

16. Seven sets of identical twins are given psychological tests to determine
whether the firstborn of the twins tends to be more aggressive than the
second born. The results are shown in the following table, where the
higher score represents greater aggressiveness.

Set Firstborn Second born Difference
1 86 88 -2
2 77 65 12
3 91 90 1
4 70 65 5
5 75 80 -5
6 88 81 7
7 87 72 15

If we are willing to assume that the distribution of differences is
symmetric about the median but not necessarily normal, then the value of the
appropriate test statistic is:

(a) 22.5 and we would reject H0 at α = .05


(b) 40 and we would reject H0 at α = .05
(c) 1.71 and we would not reject H0 at α = .05
(d) 22.5 and we would not reject H0 at α = .05
(e) 1.71 and we would reject H0 at α = .05

Solution: d

17. The following data give uric acid levels (in milligrams per 100 milliliters)
for 5 subjects before and after a special diet.

Subject Before After


1 5.2 5.2
2 6.3 6.2
3 6.4 6.3
4 5.5 5.6
5 5.9 5.6

To test the hypothesis that the diet reduces the uric acid level, we might
use

(a) a two sample t-test because the uric acid levels before and after the
diet can be assumed independent.
(b) a sign test
(c) a paired t-test
(d) a and b
(e) b and c

Solution: e

The following 3 questions refer to the following situation.

An agricultural field station is investigating the differences between the
mean yields of two varieties of corn. Because of fertility differences, both
varieties were planted in each of seven farms across the province. At
harvest time, the plots were harvested and the yield recorded. The output
from SAS appears below.

VARIABLE N MEAN MED SD MIN MAX | PAIRED T | SIGN:#+ #- #0 |


1ST: | | 6 1 0 |
VARA 7 46.1 46.5 4.59 38.5 52.6 | DF | 2-TAIL P(BINOMIAL) |
2ND: | 6 | 0.125 |
VARB 7 43.6 41.7 3.53 40.1 49.8 | T | SIGN RNK:SUM R+ R- |
1ST-2ND: | 2.683 | 25 3 |
DIFF 7 2.5 2.8 2.43 -2.7 5.0 | PR>|T| | |
| .0364 | P(TABLES) < .10 |

18. The null and alternate hypotheses are:

(a) H: X̄d = 0 A: X̄d ≠ 0
(b) H: µd = 0 A: µd ≠ 0
(c) H: µd ≠ 0 A: µd = 0
(d) H: µd = 0 A: µd < 0
(e) H: X̄d = 0 A: X̄d < 0

Solution: b
Past performance 1990 Feb - 97%

19. The test statistic, rejection region (α = .05), and p-value are:

(a) T* = 2.683; reject H if T* > 1.94; p-value = .0364
(b) T* = 2.683; reject H if T* > 2.45; p-value = .0364
(c) T* = 2.683; reject H if T* > 1.94; p-value = .0182
(d) T* = 2.683; reject H if T* > 2.45; p-value = .0182
(e) T* = 2.683; reject H if T* > 1.89; p-value = .0182

Solution: b
Past performance 1990 Feb - 56% (A-22%)

20. The conclusion is:

(a) There is evidence to believe that the two varieties have a different
mean yield.
(b) There is insufficient evidence to believe that the two varieties have a
different mean yield.
(c) There is evidence to believe that the two varieties have the same
mean yield.
(d) There is insufficient evidence to believe that the two varieties do not
have a difference in their mean yields.
(e) There is sufficient evidence to believe that the two varieties are paired
on each farm.

Solution: a
Past performance 1990 Feb - 83%

The following 3 questions refer to the following situation:


An agricultural field station is investigating the differences between the
mean yields of two varieties of corn. They are particularly interested
in testing if the second variety gives a lower yield than the first variety.
Because of fertility differences, both varieties were planted in each of seven
farms across the province. At harvest time, the plots were harvested and
the yield recorded. The output from JMP appears below.

21. The null and alternate hypotheses are:


(a) H: X̄diff = 0 A: X̄diff ≠ 0
(b) H: µdiff = 0 A: µdiff > 0
(c) H: µdiff ≠ 0 A: µdiff = 0
(d) H: µdiff = 0 A: µdiff < 0
(e) H: X̄diff = 0 A: X̄diff < 0
Solution: b
Past performance 1996 Dec - 92%

22. The test statistic and p-value are:


(a) 2.333, .0584
(b) 1.204, .0292
(c) 2.810, .0584
(d) 1.204, .9708
(e) 2.333, .0292
Solution: e
Past performance 1996 Dec - 89%

23. Suppose that the p-value had been .0093. This would mean:

(a) There is strong evidence against the null hypothesis of equal mean
yields.
(b) There is no evidence to believe that the two varieties have a different
mean yield.
(c) There is strong evidence to believe that the two varieties have the
same mean yield.
(d) There is no evidence to believe that the two varieties do not have a
difference in their mean yields.
(e) There is sufficient evidence to believe that the two varieties are paired
on each farm

Solution: a
Past performance 1996 Dec - 87%

The following 3 questions refer to the following situation:


A physician wants to compare the blood pressures of six patients before
and after treatment with a drug that is designed to lower blood pressure.
The blood pressure is measured before and after the drug, and the change
in blood pressure is measured. The summary information on the difference
(after-before) is:

Patient Before Drug After Drug


1 168 171
2 171 170
3 182 180
4 167 173
5 174 178
6 170 172

Here is some output from JMP.

24. The null and alternate hypotheses are:
(a) H: X̄diff = 0 A: X̄diff ≠ 0
(b) H: µdiff = 0 A: µdiff > 0
(c) H: µdiff ≠ 0 A: µdiff = 0
(d) H: µdiff = 0 A: µdiff < 0
(e) H: X̄diff = 0 A: X̄diff < 0
Solution: d - Notice that diff = after − before, so if the drug is effective
in reducing blood pressure, the average after should be less than the
average before.
Past performance 1997 Aug - 73%
Past performance 2006 Dec - 73% (11% c; 12% e)

25. The estimated difference and the p-value are:
(a) 2.00; .1672
(b) 1.23; .0836
(c) 1.62; .0836
(d) 2.00; .9164
(e) 2.00; .0836
Solution: e
Past performance 1997 Aug - 87%
Past performance 2006 Dec - 79% (10% a)

26. Which of the following is NOT CORRECT?


(a) Pairing would be a good thing if the subject-to-subject variation was
small.
(b) This is a paired design because each subject is measured twice –
before and after.
(c) An unpaired experiment with the same number of data values would
require 12 subjects, half of which would be measured without taking
the drug, and half of which would be measured after taking the drug.
(d) Pairing is a form of stratification or blocking.
(e) The same conclusions would be obtained if the difference in blood
pressure was computed as before − after rather than after − before.
Solution: a
Past performance 2006 Dec - 58% (15% c; 16% e)

Multiple Choice Questions
Hypothesis Testing - Population mean from a
single sample

1. In a test of H0 : µ = 100 against HA : µ ≠ 100, a sample of size 10
produces a sample mean of 103 and a p-value of 0.08. Thus, at the 0.05
level of significance:

(a) there is sufficient evidence to conclude that µ ≠ 100.
(b) there is sufficient evidence to conclude that µ = 100.
(c) there is insufficient evidence to conclude that µ = 100.
(d) there is insufficient evidence to conclude that µ ≠ 100.
(e) there is sufficient evidence to conclude that µ = 103.

Solution: d - you always try to collect evidence against the null hypothesis.

2. In a test of H0 : µ = 100 against HA : µ ≠ 100, a sample of size 80
produces Z = 0.8 for the value of the test statistic. The p-value of the
test is thus equal to:

(a) 0.20
(b) 0.40
(c) 0.29
(d) 0.42
(e) 0.21

Solution: d
The one-sided p-value is P (Z > .8) = .21. Because the alternative hy-
pothesis is two-sided, the two-sided p-value is found as 2 × .21 = .42.
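
The doubling step in this solution can be confirmed without normal tables. The Python sketch below is an illustration only (the function name `two_sided_p_from_z` is hypothetical); it uses the standard-library identity that erfc(|z| / sqrt(2)) equals 2 × P(Z > |z|) for a standard normal Z.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    # erfc(|z| / sqrt(2)) equals 2 * P(Z > |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))

p_two_sided = two_sided_p_from_z(0.8)
p_one_sided = p_two_sided / 2
print(round(p_one_sided, 2), round(p_two_sided, 2))
```

The one-sided value rounds to .21 and the two-sided value to .42, as in the solution.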

The next 2 questions refer to the following situation:

A Canadian railway company claims that its trains block crossings no
more than 8 minutes per train on the average. The actual times (minutes)
that 10 randomly selected trains block crossings were recorded:

10.1 9.5 6.5 8.0 8.8 >12 7.2 10.5 8.2 9.3

3. The value of an appropriate test statistic for testing the claim is:


(a) 37
(b) 33
(c) 44
(d) 29
(e) 36
Solution: a
Past performance 1993 Apr - 74% (e-10%)

4. The p-value is:

(a) .101
(b) .053
(c) .248
(d) .049
(e) .064

Solution: d
Past performance 1993 Apr - 72%

The next four questions refer to the following situation.


DDT is an insecticide that accumulates up the food chain. Predator birds
can be contaminated with quite high levels of the chemical by eating many
lightly contaminated prey. One effect of DDT upon birds is to inhibit
the production of the enzyme carbonic anhydrase which controls calcium
metabolism. It is believed that this causes egg shells to be thinner and
weaker than normal and makes the eggs more prone to breakage. (This is
one of the reasons why the condor in California is near extinction.) An
experiment was conducted where 16 sparrow hawks were fed a mixture of 3 ppm
dieldrin and 15 ppm DDT (a combination often found in contaminated
prey). The first egg laid by each bird was measured and the mean shell
thickness was found to be 0.19 mm with a standard deviation of 0.01 mm.
A normal egg shell has a mean thickness of 0.2 mm.
5. The null and alternate hypotheses are:
(a) H: µ = 0.2 A: µ < 0.2
(b) H: µ < 0.2 A: µ = 0.2
(c) H: X̄ = 0.2 A: X̄ < 0.2
(d) H: X̄ = 0.19 A: X̄ = 0
(e) H: µ = 0.2 A: µ ≠ 0.2
Solution: a
Past performance 1990 Apr - 98%
Past performance 1991 Dec - 84% (11%-e)
Past performance 1993 Feb - 99%

6. The value of the test statistic is:

(a) -1.00
(b) -4.00
(c) 0.01
(d) 1.96
(e) 1.75

Solution: b
Past performance 1990 Apr - 95%
Past performance 1993 Feb - 99%
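
The test statistic in question 6 follows from the summary statistics alone. This is a minimal Python sketch, assuming the usual one-sample t formula t = (x̄ − µ0) / (s / sqrt(n)); the function name `t_from_summary` is illustrative and not from the source.

```python
import math

def t_from_summary(sample_mean, hypothesized_mean, sample_sd, n):
    """One-sample t statistic computed from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Egg shell data from the situation above: n = 16, mean 0.19 mm, sd 0.01 mm,
# tested against the normal thickness of 0.2 mm.
t_statistic = t_from_summary(0.19, 0.2, 0.01, 16)
degrees_of_freedom = 16 - 1
print(round(t_statistic, 2), degrees_of_freedom)
```

The standard error is 0.01/4 = 0.0025, so the statistic is −0.01/0.0025 = −4.00 on 15 degrees of freedom. The same function applied to the water-supply data later in this section (mean 5.39, µ0 = 5, sd 0.87, n = 25) gives about 2.24.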

7. The null hypothesis will be rejected (α=0.05) if the test statistic is less
than: (note that if the rejection region is two sided, only one side has been
shown)

(a) -2.1314
(b) -1.7530
(c) -1.9600
(d) -1.6450
(e) -1.7459

Solution: b
Past performance 1990 Apr - 74%
Past performance 1993 Feb - 92%

8. It is important to detect a decrease in the average thickness to .18 mm
because then the eggs are so fragile that few survive. What sample size
would be needed to be 80% sure of detecting this decrease at α=0.05?

(a) 8
(b) > 128
(c) 34
(d) 27
(e) > 101

Solution: d
Past performance 1993 Feb - 63%

The next two questions refer to the following situation:


In some mining operations, a byproduct of the processing is mildly radioac-
tive. Of prime concern is the possibility that release of these byproducts
into the environment may contaminate the freshwater supply. There are
strict regulations for the maximum allowable radioactivity in supplies of
drinking water, namely an average of 5 picocuries per litre (pCi/L) or less.
However, it is well known that even safe water has occasional hot spots
that eventually get diluted, so samples of water are assumed safe unless
there is evidence to the contrary. A random sample of 25 specimens of
water from a city’s water supply gave a mean of 5.39 pCi/L and a standard
deviation of 0.87 pCi/L.
9. The appropriate null and alternative hypotheses are:

(a) H0 : µ = 5.39 vs HA : µ ≠ 5.39
(b) H0 : µ = 5.39 vs HA : µ < 5.00
(c) H0 : µ = 5 vs HA : µ = 5.39
(d) H0 : µ = 5 vs HA : µ < 5
(e) H0 : µ = 5 vs HA : µ > 5

Solution: e
Past performance 1991 Feb - 98%

10. The value of the test statistic, the rejection region (α=0.05), and the
p-value (computed by a computer) are:

(a) Z* = 2.24; reject if Z* > 1.960; p-value = .0125
(b) Z* = 2.24; reject if Z* > 1.645; p-value = .0125
(c) T* = 2.24 with 25 df; reject if T* > 1.708; p-value = .0171
(d) T* = 2.24 with 24 df; reject if T* > 1.711; p-value = .0173
(e) T* = 2.24 with 24 df; reject if T* > 2.064; p-value = .0173

Solution: d
Past performance 1991 Feb - 80%
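
The arithmetic behind question 10 can be checked from the summary figures given in the situation. The following is a minimal sketch; the function name `one_sample_t` is an illustrative choice and not part of the original notes.

```python
import math

def one_sample_t(sample_mean, null_mean, sample_sd, n):
    """Return the one-sample t statistic and its degrees of freedom."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - null_mean) / standard_error, n - 1

# Water-supply data: n = 25, mean 5.39 pCi/L, s = 0.87 pCi/L, H0: mu = 5
t_star, df = one_sample_t(5.39, 5.0, 0.87, 25)
print(f"T* = {t_star:.2f} with {df} df")
```

The degrees of freedom are n - 1 = 24, which is why the options quoting 25 df can be ruled out.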

11. The average time it takes for a person to experience pain relief from aspirin
is 25 minutes. A new ingredient is added to help speed up relief. Let µ
denote the average time to obtain pain relief with the new product. An
experiment is conducted to verify if the new product is better. What are
the null and alternative hypotheses?

(a) H0 : µ = 25 vs HA : µ ≠ 25
(b) H0 : µ = 25 vs HA : µ < 25
(c) H0 : µ < 25 vs HA : µ = 25
(d) H0 : µ < 25 vs HA : µ > 25
(e) H0 : µ = 25 vs HA : µ > 25

Solution: b

12. We wish to test H0 that the average family income of Manitoba families
is at least $15,000 at level of significance α = .05. In order to test the null
hypothesis a sample of size 1000 is selected from the population, and the
p-value of the test is determined to be .02. We then:
(a) reject H0 because the data are sufficiently unusual if the null hypoth-
esis were false.
(b) reject H0 because the data are sufficiently unusual if the null hypoth-
esis were true .
(c) fail to reject H0 because the data are not sufficiently unusual if the
null hypothesis were true
(d) fail to reject H0 because the data are not sufficiently unusual if the
null hypothesis were false
(e) reject H0 because the data are sufficiently unusual
Solution: b

13. The profit per new car sold by a Winnipeg automobile dealer varies from
car to car. The average profit per sale tabulated for the past 6 days was
$368 with a standard deviation of $190. To test if there is sufficient evidence
to indicate that average profit per sale is less than $480, the appropriate
null and alternative hypotheses for the test are:

(a) H: µ = $368 vs A: µ < $480


(b) H: µ = $480 vs A: µ < $480
(c) H: µ = $480 vs A: µ > $480
(d) H: µ = $480 vs A: µ ≠ $480
(e) H: µ = $368 vs A: µ = $480

Solution: b

14. In order to study the amounts owed to the city, a city clerk takes a random
sample of 16 files from a cabinet containing a large number of delinquent
accounts and finds the average amount $\overline{X}$ owed to the city to be $230
with a sample standard deviation of $36. It has been claimed that the
true mean amount owed on accounts of this type is greater than $250. If
it is appropriate to assume that the amount owed is a normally distributed
random variable, the value of the test statistic appropriate for testing the
claim is:

(a) -3.33
(b) -1.96
(c) - 2.22
(d) -0.55
(e) - 2.1314

Solution: c = (230-250)/(36/sqrt(16)) = -2.22

15. A telephone company’s records indicate that private customers pay on
average $17.10 per month for long-distance telephone calls. A random
sample of 10 customers’ bills during a given month produced a sample
mean of $22.10 expended for long-distance calls and a sample variance of
45. A 5% significance test is to be performed to determine if the mean
level of billing for long distance calls per month is in excess of $17.10. The
calculated value of the test statistic and the critical value respectively are:

(a) (2.36, 1.8331)


(b) (1.17, 2.2622)
(c) (2.36, 2.2622)
(d) (1.17, 1.8331)
(e) (0.025, 1.8125)

Solution: a
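
Question 15 quotes a sample variance rather than a standard deviation, so the variance is divided by n before taking the square root. A short sketch of that calculation, with variable names chosen here for illustration:

```python
import math

# Question 15: n = 10 bills, sample mean 22.10, sample variance 45, H0: mu = 17.10
n, xbar, s2, mu0 = 10, 22.10, 45.0, 17.10

# The variance is given, so the standard error is sqrt(s^2 / n)
t_star = (xbar - mu0) / math.sqrt(s2 / n)
print(round(t_star, 2), "with", n - 1, "df")
```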

The next two questions refer to the following situation


16. A group of nutritionists is hoping to prove that a new soya bean compound
has more protein per gram than roast beef, which has a mean protein
content of 20. A random sample of 5 batches of the soya compound have
been tested, with the following results:

protein content 15, 22, 17, 19, 23

What assumption(s) do we have to make in order to carry out a legitimate
statistical test of the nutritionists’ claim?

(a) The observations are from a normally distributed population.


(b) The mean protein content of the 5 batches follows a normal distribu-
tion.
(c) The variance of the population is known.
(d) Both (a) and (b) must be assumed.
(e) Both (a), (b), and (c) must be assumed.

Solution: a

17. Refer to the previous question. What are the appropriate statistical hy-
potheses and the observed value of the corresponding test statistic?

(a) H: µ = 20 vs. A: µ < 20 and T∗ = (19.2 - 20)/sqrt(11.2/5)


(b) H: µ = 20 vs. A: µ > 20 and T∗ = (19.2 - 20)/sqrt(11.2/5)
(c) H: µ = 20 vs. A: µ > 20 and Z∗ = (19.2 - 20)/sqrt(11.2/5)
(d) H: µ = 20 vs. A: µ < 20 and Z∗ = (19.2 - 20)/sqrt(11.2/5)
(e) None of these is correct.

Solution: b
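
The figures 19.2 and 11.2 in the options of question 17 are the sample mean and sample variance of the five protein readings. A small sketch using only the Python standard library reproduces them (variable names are illustrative):

```python
import math
import statistics

protein = [15, 22, 17, 19, 23]      # soya compound batches from question 16
mu0 = 20                            # mean protein content of roast beef

xbar = statistics.mean(protein)     # sample mean
s2 = statistics.variance(protein)   # sample variance (n - 1 divisor)
t_star = (xbar - mu0) / math.sqrt(s2 / len(protein))
print(xbar, s2, round(t_star, 3))
```

The statistic is negative because the sample mean falls below 20, even though the alternative of interest is µ > 20.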

18. An appropriate 95% confidence interval for µ has been calculated as
(-0.73, 1.92) based on n = 15 observations from a population with a normal
N(µ, σ²) distribution. The hypotheses of interest are H0 : µ = 0 versus
Ha : µ ≠ 0. Based on this confidence interval,

(a) we should reject H0 at the α = 0.05 level of significance.


(b) we should not reject H0 at the α = 0.05 level of significance.
(c) we should reject H0 at the α = 0.10 level of significance.
(d) we should not reject H0 at the α = 0.10 level of significance.
(e) we cannot perform the required test because we do not know the
value of the test statistic

Solution: b

The next two questions refer to the following situation


19. Winnipeg Tribune claims that the time of travel from downtown to the
University via the Pembina bus has an average of µ = 27 minutes. A
student who normally takes this bus believes that µ is greater than 27
minutes. A sample of six ride-times taken to test the hypothesis of interest
gave $\overline{X}$ = 27.5 minutes and standard deviation s = 2.43 minutes. The value
of the test statistic for testing this hypothesis is:

(a) - 0.532
(b) 0.460
(c) 0.504
(d) - 0.504
(e) - 0.460

Solution: c

20. In the previous question, the appropriate critical region and conclusion
when testing at α = .05 are:

(a) T ∗ > 2.015; and we fail to reject H0 .


(b) T ∗ > 2.571; and we fail to reject H0 .
(c) T ∗ < 2.015; and we fail to reject H0 .
(d) T ∗ < 2.571; and we fail to reject H0 .
(e) T ∗ < 1.943; and we fail to reject H0 .

Solution: a

21. A Canadian railway company claims that its trains block crossings no
more than 5 minutes per train on the average. The actual times (minutes)
that 10 randomly selected trains block crossings were:

10.4 9.7 6.5 9.5 8.8 11.2 7.2 10.5 8.2 9.3

giving $\overline{X}$ = 9.130 and $s^2$ = 2.209. In testing this claim, at the significance
level of 0.05 and assuming that the crossing times are normally distributed,
the value of the test statistic and the critical value are, respectively:

(a) 5.91 and 2.2622


(b) 8.79 and 1.8331
(c) 5.91 and 1.8331
(d) 8.79 and 2.2622
(e) 2.78 and 1.96

Solution: b
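
The summary values quoted in question 21 can be recomputed from the ten crossing times. This is a sketch only, with illustrative variable names:

```python
import math
import statistics

times = [10.4, 9.7, 6.5, 9.5, 8.8, 11.2, 7.2, 10.5, 8.2, 9.3]  # minutes blocked
mu0 = 5.0                                                       # claimed average

xbar = statistics.mean(times)
s2 = statistics.variance(times)      # n - 1 divisor
t_star = (xbar - mu0) / math.sqrt(s2 / len(times))
print(f"mean = {xbar:.3f}, s^2 = {s2:.3f}, T* = {t_star:.2f}")
```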

22. In testing H: µ = 100 against A: µ ≠ 100 at the 10% level of significance,
H is rejected if:

(a) 100 is contained in the 90% confidence interval.

(b) The value of the test statistic is in the acceptance region.
(c) The p-value is less than 0.10.
(d) The p-value is greater than 0.10.
(e) The sample mean is not equal to 100.
Solution: c

23. A 95% confidence interval for µ is calculated to be (1.7, 3.5). It is now
decided to test the hypothesis H0 : µ = 0 vs HA : µ ≠ 0 at the α = 0.05
level, using the same data as was used to construct the c.i.
(a) We cannot test the hypothesis without the original data.
(b) We cannot test the hypothesis at the α= 0.05 level because the α=
0.05 test is connected to the 97.5% confidence interval.
(c) We can only make the connection between hypothesis tests and c.i.
if the sample sizes are large.
(d) We would reject H0 at level α= 0.05.
(e) We would accept H0 at level α= 0.05.
Solution: d

24. We want to test H0 : µ = 1.5 vs. H1 : µ ≠ 1.5 at α = .05. A 95%
confidence interval for µ calculated from a given random sample is (1.4,
3.6). Based on this finding we:
(a) Fail to reject H0 .
(b) Reject H0 .
(c) Cannot make any decision at all because the value of the test statistic
is not available.
(d) Cannot make any decision at all because the distribution of the pop-
ulation is unknown.
(e) Cannot make any decision at all because (1.4, 3.6) is only a 95%
confidence interval for µ .
Solution: a

25. The Federal government periodically tests packaged products to check
that the manufacturer is not short-weighting the product (i.e., underfilling
products). To allow for variation in the filling process, the Federal
government takes a sample of 16 bottles of beer with nominal capacity of
344 ml, and if the mean volume in the bottles is less than 340 ml, the
manufacturer is fined. Suppose an unscrupulous brewer sets the machine
to fill, on average, 342 ml. The machine has a standard deviation of 4 ml.
The probability that a Type II error will be made is:

(a) .4772
(b) .0228
(c) .9772
(d) .1915
(e) .3085

Solution: a

The next three questions refer to the following situation.


The average growth of a certain variety of pine tree is 10.1 inches in three
years. A biologist claims that a new variety will have a greater three-
year growth. A random sample of 25 of the new variety has an average
three-year growth of 10.8 inches and a standard deviation of 2.1 inches.
26. The appropriate null and alternate hypotheses to test the biologist’s claim
are:

(a) H: µ = 10.8 against A: µ > 10.8
(b) H: µ = 10.8 against A: µ ≠ 10.8
(c) H: µ = 10.1 against A: µ > 10.1
(d) H: µ = 10.1 against A: µ < 10.1
(e) H: µ = 10.1 against A: µ ≠ 10.1

Solution: c
Past performance 1991 Apr - 98%

27. At the 5% level of significance, the null hypothesis is:

(a) rejected because the calculated value of the test statistic is less than
the appropriate critical value 1.711.
(b) rejected because the calculated value of the test statistic is greater
than the appropriate critical value 1.645.
(c) accepted because the calculated value of the test statistic is less than
the appropriate critical value 1.711.
(d) accepted because the calculated value of the test statistic is less than
the appropriate critical value 1.708.
(e) accepted because the calculated value of the test statistic is less than
the appropriate critical value 2.064.

Solution: c
Past performance 1991 Apr - 77%
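
The decision in question 27 comes from comparing the computed statistic with the one-sided critical value. The sketch below takes the value 1.711 (24 df) from the answer options rather than computing it; the variable names are illustrative.

```python
import math

# Pine-tree data: n = 25, mean 10.8 in, s = 2.1 in, H: mu = 10.1 vs A: mu > 10.1
n, xbar, s, mu0 = 25, 10.8, 2.1, 10.1

t_star = (xbar - mu0) / (s / math.sqrt(n))
critical = 1.711            # upper 5% point of t with 24 df, as quoted in the options
reject = t_star > critical
print(round(t_star, 3), reject)
```

Because the statistic is below the critical value, the null hypothesis is not rejected, in line with the keyed answer.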

28. The p-value for the previous test is computed to be:

(a) between .005 and .010


(b) between .010 and .015
(c) between .015 and .025
(d) between .025 and .050
(e) between .050 and .100

Solution: e
Past performance 1991 Apr - 75% (D-12%)

The following 5 questions refer to the following situation.


Resting pulse rate is an important measure of the fitness of a person’s
cardiovascular system with a lower rate indicative of greater fitness. The
mean pulse rate for all adult males is approximately 72 beats per minute.
A random sample of 25 male students currently enrolled in the Faculty of
Agriculture and now taking 5.211 was selected and the mean resting
pulse rate was found to be 80 beats per minute with a standard deviation
of 20 beats per minute. The experimenter wishes to test if the students
are less fit, on average, than the general population.
29. The null and alternate hypotheses are:

(a) H: µ = 72 A: µ < 72
(b) H: $\overline{X}$ = 72 A: $\overline{X}$ < 72
(c) H: µ = 80 A: µ = 72
(d) H: $\overline{X}$ = 80 A: $\overline{X}$ > 72
(e) H: µ = 72 A: µ > 72

Solution: e
Past performance 1990 Feb - 88%
Past performance 1993 Apr - 80% (a-17%)
Past performance 1996 Dec - 92%

30. The value of the test statistic is:

(a) .32
(b) 2.00
(c) −.32
(d) 1.64
(e) 2.88

Solution: b
Past performance 1990 Feb - 99%
Past performance 1993 Apr - 71% (d-10%)
Past performance 1996 Dec - 96%

31. The null hypothesis will be rejected at α= 0.05 if the test statistic exceeds:

(a) 1.9600
(b) 1.6450
(c) 1.7109
(d) 2.0639
(e) 1.7081

Solution: c
Past performance 1990 Feb - 62% (A-10%, B-18%)

32. The p-value is estimated to be:

(a) between .025 and .05


(b) between .020 and .025
(c) between .05 and .10
(d) 7.25
(e) between .005 and .0025

Solution: a
Past performance 1993 Apr - 74% (c-10%)
Past performance 1996 Dec - 92%

33. A possible Type II error would be to:

(a) Conclude that the students are less fit (on average) than the general
population when in fact they have equal fitness on average.
(b) Conclude that the students have the same fitness (on average) as the
general population when in fact they are less fit on average.
(c) Conclude that the students have the same fitness (on average) as the
general population when in fact they have the same fitness level on
average.
(d) Conclude that the students are less fit (on average) than the general
population, when, in fact, they are less fit on average.
(e) Conclude that the students have the same fitness (on average) when
in fact they are more fit on average.

Solution: b
Past performance 1990 Feb - 79% (A-15%)
Past performance 1993 Apr - 80% (a-10%)

Multiple Choice Questions
Hypothesis Testing - Population proportion
from a single sample

1. In a test of H0 : p = 0.4 against Ha : p ≠ 0.4, a sample of size 100 produces
Z=1.28 for the value of the test statistic. Thus the p-value (or observed
level of significance) of the test is approximately equal to:

(a) 0.90
(b) 0.40
(c) 0.05
(d) 0.20
(e) 0.10

Solution: d
The one-sided p-value is P (Z > 1.28) = .10. Because the alternative is a
two-sided alternative, the two-sided p-value is 2 × .1 = .2.
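
The same doubling can be reproduced with the standard normal distribution from the Python standard library. A minimal sketch:

```python
from statistics import NormalDist

z = 1.28
one_sided = 1 - NormalDist().cdf(z)   # P(Z > 1.28)
two_sided = 2 * one_sided             # two-sided alternative doubles the tail area
print(round(one_sided, 3), round(two_sided, 3))
```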
2. The power takeoff driveline on tractors used in agriculture is a potentially
serious hazard to operators of farm equipment. The driveline is covered
by a shield in new tractors, but for a variety of reasons, the shield is often
missing on older tractors. Two types of shields are the bolt-on and the flip-
up. It was believed that the bolt-on shield was perceived as a nuisance
by the operators and deliberately removed, but the flip-up shield is easily
lifted for inspection and maintenance and may be left in place. In a study
initiated by the National Safety Council of the U.S., a sample of older
tractors with both types of shields was taken to see what proportion were
removed. Of 183 tractors designed to have bolt-on shields, 35 had been
removed. Of the 136 tractors with flip-up shields, 15 were removed. We
wish to test the hypothesis H: pb = pf vs A: pb ≠ pf where pb and pf are
the proportion of tractors with the bolt-on and flip-up shields removed,
respectively. The test-statistic is computed to be 1.97. The p-value is:

(a) .025
(b) .049

(c) .012
(d) .975
(e) .475

Solution: b
Past performance 1991 Feb - 65% (a-27%)
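
The statistic of 1.97 quoted in question 2 is consistent with a pooled two-proportion z test, and the p-value follows by doubling the upper tail. The sketch below assumes the pooled form of the standard error; the function name `two_proportion_z` is illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(35, 183, 15, 136)         # bolt-on vs flip-up shields removed
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided alternative
print(round(z, 2), round(p_value, 3))
```

Forgetting to double the tail area gives roughly .025, which matches the most common wrong answer recorded above.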

3. Let p represent the proportion of defectives in a manufacturing process.
To test H : p = .25 vs A: p > .25, a random sample of size 5 is taken from
the process. If the number of defectives is 4 or more, the null hypothesis
is rejected. What is the probability of rejecting H if p = .20 ?

(a) .00192
(b) .9933
(c) .0096
(d) .0067
(e) .9936

Solution: d
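
The probability in question 3 is an upper binomial tail evaluated at p = .20. A short sketch, with a helper name chosen here for illustration:

```python
from math import comb

def binomial_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# H is rejected when 4 or more of the 5 sampled items are defective.
prob_reject = binomial_upper_tail(5, 0.20, 4)
print(round(prob_reject, 4))
```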

4. A random sample of 100 voters in a community produced 59 voters in
favour of candidate A. The observed value of the test statistic for testing
the null hypothesis H: p = .5 versus the alternative hypothesis A: p ≠ .5
is:

(a) 1.80
(b) 1.90
(c) 1.83
(d) 1.28
(e) 1.75

Solution: a

5. It is believed that at least 60% of voters from a certain region in Canada
favour the free trade agreement (FTA). A recent poll indicated that out
of 400 randomly selected individuals, 250 favoured the FTA. At the 5%
level of significance, we would:
(a) Fail to reject H0 because the calculated value of the test statistic is
1.033 which is less than 1.645.
(b) Fail to reject H0 because the calculated value of the test statistic is
1.033 which is less than 1.96.

(c) Fail to reject H0 because the calculated value of the test statistic is
1.0204 which is less than 1.96.
(d) Fail to reject H0 because the calculated value of the test statistic is
1.0204 which is less than 1.645.
(e) Not need to test because everyone knows that FTA is good.
Solution: d

6. Consider a binomial parameter p and the test of H0 : p = 0.7. If X
represents the number of successes in 15 trials and if the null hypothesis
is rejected if X ≥ 13, what is the probability of type I error for this test?

(a) 0.004
(b) 0.035
(c) 0.050
(d) 0.127
(e) 0.965

Solution: d
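
The type I error rate in question 6 is the probability of landing in the rejection region when the null value p = 0.7 holds. One way to sketch it is to build the full binomial probability table and sum the rejection region:

```python
from math import comb

n, p0 = 15, 0.7

# Binomial probabilities P(X = x) under H0 for x = 0, ..., 15
pmf = [comb(n, x) * p0**x * (1 - p0)**(n - x) for x in range(n + 1)]

alpha = sum(pmf[13:])      # rejection region is X >= 13
print(round(alpha, 3))
```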

7. A seed company claims that 80% of the seeds of a certain variety of tomato
will germinate if sown under normal growing conditions. A government
inspector is interested in whether or not the proportion of seeds germi-
nating is living up to the company’s claim. He randomly selects a sample
of 200 seeds from a large shipment and tests the sample for percentage
germination. If 155 of the 200 seeds germinate, then the calculated value
of the test statistic used to test the hypothesis of interest is:

(a) −.847
(b) −.884
(c) −.897
(d) −.825
(e) −.858

Solution: b
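
For question 7 the standard error uses the hypothesised proportion 0.80, not the observed one. A minimal sketch, with an illustrative function name:

```python
import math

def one_proportion_z(successes, n, p0):
    """z statistic for H: p = p0, using p0 in the standard error."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

z = one_proportion_z(155, 200, 0.80)   # 155 of 200 seeds germinated
print(round(z, 3))
```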

8. A large supermarket chain will increase its stock of bakery products if
more than 20% of its customers are purchasers of bakery products. A
random sample of 100 customers found 28% purchased bakery items. A
5% significance test is conducted to determine if the chain should increase
its bakery stock. The p-value for this situation is:

(a) 0.0500
(b) .0750
(c) .0375
(d) .0448
(e) .0228

Solution: e
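
In question 8 the alternative is one-sided (more than 20%), so the p-value is a single upper-tail area. A sketch using the normal approximation:

```python
import math
from statistics import NormalDist

n, p_hat, p0 = 100, 0.28, 0.20

z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)      # upper tail only, since A: p > 0.20
print(round(z, 2), round(p_value, 4))
```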

9. In a study of the inheritance pattern of gender, a random sample of 100
had 60 males and 40 females. We wish to test if the pattern favours males.
The p-value for this test is:

(a) 0.4772
(b) 0.94772
(c) 0.0456
(d) 0.0114
(e) 0.0228

Solution: e

10. A local McDonald’s manager will return a shipment of hamburger buns
if more than 10% of the buns are crushed. A random sample of 81 buns
finds 13 crushed buns. A 5% significance test is conducted to determine
if the shipment should be accepted. The p value for this situation is:

(a) 0.0348
(b) 0.0500
(c) .0700
(d) 0.0436
(e) 0.0218

Solution: ***

The next two questions refer to the following situation:


The University of Manitoba research station wishes to investigate if a new
variety of wheat is more resistant to a disease than an old variety. It
is known that this disease strikes approximately 15% of all plants of the
old variety. A field experiment was conducted, and of 120 new plants, 12
became infected.
11. The null and alternative hypotheses are:

(a) H0 : p = 0.10 H1 : p > 0.15
(b) H0 : p = 0.10 H1 : p > 0.10
(c) H0 : p = 0.15 H1 : p ≠ 0.15
(d) H0 : p = 0.15 H1 : p < 0.15
(e) H0 : p = 0.15 H1 : p > 0.15

Solution: d
Past performance 1991 Feb - 90%

12. The calculated value of the test statistic is:

(a) 1.83
(b) −1.10
(c) 1.53
(d) −1.83
(e) −1.53

Solution: e
Past performance 1991 Feb - 55% (a-13%, d-18%)

13. A method currently used by doctors to screen women for possible breast
cancer fails to detect cancer in 15% of the women who actually have the
disease. A new method has been developed that researchers hope will be
able to detect cancer more accurately. A random sample of 80 women
known to have breast cancer are to be screened using the new method. At
the 0.05 level of significance, the researchers will be able to conclude that
the new method is better than the one currently in use if the appropriate
test statistic has a value:

(a) greater than 1.96


(b) less than 1.645
(c) less than −1.645
(d) greater than −1.96
(e) greater than 1.96 in absolute value

Solution: ***

14. Refer to the previous question. After the experiment was performed it
was discovered that the new method failed to detect the breast cancer in
8 of the 80 randomly selected women. The value of the test statistic is
equal to:

(a) 0.10
(b) −1.25
(c) 1.50
(d) 0.15
(e) −0.14

Solution: ***

Multiple Choice Questions
Hypothesis Testing - Population Means from
two independent samples

1. A study was carried out to investigate the effectiveness of a treatment.
1000 subjects participated in the study, with 500 being randomly assigned
to the “treatment group” and the other 500 to the “control (or placebo)
group”. A statistically significant difference was reported between the
responses of the two groups (P < .005). Thus,

(a) there is a large difference between the effects of the treatment and
the placebo.
(b) there is strong evidence that the treatment is very effective.
(c) there is strong evidence that there is some difference in effect between
the treatment and the placebo.
(d) there is little evidence that the treatment has any effect.
(e) there is evidence of a strong treatment effect.

Solution: c
Not (a), (b), or (e) because there is nothing in the question
about the size of the effect - it may be statistically significant, but
of no practical importance - refer to notes

2. Herbicide A has been used for years in order to kill a particular type of
weed, but an experiment is to be conducted in order to see whether a new
herbicide, Herbicide B, is more effective than Herbicide A. Herbicide A
will continue to be used unless there is sufficient evidence that Herbicide
B is more effective. The alternative hypothesis in this problem is that

(a) Herbicide A is more effective than Herbicide B.


(b) Herbicide B is more effective than Herbicide A.
(c) Herbicide A is not more effective than Herbicide B.
(d) Herbicide B is not more effective than Herbicide A.

(e) Herbicides A and B differ in effectiveness.

Solution: b

The next three questions refer to the following situation


The Excellent Drug Company claims its aspirin tablets will relieve headaches
faster than any other aspirin on the market. To determine whether Excel-
lent’s claim is valid, random samples of size 15 are chosen from aspirins
made by Excellent and the Simple Drug Company. An aspirin is given
to each of the 30 randomly selected persons suffering from headaches and
the number of minutes required for each to recover from the headache is
recorded. The sample results are:

                 $\overline{X}$    $s^2$
Excellent (E)    8.4               4.2
Simple (S)       8.9               4.6

A 5% significance level test is performed to determine whether Excellent’s
aspirin cures headaches significantly faster than Simple’s aspirin.
3. The appropriate hypothesis to be tested is:

(a) H: µE − µS = 0 A: µE − µS > 0
(b) H: µE − µS = 0 A: µE − µS ≠ 0
(c) H: µE − µS = 0 A: µE − µS < 0
(d) H: µE − µS < 0 A: µE − µS = 0
(e) H: µE − µS > 0 A: µE − µS = 0

Solution: c

4. Absolute value of the calculated value of the appropriate test statistic is:

(a) 1.61
(b) 2.33
(c) 0.65
(d) 1.24
(e) 0.85

Solution: not available

5. Absolute value of the critical value for this test is:

(a) 1.960

(b) 1.701
(c) 2.048
(d) 2.145
(e) 1.645

Solution: not available

The next three questions refer to the following situation:


A new drug has been developed for treating stage four (near terminal)
AIDS patients. Patients were randomized to the old and new drug and
the time to death (months) was recorded:

OLD 32 <25 40 31 35 29
NEW 45 32 >48 34 37 27 35 >48

One patient died before twenty five months, but it was not known when.
Two patients were still alive after four years when the study was termi-
nated.
6. The value of the test statistic (computed on the OLD drug) for testing if
the new drug gave an increased life span is:

(a) 75
(b) 71
(c) 32
(d) 34
(e) 33

Solution: e
Past performance 1990 Apr - 84%

7. Question removed because no longer needed.


8. Which of the following is NOT CORRECT?
(a) Nonparametric procedures require fewer assumptions than paramet-
ric procedures.
(b) The SIGNED-RANK test should be used for paired data.
(c) Nonparametric procedures can be used with ordinal data because all
that is needed are the relative sizes of the values.
(d) Tied values are assigned a rank equal to average of the ranks associ-
ated with the tied values.

(e) The assumption of independence is not important for non-parametric
procedures.
Solution: e
Past performance 1990 Apr - 78% (C-11%)

9. A researcher wishes to test a particular hypothesis about a new technique
that has been developed in the laboratory. Experience shows that the
variable being measured can reasonably be considered to be normally dis-
tributed. In order to test to determine if the new technique is more precise
than the old standard technique the researcher uses the Wilcoxon Rank
Sum Test. The researcher has used a procedure which

(a) is easier to use and is more informative than a t-test.


(b) has greater power to detect small differences than the t test in this
case.
(c) may be easier to use but is less powerful than the t test in this
circumstance.
(d) is both inappropriate and invalid.
(e) will likely lead to a wrong conclusion here.

Solution: not available

10. We wish to test if a new feed increases the mean weight gain compared
to an old feed. At the conclusion of the experiment it was found that the
new feed gave a 10 kg bigger gain than the old feed. A two-sample t-test
with the proper one-sided alternative was done and the resulting p-value
was .082. This means:

(a) There is an 8.2% chance the null hypothesis is true.
(b) There was only an 8.2% chance of observing an increase greater than
10 kg (assuming the null hypothesis was true).
(c) There was only an 8.2% chance of observing an increase greater than
10 kg (assuming the null hypothesis was false).
(d) There is an 8.2% chance the alternate hypothesis is true.
(e) There is only an 8.2% chance of getting a 10 kg increase.

Solution: b
Past performance 1991 Feb - 50% (20%-a; 12%-d; 11%-e)
Past performance 1993 Feb - 86%
Past performance 1993 Apr - 81%
Past performance 1997 Aug - 74% (14%-d)
Past performance 2006 Dec - 77% (11%-a)

11. Following the analysis of some data on two samples drawn from popula-
tions in which the variable of interest is normally distributed, the p-value
for the comparison of the two sample means under the null hypothesis that
the two population means are equal (H0 : µ1 = µ2 ) against HA : µ1 ≠ µ2
was found to be .0063. This p-value indicates that:

(a) there is very little evidence in the data for a conclusion to be reached.
(b) there is rather strong evidence against the null hypothesis.
(c) the evidence against the null hypothesis is not strong.
(d) the null hypothesis should be accepted.
(e) there is rather strong evidence against the alternative hypothesis.

Solution: b

The next four questions refer to the following situation:


Different varieties of fruits and vegetables have different amounts of
nutrients. These differences are important when these products are used to
make baby food. We wish to compare the carbohydrate content of two
varieties of peaches. The data was analyzed with SAS and the following
output was obtained:

VARIETY   N   MEAN   STD DEV   STD ERROR   MIN      MAX
A         5   33.6    3.781    1.691       29.000   38.000
B         7   25.0   10.392    3.927        2.000   33.000

VARIANCES   T        DF     PROB > |T|
UNEQUAL     2.0110    8.0   0.0791
EQUAL       1.7490   10.0   0.1109

FOR $H_0: \textit{VAR~ARE~EQUAL}$, F’= 7.55 WITH 6 AND 4 DF PROB > F’= 0.0707

12. We wish to test if the two varieties are significantly different in their mean
carbohydrate content . The null and alternative hypotheses are:

(a) H: µ1 = µ2 A: µ1 < µ2
(b) H: µ1 = µ2 A: µ1 > µ2
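(c) H: µ1 = µ2 A: µ1 ≠ µ2
(d) H: $\overline{X}_1$ = $\overline{X}_2$ A: $\overline{X}_1$ < $\overline{X}_2$
(e) H: $\overline{X}_1$ = $\overline{X}_2$ A: $\overline{X}_1$ ≠ $\overline{X}_2$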

Solution: c
Past performance 1990 Apr - 97%
Past performance 1990 Dec - 86%

13. The test statistic, absolute critical value (at α=.05), and p-value are:

(a) 1.7490 2.2281 .1109


(b) 1.7490 1.8125 .0554
(c) 2.0110 2.3060 .0791
(d) 2.0110 1.8595 .0396
(e) 7.5500 6.1600 .0707

Solution: c
Past performance 1990 Apr - 44% ( a=41%, e=12%)
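
Both T values in the SAS output can be reproduced from the summary statistics. The sketch below assumes the usual Welch (unequal-variance) statistic with Satterthwaite degrees of freedom and the usual pooled (equal-variance) statistic; the variable names are illustrative.

```python
import math

# Summary statistics read from the SAS output (variety A, variety B)
n1, mean1, sd1 = 5, 33.6, 3.781
n2, mean2, sd2 = 7, 25.0, 10.392

v1, v2 = sd1**2 / n1, sd2**2 / n2     # squared standard errors of each mean

# Unequal-variance statistic with Satterthwaite degrees of freedom
t_unequal = (mean1 - mean2) / math.sqrt(v1 + v2)
df_unequal = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Equal-variance statistic based on the pooled variance
sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
t_equal = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(round(t_unequal, 3), round(df_unequal, 1), round(t_equal, 3))
```

The variance ratio F' = 7.55 exceeds the 5:1 guideline mentioned in the next question, which is why the unequal-variance line is the one keyed as correct.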

14. Which of the following is not correct?

(a) The equal variance test is used if F’ is about 5:1 or less.


(b) The unequal variance test is used if the ratio of the sample variances
is more than about 5:1
(c) If both sample sizes are large, the p-value for T ∗ can be approximated
using a normal distribution.
(d) If the df are fractional, we round down to the lower integer
(e) Outliers normally do not affect T ∗ very much in small samples.

Solution: e
Past performance 1990 Apr - 91%

15. These findings were submitted to a journal, and one reviewer questioned
the results because she believed that the data within each group were
not normally distributed. Consequently, a non-parametric procedure was
used, and the output follows:

WILCOXON SCORES (RANK SUMS)


                 SUM OF    EXPECTED      STD DEV       MEAN
LEVEL    N       SCORES    UNDER $H_0$   UNDER $H_0$   SCORE
A        5       45.50     32.50         6.14          9.10
B        7       32.50     45.50         6.14          4.64

WILCOXON 2-SAMPLE TEST (NORMAL APPROXIMATION)


S= 45.50 Z= 2.0371 PROB >|Z|=0.0416

T-TEST APPROX. SIGNIFICANCE=0.0664

An appropriate test statistic and p-value are:

(a) S=45.5 p-value=.0416

(b) S=45.5 p-value=.0208
(c) Z=2.0371 p-value=.0208
(d) Z=2.0371 p-value=.0664
(e) S=45.5 p-value=.0664

Solution: a
Past performance 1990 Apr - 64% (b=23%)

The next two questions refer to the following situation:


Different varieties of fruits and vegetables have different amounts of
nutrients. These differences are important when these products are used to
make baby food. We wish to compare the carbohydrate content of two
varieties of peaches. The data was analyzed with JMP and the following
output was obtained:

16. We wish to test if the two varieties are significantly different in their mean
carbohydrate content . The null and alternative hypotheses are:

(a) H: µ1 = µ2 A: µ1 < µ2
(b) H: µ1 = µ2 A: µ1 > µ2
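(c) H: µ1 = µ2 A: µ1 ≠ µ2
(d) H: $\overline{X}_1$ = $\overline{X}_2$ A: $\overline{X}_1$ < $\overline{X}_2$
(e) H: $\overline{X}_1$ = $\overline{X}_2$ A: $\overline{X}_1$ ≠ $\overline{X}_2$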

Solution: c
Past performance 1996 Dec - 96%

17. The test statistic, and p-value are:

(a) 1.359 .2039


(b) 4.264 .1020
(c) 3.137 .2039
(d) 10 .2039

(e) -2.725 .1020

Solution: a
Past performance 1996 Dec - 95%

18. The following are percentages of fat found in 5 samples of each of two
brands of ice cream:

A 5.7 4.5 6.2 6.3 7.3
B 6.3 5.7 5.9 6.4 5.1

Which of the following procedures is appropriate to test the hypothesis of
equal average fat content in the two types of ice cream?

(a) Paired t-test with 5 d.f.


(b) Two sample t-test with 8 d.f.
(c) Paired t-test with 4 d.f.
(d) Two sample t-test with 9 d.f.
(e) Sign test

Solution: b

19. The life, in months of service, before a failure of the color television picture
tube in a random sample of 6 television sets manufactured by Company
A and 8 television sets manufactured by Company B are as follows:

Company   Life of picture tube (months)
A         32 25 40 31 35 29
B         45 32 47 34 37 27 35 44

The calculated value of the Rank-Sum test statistic for testing the null
hypothesis that the life, in months of service, before failure of picture
tube is the same for both companies is:

(a) 75
(b) 71
(c) 32
(d) 34
(e) 33

Solution: e
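
The rank-sum statistic in question 19 is the sum of the ranks of Company A's values in the pooled sample, with tied values sharing the average of their ranks. A minimal sketch, with the helper name `rank_sum` chosen for illustration:

```python
def rank_sum(sample, other):
    """Sum of the mid-ranks of `sample` within the pooled data."""
    pooled = sorted(sample + other)

    def midrank(value):
        first = pooled.index(value) + 1    # 1-based rank of the first occurrence
        count = pooled.count(value)        # number of tied values
        return first + (count - 1) / 2     # average rank across the tie

    return sum(midrank(v) for v in sample)

company_a = [32, 25, 40, 31, 35, 29]
company_b = [45, 32, 47, 34, 37, 27, 35, 44]

w_a = rank_sum(company_a, company_b)
w_b = rank_sum(company_b, company_a)
print(w_a, w_b)
```

As a check, the two rank sums add up to N(N + 1)/2 = 105 for the 14 pooled observations.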

The next three questions refer to the following situation.


In order to compare two kinds of feed, thirteen pigs are split into two
groups, and each group received one feed. The following are the gains in
weight (kilograms) after a fixed period of time:

Feed A: 8.0 7.4 5.8 6.2 8.8 9.5
Feed B: 12.0 18.2 8.0 9.6 8.2 9.9 10.3

We wish to test the hypothesis that Feed B gives rise to larger weight
gains. The output from SAS is as follows:

Variable: GAIN Weight gain (kg)

FEED    N          Mean       Std Dev     Std Error
----------------------------------------------------
a       6    7.45000000    1.33529023    0.54512995
b       7   10.88571429    3.49400848    1.32061107

Variances        T       DF    Prob>|T|
---------------------------------------
Unequal    -2.4048      7.9      0.0431
Equal      -2.2596     11.0      0.0451

For H0: Variances are equal, F’ = 6.85 DF = (6,5) Prob>F’ = 0.0520

20. The appropriate test statistic and p-value are:

(a) T ∗ = -2.4048; p-value = .0431


(b) T ∗ = -2.4048; p-value = .0216
(c) T ∗ = -2.2596; p-value = .0451
(d) T ∗ = -2.2596; p-value = .0256
(e) F’ = 6.85; p-value = .0520

Solution: b
Past performance 1991 Apr - 56% (A-20%)
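
SAS reports a two-sided p-value, while the hypothesis in question 20 is one-sided. The sketch below shows the usual conversion, assuming the statistic is computed as Feed A minus Feed B so that a negative value supports the alternative; the variable names are illustrative.

```python
t_star = -2.4048         # unequal-variance T from the SAS output
two_sided_p = 0.0431     # Prob>|T| reported by SAS

# The alternative is mu_A < mu_B, so a negative T lies in the hypothesised direction.
in_direction = t_star < 0
one_sided_p = two_sided_p / 2 if in_direction else 1 - two_sided_p / 2
print(one_sided_p)
```

The result agrees with the keyed answer of .0216 up to rounding. The unequal-variance line is used because F' = 6.85 exceeds the 5:1 guideline given earlier in this set.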

21. The results were written up in a report, but a reviewer of the report
thought that some of the assumptions necessary for a two-sample t-test
might be violated. Consequently, a non-parametric procedure was also
done. The rank-sum test statistic computed for Feed A and the
corresponding p-value are:

(a) W = 25.5 p-value = .009


(b) W = 25.5 p-value = .018
(c) W = 23.5 p-value = .003
(d) W = 23.5 p-value = .006
(e) W = 7.45 p-value = .043

Solution: a
Past performance 1991 Apr - 82%
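
One way to arrive at a p-value close to the keyed answer is the large-sample normal approximation to the rank-sum distribution. The sketch below assumes no continuity correction and no tie correction (the question does not say which method was used, and a table of exact critical values may have been intended instead), with n_A = 6, n_B = 7 and N = 13:

```latex
\begin{align*}
E(W_A) &= \frac{n_A (N+1)}{2} = \frac{6 \times 14}{2} = 42, \\
\mathrm{SD}(W_A) &= \sqrt{\frac{n_A n_B (N+1)}{12}} = \sqrt{\frac{6 \times 7 \times 14}{12}} = 7, \\
z &= \frac{W_A - E(W_A)}{\mathrm{SD}(W_A)} = \frac{25.5 - 42}{7} \approx -2.36, \\
P(Z \le -2.36) &\approx .009 .
\end{align*}
```

The lower tail is used because the alternative (Feed B gives larger gains) corresponds to small rank sums for Feed A.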

22. The rejection region in terms of WA at α=0.05 is:

(a) Reject H if WA ≤ 29
(b) Reject H if WA ≥ 55
(c) Reject H if WA ≤ 36
(d) Reject H if WA ≤ 27
(e) Reject H if WA ≤ 34

Solution: a
Past performance 1991 Apr - 77%

The next 3 questions refer to the following situation.


In order to compare two kinds of feed, thirteen pigs are split into two
groups, and each group received one feed. The following are the gains in
weight (kilograms) after a fixed period of time:

Feed A: 8.0 7.4 5.8 6.2 8.8 9.5
Feed B: 12.0 18.2 8.0 9.6 8.2 9.9 10.3

We wish to test the hypothesis that Feed B gives rise to larger weight
gains. The output from JMP is as follows:

23. The appropriate null and alternate hypotheses are:


(a) H: X̄A = X̄B; A: X̄A ≠ X̄B
(b) H: µA = µB; A: µA ≠ µB
(c) H: X̄A = X̄B; A: X̄A < X̄B
(d) H: µA = µB; A: µA < µB
(e) H: X̄A = X̄B; A: X̄A > X̄B

Solution: d
Past performance 1997 Aug - 90%

24. The appropriate test statistic is:


(a) -3.269
(b) 1.535
(c) .0566
(d) -2.130
(e) -6.647
Solution: d
Past performance 1997 Aug - 95%

25. The p-value for the test is:


(a) .0566
(b) .0283
(c) .1132
(d) .1087
(e) 2.130
Solution: b
Past performance 1997 Aug - 88%

The following four questions refer to the following situation:


Nitric oxide is one component of the pollution emitted by automobiles.
Two different control devices are to be compared by equipping 10 cars
with device I and 7 cars with device II. The data was analyzed with SAS
and the output follows:

TYPE    N     MEAN      STD DEV    STD ERROR    MIN       MAX
I       10    1.0160    0.0377     0.0119       0.9600    1.0800
II      7     0.9942    0.0350     0.0132       0.9500    1.0500

VARIANCES    T         DF      PROB > |T|
UNEQUAL      1.2173    13.7    0.2441
EQUAL        1.2004    15.0    0.2486

FOR H0: VAR ARE EQUAL, F’= 1.16 WITH 9 AND 6 DF PROB > F’= 0.8868

26. We wish to test if the mean level of nitric oxide from device I is greater
than that of device II. The null and alternate hypotheses are:

(a) H: µ1 − µ2 = 0; A: µ1 − µ2 ≠ 0
(b) H: X̄1 − X̄2 = 0; A: X̄1 − X̄2 < 0
(c) H: µ1 − µ2 = 0; A: µ1 − µ2 > 0
(d) H: X̄1 − X̄2 = 0; A: X̄1 − X̄2 > 0
(e) H: µ1 − µ2 = 0; A: µ1 − µ2 < 0

Solution: c

27. The test statistic, rejection region (α=.05), and the p-value are:

(a) T* = 1.2173; reject if T* > 1.7709; p-value = .2441
(b) T* = 1.2004; reject if T* > 1.7530; p-value = .2486
(c) T* = 1.2004; reject if T* > 1.7530; p-value = .1243
(d) T* = 1.2173; reject if T* > 1.7709; p-value = .1220
(e) T* = 1.2004; reject if T* > 2.1314; p-value = .1243

Solution: c

28. Which of the following is not correct?

(a) The equal variance test is used if F’ is about 5:1 or less.


(b) The unequal variance test is used if the ratio of the sample variances
is more than about 5:1
(c) If both sample sizes are large, the p-value for T ∗ can be approximated
using a normal distribution.
(d) If the df are fractional, we round down to the lower integer
(e) Outliers normally do not affect T ∗ very much in small samples.

Solution: e

29. These findings were submitted to a journal, and one reviewer questioned
the results because she believed that the data within each group were
not normally distributed. Consequently, a non-parametric procedure was
used, and the output follows:

WILCOXON SCORES (RANK SUMS)

               SUM OF    EXPECTED    STD DEV     MEAN
LEVEL    N     SCORES    UNDER H0    UNDER H0    SCORE
I        10    102.00    90.00       10.20       10.20
II       7     51.00     63.00       10.20       7.29

WILCOXON 2-SAMPLE TEST (NORMAL APPROXIMATION)
(WITH CONTINUITY CORRECTION OF .5)
S= 51.00 Z=-1.1278 PROB >|Z|=0.2594

T-TEST APPROX. SIGNIFICANCE=0.2760

An appropriate test statistic and p-value are:

(a) S=51.0 p-value = .2594


(b) S=51.0 p-value = .1297
(c) Z=-1.1278 p-value = .2594
(d) Z=-1.1278 p-value = .2760
(e) S=90.0 p-value = .1297

Solution: b
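
The Z value in the Wilcoxon listing can be reconstructed from the printed rank sum, its expected value under H0, and its standard deviation. The sketch below assumes the continuity correction of .5 is applied toward the expected value and plugs in the standard deviation printed by SAS (10.20, which is already rounded, so the last digit of Z may differ slightly); `rank_sum_z` is an illustrative name, not a SAS routine.

```python
import math
from statistics import NormalDist


def rank_sum_z(s, n_sample, n_other, sd=None):
    """Normal-approximation Z for a rank sum, with a 0.5 continuity correction."""
    total = n_sample + n_other
    expected = n_sample * (total + 1) / 2
    if sd is None:
        # Standard deviation under H0 when there are no ties.
        sd = math.sqrt(n_sample * n_other * (total + 1) / 12)
    if s < expected:
        correction = 0.5
    elif s > expected:
        correction = -0.5
    else:
        correction = 0.0
    return (s + correction - expected) / sd


# Device II: rank sum S = 51 from n = 7 cars, compared against 10 cars with device I.
z = rank_sum_z(51.0, 7, 10, sd=10.20)
p_two_sided = 2 * NormalDist().cdf(-abs(z))
p_one_sided = p_two_sided / 2

print(round(z, 3), round(p_two_sided, 3), round(p_one_sided, 3))
```

The one-sided value is the relevant one when the question is whether device I gives greater emissions, which is why the printed PROB >|Z| is halved.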

The next four questions refer to the following situation.


Two different emission control devices for automobiles were being tested
to determine if Device I gives greater emissions, on average, than Device
II. Twenty cars of the same model and year are equipped with the devices;
ten were equipped with Device I and ten were equipped with Device II.
Unfortunately, three cars were involved in accidents and had to be removed
from the study. The following output was obtained from SAS.

VARIABLE: NOX LEVEL OF NITRIC OXIDE

TYPE    N     MEAN     STD DEV    STD ERROR    MIN       MAX
I       10    1.032    0.0522     0.0165       0.9600    1.1500
II      7     1.004    0.0299     0.0113       0.9600    1.0500

FOR H0: VARIANCES EQUAL, F’=3.05 WITH 9 AND 6 DF PROB > F’= 0.1882

VARIANCES    T         DF      PROB > |T|
UNEQUAL      1.3844    14.6    0.1871
EQUAL        1.2590    15.0    0.2273

30. The null and alternate hypotheses are:

(a) H: µ1 − µ2 > 0 A: µ1 − µ2 = 0
(b) H: X 1 − X 2 > 0 A: X 1 − X 2 = 0
(c) H: X 1 − X 2 = 0 A: X 1 − X 2 > 0
(d) H: µ1 − µ2 = 0 A: µ1 − µ2 < 0

(e) H: µ1 − µ2 = 0 A: µ1 − µ2 > 0

Solution: e
Past performance 1990 Feb - 97%

31. The value of the proper test statistic and rejection region (α= 0.05) are:

(a) T* = 1.38; reject H if T* > 1.71
(b) T* = 1.26; reject H if T* > 1.75
(c) T* = 1.38; reject H if T* > 2.14
(d) T* = 1.26; reject H if T* > 2.19
(e) T* = 1.26; reject H if T* < −2.14 or T* > 2.14

Solution: b
Past performance 1990 Feb - 92%

32. The p-value for this test is:

(a) .1882
(b) .1871
(c) .2273
(d) .0936
(e) .1136

Solution: e
Past performance 1990 Feb - 65% (C-26%)

33. Suppose we wish to be 80% sure of detecting a difference of 5 ppm, assuming
that the true variance for each type is 4 ppm² when testing at
α=.05. The required sample size is estimated to be:

(a) 12 cars for each device for a total of 24 cars.


(b) 4 cars for each device for a total of 8 cars
(c) 12 cars in total; 6 cars for each device.
(d) 4 cars in total; 2 cars for each device
(e) 24 cars in total; 12 cars for each device.

Solution: b
Past performance 1990 Feb - 56% (A-27%)

The next three questions refer to the following situation:

A sheep producer wishes to investigate if the mean number of tapeworms
in the stomachs of Suffolk sheep is less if they have been treated with a
drug compared to sheep not treated. He obtains the following sample data
to conduct a 5% significance test:

Sheep Group    Number    Mean    Standard Deviation    Range
1 - No Drug    7         43.2    17.0                  42
2 - Drug       7         28.6    14.1                  37

34. The null and alternate hypotheses are:

(a) H0: µ1 − µ2 = 0; H1: µ1 − µ2 < 0
(b) H0: µ1 − µ2 = 0; H1: µ1 − µ2 > 0
(c) H0: X̄1 − X̄2 = 0; H1: X̄1 − X̄2 < 0
(d) H0: X̄1 − X̄2 = 0; H1: X̄1 − X̄2 > 0
(e) H0: µ1 − µ2 = 0; H1: µ1 − µ2 ≠ 0

Solution: b

35. The calculated value of the test statistic is:

(a) 1.54
(b) 1.28
(c) 1.75
(d) 2.1
(e) 4.41

Solution: not available

36. The critical value for this test situation is:

(a) 1.8946
(b) 1.7709
(c) 1.9432
(d) 1.7823
(e) 2.1788

Solution: not available
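
No keyed answers are given for the two questions above, so the following is only a worked sketch. It assumes the equal-variance (pooled) two-sample t-test is intended, which seems reasonable because the two sample variances differ by a ratio well below 5:1:

```latex
\begin{align*}
s_p^2 &= \frac{(7-1)(17.0)^2 + (7-1)(14.1)^2}{7 + 7 - 2}
       = \frac{1734 + 1192.86}{12} \approx 243.9, \\
T^* &= \frac{43.2 - 28.6}{\sqrt{243.9\left(\frac{1}{7} + \frac{1}{7}\right)}}
     \approx \frac{14.6}{8.35} \approx 1.75, \qquad \text{df} = 7 + 7 - 2 = 12 .
\end{align*}
```

Under the same assumption, the one-sided 5% critical value comes from a t distribution with 12 df, whose upper 5% point is 1.7823, so T* falls just short of the rejection region.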

37. Calculate the observed value of the test statistic for the test of H0 : µ1 −
µ2 = 0 versus Ha : µ1 − µ2 < 0 on the basis of the following information.
Test the hypotheses at the 5% level of significance.

Sample statistics for group 1:
    sample size       50
    sample variance   100
    sample mean       403
Sample statistics for group 2:
    sample size       60
    sample variance   150
    sample mean       409

(a) zobs = -2.83 so we conclude that µ1 is less than µ2 .


(b) zobs = +2.83 so we conclude that µ1 is greater than µ2 .
(c) zobs = +2.78 so we conclude that µ1 is greater than µ2 .
(d) zobs = -2.78 so we conclude that µ1 is less than µ2 .
(e) zobs = -2.78 so we conclude that µ1 is greater than µ2 .

Solution: not available
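
Since no keyed answer is given, the observed statistic can be checked directly from the summary statistics. The sketch below assumes the large-sample z statistic with the sample variances substituted for the population variances; `two_sample_z` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """Large-sample z statistic for H0: mu1 - mu2 = 0 (variances not pooled)."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error


z_obs = two_sample_z(403, 100, 50, 409, 150, 60)

# Lower-tailed test, because the alternative is mu1 - mu2 < 0.
p_lower = NormalDist().cdf(z_obs)
critical = NormalDist().inv_cdf(0.05)

print(round(z_obs, 2), round(critical, 3), z_obs < critical)
```

With both sample sizes at 50 or more, the normal approximation does not require the populations themselves to be normal or to have equal variances.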

38. Which of the following assumptions were necessary to allow us to conduct
the test of hypotheses in the previous question?

(a) The means of the two populations are equal, i.e. µ1 = µ2.
(b) The two population variances are equal, i.e. σ1² = σ2².
(c) Each population follows a normal distribution.
(d) Both (b) and (c) are necessary assumptions.
(e) None of the above assumptions are necessary.

Solution: not available

39. A researcher is going to conduct an experiment in order to compare two
drugs: a new drug and an old drug. The researcher would like to see
whether there is sufficient evidence to say that the new drug is better
than the old drug. In this problem, the researcher will commit a type I
error if:

(a) she concludes that the drugs are equal in effectiveness when in fact
the new drug is better.
(b) she concludes that the drugs are equal in effectiveness when in fact
the old drug is better.

(c) she concludes that the old drug is better when in fact the new drug
is better.
(d) she concludes that the new drug is better when in fact the drugs are
equal in effectiveness.
(e) she concludes that the old drug is better when in fact the drugs are
equal in effectiveness.

Solution: d
Past performance 1990 Dec - 83%
Past performance 1991 Feb - 83% (a-10%)

The next four questions refer to the following situation.


An experiment was conducted to assess the efficacy of spraying oats with
malathion (at .25 lbs/acre) to control the cereal leaf beetle. A sample of 10
farms was selected at random from southwest Manitoba. Each farm was
assigned at random to either the control group (no spray) or the treatment
group (spray). At the conclusion of the experiment, a plot on each farm
was selected and the number of larvae per stem was measured. Here are
two possible outputs from DataDesk (only one of which is correct; some
output hidden):

t-Tests
separate estimates of sigma_1, sigma_2

Test $H_0: \mu_{\text{not spray}} - \mu_{\text{spray}} = 0$
vs $H_A: \mu_{\text{not spray}} - \mu_{\text{spray}} > 0$

Sample mean(not spray) = 4.0947
Sample mean(spray) = 3.0508

t-statistic=1.896 with * d.f.

------------------------------------
t-Test, paired samples
not spray - spray:

Test $H_0: \mu =0$ vs $H_A$:$\mu > 0$

Sample mean = 1.0440

t-statistic=1.887 with * d.f.

40. The appropriate test statistic and p-value are:

(a) 1.896, 0.033
(b) 1.896, 0.131
(c) 1.896, 0.065
(d) 1.887, 0.059
(e) 1.887, 0.118

Solution: c
Past performance 1993 Feb - 38% (a-53%)

41. A Type II error would occur if:

(a) We conclude malathion is ineffective when in fact it was effective.
(b) We conclude malathion is effective when in fact it is ineffective.
(c) We conclude malathion is effective when in fact it is effective.
(d) We conclude malathion is ineffective when in fact it is ineffective.
(e) We conclude malathion is neither ineffective nor effective.

Solution: a
Past performance 1993 Feb - 83% (b-17%)

42. Power refers to:

(a) the ability to detect an effect of malathion when in fact there is no
effect.
(b) the ability to not detect an effect of malathion when in fact there is
no effect.
(c) the ability to detect an effect of malathion when in fact there is an
effect.
(d) the ability to not detect an effect of malathion when in fact there is
an effect.
(e) the ability to make a correct decision regardless of whether malathion
has an effect or not.

Solution: c
Past performance 1993 Feb - 66% (a-10%; e-15%)

43. Consider an experiment to investigate the efficacy of different insecticides
in controlling pests and their effects on subsequent yield. What is the best
reason for randomly assigning treatment levels (spraying or not spraying)
to the experimental units (farms)?

(a) Randomization makes the experiment easier to conduct because we
can apply the insecticide in any pattern rather than in a systematic
fashion.
(b) Randomization will tend to average out all other uncontrolled
factors such as soil fertility so that they are not confounded with the
treatment effects.
(c) Randomization makes the analysis easier because the data can be
collected and entered into the computer in any order.
(d) Randomization is required by statistical consultants before they will
help you analyze the experiment.
(e) Randomization implies that it is not necessary to be careful during
the experiment, during data collection, and during data analysis.

Solution: b
Past performance 1990 Feb - 97%
Past performance 1993 Feb - 98%
Past performance 1996 Dec - 100%
Past performance 2006 Dec - 99%

The following 3 questions refer to the following situation:


In order to study the harmful effects of DDT poisoning, the pesticide was
fed to 6 randomly chosen rats out of a group of 12 rats. The other 6 rats
were used as the control group. The following data give the measurements
of the amount of tremor detected in the body of each rat after the
experiment. The more tremor, the more harmful.

Poisoned group: 12.2 16.9 25.0 22.4 8.5 20.6
Control group : 11.1 12.1 9.3 6.6 9.6 8.2

Here is some output from JMP: (the differences are computed as
control − poisoned)

44. The null and alternate hypotheses are:


(a) H: µc = µp A: µc < µp

(b) H: X̄c = X̄p A: X̄c < X̄p
(c) H: pc = pp A: pc < pp A: βc < βp
(d) H: X̄c = X̄p A: X̄c ≠ X̄p
Solution: a
Past performance 1998 Dec - 95%

45. Refer to the JMP output above. Which is correct?


(a) We are about 95% confident that the rats in the poisoned group have
all between 14 and 2 more tremors than the control group.
(b) The std error measures how much the estimated difference could vary
if a new experiment was done.
(c) We are about 95% confident that the sample mean number of tremors
for the control group is between 2 and 14 more than the sample mean
number of tremors in the poisoned group.
(d) The test-statistic is a measure of how far the data is from that ex-
pected under the alternate hypothesis.
(e) The p-value measures the probability that there is no difference in
the mean number of tremors between the two groups.
Solution: b
Past performance 1998 Dec - 48% (20% a; 23% e)
Note that (a) refers to individual rats, not to the mean over all the rats.
Note that (e) incorrectly states that p-values measure the probability of a
hypothesis.

46. Which of the following is correct?


(a) The p-value is small. There is good evidence that the two means are
equal.
(b) The p-value is large. There is good evidence that the two means are
different.
(c) The p-value is small. There is good evidence that the two sample
means differ, in fact, the control group appears to have fewer tremors,
on average.
(d) The confidence interval does not include 0. Hence, there is evidence
that the mean number of tremors for all potential rats in the poisoned
group is larger than that in the control group.
(e) The confidence interval does not include 0. Hence there is no evidence
that the means are the same for both groups.

Solution: d
Past performance 1998 Dec - 23% (20% e; 53% c)
Note: (c) refers to SAMPLE means not population means.

The following 2 questions refer to the following situation


A researcher wants to see if birds that build larger nests lay larger eggs.
She selects two random samples of nests: one of small nests and the other
of large nests. She measures one egg from each nest. The data are sum-
marized below.

47. The null and alternate hypotheses of interest are:

(a) H: µL = µS; A: µL > µS
(b) H: ȲL = ȲS; A: ȲL > ȲS
(c) H: µL = µS; A: µL ≠ µS
(d) H: ȲL = ȲS; A: ȲL ≠ ȲS
(e) H: µL = µS; A: µL < µS

Solution: a
Past performance 2006 Dec - 87%

48. A Type I (false positive) error would occur if:


(a) We conclude that larger nests have the same size eggs (on average)
when in fact they are larger.
(b) We conclude that larger nests have larger eggs (on average) when in
fact they are larger.
(c) We conclude that larger nests have the same size eggs (on average)
when in fact there is no difference in the mean.
(d) We conclude that larger nests had larger eggs (on average) when in
fact there is no difference in the mean.
(e) I ever take a statistics course again in my life! (just kidding).
Solution: d
Past performance 2006 Dec - 77% (20%-a)

Multiple Choice Questions
Testing - Two independent samples on proportions

1. The power takeoff driveline on tractors used in agriculture is a potentially
serious hazard to operators of farm equipment. The driveline is covered
by a shield in new tractors, but for a variety of reasons, the shield is often
missing on older tractors. Two types of shields are the bolt-on and the
flip-up. It was believed that the bolt-on shield was perceived as a nuisance
by the operators and deliberately removed, but the flip-up shield is easily
lifted for inspection and maintenance and may be left in place. In a study
initiated by the National Safety Council of the U.S., a sample of older
tractors with both types of shields was taken to see what proportion were
removed. Of 183 tractors designed to have bolt-on shields, 35 had been
removed. Of the 136 tractors with flip-up shields, 15 were removed. We
wish to test the hypothesis H: pb = pf vs A: pb ≠ pf where pb and pf are
the proportions of tractors with the bolt-on and flip-up shields removed,
respectively. The test-statistic is computed to be 1.97. The p-value is:

(a) .025
(b) .049
(c) .012
(d) .975
(e) .475

Solution: b
Past performance 1991 Feb - 65% (a-27%)

2. Random samples of 1000 bolts manufactured by machine A and 1000 bolts
manufactured by machine B showed 52 and 23 defective bolts respectively.
The observed value of the test statistic for testing the null hypothesis that
there is no difference in the performance of the machines is:

(a) 3.29

(b) 2.47
(c) 8.56
(d) 12.32
(e) 3.41

Solution: e

3. Two different medical procedures are widely used to treat a disease. One
hundred patients were randomly selected for each procedure in a recent
clinical trial, with the following results:

               n      number of successes (no recurrence of disease)
procedure 1    100    78
procedure 2    100    87

What is the absolute value of the test statistic calculated from the data for
testing the null hypothesis that there is no difference between the success
rates between procedure 1 and procedure 2?

(a) +0.658
(b) +1.675
(c) +2.385
(d) +2.575
(e) +31.610

Solution: b
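
The test statistics in the first three questions of this section all follow the same recipe: the difference in sample proportions divided by a standard error built from the pooled proportion. The sketch below is a minimal illustration of that recipe; the function name `pooled_two_proportion_z` is an assumption made for this example.

```python
import math
from statistics import NormalDist


def pooled_two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Question 1: shields removed, bolt-on (35 of 183) vs flip-up (15 of 136).
z_shields = pooled_two_proportion_z(35, 183, 15, 136)
p_shields = 2 * (1 - NormalDist().cdf(abs(z_shields)))  # two-sided alternative

# Question 2: defective bolts, machine A (52 of 1000) vs machine B (23 of 1000).
z_bolts = pooled_two_proportion_z(52, 1000, 23, 1000)

# Question 3: successes, procedure 1 (78 of 100) vs procedure 2 (87 of 100).
z_procedures = pooled_two_proportion_z(78, 100, 87, 100)

print(round(z_shields, 2), round(p_shields, 3))
print(round(z_bolts, 2))
print(round(abs(z_procedures), 3))
```

The pooled proportion is used because, under the null hypothesis of equal proportions, both samples estimate the same value.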

4. A proponent of innovative teaching methods wishes to compare the
effectiveness of teaching English by the traditional classroom lecture system
(T) and by the extensive use of audio visual aids (A). To do so, a class of 250
is randomly divided into two groups:

100 are taught by method T; of these 63 pass a test.
150 are taught by method A; of these 105 pass a test.

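The appropriate test statistic for testing whether the traditional method
has a lower passing rate than the audio visual method is:

(a) $\dfrac{.63 - .70}{\sqrt{\dfrac{.672 \times .328}{100} + \dfrac{.672 \times .328}{150}}}$
(b) $\dfrac{.63 - .70}{\sqrt{\dfrac{.630 \times .370}{100} + \dfrac{.700 \times .300}{150}}}$
(c) $\dfrac{.63 - .70}{\sqrt{\dfrac{.667 \times .333}{100} + \dfrac{.667 \times .333}{150}}}$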

(d) $\dfrac{(63-67.2)^2}{67.2} + \dfrac{(37-32.8)^2}{32.8} + \dfrac{(105-100.8)^2}{100.8} + \dfrac{(45-49.2)^2}{49.2}$
(e) none of the above

Solution: a

5. Random samples of 50 women and 50 men are taken at the University of
Manitoba. They are asked their reaction to increased tuition fees. The
results are as follows:

Men in favour: 18    Women in favour: 22

It is suspected that a larger proportion of women favour such increases.
Based on the data (with α = .05):

(a) Our suspicions are confirmed as the p-value is .2061.


(b) Our suspicions are confirmed as the p-value is .2939.
(c) Our suspicions are confirmed as the p-value is 0.82.
(d) We cannot conclude that a larger proportion of women are in support
of the increase as the p-value is .2061.
(e) We cannot conclude that a larger proportion of women are in support
of the increase as the p-value is .2939.

Solution: not available

The next three questions refer to the following situation:


In the past decade there have been extensive antismoking campaigns to
try and reduce the proportion of smokers in the population. In 1982, a
survey of 350 adult females revealed that 148 smoked. In 1989, 488 adult
females were surveyed and 163 smoked. Let p represent the proportion of
adult female smokers.
6. The null and alternate hypotheses are:

(a) H: p1982 = p1989; A: p1982 > p1989
(b) H: p1982 ≠ p1989; A: p1982 = p1989
(c) H: p1989 = .423; A: p1989 < .423
(d) H: p1982 = .334; A: p1982 > .334
(e) H: p1982 = p1989; A: p1982 ≠ p1989

Solution: a
Past performance 1990 Feb - 83%
Past performance 1990 Apr - 92%
Past performance 1990 Dec - 68% (22% - e)

7. The test statistic would be computed as:
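(a) $.09 \Big/ \sqrt{\dfrac{.423(1-.423)}{350} + \dfrac{.334(1-.334)}{488}}$
(b) $.09 \Big/ \sqrt{\dfrac{.423(1-.423)}{838} + \dfrac{.334(1-.334)}{838}}$
(c) $.09 \Big/ \sqrt{\dfrac{.371(1-.371)}{838}}$
(d) $.09 \Big/ \sqrt{\dfrac{.371(1-.371)}{350} + \dfrac{.371(1-.371)}{488}}$
(e) $.09 \Big/ \sqrt{\dfrac{.423(1-.423)}{350} + \dfrac{.370(1-.370)}{488}}$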

Solution: d
Past performance 1990 Feb - 65% (A-32%)
Past performance 1990 Apr - 83%

8. The p-value is found to be:

(a) 2.63
(b) .004
(c) .009
(d) .496
(e) .089

Solution: b
Past performance 1990 Feb - 80%
Past performance 1990 Apr - 83%
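
The numbers in the three questions above fit together as follows. This is a worked sketch using the survey counts from the situation, assuming the pooled proportion is used in the standard error and the unrounded difference (about .089, shown as .09 in the answer options) in the numerator:

```latex
\begin{align*}
\hat{p}_{1982} &= \frac{148}{350} \approx .423, \qquad
\hat{p}_{1989} = \frac{163}{488} \approx .334, \qquad
\hat{p} = \frac{148 + 163}{350 + 488} = \frac{311}{838} \approx .371, \\
z &= \frac{\hat{p}_{1982} - \hat{p}_{1989}}
          {\sqrt{\dfrac{\hat{p}(1-\hat{p})}{350} + \dfrac{\hat{p}(1-\hat{p})}{488}}}
   \approx \frac{.0888}{.0338} \approx 2.63, \\
P(Z > 2.63) &\approx .004 .
\end{align*}
```

The upper tail is used because the alternative states that the 1982 proportion is larger than the 1989 proportion.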

9. Suppose the p-value was found to be .053. This means:

(a) The probability that the proportion of smokers has not changed is
.053.
(b) The proportion of smokers has definitely decreased.
(c) There is some, but not overwhelming evidence, that the proportion
of smokers has decreased.
(d) There is no evidence that the proportion of smokers is the same in
both years.
(e) There is overwhelming evidence that the proportion of smokers has
stayed the same.

Solution: c
Past performance 1990 Dec - 61%

10. In a similar study of adult males, the p-value was found to be .053. This
means:

(a) The probability that the proportion of male smokers has not changed
is .053.
(b) The proportion of male smokers has definitely decreased.
(c) If the proportion of male smokers has not changed, then there is only
a .053 chance of seeing the observed drop in the smoking rate in the
survey.
(d) If the proportion of male smokers has changed, then there is only a
.053 chance of detecting a difference.
(e) If the proportion of smokers has changed, then there is only a .053
chance of seeing the observed drop in the smoking rate in the survey.

Solution: c
Past performance 1990 Feb - 38% (A-14%, B-38%, C-38%, D-29%, E-17%)
Past performance 1990 Apr - 64%(C-64%, D-11%, E-21%)

11. In a random sample of 200 University of Manitoba graduate students, it
was found that 66% of them had previously attended some other college
or university. In a random sample of 100 University of Waterloo graduate
students, it was found that 35% of them had previously attended some
other college or university. A 95% confidence interval for estimating the
difference in proportions of graduate students who had previously attended
some other college or university between the University of Manitoba and
the University of Waterloo is:
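(a) $(0.66 - 0.35) \pm 1.96 \sqrt{(0.3366)(0.6633)\left(\dfrac{1}{200} + \dfrac{1}{100}\right)}$
(b) $(0.66 - 0.35) \pm 1.96 \sqrt{\dfrac{(0.66)(0.34)}{200} + \dfrac{(0.35)(0.65)}{100}}$
(c) $(0.66 - 0.35) \pm 1.96 \sqrt{(0.5566)(0.4433)\left(\dfrac{1}{100} + \dfrac{1}{200}\right)}$
(d) $(0.33 - 0.35) \pm 1.96 \sqrt{(0.5566)(0.4433)\left(\dfrac{1}{100} + \dfrac{1}{200}\right)}$
(e) $(0.33 - 0.35) \pm 1.645 \sqrt{(0.5566)(0.4433)\left(\dfrac{1}{100} + \dfrac{1}{200}\right)}$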

Solution: not available
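
No keyed answer is given here, so the following is a sketch of the standard large-sample interval for a difference of two proportions. It assumes each sample contributes its own variance term p(1 − p)/n to the standard error; the pooled proportion is normally reserved for testing equality, not for interval estimation. The function name `diff_proportion_ci` is illustrative.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(p1, n1, p2, n2, level=0.95):
    """Large-sample confidence interval for p1 - p2 with unpooled standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    difference = p1 - p2
    return difference - z * standard_error, difference + z * standard_error


# Manitoba: 66% of 200 students; Waterloo: 35% of 100 students.
low, high = diff_proportion_ci(0.66, 200, 0.35, 100)
print(round(low, 3), round(high, 3))
```

The resulting interval lies entirely above zero, which is consistent with a higher proportion among the Manitoba students.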

The next two questions refer to the following situation:


One criticism of reforestation efforts after timber harvesting is that too
few of the seedlings survive. An experiment was conducted to assess if
mulching the slash (limbs, roots, small branches, etc.) and leaving the
mulch on the ground improves the survival rate compared to just leaving
the slash on the ground. It is believed that mulching will cause the
material to break down sooner and release the nutrients to the seedlings. A
total of 500 seedlings were randomly assigned to the two treatments and
the two year survival rate was measured. Of the 250 seedlings receiving
the “mulching” treatment, 75 survived; of the 250 seedlings receiving the
“control” treatment, 55 survived.
12. The null and alternate hypotheses are: (m=mulch, c=control)

(a) H: pm = .22; A: pm > .22
(b) H: µm = .22; A: µm > .22
(c) H: pm − pc = 0; A: pm − pc > 0
(d) H: µm − µc = 0; A: µm − µc > 0
(e) H: pm − pc = 0; A: pm − pc ≠ 0

Solution: c
Past performance 1993 Feb - 82% (d=19%)

13. The value of the test statistic and the p-value are:
(a) 2.76, .003
(b) 2.05, .042
(c) 2.76, .006
(d) 2.05, .021
(e) 2.05, .011
Solution: d
Past performance 1993 Feb - 84%
