1. The following results were obtained from medical records about the age (x) and systolic blood
pressure (y) of a group of 10 women:
The mean and variance of x are 53 and 121 respectively, and the mean and variance of y are 142 and
169 respectively. Σ(x − X̄)(y − Ȳ) = 1210 (X̄ is the mean of x and Ȳ is the mean of y).
Find the regression equation of systolic blood pressure on the age of a woman. Also estimate the systolic
blood pressure of a woman whose age is 55. While solving this example, explain the relevant
terms used in detail.
Solution:
Let $\bar{X}$ and $\sigma_x^2$ be the mean and variance of age (x) respectively.
∴ $\bar{X} = 53$ and $\sigma_x^2 = 121$ (given)
Let $\bar{Y}$ and $\sigma_y^2$ be the mean and variance of systolic blood pressure (y) respectively.
∴ $\bar{Y} = 142$ and $\sigma_y^2 = 169$ (given)
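The regression equation of y on x is
$$(Y - \bar{Y}) = \rho \frac{\sigma_y}{\sigma_x} (X - \bar{X}),$$
where
$$\rho = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sigma_x^2 \sigma_y^2}}$$
is the correlation coefficient between X and Y. Here $\sigma_x$ and $\sigma_y$ are the standard deviations (the square roots of the variances), Cov(x, y) is the covariance of x and y, and $\rho \sigma_y / \sigma_x$ is the regression coefficient of y on x, i.e. the estimated change in y for a one-unit change in x.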
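Here n = 10, $\sigma_x = \sqrt{121} = 11$ and $\sigma_y = \sqrt{169} = 13$, so
$$\rho = \frac{\frac{1}{10}(1210)}{11 \times 13} = \frac{121}{143} = \frac{11}{13}$$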
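As a quick arithmetic check, the sketch below recomputes ρ from the given summary figures alone (n = 10, Σ(x − X̄)(y − Ȳ) = 1210, variances 121 and 169). It uses `fractions.Fraction` so the result comes out exactly; `correlation_from_summary` is a hypothetical helper name, not part of any library, and it assumes the variances are perfect squares as they are here.

```python
from fractions import Fraction
from math import isqrt


def correlation_from_summary(n, sum_dev_products, var_x, var_y):
    """Correlation coefficient from summary statistics (hypothetical helper).

    Assumes both variances are perfect squares, so the standard deviations
    can be taken exactly with isqrt.
    """
    sd_x = isqrt(var_x)  # sigma_x = sqrt(121) = 11
    sd_y = isqrt(var_y)  # sigma_y = sqrt(169) = 13
    if sd_x * sd_x != var_x or sd_y * sd_y != var_y:
        raise ValueError("variances must be perfect squares for this exact sketch")
    cov = Fraction(sum_dev_products, n)  # Cov(x, y) = (1/n) * sum of deviation products
    return cov / (sd_x * sd_y)


rho = correlation_from_summary(n=10, sum_dev_products=1210, var_x=121, var_y=169)
print(rho)  # 11/13
```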
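Substituting these values into $(Y - \bar{Y}) = \rho \frac{\sigma_y}{\sigma_x} (X - \bar{X})$:
$$(Y - 142) = \frac{11}{13} \times \frac{13}{11} (X - 53)$$
$$\Rightarrow (Y - 142) = (X - 53)$$
$$\Rightarrow Y = 142 - 53 + X$$
$$\Rightarrow Y = 89 + X \qquad (1)$$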
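Equation (1) is the required regression equation of systolic blood pressure (y) on the age of a woman
(x).
To estimate the systolic blood pressure of a woman aged 55, put X = 55 in equation (1):
$$Y = 89 + X = 89 + 55 = 144$$
The estimated systolic blood pressure of a 55-year-old woman is therefore 144.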
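The whole of Problem 1 can be reproduced from the summary statistics in a few lines. This is a sketch assuming the population (divide-by-n) variances given in the question; `fit_from_summary` is a hypothetical helper name. It uses the fact that the slope ρσ_y/σ_x simplifies to Cov(x, y)/σ_x², so no square roots are needed.

```python
from fractions import Fraction


def fit_from_summary(n, mean_x, var_x, mean_y, sum_dev_products):
    """Regression line of y on x from summary statistics (hypothetical helper)."""
    cov = Fraction(sum_dev_products, n)  # Cov(x, y) = 1210 / 10 = 121
    slope = cov / var_x                  # b = rho * sigma_y / sigma_x = Cov / sigma_x^2
    intercept = mean_y - slope * mean_x  # the line passes through (X-bar, Y-bar)
    return slope, intercept


slope, intercept = fit_from_summary(n=10, mean_x=53, var_x=121, mean_y=142, sum_dev_products=1210)
print(f"Y = {intercept} + {slope}X")                  # Y = 89 + 1X
print("Estimate at age 55:", intercept + slope * 55)  # Estimate at age 55: 144
```

Exact fractions are used so that the slope of exactly 1 and the intercept of exactly 89 are not blurred by floating-point rounding.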
2. A sarcastic graduate student from Waterloo was testing the American media at the Salt Lake
City Olympics about Canadian history and geography. Reporters were given a 10-item multiple-
choice questionnaire to complete. Each question had four possible answers. Unfortunately, one
reporter wanted to return to the bar for the rest of the morning, so he simply guessed on all of the
questions and then handed in the questionnaire. What is the probability that he will score at least
6 out of 10 in this questionnaire? State clearly the assumptions you need to make to solve this
question. Also, write a note on the relevant probability distribution you used to solve this
example.
Solution:
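The above problem can be handled using the binomial probability distribution. The basic assumptions
for the binomial distribution are:
1: Each trial results in one of two possible, mutually exclusive, outcomes. One of the possible
outcomes is denoted (arbitrarily) as a success and the other is denoted a failure.
2: The probability of a success, denoted by p, remains constant from trial to trial. The probability
of a failure, 1 − p, is denoted by q.
3: The trials are independent; that is, the outcome of any particular trial is not affected by the
outcome of any other trial.
In this question, each of the 10 questions is a trial and a correct answer is a success. We assume
that the reporter picks each answer purely at random and independently of his other answers, so
that p is the same for every question.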
$$P(X = x) = b(x; n, p) = \binom{n}{x} p^x q^{n-x}, \qquad x = 0, 1, 2, \ldots, n,$$
where n ranges over the set of positive integers and the probability parameter p satisfies $0 \le p \le 1$, with $q = 1 - p$.
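The probability function above translates directly into code. The sketch below uses Python's `math.comb` (available from Python 3.8) for the binomial coefficient; `binomial_pmf` is an illustrative name. As a sanity check it confirms that the probabilities for n = 10, p = 0.25 sum to 1 and reproduces the term P(X = 6) used in this solution.

```python
from math import comb


def binomial_pmf(x, n, p):
    """b(x; n, p) = C(n, x) * p**x * q**(n - x), with q = 1 - p."""
    if not 0 <= x <= n:
        return 0.0
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)


n, p = 10, 0.25
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(round(total, 12))                 # 1.0
print(round(binomial_pmf(6, n, p), 9))  # 0.016222
```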
In the above question, the number of trials is n = 10 (reporters were given a 10-item multiple-choice
questionnaire to complete).
The probability of success on each question is p = 1/4 = 0.25, because each question has four possible
answers of which (we assume) exactly one is correct; hence q = 1 − p = 0.75.
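Using the binomial distribution, the probability that he will score at least 6 out of 10 on this
questionnaire is given by
$$P(X \ge 6) = P(X=6) + P(X=7) + P(X=8) + P(X=9) + P(X=10)$$
$$= \sum_{x=6}^{10} \binom{10}{x} (0.25)^x (0.75)^{10-x}$$
$$= 0.016222000 + 0.003089905 + 0.000386238 + 0.000028610 + 0.000000954$$
$$P(X \ge 6) = 0.019727707 \approx 0.0197$$
That is, a reporter who guesses every answer has only about a 2% chance of scoring 6 or more.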
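The tail sum can be checked exactly with rational arithmetic. Because p = 1/4, every term has denominator 4¹⁰ = 1,048,576, so the sketch below (using `fractions.Fraction` and `math.comb`; `binomial_tail` is an illustrative name) returns the probability as an exact fraction before converting it to a decimal.

```python
from fractions import Fraction
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    q = 1 - p
    return sum(comb(n, x) * p**x * q**(n - x) for x in range(k, n + 1))


p = Fraction(1, 4)  # assumes one correct option out of four, guessed at random
prob = binomial_tail(6, 10, p)
print(prob)                   # 10343/524288  (equal to 20686/1048576)
print(round(float(prob), 9))  # 0.019727707
```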
3. In order to compare the batting performances in Test cricket of Indian legends, it was proposed in a
meeting of administrators that, instead of using the arithmetic mean of runs scored in all the innings
played, the median score be used as the average to represent performance. But one member, who
happened to be a statistician, objected to the proposal, saying that by doing so we would be favouring
players like Sachin Tendulkar.
Solution:
The arithmetic mean or average of n observations, x̄ (pronounced "x bar"), is simply the sum of the
observations divided by their number:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
The major advantage of the mean is that it uses all the data values and is, in a statistical sense,
efficient. Its main disadvantage is that it is vulnerable to outliers: single observations which, if
excluded from the calculations, have a noticeable influence on the results. For example, if we had
entered '21' instead of '2.1' in the calculation of the mean, we would find the mean changed from
1.50 kg to 7.98 kg. It does not necessarily follow, however, that outliers should be excluded from the
final data summary, or that they result from an erroneous measurement. Thus, in some practical
situations, the mean can mislead decision making. In cricket, if a player scores double centuries in
two matches and then performs very poorly in the following series of matches, the mean will still
present his or her performance favourably because of the two outlier performances.
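To illustrate the point about outliers, the sketch below compares two hypothetical batters over ten innings. The scores are invented for illustration and are not real records: player A makes two double centuries and fails in the other eight innings, while player B scores steadily in the 40s.

```python
def mean(scores):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(scores) / len(scores)


# Hypothetical innings scores, invented for illustration only.
player_a = [200, 210, 5, 8, 12, 3, 10, 7, 0, 15]    # two outlier double centuries
player_b = [40, 45, 50, 38, 42, 47, 44, 41, 39, 46]  # steady scorer

print(mean(player_a))  # 47.0
print(mean(player_b))  # 43.2
```

On the mean, A ranks above B even though A failed in eight of his ten innings.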
The median is found by first ordering the data from smallest to largest and then counting upwards
for half the observations. The median is either the observation at the centre of the ordering, when
the number of observations is odd, or the simple average of the middle two observations, when the
number of observations is even. The median is not affected by the size of outliers and is a measure
of position rather than magnitude. So a cricket player whose mean is inflated by a few outlying
double centuries would gain nothing from the median. The most important relationship between the
arithmetic mean and the median is that for a symmetrical distribution they are the same; for a right
(positively) skewed distribution the median is typically less than the mean, and for a left
(negatively) skewed distribution the median is typically greater than the mean. Accordingly, a player
like Sachin Tendulkar (200 Test matches, 51 centuries, 68 half-centuries) would be favoured by the
median if his distribution of runs scored is negatively skewed, since then median > mean.
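The ordering-and-counting procedure for the median described above can be written as follows. The score lists are invented innings for two hypothetical players A and B, used purely for illustration (not real player data); `median` is a hand-written helper, and Python's standard `statistics.median` is used only to cross-check it.

```python
import statistics


def median(values):
    """Order the data, then take the middle value (odd n) or average the middle two (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical innings scores, invented for illustration only.
player_a = [200, 210, 5, 8, 12, 3, 10, 7, 0, 15]    # two big innings, many failures
player_b = [40, 45, 50, 38, 42, 47, 44, 41, 39, 46]  # steady scorer

for name, scores in [("A", player_a), ("B", player_b)]:
    assert median(scores) == statistics.median(scores)
    print(name, median(scores))
# A 9.0
# B 43.0
```

For A the median (9.0) is far below the mean of the same scores (470/10 = 47.0), the median < mean pattern of a right-skewed distribution, so the median removes the effect of his two big innings.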
b) If it is further decided to compare the performance based on consistency in performance,
which statistical measures would you recommend? Substantiate your answer with appropriate
justification.
Solution:
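To compare performance based on consistency, it is better to take into consideration both the
spread of runs scored (the standard deviation $\sigma_x$) and each player's mean score $\bar{X}$, which
can be done using the statistical measure called the coefficient of variation (C.V.):
$$\text{C.V.} = \frac{\sigma_x}{\bar{X}} \times 100$$
The player with the smaller C.V. is the more consistent performer, because his or her scores vary
less relative to the mean.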
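A minimal sketch of the C.V. comparison, assuming the population standard deviation σ (Python's `statistics.pstdev`) to match the formula above; the innings scores are invented for two hypothetical players and are not real records, and `coefficient_of_variation` is an illustrative name.

```python
import statistics


def coefficient_of_variation(scores):
    """C.V. = (sigma / mean) * 100, using the population standard deviation."""
    return statistics.pstdev(scores) / statistics.fmean(scores) * 100


# Hypothetical innings scores, invented for illustration only.
player_a = [200, 210, 5, 8, 12, 3, 10, 7, 0, 15]    # two big innings, many failures
player_b = [40, 45, 50, 38, 42, 47, 44, 41, 39, 46]  # steady scorer

for name, scores in [("A", player_a), ("B", player_b)]:
    print(name, round(coefficient_of_variation(scores), 1))
# A 168.4
# B 8.5
```

The much smaller C.V. marks B as the more consistent batter, even though A has the higher mean. Using the sample standard deviation (`statistics.stdev`) instead would change the numbers slightly but not the ranking here.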