
Business Statistics

1. The following results were obtained from medical records about age (x) and systolic blood
pressure (y) of a group of 10 women:

The mean and variance of x are 53 and 121 respectively, and the mean and variance of y are 142 and
169 respectively. Σ(x − X̄)(y − Ȳ) = 1210, where X̄ is the mean of x and Ȳ is the mean of y.

Find the regression equation of systolic blood pressure on age of a woman. Also estimate the systolic
blood pressure of a woman whose age is 55. While solving this example, explain the relevant
terms used in detail.

Solution:
Let $\bar{X}$ and $\sigma_x^2$ be the mean and variance of age (x) respectively.

$\therefore \bar{X} = 53$ and $\sigma_x^2 = 121$ (Given)

Let $\bar{Y}$ and $\sigma_y^2$ be the mean and variance of systolic blood pressure (y) respectively.

$\therefore \bar{Y} = 142$ and $\sigma_y^2 = 169$ (Given)

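The regression equation of Y on X is given by

$$(Y - \bar{Y}) = \rho \, \frac{\sigma_y}{\sigma_x} \, (X - \bar{X}),$$

where

$$\rho = \frac{\operatorname{Cov}(x, y)}{\sqrt{\sigma_x^2 \, \sigma_y^2}} = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sigma_x \sigma_y}$$

is the correlation coefficient between X and Y.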

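Here $\sigma_x = \sqrt{121} = 11$ and $\sigma_y = \sqrt{169} = 13$, so

$$\rho = \frac{\frac{1}{10}(1210)}{11 \times 13} = \frac{121}{143} = \frac{11}{13}$$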

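∴ the regression equation of systolic blood pressure on age of a woman is given by

$$(Y - \bar{Y}) = \rho \, \frac{\sigma_y}{\sigma_x} \, (X - \bar{X}),$$

$$\therefore (Y - 142) = \frac{11}{13} \times \frac{13}{11} \, (X - 53),$$

$$\Rightarrow (Y - 142) = (X - 53)$$

$$\Rightarrow Y = 142 - 53 + X,$$

$$\Rightarrow Y = 89 + X. \qquad (1)$$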

Equation (1) is the required regression equation of systolic blood pressure (y) on age of a woman
(x).

The estimate of the systolic blood pressure of a woman whose age is 55 is given by

$$Y = 89 + X = 89 + 55 = 144$$
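The arithmetic of Question 1 can be checked with a short script. The following is a minimal Python sketch, assuming (as the solution does) that the covariance is obtained by dividing the sum of cross-deviations by n; the function name `regression_from_summary` and its parameter names are hypothetical and chosen only for illustration.

```python
import math


def regression_from_summary(n, mean_x, var_x, mean_y, var_y, sum_cross_dev):
    """Return (rho, slope, intercept) for the regression line of y on x.

    Works from summary statistics only. The covariance divides by n
    (population form), matching the worked solution.
    """
    sd_x = math.sqrt(var_x)
    sd_y = math.sqrt(var_y)
    cov_xy = sum_cross_dev / n
    rho = cov_xy / (sd_x * sd_y)          # correlation coefficient
    slope = rho * sd_y / sd_x             # regression coefficient of y on x
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return rho, slope, intercept


# Values given in Question 1.
rho, slope, intercept = regression_from_summary(
    n=10, mean_x=53, var_x=121, mean_y=142, var_y=169, sum_cross_dev=1210
)
estimate = intercept + slope * 55  # estimated systolic blood pressure at age 55

print(f"rho = {rho:.4f}")
print(f"Y = {intercept:.2f} + {slope:.2f} X")
print(f"Estimate at age 55: {estimate:.2f}")
```

Up to floating-point rounding, the script should reproduce ρ = 11/13, the line Y = 89 + X, and the estimate of 144.
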
2. A sarcastic graduate student from Waterloo was testing the American media at the Salt Lake
City Olympics about Canadian history and geography. Reporters were given a 10-item multiple-
choice questionnaire to complete. Each question had four possible answers. Unfortunately, one
reporter wanted to return to the bar for the rest of the morning, so he simply guessed on all of the
questions and then handed in the questionnaire. What is the probability that he will score at least
6 out of 10 in this questionnaire? State clearly the assumptions you need to make to solve this
question. Also, write a note on the relevant probability distribution you used to solve this
example.

Solution:

The above problem can be handled using the binomial probability distribution. The basic assumptions
of the binomial distribution are:

1: Each trial results in one of two possible, mutually exclusive outcomes. One of the possible
outcomes is denoted (arbitrarily) as a success and the other as a failure.

2: The probability of a success, denoted by p, remains constant from trial to trial.

3: The trials are independent; that is, the outcome of any particular trial is not affected by the
outcome of any other trial.

Mathematically, a random variable X is said to have a binomial distribution with parameters n and p
if its p.m.f. is given by

$$P(X = x) = b(x; n, p) = \begin{cases} \binom{n}{x} p^x q^{n-x}, & x = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

where n ranges over the set of positive integers and the probability parameter p satisfies
$0 \le p \le 1$, with $q = 1 - p$.
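In the above question, the number of trials is n = 10 (reporters were given a 10-item
multiple-choice questionnaire to complete).

The probability of success is p = 1/4 = 0.25, because each question has four possible answers; this
assumes that exactly one answer is correct and that the reporter picks each answer with equal chance.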

Using the binomial distribution, the probability that he will score at least 6 out of 10 in this
questionnaire is given by

$$P(X \ge 6) = P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10)$$

$$= \sum_{x=6}^{10} \binom{10}{x} (0.25)^x (0.75)^{10-x}$$

$$= 0.016222000 + 0.003089905 + 0.000386238 + 0.000028610 + 0.000000954$$

$$P(X \ge 6) = 0.019727707$$
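The individual terms and their total can be verified numerically. The sketch below uses only `math.comb` from the Python standard library; the helper names `binomial_pmf` and `binomial_tail` are hypothetical, introduced here for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summing the upper-tail terms."""
    return sum(binomial_pmf(x, n, p) for x in range(k, n + 1))


n, p = 10, 0.25  # 10 questions, 1-in-4 chance of guessing each correctly

# Individual terms P(X = 6), ..., P(X = 10).
terms = {x: binomial_pmf(x, n, p) for x in range(6, n + 1)}
tail = binomial_tail(6, n, p)

for x, prob in terms.items():
    print(f"P(X = {x:2d}) = {prob:.9f}")
print(f"P(X >= 6) = {tail:.9f}")
```

Because every term has the common denominator 4^10 = 1048576, the total is exactly 20686/1048576, which is about 0.0197, or roughly a 2% chance of scoring at least 6 by pure guessing.
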
3. In order to compare the batting performances in Test cricket of Indian legends, it was proposed
in a meeting of administrators that, instead of the arithmetic mean of runs scored in all the
innings played, the median score be used as an average to represent performance. But one
member, who happened to be a statistician, objected to the proposal, saying that by doing so we
would be favouring players like Sachin Tendulkar.

a) Give your opinion with appropriate justification.

Solution:

The arithmetic mean or average of n observations, x̄ (pronounced "x bar"), is simply the sum of
the observations divided by their number:

$$\bar{x} = \frac{\text{Sum of all observations}}{\text{Total number of observations}}$$

The major advantage of the mean is that it uses all the data values and is, in a statistical sense,
efficient. The main disadvantage of the mean is that it is vulnerable to what are known as
outliers. Outliers are single observations which, if excluded from the calculations, have a
noticeable influence on the results. For example, if we had entered ‘21’ instead of ‘2.1’ in the
calculation of the mean, we would find the mean changed from 1.50 kg to 7.98 kg. It does not
necessarily follow, however, that outliers should be excluded from the final data summary, or
that they result from an erroneous measurement. Thus, in some practical situations the mean can
mislead decision making. In cricket, for instance, if a player hits double centuries in two
matches and then performs very poorly in the following series of matches, the mean will still
flatter his or her performance because of the two outlier scores.
The median is estimated by first ordering the data from smallest to largest, and then counting
upwards through half the observations. The estimate of the median is either the observation at the
centre of the ordering, in the case of an odd number of observations, or the simple average of the
middle two observations if the total number of observations is even. The median is largely
unaffected by outliers and is a measure of position rather than magnitude. So a cricket player
who hits outlier double centuries would not be favoured by the median. The most important
relationship between the arithmetic mean and the median is that, for a symmetrical distribution,
they are equal. For a right-skewed (positively skewed) distribution, median < mean, and for a
negatively skewed distribution, median > mean. A player like Sachin Tendulkar would be favoured by
the median because of the negatively skewed distribution of his runs scored (369 Test matches,
100-6, 50-39, duck-30).
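The contrast between the two averages can be seen on a small example. The sketch below uses Python's standard `statistics` module on two hypothetical sets of innings scores; the numbers are invented purely for illustration and are not real career figures for any player.

```python
from statistics import mean, median

# Hypothetical innings scores (illustrative only, not real data).
steady = [45, 50, 52, 48, 55, 47, 51, 49]   # similar score every innings
streaky = [210, 205, 5, 8, 0, 12, 3, 7]     # two double centuries, then failures

steady_mean, steady_median = mean(steady), median(steady)
streaky_mean, streaky_median = mean(streaky), median(streaky)

print(f"steady : mean = {steady_mean:.2f}, median = {steady_median:.2f}")
print(f"streaky: mean = {streaky_mean:.2f}, median = {streaky_median:.2f}")
```

In this made-up data the streaky batsman has the higher mean (56.25 against 49.625) but a far lower median (7.5 against 49.5), so the choice of average changes which player looks better.
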
b) If it is further decided to compare the performance based on consistency in performance,
which statistical measures would you recommend? Substantiate your answer with appropriate
justification.

Solution:

A quantitative measurement contains more information than a categorical one, and so
summarizing these data is more complex. One chooses summary statistics to condense a large
amount of information into a few intelligible numbers, the sort that could be communicated
verbally. The two most important pieces of information about a quantitative measurement are
‘where is it?’ and ‘how variable is it?’ These are categorised as measures of location (or
sometimes ‘central tendency’) and measures of spread or variability.

In order to compare performance based on consistency, it is better to take into consideration
both the deviation in runs scored and the individual mean scores, which can be done using a
statistical measure called the coefficient of variation (C.V).


$$\text{C.V} = \frac{\sigma_x}{\bar{X}} \times 100$$

where $\bar{X}$ is the arithmetic mean of the data points and $\sigma_x$ is the standard deviation
of the data points under consideration.

The most consistent player is the one whose C.V is the smallest.
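As a rough illustration of this comparison, the following Python sketch computes the C.V for two hypothetical players. The scores are invented for illustration, the function name `coefficient_of_variation` is hypothetical, and the population standard deviation (`statistics.pstdev`, dividing by n) is an assumption, since the text does not say which form of the standard deviation is intended.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """C.V = (standard deviation / mean) * 100, using the population SD."""
    return pstdev(scores) / mean(scores) * 100


# Hypothetical innings scores (illustrative only, not real data).
players = {
    "Player A": [45, 50, 52, 48, 55, 47, 51, 49],
    "Player B": [210, 205, 5, 8, 0, 12, 3, 7],
}

cv = {name: coefficient_of_variation(scores) for name, scores in players.items()}
most_consistent = min(cv, key=cv.get)  # smallest C.V = most consistent

for name, value in cv.items():
    print(f"{name}: C.V = {value:.1f}%")
print(f"Most consistent: {most_consistent}")
```

Dividing by the mean makes the measure unit-free, so players with very different scoring levels can be compared on the same scale.
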
