(Final)

Chapter Name: Correlation
1) Definition of correlation:
Correlation is the statistical tool used to study the relationship between two or more variables. More simply, the relationship between two variables is called correlation.
For example, there is a relationship between family income and expenditure on luxury items, and between rainfall (up to a point) and the production of rice.
2) Types of Correlation:
Correlation is generally classified in three ways, as follows:
i) Positive, negative and zero correlation
ii) Simple, partial and multiple correlation
iii) Linear and non-linear correlation
3) Positive, negative and zero correlation:
If both variables vary in the same direction, i.e. if one variable increases (decreases) and the other on average also increases (decreases), the correlation is said to be positive.
If the variables vary in opposite directions, i.e. if one variable increases (decreases) and the other on average decreases (increases), the correlation is said to be negative.
If a change in one variable does not affect the other, i.e. if one variable increases or decreases while the other remains unchanged, it is called zero correlation.
4) Importance or use of correlation
The major uses of correlation are given below:
i) It measures the relationship between two sets of data.
ii) The theory of correlation is very important from the business and economic points of view.
iii) Its study is helpful for prediction and forecasting.
iv) We can measure social instability and investigate its causes by calculating the correlation between the ages of wives and husbands, unemployment and crime tendency, unemployment and drug addiction, etc.
v) We can comment on the economic condition of a country by calculating the correlation between demand and price, production and price, etc.
5) Methods of studying correlation
There are several methods of assessing whether two variables are correlated or not. Some important methods are as follows:
i) Scatter diagram
ii) Karl Pearson's coefficient of correlation
iii) Spearman's rank correlation
iv) Method of least squares
6) Karl Pearson's co-efficient of correlation:
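As a measure of the intensity or degree of linear relationship between two variables, Karl Pearson developed a formula called the coefficient of correlation. It is denoted by $r_{xy}$ and defined as

$$r_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
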
7) Properties of co-efficient of correlation:
The following are the important properties of the coefficient of correlation, r:
i) It lies between -1 and +1; symbolically, $-1 \le r \le +1$.
ii) It is independent of change of origin and scale.
iii) It is the geometric mean of the two regression coefficients; symbolically, $r = \pm\sqrt{b_{yx} \times b_{xy}}$.
iv) If x and y are independent variables, then the coefficient of correlation is zero.
Comment on the following:
1) $r = +1$: perfect positive correlation
2) $r = -1$: perfect negative correlation
3) $r = +0.8$: high degree of positive correlation
4) $r = -0.8$: high degree of negative correlation
5) $0.3 \le r \le 0.7$: moderate degree of positive correlation
6) $-0.7 \le r \le -0.3$: moderate degree of negative correlation
7) $0 < r < 0.3$: low degree of positive correlation
8) $-0.3 < r < 0$: low degree of negative correlation
9) $r = 0$: zero correlation

Question No: 1
The following table provides the index of cosmetic production of several companies and the number of unemployed laborers (in hundreds).
Year 1990 1991 1992 1993 1994
Index production 100 102 104 108 112
No. of unemployed laborers 15 12 11 08 14

Requirements:
I) Calculate the correlation coefficient
II) Find the standard error of r
III) Find the probable error of r
Solution:
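I) Let,
x = Index production
y = No. of unemployed laborers
We know that,

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\{n\sum x^2 - (\sum x)^2\}\{n\sum y^2 - (\sum y)^2\}}}$$

Table for calculation

| x | y | xy | x² | y² |
|---|---|---|---|---|
| 100 | 15 | 1500 | 10000 | 225 |
| 102 | 12 | 1224 | 10404 | 144 |
| 104 | 11 | 1144 | 10816 | 121 |
| 108 | 08 | 864 | 11664 | 64 |
| 112 | 14 | 1568 | 12544 | 196 |
| $\sum x = 526$ | $\sum y = 60$ | $\sum xy = 6300$ | $\sum x^2 = 55428$ | $\sum y^2 = 750$ |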

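Correlation coefficient (with n = 5)

$$
\begin{aligned}
r &= \frac{n\sum xy - \sum x \sum y}{\sqrt{\{n\sum x^2 - (\sum x)^2\}\{n\sum y^2 - (\sum y)^2\}}} \\
  &= \frac{5(6300) - (526)(60)}{\sqrt{\{5(55428) - (526)^2\}\{5(750) - (60)^2\}}} \\
  &= \frac{31500 - 31560}{\sqrt{\{277140 - 276676\}\{3750 - 3600\}}} \\
  &= \frac{-60}{\sqrt{\{464\}\{150\}}} \\
  &= \frac{-60}{263.82} \\
  &= -0.2274
\end{aligned}
$$

The correlation is negative and low. Therefore, as the production index increases, the number of unemployed laborers tends to decrease slightly, and vice versa.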
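(II) Standard error of r

$$SE(r) = \frac{1 - r^2}{\sqrt{n}} = \frac{1 - (-0.2274)^2}{\sqrt{5}} = \frac{0.9483}{2.2361} = 0.424 \text{ (Ans)}$$

(III) Probable error of r

$$PE(r) = 0.6745 \times SE(r) = 0.6745 \times 0.424 = 0.285988 \approx 0.286 \text{ (Ans)}$$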
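
The arithmetic of Question 1 can be checked with a short script. This is a minimal sketch in Python, not part of the original notes; the function names `pearson_r`, `standard_error` and `probable_error` are chosen here for illustration. It uses the raw-sums formula for r together with SE(r) = (1 - r²)/√n and PE(r) = 0.6745 × SE(r).

```python
import math

def pearson_r(x, y):
    """Pearson's r from raw sums: n, sum x, sum y, sum xy, sum x^2, sum y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def standard_error(r, n):
    """Standard error of r: (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)

def probable_error(r, n):
    """Probable error of r: 0.6745 times the standard error."""
    return 0.6745 * standard_error(r, n)

# Data of Question 1
index_production = [100, 102, 104, 108, 112]
unemployed = [15, 12, 11, 8, 14]

r = pearson_r(index_production, unemployed)
se = standard_error(r, len(index_production))
pe = probable_error(r, len(index_production))
print(round(r, 4), round(se, 3), round(pe, 3))
```

Working from the unrounded r, the probable error rounds to 0.286.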


Question No: 2
Calculate the correlation coefficient between age & playing habits of the following students.
Age 16 17 18 19 20 21
No of Students 250 200 120 150 80 100
Regular player 200 150 48 90 32 50

Requirements:
I) Calculate the correlation coefficient r
II) Find the standard error of r
III) Find the probable error of r
Solution:
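Let,
x = Age
y = Percentage of regular players (playing habits) = (Regular players / No. of students) × 100
We know that,

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\{n\sum x^2 - (\sum x)^2\}\{n\sum y^2 - (\sum y)^2\}}}$$

Table for calculation

| x | No. of students | Regular players | Playing habits (y) | xy | x² | y² |
|---|---|---|---|---|---|---|
| 16 | 250 | 200 | 80 | 1280 | 256 | 6400 |
| 17 | 200 | 150 | 75 | 1275 | 289 | 5625 |
| 18 | 120 | 48 | 40 | 720 | 324 | 1600 |
| 19 | 150 | 90 | 60 | 1140 | 361 | 3600 |
| 20 | 80 | 32 | 40 | 800 | 400 | 1600 |
| 21 | 100 | 50 | 50 | 1050 | 441 | 2500 |
| $\sum x = 111$ | | | $\sum y = 345$ | $\sum xy = 6265$ | $\sum x^2 = 2071$ | $\sum y^2 = 21325$ |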

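Correlation coefficient (with n = 6)

$$
\begin{aligned}
r &= \frac{n\sum xy - \sum x \sum y}{\sqrt{\{n\sum x^2 - (\sum x)^2\}\{n\sum y^2 - (\sum y)^2\}}} \\
  &= \frac{6(6265) - (111)(345)}{\sqrt{\{6(2071) - (111)^2\}\{6(21325) - (345)^2\}}} \\
  &= \frac{37590 - 38295}{\sqrt{\{12426 - 12321\}\{127950 - 119025\}}} \\
  &= \frac{-705}{\sqrt{\{105\}\{8925\}}} \\
  &= \frac{-705}{968.05} \\
  &= -0.728
\end{aligned}
$$

This is a strong negative relationship between the variables. Therefore, as age increases, playing habits decrease, and vice versa.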
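(II) Standard error of r

$$SE(r) = \frac{1 - r^2}{\sqrt{n}} = \frac{1 - (-0.728)^2}{\sqrt{6}} = \frac{0.4700}{2.4495} \approx 0.192 \text{ (Ans)}$$

(III) Probable error of r

$$PE(r) = 0.6745 \times SE(r) = 0.6745 \times 0.192 \approx 0.129 \text{ (Ans)}$$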
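
As a cross-check of Question 2, the sketch below (an illustration in Python, not the notes' own method) first converts the raw counts into the percentage of regular players and then computes r from deviations about the means, $r = \sum (x - \bar{x})(y - \bar{y}) / \sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}$. This form is algebraically equivalent to the raw-sums formula used in the solution. The name `pearson_r_deviation` is hypothetical.

```python
import math

ages = [16, 17, 18, 19, 20, 21]
students = [250, 200, 120, 150, 80, 100]
regular_players = [200, 150, 48, 90, 32, 50]

# y = percentage of regular players in each age group
playing_habit = [100 * p / s for p, s in zip(regular_players, students)]

def pearson_r_deviation(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

r_age_play = pearson_r_deviation(ages, playing_habit)
print(playing_habit)
print(round(r_age_play, 3))
```

For age 18 the percentage is 40, so the product xy for that row is 18 × 40 = 720, which is the value that makes the column total 6265.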

Chapter Name: Test of Hypothesis
1) Hypothesis:
A hypothesis is an assumption to be tested.
Very often in practice we are called upon to make decisions about a population on the basis of sample information. Such decisions are called statistical decisions. For example, we may wish to decide on the basis of sample data whether a new medicine is really effective in curing a disease, or whether one training procedure is better than another.
2) Importance of hypothesis:
Hypothesis tests are widely used in business and industry for making decisions. For example, in order to increase consumer awareness of a product or service, it might be necessary to compare the effectiveness of different types of advertising campaigns; or, in order to offer more profitable investments to its customers, an investment firm might wish to compare the profitability of different types of investment.

3) Statistical hypothesis:
In attempting to reach decisions, it is useful to make assumptions or guesses about the population involved. Such assumptions, which may or may not be true, are called statistical hypotheses.
4) Null hypothesis:
The statistical hypothesis which is picked up for testing is known as the null hypothesis. A null hypothesis states that there is no difference between a sample estimate and the true population value. The null hypothesis is generally denoted by $H_0$.

5) Alternative hypothesis:
A statistical hypothesis that disagrees with the null hypothesis, or that is simply the opposite of the null hypothesis, is said to be the alternative hypothesis. It is denoted by $H_1$.

6) Type I and type II error:
We may reject the null hypothesis $H_0$ when it is true. This type of error is called a type I error.
We may accept the null hypothesis $H_0$ when the alternative hypothesis $H_1$ is true. This type of error is called a type II error.
7) Size of type I error (level of significance) and type II error:
The probability of committing a type I error is called the size of the type I error or the level of significance, and it is denoted by $\alpha$. Therefore
$\alpha$ = P(type I error) = P(rejecting $H_0$ | $H_0$ is true)
The probability of committing a type II error is called the size of the type II error and it is denoted by $\beta$; its complement $1 - \beta$ is the power of the test.
Therefore $\beta$ = P(type II error) = P(accepting $H_0$ | $H_0$ is false).
8) Acceptance and rejection (or critical) region:
An acceptance region is a set of values of the test statistic that leads the null hypothesis to be accepted.
A rejection (critical) region is a set of possible values of the test statistic that leads the null hypothesis to be rejected.
Survey of important Tests of significance:
The important tests of significance in statistics can be classified broadly as follows:
1) Normal test (z-test)
2) t- test
3) $\chi^2$ (chi-square) test
4) F-test
Normal test (Z- test):
The normal test is often regarded as a large-sample test. It is widely used in testing hypotheses regarding means, proportions and correlation coefficients. Normal tests are usually two-tailed, although a one-tailed test may be appropriate in some situations.
Assumptions (conditions) for the normal test:
1. The random sampling distribution of the statistic is approximately normal.
2. The values provided by the sample data are sufficiently close to the population values.
3. The sample size must be large (n > 30).
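Test statistic for z-test:

1. For a single mean test:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
2. For comparing two means:
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
where $\bar{x}$ = sample mean, $\mu$ = population mean, $\sigma$ = standard deviation, $\sigma^2$ = variance and $n$ = sample size.
3. For comparing two proportions:
$$z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where $p_1 = \dfrac{x_1}{n_1}$; $p_2 = \dfrac{x_2}{n_2}$; $p = \dfrac{x_1 + x_2}{n_1 + n_2}$ and $q = 1 - p$.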
Problem associated with normal test (z-test)
Problem 1: The mean lifetime of a sample of 100 light tubes produced by a company is found to be 1,580 hours with a standard deviation of 90 hours. Test the hypothesis that the mean lifetime of the tubes produced by the company is 1,600 hours.
Solution: The null hypothesis is that there is no significant difference between the sample mean and the hypothetical population mean, i.e. $H_0: \mu = 1600$.
Here the sample mean is $\bar{x} = 1580$, with $\sigma = 90$ and $n = 100$. We know, for the single mean test,
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{1580 - 1600}{90 / \sqrt{100}} = \frac{-20}{9} = -2.22$$
The critical value of z is ±1.96 for a two-tailed test at the 5% level of significance. Since the computed value z = -2.22 falls in the rejection region, we reject the null hypothesis. Hence the mean lifetime of the tubes produced by the company may not be 1,600 hours.
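
The single-mean z-test above can be reproduced in a few lines. This is a minimal sketch in Python; the function name `z_single_mean` is chosen here for illustration, and the two-tailed 5% critical value of 1.96 is the one quoted in the solution.

```python
import math

def z_single_mean(sample_mean, population_mean, sd, n):
    """z statistic for testing a single mean: (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - population_mean) / (sd / math.sqrt(n))

z_tubes = z_single_mean(sample_mean=1580, population_mean=1600, sd=90, n=100)
critical_value = 1.96  # two-tailed test, 5% level of significance
reject_h0 = abs(z_tubes) > critical_value
print(round(z_tubes, 2), reject_h0)
```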
Problem 3: In a random sample of 100 people taken from village A, 60 are found to be consuming tea. In another sample of 200 persons taken from village B, 100 persons are found to be consuming tea. Do the data reveal a significant difference between the two villages so far as the habit of taking tea is concerned?
Solution: Let us take the hypothesis that there is no significant difference between the two villages so far as the habit of taking tea is concerned, i.e. $H_0: p_1 = p_2$.
The appropriate statistic to be used here is given by
$$z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Here, $p_1 = \dfrac{x_1}{n_1} = \dfrac{60}{100} = 0.6$; $n_1 = 100$
$p_2 = \dfrac{x_2}{n_2} = \dfrac{100}{200} = 0.5$; $n_2 = 200$
and $p = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{60 + 100}{100 + 200} = \dfrac{160}{300} = 0.53$, so $q = 1 - p = 0.47$
So,
$$z = \frac{0.6 - 0.5}{\sqrt{0.53 \times 0.47 \left(\dfrac{1}{100} + \dfrac{1}{200}\right)}} = \frac{0.1}{\sqrt{0.2491 \times 0.015}} = \frac{0.1}{0.0611} = 1.64$$

Since the computed value of z is less than the critical value z = 1.96 at the 5% level of significance, we can accept the hypothesis. Hence we conclude that there is no significant difference in the habit of taking tea in the two villages A and B.
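
The two-proportion test used for the tea problem, and for the practice problems later in this chapter, can be sketched as follows. This is an illustrative Python example; the function name `z_two_proportions` is an assumption of this sketch. It works from the pooled proportion p = (x1 + x2)/(n1 + n2) without intermediate rounding, so results can differ in the second decimal from hand calculations that round p first.

```python
import math

def z_two_proportions(x1, n1, x2, n2):
    """z statistic for the difference of two proportions using the pooled p."""
    p1 = x1 / n1
    p2 = x2 / n2
    p = (x1 + x2) / (n1 + n2)   # pooled proportion
    q = 1 - p
    standard_error = math.sqrt(p * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# Village A: 60 tea drinkers out of 100; village B: 100 out of 200
z_tea = z_two_proportions(60, 100, 100, 200)
significant = abs(z_tea) > 1.96  # two-tailed test, 5% level
print(round(z_tea, 2), significant)
```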
t-test: When the sample is small and $\sigma$ is unknown, the t-test is applied instead of the z-test. It was designed by W. S. Gosset, whose pen name was "Student". Hence this statistic is generally known as Student's t. It is also called the small sample test.
The t-test is used for testing hypotheses about the population mean, the difference between two means and the correlation coefficient. Like the normal tests, the t-tests are two-tailed tests in most applications.
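Test statistic for t-test:
1. For testing a hypothesis about the difference between two means with dependent samples:
$$t = \frac{\bar{d}\sqrt{n}}{S}, \qquad \text{here } S = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} = \sqrt{\frac{\sum d^2 - n\bar{d}^2}{n - 1}}$$
2. For testing a hypothesis about the population mean:
$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{S}$$
3. For the difference between two means with equal variance:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}, \qquad \text{here } S = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$$
$$S_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1}; \qquad S_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$$
Problem associated with t-test: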
Problem 1: Ten persons were appointed to an officer cadre in an office. Their performance was noted by giving a test, and the marks were recorded out of 100. They were then given 4 months of training, after which a second test was held and the marks were again recorded out of 100.

| Employee | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| Before training | 80 | 76 | 92 | 60 | 70 | 56 | 74 | 56 | 70 | 56 |
| After training | 84 | 70 | 96 | 80 | 70 | 52 | 84 | 72 | 72 | 50 |

By applying the t-test, can it be concluded that the employees have benefited from the training?

Solution: Let us take the null hypothesis that the employees have not benefited from the training.

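| Employee | Before ($d_1$) | After ($d_2$) | $d = d_1 - d_2$ | $d^2$ |
|---|---|---|---|---|
| A | 80 | 84 | -4 | 16 |
| B | 76 | 70 | 6 | 36 |
| C | 92 | 96 | -4 | 16 |
| D | 60 | 80 | -20 | 400 |
| E | 70 | 70 | 0 | 0 |
| F | 56 | 52 | 4 | 16 |
| G | 74 | 84 | -10 | 100 |
| H | 56 | 72 | -16 | 256 |
| I | 70 | 72 | -2 | 4 |
| J | 56 | 50 | 6 | 36 |
| N = 10 | | | $\sum d = -40$ | $\sum d^2 = 880$ |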

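
We know, $t = \dfrac{\bar{d}\sqrt{n}}{S}$
Here, $\bar{d} = \dfrac{\sum d}{n} = \dfrac{-40}{10} = -4$
$$S = \sqrt{\frac{\sum d^2 - n\bar{d}^2}{n - 1}} = \sqrt{\frac{880 - 10(-4)^2}{10 - 1}} = \sqrt{\frac{720}{9}} = \sqrt{80} = 8.94$$

So, $t = \dfrac{-4 \times \sqrt{10}}{8.94} = -1.41$
$v = 10 - 1 = 9$; for $v = 9$, $t_{0.05} = 2.262$
The absolute calculated value (1.41) is less than the tabulated value, so the null hypothesis holds true. Hence it can be concluded that the employees have not benefited from the training.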
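
The paired t statistic can be recomputed directly from the before and after marks. The following is a minimal Python sketch (the function name `paired_t` is hypothetical). With $\sum d^2 = 880$ and $\bar{d} = -4$, the standard deviation of the differences works out to $\sqrt{80} \approx 8.94$, which is the value consistent with t = -1.41.

```python
import math

before = [80, 76, 92, 60, 70, 56, 74, 56, 70, 56]
after = [84, 70, 96, 80, 70, 52, 84, 72, 72, 50]

def paired_t(first, second):
    """Return (d_bar, S, t, degrees of freedom) for paired samples."""
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    d_bar = sum(d) / n
    # S uses the n - 1 divisor, as in the formula for dependent samples
    s = math.sqrt(sum((v - d_bar) ** 2 for v in d) / (n - 1))
    t = d_bar * math.sqrt(n) / s
    return d_bar, s, t, n - 1

d_bar, s, t_stat, df = paired_t(before, after)
print(d_bar, round(s, 2), round(t_stat, 2), df)
```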
Problem for practice
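34. In a random sample of 1000 persons from Dhaka city, 400 are found to be consumers of wheat. In a sample of 800 from Khulna city, 400 are found to be consumers of wheat. Do the data reveal a significant difference between Dhaka city and Khulna city so far as the proportion of wheat consumers is concerned? Also develop 95% and 99% confidence intervals.
Solution:
Let us take the null hypothesis that there is no significant difference between the two cities.
Here,
$n_1 = 1000$, $x_1 = 400$
$n_2 = 800$, $x_2 = 400$
$p_1 = \dfrac{x_1}{n_1} = \dfrac{400}{1000} = 0.4$
$p_2 = \dfrac{x_2}{n_2} = \dfrac{400}{800} = 0.5$
$p = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{400 + 400}{1000 + 800} = \dfrac{800}{1800} = 0.444$, so $q = 1 - p = 0.556$
We know
$$z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.4 - 0.5}{\sqrt{0.444 \times 0.556 \left(\dfrac{1}{1000} + \dfrac{1}{800}\right)}} = \frac{-0.1}{0.0236} = -4.24$$

At the 5% level of significance the table value is z = 1.96 and the calculated value is |z| = 4.24, so we reject the null hypothesis. Hence there is a significant difference between Dhaka and Khulna city as regards the consumption of wheat.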

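35. In a simple random sample of 600 men taken from Barisal city, 400 are found to be smokers. In another simple random sample of 900 men taken from Faridpur city, 450 are smokers. Do the data indicate that there is a significant difference in the habit of smoking in the two cities?

Solution:
Let us take the null hypothesis that there is no significant difference between the two cities.
Here,
$n_1 = 600$, $x_1 = 400$
$n_2 = 900$, $x_2 = 450$
$p_1 = \dfrac{x_1}{n_1} = \dfrac{400}{600} = 0.667$
$p_2 = \dfrac{x_2}{n_2} = \dfrac{450}{900} = 0.5$
$p = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{400 + 450}{600 + 900} = \dfrac{850}{1500} = 0.567$, so $q = 1 - p = 0.433$
We know
$$z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.667 - 0.5}{\sqrt{0.567 \times 0.433 \left(\dfrac{1}{600} + \dfrac{1}{900}\right)}} = \frac{0.1667}{0.02612} = 6.38$$

At the 5% level of significance the table value is z = 1.96 and the calculated value is 6.38, so we reject the null hypothesis. Hence there is a significant difference between the two cities so far as the habit of smoking is concerned.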

37. A random sample of 400 housewives was selected to learn their individual views as to whether they prefer brand A detergent or brand B. Brand A was favored by 180 housewives, while brand B was favored by the rest. Do these data provide significant evidence to indicate a difference in preference for the two brands of detergent?

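40. In a random sample of 880 males in old Dhaka, 440 were found to be smokers, while in another random sample of 1000 males in new Dhaka, 480 were found to be smokers. Discuss the question of whether the data reveal a significant difference between old Dhaka and new Dhaka so far as the proportion of smokers is concerned.
Solution:
Let us take the null hypothesis that there is no significant difference between the two parts of the city.
Here,
$n_1 = 880$, $x_1 = 440$
$n_2 = 1000$, $x_2 = 480$
$p_1 = \dfrac{x_1}{n_1} = \dfrac{440}{880} = 0.5$
$p_2 = \dfrac{x_2}{n_2} = \dfrac{480}{1000} = 0.48$
$p = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{440 + 480}{880 + 1000} = \dfrac{920}{1880} = 0.49$, so $q = 1 - p = 0.51$
We know
$$z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.5 - 0.48}{\sqrt{0.49 \times 0.51 \left(\dfrac{1}{880} + \dfrac{1}{1000}\right)}} = \frac{0.02}{0.0231} = 0.866$$

At the 5% level of significance the table value is z = 1.96 and the calculated value is 0.866, so we accept the null hypothesis. There is no significant difference between old Dhaka and new Dhaka so far as the proportion of smokers is concerned.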

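42. In a sociological investigation, two groups of children drawn at random from socio-economic classes called A and B were classified as bright and non-bright. Out of 100 children in the first group, 30 are found to be bright, and out of 300 children in the second group, 45 are bright. Do the children belonging to the two classes differ significantly in respect of their intelligence?
Solution:
Let us take the null hypothesis that there is no significant difference between the two groups of children A and B.
Here,
$n_1 = 100$, $x_1 = 30$
$n_2 = 300$, $x_2 = 45$
$p_1 = \dfrac{x_1}{n_1} = \dfrac{30}{100} = 0.3$
$p_2 = \dfrac{x_2}{n_2} = \dfrac{45}{300} = 0.15$
$p = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{30 + 45}{100 + 300} = \dfrac{75}{400} = 0.1875$, so $q = 1 - p = 0.8125$
We know
$$z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.3 - 0.15}{\sqrt{0.1875 \times 0.8125 \left(\dfrac{1}{100} + \dfrac{1}{300}\right)}} = \frac{0.15}{0.0451} = 3.33$$

At the 5% level of significance the table value is z = 1.96 and the calculated value is 3.33, so we reject the null hypothesis. Hence the children belonging to the two classes differ significantly in respect of their intelligence.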

Chapter Name: Regression Analysis
Definition: (if the question asks "what do you know", write the whole definition with the example)
The statistical tool with the help of which we are in a position to estimate the unknown values of one variable from the known values of another variable is called regression, and this type of analysis is called regression analysis.
For example, if we know that advertising and sales are correlated, we may find the expected amount of sales for a given advertising expenditure, or the required amount of expenditure for achieving a fixed sales target.
(If the question asks "Why do we use regression analysis", write only the part below)
We use regression analysis to estimate the unknown values of one variable from the known values of another variable.
Regression coefficients: The quantity b in the regression equation is called the regression coefficient or slope coefficient. Since there are two regression equations, there are two regression coefficients.
Regression coefficient of y on x = $b_{yx}$
Regression coefficient of x on y = $b_{xy}$

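$b_{yx}$ means the amount of change in y corresponding to a unit change in x.
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \qquad [\text{where } x \text{ is independent}]$$
We have,
$y = a + bx$ --------- (i)
$\bar{y} = a + b\bar{x}$ --------- (ii)
(i) - (ii):
$y - \bar{y} = b(x - \bar{x})$

Regression equation
1) The regression equation of x on y is
$x = a + by$, or $x - \bar{x} = b_{xy}(y - \bar{y})$

2) The regression equation of y on x is
$y = a + bx$, or $y - \bar{y} = b_{yx}(x - \bar{x})$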

#Difference between correlation analysis and regression analysis:

| Correlation | Regression |
|---|---|
| 1) Correlation means the relationship between two or more variables. | 1) Regression is a mathematical measure expressing the average relationship between the two variables. |
| 2) Correlation need not imply a cause-and-effect relationship between the variables under study. | 2) Regression analysis clearly indicates the cause-and-effect relationship between the variables. |
| 3) Correlation analysis is confined only to the study of linear relationships between the variables and therefore has limited applications. | 3) It has much wider applications, as it studies linear as well as non-linear relationships between the variables. |
| 4) There may be nonsense correlation between two variables. | 4) There is no such thing as nonsense regression. |
| 5) $r_{xy} = r_{yx}$ | 5) $b_{xy} \neq b_{yx}$ |
| 6) $-1 \le r \le 1$ | 6) $b_{xy}$ or $b_{yx}$ may individually exceed 1 in magnitude, although $b_{xy} \times b_{yx} = r^2 \le 1$ |

In the following table, the recorded data show the test scores made by salesmen on an intelligence test and their weekly sales:

| Salesman | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Test score | 40 | 70 | 50 | 60 | 80 | 50 | 90 | 40 | 60 | 60 |
| Sales ('000 Tk) | 2.5 | 6.0 | 4.0 | 5.0 | 4.0 | 2.5 | 5.5 | 3.0 | 4.5 | 3.0 |

Calculate the regression equation of sales on test scores and estimate the probable weekly sales volume if a salesman makes a score of 100.
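Solution: Let sales be denoted by y and test scores by x. We have to fit a regression equation of y on x, i.e. $y - \bar{y} = b_{yx}(x - \bar{x})$.

| Salesman | Test score (x) | $(x - \bar{x})$ | $(x - \bar{x})^2$ | Sales (y) | $(y - \bar{y})$ | $(x - \bar{x})(y - \bar{y})$ |
|---|---|---|---|---|---|---|
| 1 | 40 | -20 | 400 | 2.5 | -1.5 | 30 |
| 2 | 70 | 10 | 100 | 6.0 | 2.0 | 20 |
| 3 | 50 | -10 | 100 | 4.0 | 0 | 0 |
| 4 | 60 | 0 | 0 | 5.0 | 1.0 | 0 |
| 5 | 80 | 20 | 400 | 4.0 | 0 | 0 |
| 6 | 50 | -10 | 100 | 2.5 | -1.5 | 15 |
| 7 | 90 | 30 | 900 | 5.5 | 1.5 | 45 |
| 8 | 40 | -20 | 400 | 3.0 | -1.0 | 20 |
| 9 | 60 | 0 | 0 | 4.5 | 0.5 | 0 |
| 10 | 60 | 0 | 0 | 3.0 | -1.0 | 0 |
| N = 10 | $\sum x = 600$ | | $\sum (x - \bar{x})^2 = 2400$ | $\sum y = 40$ | | $\sum (x - \bar{x})(y - \bar{y}) = 130$ |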
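$\bar{x} = \dfrac{\sum x}{N} = \dfrac{600}{10} = 60$

$\bar{y} = \dfrac{\sum y}{N} = \dfrac{40}{10} = 4$

$b = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \dfrac{130}{2400} = 0.054$

$y - 4 = 0.054(x - 60)$
$y - 4 = 0.054x - 3.24$
$y = 4 - 3.24 + 0.054x$
$y = 0.76 + 0.054x$
When x = 100, y would be
$y = 0.76 + 0.054 \times 100 = 6.16$, i.e. estimated weekly sales of about Tk 6,160.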
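
The fitted line can be checked with a short least-squares sketch in Python (an illustration only; `fit_line` is a name chosen here). It keeps b = 130/2400 unrounded, so the intercept comes out as 0.75 and the prediction at x = 100 as about 6.17, slightly different from the hand calculation above, which rounds b to 0.054 first.

```python
def fit_line(x, y):
    """Least-squares line of y on x: returns (a, b) for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))
    s_xx = sum((u - x_bar) ** 2 for u in x)
    b = s_xy / s_xx          # regression coefficient of y on x
    a = y_bar - b * x_bar    # intercept
    return a, b

test_scores = [40, 70, 50, 60, 80, 50, 90, 40, 60, 60]
sales = [2.5, 6.0, 4.0, 5.0, 4.0, 2.5, 5.5, 3.0, 4.5, 3.0]  # in '000 Tk

a, b = fit_line(test_scores, sales)
predicted_sales = a + b * 100  # estimated sales for a test score of 100
print(round(a, 2), round(b, 3), round(predicted_sales, 2))
```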

Chapter Name: Sample Survey
Population: The total set of observations of a numerical characteristic under study is called the population.
Example: Heights of BBA students of University of Development Alternative.
Q. What is Sample and Sampling?
Sample and Sampling: A small part of the observations selected from the population is called a sample, and the process of such selection is called sampling.
Census: The term used for the complete enumeration of a population or group at a point in time with respect to well-defined characteristics.
Sample Survey: An investigation in which elaborate information is collected on a sample basis is known as a sample survey.
Advantages of sample survey:
1. By a sample survey we can obtain maximum accuracy or reliability with a fixed budget.
2. We can reject the units under investigation which show considerable variation for the characteristic under study.
3. When a total count of the population is not possible, or is very costly or destructive, the survey method is appropriate.
4. The sample survey is preferable when the scope of the investigation is very wide and the population is not completely known.
5. When time, money and other resources are limited, the sample survey is appropriate.
Limitations of sample survey:
1. Even when a proper choice of design is employed, a sample does not fully cover the parent population, and consequently the results are not exact.
2. Sampling theory and its applications in the field need the services of trained and qualified personnel, without whom the results of a sample survey are not dependable.
3. The planning and execution of sample survey should be done very carefully or the data may provide
Methods of sampling: When a sample is required to be selected from a population, it is necessary to decide which method should be applied. The various sampling methods are given below under two separate headings.
A. Random sampling methods:
1) Simple Random Sampling (SRS)
2) Stratified Random Sampling
3) Systematic sampling
4) Cluster sampling
B. Non Random sampling methods:
1. Judgment sampling
2. Quota sampling
3. Convenience sampling.
Simple random sampling: If a sample is drawn in such a way that each and every unit of the population has an equal and independent chance of being included in the sample, the technique is called simple random sampling.
Selection of a simple random sample:
A random sample can be obtained by either of the following methods:
1) Lottery system
2) "Mechanical randomization" or "random number" method

Lottery system
The simplest method of selecting a random sample is the lottery system, which is illustrated below by means of an example.
Suppose we want to select r candidates out of n. We assign the numbers 1 to n, one number to each candidate, and write these numbers (1 to n) on n slips, which are made as homogeneous as possible in shape, size, color, etc. These slips are then put in a bag and thoroughly shuffled, and then r slips are drawn one by one. The r candidates corresponding to the numbers on the slips drawn will constitute a random sample.
Mechanical Randomization or Random Number method:
The lottery method described above is quite time consuming and cumbersome to use if the population is sufficiently large. The most practical and inexpensive method of selecting a random sample consists in the use of "Random Number Tables", which have been so constructed that each of the digits 0, 1, 2, ..., 9 appears with approximately the same frequency and independently of each other.
Example with Random Number table:
07018 31172 12572 23968 55216 85366 50223 09300
52444 65625 97918 46794 62370 69344 20449 17596
72161 57299 8752! 44351 9v9Xi 65008 98371 60620
17918 75071 91057 46829 47992 26797 64423 42379
13623 76165 43195 50205 75736 77473 67268 31330
94564 51669 66662 91676 07337
18172 47429 27036 75127 55901

If we have to select a sample from a population of size N (≤ 99), then the digits can be combined two by two to give pairs from 00 to 99. Similarly, if N ≤ 999 or N ≤ 9999 and so on, then by combining the digits three by three (or four by four, and so on) we get numbers from 000 to 999 (or 0000 to 9999), and so on. Since each of the digits 0, 1, 2, ..., 9 occurs with approximately the same frequency and independently of each other, so does each of the pairs 00 to 99, the triplets 000 to 999, the quadruplets 0000 to 9999, and so on.
The method of drawing the random sample consists of the following steps:
1. Identify the N units in the population with numbers from 1 to N.
2. Select at random any page of the random number table and pick up the numbers in any row, column or diagonal at random.
3. The population units corresponding to the numbers selected in step (2) constitute the random sample.
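
The steps above can be sketched in Python. This is only an illustration: the population size of 50 and the sample size of 5 are hypothetical, and the rule of skipping repeated numbers and numbers outside 1 to N is an assumption of this sketch. The digits are taken from the first row of the example table and read two at a time.

```python
def sample_from_digits(digits, population_size, sample_size):
    """Read fixed-width numbers from a digit string and keep valid unit numbers.

    Numbers outside 1..population_size and repeated numbers are skipped.
    """
    width = len(str(population_size))  # 2 digits for N up to 99, 3 up to 999, ...
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

# First row of the example random number table
row = "07018 31172 12572 23968 55216 85366 50223 09300"
digits = row.replace(" ", "")

# Hypothetical population of N = 50 units, sample of 5 units
sample = sample_from_digits(digits, population_size=50, sample_size=5)
print(sample)  # [7, 1, 11, 12, 22]
```

Reading the row in pairs gives 07, 01, 83, 11, 72, 12, 57, 22, ...; the pairs 83, 72 and 57 exceed 50 and are skipped.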
2. Stratified Random Sampling: Stratified random sampling is one of the restricted random methods which, by using available information concerning the data, attempts to design a more efficient sample than that obtained by the simple random procedure. The process of stratification requires that the population be divided into homogeneous groups or classes called strata. A sample is then taken from each group by the simple random method, and the resulting sample is called a stratified sample.
Advantages of stratified random sampling:
1) More representative
2) Greater accuracy
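
A stratified sample can be sketched in Python as a simple random sample drawn separately within each stratum. Everything specific here is hypothetical: the strata names, their sizes, the 10% sampling fraction and the use of proportional allocation are assumptions made for illustration, not taken from the notes.

```python
import random

def stratified_sample(strata, fraction, seed=0):
    """Draw a simple random sample of the given fraction from every stratum."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    sample = {}
    for name, units in strata.items():
        size = max(1, round(len(units) * fraction))  # proportional allocation
        sample[name] = rng.sample(units, size)       # sampling without replacement
    return sample

# Hypothetical population of 100 students split into three strata
strata = {
    "first_year": list(range(1, 41)),    # 40 units
    "second_year": list(range(41, 71)),  # 30 units
    "third_year": list(range(71, 101)),  # 30 units
}

picked = stratified_sample(strata, fraction=0.1)
sizes = {name: len(units) for name, units in picked.items()}
print(sizes)
```

Because every stratum contributes units in proportion to its size, no group can be left out of the sample, which is the sense in which the method is more representative.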
Difference between cluster and stratified random sampling:

| Cluster sampling | Stratified sampling |
|---|---|
| 1. In cluster sampling, elements within each cluster are heterogeneous and clusters are homogeneous with one another. | 1. In stratified sampling, elements within each stratum are homogeneous and strata are heterogeneous with one another. |
| 2. A cluster is generally made up on the basis of complete enumeration. | 2. A stratum is generally made up on the basis of characteristics of the population. |
| 3. In cluster sampling, the clusters are subjected to complete enumeration. | 3. In stratified random sampling, the strata are subjected to sampling. |
| 4. It gives less efficient results than stratified sampling. | 4. It gives more efficient results than cluster sampling. |
| 5. It generally costs less than stratified sampling. | 5. It generally costs more than cluster sampling. |

Error: The term "error" refers to the difference between the value of a statistic and that of the corresponding parameter. Various forces combine to produce deviations of statistics from parameters, and errors, in accordance with their different causes, are classified into sampling and non-sampling errors.
Sampling Error: The error which arises because only a sample is used to estimate the population parameters is termed sampling error. It arises for the following reasons:
1) Faulty selection of the sample
2) Substitution or faulty work during the collection of information
3) Constant error due to improper choice of the statistic for estimating the population parameter, or a faulty method of analysis
Non-Sampling Error: In many situations the results obtained from a sample or a complete census suffer from certain errors not ascribable to sampling fluctuations. These errors are collectively known as non-sampling error. It may arise at all stages of survey planning in both complete enumeration surveys and sample surveys.

Abdus Sattar
M. Pharm, Batch 12, UODA