Sep 08, 2014

Business Statistics

(Final)

Chapter Name: Correlation

1) Define Correlation:

The statistical tool with the help of which the relationship between two or more variables is studied is called correlation; in short, the relationship between two variables is called correlation.

For example, there exists some relationship between family income and expenditure on luxury items, and between an increase in rainfall (up to a point) and the production of rice.

2) Types of Correlation:

Generally, correlation can be classified in three ways, as follows:

i) Positive, negative and zero correlation

ii) Simple, partial and multiple correlation

iii) Linear and non-linear correlation

3) Positive, negative and zero correlation:

If both variables vary in the same direction, i.e. if one variable increases (decreases) and the other on average also increases (decreases), the correlation is said to be positive.

If the variables vary in opposite directions, i.e. if one variable increases (decreases) and the other on average decreases (increases), the correlation is said to be negative.

If a change in one variable does not affect the other, i.e. if one variable increases or decreases while the other remains unchanged, it is called zero correlation.

4) Importance or uses of correlation:

The major uses of correlation are given below:

i) It provides the relationship between two sets of data.

ii) The theory of correlation is very important from business and economic points of view.

iii) Its study is helpful for the purpose of prediction and forecasting.

iv) We can measure social instability and investigate its causes by calculating the correlation between the ages of wife and husband, unemployment and crime tendency, unemployment and drug addiction, etc.

v) We can comment on the economic condition of a country by calculating the correlation between demand and price, production and price, etc.

5) Methods of studying correlation:

There are several methods of assessing whether two variables are correlated or not. Some important methods are as follows:

i) Scatter diagram

ii) Karl Pearson's coefficient of correlation

iii) Spearman's rank correlation

iv) Method of least squares

6) Karl Pearson's co-efficient of correlation:

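As a measure of the intensity or degree of linear relationship between two variables, Karl Pearson developed a formula called the coefficient of correlation, denoted by $r_{xy}$:

$$r_{xy} = \frac{\operatorname{Cov}(x, y)}{\sqrt{V(x)\,V(y)}} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$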
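The computational form of the formula can be sketched in Python as below. This is a minimal illustration, not part of the original notes: the function name `pearson_r` is hypothetical, and it assumes the two lists are paired observations of equal length. It is checked against the Question 1 data that appear later in this chapter.

```python
from math import sqrt


def pearson_r(x, y):
    """Karl Pearson's r using the computational formula
    (n*Σxy - Σx*Σy) / sqrt([n*Σx² - (Σx)²][n*Σy² - (Σy)²])."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Question 1 data: index of production (x) and unemployed labourers (y)
x = [100, 102, 104, 108, 112]
y = [15, 12, 11, 8, 14]
print(round(pearson_r(x, y), 4))  # -0.2274
```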

7) Properties of co-efficient of correlation:

The following are the important properties of the coefficient of correlation, r:

i) It lies between −1 and +1; symbolically, $-1 \le r \le +1$.

ii) It is independent of change of origin and scale.

iii) It is the geometric mean of the two regression coefficients; symbolically, $r = \pm\sqrt{b_{yx}\, b_{xy}}$, where r takes the common sign of the two coefficients.

iv) If x and y are independent variables, then the coefficient of correlation is zero.
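Properties (ii) and (iii) can be checked numerically with a short sketch. The helper names (`cross_dev`, `r`, `b_yx`, `b_xy`) are illustrative assumptions, the Question 1 data are reused, and the change of scale uses a positive factor (a negative factor would flip the sign of r).

```python
from math import sqrt


def mean(v):
    return sum(v) / len(v)


def cross_dev(u, v):
    """Sum of products of deviations from the means: Σ(u - ū)(v - v̄)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def r(x, y):
    return cross_dev(x, y) / sqrt(cross_dev(x, x) * cross_dev(y, y))


def b_yx(x, y):
    """Regression coefficient of y on x."""
    return cross_dev(x, y) / cross_dev(x, x)


def b_xy(x, y):
    """Regression coefficient of x on y."""
    return cross_dev(x, y) / cross_dev(y, y)


x = [100, 102, 104, 108, 112]
y = [15, 12, 11, 8, 14]

# (ii) change of origin and (positive) scale: u = (x - 100) / 2, v = y - 10
u = [(a - 100) / 2 for a in x]
v = [b - 10 for b in y]
print(abs(r(x, y) - r(u, v)) < 1e-12)  # True

# (iii) r is the geometric mean of b_yx and b_xy; both are negative here
gm = -sqrt(b_yx(x, y) * b_xy(x, y))
print(abs(r(x, y) - gm) < 1e-12)  # True
```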

Comment on the following:

1) r = +1: perfect positive correlation

2) r = −1: perfect negative correlation

3) r = +0.8: high degree of positive correlation

4) r = −0.8: high degree of negative correlation

5) 0.3 ≤ r ≤ 0.7: moderate degree of positive correlation

6) −0.7 ≤ r ≤ −0.3: moderate degree of negative correlation

7) 0 < r < 0.3: low degree of positive correlation

8) −0.3 < r < 0: low degree of negative correlation

9) r = 0: zero correlation
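The verbal scale above can be encoded as a small lookup. This is a sketch under one assumption: values of |r| between 0.7 and 1, which the list illustrates only with r = ±0.8, count as a high degree of correlation. The name `describe_r` is hypothetical.

```python
def describe_r(r):
    """Verbal label for a correlation coefficient, following the bands above
    (assumes 0.7 < |r| < 1 is 'high')."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "zero correlation"
    sign = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"perfect {sign} correlation"
    if size > 0.7:
        return f"high degree of {sign} correlation"
    if size >= 0.3:
        return f"moderate degree of {sign} correlation"
    return f"low degree of {sign} correlation"


print(describe_r(-0.2274))  # low degree of negative correlation
print(describe_r(-0.728))   # high degree of negative correlation
```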

Question No: 1

The following table gives the index of cosmetics production and the number of unemployed labourers (in hundreds) for the years 1990 to 1994.

Year 1990 1991 1992 1993 1994

Index of production 100 102 104 108 112

No. of unemployed labourers 15 12 11 08 14

Requirements:

I) Calculate the correlation coefficient.

II) Find the standard error of r.

III) Find the probable error of r.

Solution:

I) Let,

x = Index Production

y = No of unemployed labors

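We know that,

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$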

Table for calculation

x     y     xy      x²      y²
100   15    1500    10000   225
102   12    1224    10404   144
104   11    1144    10816   121
108   08    864     11664   64
112   14    1568    12544   196
Σx = 526   Σy = 60   Σxy = 6300   Σx² = 55428   Σy² = 750   (n = 5)


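Correlation coefficient

$$r = \frac{5 \times 6300 - 526 \times 60}{\sqrt{\left[5 \times 55428 - (526)^2\right]\left[5 \times 750 - (60)^2\right]}} = \frac{31500 - 31560}{\sqrt{(277140 - 276676)(3750 - 3600)}} = \frac{-60}{\sqrt{464 \times 150}} = \frac{-60}{263.82} = -0.2274$$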

This is a low degree of negative correlation. Therefore, if the index of production increases, the number of unemployed labourers tends to decrease, and vice versa.

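(II) Standard error of r

$$S.E.(r) = \frac{1 - r^2}{\sqrt{n}} = \frac{1 - (-0.2274)^2}{\sqrt{5}} = \frac{0.9483}{2.2361} = 0.424 \text{ (Ans)}$$

(III) Probable error of r

$$P.E.(r) = 0.6745 \times S.E.(r) = 0.6745 \times 0.424 = 0.285988 \approx 0.286 \text{ (Ans)}$$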


Question No: 2

Calculate the correlation coefficient between age & playing habits of the following students.

Age 16 17 18 19 20 21

No of Students 250 200 120 150 80 100

Regular player 200 150 48 90 32 50

Requirements:

I) Calculate the correlation co-efficient r

II) Find the standard error of r

III) Find the probable error of r

Solution:

Let,

x = Age

y = Percentage of regular player (Playing habits)

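We know that,

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$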

Table for calculation (here y = regular players ÷ number of students × 100)

x (age)   No. of students   Regular players   Playing habits y (%)   xy     x²    y²
16        250               200               80                     1280   256   6400
17        200               150               75                     1275   289   5625
18        120               48                40                     720    324   1600
19        150               90                60                     1140   361   3600
20        80                32                40                     800    400   1600
21        100               50                50                     1050   441   2500
Σx = 111   Σy = 345   Σxy = 6265   Σx² = 2071   Σy² = 21325   (n = 6)

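Correlation coefficient

$$r = \frac{6 \times 6265 - 111 \times 345}{\sqrt{\left[6 \times 2071 - (111)^2\right]\left[6 \times 21325 - (345)^2\right]}} = \frac{37590 - 38295}{\sqrt{(12426 - 12321)(127950 - 119025)}} = \frac{-705}{\sqrt{105 \times 8925}} = \frac{-705}{968.05} = -0.728$$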

This indicates a high degree of negative correlation between the variables. Therefore, as age increases, playing habits tend to decrease, and vice versa.

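(II) Standard error of r

$$S.E.(r) = \frac{1 - r^2}{\sqrt{n}} = \frac{1 - (-0.728)^2}{\sqrt{6}} = \frac{0.4700}{2.4495} = 0.1919 \approx 0.192 \text{ (Ans)}$$

(III) Probable error of r

$$P.E.(r) = 0.6745 \times S.E.(r) = 0.6745 \times 0.1919 = 0.1294 \approx 0.129 \text{ (Ans)}$$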
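Both worked answers can be reproduced in a few lines of Python. This sketch uses the computational formula for r, S.E.(r) = (1 − r²)/√n and P.E.(r) = 0.6745 × S.E.(r) as given above; the function names are hypothetical, and because r is kept unrounded the printed values may differ from a hand calculation in the last digit.

```python
from math import sqrt


def pearson_r(x, y):
    """Computational formula for Karl Pearson's r."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def standard_error(r, n):
    """S.E.(r) = (1 - r²) / √n"""
    return (1 - r ** 2) / sqrt(n)


def probable_error(r, n):
    """P.E.(r) = 0.6745 × S.E.(r)"""
    return 0.6745 * standard_error(r, n)


# Question 1
x1 = [100, 102, 104, 108, 112]
y1 = [15, 12, 11, 8, 14]

# Question 2: y is the percentage of regular players in each age group
ages = [16, 17, 18, 19, 20, 21]
students = [250, 200, 120, 150, 80, 100]
players = [200, 150, 48, 90, 32, 50]
y2 = [100 * p / s for p, s in zip(players, students)]

for label, x, y in [("Q1", x1, y1), ("Q2", ages, y2)]:
    r = pearson_r(x, y)
    n = len(x)
    print(label, round(r, 4), round(standard_error(r, n), 3), round(probable_error(r, n), 3))
```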

Chapter Name: Test of Hypothesis

1) Hypothesis:

A hypothesis is an assumption to be tested.

Very often in practice we are called upon to make decisions about a population on the basis of sample information. Such decisions are called statistical decisions. For example, we may wish to decide on the basis of sample data whether a new medicine is really effective in curing a disease, or whether one training procedure is better than another.

2) Importance of hypothesis:

Hypothesis tests are widely used in business and industry for making decisions. For example, in order to increase consumer awareness of a product or service it might be necessary to compare the effectiveness of different types of advertising campaigns; or, in order to offer more profitable investments to its customers, an investment firm might wish to compare the profitability of different types of investment.


3) Statistical hypothesis:

In attempting to reach decisions, it is useful to make assumptions or guesses about the population involved. Such assumptions, which may or may not be true, are called statistical hypotheses.

4) Null hypothesis:

The statistical hypothesis which is picked up for testing is known as the null hypothesis. A null hypothesis states that there is no difference between a sample estimate and the true population value. A null hypothesis is generally denoted by H₀.

5) Alternative hypothesis:

A statistical hypothesis that disagrees with the null hypothesis, or that is simply the opposite of the null hypothesis, is called the alternative hypothesis. It is denoted by H₁.

6) Type-I and type-II errors:

We may reject the null hypothesis H₀ when it is true. This type of error is called a type-I error.

We may accept the null hypothesis H₀ when the alternative hypothesis H₁ is true. This type of error is called a type-II error.

7) Size of type-I error (level of significance) and type-II error:

The probability of committing a type-I error is called the size of the type-I error or the level of significance, and it is denoted by α. Therefore

$$\alpha = P(\text{type-I error}) = P(\text{rejecting } H_0 \mid H_0 \text{ is true})$$

The probability of committing a type-II error is called the size of the type-II error, and it is denoted by β; the complementary probability (1 − β) is called the power of the test. Therefore

$$\beta = P(\text{type-II error}) = P(\text{accepting } H_0 \mid H_0 \text{ is false})$$

8) Acceptance and rejection (or critical) region:

An acceptance region is a set of values of the test statistic that leads the null hypothesis to be accepted.

A rejection (critical) region is a set of possible values of the test statistic that leads the null hypothesis to

be rejected.
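For a two-tailed z-test, the acceptance and rejection regions can be expressed with the standard normal quantile function. This is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); `two_tailed_critical_z` and `decide` are illustrative names, and "accept H0" follows these notes' wording for not rejecting H₀.

```python
from statistics import NormalDist


def two_tailed_critical_z(alpha=0.05):
    """Critical value z_c with P(|Z| > z_c) = alpha under H0."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def decide(z, alpha=0.05):
    """Rejection (critical) region: |z| > z_c; otherwise z is in the acceptance region."""
    zc = two_tailed_critical_z(alpha)
    return "reject H0" if abs(z) > zc else "accept H0"


print(round(two_tailed_critical_z(0.05), 2))  # 1.96
print(round(two_tailed_critical_z(0.01), 3))  # 2.576
```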

Survey of important Tests of significance:

The important tests of significance in statistics can be classified broadly as follows:

1) Normal test (z-test)

2) t- test

3) χ² (chi-square) test

4) F-test

Normal test (Z- test):

The normal test is often regarded as a large-sample test. It is widely used in testing hypotheses regarding means, proportions and correlation coefficients. Normal tests are usually two-tailed, although a one-tailed test may be appropriate in some situations.

Assumption (condition) for normal test:

1. The random sampling distribution of the statistic is approximately normal.

2. The values provided by the sample data are sufficiently close to the population values.

3. The sample size must be large (n > 30).

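Test statistic for z-test:

1. Single mean test:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

2. Comparing two means:

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

where $\bar{x}$ = sample mean, $\mu$ = population mean, $\sigma$ = standard deviation and $\sigma^2$ = variance.

3. Comparing two proportions:

$$z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where $p_1 = \dfrac{x_1}{n_1}$; $p_2 = \dfrac{x_2}{n_2}$; $P = \dfrac{x_1 + x_2}{n_1 + n_2}$ and $Q = 1 - P$.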

Problems associated with the normal (z) test:

Problem 1: The mean lifetime of a sample of 100 light tubes produced by a company is found to be 1,580 hours, with a standard deviation of 90 hours. Test the hypothesis that the mean lifetime of the tubes produced by the company is 1,600 hours.

Solution: The null hypothesis is that there is no significant difference between the sample mean and the hypothetical population mean, i.e. H₀: μ = 1600 hours.

We know, for a single mean test,

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{1580 - 1600}{90 / \sqrt{100}} = \frac{-20}{9} = -2.22$$

Here, sample mean $\bar{x} = 1580$, $\mu = 1600$, $\sigma = 90$ and $n = 100$.

The critical value of z for a two-tailed test at the 5% level of significance is ±1.96. Since the computed value z = −2.22 falls in the rejection region, we reject the null hypothesis. Hence the mean lifetime of the tubes produced by the company may not be 1,600 hours.
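A quick check of Problem 1 in Python, assuming the single-mean formula above with the sample standard deviation (90 hours) used in place of σ, as the solution does; `z_single_mean` is a hypothetical helper.

```python
from math import sqrt


def z_single_mean(xbar, mu, sigma, n):
    """z = (x̄ - μ) / (σ / √n) for a large-sample test of one mean."""
    return (xbar - mu) / (sigma / sqrt(n))


z = z_single_mean(xbar=1580, mu=1600, sigma=90, n=100)
print(round(z, 2))                                     # -2.22
print("reject H0" if abs(z) > 1.96 else "accept H0")   # reject H0
```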

Problem 3: In a random sample of 100 people taken from village A, 60 are found to be consuming tea. In another sample of 200 persons taken from village B, 100 persons are found to be consuming tea. Do the data reveal a significant difference between the two villages so far as the habit of taking tea is concerned?

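Solution: Let us take the hypothesis that there is no significant difference between the two villages so far as the habit of taking tea is concerned, i.e. H₀: P₁ = P₂.

The appropriate statistic to be used here is

$$z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Here,

$$p_1 = \frac{x_1}{n_1} = \frac{60}{100} = 0.6; \quad n_1 = 100$$

$$p_2 = \frac{x_2}{n_2} = \frac{100}{200} = 0.5; \quad n_2 = 200$$

and

$$P = \frac{x_1 + x_2}{n_1 + n_2} = \frac{60 + 100}{100 + 200} = \frac{160}{300} = 0.53, \qquad Q = 1 - P = 0.47$$

So,

$$z = \frac{0.6 - 0.5}{\sqrt{0.53 \times 0.47\left(\dfrac{1}{100} + \dfrac{1}{200}\right)}} = \frac{0.1}{\sqrt{0.003737}} = \frac{0.1}{0.0611} = 1.64$$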


Since the computed value of z is less than the critical value z = 1.96 at the 5% level of significance, we accept the hypothesis. Hence we conclude that there is no significant difference in the habit of taking tea between the two villages A and B.
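The pooled two-proportion statistic used in Problem 3 can be sketched as follows; `z_two_proportions` is an illustrative name, and P is the pooled proportion defined in formula 3 of the z-test statistics.

```python
from math import sqrt


def z_two_proportions(x1, n1, x2, n2):
    """Pooled z statistic for H0: P1 = P2."""
    p1, p2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)   # pooled proportion
    Q = 1 - P
    se = sqrt(P * Q * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


z = z_two_proportions(60, 100, 100, 200)
print(round(z, 2))  # 1.64
```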

t-test: In the case of a small sample where σ is unknown, the t-test is applied instead of the z-test. It was devised by W. S. Gosset, whose pen name was "Student"; hence this statistic is generally known as Student's t. It is also called a small-sample test.

The t-test is used for testing hypotheses about the population mean, the difference between two means, and the correlation coefficient. Like the normal tests, t-tests are two-tailed in most applications.

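Test statistic for t-test:

1. For testing a hypothesis about the difference between two means with dependent (paired) samples:

$$t = \frac{\bar{d}}{S / \sqrt{n}}, \qquad S = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} = \sqrt{\frac{\sum d^2 - n\bar{d}^2}{n - 1}}$$

2. For testing a hypothesis about the population mean:

$$t = \frac{\bar{x} - \mu}{S / \sqrt{n}}$$

3. For testing the difference between two means with equal variances:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad S = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$$

$$S_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1}; \qquad S_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$$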

Problem associated with t-test:

Problem 1: Ten persons were appointed to an office cadre in an office. Their performance was noted by giving a test, and the marks were recorded out of 100. They were then given 4 months' training, after which another test was held and the marks were again recorded out of 100.

Employee A B C D E F G H I J

Before training 80 76 92 60 70 56 74 56 70 56

After training 84 70 96 80 70 52 84 72 72 50

By applying the t- test, can it be concluded that the employees have benefited by the training?

Solution: Let us take the null hypothesis that the employees have not benefited by the training.

Employee   Before (d1)   After (d2)   d = d1 − d2   d²
A          80            84           −4            16
B          76            70           6             36
C          92            96           −4            16
D          60            80           −20           400
E          70            70           0             0
F          56            52           4             16
G          74            84           −10           100
H          56            72           −16           256
I          70            72           −2            4
J          56            50           6             36
n = 10                                Σd = −40      Σd² = 880


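We know,

$$t = \frac{\bar{d}}{S / \sqrt{n}}$$

Here,

$$\bar{d} = \frac{\sum d}{n} = \frac{-40}{10} = -4$$

$$S = \sqrt{\frac{\sum d^2 - n\bar{d}^2}{n - 1}} = \sqrt{\frac{880 - 10 \times (-4)^2}{10 - 1}} = \sqrt{\frac{720}{9}} = \sqrt{80} = 8.94$$

So,

$$t = \frac{-4}{8.94 / \sqrt{10}} = \frac{-4}{2.83} = -1.41$$

Degrees of freedom ν = 10 − 1 = 9; for ν = 9, the two-tailed table value is $t_{0.05} = 2.262$.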

The absolute calculated value (1.41) is less than the tabulated value, so the null hypothesis is accepted. Hence it can be concluded that the employees have not benefited significantly from the training.
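The paired t computation above can be reproduced with the sketch below, which uses S = √[(Σd² − n·d̄²)/(n − 1)] as in formula 1 of the t-test statistics; the function name `paired_t` is hypothetical. If SciPy is available, `scipy.stats.ttest_rel(before, after)` performs the same paired test.

```python
from math import sqrt


def paired_t(before, after):
    """Paired t: d = before - after, t = d̄ / (S / √n),
    S = sqrt((Σd² - n·d̄²) / (n - 1)). Returns (t, degrees of freedom, S)."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    dbar = sum(d) / n
    s = sqrt((sum(v * v for v in d) - n * dbar ** 2) / (n - 1))
    return dbar / (s / sqrt(n)), n - 1, s


before = [80, 76, 92, 60, 70, 56, 74, 56, 70, 56]
after = [84, 70, 96, 80, 70, 52, 84, 72, 72, 50]
t, df, s = paired_t(before, after)
print(round(s, 2), round(t, 2), df)  # 8.94 -1.41 9
```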

Problems for practice

34. In a random sample of 1000 persons from Dhaka city, 400 are found to be consumers of wheat. In a sample of 800 from Khulna city, 400 are found to be consumers of wheat. Do the data reveal a significant difference between Dhaka city and Khulna city so far as the proportion of wheat consumers is concerned? Also develop 95% and 99% confidence intervals.

Solution:

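Let us take the null hypothesis that there is no significant difference between the two cities, i.e. H₀: P₁ = P₂.

We know

$$z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.4 - 0.5}{\sqrt{0.444 \times 0.556\left(\dfrac{1}{1000} + \dfrac{1}{800}\right)}} = \frac{-0.1}{0.0236} = -4.24$$

Here,

n₁ = 1000, x₁ = 400

n₂ = 800, x₂ = 400

$$p_1 = \frac{x_1}{n_1} = \frac{400}{1000} = 0.4$$

$$p_2 = \frac{x_2}{n_2} = \frac{400}{800} = 0.5$$

$$P = \frac{x_1 + x_2}{n_1 + n_2} = \frac{800}{1800} = 0.444, \qquad Q = 1 - P = 0.556$$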

At the 5% level of significance the table value is z = 1.96, while the absolute calculated value is 4.24. So we reject the null hypothesis. Hence there is a significant difference between Dhaka and Khulna cities as far as wheat consumption is concerned.
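The test statistic for Problem 34, together with the 95% and 99% confidence intervals the question asks for, can be sketched as below. The notes do not give a confidence-interval formula, so this sketch assumes the usual large-sample interval $(p_1 - p_2) \pm z_c\sqrt{p_1q_1/n_1 + p_2q_2/n_2}$ with the unpooled standard error; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_two_proportions(x1, n1, x2, n2):
    """Pooled z statistic for H0: P1 = P2."""
    p1, p2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / sqrt(P * (1 - P) * (1 / n1 + 1 / n2))


def ci_difference(x1, n1, x2, n2, level):
    """Large-sample confidence interval for P1 - P2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    zc = NormalDist().inv_cdf(0.5 + level / 2)  # 1.96 for 95%, 2.576 for 99%
    diff = p1 - p2
    return diff - zc * se, diff + zc * se


print(round(z_two_proportions(400, 1000, 400, 800), 2))  # -4.24
for level in (0.95, 0.99):
    lo, hi = ci_difference(400, 1000, 400, 800, level)
    print(level, round(lo, 3), round(hi, 3))
```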

35. In a simple random sample of 600 men taken from Barisal city, 400 are found to be smokers. In another simple random sample of 900 men taken from Faridpur city, 450 are smokers. Do the data indicate that there is a significant difference in the habit of smoking in the two cities?


Solution:

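Let us take the null hypothesis that there is no significant difference between the two cities, i.e. H₀: P₁ = P₂.

We know

$$z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.6667 - 0.5}{\sqrt{0.5667 \times 0.4333\left(\dfrac{1}{600} + \dfrac{1}{900}\right)}} = \frac{0.1667}{0.02612} = 6.38$$

Here,

n₁ = 600, x₁ = 400

n₂ = 900, x₂ = 450

$$p_1 = \frac{x_1}{n_1} = \frac{400}{600} = 0.6667$$

$$p_2 = \frac{x_2}{n_2} = \frac{450}{900} = 0.5$$

$$P = \frac{x_1 + x_2}{n_1 + n_2} = \frac{850}{1500} = 0.5667, \qquad Q = 1 - P = 0.4333$$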

At the 5% level of significance the table value is z = 1.96, while the calculated value is 6.38. So we reject the null hypothesis. Hence there is a significant difference in the habit of smoking between the two cities.

37. A random sample of 400 housewives was selected to know their individual views as to whether they prefer brand A detergent or brand B. Brand A was favoured by 180 housewives, while brand B was favoured by the rest. Do these data provide significant evidence to indicate a difference in preference for the two brands of detergent?

40. In a random sample of 880 males in old Dhaka, 440 were found to be smokers, while in another random sample of 1000 males in new Dhaka, 480 were found to be smokers. Discuss whether the data reveal a significant difference between old Dhaka and new Dhaka so far as the proportion of smokers is concerned.

Solution:

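Let us take the null hypothesis that there is no significant difference between old Dhaka and new Dhaka, i.e. H₀: P₁ = P₂.

We know

$$z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.5 - 0.48}{\sqrt{0.49 \times 0.51\left(\dfrac{1}{880} + \dfrac{1}{1000}\right)}} = \frac{0.02}{0.0231} = 0.866$$

Here,

n₁ = 880, x₁ = 440

n₂ = 1000, x₂ = 480

$$p_1 = \frac{x_1}{n_1} = \frac{440}{880} = 0.5$$

$$p_2 = \frac{x_2}{n_2} = \frac{480}{1000} = 0.48$$

$$P = \frac{x_1 + x_2}{n_1 + n_2} = \frac{920}{1880} = 0.49, \qquad Q = 1 - P = 0.51$$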

At the 5% level of significance the table value is z = 1.96, while the calculated value is 0.866. So we accept the null hypothesis: there is no significant difference between old Dhaka and new Dhaka so far as the proportion of smokers is concerned.


42. In a sociological investigation, two groups of children drawn at random from socio-economic classes called A and B were classified as bright and non-bright. Out of 100 children in the first group, 30 are found to be bright, and out of 300 children in the second group, 45 are bright. Do the children belonging to the two classes differ significantly in respect of their intelligence?

Solution:

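Let us take the null hypothesis that there is no significant difference between the two groups of children A and B, i.e. H₀: P₁ = P₂.

We know

$$z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.3 - 0.15}{\sqrt{0.1875 \times 0.8125\left(\dfrac{1}{100} + \dfrac{1}{300}\right)}} = \frac{0.15}{0.0451} = 3.33$$

Here,

n₁ = 100, x₁ = 30

n₂ = 300, x₂ = 45

$$p_1 = \frac{x_1}{n_1} = \frac{30}{100} = 0.3$$

$$p_2 = \frac{x_2}{n_2} = \frac{45}{300} = 0.15$$

$$P = \frac{x_1 + x_2}{n_1 + n_2} = \frac{75}{400} = 0.1875, \qquad Q = 1 - P = 0.8125$$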

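At the 5% level of significance the table value is z = 1.96, while the calculated value is 3.33. Since the calculated value exceeds the table value, we reject the null hypothesis. Hence the children belonging to the two classes differ significantly in respect of their intelligence.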
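The three two-proportion problems (35, 40 and 42) can be checked together with a short sketch, assuming the pooled z statistic used in their solutions and the 5% two-tailed critical value 1.96; `z_two_proportions` is an illustrative helper.

```python
from math import sqrt


def z_two_proportions(x1, n1, x2, n2):
    """Pooled z statistic for H0: P1 = P2."""
    p1, p2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / sqrt(P * (1 - P) * (1 / n1 + 1 / n2))


# (x1, n1, x2, n2) for each practice problem
problems = {
    "35 smokers, Barisal vs Faridpur": (400, 600, 450, 900),
    "40 smokers, old vs new Dhaka": (440, 880, 480, 1000),
    "42 bright children, A vs B": (30, 100, 45, 300),
}
for name, counts in problems.items():
    z = z_two_proportions(*counts)
    verdict = "reject H0" if abs(z) > 1.96 else "accept H0"
    print(f"{name}: z = {z:.2f} -> {verdict}")
```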

Chapter Name: Regression Analysis

Definition: (if asked "what do you know", give the definition with an example and the formula)

The statistical tool with the help of which we are in a position to estimate the unknown values of one variable from known values of another variable is called regression, and such analysis is called regression analysis.

For example, if we know that advertising and sales are correlated, we may find the expected amount of sales for a given advertising expenditure, or the required amount of expenditure for achieving a fixed sales target.

(If asked "Why do we use regression analysis?", write only the part below)

We use regression analysis to estimate the unknown values of one variable from known values of another

variable.

Regression coefficients: The quantity b in the regression equation is called the regression coefficient or slope coefficient. Since there are two regression equations, there are two regression coefficients.

Regression coefficient of y on x = $b_{yx}$

Regression coefficient of x on y = $b_{xy}$


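$b_{yx}$ means the amount of change in y corresponding to a unit change in x:

$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \quad [\text{where } x \text{ is independent}]$$

We have,

$$y = a + bx \quad \text{(i)}$$

$$\bar{y} = a + b\bar{x} \quad \text{(ii)}$$

Subtracting (ii) from (i):

$$y - \bar{y} = b(x - \bar{x})$$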

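Regression equations:

1) Regression equation of x on y:

$$x = a + b_{xy}\,y, \quad \text{i.e.} \quad x - \bar{x} = b_{xy}(y - \bar{y})$$

2) Regression equation of y on x:

$$y = a + b_{yx}\,x, \quad \text{i.e.} \quad y - \bar{y} = b_{yx}(x - \bar{x})$$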

Difference between correlation analysis and regression analysis:

1) Correlation: Correlation means the relationship between two or more variables.
Regression: Regression is a mathematical measure expressing the average relationship between the two variables.

2) Correlation: Correlation need not imply a cause-and-effect relationship between the variables under study.
Regression: Regression analysis treats one variable as the cause (independent variable) and the other as the effect (dependent variable).

3) Correlation: Correlation analysis is confined to the study of linear relationships between the variables and therefore has limited applications.
Regression: It has much wider applications, as it studies linear as well as non-linear relationships between the variables.

4) Correlation: There may be nonsense correlation between two variables.
Regression: There is no such thing as nonsense regression.

5) Correlation: $r_{xy} = r_{yx}$.
Regression: $b_{xy} \ne b_{yx}$ in general.

6) Correlation: $-1 \le r \le +1$.
Regression: $b_{xy}$ or $b_{yx}$ may exceed 1 numerically, but not both, since $b_{xy}\,b_{yx} = r^2 \le 1$.


In the following table, the recorded data show the test scores made by salesmen on an intelligence test and their weekly sales:

Salesman          1     2     3     4     5     6     7     8     9     10
Test score        40    70    50    60    80    50    90    40    60    60
Sales ('000 Tk)   2.5   6.0   4.0   5.0   4.0   2.5   5.5   3.0   4.5   3.0

Calculate the regression equation of sales on test scores, and estimate the probable weekly sales volume if a salesman makes a score of 100.

Solution: Let sales be denoted by y and test scores by x. We have to fit a regression equation of y on x, i.e.

$$y - \bar{y} = b_{yx}(x - \bar{x})$$

Salesman   Test score (x)   (x − x̄)   (x − x̄)²   Sales (y)   (y − ȳ)   (x − x̄)(y − ȳ)
1          40               −20        400         2.5         −1.5       30
2          70               10         100         6.0         2.0        20
3          50               −10        100         4.0         0          0
4          60               0          0           5.0         1.0        0
5          80               20         400         4.0         0          0
6          50               −10        100         2.5         −1.5       15
7          90               30         900         5.5         1.5        45
8          40               −20        400         3.0         −1.0       20
9          60               0          0           4.5         0.5        0
10         60               0          0           3.0         −1.0       0
n = 10     Σx = 600                    Σ(x − x̄)² = 2400   Σy = 40        Σ(x − x̄)(y − ȳ) = 130

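$$\bar{x} = \frac{\sum x}{n} = \frac{600}{10} = 60$$

$$\bar{y} = \frac{\sum y}{n} = \frac{40}{10} = 4$$

$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{130}{2400} = 0.054$$

$$y - 4 = 0.054(x - 60)$$

$$y - 4 = 0.054x - 3.24$$

$$y = 4 - 3.24 + 0.054x$$

$$y = 0.76 + 0.054x$$

When x = 100, y would be

$$y = 0.76 + 0.054 \times 100 = 6.16$$

i.e. the estimated weekly sales are about 6.16 thousand Tk.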
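The least-squares fit can be checked with the sketch below (the function name `fit_y_on_x` is hypothetical). Because it keeps $b_{yx} = 130/2400$ unrounded, it gives a ≈ 0.75 and an estimate of about 6.17 ('000 Tk) at x = 100, slightly different from 0.76 and 6.16 in the hand calculation, where b was rounded to 0.054.

```python
def fit_y_on_x(x, y):
    """Least-squares line y = a + b_yx * x, with
    b_yx = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b_yx * x̄."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


scores = [40, 70, 50, 60, 80, 50, 90, 40, 60, 60]
sales = [2.5, 6.0, 4.0, 5.0, 4.0, 2.5, 5.5, 3.0, 4.5, 3.0]  # '000 Tk
a, b = fit_y_on_x(scores, sales)
print(round(a, 2), round(b, 4))  # 0.75 0.0542
print(round(a + b * 100, 2))     # 6.17
```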


Chapter Name: Sample Survey

Population: The total set of observations of a numerical characteristic under study is called a population.

Example: Heights of BBA students of University of Development Alternative.

Q. What is Sample and Sampling?

Sample and Sampling: A small part of the population selected for observation is called a sample, and the process of such selection is called sampling.

Census: The term used for the complete enumeration of a population or group at a point in time with respect to well-defined characteristics.

Sample Survey: An investigation in which elaborate information is collected on a sample basis is known

as sample survey.

Advantages and disadvantages of sample survey

Advantages:

1. By a sample survey we can obtain maximum accuracy or reliability with a fixed budget.

2. We can reject the units under investigation which show considerable variation for the characteristic under study.

3. When a total count of the population is not possible, or is very costly or destructive, the survey method is appropriate.

4. The sample survey is preferable when the scope of the investigation is very wide and the population is not completely known.

5. When time, money and other resources are limited, the sample survey is appropriate.

Disadvantages of sample survey:

1. Even when a proper choice of design is employed, a sample does not fully cover the parent population, and consequently the results are not exact.

2. Sampling theory and its applications in the field need the services of trained and qualified personnel, without whom the results of a sample survey are not dependable.

3. The planning and execution of a sample survey should be done very carefully, or the data may be misleading.

Methods of sampling: When a sample is required to be selected from a population, it is necessary to decide which method should be applied. The various sampling methods, under two separate headings, are given below.

A. Random sampling methods:

1) Simple Random Sampling (SRS)

2) Stratified Random Sampling

3) Systematic sampling

4) Cluster sampling

B. Non Random sampling methods:

1. Judgment sampling

2. Quota sampling

3. Convenience sampling.

Simple Random Sampling: If a sample is drawn in such a way that each and every unit of the population has an equal and independent chance of being included in the sample, it is called simple random sampling.

Selection of a simple Random Sample:

A random sample can be obtained by either of the following methods:

1) By lottery system

2) "Mechanical Randomization" or "Random Number" method

Lottery system

The simplest method of selecting a random sample is the lottery system, which is illustrated below by means of an example:

Suppose we want to select r candidates out of n. We assign the numbers 1 to n, one number to each candidate, and write these numbers (1 to n) on n slips which are made as homogeneous as possible in shape, size, colour, etc. These slips are then put in a bag and thoroughly shuffled, and then r slips are drawn one by one. The r candidates corresponding to the numbers on the slips drawn will constitute a random sample.
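The lottery method can be mimicked in software by shuffling numbered "slips" and drawing the first r. This is an illustrative sketch using Python's standard `random` module; the function name `lottery_sample` and the optional `seed` parameter (for reproducibility) are assumptions, not part of the notes.

```python
import random


def lottery_sample(n, r, seed=None):
    """Draw r of the slips numbered 1..n without replacement,
    mimicking shuffling slips in a bag and drawing them one by one."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    rng = random.Random(seed)
    slips = list(range(1, n + 1))  # one numbered slip per candidate
    rng.shuffle(slips)             # 'thoroughly shuffled'
    return slips[:r]               # the first r slips drawn


print(lottery_sample(50, 5, seed=1))
```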

Mechanical Randomization or Random Number method:

The lottery method described above is quite time-consuming and cumbersome to use if the population is sufficiently large. The most practical and inexpensive method of selecting a random sample consists in the use of "Random Number Tables", which have been so constructed that each of the digits 0, 1, 2, ..., 9 appears with approximately the same frequency and independently of each other.

Example with Random Number table:

07018 31172 12572 23968 55216 85366 50223 09300

94564 51669 66662 91676 07337 18172 47429 27036 75127 55901

52444 65625 97918 46794 62370 69344 20449 17596

72161 57299 8752! 44351 9v9Xi 65008 98371 60620

17918 75071 91057 46829 47992 26797 64423 42379

13623 76165 43195 50205 75736 77473 67268 31330

If we have to select a sample from a population of size N ≤ 99, the digits can be combined two by two to give pairs from 00 to 99. Similarly, if N ≤ 999 or N ≤ 9999, and so on, then by combining the digits three (or four, and so on) at a time we get numbers from 000 to 999 (or 0000 to 9999), and so on. Since each of the digits 0, 1, 2, ..., 9 occurs with approximately the same frequency and independently of each other, so does each of the pairs 00 to 99, triplets 000 to 999, or quadruplets 0000 to 9999, and so on.

The method of drawing the random sample consists in the following steps:

1. Identify the N units in the population with numbers from 1 to N.

2. Select at random any page of the random number table and pick up the numbers in any row, column or diagonal at random.

3. The population units corresponding to the numbers selected in step (2) constitute the random

sample.
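The steps above can be sketched as a function that reads a string of random digits in fixed-width groups. This is a hypothetical helper: the group width k is taken as the number of digits in N, and numbers outside 1..N or already chosen are skipped, which is a common convention for sampling without replacement that the notes do not spell out. The example reads the first row of the random number table above with N = 60.

```python
def sample_from_digits(digits, N, r):
    """Read random digits in groups of k (k = number of digits in N), keep
    numbers 1..N not already chosen, and stop after r units (assumed rules)."""
    k = len(str(N))
    stream = "".join(ch for ch in digits if ch.isdigit())
    chosen = []
    for i in range(0, len(stream) - k + 1, k):
        num = int(stream[i:i + k])
        if 1 <= num <= N and num not in chosen:
            chosen.append(num)
            if len(chosen) == r:
                break
    return chosen


# First row of the random number table; population units numbered 1..60
row = "07018 31172 12572 23968 55216 85366 50223 09300"
print(sample_from_digits(row, N=60, r=5))  # [7, 1, 11, 12, 57]
```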

2. Stratified Random Sampling: Stratified random sampling is one of the restricted random methods which, by using available information concerning the data, attempts to design a more efficient sample than that obtained by the simple random procedure. The process of stratification requires that the population be divided into homogeneous groups or classes called strata. Then a sample is taken from each group by the simple random method, and the resulting sample is called a stratified sample.

Advantages of Stratified Random sampling:

1) More Representative

2) Greater Accuracy.
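A stratified sample can be sketched as a simple random sample drawn inside each stratum. The notes do not fix how many units to take per stratum; this sketch assumes proportional allocation ($n_h = n N_h / N$, rounded), so the stratum sizes may not add up exactly to n in general. The population and the function name `stratified_sample` are hypothetical.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Proportional allocation (assumed): stratum h gets round(n * N_h / N)
    units, drawn by simple random sampling within that stratum."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)
        sample[name] = rng.sample(units, n_h)
    return sample


# Hypothetical population: student IDs grouped by year (the strata)
strata = {
    "year 1": list(range(1, 101)),    # 100 students
    "year 2": list(range(101, 161)),  # 60 students
    "year 3": list(range(161, 201)),  # 40 students
}
s = stratified_sample(strata, n=20, seed=7)
print({k: len(v) for k, v in s.items()})  # {'year 1': 10, 'year 2': 6, 'year 3': 4}
```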

Difference between cluster and stratified random sampling:

1) Cluster sampling: Elements within each cluster are heterogeneous, while the clusters are homogeneous among themselves.
Stratified sampling: Elements within each stratum are homogeneous, while the strata are heterogeneous among themselves.

2) Cluster sampling: A cluster is generally made up on the basis of complete enumeration.
Stratified sampling: A stratum is generally made up on the basis of characteristics of the population.

3) Cluster sampling: The clusters are subjected to complete enumeration.
Stratified sampling: The strata are subjected to sampling.

4) Cluster sampling: It gives less efficient results than stratified sampling.
Stratified sampling: It gives more efficient results than cluster sampling.

5) Cluster sampling: It generally costs less than stratified sampling.
Stratified sampling: It generally costs more than cluster sampling.


Error: The term "error" refers to the difference between the value of a "statistic" and that of the corresponding "parameter". Various forces combine to produce deviations of statistics from parameters, and errors, in accordance with their different causes, are classified into sampling and non-sampling errors.

Sampling Error: The error which arises because only a sample, rather than the whole population, is used to estimate the population parameters is termed sampling error. It arises due to the following reasons:

1) Faulty selection of the sample

2) Substitution / faulty work during the collection of information

3) Constant error due to improper choice of statistic for estimating the population parameter / faulty method of analysis

Non-Sampling Error: In many situations the results obtained from a sample or a complete census suffer from certain errors not ascribable to sampling fluctuations. These errors are collectively known as non-sampling errors. They may arise at all stages of a survey, in both complete enumeration surveys and sample surveys.

Abdus Sattar

M. Pharm, Batch 12, UODA
