Business Statistics

Internal Assignment Applicable for June 2016 Examination

Assignment Marks: 30

---------------------------------------------------------------------------------------------------

1. Differentiate between correlation and regression. Explain with a suitable example using data. (10 Marks)

Ans:

Correlation and Regression

Correlation is a specialized procedure from a larger family of regression procedures. Regression procedures examine the relationship between two or more sets of paired variables. The basic linear regression formula is

y = a + βx

Where:

y is the set of dependent variable values

a is the intercept, the value of y when x is 0

β is the change in y for each unit of x

x is the set of independent variable values

In words, this formula says that the predicted value of the dependent variable y is the intercept value a plus the coefficient β times the value of x. It is possible to add more βs for additional independent variables.
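As a minimal sketch of the formula above, the following Python function computes a predicted y from an intercept and one or more coefficients (written β here). The function name `predict` and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
def predict(a, betas, xs):
    """Return a + sum(beta_i * x_i) for matching coefficients and x values."""
    if len(betas) != len(xs):
        raise ValueError("need one x value per coefficient")
    return a + sum(b * x for b, x in zip(betas, xs))


# One independent variable: y = a + beta * x
print(predict(2.0, [0.5], [10.0]))            # 7.0
# Two independent variables: y = a + beta1 * x1 + beta2 * x2
print(predict(2.0, [0.5, 3.0], [10.0, 1.0]))  # 10.0
```

Each additional independent variable simply contributes one more coefficient-times-value term to the sum.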

Let's first compare the specifics of correlation with the most general linear regression.

Correlation compared with regression:

Correlation examines the relationship between two variables using standardized units, although most applications use raw units as an input.
Regression examines the relationship between one dependent variable and one or more independent variables. Calculations may use either raw unit values or standardized units as input.

Correlation is symmetrical, meaning that the order of comparison does NOT change the result.
Regression is not symmetrical, so one variable is assigned the dependent role (the values being predicted) and one or more the independent role (the values hypothesized to impact the dependent variable).

Correlation indicates the strength of a relationship.
Regression shows the effect of a one-unit change in an independent variable on the dependent variable.

Correlation removes the effect of different measurement scales. Therefore, comparison between different models is possible since the rho coefficient is in standardized units.
Regression using raw unit measurement scales can be used to predict outcomes. For example, if a model shows that spending more money on advertising will increase sales, then one can say that for every added $ in advertising our sales will increase by β.
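The symmetry point can be illustrated with a small sketch. The advertising and sales figures below are hypothetical, invented only for this illustration, and the helper names `pearson_r` and `slope` are likewise assumptions rather than anything from the original text.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: advertising spend and sales, both in dollars.
advertising = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [120.0, 150.0, 210.0, 230.0, 290.0]

# Correlation: swapping the variables gives the same coefficient.
print(pearson_r(advertising, sales), pearson_r(sales, advertising))
# Regression: swapping the dependent and independent roles changes the slope.
print(slope(advertising, sales), slope(sales, advertising))
```

In this made-up data the slope for predicting sales from advertising is 4.2, so each added dollar of advertising goes with about $4.20 more in sales, while the correlation is the same whichever variable is listed first.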

The following example illustrates some of the uses of correlation and regression as well as the importance of controlling for exogenous variables. The example uses real data, but it is intended as an example, not a theoretically based explanation. So let's start with a reasonable hypothesis.

State a research hypothesis: As unemployment rates rise, the rate of crime will increase.

Now the researcher collects data. In this case we can use data collected by governmental agencies.

Illinois rates of crime by year
(crime rates per 100,000 population; unemployment rate in percent)

Year    Violent crime rate    Property crime rate    Unemployment rate
1975           670                  5,033                   8.5
1976           626                  4,830                   7.7
1977           631                  4,697                   7.1
1978           677                  4,943                   6.1
1979           744                  5,287                   5.8
1980           808                  5,461                   7.1
1981           793                  5,323                   7.6
1982           774                  5,066                   9.7
1983           728                  4,813                   9.6
1984           725                  4,579                   7.5
1985           715                  4,597                   7.2
1986           809                  4,746                   7.0
1987           796                  4,620                   6.2
1988           810                  4,810                   5.5
1989           846                  4,793                   5.3
1990           967                  4,968                   5.6
1991         1,039                  5,093                   6.8
1992           977                  4,788                   7.5
1993           960                  4,658                   6.9
1994           961                  4,665                   6.1
1995           996                  4,460                   5.6
1996           890                  4,430                   5.4
1997           861                  4,280                   4.9
1998           808                  4,051                   4.5
1999           690                  3,825                   4.2
2000           654                  3,585                   4.0
2001           637                  3,461                   4.7
2002           602                  3,420                   5.8
2003           556                  3,288                   6.0
2004           546                  3,174                   5.5
2005           552                  3,080                   5.1

Crime rates are from the FBI, Uniform Crime Reports as prepared by the National

Archive of Criminal Justice Data

Statistics,

Next the researcher would test the hypothesis. It appears that a simple correlation between crime rates and unemployment is appropriate.

Using the Excel function correl(), the correlation between Violent Crimes per 100,000 and Unemployment is .058. This appears to be a very weak, possibly non-significant relationship. Consulting a table of Critical Values for rho (the name given to this statistic), the researcher finds that this correlation is NOT STATISTICALLY SIGNIFICANT. (It is not our intention to provide full instructions on calculating this statistic, nor on how to test for significance. The reader is directed to any general statistical textbook.)
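For readers without Excel, the same coefficient can be sketched in plain Python. The lists below are transcribed from the Illinois table above; the helper name `pearson_r` is an assumption of this sketch, and it only computes the coefficient, not the significance test. The result should come out close to the .058 quoted above.

```python
from math import sqrt

# Violent crime rate per 100,000 and unemployment rate (%), Illinois,
# 1975-2005, transcribed from the table above.
violent = [670, 626, 631, 677, 744, 808, 793, 774, 728, 725, 715, 809, 796,
           810, 846, 967, 1039, 977, 960, 961, 996, 890, 861, 808, 690, 654,
           637, 602, 556, 546, 552]
unemployment = [8.5, 7.7, 7.1, 6.1, 5.8, 7.1, 7.6, 9.7, 9.6, 7.5, 7.2, 7.0,
                6.2, 5.5, 5.3, 5.6, 6.8, 7.5, 6.9, 6.1, 5.6, 5.4, 4.9, 4.5,
                4.2, 4.0, 4.7, 5.8, 6.0, 5.5, 5.1]


def pearson_r(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's correl() returns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(violent, unemployment)
print(f"n = {len(violent)}, r = {r:.3f}")
```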

Test an alternative hypothesis. In this case let's look at Property Crimes per 100,000 (PC) and unemployment (U). Once again a correlation is calculated.

In this case the correlation between PC and U is .577. This is much stronger. Consulting the critical value table for rho, the researcher finds that the correlation is significant.

Next let's look at these relationships. In the following graph, the scatter plot shows each of the 31 sets of data.

An Excel procedure has calculated and drawn a linear regression line and given us the equation in the upper right corner of the graph. From it we can see that, in general, as unemployment increases so does the rate of property crime. The regression formula in the top right corner of the graph has a slope of 270.08; that is, the best estimate of property crime rises by 270.08 for each one-point rise in the unemployment rate. In other words, for every 1 percentage point increase in unemployment, property crime appears to increase by 270 crimes per 100,000 people in the population.

When the correlation of .577 is squared, the researcher finds that about 33% of the variance is shared between the two variables. Sometimes this shared variance is called "explained variance." Thus, the researcher claims that 33% of the changes in property crime rates can be attributed to changes in the unemployment rate. (A part of this interpretation rests on some logical assumptions, not statistical rationale.)
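The slope, the correlation and the shared variance discussed above can be reproduced with a short sketch in plain Python. The data are transcribed from the Illinois table; the variable and function names are assumptions of this sketch, which is an ordinary least-squares fit and not necessarily the exact procedure Excel uses internally.

```python
from math import sqrt

# Property crime rate per 100,000 (PC) and unemployment rate in % (U),
# Illinois, 1975-2005, transcribed from the table above.
property_crime = [5033, 4830, 4697, 4943, 5287, 5461, 5323, 5066, 4813, 4579,
                  4597, 4746, 4620, 4810, 4793, 4968, 5093, 4788, 4658, 4665,
                  4460, 4430, 4280, 4051, 3825, 3585, 3461, 3420, 3288, 3174,
                  3080]
unemployment = [8.5, 7.7, 7.1, 6.1, 5.8, 7.1, 7.6, 9.7, 9.6, 7.5, 7.2, 7.0,
                6.2, 5.5, 5.3, 5.6, 6.8, 7.5, 6.9, 6.1, 5.6, 5.4, 4.9, 4.5,
                4.2, 4.0, 4.7, 5.8, 6.0, 5.5, 5.1]


def simple_regression(xs, ys):
    """Return (intercept, slope, r) for the least-squares line ys = a + b * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx            # change in y per unit of x
    a = my - b * mx          # value of y when x is 0
    r = sxy / sqrt(sxx * syy)
    return a, b, r


intercept, slope, r = simple_regression(unemployment, property_crime)
print(f"slope = {slope:.2f}")             # compare with the 270.08 quoted above
print(f"r = {r:.3f}, r squared = {r * r:.3f}")  # compare with .577 and about 33%
```

Squaring r gives the proportion of variance the fitted line accounts for, which is why the text moves directly from .577 to roughly 33%.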

Our researcher (R1) publishes, and waits. One knows that once a finding is put out in public, some other researcher will come along to test one's findings.

And so it comes: a second researcher (R2) says "Yes, but...."

R2 presents a different graph. In this graph the Unemployment and Property Crime data are graphed in chronological order.

It is very clear that there has been a major downward trend in both property crime and unemployment from 1975 to 2005. The three upward peaks in the unemployment rate (pink line) are not reflected in the property crime rates. Both rates have declined more or less together across the years. The correlation between the rates is the result of this shared pattern over time.

So here then is an alternative explanation. The original hypothesis must now be rejected. Note, however, that the correlations themselves are still accurate; the issue is that other exogenous variables (here, the passage of time) were not included in the model.

R2 published the following summary from an SPSS regression procedure.

Regression coefficients

                     Unstandardized            Standardized                      95% Confidence Interval for B
                     B            Std. Error   Beta           t        Sig.     Lower Bound    Upper Bound
(Constant)           129615.530   19891.253                   6.516    .000     88870.145      170360.916
Year                 -62.869      9.857        -.860          -6.378   .000     -83.062        -42.677
Unemployment rate    -4.265       63.053       -.009          -.068    .947     -133.424       124.893

The following summarizes how to interpret these findings.

The constant (intercept, a) is a very large 129,615.5. But remember that the year values run from 1975 to 2005, so the intercept is the predicted value at year 0, far outside the data.

The unstandardized β for the year indicates a decline of 62.869 crimes per 100,000 for each additional year. While this does not sound like a big change in raw units, move to the right and notice that the standardized β is -.86. This is a very strong decline. While we have not talked about how to test significance, the next two columns, "t" and "Sig.", indicate that this variable is statistically significant.

The unstandardized β for the unemployment rate is also very small, as is the standardized β, and the test for significance indicates that unemployment IS NOT STATISTICALLY significant.
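The unstandardized coefficients in R2's model can be approximated with a plain-Python sketch that regresses property crime on year and unemployment rate together. This is an ordinary least-squares fit solved by hand for two predictors, under the assumption that R2 used the same Illinois data as the table above; it is not SPSS's own routine, and it does not compute the standard errors, t values or significance levels.

```python
# Property crime rate per 100,000 and unemployment rate (%), Illinois,
# 1975-2005, transcribed from the table above.
years = list(range(1975, 2006))
property_crime = [5033, 4830, 4697, 4943, 5287, 5461, 5323, 5066, 4813, 4579,
                  4597, 4746, 4620, 4810, 4793, 4968, 5093, 4788, 4658, 4665,
                  4460, 4430, 4280, 4051, 3825, 3585, 3461, 3420, 3288, 3174,
                  3080]
unemployment = [8.5, 7.7, 7.1, 6.1, 5.8, 7.1, 7.6, 9.7, 9.6, 7.5, 7.2, 7.0,
                6.2, 5.5, 5.3, 5.6, 6.8, 7.5, 6.9, 6.1, 5.6, 5.4, 4.9, 4.5,
                4.2, 4.0, 4.7, 5.8, 6.0, 5.5, 5.1]


def mean(values):
    return sum(values) / len(values)


def centered(values):
    """Subtract the mean so the intercept can be recovered afterwards."""
    m = mean(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


yr, un, pc = centered(years), centered(unemployment), centered(property_crime)

# Normal equations for two centered predictors form a 2x2 linear system.
s11, s12, s22 = dot(yr, yr), dot(yr, un), dot(un, un)
s1y, s2y = dot(yr, pc), dot(un, pc)
det = s11 * s22 - s12 * s12

b_year = (s1y * s22 - s12 * s2y) / det    # change in PC per year, U held fixed
b_unemp = (s11 * s2y - s12 * s1y) / det   # change in PC per point of U, year held fixed
intercept = (mean(property_crime) - b_year * mean(years)
             - b_unemp * mean(unemployment))

print(f"constant = {intercept:.1f}")
print(f"year = {b_year:.3f}, unemployment = {b_unemp:.3f}")
```

Once year is in the model, the unemployment coefficient shrinks from about 270 in the one-variable fit to a value near zero, which is the pattern the SPSS summary reports.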

Bottom line - statistical tests can only find "truth" within the specifications of the model provided. In this example, the second graph showed a truth that the first graph did not indicate. Statistics are strong tools, but they are not omnipotent (with my apologies to philosophers who will point out the logical fallacy of this statement). Taking a moment to step back and look at patterns and alternatives is an important part of research and model building.

2. You want to find a measure of central tendency for income of persons who have boarded a particular train on a particular date at the originating station. What measure will you use and why? What measure will you choose for dispersion and why? (10 marks)

with the help of example. (5 marks)

Ans: If two events A and B are independent, then

Pr[A and B] = Pr[A] Pr[B]

That is, the probability that both A and B occur is equal to the probability that A occurs times the probability that B occurs.

If A and B are mutually exclusive, then

Pr[A and B] = 0

That is, the probability that both A and B occur is zero. Clearly, if A and B are nontrivial events (Pr[A] and Pr[B] are nonzero), then they cannot be both independent and mutually exclusive, because the product Pr[A] Pr[B] would be nonzero while Pr[A and B] would have to be zero.

Let's say there are two events, A and B.

Mutually exclusive means that if event A occurs, event B cannot occur, and vice versa.

Independent means that events A and B do not influence each other; in other words, knowing that event A occurred gives us no extra information about whether event B occurs.

Examples

In a single coin toss you can get only heads or tails; if you get a head, you cannot also get a tail. So, for a single coin toss, the occurrence of a head and the occurrence of a tail are mutually exclusive.

Whereas if you toss two coins together, getting a head on the first coin has no influence on the second coin, so a head on the first coin and a head on the second coin are independent events.
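The two coin examples can be checked by listing the equally likely outcomes, as in the following sketch. It assumes fair coins, and the helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob(event, outcomes):
    """Probability of an event over a list of equally likely outcomes."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


# One coin toss: a head and a tail are mutually exclusive.
one_toss = ["H", "T"]
p_head = prob(lambda o: o == "H", one_toss)
p_tail = prob(lambda o: o == "T", one_toss)
p_head_and_tail = prob(lambda o: o == "H" and o == "T", one_toss)
print(p_head, p_tail, p_head_and_tail)  # 1/2 1/2 0

# Two coins tossed together: a head on each coin are independent events.
two_tosses = list(product("HT", repeat=2))
p_first = prob(lambda o: o[0] == "H", two_tosses)
p_second = prob(lambda o: o[1] == "H", two_tosses)
p_both = prob(lambda o: o[0] == "H" and o[1] == "H", two_tosses)
print(p_both, p_first * p_second)       # 1/4 1/4
```

For the single toss, Pr[A and B] is 0 while Pr[A] Pr[B] is 1/4, so the events are mutually exclusive but not independent; for the two coins the joint probability equals the product, so they are independent.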

3 B) In how many ways can the letters of the word MADAM be rearranged? (5 marks)

Ans: This is a permutation question in which the word MADAM contains repeating letters.

The trick is:

Step 1: Count the number of letters you have (Ans: 5).

Step 2: Put the factorial of that count in the numerator (5!).

Step 3: Put the factorial of the count of each repeating letter in the denominator (M occurs twice and A occurs twice, so 2! * 2!).

Ans:

5! / (2! * 2!)

= (5*4*3*2*1) / {(2*1)*(2*1)}

= 30

MADAM is a 5-letter word.

So in 5 places we can arrange the 5 letters in 5! ways.

Why 5!?

_ _ _ _ _ -> Consider that these 5 underscores represent the 5 places where letters need to be filled.

1st place - any of 5 letters can be placed

2nd place - 4 letters can be placed (because 1 letter has already been used in the 1st place)

3rd place - 3 letters

4th place - 2 letters

5th place - 1 letter

In total: 5*4*3*2*1 = 5!

But in the word MADAM, M and A each appear twice. Swapping the two Ms (or the two As) does not produce a new arrangement, so these duplicates must be removed by dividing by 2! for M and 2! for A.

Thus 5!/(2! * 2!) = (5*4*3*2*1)/4

= 30

The word can be arranged in 30 distinct ways.
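The count can be cross-checked in Python, once with the factorial formula and once by brute-force enumeration. The function names are hypothetical, and the enumeration is practical only for short words like this one.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def arrangements_by_formula(word):
    """n! divided by the factorial of each letter's repeat count."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total


def arrangements_by_enumeration(word):
    """Count distinct orderings by generating all permutations and de-duplicating."""
    return len(set(permutations(word)))


print(arrangements_by_formula("MADAM"))      # 30
print(arrangements_by_enumeration("MADAM"))  # 30
```

The enumeration generates all 120 orderings of the five positions and the set collapses those that spell the same string, which is the same duplicate removal that dividing by 2! * 2! performs. The count of 30 includes the original spelling MADAM itself.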

