NMIMS Global Access

School for Continuing Education (NGASCE)


Business Statistics
Internal Assignment Applicable for June 2016 Examination
Assignment Marks: 30
---------------------------------------------------------------------------------------------------
1. Differentiate between correlation and regression. Explain with suitable example
using data. (10 Marks)
Ans:
Correlation and Regression
Correlation is a specialized procedure from a larger family of regression
procedures. Regression procedures examine the relationship between two or more
sets of paired variables. The basic linear regression formula is
y = a + βx
Where:
y is the set of dependent variable values
x is the set of independent variable values
a is the intercept, the value of y when x is 0
β is the change in y for each unit of x.
In words, this formula says that the predicted value of the dependent variable y is
the intercept value a plus the coefficient β times the value of x. It is possible to add
more βs for additional independent variables.
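As a minimal sketch of how a and β can be estimated, the following Python snippet fits the formula above by ordinary least squares. The function name fit_line and the five data points are made up for illustration; they are not part of the assignment data.

```python
def fit_line(xs, ys):
    """Return (a, beta) for the least-squares line y = a + beta * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # beta = co-variation of x and y divided by the variation of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    # the fitted line always passes through (mean_x, mean_y)
    a = mean_y - beta * mean_x
    return a, beta


# Hypothetical data generated from y = 1 + 2x, so the fit is exact.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, beta = fit_line(xs, ys)
print(f"a = {a}, beta = {beta}")  # a = 1.0, beta = 2.0
```

Here a is the predicted y at x = 0 and beta is the change in y per unit of x, matching the definitions given with the formula.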
Let's first compare the specifics of correlation with the most general linear
regression.
Correlation vs. Regression

Correlation: Correlation examines the relationship between two variables using
a standardized unit. However, most applications use raw units as an input.
Regression: Regression examines the relationship between one dependent variable
and one or more independent variables. Calculations may use either raw unit
values or standardized units as input.

Correlation: The calculation is symmetrical, meaning that the order of comparison
does NOT change the result.
Regression: The calculation is NOT symmetrical. So one variable is assigned the
dependent role (the values being predicted) and one or more the independent role
(the values hypothesized to impact the dependent variable).

Correlation: Correlation coefficients indicate the strength of a relationship.
Regression: Regression shows the effect of one unit change in an independent
variable on the dependent variable.

Correlation: Correlation removes the effect of different measurement scales.
Therefore, comparison between different models is possible since the rho
coefficient is in standardized units.
Regression: Linear regression using raw unit measurement scales can be used to
predict outcomes. For example, if a model shows that spending more money on
advertising will increase sales, then one can say that for every added $ in
advertising our sales will increase by β.
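The contrast between the two procedures can be checked numerically. The sketch below uses hypothetical advertising and sales figures (not data from this assignment) and two illustrative helper functions, pearson_r and slope, to show that correlation is symmetrical and unaffected by the measurement scale, while the regression slope depends on both.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def slope(xs, ys):
    """Least-squares slope when ys (dependent) is regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical figures, for illustration only.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]   # advertising, in dollars
sales = [2.0, 4.0, 5.0, 4.0, 5.0]      # sales, in units

# Correlation: swapping the variables gives the same value.
print(pearson_r(ad_spend, sales), pearson_r(sales, ad_spend))

# Regression: swapping the roles gives a different slope.
print(slope(ad_spend, sales), slope(sales, ad_spend))  # 0.6 and 1.0

# Changing the unit (dollars to cents) leaves r unchanged but rescales the slope.
ad_spend_cents = [100 * x for x in ad_spend]
print(pearson_r(ad_spend_cents, sales), slope(ad_spend_cents, sales))
```

Note that the slope of sales on advertising (0.6) is not simply the reciprocal of the slope of advertising on sales (1.0), which is why the dependent and independent roles have to be assigned before a regression is run.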

Examining Unemployment and Crime


The following example illustrates some of the uses of correlation and regression as
well as the importance of control for exogenous variables. The example uses real
data, but it is intended as an example, not a theoretically based explanation. So
let's start with a reasonable hypothesis.
State a research hypothesis: As unemployment rates rise the rate of crime will
increase.
Now the researcher collects data. In this case we can use data collected by
governmental agencies.
Illinois rates of crime by year

Year   Violent crime rate   Property crime rate   Unemployment rate
       (per 100,000)        (per 100,000)         (%)
1975   670                  5,033                 8.5
1976   626                  4,830                 7.7
1977   631                  4,697                 7.1
1978   677                  4,943                 6.1
1979   744                  5,287                 5.8
1980   808                  5,461                 7.1
1981   793                  5,323                 7.6
1982   774                  5,066                 9.7
1983   728                  4,813                 9.6
1984   725                  4,579                 7.5
1985   715                  4,597                 7.2
1986   809                  4,746                 7.0
1987   796                  4,620                 6.2
1988   810                  4,810                 5.5
1989   846                  4,793                 5.3
1990   967                  4,968                 5.6
1991   1,039                5,093                 6.8
1992   977                  4,788                 7.5
1993   960                  4,658                 6.9
1994   961                  4,665                 6.1
1995   996                  4,460                 5.6
1996   890                  4,430                 5.4
1997   861                  4,280                 4.9
1998   808                  4,051                 4.5
1999   690                  3,825                 4.2
2000   654                  3,585                 4.0
2001   637                  3,461                 4.7
2002   602                  3,420                 5.8
2003   556                  3,288                 6.0
2004   546                  3,174                 5.5
2005   552                  3,080                 5.1

Crime rates are from the FBI, Uniform Crime Reports as prepared by the National
Archive of Criminal Justice Data.
Unemployment data are from the U.S. Department of Labor, Bureau of Labor
Statistics.
Next the researcher would test the hypothesis. It appears that a simple
correlation between crime rates and unemployment is appropriate.
Using the Excel function correl(), the correlation between Violent Crimes per
100,000 and Unemployment is .058. This appears to be a very weak, possibly a
non-significant relationship. Consulting a table of Critical Values for rho (the name
given to this statistic), the researcher finds that this correlation is NOT
STATISTICALLY SIGNIFICANT. (It is not our intention to provide full
instructions on calculating this statistic, nor on how to test for significance. The
reader is directed to any general statistical textbook.)
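For readers who want to see roughly what such a significance check involves, here is a hedged sketch. It does not use a critical value table for rho as the text does; instead it uses the equivalent t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), and compares it with the two-tailed 5% critical value of Student's t. The sample size of 31 is an assumption taken from the 31 yearly rows (1975 to 2005) in the data table, and the function name is illustrative.

```python
from math import sqrt


def t_statistic(r, n):
    """t statistic for testing whether a correlation r from n pairs differs from 0."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Two-tailed 5% critical value of Student's t with 29 degrees of freedom.
T_CRITICAL_DF29 = 2.045

r_violent = 0.058   # correlation quoted in the text
n_years = 31        # assumed: one observation per year, 1975 to 2005

t_violent = t_statistic(r_violent, n_years)
is_significant = abs(t_violent) > T_CRITICAL_DF29

print(round(t_violent, 3), is_significant)
```

The t value comes out at roughly 0.31, far below the critical value, which agrees with the conclusion that this correlation is not statistically significant.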
Test an alternative hypothesis. In this case let's look at Property Crimes per
100,000 (PC) and unemployment (U). Once again a correlation is calculated.
In this case the correlation between PC and U is .577. This is much stronger.
Consulting that critical value table for rho, the researcher finds that the correlation is
significant.
Next let's look at these relationships. In the following graph the scatter
plot shows each of the 31 sets of data.

An Excel procedure has calculated and drawn a linear regression line and given us
the equation in the upper right corner of the graph. From it we can see that in
general as unemployment increases so does the rate of property crime. The
regression formula in the top right corner of the graph has a slope of 270.08. In
other words, for every one percentage point increase in unemployment, property
crime appears to increase by about 270 crimes per 100,000 people in the population.
When the correlation of .577 is squared, the researcher finds that about 33% of the
variance is shared between the two variables. Sometimes this shared variance is
called "explained variance." Thus, the researcher claims that 33% of the changes in
property crime rates can be attributed to changes in the unemployment rate. (A
part of this interpretation rests on logical assumptions, not statistical
rationale.)
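The 33% figure is simply the square of the correlation. A short check, using only the value quoted in the text (the variable names are illustrative):

```python
r = 0.577                    # correlation between PC and U, as quoted in the text

shared_variance = r ** 2     # often called "explained variance" or r-squared
unshared_variance = 1 - shared_variance

print(f"shared: {shared_variance:.1%}, not shared: {unshared_variance:.1%}")
# shared: 33.3%, not shared: 66.7%
```

About two thirds of the variance in property crime is therefore not shared with unemployment, even before any alternative explanation is considered.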
Our researcher (R1) publishes, and waits. One knows that once findings are made
public, some other researcher will come along to test them.
And so it comes, a second researcher (R2) says "Yes, but...."
R2 presents a different graph. In this graph the Unemployment and Property
Crime data are graphed as they chronologically occurred.

It is very clear that there has been a major trend down in property crime and
unemployment from 1975 to 2005. The three upward peaks in the unemployment
rate (pink line) are not reflected in the property crime rates. Both rates have
declined more or less together across the years. The correlation between the rates
is the result of this shared pattern over time.
So here then is an alternative explanation. The original hypothesis must now be
rejected. However, note that the correlations themselves are still accurate. The
issue is that other exogenous variables, here the passage of time, were not included.
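R2's point can be illustrated with a small simulation. The sketch below does NOT use the Illinois data: it generates two synthetic series that both decline over time but are otherwise unrelated, then compares their raw correlation with the correlation left after a straight-line time trend is removed from each. All numbers, names, and the random seed are assumptions chosen for illustration.

```python
import random
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def detrend(ts, ys):
    """Residuals of ys after removing a fitted straight-line trend in ts."""
    mt, my = mean(ts), mean(ys)
    b = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sum((t - mt) ** 2 for t in ts)
    return [(y - my) - b * (t - mt) for t, y in zip(ts, ys)]


random.seed(0)
years = list(range(1975, 2006))

# Synthetic series: each is a downward trend plus independent random noise.
series_a = [9.0 - 0.15 * (y - 1975) + random.gauss(0, 0.3) for y in years]
series_b = [5200.0 - 70.0 * (y - 1975) + random.gauss(0, 100.0) for y in years]

raw_r = pearson_r(series_a, series_b)
detrended_r = pearson_r(detrend(years, series_a), detrend(years, series_b))

print(f"raw correlation:       {raw_r:.3f}")
print(f"after removing trend:  {detrended_r:.3f}")
```

The raw correlation is high only because both series share a trend; once the trend is taken out, what remains is the correlation between the two independent noise terms. This is the same logic as adding Year to the regression model.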
R2 published the following summary from a SPSS regression procedure.

Regression coefficients (B and Std. Error are the unstandardized coefficients,
Beta is the standardized coefficient, and the bounds are the 95% Confidence
Interval for B)

                    B            Std. Error   Beta    t        Sig.   Lower Bound   Upper Bound
(Constant)          129615.530   19891.253            6.516    .000   88870.145     170360.916
Year                -62.869      9.857        -.860   -6.378   .000   -83.062       -42.677
Unemployment rate   -4.265       63.053       -.009   -.068    .947   -133.424      124.893

a Dependent Variable: Property crime rate


The following points summarize how to interpret these findings.
The constant (intercept, a) is a very large 129,615.5. But remember that the year
variable ended at 2005, so the intercept, which is the predicted value at year 0,
lies over 2000 years before the data.

Unstandardized β for the year indicates a decline of 62.869 crimes per 100,000
for each additional year. While this does not sound like a big change in raw units,
move to the right and notice that the standardized coefficient is -.86. This is a
very strong decline. While we have not talked about how to test significance, the
next two columns "t" and "Sig." indicate that this variable is statistically significant.
The unstandardized β for the unemployment rate is also very small, as is the
standardized β, and the test for significance indicates that unemployment IS NOT
STATISTICALLY significant.
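To make the interpretation concrete, the sketch below plugs the coefficients from R2's summary into the regression formula and recomputes the t values as coefficient divided by standard error. The coefficient values are read from the summary table; the dictionary layout and function name are illustrative, not SPSS output.

```python
# Values read from the B and Std. Error columns of the regression summary.
coefficients = {
    "constant": {"b": 129615.530, "se": 19891.253},
    "year": {"b": -62.869, "se": 9.857},
    "unemployment": {"b": -4.265, "se": 63.053},
}

# Each t value is the coefficient divided by its standard error.
t_values = {name: c["b"] / c["se"] for name, c in coefficients.items()}


def predict_property_crime(year, unemployment_rate):
    """Predicted property crime rate per 100,000 from the fitted model."""
    return (
        coefficients["constant"]["b"]
        + coefficients["year"]["b"] * year
        + coefficients["unemployment"]["b"] * unemployment_rate
    )


for name, t in t_values.items():
    print(f"{name:13s} t = {t:.3f}")

# Prediction for 2005 at the observed unemployment rate of 5.1.
print(round(predict_property_crime(2005, 5.1), 1))  # 3541.4

# With both inputs at zero the prediction is just the intercept (year 0).
print(predict_property_crime(0, 0.0))
```

The recomputed t values (about 6.516, -6.378 and -0.068) line up with the "t" column, and the last line shows why the intercept looks so large: it is the model's extrapolation back to year 0.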

Bottom line - statistical tests can only find "truth" within the specifications of the
model provided. In this example, the second graph showed a truth that the first
graph did not indicate. Statistics are strong tools, but they are not omnipotent
(with my apologies to philosophers who will point out the logical fallacy of this
statement). Taking a moment to step back and look at patterns, and alternatives is
an important part of research and model building.
2. You want to find a measure of central tendency for income of persons who have
boarded a particular train on a particular date at the originating station. What
measure will you use and why? What measure will you choose for dispersion and
why? (10 marks)

3 A) Distinguish between independence of events and mutual exclusivity of events
with the help of example. (5 marks)
Ans: If two events A and B are independent, then
Pr[A and B] = Pr[A] Pr[B];
that is, the probability that both A and B occur is equal to the probability that A
occurs times the probability that B occurs.
If A and B are mutually exclusive, then
Pr[A and B] = 0;
that is, the probability that both A and B occur is zero. Clearly, if A and B are
nontrivial events (Pr[A] and Pr[B] are nonzero), then they cannot be both
independent and mutually exclusive.
Let's say there are two events A and B.
Mutually exclusive implies: if event A occurs, event B can't occur, and vice versa.
Independent implies: events A and B don't influence each other; in other
words, event A occurring gives us no extra information about event B occurring.

Examples
In a single coin toss you can only have heads or tails; if you get a head you will
not get a tail. So for a single coin toss the occurrence of head and tail is mutually
exclusive.
Whereas if you toss two coins together, getting a head on the 1st coin
has no influence on the 2nd coin, so the occurrences of heads on the two different
coins are independent events.
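The two coin examples can be verified by listing the equally likely outcomes and counting. This is a minimal sketch assuming fair coins; the helper name probability is illustrative.

```python
from fractions import Fraction
from itertools import product


def probability(event, sample_space):
    """Share of equally likely outcomes for which event(outcome) is true."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))


# One toss: "head" and "tail" are mutually exclusive.
one_coin = ["H", "T"]
p_head = probability(lambda o: o == "H", one_coin)
p_tail = probability(lambda o: o == "T", one_coin)
p_head_and_tail = probability(lambda o: o == "H" and o == "T", one_coin)
print(p_head_and_tail)           # 0, so mutually exclusive
print(p_head * p_tail)           # 1/4, not 0, so NOT independent

# Two tosses: "head on 1st coin" and "head on 2nd coin" are independent.
two_coins = list(product("HT", repeat=2))   # HH, HT, TH, TT
p_first = probability(lambda o: o[0] == "H", two_coins)
p_second = probability(lambda o: o[1] == "H", two_coins)
p_both = probability(lambda o: o[0] == "H" and o[1] == "H", two_coins)
print(p_both, p_first * p_second)  # 1/4 1/4, so independent
```

The single-toss case also shows why nontrivial events cannot be both at once: Pr[A and B] is 0 while Pr[A] Pr[B] is 1/4.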

3 B) In how many ways can the letters of the word MADAM be rearranged? (5 marks)
Ans: This is a permutation question, and the word MADAM contains repeating
letters.
The trick is:
Step 1: Count the number of letters you have (Ans: 5).
Step 2: Put its factorial (denoted by "!") in the numerator.
Step 3: Put the factorial of the count of each repeating letter in the denominator.
Ans:
5! / (2! * 2!)
= (5*4*3*2*1) / {(2*1)*(2*1)}
= 120 / 4
= 30
Why 5!? MADAM is a 5-letter word, so if all the letters were different, the 5
places could be filled in 5! ways.
_ _ _ _ _ -> Consider that these 5 underscores represent the 5 places where
letters need to be filled.
1st place: 5 letters can be placed
2nd place: 4 letters can be placed (because 1 letter is already used in the
1st place)
3rd place: 3 letters
4th place: 2 letters
5th place: 1 letter
In total: 5*4*3*2*1 = 5!
But in the word MADAM, M and A each appear twice, and swapping two identical
letters does not produce a new arrangement.
So the duplicates must be removed by dividing by 2! for M and 2! for A.
Thus 5!/(2! * 2!) = (5*4*3*2*1)/4
= 30
The letters of this word can be arranged in 30 distinct ways (counting MADAM itself).
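The result can be double-checked by brute force. This sketch computes the count from the factorial formula and then by generating every ordering of the letters and discarding duplicates; the variable names are illustrative.

```python
from collections import Counter
from itertools import permutations
from math import factorial

word = "MADAM"

# Formula: n! divided by the factorial of each letter's count.
by_formula = factorial(len(word))
for count in Counter(word).values():   # M: 2, A: 2, D: 1
    by_formula //= factorial(count)

# Brute force: all 5! orderings, with identical strings collapsed by the set.
by_enumeration = len(set(permutations(word)))

print(by_formula, by_enumeration)  # 30 30
```

Both approaches agree, which confirms that dividing by 2! for M and 2! for A removes exactly the duplicate orderings.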