
# Thursday 30th April 2015

Course : S.I.

Group members
Sundus Karim
Iqra Gulzar
Komal
Marylyn
Fatima
Azeem
Huzaifa

Huzaifa

## Looking at the relationship between two interval-ratio variables

When we want to know how two variables are related to one another, the pattern of the data points on the scatterplot can illustrate various patterns and relationships, including:

- data correlation
- positive or direct relationships between variables
- negative or inverse relationships between variables
- non-linear patterns

Marylyn

What can we measure?

- Gradient: a measure of how the line slopes
- Intercept: where the line cuts the y axis
- Correlation: a measure of how well the line fits the data

Equation for a line:
y = a + bx

- a is the point at which the line crosses the y axis (when x = 0).
- b is a measure of the slope (the amount of change in y for each one-unit increase in x).

[Figure: plot of the example line y = 1.5 + 0.5x]
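As a quick check on what the intercept and slope mean, here is a minimal Python sketch of the example line y = 1.5 + 0.5x. The function name `line` is chosen for illustration only.

```python
def line(x, a=1.5, b=0.5):
    """Return y = a + b*x; the defaults match the example line y = 1.5 + 0.5x."""
    return a + b * x

# Intercept: the value of y when x = 0.
print(line(0))

# Slope: the change in y when x increases by one unit.
print(line(3) - line(2))
```

The first value printed is the intercept a and the second is the slope b.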

Linear relationship
The technique of line-fitting, known as regression, is used to measure how well a line fits a scatter of points.
The more closely the data points follow a straight line on the graph, the stronger the linear relationship between the variables and the higher the correlation.
The following scatterplot shows a strong linear relationship between the two variables.
We say that these two variables are highly correlated.

Fatima

## Positive and negative relationships

Positive or direct relationships

If the points cluster around a line that runs from the lower left to the upper right of the graph area, then the relationship between the two variables is positive or direct.
An increase in the value of x is more likely to be associated with an increase in the value of y.
The closer the points are to the line, the stronger the relationship.

Negative or inverse relationships

If the points tend to cluster around a line that runs from the upper left to the lower right of the graph, then the relationship between the two variables is negative or inverse.
An increase in the value of x is more likely to be associated with a decrease in the value of y.
Komal

Iqra

## Working out the correlation coefficient (Pearson's r)

Pearson's r tells us how much one variable changes as the values of another change: their covariation.

Variation is measured with the standard deviation. This measures the average variation of each variable from the mean for that variable.
Covariation is measured by calculating the amount by which each value of X varies from the mean of X, and the amount by which each value of Y varies from the mean of Y, multiplying the differences together, and finding the average (by dividing by n - 1):

$$\operatorname{cov}(x, y) = \frac{\sum (x - \bar{X})(y - \bar{Y})}{n - 1}$$

Dividing the covariance by the two standard deviations gives Pearson's r:

$$r = \frac{\sum (x - \bar{X})(y - \bar{Y})}{(n - 1)\, s_x s_y}$$
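The covariance and Pearson's r described here can be sketched in plain Python. This is a minimal illustration, not a library implementation: the function names and the data are hypothetical, and every step uses the divisor n - 1.

```python
from math import sqrt

def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)

def sample_sd(values):
    """Standard deviation with divisor n - 1."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def covariance(xs, ys):
    """Average product of the deviations from the two means (divisor n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson_r(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / (sample_sd(xs) * sample_sd(ys))

# Hypothetical data, chosen only to illustrate the calculation.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(covariance(xs, ys))
print(pearson_r(xs, ys))
```

For this made-up data the covariance is 2 and r comes out at about 0.8, a fairly strong positive relationship.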

This can also be calculated as the average of the products of the z-scores:

$$r = \frac{\sum z_x z_y}{n - 1}$$

Because r is standardized it will always fall between +1 and -1.

- A correlation of either 1 or -1 means perfect association between the two variables.
- A correlation of 0 means that there is no linear association.

Note: correlation does not mean causation. We can only investigate causation by reference to our theory. However (thinking about it the other way round) there is unlikely to be causation if there is no correlation.
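To see the limits of +1 and -1 in practice, the following sketch computes r from z-scores for two made-up data sets in which y moves exactly in step with x. The function names and data are hypothetical, and the sample SD (divisor n - 1) is assumed so that it matches the n - 1 used when averaging the products.

```python
from math import sqrt

def z_scores(values):
    """Standardize values using the sample SD (divisor n - 1)."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]

def pearson_r_from_z(xs, ys):
    """Sum of the products of paired z-scores, divided by n - 1."""
    products = [zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))]
    return sum(products) / (len(xs) - 1)

# Hypothetical data: y rises, then falls, exactly in step with x.
xs = [1, 2, 3, 4, 5]
print(pearson_r_from_z(xs, [3, 5, 7, 9, 11]))  # perfect positive: close to +1
print(pearson_r_from_z(xs, [11, 9, 7, 5, 3]))  # perfect negative: close to -1
```

Because of floating-point rounding the printed values may differ from exactly 1 and -1 in the last decimal places.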

Azeem

Worked Example:

| x | y | x in standardized units | y in standardized units | Product |
|---|---|---|---|---|
| 1 | 5 | -1.5 | -0.5 | 0.75 |
| 3 | 9 | -0.5 | 0.5 | -0.25 |
| 4 | 7 | 0.0 | 0.0 | 0.00 |
| 5 | 1 | 0.5 | -1.5 | -0.75 |
| 7 | 13 | 1.5 | 1.5 | 2.25 |

Average of x = 4, SD = 2
Average of y = 7, SD = 4

The x and y values other than y = 13 are recovered from the standardized scores as mean + z × SD.

Note: reminder of how to standardize scores:

$$z = \frac{x - \bar{X}}{s_x}$$

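## Sum of the products

= 0.75 + (-0.25) + 0 + (-0.75) + 2.25
= 2.00

Divide by the number of pairs. The SDs used to standardize the scores (2 and 4) are calculated with divisor n, so the products are averaged over n = 5 rather than n - 1:
r = 2.00/5 = 0.4
(Dividing by n - 1 is appropriate when the SDs are also calculated with n - 1. Here that gives SDs of about 2.24 and 4.47, and the same r = 0.4.)

## Explained variation

Pearson's r measures the strength of association between two variables. It does not tell you how much of the variation in variable y is explained by variable x. To get this you need to calculate r². This is known as the coefficient of determination.
In this example r² = 0.4 × 0.4 = 0.16.
Therefore 16% of the variation in y is explained by x.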
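The worked example can be recomputed with a short Python sketch. The x and y lists are an assumption: they are the values implied by the z-scores together with the stated means and SDs (value = mean + z × SD). The sketch uses divisor n throughout, which is what reproduces SDs of exactly 2 and 4. It also shows that mixing the two divisors (SDs with n, average of products with n - 1) gives 0.5 instead.

```python
from math import sqrt

# Assumed data: recovered from the z-scores as mean + z * SD.
x = [1, 3, 4, 5, 7]
y = [5, 9, 7, 1, 13]
n = len(x)

def mean(values):
    return sum(values) / len(values)

def population_sd(values):
    """Standard deviation with divisor n (gives SD = 2 for x and 4 for y)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

zx = [(v - mean(x)) / population_sd(x) for v in x]
zy = [(v - mean(y)) / population_sd(y) for v in y]
products = [a * b for a, b in zip(zx, zy)]

r = sum(products) / n              # same divisor as the SDs
r_mixed = sum(products) / (n - 1)  # mixed divisors, shown for comparison only
r_squared = r ** 2                 # coefficient of determination

print(products)
print(r, r_mixed, r_squared)
```

With consistent divisors r is 0.4 and r² is about 0.16, so about 16% of the variation in y is explained by x.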

## Going back to the line

The regression line for y on x estimates the average value for y corresponding to each value of x.
Associated with each increase of one SD of x there is an increase of r SDs in y, on the average.

[Figure: regression line through the point of averages; a run of SDx along the line corresponds to a rise of r × SDy]
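The rule that one SD of x goes with r SDs of y fixes both the slope and the intercept of the regression line. The sketch below shows this under stated assumptions: the function name `regression_line` and the summary statistics are hypothetical and are not taken from the slides.

```python
def regression_line(mean_x, sd_x, mean_y, sd_y, r):
    """Return (intercept, slope) of the regression line for y on x."""
    slope = r * sd_y / sd_x              # rise of r * SDy for each run of SDx
    intercept = mean_y - slope * mean_x  # makes the line pass through the point of averages
    return intercept, slope

# Hypothetical summary statistics, for illustration only.
mean_x, sd_x = 10.0, 2.0
mean_y, sd_y = 50.0, 8.0
r = 0.5

a, b = regression_line(mean_x, sd_x, mean_y, sd_y, r)
print(a, b)

# One SD above the mean of x predicts r SDs above the mean of y.
print(a + b * (mean_x + sd_x), mean_y + r * sd_y)
```

The last line prints the same value twice, which confirms that moving one SD along x moves the prediction by r SDs of y.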

Sundus