

ASSOCIATION OF ATTRIBUTES
Chi-Square
ROLL 553-07-09

Data Types
Data are either Quantitative or Qualitative.
Quantitative data are either Continuous or Discrete.

Quantitative Data

Quantitative data usually consist of measurable characteristics called variables. For example:

Qualitative Data

Qualitative data cannot be measured numerically, but they can be divided into classes, and the number of items in each class can be counted. They consist of non-measurable characteristics. A characteristic that cannot be measured numerically, but whose presence or absence can be described, is called an Attribute. Qualitative data are measured on a nominal or ordinal scale. For example:

Marital Status (…, Widowed)

Employment Status (Employed, Unemployed)

Measurement Scales
The four scales of measurement are:

1. Nominal Scale
2. Ordinal (Ranking) Scale
3. Interval Scale
4. Ratio Scale

Nominal Scale

It is the classification of observations into mutually exclusive qualitative categories. For example:

Students are classified as MALE or FEMALE; the numbers 1 and 2 may also be used to identify these two categories.

Rainfall may be classified as HEAVY, MODERATE or LIGHT; the numbers 1, 2 and 3 might be used to denote the three classes.

NOTE: There is no particular order among the groups/classes here; the numbers serve only as labels.

Ordinal or Ranking Scale

It has all the characteristics of a nominal scale and, in addition, has the property of ordering or ranking. For example:

The performance of students is rated as EXCELLENT, GOOD, FAIR or POOR; here the numbers 1, 2, 3 and 4 are used to indicate ranks.

NOTE: The only relation that holds between any pair of categories is that of "greater than" (or "more preferred").

Interval Scale

A measurement scale possessing a constant interval size (distance) but not a true zero point is called an interval scale. For example:

Temperature measured in degrees Celsius is an outstanding example of an interval scale, because the same difference exists between 20°C and 40°C as between 5°C and 25°C. However, it cannot be said that a temperature of 40°C is twice as hot as a temperature of 20°C.

The ratio 40/20 has no meaning.

NOTE: The arithmetic operations of addition and subtraction are meaningful on an interval scale; ratios are not.
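A quick numerical check of the interval-scale claim, as a minimal Python sketch: converting the slide's Celsius temperatures to Fahrenheit (a second interval scale with a different zero point) keeps equal differences equal but changes the ratio 40/20. The helper `c_to_f` is introduced here only for illustration.

```python
def c_to_f(celsius: float) -> float:
    """Convert a Celsius reading to Fahrenheit (F = 9/5 * C + 32)."""
    return celsius * 9 / 5 + 32


# Equal differences in Celsius stay equal after conversion (20 C -> 36 F each).
diff_a = c_to_f(40) - c_to_f(20)
diff_b = c_to_f(25) - c_to_f(5)

# The ratio does not survive the change of scale, so "twice as hot" is meaningless.
ratio_celsius = 40 / 20
ratio_fahrenheit = c_to_f(40) / c_to_f(20)

print(diff_a, diff_b)                             # 36.0 36.0
print(ratio_celsius, round(ratio_fahrenheit, 2))  # 2.0 1.53
```

Because the zero point moves between the two scales, only differences carry over unchanged.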

Ratio Scale

It is a special kind of interval scale in which the scale of measurement has a true zero point as its origin.

The ratio scale is used to measure weight, volume, length, distance, money, etc., for which the zero point is meaningful.

NOTE: The zero point is meaningful for a ratio scale but not for an interval scale.
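By contrast, on a ratio scale a change of unit only multiplies every value by a constant and maps zero to zero, so ratios are preserved. A minimal sketch with two lengths; the values and the helper `cm_to_inch` are hypothetical.

```python
import math

CM_PER_INCH = 2.54  # exact definition of the inch


def cm_to_inch(cm: float) -> float:
    """Convert a length from centimetres to inches."""
    return cm / CM_PER_INCH


long_cm, short_cm = 40.0, 20.0  # hypothetical lengths
ratio_cm = long_cm / short_cm
ratio_in = cm_to_inch(long_cm) / cm_to_inch(short_cm)

# "Twice as long" holds in both units, and zero length is zero in both.
print(ratio_cm, math.isclose(ratio_cm, ratio_in), cm_to_inch(0.0))  # 2.0 True 0.0
```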

Hypothesis Tests for Qualitative Data

Proportion, 1 population: Z Test
Proportion, 2 populations: Z Test
Proportion, more than 2 populations: χ² Test
Independence: χ² Test

Hypothesis for One Proportion

ASSUMPTIONS
Random sample selected from a binomial population. The normal approximation can be used if $np_0 \ge 15$ and $nq_0 \ge 15$.

H0: $p \le p_0$ or $p = p_0$ or $p \ge p_0$
H1: $p > p_0$ or $p \ne p_0$ or $p < p_0$

Z-test statistic:

$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$$

where $\hat{p} = \dfrac{x}{n} = \dfrac{\text{number of successes}}{\text{sample size}}$ and $q_0 = 1 - p_0$.
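The one-proportion Z test can be sketched in Python using only the standard library. This is a minimal illustration, assuming the normal CDF is computed from `math.erf`; the function names `normal_cdf` and `one_proportion_z` and the data (60 successes in 100 trials, $p_0 = 0.5$) are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_proportion_z(x: int, n: int, p0: float) -> float:
    """Z statistic for H0: p = p0, Z = (p_hat - p0) / sqrt(p0 * q0 / n)."""
    q0 = 1.0 - p0
    # Condition for the normal approximation: n*p0 >= 15 and n*q0 >= 15.
    if n * p0 < 15 or n * q0 < 15:
        raise ValueError("normal approximation needs n*p0 >= 15 and n*q0 >= 15")
    p_hat = x / n  # number of successes / sample size
    return (p_hat - p0) / math.sqrt(p0 * q0 / n)


# Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5.
z = one_proportion_z(60, 100, 0.5)
p_upper = 1.0 - normal_cdf(z)                   # for H1: p > p0
p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))  # for H1: p != p0
print(round(z, 3), round(p_two_sided, 4))       # 2.0 0.0455
```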

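Hypothesis for Two Proportions

| Research Question | No Difference vs. Any Difference | Pop 1 ≥ Pop 2 vs. Pop 1 < Pop 2 | Pop 1 ≤ Pop 2 vs. Pop 1 > Pop 2 |
|---|---|---|---|
| H0 | $p_1 - p_2 = 0$ | $p_1 - p_2 \ge 0$ | $p_1 - p_2 \le 0$ |
| Ha | $p_1 - p_2 \ne 0$ | $p_1 - p_2 < 0$ | $p_1 - p_2 > 0$ |

Z-Test Statistic for Two Proportions

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}, \qquad \bar{q} = 1 - \bar{p}, \qquad \hat{p}_i = \frac{X_i}{n_i}$$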
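The pooled two-proportion Z statistic can be sketched as follows. This is a minimal illustration for the common case $H_0: p_1 - p_2 = 0$; the function name `two_proportion_z` and the counts (60 of 100 versus 45 of 100) are hypothetical.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled Z statistic for H0: p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled proportion
    q_bar = 1.0 - p_bar
    se = math.sqrt(p_bar * q_bar * (1.0 / n1 + 1.0 / n2))
    # Under H0 the hypothesized difference (p1 - p2) is 0.
    return ((p1_hat - p2_hat) - 0.0) / se


# Hypothetical data: 60/100 successes in population 1, 45/100 in population 2.
z = two_proportion_z(60, 100, 45, 100)
print(round(z, 3))  # 2.124
```

The resulting Z is then compared with the standard normal critical value for the chosen α and the alternative in the table of hypotheses.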

Chi Square Test Basic Idea
1. Compares the observed counts to the expected counts, assuming the null hypothesis is true.
2. The closer the observed counts are to the expected counts, the smaller the test statistic and the weaker the evidence against H0.

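Chi Square Test for k Proportions

1. Hypotheses
H0: $p_1 = p_{1,0},\ p_2 = p_{2,0},\ \ldots,\ p_k = p_{k,0}$
Ha: At least one $p_i$ differs from its hypothesized value $p_{i,0}$
2. Test Statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{[n_i - E(n_i)]^2}{E(n_i)}$$

where $n_i$ is the observed (actual) count and $E(n_i) = n\,p_{i,0}$ is the expected count, with $p_{i,0}$ the hypothesized probability of outcome $i$.
3. Degrees of Freedom: $k - 1$, where $k$ is the number of outcomes.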
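The three steps above can be sketched as a small Python function that returns the statistic and its degrees of freedom. The function name `chi_square_gof` and the counts (30, 50, 20 with hypothesized probabilities 0.25, 0.50, 0.25) are hypothetical.

```python
def chi_square_gof(observed: list[int], p0: list[float]) -> tuple[float, int]:
    """Chi-square statistic and df for H0: p_i = p_{i,0}, i = 1..k."""
    if len(observed) != len(p0):
        raise ValueError("need one hypothesized probability per outcome")
    if abs(sum(p0) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in p0]  # E(n_i) = n * p_{i,0}
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1  # df = k - 1


# Hypothetical counts for k = 3 outcomes, H0: p = (0.25, 0.50, 0.25).
stat, df = chi_square_gof([30, 50, 20], [0.25, 0.50, 0.25])
print(stat, df)  # 2.0 2
```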

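Finding Critical Value

What is the critical χ² value if k = 3 and α = .05?

Upper-tail areas of the χ² distribution (excerpt):

| DF | .995 | .95 | .05 |
|---|---|---|---|
| 1 | ... | 0.004 | 3.841 |
| 2 | 0.010 | 0.103 | 5.991 |

With df = k - 1 = 2 and α = .05, the critical value is 5.991: reject H0 if χ² > 5.991. If $n_i = E(n_i)$ in every cell, then χ² = 0.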
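Instead of reading the table, the upper-tail area and the critical value can be computed. This minimal sketch assumes integer degrees of freedom and uses the standard recurrence $Q(x \mid \nu + 2) = Q(x \mid \nu) + (x/2)^{\nu/2} e^{-x/2} / \Gamma(\nu/2 + 1)$, starting from $Q(x \mid 1) = \operatorname{erfc}(\sqrt{x/2})$ or $Q(x \mid 2) = e^{-x/2}$, with bisection to invert it. The names `chi2_sf` and `chi2_critical` are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(chi-square_df > x) for an integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start from the closed forms Q(x|1) = erfc(sqrt(x/2)) or Q(x|2) = exp(-x/2).
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(half)), 1
    else:
        q, nu = math.exp(-half), 2
    # Step up two df at a time: Q(x|nu+2) = Q(x|nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1).
    while nu < df:
        q += math.exp((nu / 2.0) * math.log(half) - half - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return q


def chi2_critical(alpha: float, df: int) -> float:
    """Value c with P(chi-square_df > c) = alpha, found by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


k, alpha = 3, 0.05
print(round(chi2_critical(alpha, k - 1), 3))  # 5.991
```

If SciPy is available, `scipy.stats.chi2.ppf(0.95, 2)` returns the same critical value.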

χ² Test of Independence Example

As a realtor, you want to determine whether house style and house location are related. At the .05 level of significance, is there evidence of a relationship?

| House Style | Urban | Rural | Total |
|---|---|---|---|
| Split-Level | 63 | 49 | 112 |
| Ranch | 15 | 33 | 48 |
| Total | 78 | 82 | 160 |

Chi Square Test of Independence: Contingency Table

A contingency table shows the number of observations from one sample classified jointly on two qualitative variables. In the house example, the rows (Split-Level, Ranch) are the levels of variable 1, House Style, and the columns (Urban, Rural) are the levels of variable 2, House Location.

Expected Count Example

Marginal probability of Split-Level = 112/160
Marginal probability of Urban = 78/160
Joint probability (if house style and location are independent) = (112/160)(78/160)

$$\text{Expected count} = 160 \cdot \frac{112}{160} \cdot \frac{78}{160} = 54.6$$
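Expected Count Calculation

$$E_{ij} = \frac{R_i \, C_j}{n}$$

where $R_i$ is the total of row $i$, $C_j$ the total of column $j$, and $n$ the grand total.

| House Style | Urban Obs. | Urban Exp. | Rural Obs. | Rural Exp. | Total |
|---|---|---|---|---|---|
| Split-Level | 63 | 112 × 78 / 160 = 54.6 | 49 | 112 × 82 / 160 = 57.4 | 112 |
| Ranch | 15 | 48 × 78 / 160 = 23.4 | 33 | 48 × 82 / 160 = 24.6 | 48 |
| Total | 78 | 78 | 82 | 82 | 160 |

$E_{ij} \ge 5$ in all cells.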
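The rule $E_{ij} = R_i C_j / n$ applies to any r × c table. A minimal Python sketch, using the observed house counts; the function name `expected_counts` is hypothetical.

```python
def expected_counts(table: list[list[float]]) -> list[list[float]]:
    """E_ij = R_i * C_j / n for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts: rows = Split-Level, Ranch; columns = Urban, Rural.
observed = [[63, 49], [15, 33]]
expected = expected_counts(observed)
print([[round(e, 1) for e in row] for row in expected])  # [[54.6, 57.4], [23.4, 24.6]]
print(all(e >= 5 for row in expected for e in row))      # True
```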

χ² Test of Independence Solution

| House Style | Urban Obs. | Urban Exp. | Rural Obs. | Rural Exp. | Total |
|---|---|---|---|---|---|
| Split-Level | 63 | 54.6 | 49 | 57.4 | 112 |
| Ranch | 15 | 23.4 | 33 | 24.6 | 48 |
| Total | 78 | 78 | 82 | 82 | 160 |

$$\chi^2 = \sum_{\text{all cells}} \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{[n_{11} - E_{11}]^2}{E_{11}} + \frac{[n_{12} - E_{12}]^2}{E_{12}} + \cdots + \frac{[n_{22} - E_{22}]^2}{E_{22}}$$

$$= \frac{[63 - 54.6]^2}{54.6} + \frac{[49 - 57.4]^2}{57.4} + \cdots + \frac{[33 - 24.6]^2}{24.6} = 8.41$$
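The same statistic can be computed directly from the observed table. A minimal sketch; the function name `chi_square_independence` is hypothetical, and expected counts are recomputed inside it so the block runs on its own.

```python
def chi_square_independence(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and df = (r - 1)(c - 1) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = R_i C_j / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat, df = chi_square_independence([[63, 49], [15, 33]])
print(round(stat, 2), df)  # 8.41 1
```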

χ² Test of Independence Solution

H0: No relationship between house style and house location
Ha: Relationship
α = .05
df = (2 - 1)(2 - 1) = 1
Critical value(s): 3.84 (reject H0 if χ² > 3.84)
Test statistic: χ² = 8.41
p-value = ?
Decision: Reject H0 at α = .05
Conclusion: There is evidence of a relationship.
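The decision step can also be checked numerically. This sketch takes χ² = 8.41 and df = 1 from the solution, and relies on the fact that for one degree of freedom the upper-tail area is $P(\chi^2 > x) = \operatorname{erfc}(\sqrt{x/2})$, so no table is needed for the p-value.

```python
import math

chi2_stat = 8.41         # test statistic from the solution
df = (2 - 1) * (2 - 1)   # (rows - 1)(columns - 1)
critical = 3.84          # tabled chi-square value for df = 1, alpha = .05

# For df = 1, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_stat / 2.0))
decision = "Reject H0" if chi2_stat > critical else "Do not reject H0"
print(decision, round(p_value, 4))  # Reject H0 0.0037
```

The p-value is far below .05, which agrees with the critical-value decision.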

Yates Correction for Continuity

In applying the chi-square approximation, classes with small expected counts (less than 5) must be combined with larger ones.

But when there are only 2 classes, the smaller frequency cannot be pooled into the larger one.

Frank Yates showed in 1934 that the chi-square approximation is markedly improved if we use the following formula:

$$\chi^2 = \sum_i \frac{(|o_i - e_i| - 0.5)^2}{e_i}$$

where $o_i$ and $e_i$ are the observed and expected frequencies.

It should only be used when d.f. = 1 and only one $e_i$ is small.
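A minimal sketch of Yates' corrected statistic for a 2 × 2 table, using the standard form $\sum (|o - e| - 0.5)^2 / e$. It is applied here to the house table purely to show the arithmetic (none of its expected counts is small); the function name `yates_chi_square` is hypothetical.

```python
def yates_chi_square(table: list[list[float]]) -> float:
    """Yates-corrected chi-square for a 2 x 2 table: sum (|O - E| - 0.5)^2 / E."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("Yates' correction applies to 2 x 2 tables (df = 1)")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (abs(table[i][j] - expected) - 0.5) ** 2 / expected
    return stat


print(round(yates_chi_square([[63, 49], [15, 33]]), 2))  # 7.43
```

The correction lowers the statistic from about 8.41 to about 7.43, which still exceeds 3.84.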

Chi-Square Table

Coefficient of Contingency

The chi-square statistic says nothing about the strength of the association.

For this purpose Karl Pearson (1857-1936) defined a coefficient C, called Pearson's coefficient of mean square contingency:

$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}$$

where $n$ is the total number of observations.

This coefficient measures the strength of the association (dependence) between the two variables of classification in a contingency table.

C suffers from the disadvantage that it never reaches a maximum of 1: it lies between 0 and 1, and its upper limit depends on the size of the table.

It should, therefore, not be used to compare associations among tables with different numbers of categories.
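A minimal sketch of Pearson's C, using the standard definition $C = \sqrt{\chi^2/(\chi^2 + n)}$ and the known upper limit $\sqrt{(k-1)/k}$ for a k × k table; the inputs χ² = 8.41 and n = 160 are taken from the house example. The function names are hypothetical.

```python
import math


def contingency_coefficient(chi2: float, n: int) -> float:
    """Pearson's C = sqrt(chi2 / (chi2 + n)); always in [0, 1)."""
    return math.sqrt(chi2 / (chi2 + n))


def c_max(k: int) -> float:
    """Upper limit of C for a k x k table: sqrt((k - 1) / k)."""
    return math.sqrt((k - 1) / k)


# House example: chi-square = 8.41, n = 160, 2 x 2 table.
c = contingency_coefficient(8.41, 160)
print(round(c, 3), round(c_max(2), 3))  # 0.223 0.707
```

Even a perfect association in a 2 × 2 table could not push C above about 0.707, which is why C values from tables of different sizes are not comparable.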

Phi-Coefficient

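$$\phi = \sqrt{\frac{\chi^2}{N}}$$

where $\chi^2$ is Pearson's chi-square statistic and $N$ is the grand total of the observations.

Phi varies from -1 to 1 when it is computed with its sign, as $\phi = (ad - bc)/\sqrt{(a+b)(c+d)(a+c)(b+d)}$ for a 2 x 2 table with cells $a, b$ in the first row and $c, d$ in the second; $\sqrt{\chi^2/N}$ gives its absolute value. A value of 0 indicates no association.

This coefficient can only be calculated for frequency data represented in 2 x 2 tables.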
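A minimal sketch comparing the two forms of phi on the house table: the signed form $(ad - bc)/\sqrt{(a+b)(c+d)(a+c)(b+d)}$ and the magnitude $\sqrt{\chi^2/N}$ with χ² = 8.41 and N = 160 from the worked example. The function names are hypothetical.

```python
import math


def phi_signed(a: int, b: int, c: int, d: int) -> float:
    """Signed phi for the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def phi_from_chi2(chi2: float, n: int) -> float:
    """Magnitude of phi from the chi-square statistic: sqrt(chi2 / N)."""
    return math.sqrt(chi2 / n)


print(round(phi_signed(63, 49, 15, 33), 3), round(phi_from_chi2(8.41, 160), 3))  # 0.229 0.229
```

The positive sign here reflects that Split-Level houses are relatively more common in urban locations (ad > bc).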

Cramér's Coefficient of Contingency
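A minimal sketch of Cramér's coefficient, assuming the standard definition $V = \sqrt{\chi^2 / (n\,(\min(r, c) - 1))}$ for an r × c table, which lies between 0 and 1 and can attain 1. For a 2 × 2 table it reduces to the magnitude of phi. The inputs are the house example values; the function name `cramers_v` is hypothetical.

```python
import math


def cramers_v(chi2: float, n: int, rows: int, cols: int) -> float:
    """Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1))), between 0 and 1."""
    return math.sqrt(chi2 / (n * (min(rows, cols) - 1)))


# House example (2 x 2): V equals sqrt(chi2 / n), the magnitude of phi.
v = cramers_v(8.41, 160, 2, 2)
print(round(v, 3))  # 0.229
```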

Thanks