
INTRODUCTION

Data Preparation

Types of Variables Analysis

Univariate Analysis

Bivariate Analysis

Bivariate Analysis (cont)

Multivariate Analysis

Measures of Centrality and Dispersion

Types of Measures of Dispersion

Range

Mean Deviation

Examples

Variance

Standard Deviation

Example

Interpretation

Coefficient of Variation

Why Coefficient of Variation

Frequency Distribution
Distributions
Frequency distribution: a description of the number of times the
various attributes of a variable are observed in a sample.
In other words, a frequency distribution counts the number of
responses to a question or the number of occurrences of a
phenomenon of interest.

Distribution (cont)

Distribution (cont)

Central Tendency
Average: an ambiguous term generally suggesting "typical" or "normal";
a measure of central tendency (examples: mean, median, mode).
1. Mean = sum of values / total number of cases
2. Mode = most frequently occurring attribute
3. Median = middle attribute in the ranked distribution of observed attributes

Central Tendency (Cont)


Practice: The following list represents the scores
on a mid-term exam.
100, 94, 88, 91, 75, 61, 93, 82, 70, 88, 71, 88

Determine the mean.


Determine the mode.
Determine the median.
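
The practice exercise can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Mid-term exam scores from the practice exercise.
scores = [100, 94, 88, 91, 75, 61, 93, 82, 70, 88, 71, 88]

exam_mean = mean(scores)      # sum of values / number of cases
exam_mode = mode(scores)      # most frequently occurring value
exam_median = median(scores)  # middle of the ranked values; with an even
                              # count, the average of the two middle values

print(f"mean   = {exam_mean:.2f}")
print(f"mode   = {exam_mode}")
print(f"median = {exam_median}")
```

With twelve scores there is no single middle value, so the median is taken as the average of the sixth and seventh ranked scores.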

Measurement of Dispersion

Range

Percentile range
Quartile deviation
Mean deviation
Variance and standard deviation

Relative measure of dispersion

Coefficient of variation
Coefficient of mean deviation
Coefficient of range
Coefficient of quartile deviation

Measures of Dispersion: The Range


Simplest measure of dispersion
Difference between the largest and the smallest values:

Range = $X_{\text{largest}} - X_{\text{smallest}}$

Example (values plotted on an axis running from 0 to 14; smallest value 1, largest value 13):

Range = 13 - 1 = 12

Interquartile Range
Measures the range of the middle 50% of the values only
Is defined as the difference between the upper and lower
quartiles

Interquartile range = upper quartile - lower quartile = Q3 - Q1

The mean deviation


Measures the average distance of each observation away
from the mean of the data
Gives an equal weight to each observation
Generally more sensitive than the range or interquartile
range, since a change in any value will affect it

Actual and absolute deviations from mean
A set of x values has a mean of $\bar{x}$.
The residual of a particular x-value is:

Residual or deviation = $x - \bar{x}$

The absolute deviation is:

$|x - \bar{x}|$

Mean deviation
The mean of the absolute deviations

Mean deviation = $\frac{\sum |x - \bar{x}|}{n}$

To calculate mean deviation

1. Calculate the mean of the data: find $\bar{x}$.
2. Subtract the mean from each observation and record the differences: for each x, find $x - \bar{x}$.
3. Record the absolute value of each residual: find $|x - \bar{x}|$ for each x.
4. Calculate the mean of the absolute values: add up the absolute values and divide by n.

Mean deviation = $\frac{\sum |x - \bar{x}|}{n}$
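
The four steps for the mean deviation can be sketched in Python as follows. The function name and the data are hypothetical, chosen only for illustration.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    n = len(values)
    x_bar = sum(values) / n                      # Step 1: mean of the data
    residuals = [x - x_bar for x in values]      # Step 2: x - mean
    abs_residuals = [abs(r) for r in residuals]  # Step 3: |x - mean|
    return sum(abs_residuals) / n                # Step 4: mean of |x - mean|

# Hypothetical data (not from the slides).
data = [2, 4, 6, 8, 10]
md = mean_deviation(data)
print(md)
```

Here the mean is 6, the absolute deviations are 4, 2, 0, 2, 4, and their mean is 12 / 5.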

The standard deviation


Measures the variation of observations from the
mean
The most common measure of dispersion
Takes into account every observation
Measures the average deviation of observations
from mean
Works with squares of residuals, not absolute
values, which makes it easier to use in further calculations

Standard deviation of a population

Every observation in the population is used.
The square of the population standard
deviation is called the variance.

Variance = $\sigma^2$

Standard deviation = $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

Standard deviation of a sample

In practice, most populations are very large
and it is more common to calculate the
sample standard deviation.

Sample standard deviation = $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Where: n is the number of observations in the sample

To calculate standard deviation

1. Calculate the mean: $\bar{x}$
2. Calculate the residual for each x: $x - \bar{x}$
3. Square the residuals: $(x - \bar{x})^2$
4. Calculate the sum of the squares: $\sum (x - \bar{x})^2$
5. Divide the sum in Step 4 by (n-1): $\frac{\sum (x - \bar{x})^2}{n - 1}$
6. Take the square root of the quantity in Step 5: $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
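
The six steps can be followed literally in code. This is a minimal sketch: the function name, the `sample` flag, and the data are assumptions made for illustration. Setting `sample=False` divides by n instead of n - 1, giving the population version.

```python
import math

def standard_deviation(values, sample=True):
    """Standard deviation following the six steps above."""
    n = len(values)
    x_bar = sum(values) / n                  # Step 1: mean
    residuals = [x - x_bar for x in values]  # Step 2: residuals
    squares = [r ** 2 for r in residuals]    # Step 3: squared residuals
    total = sum(squares)                     # Step 4: sum of squares
    divisor = n - 1 if sample else n         # Step 5: n - 1 for a sample
    return math.sqrt(total / divisor)        # Step 6: square root

# Hypothetical data (not from the slides): mean 5, sum of squares 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s_sample = standard_deviation(data)
sigma_population = standard_deviation(data, sample=False)
print(s_sample, sigma_population)
```

The sample value is always slightly larger than the population value for the same data, because the same sum of squares is divided by a smaller number.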

Standard deviations for frequency distributions
If data is in a frequency distribution

No. Units (x)    Frequency (f)
1                85
[missing]        192
[missing]        123
Total            400

Calculate standard deviation using:

$s = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f - 1}}$
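
For data in a frequency distribution, one common approach weights each squared residual by its frequency. The sketch below assumes that weighted formula, with the total frequency minus one as the divisor. The values and frequencies are hypothetical and are not the figures from the slide.

```python
import math
from statistics import stdev

def grouped_stdev(values, freqs):
    """Sample standard deviation from (value, frequency) pairs."""
    total = sum(freqs)                                         # sum of f
    x_bar = sum(f * x for x, f in zip(values, freqs)) / total  # weighted mean
    ss = sum(f * (x - x_bar) ** 2 for x, f in zip(values, freqs))
    return math.sqrt(ss / (total - 1))

# Hypothetical frequency table: value 1 occurs twice, 2 five times, 3 three times.
values = [1, 2, 3]
freqs = [2, 5, 3]
s_grouped = grouped_stdev(values, freqs)

# Cross-check against the raw observations written out in full.
expanded = [1] * 2 + [2] * 5 + [3] * 3
s_expanded = stdev(expanded)
print(s_grouped, s_expanded)
```

The cross-check shows that the weighted formula is only a shortcut: it gives the same result as listing every observation individually.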

Coefficient of variation
Is a measure of relative variability used to:
  measure changes that have occurred in a population over time
  compare variability of two populations that are expressed in
  different units of measurement
Is expressed as a percentage rather than in terms of
the units of the particular data

Formula for coefficient of variation

Denoted by V

$V = \frac{s}{\bar{x}} \times 100\%$

where
$\bar{x}$ = the mean of the sample
s = the standard deviation of the sample
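
A short sketch of the coefficient of variation, using the standard `statistics` module. The two data sets are hypothetical and are measured in different units, which is the situation the measure is meant for.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(values) / mean(values) * 100

# Hypothetical samples in different units (not from the slides).
heights_cm = [160, 170, 180]
weights_kg = [50, 70, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(f"heights: {cv_heights:.1f}%  weights: {cv_weights:.1f}%")
```

Because V has no units, the two percentages can be compared directly even though centimetres and kilograms cannot.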

Summary
Measures of dispersion

No ideal measure of dispersion exists.
The standard deviation is the most important measure
of dispersion:
  it is the most frequently used
  its value is affected by the value of every observation
  in the data
  extreme values in the population may distort its value

The range
Simply the difference between the largest and
smallest values in a set of data
Useful for: daily temperature fluctuations or share
price movement
Is considered primitive as it considers only the
extreme values which may not be useful indicators of
the bulk of the population.
The formula is:

Range = largest observation - smallest observation

2002 McGraw-Hill Australia,
PPTs t/a Introductory
Mathematics & Statistics for
Business 4e by John S.
Croucher


Measures of Dispersion:
Why The Range Can Be Misleading
Ignores the way in which data are distributed

[Figure: two dot plots on an axis from 7 to 12, with the values distributed differently between the same extremes]

Range = 12 - 7 = 5 in both cases

Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 120 - 1 = 119
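
The outlier example can be reproduced with a few lines of Python. The helper name `value_range` is illustrative; the median is included only as a contrast, to show a measure that the single extreme value does not move.

```python
from statistics import median

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

# The two data sets from the slide differ only in their last value.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = base + [5]
with_outlier = base + [120]

range_without = value_range(without_outlier)
range_with = value_range(with_outlier)
print(range_without, range_with)
print(median(without_outlier), median(with_outlier))
```

Changing one observation out of 25 multiplies the range almost thirtyfold, while the middle of the data stays where it was.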

Quartile Measures
Quartiles split the ranked data into 4 segments with an
equal number of values per segment
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
Q2 is the same as the median (50% of the observations are
smaller and 50% are larger)
Only 25% of the observations are greater than the third
quartile

Measures of Dispersion:
The Variance
Average (approximately) of squared deviations
of values from the mean

Sample variance:

$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

Where
$\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X

Measures of Dispersion:
The Standard Deviation

Most commonly used measure of variation
Shows variation about the mean
Is the square root of the variance
Has the same units as the original data

Sample standard deviation:

$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

Measures of Dispersion:
Sample Standard Deviation:
Calculation Example

Sample Data ($X_i$): 10 12 14 15 17 18 18 24

n = 8

Mean = $\bar{X}$ = 16

$S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}}$

$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$

$= \sqrt{\frac{130}{7}} = 4.3095$

A measure of the average scatter around the mean
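
The calculation example can be verified with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor. This is only a check of the arithmetic, not part of the original slides.

```python
from statistics import mean, stdev, variance

# Sample data from the calculation example.
sample = [10, 12, 14, 15, 17, 18, 18, 24]

sample_mean = mean(sample)     # 128 / 8
sample_var = variance(sample)  # sum of squared deviations / (n - 1)
sample_sd = stdev(sample)      # square root of the sample variance

print(sample_mean, round(sample_var, 4), round(sample_sd, 4))
```

The squared deviations are 36, 16, 4, 1, 1, 4, 4 and 64, which sum to 130, matching the numerator in the worked example.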

Summary of Measures

Range: $X_{\text{largest}} - X_{\text{smallest}}$ (total spread)
Standard Deviation (Sample): $\sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$ (dispersion about sample mean)
Standard Deviation (Population): $\sqrt{\frac{\sum (X_i - \mu)^2}{N}}$ (dispersion about population mean)
Variance (Sample): $\frac{\sum (X_i - \bar{X})^2}{n - 1}$ (squared dispersion about sample mean)

The Semi-Interquartile Range


The semi-interquartile range (or SIR) is
defined as the difference of the first and
third quartiles divided by two
The first quartile is the 25th percentile
The third quartile is the 75th percentile

SIR = (Q3 - Q1) / 2


The Semi-Interquartile Range

Example
What is the SIR for the data below?

2, 4, 6, 8, 10, 12, 14, 20, 30, 60

25% of the scores are below 5, so 5 is the first quartile (25th percentile)
25% of the scores are above 25, so 25 is the third quartile (75th percentile)

SIR = (Q3 - Q1) / 2 = (25 - 5) / 2 = 10
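
A small sketch of the SIR calculation. The quartiles 5 and 25 are taken as given from the example. The call to `statistics.quantiles` is included only to show that software uses its own interpolation rule for quartiles, so its values need not match quartiles read off by eye. The function name is illustrative.

```python
from statistics import quantiles

def semi_interquartile_range(q1, q3):
    """Half the distance between the first and third quartiles."""
    return (q3 - q1) / 2

scores = [2, 4, 6, 8, 10, 12, 14, 20, 30, 60]

# Quartiles as stated in the example.
sir_example = semi_interquartile_range(5, 25)

# Quartiles as computed by the standard library (default method).
q1, q2, q3 = quantiles(scores, n=4)
sir_library = semi_interquartile_range(q1, q3)

print(sir_example)
print(q1, q3, sir_library)
```

Several quartile conventions exist, so it is worth stating which one was used whenever an interquartile or semi-interquartile range is reported.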

Variance
Variance is defined as the average of the
squared deviations:

$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

What Does the Variance Formula


Mean?
First, it says to subtract the mean from each of
the scores
This difference is called a deviate or a deviation
score
The deviate tells us how far a given score is from
the typical, or average, score
Thus, the deviate is a measure of dispersion for a
given score

Continuous & Discrete Variables

Subgroup Comparison

Collapsing Response Categories

Handling "Don't Knows"

Numerical Description in Qualitative Research

Bivariate Analysis

Bivariate Analysis (cont)

Bivariate Analysis

Percentaging a Table

Percentaging a Table (cont)

Percentaging a Table (cont)

Percentaging a Table (cont)

Percentaging a Table (cont)

Constructing & Reading Bivariate Tables

Multivariate Analysis

Conclusion

DESCRIPTIVE STATISTICS
Summarise and organise data.
Measures of central tendency
Mean: the average (sum of scores / number of scores).
Mode: the most common, or typical, value.
Median: the middle value.

Findings can be presented in a number of ways:

Frequency tables

How often do you go training in a week?    n     %
1                                          17    25.0
2                                          29    42.7
3                                          22    32.3

(n = number of responses)
Always provide a table of results.
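
A frequency table like this one can be built from raw responses with `collections.Counter`. The sketch assumes the counts 17, 29 and 22 belong to 1, 2 and 3 training sessions per week; the raw response list is rebuilt from those counts purely for illustration.

```python
from collections import Counter

# Hypothetical raw responses, rebuilt from the counts in the table.
responses = [1] * 17 + [2] * 29 + [3] * 22

counts = Counter(responses)
total = sum(counts.values())  # number of responses

print(f"{'times':>5} {'n':>4} {'%':>6}")
for value in sorted(counts):
    pct = counts[value] / total * 100
    print(f"{value:>5} {counts[value]:>4} {pct:>6.1f}")
```

Percentages computed this way can differ from a published table in the last digit, since authors sometimes adjust rounding so that the column sums to exactly 100.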

USING GRAPHS AND CHARTS


Only present a graph/chart if it illustrates something.
Graphs and charts describe data; they do not explain anything.

INFERENTIAL STATISTICS
Allow you to make inferences from data.
Use at least 2 variables.
What effect does the independent variable have on the
dependent variable? Causality: is A caused by B?

TYPES OF TEST
1. Parametric tests. These tests use interval or ratio data (see
Chapter 6 for a reminder). Parametric tests assume that the
data are drawn from a normally distributed population (i.e. the
data are not skewed) and that the groups have the same variance
(or spread) on the variables being measured.
2. Non-parametric tests. These are used with ordinal or
nominal data, and do not make any assumptions about the
characteristics of the sample in terms of its distribution.

TESTS OF ASSOCIATION
CORRELATION
Correlations investigate the relationship between two variables
consisting of interval or ratio data.
A correlation can indicate:
Whether there is a relationship between the two variables.
The direction of the relationship, i.e. whether it is positive or
negative.
The strength, or magnitude of the relationship.

Correlation scores range from 1 to -1

R = 1: strong (perfect) positive correlation
R = -1: strong (perfect) negative correlation
R = 0: no correlation

A strong correlation does not necessarily mean a causal relationship!
e.g. lectures attended positively correlates with final grade.
May be:
  more lectures attended = more interest
  more interest = higher grade
This would be a spurious relationship.
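
The slides do not name a particular coefficient at this point. The sketch below assumes Pearson's r, which the later guidance recommends for interval or ratio data. The lecture and grade figures are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: lectures attended and final grade for five students.
lectures = [2, 4, 6, 8, 10]
grades = [50, 55, 65, 70, 80]

r = pearson_r(lectures, grades)
print(round(r, 3))
```

A value close to 1 here says only that the two variables rise together; it says nothing about whether attendance, interest, or something else drives the grade.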

TESTING DIFFERENCES
Tests of difference generally assess whether differences between
two samples are likely to have occurred by chance, or whether
they are the result of the effect of a particular variable.

THE INDEPENDENT SAMPLES T-TEST


This examines whether the mean scores of two different
groups can be considered as being significantly different.
It can be used when:
The data is interval or ratio in nature.
The groups are randomly assigned (hence, you should use an
ANOVA rather than a t-test to compare differences between
males and females, as gender is not randomly determined
when you come to assign your groups).
The two groups are independent of each other.
The variance, or spread, in the two groups is equal.
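
As an illustration of what the test computes, the sketch below calculates the t statistic under the equal-variance assumption listed above, using a pooled variance. The function name and data are hypothetical. Turning t into a significance level requires a t-distribution table or a statistics package and is not shown.

```python
import math
from statistics import mean, variance

def independent_t(group_a, group_b):
    """t statistic and degrees of freedom, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their
    # degrees of freedom.
    pooled = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / df
    std_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / std_error
    return t, df

# Hypothetical scores for two independent groups.
group_a = [1, 2, 3]
group_b = [2, 3, 4]
t_stat, dof = independent_t(group_a, group_b)
print(round(t_stat, 4), dof)
```

The larger the absolute value of t for a given number of degrees of freedom, the less plausible it is that the difference in means arose by chance.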

PAIRED SAMPLES T-TEST


The paired t-test measures whether the mean of a single group
is different when measured at different times.

ANALYSIS OF VARIANCE (ANOVA)


ANOVA is similar in nature to the independent t-test, however
it allows you to ascertain differences between more than two
groups.
If you are looking to explore gender differences, then this is a
more appropriate test to use than an independent t-test as it
does not assume that participants have been randomly
assigned to each group.

THE MANN-WHITNEY TEST


An alternative to the independent t-test.
Used when data is ordinal and non-parametric.
This test works on ranking the data rather than testing the
actual score, and scoring each rank (so the lowest score would
be ranked 1, the next lowest 2 and so on) ignoring the group
to which each participant belonged.
The principle of the test is that if the groups were equal, then
the sum of the ranks should also be the same.
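
The ranking idea can be sketched as follows. Tied scores are given the average of the ranks they occupy, which is a common convention that the slides do not state. The U statistic shown is the usual one derived from the rank sums. All names and data are hypothetical.

```python
def rank_sums(group_a, group_b):
    """Sum of pooled ranks for each group (ties get the average rank)."""
    pooled = sorted(group_a + group_b)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Equal values occupy ranks i+1 .. j; assign their average.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    r_a = sum(rank_of[v] for v in group_a)
    r_b = sum(rank_of[v] for v in group_b)
    return r_a, r_b

def mann_whitney_u(group_a, group_b):
    """Smaller of the two U statistics."""
    r_a, _ = rank_sums(group_a, group_b)
    n_a, n_b = len(group_a), len(group_b)
    u_a = r_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)

# Hypothetical ordinal scores for two independent groups.
group_a = [1, 2, 3]
group_b = [4, 5, 6]
print(rank_sums(group_a, group_b), mann_whitney_u(group_a, group_b))
```

In this example every score in the first group is lower than every score in the second, so the rank sums are as far apart as they can be and U takes its minimum value.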

THE WILCOXON SIGNED RANK TEST


Similar to the Mann-Whitney test, however it examines
differences where the two sets of scores are from the same
participants (effectively it is a non-parametric alternative to a
paired samples t-test).

THE KRUSKAL-WALLIS TEST
This is a non-parametric alternative to the ANOVA test, and can
be used to identify differences between three or more
independent groups.

WHICH TEST SHOULD I USE?


The type of data that you collect will be important in your final
choice of test:
Nominal
Consider a chi-squared test if you are interested in differences
in frequency counts using nominal data, for example comparing
whether month of birth affects the sport that someone
participates in.

WHICH TEST SHOULD I USE?


Ordinal
If you are interested in the relationship between variables, then
use Spearman's correlation.
If you are looking for differences between independent groups,
then a Mann-Whitney test may be appropriate.
If the groups are paired, however, then a Wilcoxon signed rank
test is appropriate.
If there are three or more groups then consider a Kruskal-Wallis test.

Interval or ratio
Are you looking to identify relationships between two variables?
If so, consider the use of a Pearson's correlation.
If there are three or more variables, then consider multiple
regression.
If you are concerned with differences between scores, then
t-tests or ANOVA may be appropriate.
If you want to identify differences within one group, then a
paired samples t-test should be used.
If you are comparing two randomly assigned groups, then use
an independent samples t-test.
If you are looking to compare two non-randomly assigned, or
three or more groups, then use ANOVA.
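
The guidance above can be summarised as a small lookup function. This is only a sketch of the decision rules as stated in these slides; the function name, parameter names, and string labels are all hypothetical.

```python
def suggest_test(level, goal, groups=2, variables=2, paired=False,
                 randomly_assigned=True):
    """Suggest a test from the level of measurement and the research goal.

    level: "nominal", "ordinal" or "interval_ratio"
    goal:  "relationship" or "difference"
    """
    if level == "nominal":
        return "chi-squared test"
    if level == "ordinal":
        if goal == "relationship":
            return "Spearman's correlation"
        if groups >= 3:
            return "Kruskal-Wallis test"
        if paired:
            return "Wilcoxon signed rank test"
        return "Mann-Whitney test"
    if level == "interval_ratio":
        if goal == "relationship":
            if variables >= 3:
                return "multiple regression"
            return "Pearson's correlation"
        if paired:
            return "paired samples t-test"
        if groups >= 3 or not randomly_assigned:
            return "ANOVA"
        return "independent samples t-test"
    raise ValueError(f"unknown level of measurement: {level!r}")

print(suggest_test("ordinal", "difference", paired=True))
print(suggest_test("interval_ratio", "difference", groups=3))
```

A function like this only encodes the rules of thumb; the assumptions of the chosen test still need to be checked against the data.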
