Social Sciences
or: why data analysis? ... required knowledge of basic statistics
what are the various tests? ... the assumptions behind each test
how to use the tests? ... an introduction to the fundamentals of data analysis for MBA students
Partially based on Pandya K, et.al., SPSS in Simple Steps,
Dreamtech Press, ISBN-13: 978-93-5004-251-9.
2012, Marie Anne Rosario, partially based on, and adapted from Pandya K, et al., SPSS in Simple Steps, Dreamtech Press.
Purpose of Statistics
1) To describe a phenomenon,
2) To organize and summarize our results more
conveniently and meaningfully,
3) To make inferences or predictions,
4) To explain, and
5) To draw conclusions.
Types of Statistics
Measurement scales
Nominal: numbers assigned to runners (e.g. 7, 11)
Ordinal: rank order of winners at the finish (third place, second place, first place)
Interval: performance rating on a 0-to-10 scale (8.2, 9.1, 9.6)
Ratio: time to finish in seconds (15.2, 14.1, 13.4)
Nominal variables allow for only qualitative classification. That is, they
can be measured only in terms of whether the individual items belong to
some distinctively different categories, but we cannot quantify or even
rank order those categories. Typical examples of nominal variables are
gender, race, color, city, and the numbers assigned to players.
Permissible Statistics:
Descriptive: percentages, mode
Inferential: Chi-square, binomial test
Ordinal variables allow us to rank order the items we measure in terms
of which has less and which has more of the quality represented by the
variable, but they still do not allow us to say "how much more". A typical
example of an ordinal variable is the socioeconomic status of families.
Permissible Statistics:
Descriptive: percentile, median
Inferential: rank-order correlation, Friedman ANOVA
Interval variables allow us not only to rank order the items that are
measured, but also to quantify and compare the sizes of differences
between them. For example, temperature, as measured in degrees
Fahrenheit or Celsius, constitutes an interval scale.
Permissible Statistics:
Descriptive: range, mean, S.D.
Inferential: Pearson correlation, t-tests, ANOVA, regression, factor analysis
Ratio variables are very similar to interval variables; in addition to all the
properties of interval variables, they feature an identifiable absolute zero
point, so they allow for statements such as "x is twice as large as y".
Typical examples of ratio scales are measures of time or space.
Permissible Statistics:
Descriptive: geometric mean, harmonic mean
Inferential: coefficient of variation
Measures of Central Tendency
- Mean
- Median
- Mode
Measures of Spread / Dispersion
- Range
- Variance
- S.D.
- IQR
Measure of Asymmetry
- Skewness
Measure of Peakedness
- Kurtosis
Skewness
Indicates asymmetry in the distribution.
Positive value: right-skewed, with a long tail to the right; typically Median < Mean.
Negative value: left-skewed, with a long tail to the left; typically Median > Mean.
For normally distributed data, skewness is zero.
Kurtosis
Indicates the peakedness of the distribution.
Positive values indicate heavy tails.
Negative values indicate light tails.
A normal distribution has a kurtosis of zero (this is excess kurtosis, the form SPSS reports).
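The sign conventions above can be checked with a small calculation. The sketch below uses the simple moment-based formulas (third and fourth central moments divided by powers of the second); the function names are illustrative, and SPSS applies small-sample corrections, so its reported values will differ slightly from these.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean)**k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (zero for symmetric data)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (zero for a normal curve)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3.0


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]   # one large value pulls the tail to the right

print("symmetric    skewness:", skewness(symmetric))
print("right-skewed skewness:", skewness(right_skewed))
print("symmetric    excess kurtosis:", excess_kurtosis(symmetric))
```

A positive skewness for the second sample matches the "long tail to right" rule, and the negative excess kurtosis of the flat, evenly spread first sample matches the "light tails" rule.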
Kurtosis
[Figure: three curves labelled Leptokurtic, Normal, Platykurtic]
Measures of Dispersion
Measures of central tendency alone cannot give a complete picture of a
distribution.
Measures of dispersion describe how scattered the data are.
Range, Variance, Standard Deviation and Quartile Deviation are the
measures of dispersion.
Range is the difference between the greatest and the least of the
observations.
A better way to measure dispersion is to square the differences between
each data point and the mean before averaging them. This is called the Variance.
The positive square root of the Variance is called the Standard Deviation.
The coefficient of variation (the standard deviation expressed as a percentage
of the mean) helps us compare the consistency of two or more collections of data.
The higher the coefficient of variation, the less consistent the data, and vice
versa.
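As a worked illustration of these definitions, the following sketch computes the range, variance, standard deviation and coefficient of variation for a small made-up data set. It averages the squared deviations over n, as described above (the population form); SPSS divides by n - 1 by default, so its figures will be slightly larger. The function name `dispersion_summary` is hypothetical.

```python
import statistics


def dispersion_summary(values):
    """Return range, variance, SD and coefficient of variation (in %)."""
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values)   # mean of squared deviations
    sd = statistics.pstdev(values)            # positive square root of variance
    return {
        "range": max(values) - min(values),
        "variance": variance,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,      # only meaningful when mean > 0
    }


scores = [2, 4, 4, 4, 5, 5, 7, 9]             # illustrative data, mean = 5
summary = dispersion_summary(scores)
print(summary)
```

For this data set the deviations from the mean square to 9, 1, 1, 1, 0, 0, 4 and 16, which average to a variance of 4 and a standard deviation of 2, giving a coefficient of variation of 40%.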
Quartile Deviation
The median is the value below which 50% of the cases in
the distribution fall. The median divides the distribution into
two equal halves.
There are other points that divide the distribution into
various ratios.
The point below which 25% of the cases in
the distribution fall is known as the first quartile (Q1).
The point below which 75% of the cases fall is
known as the third quartile (Q3).
Quartile Deviation is half the difference between the
values of these quartiles, i.e. (Q3 - Q1) / 2.
Min | 25% | Q1 | 25% | Mdn | 25% | Q3 | 25% | Max
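The quartile definitions above can be tried out with Python's standard library. This is a minimal sketch: `statistics.quantiles` with `method="inclusive"` is one of several quantile conventions, and SPSS and other packages may interpolate differently, so Q1 and Q3 can differ slightly between tools. The function name `quartile_deviation` is hypothetical.

```python
from statistics import quantiles


def quartile_deviation(values):
    """Return (Q1, median, Q3, quartile deviation) for a list of numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, median, q3, (q3 - q1) / 2


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]            # illustrative data
q1, median, q3, qd = quartile_deviation(data)
print("Q1 =", q1, "Mdn =", median, "Q3 =", q3, "QD =", qd)
```

The difference Q3 - Q1 is the interquartile range (IQR), so the quartile deviation is simply half the IQR.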
Box Plot
[Figure: box plot; surviving labels are Outlier, Maximum, 75th Percentile]
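The quantities a box plot displays can be computed by hand. The sketch below assumes the common 1.5 x IQR rule for flagging outliers; that rule is an assumption here, not something stated on the slide, and the function name `box_plot_summary` is hypothetical. Quartiles again use the "inclusive" convention of `statistics.quantiles`.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles, whisker ends and outliers under the 1.5 * IQR rule."""
    xs = sorted(values)
    q1, median, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],     # smallest value that is not an outlier
        "whisker_high": inside[-1],   # largest value that is not an outlier
        "outliers": [x for x in xs if x < lower_fence or x > upper_fence],
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]      # 30 is far from the rest
print(box_plot_summary(sample))
```

In this sample the value 30 lies above the upper fence, so it would be drawn as a separate outlier point and the upper whisker would stop at 9.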
1. Normal distribution
2. Right-skewed distribution
3. Left-skewed distribution
4. Light-tailed distribution
5. Heavy-tailed distribution
Mean is used
- when the distribution is not badly skewed.
- when the measure with the greatest stability is required.
- when other statistics (such as S.D., t) are to be computed.
Median is used
- when the exact midpoint of the distribution is required.
- when there are extreme cases.
- when the distribution is truncated.
Mode is used
- when we need a quick approximate value.
- when we need the most typical value.
Central tendency (C.T.) of a nominal (categorical) scale is given by its mode.
C.T. of an ordinal scale is given by the median.
C.T. of an interval/ratio scale is given by the mean (symmetrical data) or the
median (skewed data).
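The selection rules above can be written as a small lookup. This is only a sketch: the function name `central_tendency`, the `scale` labels and the `skewed` flag are illustrative, and ordinal categories are assumed to be coded as ordered numbers (for example 1 = low, 2 = middle, 3 = high).

```python
import statistics


def central_tendency(values, scale, skewed=False):
    """Pick the measure of central tendency suited to the measurement scale."""
    scale = scale.lower()
    if scale == "nominal":
        return statistics.mode(values)            # most frequent category
    if scale == "ordinal":
        return statistics.median_low(values)      # always an observed rank
    if scale in ("interval", "ratio"):
        if skewed:
            return statistics.median(values)      # robust to extreme cases
        return statistics.fmean(values)
    raise ValueError(f"unknown measurement scale: {scale!r}")


print(central_tendency(["red", "blue", "red"], "nominal"))
print(central_tendency([1, 2, 2, 3, 3, 3], "ordinal"))
print(central_tendency([1, 2, 3, 4, 100], "ratio", skewed=True))
```

Using `median_low` for ordinal data is a design choice: it returns a value that actually occurs in the data rather than an average of two ranks.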
NON-PARAMETRIC
Runs Test
Mann-Whitney U Test
Wilcoxon Signed Rank Test
Kruskal-Wallis Test
Spearman & Kendall Tau coefficients
UNI-VARIATE ANALYSIS
[Flowchart for choosing a test]
Qualitative data?
- Yes: Cross Tab & two categorical variables
- No: Difference or Relationship?
Difference
- Comparison with a standard value: One-Sample t
- Are data from the same respondents?
- Yes (dependent): 2 samples: Paired t; 3 samples: Repeated Measurement Analysis
- No (independent): 2 samples: Independent t; 3 samples
Relationship
- Dependent variable? Yes: Regression Analysis
- Dependent variable? No: Corr (r)
Tests of Normality
A common application of distribution-fitting procedures is
verifying the assumption of normality
before using a parametric test.
- Kolmogorov-Smirnov test for normality (one sample / two sample)
- Shapiro-Wilk W test
- Lilliefors test (used instead of K-S when the population mean and SD are
not known)
- Histogram with normal curve
- Skewness & Kurtosis
Significance Level
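To show what the Kolmogorov-Smirnov and Lilliefors tests measure, the sketch below computes the K-S distance D between a sample's empirical distribution and a normal curve fitted with the sample's own mean and SD (the Lilliefors setting). It stops at the statistic: a p-value needs Lilliefors tables or a statistics package such as SPSS. The function name and the two synthetic samples are illustrative assumptions.

```python
import math
from statistics import NormalDist, fmean, stdev


def ks_distance_to_fitted_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Two synthetic samples built from evenly spaced probabilities.
probs = [(i + 0.5) / 50 for i in range(50)]
near_normal = [NormalDist().inv_cdf(p) for p in probs]   # bell-shaped
skewed = [-math.log(1 - p) for p in probs]               # long right tail

d_normal = ks_distance_to_fitted_normal(near_normal)
d_skewed = ks_distance_to_fitted_normal(skewed)
print("D (near-normal sample):", round(d_normal, 3))
print("D (skewed sample):     ", round(d_skewed, 3))
```

A larger D means the sample departs further from normality; the test rejects normality when D exceeds the critical value for the chosen significance level.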
Inferential Statistics
Check for related samples or unrelated
samples
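The related/unrelated distinction decides which t statistic applies. The following sketch contrasts the two: a paired t for related samples (the same respondents measured twice) and a pooled-variance independent t for unrelated samples. It returns only the statistic and degrees of freedom; the function names and data are illustrative, and equal variances are assumed for the independent case.

```python
import math
from statistics import fmean, stdev, variance


def paired_t(before, after):
    """t statistic and df for related samples (same respondents twice)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and df for unrelated samples."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a)
              + (nb - 1) * variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (fmean(group_a) - fmean(group_b)) / se, na + nb - 2


# Related: the same five respondents before and after a treatment.
print(paired_t([10, 12, 9, 11, 13], [12, 14, 10, 13, 16]))
# Unrelated: two separate groups of five respondents.
print(independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))
```

The paired version works on the within-respondent differences, which removes between-respondent variation and is why treating related samples as unrelated usually loses power.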