
Statistics

An Introduction


Learning Objectives
1. Define Statistics
2. Describe the Uses of Statistics
3. Distinguish Descriptive & Inferential Statistics
4. Define Population, Sample, Parameter, & Statistic
5. Identify data types


What is Statistics?
The practice (science?) of data analysis
Summarizing data and drawing inferences about the larger population from which the data were drawn


Statistical Methods
Statistical Methods branch into:
   Descriptive Statistics
   Inferential Statistics

Descriptive Statistics
1. Involves
   Collecting Data
   Presenting Data
   Characterizing Data
2. Purpose
   Describe Data
[Figure: bar chart by quarter (Q1 to Q4), vertical axis 0 to 50; example summary measures $\bar{X} = 30.5$, $S^2 = 113$]

Inferential Statistics
1. Involves
   Estimation
   Hypothesis Testing
2. Purpose
   Make Decisions About Population Based on Sample Characteristics
[Figure: sample drawn from a population labeled "Population?"]

Key Terms
1. Population (Universe)
   All Items of Interest
2. Sample
   Portion of Population
3. Parameter
   Summary Measure about Population
4. Statistic
   Summary Measure about Sample
Mnemonic: P in Population & Parameter; S in Sample & Statistic

Data Types
Quantitative

Discrete
Continuous

Qualitative

Nominal (categorical)
Ordinal (rank ordered categories)


Sampling
Representative sample

Same characteristics as the population

Random sample

Every subset of a given size has an equal chance of being selected
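
A minimal sketch of drawing a random sample in Python. The population of 23 numbered students and the sample size of 5 are illustrative assumptions, not data from these slides. `random.sample` draws without replacement, so every subset of the requested size is equally likely.

```python
import random

# Hypothetical population: 23 students labeled 1..23 (assumed for illustration).
population = list(range(1, 24))

def draw_simple_random_sample(items, size, seed=None):
    """Return `size` distinct items; every subset of that size is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(items, size)

sample = draw_simple_random_sample(population, size=5, seed=0)
print(sample)
```

Whether such a sample is also representative is a separate question: randomness makes a representative sample likely, not guaranteed.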


Review
Descriptive vs. Inferential Statistics
Vocabulary

Population
(Random, representative) sample
Parameter
Statistic

Data types

Methods for Describing Data


Learning Objectives
1. Describe Qualitative Data Graphically
2. Describe Numerical Data Graphically
3. Create & Interpret Graphical Displays
4. Explain Numerical Data Properties
5. Describe Summary Measures
6. Analyze Numerical Data Using Summary Measures

Data Presentation
Qualitative Data
   Summary Table
   Bar Chart
   Pie Chart
Numerical Data
   Stem-&-Leaf Display
   Frequency Distribution
   Dot Chart
   Histogram

Presenting
Qualitative Data


Student Specializations
Specialization |      Freq.     Percent        Cum.
---------------+-----------------------------------
           HCI |          9       39.13       39.13
          IEMP |          9       39.13       78.26
           LIS |          3       13.04       91.30
     Undecided |          2        8.70      100.00
---------------+-----------------------------------
         Total |         23      100.00
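
A summary table like this one can be built from raw counts. The sketch below takes the four frequencies from the Student Specializations table and recomputes the Percent and Cum. columns. The function name `frequency_table` is a hypothetical helper, not part of any statistics package.

```python
# Frequencies copied from the Student Specializations table.
counts = {"HCI": 9, "IEMP": 9, "LIS": 3, "Undecided": 2}

def frequency_table(counts):
    """Return rows of (category, freq, percent, cumulative percent)."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, freq in counts.items():
        running += freq
        # Cumulative percent comes from the running count, not from
        # summing rounded percentages, so rounding error does not build up.
        rows.append((category, freq,
                     round(100 * freq / total, 2),
                     round(100 * running / total, 2)))
    return rows

table = frequency_table(counts)
for category, freq, pct, cum in table:
    print(f"{category:>14} | {freq:>10} {pct:>11.2f} {cum:>11.2f}")
```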


Student Specializations
[Figure: pie chart and bar chart of specialization frequencies; categories HCI, IEMP, LIS, Undecided; bar chart frequency axis 0 to 10]

Undergrad Majors
                 UG major |      Freq.     Percent        Cum.
--------------------------+-----------------------------------
         American Studies |          1        4.76        4.76
                  Cog Sci |          1        4.76        9.52
                 Comp Sci |          3       14.29       23.81
                Economics |          3       14.29       38.10
                  English |          5       23.81       61.90
Environmental Engineering |          1        4.76       66.67
           Graphic Design |          1        4.76       71.43
                     Math |          2        9.52       80.95
   Mechanical Engineering |          1        4.76       85.71
                Nutrition |          1        4.76       90.48
      Sci and Tech Policy |          1        4.76       95.24
       Telecommunications |          1        4.76      100.00
--------------------------+-----------------------------------
                    Total |         21      100.00


Favorite Colors
      color |      Freq.     Percent        Cum.
------------+-----------------------------------
      black |          2        8.70        8.70
       blue |         12       52.17       60.87
      green |          1        4.35       65.22
     orange |          1        4.35       69.57
     purple |          1        4.35       73.91
        red |          5       21.74       95.65
      white |          1        4.35      100.00
------------+-----------------------------------
      Total |         23      100.00

Calculus Knowledge
  integrals |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          3       13.04       13.04
          2 |          1        4.35       17.39
          3 |         11       47.83       65.22
          4 |          6       26.09       91.30
          5 |          2        8.70      100.00
------------+-----------------------------------
      Total |         23      100.00


Presenting
Numerical Data


Student Age (Reported) Data
Stem-and-leaf plot for age
2* | 22233444555777899
3* | 01257
4* |
5* |
6* |
7* |
Histogram
[Figure: histogram of student age; x-axis: age (20 to 70), y-axis: Frequency]

Starting Salaries (in $K)
3* | 8
4* | 000025
5* | 0000
6* | 0000005
7* | 5
8* | 0
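
As a rough sketch of how a stem-and-leaf display is built, the code below rebuilds the starting-salary display from the individual values it encodes (stem = tens digit, leaf = units digit). The helper name `stem_and_leaf` is hypothetical, and the approach assumes two-digit, non-negative integers.

```python
from collections import defaultdict

# Starting salaries in $K, read off the stem-and-leaf display.
salaries = [38, 40, 40, 40, 40, 42, 45, 50, 50, 50, 50,
            60, 60, 60, 60, 60, 60, 65, 75, 80]

def stem_and_leaf(values):
    """Map each tens digit (stem) to the sorted string of units digits (leaves)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(str(v % 10))
    lo, hi = min(leaves), max(leaves)
    # Include empty stems so gaps in the data stay visible.
    return {stem: "".join(leaves.get(stem, [])) for stem in range(lo, hi + 1)}

plot = stem_and_leaf(salaries)
for stem, row in plot.items():
    print(f"{stem}* | {row}")
```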

Numerical Data
Properties


Thinking Challenge
Salaries shown: $400,000; $70,000; $50,000; $30,000; $20,000
... employees cite low pay: most workers earn only $20,000.
... President claims average pay is $70,000!


Standard Notation
Measure        Sample        Population
Mean           $\bar{X}$     $\mu$
Stand. Dev.    $S$           $\sigma$
Variance       $S^2$         $\sigma^2$
Size           $n$           $N$


Numerical Data
Properties
Central Tendency (Location)
Variation (Dispersion)
Shape

Numerical Data Properties & Measures
Central Tendency
   Mean
   Median
   Mode
Variation
   Range
   Interquartile Range
   Variance
   Standard Deviation
Shape
   Skew


Central Tendency


What's wrong with this?
Measurements: 1 4 2 9 8
"Middle measurement is 2, so that's the median"
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1 + 4 + 2 + 9 + 8}{5} = 24 / 5 = 4.8$

Ages
Mean = 29
Median = 27
2* | 22233444555777899
3* | 01257
4* |
5* |
6* |
7* |

Summary of Central Tendency Measures
Measure    Equation                  Description
Mean       $\sum X_i / n$            Balance Point
Median     $(n+1)/2$ Position        Middle Value When Ordered
Mode       none                      Most Frequent
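
The three measures can be checked against the five measurements used earlier (1 4 2 9 8). This is a small sketch using Python's standard `statistics` module; `multimode` needs Python 3.8 or later. Treating "every value ties for most frequent" as "no mode" is an assumption made here for illustration.

```python
import statistics

measurements = [1, 4, 2, 9, 8]  # the five measurements from the earlier slide

mean = statistics.mean(measurements)         # sum / n = 24 / 5
median = statistics.median(measurements)     # sorts first: 1 2 4 8 9
modes = statistics.multimode(measurements)   # every most-frequent value

# If every distinct value ties for most frequent, report no mode.
has_mode = len(modes) < len(set(measurements))

print(mean, median, modes if has_mode else "none")
```

Because the median is taken from the ordered values, it is 4 here, not the 2 that sits in the middle of the unordered list.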

Shape


Shape
1. Describes How Data Are Distributed
2. Measures of Shape
   Skew = Symmetry
Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean

Variation


Quartiles
1. Measure of Noncentral Tendency
2. Split Ordered Data into 4 Quarters
   25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
3. Position of i-th Quartile
   Positioning Point of $Q_i = \frac{i (n + 1)}{4}$
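
The positioning-point formula can be sketched in code as follows. The slides do not say what to do when the position falls between two ranks, so the linear interpolation used here is an assumption. Other conventions, such as averaging the two neighbors, give slightly different quartiles. Both function names are hypothetical.

```python
def quartile_position(i, n):
    """Positioning point of the i-th quartile among n ordered values (1-based)."""
    return i * (n + 1) / 4

def quartile(values, i):
    """i-th quartile, interpolating linearly when the position is fractional."""
    ordered = sorted(values)
    n = len(ordered)
    pos = quartile_position(i, n)
    lower = int(pos)      # 1-based rank at or below the position
    frac = pos - lower    # how far the position lies toward the next rank
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

measurements = [1, 4, 2, 9, 8]   # ordered: 1 2 4 8 9
positions = [quartile_position(i, len(measurements)) for i in (1, 2, 3)]
quartiles = [quartile(measurements, i) for i in (1, 2, 3)]
print(positions, quartiles)
```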


Ages
Range
Quartiles
2* | 22233444555777899
3* | 01257
4* |
5* |
6* |
7* |

Box Plots - Age and Salary
Salary [Figure: box plot, axis 40,000 to 80,000]
   Quartiles: 41K, 50K, 60K
   Inner fences: ??
   Outer fences: ??
Age [Figure: box plot, axis 20 to 80]
   Quartiles: 24, 27, 30
   Inner fences: (15, 39)
   Outer fences: (6, 48)
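
A short sketch of the fence calculation, assuming the common box-plot convention of 1.5 and 3 interquartile ranges beyond the quartiles. That convention reproduces the age fences shown above; the function name `fences` is hypothetical.

```python
def fences(q1, q3):
    """Return (inner, outer) fence pairs from the first and third quartiles."""
    iqr = q3 - q1                                # interquartile range
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)     # 1.5 IQRs beyond the box
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)         # 3 IQRs beyond the box
    return inner, outer

# Age quartiles from the slide: Q1 = 24, Q3 = 30.
inner, outer = fences(24, 30)
print(inner, outer)
```

The same function can be applied to the salary quartiles to fill in the fences left as "??".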

Variance &
Standard Deviation
1. Measures of Dispersion
2. Most Common Measures
3. Consider How Data Are Distributed
4. Show Variation About Mean ($\bar{X}$ or $\mu$)
[Figure: dot plot on an axis from 4 to 12, with $\bar{X} = 8.3$ marked]

Sample Variance Formula
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$
n - 1 in denominator! (Use N if Population Variance)

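Equivalent Formula
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
$= \frac{\sum_{i=1}^{n} (x_i^2 - 2 x_i \bar{x} + \bar{x}^2)}{n - 1}$
$= \frac{\sum x_i^2 - 2 \bar{x} \sum x_i + n \bar{x}^2}{n - 1}$
$= \frac{\sum x_i^2 - 2 \bar{x} \cdot n \bar{x} + n \bar{x}^2}{n - 1}$
$= \frac{\sum x_i^2 - n \bar{x}^2}{n - 1}$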

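Another Equivalent Formula
$s^2 = \frac{\sum x_i^2 - n \bar{x}^2}{n - 1}$
$= \frac{\sum x_i^2 - n \left( \frac{\sum x_i}{n} \right)^2}{n - 1}$
$= \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$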
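
As a quick numerical check, the sketch below evaluates the definition and both equivalent forms on the five measurements from the central tendency slide (1 4 2 9 8). The variable names are illustrative only.

```python
import math

x = [1, 4, 2, 9, 8]  # measurements reused from the central tendency slide
n = len(x)
mean = sum(x) / n
sum_sq = sum(xi ** 2 for xi in x)

# Definition: squared deviations about the sample mean, divided by n - 1.
definition = sum((xi - mean) ** 2 for xi in x) / (n - 1)
# Equivalent formula: (sum of squares - n * mean^2) / (n - 1).
shortcut_mean = (sum_sq - n * mean ** 2) / (n - 1)
# Another equivalent formula: (sum of squares - (sum)^2 / n) / (n - 1).
shortcut_sums = (sum_sq - sum(x) ** 2 / n) / (n - 1)

print(definition, shortcut_mean, shortcut_sums)
```

All three agree up to floating-point rounding. With floating-point data whose mean is large relative to its spread, the shortcut forms can lose precision, so the definition is usually the safer one to code.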

Empirical Rule
If x has a symmetric, mound-shaped
distribution
$\Pr(|x_i - \mu| > \sigma) \approx 32\%$
$\Pr(|x_i - \mu| > 2\sigma) \approx 5\%$
$\Pr(|x_i - \mu| > 3\sigma) \approx 0.3\%$
Justification: Known properties of the normal
distribution, to be studied later in the course
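
For readers who want to see where these percentages come from, here is a minimal sketch that computes the two-sided tail probabilities of a normal distribution with the standard library's `math.erfc`. It assumes the data are exactly normal, which real data only approximate. The function name `two_sided_tail` is hypothetical.

```python
import math

def two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2))

tails = {k: two_sided_tail(k) for k in (1, 2, 3)}
for k, p in tails.items():
    print(f"more than {k} standard deviation(s) from the mean: {100 * p:.1f}%")
```

The computed values are close to the rounded 32%, 5%, and 0.3% quoted in the rule.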

Preview of
Statistical Inference
You observe one data point
Make a hypothesis about the mean and standard deviation of the distribution from which it was drawn
Empirical Rule tells you how (un)likely the data point is
If very unlikely, you are suspicious of the hypothesis about mean and standard deviation, and reject it


Summary of Variation Measures
Measure                            Equation                                          Description
Range                              $X_{largest} - X_{smallest}$                      Total Spread
Interquartile Range                $Q_3 - Q_1$                                       Spread of Middle 50%
Standard Deviation (Sample)        $\sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$     Dispersion about Sample Mean
Standard Deviation (Population)    $\sqrt{\frac{\sum (X_i - \mu)^2}{N}}$             Dispersion about Population Mean
Variance (Sample)                  $\frac{\sum (X_i - \bar{X})^2}{n - 1}$            Squared Dispersion about Sample Mean

Z-scores
Number of standard deviations from the mean
$z_i = \frac{x_i - \mu}{\sigma}$
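
A tiny sketch of the z-score calculation. The mean of 50, standard deviation of 10, and observation of 75 are hypothetical values chosen for illustration, not data from these slides.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical values: an observation of 75 from a distribution
# with mean 50 and standard deviation 10.
z = z_score(75, mean=50, sd=10)
print(z)
```

Read together with the Empirical Rule, a point 2.5 standard deviations from the mean is an unusual observation under a symmetric, mound-shaped distribution.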


Conclusion
1. Described Qualitative Data Graphically
2. Described Numerical Data Graphically
3. Created & Interpreted Graphical Displays
4. Explained Numerical Data Properties
5. Described Summary Measures
6. Analyzed Numerical Data Using Summary Measures
