FALL 2018
[Figure: deductive reasoning (logic based on known properties) runs from population to sample; inductive reasoning (logic based on observed instances) runs from sample to population.]
                                    POPULATION       SAMPLE
Size                                Large            Small
Size notation                       N                n
Easy to collect data?               No               Yes
Term used to describe its nature    A “parameter”    A “statistic”
                                    e.g., μ, σ       e.g., x̄, s
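                            POPULATION                                SAMPLE
Mean (notation)             $\mu$                                     $\bar{x}$
Std Deviation (notation)    $\sigma$                                  $s$
Mean (formula)              $\mu = \frac{\sum x}{N}$                  $\bar{x} = \frac{\sum x}{n}$
Variance (formula)          $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$   $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$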
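As a minimal sketch of the difference between the population and sample variance formulas above, the snippet below applies both divisors (N and n - 1) to the same values and cross-checks them against Python's standard `statistics` module. The data values and variable names are made up for illustration and do not come from the course.

```python
import statistics

# Hypothetical measurements (illustrative only, not course data).
data = [4, 8, 6, 5, 3, 7]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean, shared by both formulas.
sum_sq = sum((x - mean) ** 2 for x in data)

pop_var = sum_sq / n          # population variance: divide by N
samp_var = sum_sq / (n - 1)   # sample variance: divide by n - 1

# The standard library implements the same two definitions.
lib_pop_var = statistics.pvariance(data)
lib_samp_var = statistics.variance(data)

print(pop_var, samp_var)
```

For the same data the sample variance is always the larger of the two, because it divides the same sum of squares by a smaller number.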
Involves:
- taking a small sample from a larger set (Sampling)
- analyzing data from the small sample (Data analysis)
- testing the hypotheses to ascertain if true (Hypothesis Testing)
- making conclusions about the larger set (Statistical Inference)
- presenting your findings to an audience (Information Delivery)
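The sampling, data analysis and inference steps in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the course's procedure: the population is simulated, and the sizes (1000 and 30), the seed and all variable names are hypothetical. Hypothesis testing is left out to keep the sketch short.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of N = 1000 values (simulated, illustrative only).
population = [random.gauss(50, 10) for _ in range(1000)]

# Sampling: draw n = 30 units without replacement.
sample = random.sample(population, 30)

# Data analysis: summarise the sample with its mean and standard deviation.
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

# Statistical inference (point estimate): x_bar is used as an estimate of
# the population mean mu, which is normally unknown in practice.
mu = statistics.fmean(population)

print(f"sample mean = {x_bar:.2f}, population mean = {mu:.2f}")
```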
[Figure: population and sample. Population parameters: μ, σ. Sample statistics: x̄, s.]
STAT - 835: Probability and Statistics 28
Back to “Important Questions, #1”
Disadvantage:
Relatively more preparation time is needed to calculate the
proportion of each group in the population and, from that, to
determine each group's proportion in the sample
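Assuming the disadvantage above refers to stratified sampling with proportional allocation (the slide title was not recovered), the preparation step it describes can be sketched as below. The strata names, their sizes and the total sample size are hypothetical.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population already divided into groups (strata).
strata = {
    "A": list(range(0, 500)),
    "B": list(range(500, 800)),
    "C": list(range(800, 1000)),
}

n = 50                                              # total sample size wanted
N = sum(len(units) for units in strata.values())    # population size

sample = {}
for name, units in strata.items():
    # Each group's share of the sample matches its share of the population.
    # Rounding can make the sizes miss n by a unit or two for other inputs.
    k = round(n * len(units) / N)
    sample[name] = random.sample(units, k)

sizes = {name: len(units) for name, units in sample.items()}
print(sizes)  # {'A': 25, 'B': 15, 'C': 10}
```

The extra preparation is visible in the code: the group sizes must be known before any unit can be drawn.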
Combinations of the 3 major methods of random sampling.
DESCRIPTIVE                                       INFERENTIAL
Graphical                Non-graphical
Scaled Figures           Central Tendency         Point Estimation
Dot Plots                Dispersion/Variance      Hypothesis Testing
Scatter Plots            Range                    Confidence Interval
Box Plots                Shape                    Statistical Regression
Stem-and-leaf Plots
Bar Charts/Histograms
Descriptive Statistics
◦ Statistical procedures used to summarise,
organise, and simplify data. This should be
done in a way that reflects the overall
findings
Raw data is made more manageable
Raw data is presented in a logical form
Patterns can be seen from organised data
Frequency tables
Graphical techniques
Measures of Central Tendency
Measures of Spread (variability)
Mean:
Sum of all measurements divided by the number
of measurements.
Median:
A number such that at most half of the
measurements are below it and at most half of the
measurements are above it.
Mode:
The most frequent measurement in the data.
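The three definitions above can be tried out with Python's standard `statistics` module. This is a small sketch on made-up values, not the data set used in the slides.

```python
import statistics

# Hypothetical measurements (illustrative only).
y = [3, 5, 5, 6, 8, 9]

mean = statistics.mean(y)      # sum of measurements / number of measurements
median = statistics.median(y)  # midpoint of the two middle values, 5 and 6
mode = statistics.mode(y)      # most frequent measurement

print(mean, median, mode)
```

Here at most half of the values lie below the median and at most half lie above it, as the definition requires.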
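$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

[Data table not recovered; surviving row: 11-Jan, 52; Sum = 442]

$$\bar{y} = \frac{\sum y_i}{n} = \frac{442}{10} = 44.2$$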
Notice that every single observation enters into the computation
of the mean.
Median
The median represents the middle of the
ordered sample data
When the sample size is odd, the median
is the middle value
When the sample size is even, the median
is the midpoint/mean of the two middle
values
$$\text{Median} = \frac{42 + 43}{2} = 42.5$$
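The odd and even cases of the median can be written out by hand, as in the sketch below. The function name and the test values are illustrative; in practice `statistics.median` from the standard library does the same job.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)   # the median is defined on the ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd sample size: the single middle value.
        return ordered[mid]
    # Even sample size: the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 1, 4])        # ordered: 1, 4, 7
even_case = median([7, 1, 4, 10])   # ordered: 1, 4, 7, 10
print(odd_case, even_case)
```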
Mode
The mode is the value that occurs most
frequently
It is the least useful (and least used) of the
three measures of central tendency
mode = 32
Another Example of Mode
range = 60 – 32 = 28
Variance and standard deviation
The variance $s^2$ is the sum of the squared
deviations from the mean divided by the number
of cases minus 1
$$s^2 = \frac{\sum_i (y_i - \bar{y})^2}{n - 1}$$
The standard deviation $s$ is the square root of
the variance
$$s = \sqrt{\frac{\sum_i (y_i - \bar{y})^2}{n - 1}}$$
It is a measure of “spread”
Steps:
◦ Compute each deviation
◦ Square each deviation
◦ Sum all the squares
◦ Divide by the data size (sample size) minus
one: n-1
$$\sum_i (y_i - \bar{y})^2 = 931.60$$
iy y 2