Академический Документы
Профессиональный Документы
Культура Документы
in Statistics
Top Ten #1
Descriptive Statistics
Mean
Median
Mode
Mean
Median
Mode
Ex: 1,1,2,3,5,8
Mode = female
Mode = 1
Relationship
Relationship contd
Dispersion Measures of
Variability
Range
Variance
Standard deviation
Range
Empirical Rule
Standard Deviation =
Square Root of Variance
(x x)
n 1
xx
( x x )2
6-8=-2
(-2)(-2)= 4
6-8=-2
7-8=-1
(-1)(-1)= 1
8-8=0
13
13-8=5
(5)(5)= 25
Sum=40
Sum=0
Sum = 34
Mean=40/5=8
Standard Deviation
Total variation = 34
Sample variance = 34/4 = 8.5
Sample standard deviation =
square root of 8.5 = 2.9
5.30
n 1
5 1
5 1
2
Standard deviation:
s2
5.30 2.30
Graphical Tools
Top Ten #2
Hypothesis Testing
Population mean=
Population proportion=
A statement about the value of a population
parameter
Never include sample statistic (such as, xbar) in hypothesis
One-Tailed Tests
A test is one-tailed when the alternate
hypothesis, H1 or HA, states a direction, such as:
H1: The mean yearly salaries earned by full-time
Two-Tail Alternative
Two-Tailed Tests
A test is two-tailed when no direction is
specified in the alternate hypothesis
H1: The mean amount of time spent for the
H0 : = 80
HA: > 80
If test statistic =2.2 and critical value = 1.96,
reject H0, and conclude that the population
mean is likely > 80
If test statistic = 1.6 and critical value = 1.96,
do not reject H0, and reserve judgment about
H0
H0 false
Reject H0
Alpha = =
P(type I error)
1 (Correct
Decision)
Do not reject H0
1 (Correct
Decision)
Beta = =
P(type II error)
H0 : = 80
HA: > 80
If p-value = 0.01 and alpha = 0.05, reject H0,
and conclude that the population mean is
likely > 80
If p-value = 0.07 and alpha = 0.05, do not
reject H0, and reserve judgment about H0
Test Statistic
X
z
/ n
Example
The processors of Best Mayo indicate on the
label that the bottle contains 16 ounces of
mayo. The standard deviation of the process
is 0.5 ounces. A sample of 36 bottles from last
hours production showed a mean weight of
16.12 ounces per bottle. At the .05
significance level, can we conclude that the
mean amount per bottle is greater than 16
ounces?
Example contd
1. State the null and the alternative hypotheses:
H0: = 16,
H1: > 16
2. Select the level of significance. In this case,
we selected the .05 significance level.
3. Identify the test statistic. Because we know the
population standard deviation, the test statistic is z.
4. State the decision rule.
Reject H0 if |z|> 1.645 (= z0.05)
Example contd
5. Compute the value of the test statistic
X 16.12 16.00
z
1.44
n
0.5 36
6. Conclusion: Do not reject the null hypothesis.
We cannot conclude the mean is greater than 16
ounces.
Top Ten #3
Confidence Interval
A confidence interval is a range of values within
which the population parameter is expected
to occur.
Normal population
Sample size > 30
x
n
Normal Table
Example
490
2
1.96
10 0.56
49
49
Another Example
One of SOM professors wants to
estimate the mean number of hours
worked per week by students. A sample
of 49 students showed a mean of 24
hours. It is assumed that the population
standard deviation is 4 hours. What is
the population mean?
4
X 1.96
24.00 1.96
n
49
24.00 1.12
x
n
t n1
s
n
Confidence Interval:
Proportion
Use if success or failure
(ex: defective or not-defective,
satisfactory or unsatisfactory)
Normal approximation to binomial ok if
(n)() > 5 and (n)(1-) > 5, where
n = sample size
= population proportion
NOTE: NEVER use the t table if proportion!!
Confidence Interval:
Proportion
p(1 p)
pz
n
Ex: 8 defectives out of 100, so p = .08 and
n = 100, 95% confidence
(0.08)(.92)
.08 1.96
.08 .05
100
Confidence Interval:
Proportion
A sample of 500 people who own their house
revealed that 175 planned to sell their homes
within five years. Develop a 98% confidence
interval for the proportion of people who plan to
sell their house within five years.
175
p
0.35
500
(.35)(. 65)
.35 2.33
.35 .0497
500
Interpretation
Width of Interval
Top Ten #4
Linear Regression
Linear Regression
y b0 b1 x
Regression equation:
=dependent variable=predicted value
y
x= independent variable
b0=y-intercept =predicted value of y if x=0
b1=slope=regression coefficient
=change in y per unit change in x
Slope vs Correlation
Example
Given Data
x
48
52
33
Totals
x
48
52
33
Sum=7
Sum=133
n=3
Intercept (b0)
b y b x
0
7
x
2.33
n
3
y 133
n
44.33
Regression Equation
y 59.5 6.5x
y forecast b0 b1 x
error y y
SSE
( y y )
S
n2
n2
48
(3) y = (4)=
59.5(2)-(3)
6.5x
46.5
1.5
2.25
52
53
-1
33
33.5
-.5
.25
(1)=x
(2)=y
( y y )2
SSE=3.5
3.5
S
3.5 1.9
3 2
Actual salary typically $1,900
away from expected salary
Coefficient of Determination
Coefficient of Determination
R2 = SSR
SST
R2 = 197 = .98
200.5
Interpretation: 98% of total variation in salary
can be explained by variation in number of
children
0 < R2 < 1
R=Correlation Coefficient
R R
Our Example
Slope = b1 = -6.5
R2 = .98
R = -.99
Caution
Extreme Values
MS Excel Output
Correlation Coefficient (-0.9912): Note
that you need to change the sign because
the sign of slope (b1) is negative (-6.5)
Coefficient of Determination
Standard Error of Estimate
Regression Coefficient
Top Ten #5
Expected Value
Expected Value
Example
Step 1: 11+80+5=96
Step 2
x
P(x)
x P(x)
17
11/96=.115
17(.115)=1.955
18
80/96=.833
18(.833)=14.994
19
5/96=.052
19(.052)=.988
E(x)= 17.937
Top Ten #6
Binomial Is Discrete
Integer values
0,1,2,n
Binomial is often skewed, but may be symmetric
Normal Distribution
t Distribution
Normal or t Distribution?
Top Ten #7
P-value
P-value
H0: = 40
HA: > 40
Sample mean = 43
P-value = P(sample mean > 43, given H0 true)
Meaning: probability of observing a sample
mean as large as 43 when the population mean
is 40
How to use it: Reject H0 if p-value <
(significance level)
Two Cases
Suppose = .05
Case 1: suppose p-value = .02, then reject H0
(unlikely H0 is true; you believe population mean
> 40)
Case 2: suppose p-value = .08, then do not
reject H0 (H0 may be true; you have reason to
believe that the population mean may be 40)
Top Ten #8
No Variation
High Variation
Uncertainty, unpredictable
High standard deviation
Ex #1: Workers in downtown L.A. have variation
between CEOs and garment workers
Ex #2: New York temperatures in spring range
from below freezing to very hot
Comparing Standard
Deviations
Temperature Example
Beach city: small standard deviation (single
temperature reading close to mean)
High Desert city: High standard deviation (hot
days, cool nights in spring)
Sampling Distribution
Example
Top Ten #9
Population
Sample
Qualitative
Categorical data:
success vs. failure
ethnicity
marital status
color
zip code
4 star hotel in tour guide
Qualitative
Quantitative
Two cases
Case 1: discrete
Case 2: continuous
Discrete
(1) integer values (0,1,2,)
(2) example: binomial
(3) finite number of possible values
(4) counting
(5) number of brothers
(6) number of cars arriving at gas station
Continuous
Graphical Tools
Hypothesis Testing
Confidence Intervals
Quantitative: Mean
Qualitative: Proportion