
STATISTICS 13

Lecture 25
Jun 2, 2010
(Last one, yeah!!!)
Difference between Two Means: Two
Independent Samples
• Goal: compare the means of two populations :
– whether they are different (two-sided test); or
– whether one is larger than another (one-sided test)
• Data for this comparison
– a random sample of size n1 from population A with
mean μ1 and variance σ12
– a random sample of size n2 from population B with
mean μ2 and variance σ22
• The hypotheses of interest
H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 ≠ 0
(one-sided: Ha: μ1 – μ2 >0 or Ha: μ1 – μ2 <0)
(we will not do one sided)
Testing the Difference
Between Two Means
• Hypotheses:
H0: µ1 − µ2 = 0 versus
Ha: one of the three alternatives
• Test statistic:
z = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
with critical values and/or p-values
based on the standard normal distribution.
Note: if σ1 (σ2) is known, the true value should be used in place of s1 (s2).
Example: Dairy Intake
• Is there any difference in average daily intakes of dairy
products by men and women? Use α = 0.05.

Avg Daily Intakes   Men   Women
Sample size          50    50
Sample mean         756   762
Sample Std Dev       35    30
Example: Dairy Intake
(Cont.)
• p-value approach: z = (756 − 762)/√(35²/50 + 30²/50) ≈ −0.92;
two-sided p-value = 2·P(Z > 0.92) ≈ 0.36 > 0.05, so do not reject H0.
• Critical value approach: at α = 0.05, reject H0 if |z| > 1.96;
since |−0.92| < 1.96, do not reject H0. There is insufficient evidence
of a difference in average daily dairy intakes between men and women.
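The calculation in this example can be reproduced numerically; a minimal sketch using only the standard library, with the numbers from the dairy-intake table:

```python
import math
from statistics import NormalDist

n1, n2 = 50, 50          # sample sizes
x1, x2 = 756, 762        # sample means
s1, s2 = 35, 30          # sample standard deviations

se = math.sqrt(s1**2 / n1 + s2**2 / n2)        # SE of the difference
z = (x1 - x2) / se                             # observed test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
```

Since the p-value exceeds α = 0.05 (equivalently, |z| < 1.96), the data do not support a difference between the two means.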
Should we live a moral life?
• Statistics can help you answer this!
• If there is God and you live a moral life,
your gain is + infinity (go to heaven)
• If there is God but you don’t live a moral
life, your gain is – infinity (go to hell)
• If there is no God and you live a moral
life, your gain is – Constant1
• If there is no God and you don’t live a
moral life, your gain is + Constant2
Should we live a moral life?
• Let P(there is God)=p
• Let X be the gain if you live a moral life
• Let Y be the gain if you don’t
• E(X) = ∑xP(X=x)
• = + infinity p + (-Constant1) (1-p)
• = + infinity
• E(Y) = - infinity p + (+Constant2) (1-p)
• = - infinity
• Conclusion?
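The expected-value argument above can be checked directly with Python's float infinity; the values of p, C1, and C2 below are illustrative placeholders (any p > 0 gives the same conclusion):

```python
import math

p = 0.001            # illustrative P(there is God); any p > 0 works
C1, C2 = 100, 100    # illustrative finite cost/gain constants

E_X = math.inf * p + (-C1) * (1 - p)   # expected gain from a moral life
E_Y = -math.inf * p + C2 * (1 - p)     # expected gain otherwise
```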
Part I: Summarize Data
 Type of variable: Qualitative vs. Quantitative
(discrete vs. continuous)
 Summarize data in a statistical table (frequency,
relative frequency and percentage)
 Graphical description of quantitative data
– Bar and Pie charts (for categorized quantitative
variables)
– Dotplot
– (Relative) frequency histogram: number of classes and
class width; interpretation
Part I: Summarize Data
(Cont.)
 Shape of the data
-Bell shaped: symmetric; Gaussian distributions
-Skewed: right or left
-Bimodal: indicating two distinct subpopulations
 Measures of center
– Mean: properties (linearity)
– Median: more robust to extreme values (e.g.,
for skewed distributions)
– Mode
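The robustness point about the median can be sketched with the standard library (the data values are illustrative):

```python
import statistics

data = [1, 2, 3, 4, 5]             # illustrative data
mean0 = statistics.mean(data)      # 3
median0 = statistics.median(data)  # 3

# Replace the largest value by an extreme one: the mean moves a lot,
# the median does not move at all.
data_out = [1, 2, 3, 4, 500]
mean1 = statistics.mean(data_out)
median1 = statistics.median(data_out)
```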
Part I: Summarize Data
(Cont.)
 Measures of variability
– Range=max-min
– IQR = Q3 − Q1; more robust than the range
– Variance: properties (nonnegative) and Standard
deviation: properties (nonnegative)
 Empirical Rule: 68%, 95%, 99.7%; coming from Gaussian
distributions; only applies for bell-shaped distributions
 Tchebysheff’s theorem: applies to any distribution
 z-scores: measure of relative standing
 Percentiles: e.g., the median is the 50th percentile
 Quartiles (Q1, Q3) and IQR:
IQR = Q3 – Q1
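These variability measures can be sketched with the standard library (statistics.quantiles requires Python 3.8+; the data are illustrative):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]               # illustrative sample
rng = max(data) - min(data)                   # range = max - min
var = statistics.variance(data)               # sample variance (n - 1 divisor)
sd = statistics.stdev(data)                   # sample standard deviation
z9 = (9 - statistics.mean(data)) / sd         # z-score of the observation 9
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles Q1, Q2, Q3
iqr = q3 - q1                                 # interquartile range
```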
Part I: Summarize Data
(Cont.)
 Boxplots: five number summary (min, Q1, median, Q3, max);
fences (lower = Q1 − 1.5·IQR; upper = Q3 + 1.5·IQR); outliers: points
outside the fences
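The fence rule for flagging outliers can be sketched as follows (illustrative data; statistics.quantiles requires Python 3.8+):

```python
import statistics

data = [3, 5, 7, 8, 9, 10, 11, 12, 13, 40]   # illustrative sample
q1, med, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
low = q1 - 1.5 * iqr      # lower fence
up = q3 + 1.5 * iqr       # upper fence
outliers = [x for x in data if x < low or x > up]
```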
 Graphs of bivariate qualitative data
-side-by-side bar charts and pie charts
 Graphs of bivariate quantitative data
- Scatterplot: form (linear; curvy; random),
direction (positive; negative) & strength (strong;
weak)
-Interpretation of scatterplot
Part I: Summarize Data
(Cont.)
 Numerical measure of linear association between two variables :
correlation coefficient r
-- interpretation: r > 0, r < 0; r ≈ 0, −1, 1
-- properties: |r| ≤ 1
-- when to use (when there is a linear association; can be
misleading in the presence of outliers and/or curvilinear patterns)
 Linear regression
--fitting: regression line of y on x: y=a+bx; formula for a (intercept)
& b (slope)
--prediction and explanation: predict the average value for y
corresponding to each value of x
--regression effect: With each increase of one SD in x there is an
increase of only r SDs in y, on the average
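The formulas for r, the slope b, and the intercept a can be sketched directly (the data are illustrative):

```python
import math

x = [1, 2, 3, 4, 5]                  # illustrative data
y = [2, 4, 5, 4, 5]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
b = sxy / sxx                    # slope
a = ybar - b * xbar              # intercept; fitted line is y = a + b x
```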
Part II: Basic Probability
 Probability: Basic Concepts
– Experiment
– Sample space S: the set of all possible
outcomes
– Simple events (outcomes) : all simple events are
mutually exclusive
– Events: subsets of S; an event occurs if and
only if the outcome of the experiment is in
that event
 Sample space with equally likely simple events
Part II: Basic Probability
(Cont.)
 Counting:
-Multiplicative rule
-Permutations: The number of ways you can choose r objects
in order from n distinct objects: n!/(n−r)!
-Combinations: The number of ways you can choose r objects
without regarding order from n distinct objects : n!/(r!*(n-
r)!)
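The counting rules (permutations n!/(n−r)!, combinations n!/(r!(n−r)!), and the multiplicative rule) can be checked with math.perm and math.comb (Python 3.8+):

```python
import math

# Permutations: ordered selections of r from n distinct objects = n!/(n-r)!
n_perm = math.perm(5, 2)    # 5!/3!
# Combinations: unordered selections = n!/(r!(n-r)!)
n_comb = math.comb(5, 2)    # 5!/(2!3!)
# Multiplicative rule: 3 choices for stage 1 and 4 for stage 2
n_mult = 3 * 4
```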
 Event relations: interpretation & diagram
-Union: A ∪ B (either A or B )
-Intersection: A ∩ B (both A and B)
-Complement: AC (not A)
-mutually exclusive: A ∩ B=Ø (never occur simultaneously)
Part II: Basic Probability
(Cont.)
 Calculation of probability
--For union: additive rule: P(A ∪ B )=P(A)+P(B)-P(A ∩ B )
--For complement: one minus rule: P(AC)=1-P(A)
--For intersection: multiplication rule:
P(A∩ B)=P(A|B)P(B)=P(B|A)P(A)
 Conditional probability: P(A|B)= P(A ∩ B )/P(B)
 Independence
--three equivalent definitions: P(A ∩ B)=P(A)P(B);
P(A|B)=P(A); the occurrence of A (or B) won’t affect the
probability of B (or A)
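These rules can be verified on a small equally-likely sample space, e.g. one roll of a fair die (the events A and B are illustrative):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}    # fair die: equally likely simple events
A = {2, 4, 6}             # "even"
B = {1, 2, 3, 4}          # "at most 4"

def P(event):
    return Fraction(len(event & S), len(S))

# Additive rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
union_ok = P(A | B) == P(A) + P(B) - P(A & B)
# Complement rule: P(A^C) = 1 - P(A)
comp_ok = P(S - A) == 1 - P(A)
# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
p_A_given_B = P(A & B) / P(B)
# Independence: here P(A|B) = P(A), so A and B happen to be independent
indep_ok = p_A_given_B == P(A) and P(A & B) == P(A) * P(B)
```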
Part II: Basic Probability
(Cont.)
 Random variables
--definition: mapping from S to R
--probability distribution for discrete random variables:
p(a) = P(X = a); each p(a) is in [0,1] and the values sum to 1
--Mean and variance of discrete random variables
 Binomial random variables
--definition: number of successes in n independent, identical,
two-outcome trials
--probability distribution
--mean and variance: X ~ Bin(n,p), E(X)=np, Var(X)=np(1-p)
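A minimal check of the binomial pmf, mean, and variance (the parameters n and p are illustrative):

```python
import math

n, p = 10, 0.3   # illustrative parameters

def binom_pmf(k):
    # P(X = k) = C(n, k) p^k (1-p)^(n-k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

total = sum(binom_pmf(k) for k in range(n + 1))          # pmf sums to 1
mean = sum(k * binom_pmf(k) for k in range(n + 1))       # should be np
var = sum((k - mean)**2 * binom_pmf(k) for k in range(n + 1))  # np(1-p)
```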
Part II: Basic Probability
(Cont.)
 Poisson random variables
-- definition: model the number of rare events
--probability distribution
--mean and variance: X ~ Poisson(µ), E(X)=Var(X)=µ
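A minimal check of the Poisson pmf and the fact that E(X) = Var(X) = µ (µ is illustrative; the infinite sums are truncated at a cutoff where the remaining mass is negligible):

```python
import math

mu = 2.0   # illustrative mean

def poisson_pmf(k):
    # P(X = k) = e^(-mu) mu^k / k!
    return math.exp(-mu) * mu**k / math.factorial(k)

ks = range(50)   # truncation; Poisson(2) mass beyond 50 is negligible
total = sum(poisson_pmf(k) for k in ks)
mean = sum(k * poisson_pmf(k) for k in ks)
var = sum((k - mean)**2 * poisson_pmf(k) for k in ks)
```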
 Continuous random variables
--Probability density function/density curve
--Area under the density curve: P(a<X<b)
--Cumulative distribution function: F(a) = P(X ≤ a); takes values
in [0,1]; non-decreasing.
Part II: Basic Probability
(Cont.)
 Normal random variables
--Probability density function
--mean and standard deviation: X ~ N(µ,σ), E(X)=µ, Var(X)=σ²
--Standardization: z=(x-µ)/σ ~ N(0,1)
--Standard normal distribution: N(0,1)
--Standard normal table (table 3)
--calculate normal probabilities: P(X<a), P(X>a), P(a<X<b)
--find percentiles of a normal distribution: e.g., 70th
percentile of X
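These normal calculations can be sketched with statistics.NormalDist (Python 3.8+); µ = 100 and σ = 15 are illustrative values:

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=15)   # illustrative normal population
Z = NormalDist()                   # standard normal N(0, 1)

# Standardization: P(X < 115) = P(Z < (115 - 100)/15) = P(Z < 1)
p_below = X.cdf(115)
p_between = X.cdf(130) - X.cdf(85)   # P(85 < X < 130)
pct70 = X.inv_cdf(0.70)              # 70th percentile of X
```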
Part III: Estimation and Testing
 Basic Concept
--Population and sample: whole vs. part
--Parameter vs. statistic
--Statistical inferences: estimation & testing
 Sampling:
--Simple Random Sampling: all possible samples of
the same size are equally likely
--Sampling Distribution of a Statistic
Part III: Estimation and
Testing (Cont.)
 Sampling Distribution of Sample Mean & sample
proportion
--Mean and standard error
--CLT: the sum/mean of independent, identically distributed random
variables has an approximately normal distribution
 Point Estimation
--unbiasedness: mean of estimator=parameter
--error in estimation: SE—standard deviation of
the estimator
--margin of error: 1.96 × SE
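A minimal sketch of the standard error and margin of error for a sample mean (s and n are illustrative numbers):

```python
import math

s, n = 12.0, 64          # illustrative sample SD and sample size
se = s / math.sqrt(n)    # standard error of the sample mean
moe = 1.96 * se          # 95% margin of error
```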
Part III: Estimation and
Testing (Cont.)
 point estimation
--mean and proportion: sample mean and sample
proportion
--difference between two means and two
proportions
 Interval estimation
--confidence coefficient 1-α
--construction and interpretation
--confidence intervals for mean and proportion
and differences
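Construction of large-sample 95% confidence intervals can be sketched as follows (the numbers are illustrative):

```python
import math

# Large-sample 95% CI for a mean: xbar ± 1.96 * s / sqrt(n)
xbar, s, n = 756, 35, 50        # illustrative numbers
se = s / math.sqrt(n)
ci_mean = (xbar - 1.96 * se, xbar + 1.96 * se)

# Large-sample 95% CI for a proportion: phat ± 1.96 * sqrt(phat(1 - phat)/m)
phat, m = 0.40, 100             # illustrative numbers
se_p = math.sqrt(phat * (1 - phat) / m)
ci_prop = (phat - 1.96 * se_p, phat + 1.96 * se_p)
```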
Part III: Estimation and
Testing (Cont.)
 Hypothesis testing
--null and alternatives
--test statistics, p-values and rejection region
--two types of errors
--significance level α—maximum allowable type one
error rate of a testing procedure
--making conclusions: reject null or not
Part III: Estimation and
Testing (Cont.)
 Test statistic z: standardize the point estimator
under the null and then use standard normal table
-- critical value approach vs. p-value approach
--interpretation and reports
 For
--population mean
–-difference between two means
 (we did not cover:
-- one-side tests
--population proportion
--difference between two proportions)