Quantitative Methods 3 (EBS2001), 2018-2019, lecture 1

Today: Preparing Cases 1 & 2
- Sharpe 3rd ed., Chapters 4, 9-14, 20 (relevant parts)
- Some additional bells and whistles

1st half QM1: descriptive statistics
2nd half QM1, QM2: inferential statistics: "from a sample, what can we conclude about the population?"

A simple example of inferential statistics
Typical lingo: null and alternative hypothesis, confidence & significance level, P-value, Type I & II error, etc.

The intuition behind hypothesis testing: flipping a coin

⇒ throw the coin e.g. 20 times, and conclude!
⇒ the resulting number of heads depends on chance
- if the coin is fair, we expect around 10 heads
- if the coin is fair, we may easily obtain 11 or more heads
- if the coin is fair, we may even obtain 13 or more heads
- if the coin is fair, obtaining 15 or more heads seems unlikely

Basic idea: the more unlikely or surprising our result, the more doubt about the coin being fair
Step 2 Model: specify the test statistic and its sampling distribution

- the # heads h, our test statistic, is a random number
- if H0 is true, then h ~ binomial with n = 20 and p = 0.5 ⇒ expected value (mean) of h equals 10

P( h ≥ 11 ) = 0.412
P( h ≥ 13 ) = 0.132
P( h ≥ 15 ) = 0.021
Step 3 Mechanics & Conclusion: determine whether the test statistic is unlikely/surprising

a) Reject H0 if h ≥ 11: if H0 true, 41.2% risk to reject it "by accident" → too risky decision rule!
b) Reject H0 if h ≥ 13: if H0 true, 13.2% risk to reject it "by accident" → better
c) Reject H0 if h ≥ 15: if H0 true, 2.1% risk to reject it "by accident" → safe rule
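As a cross-check, the three tail probabilities above follow directly from the binomial distribution. A short sketch using scipy (the package choice is ours for illustration; the course itself works with tables and SPSS):

```python
# Tail probabilities for the coin example: under H0, h ~ Binomial(n=20, p=0.5).
# Reproduces the slide's values 0.412, 0.132 and 0.021.
from scipy.stats import binom

n, p = 20, 0.5
for k in (11, 13, 15):
    # P(h >= k) is the survival function evaluated at k-1
    tail = binom.sf(k - 1, n, p)
    print(f"P(h >= {k}) = {tail:.3f}")
```

The survival function counts strictly greater outcomes, hence the `k - 1`.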
Hypothesis testing always involves the same three steps:

1) Formulate opposing hypotheses about the population
2) Specify the test statistic and its sampling distribution
   the test statistic
   - is a random number
   - has a known sampling distribution if H0 is true
   - we observe one "draw" from this distribution
3) Determine whether the test statistic is unlikely / surprising
   option a) the critical value approach (see above)
   option b) the P-value approach (see later on), emphasized by Sharpe

The intuition behind confidence intervals: flipping a coin

Imagine: a sample of 100 flips, with 53 heads
⇒ p̂_H = 53/100 = 0.53 as our point estimate for p_H

⇒ in view of this sample outcome,
   p_H = 0.5 seems reasonable
   p_H = 0.55 seems reasonable
   p_H = 0.4 does not seem reasonable (if true, then 53 heads would be very extreme!)

Formalization: 95% Confidence Interval for p_H: [ 0.432, 0.628 ]
"on the basis of the sample outcome, we are 95% confident that the true p_H lies somewhere between 0.432 and 0.628"
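The interval [0.432, 0.628] can be reproduced with the usual normal-approximation formula p̂ ± z_{0.025} · √(p̂(1−p̂)/n); a sketch:

```python
# 95% CI for p_H from 53 heads in 100 flips, via the normal approximation.
from scipy.stats import norm
import math

n, heads = 100, 53
p_hat = heads / n                      # point estimate, 0.53
se = math.sqrt(p_hat * (1 - p_hat) / n)
z = norm.ppf(0.975)                    # ~ 1.96
lo, hi = p_hat - z * se, p_hat + z * se
print(f"[{lo:.3f}, {hi:.3f}]")         # -> [0.432, 0.628]
```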
Today's basic example: a (small) part of Case 1

- a survey among a sample of 1st year SBE students, E&BE, 2-2011
- n = 173 respondents, many variables
- we consider a subset of 7 variables (i.e. questions, see alongside)

Q4_1 Birthday: Day    ••
Q4_2 Birthday: Month  ••
Q4_3 Birthday: Year   ••••
Q5 What is your gender?       •1 Male  •2 Female
Q6 What is your nationality?  •1 Dutch  •2 German  •3 other

Please rate on a 1 to 5 scale, where 1 means "very unimportant" and 5 "very important".
I chose this university…                   1 2 3 4 5
Q9_6 Because of the city of Maastricht.    • • • • •
Q9_7 Because of student life here.         • • • • •

Some prior data considerations

Three basic types of data:
i) cross-section: n objects or subjects, measured at the same time (Cases 1, 2 & 3)
ii) time series: one object or subject, measured during T successive time periods (Cases 4 & 5)
iii) panel data: n objects or subjects, measured during T successive time periods (Case 6) (combines i and ii)

Note: our main focus is on type i)!

Four different scales of measurement ("NOIR"):
Qualitative: 1) nominal  "red, white, blue"      Q5, Q6
             2) ordinal  "small, medium, large"  Q9_6, Q9_7
Q6 Nationality

                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid    Dutch        66        38.2         40.0              40.0
         German       70        40.5         42.4              82.4
         Other        29        16.8         17.6             100.0
         Total       165        95.4        100.0
Missing  9             8         4.6
Total                173       100.0

Note:
- all variables have "missing values"
- frequencies also interesting for Q4_1, Q4_2 and Q4_3 (to check for "strange answers")

8) Regression analysis: "to what extent can a quantitative variable be explained by a (number of) other quantitative variable(s)?"

Which tool to use depends a.o. on your research questions and your measurement scale
- tool 1) discussed extensively today (by hand & SPSS)
- tools 2)-7) discussed by analogy today (only SPSS, after the break)
- tool 8) discussed next lecture
1) One-Sample T Test

Research issue: does the mean of a quantitative variable differ from a specified value?

Null hypothesis: H0: μ = μ0

Assumptions:
- independence assumption (the sample data are independent of each other)
- normal population assumption (the variable of interest is normally distributed)
  (or: large sample, say n > 30)

Sharpe: chapter 11

Application: Q9_7, "… because of student life here"

Descriptive Statistics
                      N   Minimum   Maximum   Mean   Std. Deviation
Q9_7 Student life    171     1         5      2.68       1.087
Valid N (listwise)   171

Research issue: the population mean, μ
Point estimate: the sample mean, here: ȳ = 2.68
but: different sample → different outcome
⇒ ȳ fluctuates randomly between samples
⇒ ȳ obeys a sampling distribution!
⇒ what does our single "draw" for ȳ tell us about μ?

The sampling distribution of ȳ

If we have
- observations that are independent
- either a normally distributed variable, or a large sample (say n > 30)

then it can be shown that:
1) the expected value of ȳ equals the true μ: "on average, we hit our target"
2) ȳ fluctuates around the true μ with standard deviation SD(ȳ) = σ/√n, σ = population standard deviation of y
3) ȳ has a normal distribution

⇒ ȳ ~ Normal, with mean μ and st. dev. SD(ȳ) = σ/√n
⇒ (ȳ − μ) / SD(ȳ) ~ Standard Normal, with SD(ȳ) = σ/√n
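The three claims above can be checked by simulation: draw many samples, compute each sample mean, and look at how those means behave. A sketch; the population values below simply reuse the Q9_7 summary numbers (mean 2.68, sd 1.087, n = 171) for illustration:

```python
# Simulating the sampling distribution of the sample mean: 10,000 samples of
# size n from a population with mean mu and sd sigma. The sample means should
# average to mu and spread out with sd close to sigma/sqrt(n).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 2.68, 1.087, 171
means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

print(means.mean())                    # close to mu
print(means.std())                     # close to sigma/sqrt(n), about 0.083
```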
⇒ (ȳ − μ) / SD(ȳ) ~ Standard Normal, with SD(ȳ) = σ/√n

- problem: we don't know the population standard deviation σ
- use the sample standard deviation s as point estimate for σ, to get the standard error of ȳ:
  SE(ȳ) = s/√n, s = sample standard deviation of y

t = (point estimate − hypothesized value) / standard error = (ȳ − μ0) / SE(ȳ)

if H0 true, t obeys a t-distribution with n−1 df

Tool 2: 100(1−α)% Confidence Interval for μ
point estimate ± critical value × standard error = [ ȳ ± t_{α/2} × SE(ȳ) ]   (hardly used in QM3)

Example 1

Null hypothesis: H0: μ = 3 ("neutral value")
Alternative hypothesis: Ha: μ ≠ 3 (two-sided)

"in the population, the average importance of motive 'student life' is 3"
vs. "the average importance differs from 3"

- the standard normal distribution is centered around 0
⇒ if H0 is true, we expect a test statistic "close to" 0
⇒ values "far above" or "far below" 0 make H0 suspect
⇒ "far enough" from 0? ⇒ reject H0
⇒ is the actual value of −3.85 "far enough" from 0?

Note: both positive and negative values for the test statistic cast doubt on the null!
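Filling in the Q9_7 summary statistics (ȳ = 2.68, s = 1.087, n = 171) with μ0 = 3 reproduces the t ≈ −3.85 used in Example 1:

```python
# One-sample t statistic computed from summary statistics alone.
import math

y_bar, s, n, mu0 = 2.68, 1.087, 171, 3
se = s / math.sqrt(n)                  # standard error of the sample mean, ~0.083
t = (y_bar - mu0) / se
print(round(t, 2))                     # -> -3.85
```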
Option a) The critical value approach

Rejection region: |z| > z_{α/2}
α = significance level = Type I Error risk

E.g. α = 5% ⇒ Table A-59: z_{0.025} = 1.96
"if the null were true, then 5% chance to observe a test statistic of ≥ 1.96, or ≤ −1.96"

3.85 > 1.96 ⇒ reject H0 at 5% significance (Type I error risk is 5%)
"the average score is significantly different from 3"

Option b) The P-value approach

"if the null were true, then what would be the probability to observe a test statistic so contradictory to it?"

Table A-63: P( z ≥ 3.85 ) = 1 − 0.9999 = 0.0001
⇒ P-value around 0.02% (two-sided!)
"if the null were true, there would be 0.02% chance to observe a test statistic of ≥ 3.85, or ≤ −3.85"

⇒ reject the null at any reasonable significance level
"the lower the P-value, the more evidence against the null!"

Note: inspect P-value ⇒ conclusion!
Note: with a one-sided alternative (as in the next example), only positive values for the test statistic cast doubt on the null!
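For reference, the exact two-sided P-value for z = 3.85, without the table's rounding (the 0.02% above doubles the rounded table value 0.0001):

```python
# Two-sided P-value for z = 3.85: P-value = 2 * P(z >= 3.85).
from scipy.stats import norm

z = 3.85
p_value = 2 * norm.sf(z)               # two-sided tail probability
print(p_value)                         # ~0.000118, i.e. about 0.012%
```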
2.17 > 1.645 ⇒ reject H0 at 5% significance

Option b) The P-value approach

"if the null were true, then what would be the probability to observe a test statistic so contradictory to it?"

Table A-63: P( z ≥ 2.17 ) = 1 − 0.9850 = 0.0150
⇒ P-value = 1.5% (one-sided!)
"if the null were true, then 1.5% chance to observe a test statistic ≥ 2.17"
⇒ very much evidence against H0
"the lower the P-value, the more evidence against the null!"

BREAK

2) Two-Sample T Test

Assumptions:
- independence assumption (in each sample, the data are independent of each other)
- independent groups assumption (the samples are independent rather than paired)
- normal populations assumption (in each population, the data are normally distributed)
  (or: both samples are large, say n1, n2 > 30)

Test statistic: t = [ (ȳ1 − ȳ2) − Δ0 ] / SE(ȳ1 − ȳ2)

with SE(ȳ1 − ȳ2) = √( s1²/n1 + s2²/n2 )              (general version)
or   SE(ȳ1 − ȳ2) = s_pooled × √( 1/n1 + 1/n2 )       (pooled version, "equal variances")

Distribution: t-distribution ⇒ standard normal with large samples

Sharpe: sections 13.1-13.5
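The two SE versions can be sketched numerically; the sample sizes and standard deviations below are made-up numbers for illustration, not from the case data:

```python
# The two versions of SE(y1_bar - y2_bar) from the two-sample t test.
import math

n1, n2 = 85, 77                        # hypothetical sample sizes
s1, s2 = 1.9, 2.2                      # hypothetical sample standard deviations

# general version: sqrt(s1^2/n1 + s2^2/n2)
se_general = math.sqrt(s1**2 / n1 + s2**2 / n2)

# pooled version ("equal variances"): pooled variance weights each sample by its df
s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(s2_pooled) * math.sqrt(1 / n1 + 1 / n2)

print(se_general, se_pooled)           # similar here, since s1 and s2 are close
```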
Application: quantitative variable: "birthdate" (defined from Q4_1/2/3; see case, item b)
             subpopulations: Q5, "gender"

Normally: inspect P-value ⇒ conclusion
3) One-Way ANOVA

Research issue: does the mean of a quantitative variable differ between more than two (sub-)populations?

Null hypothesis: H0: μ1 = μ2 = … = μk (k subpopulations)

Assumptions:
- independence assumption (in each sample the data are independent of each other; the group samples are independent)
- equal variance assumption (in each population, the variance is the same)
- normal populations assumption (in each population, the data are normally distributed)

Test statistic: F = [ SSTr/(k−1) ] / [ SSE/(N−k) ]
- SSTr: variability between subpopulations
- SSE: variability within subpopulations

Distribution: F-distribution

Application: quantitative variable: "birthdate", subpopulations: Q6, "nationality"

Null hypothesis: H0: μ1 = μ2 = μ3 (k = 3)
Alternative hypothesis: Ha: at least one mean differs from the others

"the average age of Dutch, Germans and others is the same"
vs. "for at least one group, the average age is different"

SPSS: Analyze > Compare Means > One-Way ANOVA
choose "birthdate" as Dependent, Q6 as Factor; tick Descriptive under Options

Descriptives: Birthdate
                                             95% Confidence Interval for Mean
          N     Mean    Std. Deviation  Std. Error  Lower Bound  Upper Bound  Minimum   Maximum
Dutch     66   1991.26      2.010          .247       1990.77      1991.76    1983.57   1992.84
German    67   1990.02      1.474          .180       1989.66      1990.38    1986.49   1992.67
Other     29   1990.51      1.658          .308       1989.88      1991.14    1984.95   1992.86
Total    162   1990.62      1.823          .143       1990.33      1990.90    1983.57   1992.86
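The F statistic can be rebuilt from the Descriptives table alone (group sizes, means and standard deviations); because the table's values are rounded, the result is approximate:

```python
# One-way ANOVA F statistic from group summaries:
# SSTr from the group means, SSE from the group standard deviations.
ns    = [66, 67, 29]                   # group sizes from the Descriptives table
means = [1991.26, 1990.02, 1990.51]
sds   = [2.010, 1.474, 1.658]

N = sum(ns)
k = len(ns)
grand_mean = sum(n * m for n, m in zip(ns, means)) / N

sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))  # between groups
sse  = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))             # within groups

F = (sstr / (k - 1)) / (sse / (N - k))
print(round(F, 2))                     # roughly 8.5 with these rounded inputs
```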
4) Paired-Samples T Test

Null hypothesis: H0: μ1 = μ2 (i.e. Δ0 = 0)
Alternative hypothesis: Ha: μ1 ≠ μ2 (two-sided)

"on average, the motives 'city of Maastricht' and 'student life' are equally important"
vs. "they are not equally important"

SPSS: Analyze > Compare Means > Paired-Samples T Test
choose Q9_6 and Q9_7 as Paired Variables

Paired Samples Statistics
                                 Mean    N    Std. Deviation   Std. Error Mean
Pair 1  Q9_6 City of Maastricht  2.96   171       1.210             .092
        Q9_7 Student life        2.68   171       1.087             .083

Paired Samples Correlations
                                          N    Correlation   Sig.
Pair 1  Q9_6 City of Maastricht
        & Q9_7 Student life              171       .540      .000

- 2nd table: correlation, see technique 5) below
- 3rd table: the actual test
  e.g.: 0.275 = 2.96 − 2.68 (using unrounded means)
        0.085 = 1.106 / √171
        3.249 = 0.275 / 0.085
        170 = n − 1

inspect P-value ⇒ conclusion
P-value = 0.001 or 0.1%
⇒ very strong evidence that the means differ
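The "3rd table" arithmetic above can be replayed, and the P-value recovered from the t-distribution with 170 df; the inputs are the values quoted on the slide (1.106 is the standard deviation of the pairwise differences):

```python
# Paired-samples t test from the summary numbers on the slide.
import math
from scipy.stats import t as t_dist

n = 171
mean_diff = 0.275                      # 2.96 - 2.68 before rounding (SPSS output)
sd_diff = 1.106                        # sd of the pairwise differences
se = sd_diff / math.sqrt(n)            # ~0.085
t_stat = mean_diff / se                # ~3.25
p_value = 2 * t_dist.sf(abs(t_stat), n - 1)
print(t_stat, p_value)                 # two-sided P-value around 0.001
```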
5) Correlation

Corr. coefficient: measures degree of linear relation between two variables x and y

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² × Σ(yᵢ − ȳ)² ]

- −1 ≤ r ≤ 1:  r = −1: perfect negative relation
               r = +1: perfect positive relation
               r = 0:  no linear relation
- r is scale-independent
- the role of both variables in r is symmetric
- no distinction dependent vs. independent
- the relation need not be causal

Null hypothesis: H0: ρ = 0 (population vs. sample correlation)
Assumptions: normality assumption (both variables are normally distributed)

SPSS: Analyze > Correlate > Bivariate
choose Q9_6 and Q9_7 as Variables; tick Pearson (default)

Correlations
                                        Q9_6 City of Maastricht   Q9_7 Student life
Q9_6 City of       Pearson Correlation           1                     .540**
Maastricht         Sig. (2-tailed)                                     .000
                   N                            171                    171
Q9_7 Student life  Pearson Correlation          .540**                 1
                   Sig. (2-tailed)              .000
                   N                            171                    171
**. Correlation is significant at the 0.01 level (2-tailed).

Note: - r = 0.540: pretty strong positive relation
      - P-value = 0.000 ⇒ very strong evidence that ρ differs from 0
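A small sketch of r, computed once from the formula above and once with scipy.stats.pearsonr; the x and y values are invented illustration data, not the Q9_6/Q9_7 responses:

```python
# Pearson's r two ways: from the definition, and via scipy.
import math
from scipy.stats import pearsonr

x = [1, 2, 2, 3, 4, 5, 5]              # hypothetical data
y = [2, 1, 3, 3, 4, 4, 5]

mx = sum(x) / len(x)
my = sum(y) / len(y)
num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
den = math.sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
r_manual = num / den

r_scipy, p_value = pearsonr(x, y)
print(r_manual, r_scipy)               # the two values coincide
```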
- alternative: Spearman's rank correlation
  - if variables are ordinal rather than quantitative (here?)
  - if we doubt whether they are normally distributed
  - to measure the strength of monotonic but nonlinear relations
- SPSS: same menu, tick Spearman, not Pearson
- see extra text "Spearman's Rank Correlation" (available on QM3 course pages)

6) Chi-square Test for Goodness of Fit

Research issue: does the frequency distribution of a nominal variable differ from a specified distribution?

Null hypothesis: H0: p1 = …, p2 = …, pk = … (k categories)

Assumptions:
- independence assumption (the counts should be independent)
- sample size assumption (all Expᵢ ≥ 5, i = 1,…,k)
Application: nominal variable Q6, "nationality"

Null hypothesis: H0: p1 = 0.40, p2 = 0.40, p3 = 0.20
Alternative hypothesis: Ha: at least one share is different

"in the 1st year student population, 40% is Dutch, 40% is German, 20% has another nationality"
vs. "the shares of the three nationalities are different"

SPSS: Analyze > Nonparametric Tests > Legacy Dialogs > Chi-square
choose Q6 as Test Variable
under Expected Values, tick Values; indicate the shares using Add

Q6 Nationality
          Observed N   Expected N   Residual
Dutch         66          66.0          .0
German        70          66.0         4.0
Other         29          33.0        -4.0
Total        165

Test Statistics
                Q6 Nationality
Chi-Square          .727a
df                  2
Asymp. Sig.         .695
a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 33.0.

How have these numbers been determined?

- observed frequencies vs. expected frequencies if H0 is true:
  Obs1 = 66    Exp1 = n·p1 = 165·0.40 = 66.0
  Obs2 = 70    Exp2 = n·p2 = 165·0.40 = 66.0
  Obs3 = 29    Exp3 = n·p3 = 165·0.20 = 33.0

⇒ are the observed frequencies "far away" from the expected ones?
   note the analogy with the coin-flipping example (multinomial vs. binomial)

⇒ χ² = (66 − 66.0)²/66.0 + (70 − 66.0)²/66.0 + (29 − 33.0)²/33.0 = 0.727

- if H0 is true, the χ² test statistic has a χ²-distribution with 2 df,
  provided that all expected frequencies ≥ 5 (see Sharpe p. 443)
- the χ² test statistic is always ≥ 0 (squares!)
⇒ if H0 true, the observed and expected frequencies should be close
⇒ the test statistic should be "close to" 0
⇒ values "far above" 0 make H0 suspect → reject
⇒ is the actual value of 0.727 "far enough" above 0?
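The whole goodness-of-fit computation fits in one scipy call (our tool choice for checking; the course uses SPSS), reproducing χ² = 0.727 and the P-value 0.695:

```python
# Chi-square goodness-of-fit test for the nationality shares.
from scipy.stats import chisquare

observed = [66, 70, 29]
expected = [66.0, 66.0, 33.0]          # 165 * (0.40, 0.40, 0.20)
stat, p_value = chisquare(observed, f_exp=expected)
print(round(stat, 3), round(p_value, 3))   # -> 0.727 0.695
```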
Option a) The critical value approach: χ² > χ²_α ?

α = 5% ⇒ χ²_0.05 = 5.991 (Table A-62: df = 2)
"if the null were true, then 5% chance to observe a test statistic of 5.991 or larger"

0.727 < 5.991 ⇒ don't reject H0 at 5% (Type I Error risk)

"the lower the P-value, the more evidence against the null!"

7) Chi-square Test of Independence (or "Homogeneity")

Research issue: are two nominal variables, tabulated against each other in a crosstable, dependent?

Null hypothesis: H0: the two variables are independent

Assumptions:
- independence assumption (the counts should be independent)
  typically satisfied for a random sample, if n is large enough
- sample size assumption (all Expᵢⱼ ≥ 5, i = 1,…,R; j = 1,…,C)
Application: nominal variable Q6, "nationality", vs. nominal variable Q5, "gender"

Null hypothesis: H0: Q6 and Q5 are independent
Alternative hypothesis: Ha: Q6 and Q5 are dependent

SPSS: Analyze > Descriptive Statistics > Crosstabs
choose Q5 as Row, Q6 as Column
under Cells, tick Expected; under Statistics, tick Chi-square

Q5 Gender * Q6 Nationality Crosstabulation
                                         Q6 Nationality
                                  Dutch   German   Other    Total
Q5 Gender  male    Count            38      51       19      108
                   Expected Count   44.0    44.7     19.3    108.0
           female  Count            28      16       10       54
                   Expected Count   22.0    22.3      9.7     54.0
Total              Count            66      67       29      162
                   Expected Count   66.0    67.0     29.0    162.0

Chi-Square Tests
                                Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square              5.166a    2         .076
Likelihood Ratio                5.234     2         .073
Linear-by-Linear Association   1.659     1         .198
N of Valid Cases                162
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 9.67.

How have these numbers been determined?

- Sharpe section 5.6: Conditional Probability and Independence
  p. 160: "Events A and B are independent whenever P(B|A) = P(B)"

- here: P(Male) = 108/162 = 0.667           "P(B)"
        P(Male|Dutch) = 38/66 = 0.576       "P(B|A)"
        P(Male|German) = 51/67 = 0.761      "P(B|A)"
        P(Male|Other) = 19/29 = 0.655       "P(B|A)"

- independence: 66.7% of all students is male
  ⇒ 66.7% of the Dutch students is male,
    66.7% of the German students is male,
    66.7% of the other students is male
- what does this imply in terms of expected frequencies?
- Obs11 = 38 Dutch males: observed frequency
⇒ are the observed frequencies "far away" from the expected ones?

Conclusion:
P-value = 0.076 or 7.6%
⇒ some evidence that the variables are dependent
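The same crosstable fed to scipy.stats.chi2_contingency (again our tool choice, for checking) reproduces the Pearson chi-square row and the expected counts from the SPSS output:

```python
# Chi-square test of independence on the gender x nationality crosstable.
from scipy.stats import chi2_contingency

observed = [[38, 51, 19],              # male:   Dutch, German, Other
            [28, 16, 10]]              # female: Dutch, German, Other
stat, p_value, df, expected = chi2_contingency(observed, correction=False)
print(round(stat, 3), df, round(p_value, 3))   # -> 5.166 2 0.076
print(expected.round(1))               # matches the Expected Count rows
```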