Testing
Chapter 4 Outline
• Clint’s Dilemma and Estimation Procedures
o Clint’s Opinion Poll and His Dilemma
o Clint’s Estimation Procedure: The General and the Specific
o Taking Stock and Our Strategy to Assess the Reliability of Clint’s
Poll Results: Use the General Properties of the Estimation
Procedure to Assess the Reliability of the One Specific Application
o Importance of the Mean (Center) of the Estimate’s Probability
Distribution
o Importance of the Variance (Spread) of the Estimate’s Probability
Distribution for an Unbiased Estimation Procedure
• Hypothesis Testing
o Motivating Hypothesis Testing – The Evidence and the Cynic
o Formalizing Hypothesis Testing – Five Steps
o Significance Levels and Standards of Proof
o Type I and Type II Errors: The Tradeoffs
2. After collecting evidence from a crime scene, the police identified a suspect.
The suspect provides the police with a statement claiming innocence. The
district attorney is deciding whether or not to charge the suspect with a crime.
The district attorney asks a forensic expert to examine the evidence and
compare it to the suspect’s personal statement. After the expert completes
his or her work, the district attorney poses the following question to the
expert:
Question: What is the probability that similar evidence would have arisen
IF the suspect were in fact innocent?
Initially, the forensic expert assesses this probability to be .50. A week later,
however, more evidence is uncovered and the expert revises the probability to
.01. In light of the new evidence, is it more or less likely that the suspect is
telling the truth?
3. The police charge a seventeen-year-old male with a serious crime. History
teaches us that no evidence can ever prove that a defendant is guilty beyond
all doubt. In this case, however, the police do have strong evidence against the
young man suggesting that he is guilty, although the possibility that he is
innocent cannot be completely ruled out. You have been impaneled on a jury
to decide this case. The judge instructs you and your fellow jurors to find the
young man guilty if you determine that he committed the crime “beyond a
reasonable doubt.”
a. The following table illustrates the four possible scenarios:
                     Jury Finds Defendant Guilty      Jury Finds Defendant Innocent
Defendant Actually   Jury is                          Jury is
Innocent             correct__ incorrect__            correct__ incorrect__
Defendant Actually   Jury is                          Jury is
Guilty               correct__ incorrect__            correct__ incorrect__
For each scenario, indicate whether the jury would be correct or
incorrect.
b. Consider each scenario in which the jury errs. In each of these cases,
what are the consequences (the “costs”) of the error to the young man
and/or to society?
4. Suppose that two baseball teams, Team RS and Team Y, have played 185
games against each other in the last decade. Consider the following statement
made by Mac Carver, a self-described baseball authority:
Carver’s View: “Over the last decade, Team RS and Team Y have been
equally strong.”
Now, consider two hypothetical scenarios:
Hypothetical Scenario A: Team RS wins 180 of the 185 games
Hypothetical Scenario B: Team RS wins 93 of the 185 games
a. For the moment, assume that Carver’s view is correct. Comparatively
speaking, which scenario would be likely (high probability) and which
scenario would be unlikely (low probability)?
Assuming that Carver’s view is correct:
Would Scenario A be                        Would Scenario B be
Likely ___ Unlikely ___                    Likely ___ Unlikely ___
↓                                          ↓
Would Prob[Scenario A IF Carver Correct]   Would Prob[Scenario B IF Carver Correct]
be High ___ Low ___                        be High ___ Low ___
b. Next, suppose that Scenario A actually occurs. Would you be inclined
to reject Carver’s view or not reject it? On the other hand, if Scenario
B actually occurs, what would you be inclined to do?
Scenario A actually occurs Scenario B actually occurs
↓ ↓
Reject Carver’s view? Reject Carver’s view?
Yes___ No___ Yes___ No___
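One rough way to attach numbers to the two scenarios is to treat each of the 185 games as an independent coin flip when Carver’s view is correct. That independence assumption and the helper name `prob_at_least` are ours, introduced only for illustration; they are not part of the exercise.

```python
from math import comb

def prob_at_least(wins: int, games: int, p: float = 0.5) -> float:
    """Probability of at least `wins` successes in `games` independent trials."""
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(wins, games + 1))

GAMES = 185
prob_a = prob_at_least(180, GAMES)  # Scenario A: Team RS wins 180 or more
prob_b = prob_at_least(93, GAMES)   # Scenario B: Team RS wins 93 or more

print(f"Prob[Scenario A or stronger IF Carver Correct] = {prob_a:.3g}")
print(f"Prob[Scenario B or stronger IF Carver Correct] = {prob_b:.3g}")
```

Under these assumptions the probability for Scenario B is about one half, while the probability for Scenario A is vanishingly small.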
We shall now return to Clint’s dilemma. The election is tomorrow and Clint must
decide whether or not to hold a pre-election beer tap rally designed to entice more
students to vote for him. If Clint is comfortably ahead, he could save his money
by not holding the beer tap rally. On the other hand, if the election is close, the
beer tap rally could prove critical. Ideally, Clint would like to poll each member
of the student body, but time does not permit this. Consequently, Clint decides to
conduct an opinion poll by selecting 16 students at random. Clint adopts the
philosophy of econometricians:
Econometrician’s Philosophy: If you lack the information to determine the
value directly, estimate the value to the best of your ability using the
information you do have.
Clint wrote the name of each student on a 3×5 card and repeated the following
procedure 16 times:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer.
• Replace the card.
Twelve of the sixteen students polled support Clint. That is, the estimated fraction
of the population supporting him is .75:
Estimated Fraction of Population Supporting Clint:
$$EstFrac = \frac{12}{16} = \frac{3}{4} = .75$$
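Clint’s card procedure amounts to sampling with replacement. The following is a minimal sketch of one poll, assuming a hypothetical student body of 1,000 in which 600 support Clint; the population, the function name `run_poll`, and the seed are illustrative choices, not values from the text.

```python
import random

def run_poll(population, sample_size=16, rng=random):
    """Draw one card at a time, record the answer, and replace the card."""
    supporters = 0
    for _ in range(sample_size):
        # rng.choice picks a card at random without removing it from the deck,
        # which mirrors "shuffle, draw, record, replace".
        answer = rng.choice(population)
        supporters += answer
    return supporters / sample_size

# Hypothetical student body: 1 = supports Clint, 0 = does not.
population = [1] * 600 + [0] * 400
est_frac = run_poll(population, rng=random.Random(0))
print(f"EstFrac = {est_frac}")
```

Each run with a different seed can produce a different estimated fraction, even though the population itself never changes.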
Based on the results of the poll, it looks like Clint is ahead. But how confident
should Clint be that he is in fact ahead? Clint faces a dilemma:
Clint’s Dilemma: Should Clint be confident that he has the election in hand
and save his funds or should he finance the beer tap rally?
Our project is to use the poll to help Clint resolve his dilemma:
Project: Use Clint’s poll to assess his election prospects.
Our Opinion Poll simulation taught us that while the numerical value of
the estimated fraction from one poll could equal the actual population fraction, it
typically does not. The simulations showed that in most cases the estimated
fraction will be either greater than or less than the actual population fraction.
Accordingly, Clint must accept the fact that the actual population fraction
probably does not equal .75. So, Clint faces a crucial question:
Crucial Question: How much confidence should Clint have in his estimate?
More to the point, how confident should Clint be in concluding that he is
actually leading?
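The claim that a single estimate typically misses the actual population fraction can be checked with a small repetition experiment. This sketch assumes an actual fraction of .50 and 10,000 repetitions of a 16-person poll; both values and the function name `share_of_exact_hits` are assumptions chosen for illustration.

```python
import random

def share_of_exact_hits(actual_frac, sample_size=16, repetitions=10_000, seed=1):
    """Fraction of simulated polls whose EstFrac equals the actual fraction."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # Each polled student supports Clint with probability actual_frac.
        supporters = sum(rng.random() < actual_frac for _ in range(sample_size))
        if supporters / sample_size == actual_frac:
            hits += 1
    return hits / repetitions

share = share_of_exact_hits(0.50)
print(f"Share of polls in which EstFrac equals ActFrac: {share:.3f}")
```

With these settings only about one poll in five lands exactly on the actual fraction; the rest fall above or below it.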
Taking Stock and Our Strategy to Assess the Reliability of Clint’s Poll Results
Let us briefly review what we have done thus far. We have laid the groundwork
required to assess the reliability of Clint’s poll results by focusing on what we
know before the poll is conducted; that is, we have focused on the general
properties of the estimation procedure, the probability distribution of the estimate.
In Chapter 3 we derived the general equations for the mean and variance of the
estimated fraction’s probability distribution algebraically and then checked our
algebra by exploiting the relative frequency interpretation of probability in our
Opinion Poll simulation: Mean[EstFrac] = p and Var[EstFrac] = p(1 − p)/T, where p
is the actual population fraction and T is the sample size.
Let us review the importance of the mean and variance of the estimated fraction’s
probability distribution.
[Figure 4.1: Probability Distribution of EstFrac, Estimated Fraction Values – Importance of Mean. Horizontal axis: EstFrac, with ActFrac marked.]
How confident should Clint be that his estimate is close to the actual population
fraction? Since the estimation procedure is unbiased, the answer to this question
depends on the variance of the estimated fraction’s probability distribution.
As the variance decreases, the likelihood of the estimate being “close to” the
actual value increases; that is, as the variance decreases, the estimate becomes
more reliable.
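To see how a smaller variance translates into a more reliable estimate, we can compute the variance p(1 − p)/T alongside the probability that the estimate falls within .10 of the actual value. This is a sketch under assumed values: an actual fraction of .50, sample sizes of 16, 64, and 256, and a tolerance of .10 standing in for “close to.”

```python
from math import comb

def prob_close(actual_frac, sample_size, tolerance=0.10):
    """Exact probability that EstFrac lies within `tolerance` of actual_frac."""
    total = 0.0
    for k in range(sample_size + 1):
        # The small slack guards against floating-point noise at the boundary.
        if abs(k / sample_size - actual_frac) <= tolerance + 1e-12:
            total += (comb(sample_size, k) * actual_frac**k
                      * (1 - actual_frac)**(sample_size - k))
    return total

ACT_FRAC = 0.50  # assumed for illustration
for T in (16, 64, 256):
    variance = ACT_FRAC * (1 - ACT_FRAC) / T
    print(f"T = {T:3d}  Var[EstFrac] = {variance:.5f}  "
          f"Prob[within .10] = {prob_close(ACT_FRAC, T):.3f}")
```

As the sample size grows, the variance shrinks and the probability of being close to the actual value rises.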
Hypothesis Testing
Now, we shall apply what we have learned about the estimate’s probability
distribution, the estimation procedure’s general properties, to assess how
confident Clint should be in concluding that he is ahead.
In the case of Clint’s poll, a cynic might say “Sure, a majority of those polled
supported Clint, but the election is actually a tossup. The fact that 75 percent of
those polled supported Clint was just the luck of the draw.”
Cynic’s View: Despite the poll results, the election is actually a tossup.
The Opinion Poll simulation clearly shows that 12 or even more of the 16
students selected could support Clint in a single poll when the election is a tossup.
Accordingly, we cannot simply dismiss the cynic’s view as nonsense. We must
take the cynic seriously. To assess his view, we pose the following question
which asks how likely it would be to obtain a result like the one that actually
occurred if the cynic is correct:
Question for the Cynic: What is the probability that the result from a single
poll would be like the one actually obtained (or even stronger), if the cynic is
correct and the election is a tossup?
More specifically,
Question for the Cynic: What is the probability that the estimated fraction
supporting Clint would equal .75 or more in one poll of 16 individuals, if the
cynic is correct (that is, if the election is actually a tossup and the fraction of
the actual population supporting Clint equals .50)?
When the probability is small, it would be unlikely that the election is a tossup
and hence, we could be confident that Clint actually leads. On the other hand,
when the probability is large, it is likely that the election is a tossup even though
the poll suggests that Clint leads.
How can we answer the question for the cynic? That is, how can we calculate this
probability, Prob[Results IF Cynic Correct]? To understand how, recall Clint’s
estimation procedure, his poll:
Write the name of every individual in the population on a separate card, then
• Perform the following procedure 16 times:
o Thoroughly shuffle the cards.
o Randomly draw one card.
o Ask that individual if he/she supports Clint and record the
answer.
o Replace the card.
• Calculate the fraction of those polled supporting Clint.
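Because the procedure consists of 16 independent draws, the probability in the question for the cynic can also be counted directly. The sketch below assumes each polled student supports Clint with probability one half, which is the cynic’s premise; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_results_if_cynic_correct(supporters=12, sample_size=16):
    """Exact Prob[at least `supporters` of `sample_size` support Clint | tossup]."""
    # With a tossup every sequence of 16 answers is equally likely, so we
    # count the sequences with 12, 13, ..., 16 supporters.
    favorable = sum(comb(sample_size, k)
                    for k in range(supporters, sample_size + 1))
    return Fraction(favorable, 2**sample_size)

prob = prob_results_if_cynic_correct()
print(f"Exact probability: {prob} = {float(prob):.4f}")
```

Under these assumptions the exact count is 2517 in 65536, roughly .038. Calculations based on the normal distribution approximate this probability and need not coincide with it, especially for a sample as small as 16.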
If the cynic is correct and the election is a tossup, the actual fraction of the
population supporting Clint would equal 1/2 or .50. Based on this premise, apply
the equations we derived to calculate the mean and variance of the estimated
fraction’s probability distribution:
Sample Size = T = 16    Actual Population Fraction = ActFrac = 1/2 = .50
$$\text{Mean}[EstFrac] = p = \frac{1}{2} = .50$$
$$\text{Var}[EstFrac] = \frac{p(1 - p)}{T} = \frac{\frac{1}{2} \times \frac{1}{2}}{16} = \frac{\frac{1}{4}}{16} = \frac{1}{64}$$
$$\text{SD}[EstFrac] = \sqrt{\text{Var}[EstFrac]} = \sqrt{\frac{1}{64}} = \frac{1}{8} = .125$$
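The arithmetic above can be reproduced in a few lines. This is a minimal sketch; the function name `est_frac_moments` is ours, and exact fractions are used only so that the intermediate value 1/64 is visible.

```python
from fractions import Fraction
from math import sqrt

def est_frac_moments(act_frac, sample_size):
    """Mean, variance, and standard deviation of EstFrac's distribution."""
    mean = act_frac
    variance = act_frac * (1 - act_frac) / sample_size
    return mean, variance, sqrt(variance)

# Cynic's premise: ActFrac = 1/2, with Clint's sample size T = 16.
mean, variance, sd = est_frac_moments(Fraction(1, 2), 16)
print(f"Mean = {mean}, Var = {variance}, SD = {sd}")
```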
Next, recall the normal distribution’s rules of thumb:
Since the standard deviation is .125, the result of Clint’s poll, .75, is 2 standard
deviations above the mean, .50.
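[Figure 4.3: Probability Distribution of EstFrac – Calculating Prob[Results IF Cynic Correct]. The horizontal axis marks .25, .50, and .75, with .25 and .75 each 2 SD’s from the mean; the area between them is labeled .95 and a tail area is labeled .025.]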
The rules of thumb tell us that the probability of being within 2 standard
deviations of the random variable’s mean is approximately .95. Recall that the
area beneath the normal distribution equals 1.00. Since the normal distribution is
symmetric, the probability of being more than 2 standard deviations above the
mean is .025:
$$\frac{1.00 - .95}{2} = \frac{.05}{2} = .025$$
If the cynic is actually correct (if the election is actually a tossup), the probability
that the fraction supporting Clint would equal .75 or more in one poll of 16
individuals equals .025, that is, 1 chance in 40. Clint must now make a decision.
He must decide whether or not he is willing to live with the odds of a 1 in 40
chance that the election is actually a tossup. If he is willing to do so, he will not
fund the beer tap rally; otherwise, he will.
The following five steps describe how we can formalize hypothesis testing.
Step 2: Play the cynic and challenge the results; construct the null and alternative
hypotheses.
Cynic’s view: Despite the results, the election is actually a tossup; that is, the
actual fraction of the population supporting Clint is .50.
The null hypothesis adopts the cynical view by challenging the evidence; the
cynic always challenges the evidence. By convention, the null hypothesis is
denoted as H0. The alternative hypothesis is consistent with the evidence; the
alternative hypothesis is denoted as H1.
H0: ActFrac = .50 ⇒ Election is a tossup; cynic is correct
H1: ActFrac > .50 ⇒ Clint leads; cynic is incorrect and the evidence is correct
Step 3: Formulate the question to assess the cynic’s view and the null hypothesis.
Step 4: Use the general properties of the estimation procedure, the estimated
fraction’s probability distribution, to calculate Prob[Results IF H0 True].
Recall that z equals the number of standard deviations that the value lies from
the mean:
$$z = \frac{\text{Value of Random Variable} - \text{Distribution Mean}}{\text{Distribution Standard Deviation}}$$
z 0.00 0.01
1.9 0.0287 0.0281
2.0 0.0228 0.0222
2.1 0.0179 0.0174
Table 4.2: Selected Right Tail Probabilities for the Normal Distribution
The value of the random variable equals .75 (from Clint’s poll); the mean
equals .50, and the standard deviation .125:
$$z = \frac{.75 - .50}{.125} = \frac{.25}{.125} = 2.00$$
Next, consider the table of right tail probabilities for the normal distribution.
Table 4.2, an abbreviated form of the normal distribution table, provides the
probability:
Prob[Results IF Cynic Correct] = Probability that the result from a single poll would
be like the one actually obtained (or even stronger) IF the cynic is correct (if the
election is a tossup)
= .0228
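The z calculation and the table lookup can be mirrored with Python’s standard library. This sketch uses `statistics.NormalDist` in place of the printed table; the helper names `z_score` and `right_tail` are ours.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd

def right_tail(z):
    """Right-tail probability of the standard normal distribution at z."""
    return 1 - NormalDist().cdf(z)

z = z_score(0.75, 0.50, 0.125)   # Clint's poll result under H0
prob = right_tail(z)
print(f"z = {z:.2f}, Prob[Results IF H0 True] = {prob:.4f}")

# Spot-check against other entries of Table 4.2.
for z_value in (1.90, 1.91, 2.01, 2.10, 2.11):
    print(f"z = {z_value:.2f}  right tail = {right_tail(z_value):.4f}")
```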
[Figure 4.4: Probability Distribution of EstFrac – Calculating Prob[Results IF H0 True]. Sample size = 16, Mean = .50, SD = .125; the horizontal axis (EstFrac) marks .50 and .75, which lie 2 SD’s apart, and the right tail area beyond .75 is .0228.]
Now, consider two different significance levels that are often used in academia: 5
percent and 1 percent:
Significance Level = 5 percent Significance Level = 1 percent
↓ ↓
Prob[Results IF H0 True] Prob[Results IF H0 True]
less than significance level greater than significance level
↓ ↓
Prob[Results IF H0 True] small Prob[Results IF H0 True] large
↓ ↓
Unlikely that H0 is true Likely that H0 is true
↓ ↓
Reject H0 Do not reject H0
↓ ↓
Suggestion: Clint leads Suggestion: Election a toss up
Prob Small | Significance Level | Prob Large
Now, let us generalize. The significance level is the dividing line between
what we consider a small and large probability:
Prob[Results IF H0 True] Prob[Results IF H0 True]
less than significance level greater than significance level
↓ ↓
Reject H0 Do not reject H0
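The decision rule is mechanical once the probability and the significance level are in hand. The sketch below applies it to the .0228 probability from Clint’s poll at the 5 percent and 1 percent levels; the function name `decide` is ours.

```python
def decide(prob_results_if_h0_true, significance_level):
    """Reject H0 when the probability falls below the significance level."""
    if prob_results_if_h0_true < significance_level:
        return "Reject H0"
    return "Do not reject H0"

PROB_RESULTS_IF_H0_TRUE = 0.0228  # from Clint's poll

for level in (0.05, 0.01):
    print(f"Significance level {level:.2f}: "
          f"{decide(PROB_RESULTS_IF_H0_TRUE, level)}")
```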
As we reduce the significance level, we make it more difficult to reject the null
hypothesis; we make it more difficult to conclude that Clint is leading.
Consequently, the significance level and standard of proof are intimately related;
as we reduce the significance level, we are implicitly adopting a higher standard
of proof:
Lower Significance Level → More Difficult To Reject Null Hypothesis → Higher Standard of Proof
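One way to see the higher standard of proof is to ask how large the estimated fraction must be before the null hypothesis is rejected. This sketch assumes the normal distribution with mean .50 and standard deviation .125 from Clint’s poll; the function name is hypothetical.

```python
from statistics import NormalDist

def smallest_rejecting_est_frac(significance_level, mean=0.50, sd=0.125):
    """EstFrac value whose right-tail probability equals the significance level."""
    return NormalDist(mean, sd).inv_cdf(1 - significance_level)

for level in (0.10, 0.05, 0.01):
    cutoff = smallest_rejecting_est_frac(level)
    print(f"Significance level {level:.2f}: reject H0 when EstFrac exceeds {cutoff:.3f}")
```

As the significance level falls, the cutoff rises. Under these assumptions Clint’s .75 clears the cutoff at the 5 percent level but not at the 1 percent level.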
What is the appropriate standard of proof for Clint? That is, what
significance level should he use? There is no definitive answer; only Clint can
decide. The significance level Clint chooses, his standard of proof, depends on a
number of factors. In part, it depends on the importance he attaches to winning the
election. If he attaches great importance to winning, he would set a very low
significance level, making it difficult to reject the null hypothesis. In this case, he
would be setting a very high standard of proof; much proof would be required for
him to reject the notion that the election is a tossup. Also, Clint’s choice would
depend on how “paranoid” he is. If Clint is a “worry wart” who always focuses on
the negative, he would no doubt adopt a low significance level. He would require
a very high standard of proof before concluding that he is leading. On the other
hand, if Clint is a carefree optimist, he would adopt a higher significance level
and thus a lower standard of proof.
Suppose that the police charge a seventeen-year-old male with a serious
crime. Strong evidence against him exists. The evidence suggests that he is guilty.
But a word of caution is now in order; no evidence can ever prove guilt beyond all
doubt. Even confessions do not provide indisputable evidence. There are many
examples of an individual confessing to a crime that he/she did not commit.
Now, let us play the cynic. The cynic always challenges the evidence:
Cynic’s view: Sure, there is evidence suggesting that the young man is guilty,
but the evidence results from the “luck of the draw.” The evidence is just
coincidental. In fact, the young man is innocent.
Now suppose that you are a juror charged with deciding the case. Criminal
trials in the U.S. require the prosecution to prove that the defendant is guilty
“beyond a reasonable doubt.” The judge instructs you to find the defendant guilty
if you believe the evidence meets the “beyond a reasonable doubt” criterion.
You and your fellow jurors must now decide what constitutes “proof beyond a
reasonable doubt.” To help you make this decision, we shall make two sets of
observations. We shall first express each in simple English and then “translate”
the English into “hypothesis testing language”; in doing so, remember the null
hypothesis asserts that the defendant is innocent:
Question: Suppose that the prosecutor decides to try the seventeen-year-old as
an adult rather than a juvenile. How should the jury’s standard of proof be
affected?
In this case, the costs of incarcerating an innocent man (Type I error) would
increase because the conditions in a prison are more severe than the conditions in
a juvenile detention center. Since the costs of incarcerating an innocent man
(Type I error) are greater, the jury should demand a higher standard of proof,
thereby making a conviction more difficult:
Now, review the relationship between the significance level and the
standard of proof; a lower significance level results in a higher standard of proof:
Small Probability | Significance Level | Large Probability
The choice of the significance level involves tradeoffs, a “tight rope act,” in
which we balance the relative costs of Type I and Type II error. There is no
automatic, mechanical way to determine the appropriate significance level. It
depends on the circumstances.
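The tradeoff can be illustrated numerically in the setting of Clint’s poll. A Type I error is rejecting the null hypothesis when it is true; a Type II error is failing to reject it when it is false. The sketch below uses the normal approximation and assumes, purely for illustration, that when the null hypothesis is false the actual population fraction is .75; that alternative value and the function name are our assumptions.

```python
from math import sqrt
from statistics import NormalDist

SAMPLE_SIZE = 16
H0_FRAC = 0.50
ASSUMED_TRUE_FRAC = 0.75  # hypothetical alternative, not from the text

def error_probabilities(significance_level):
    """Return (Type I probability, Type II probability) for a given level."""
    sd_h0 = sqrt(H0_FRAC * (1 - H0_FRAC) / SAMPLE_SIZE)
    # Reject H0 when EstFrac exceeds this cutoff.
    cutoff = NormalDist(H0_FRAC, sd_h0).inv_cdf(1 - significance_level)
    sd_alt = sqrt(ASSUMED_TRUE_FRAC * (1 - ASSUMED_TRUE_FRAC) / SAMPLE_SIZE)
    # Type II error: EstFrac falls below the cutoff even though Clint leads.
    type_ii = NormalDist(ASSUMED_TRUE_FRAC, sd_alt).cdf(cutoff)
    return significance_level, type_ii

for level in (0.10, 0.05, 0.01):
    type_i, type_ii = error_probabilities(level)
    print(f"Significance level {level:.2f}: "
          f"Prob[Type I] = {type_i:.2f}, Prob[Type II] = {type_ii:.2f}")
```

Lowering the significance level reduces the probability of a Type I error but raises the probability of a Type II error, which is the balancing act described above.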
1. Traditionally, this probability is called the p-value. We shall use the more
descriptive term, however, to emphasize what it actually represents. Nevertheless,
you should be aware that this probability is typically called the p-value.