
Chapter 20.

Comparing Two Proportions


Topics covered in this chapter:
Confidence Interval for the Difference between Two Proportions
Significance Test for Comparing Proportion

Confidence Interval for the Difference between Two Proportions


Example 20.2: Men versus women living with their parents

The Problem: A surprising number of young adults (ages 19 to 25) still live in their parents' home. A random sample by the National Institutes of Health included 2253 men and 2629 women in this age group. The survey found that 986 of the men and 923 of the women lived with their parents. Is this good evidence that different proportions of young men and young women live with their parents? A 95% confidence interval is constructed for the difference between the proportions of men and women living with their parents.

1. Create the data.
a. Click on Variable View.
b. Enter the variables n1, n2, p1, p2. All four should have the Numeric type.
c. Change the number of decimals for p1 and p2 to 4.
d. Click on Data View.
e. In row 1, type the values 2253, 2629, 0.4376, and 0.3511 (p1 = 986/2253 and p2 = 923/2629).


2. Compute the standard error.
a. Click on Transform.
b. Click on Compute Variable.
c. Under Target Variable, type SE. The equation for the standard error can be found on page 525 of the textbook and is entered under Numeric Expression. For the usual large-sample standard error of a difference in proportions, the expression is SQRT(p1*(1-p1)/n1 + p2*(1-p2)/n2).


The standard error is shown to be approximately 0.014. The standard error is usually very small when proportions are estimated from large samples such as these.

3. Calculate the 95% confidence interval. Remember to use z* = 1.96.
a. Click on Transform.
b. Click on Compute Variable.
c. Under Target Variable, type lowerbound.
d. Under Numeric Expression, type (p1 - p2) - 1.96 * SE.
e. Click on OK.
f. Repeat steps a-e, except type upperbound instead of lowerbound, and change the Numeric Expression to (p1 - p2) + 1.96 * SE.


The output shows the lowerbound and upperbound as .059 and .114. The 95% confidence interval for the difference between the proportions of young men and young women living with their parents is (.059, .114). Therefore, we are 95% confident that the percentage of young men living with their parents is 5.9 to 11.4 percentage points higher than that of young women.
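The same arithmetic can be checked outside SPSS. The following is a minimal Python sketch, assuming the usual large-sample (unpooled) standard error for a difference in proportions; the function name two_proportion_ci is illustrative and does not come from the text.

```python
from math import sqrt


def two_proportion_ci(p1, n1, p2, n2, z_star=1.96):
    """Large-sample confidence interval for p1 - p2 (hypothetical helper)."""
    # Unpooled standard error of the difference between two sample proportions.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return se, diff - z_star * se, diff + z_star * se


# Values typed into row 1 of the SPSS data sheet in Example 20.2.
se, lower, upper = two_proportion_ci(0.4376, 2253, 0.3511, 2629)
print(f"SE = {se:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

The printed bounds should agree with the lowerbound and upperbound variables computed in SPSS, up to rounding.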

Significance Test for Comparing Proportion


Example 20.4: Choosing a mate

The Problem: A sample of never-married students at two historically black colleges in the South were asked whether they would marry someone from a lower social class than their own. The purpose of this study was to see whether the proportions of men and women who responded yes were significantly different.


1. Create the data.
a. Click on Variable View.
b. Enter the variables n1, n2, success1, success2, p1, and p2.
c. Change the number of decimals for p1 and p2 to 3.
d. Click on Data View.
e. In row 1, type the values 149, 236, 91, 117, 0.611, and 0.496.

2. Compute the pooled sample proportion.
a. Click on Transform.
b. Click on Compute Variable.
c. Under Target Variable, type pooled.
d. In the Numeric Expression, type (success1 + success2) / (n1 + n2).
e. Click on OK.

The pooled sample proportion is approximately 0.54. Now, the standard error must be computed. Note that this equation is long and that it should be entered exactly as shown.

3. Compute the standard error.
a. Click on Transform.
b. Click on Compute Variable.


c. Under Target Variable, type SE.
d. In the Numeric Expression box, type the following: SQRT(pooled*(1-pooled)*((1/n1)+(1/n2))). Make sure the equation is written correctly, then click on OK.

4. Compute the z-variable.
a. Click on Transform.
b. Click on Compute Variable.
c. Under Target Variable, type z.
d. Under Numeric Expression, type (p1 - p2) / SE.
e. Click on OK.


The z-variable is equal to approximately 2.21.

5. Compute the p-value.
a. Click on Transform.
b. Click on Compute Variable.
c. Under Target Variable, type pvalue.
d. Under Numeric Expression, type 2 * (1 - .
e. Under Function group, click on CDF & Noncentral CDF.
f. Under Functions and Special Variables, double-click on Cdfnorm.
g. Choose z as the variable.
h. Complete the function by closing the parentheses, so that the expression reads 2 * (1 - CDFNORM(z)).
i. Click on OK.


The p-value is 0.027. We have strong evidence at the .05 significance level that there is a difference between the proportions of men and women students at historically black colleges in the South who would marry someone from a social class lower than their own.
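Steps 2 through 5 can be mirrored in a few lines of Python as a cross-check. This is only a sketch: the function name two_proportion_z_test is illustrative, and statistics.NormalDist from the Python standard library stands in for the SPSS Cdfnorm function. The rounded proportions 0.611 and 0.496 are passed in just as they are typed into the data sheet.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(n1, n2, success1, success2, p1, p2):
    """Two-sided z test for H0: p1 = p2 (hypothetical helper)."""
    # Step 2: pooled sample proportion.
    pooled = (success1 + success2) / (n1 + n2)
    # Step 3: standard error under the null hypothesis.
    se = sqrt(pooled * (1 - pooled) * ((1 / n1) + (1 / n2)))
    # Step 4: z statistic.
    z = (p1 - p2) / se
    # Step 5: two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, se, z, p_value


pooled, se, z, p_value = two_proportion_z_test(149, 236, 91, 117, 0.611, 0.496)
print(f"pooled = {pooled:.2f}, SE = {se:.4f}, z = {z:.3f}, p-value = {p_value:.3f}")
```

Taking the absolute value of z before calling the CDF keeps the p-value two-sided even when p1 is smaller than p2, a case the SPSS expression 2 * (1 - CDFNORM(z)) does not handle.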

Chapter 20 Exercises
20.1 Who uses instant messaging?
20.3 High school students in action.
20.5 Broken crackers.
20.7 Protecting skiers and snowboarders.
20.17 Truthfulness in online profiles.
20.19 Genetically altered mice.
20.21 Does statistical help make a difference?
20.23 How often are statisticians involved?
20.27 Female and male students.
20.29 How to quit smoking.
20.31 I refuse!
20.35 Using credit cards.


Chapter 20 SPSS Solutions

**NOTE: SPSS does not do inference based on Z distributions, nor does it perform inference on variables that are already summarized. If you really want to use SPSS for these problems, follow the instructions below (you'll basically be using Transform, Compute Variable as a calculator) or use another technology, such as a graphing calculator or another statistics program like Minitab or CrunchIt.
20.1 We'll find a 95% confidence interval for $p_1 - p_2$, where $p_1$ is the proportion of IM primary users in the 18 to 27 age group and $p_2$ is the proportion of IM primary users in the 28 to 39 age group. There are enough successes (IM primary users) in each age group (more than 10), so we can use large-sample methods. From the data given, we have $\hat{p}_1 = 73/158 = 0.462$ and $\hat{p}_2 = 26/143 = 0.182$. The observed difference is $\hat{p}_1 - \hat{p}_2 = 0.462 - 0.182 = 0.28$. The confidence interval formula is given by

$$\hat{p}_1 - \hat{p}_2 \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}.$$

The interval for the difference is 18.0% to 38.0%. We're 95% confident, based on this information, that between 18% and 38% more people in the 18 to 27 age group use IM more often than email, compared to people in the 28 to 39 age group. This difference is statistically significant because the interval does not contain zero (and we have a P-value of 0.000 for the test).

20.3 Let $p_F$ be the proportion of all female high school students who meet the physical activity recommendations and $p_M$ be the proportion of males. We want a confidence interval for $p_F - p_M$. From the data given, we have $\hat{p}_F = 1915/6889 = 0.278$ and $\hat{p}_M = 3078/7028 = 0.438$. The observed difference is $\hat{p}_F - \hat{p}_M = 0.278 - 0.438 = -0.16$.


Both ends of the interval are negative, which means males are more likely to meet the physical activity recommendations than females. Between 13.9% and 18.1% more males meet the recommendations than females, with 99% confidence.
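A 99% interval needs a different critical value than the 1.96 used for 95% intervals. The sketch below shows one way to obtain z* for any confidence level and then reproduce the interval for Exercise 20.3. The helper name critical_z is hypothetical, and statistics.NormalDist is used only as a convenient standard normal table.

```python
from math import sqrt
from statistics import NormalDist


def critical_z(confidence):
    """Two-sided critical value z* for a confidence level such as 0.99."""
    # Leave half of the remaining probability in each tail.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


z_star = critical_z(0.99)

# Counts from Exercise 20.3: females first, then males.
n_f, n_m = 6889, 7028
p_f, p_m = 1915 / n_f, 3078 / n_m

se = sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)
diff = p_f - p_m
lower, upper = diff - z_star * se, diff + z_star * se
print(f"z* = {z_star:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Both bounds come out negative because the difference is taken as females minus males.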

20.5 We need the plus four method here because none of the microwave group of crackers had checking. With two samples, add 1 to the successes in each group and 2 to the sample size in each group. We have pM = 1/ 67 = 0.0149 and pNM = 17 / 67 = 0.2537. The observed difference is 0.2388.

Based on this study, between 13.1% and 34.7% fewer crackers will have checking when microwaved, with 95% confidence. In the actual crackers used, the sample proportions are $\hat{p}_M = 0$ and $\hat{p}_{NM} = 16/65 = 24.6\%$.
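The plus four adjustment described in this exercise can be sketched in Python as follows. The function name plus_four_ci is illustrative; the counts (0 of 65 microwaved crackers and 16 of 65 non-microwaved crackers with checking) are the ones stated in the solution.

```python
from math import sqrt


def plus_four_ci(successes1, n1, successes2, n2, z_star=1.96):
    """Plus four interval for p1 - p2: add 1 success and 2 observations per group."""
    p1 = (successes1 + 1) / (n1 + 2)
    p2 = (successes2 + 1) / (n2 + 2)
    # Standard error uses the adjusted proportions and adjusted sample sizes.
    se = sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    diff = p1 - p2
    return diff, diff - z_star * se, diff + z_star * se


# Non-microwaved crackers as group 1, microwaved crackers as group 2.
diff, lower, upper = plus_four_ci(16, 65, 0, 65)
print(f"difference = {diff:.4f}, interval = ({lower:.3f}, {upper:.3f})")
```

Putting the non-microwaved group first makes the difference positive, which matches the "fewer crackers will have checking when microwaved" wording.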

20.7 We want to know if helmet use is less common among those who have had head injuries. We'll test $H_0: p_{HI} = p_{NHI}$ against $H_a: p_{HI} < p_{NHI}$, where $p$ is the proportion in each group who use helmets. From the data, we compute the estimates needed for the hypothesis test as $\hat{p}_{HI} = 96/578 = 0.1661$, $\hat{p}_{NHI} = 656/2992 = 0.2193$, and $\hat{p} = (96 + 656)/(578 + 2992) = 0.2106$. We now compute the test statistic and its P-value.

The test statistic is z = -2.87 with P-value 0.0021. This study does show that skiers and snowboarders who have had head injuries are less likely to use helmets.
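Because the alternative here is one-sided, the P-value is a single tail area, not twice a tail area. A minimal Python sketch, assuming the pooled standard error used throughout this chapter (the function name is illustrative):

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test_less(successes1, n1, successes2, n2):
    """z test of H0: p1 = p2 against Ha: p1 < p2 (hypothetical helper)."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Lower-tail area: small values of z support Ha: p1 < p2.
    p_value = NormalDist().cdf(z)
    return z, p_value


# Helmet users: 96 of 578 with head injuries, 656 of 2992 without.
z, p_value = one_sided_z_test_less(96, 578, 656, 2992)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

For an alternative of the form p1 > p2, the upper-tail area 1 - NormalDist().cdf(z) would be used.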


20.17 These samples satisfy the large-sample guidelines: there were more than 10 each of successes and failures in each group. The sample proportions are $\hat{p}_1 = 117/170 = 0.6882$ for the younger group and $\hat{p}_2 = 152/317 = 0.4795$ for the older group. Both ends of the interval are positive; younger teens are more likely to have false information in their profiles. Between 12.0% and 29.7% more of the younger teens have false information included than older teens, with 95% confidence.

20.19 Since none in the control group of 18 mice developed tumors, we can't use large-sample methods. After adding 1 to each success count, we have 24 mice with tumors in the group with lowered levels of DNA methylation, and 1 case of a tumor in the control group; after adding 2 to each sample size, there are now 35 mice in the group with lowered levels of DNA methylation and 20 in the control group. If we call the group with lowered levels group 1, we have $p_1 = 24/35 = 0.686$. In the other group, we have $p_2 = 1/20 = 0.05$. To find the 99% confidence interval, we compute the margin of error as

$$z^* \sqrt{\frac{p_1(1 - p_1)}{n_1 + 2} + \frac{p_2(1 - p_2)}{n_2 + 2}}, \quad \text{or} \quad 2.576 \sqrt{\frac{.686 \times .314}{35} + \frac{.05 \times .95}{20}} = 0.238.$$

The proportion of mice that develop tumors is between 39.8 and 87.4 percentage points higher in mice with lowered levels of DNA methylation. Since this interval does not contain 0, the difference is significant at the 1% level.
20.21 We wish to test $H_0: p_H = p_{NH}$ against $H_a: p_H \neq p_{NH}$, where $p$ is the proportion of papers rejected without review in each group. The proportion of papers with statistical help that were rejected is $\hat{p}_H = 293/514 = 0.57$, and the proportion of papers without help that were rejected is $\hat{p}_{NH} = 135/190 = 0.711$. The pooled proportion is $\hat{p} = (293 + 135)/(514 + 190) = 0.608$.


We reject the null hypothesis because the test statistic is z = -3.40 with P-value 0.0007. The observed difference in papers rejected without review is 14 percentage points; papers without statistical help are more likely to be rejected.

20.23 We compute the confidence interval for how much more often papers without statistical help are rejected after rejecting equality of rejection rates in Exercise 20.21.

Both ends of the interval are negative; with 95% confidence, papers that don't have statistical help are rejected between 6.4% and 21.8% more often than papers that did have statistical help.

20.27 We want to know if there is a difference in success by gender. We'll test $H_0: p_M = p_W$ against $H_a: p_M \neq p_W$. From the data given, we have $\hat{p}_W = 23/34 = 0.6765$ and $\hat{p}_M = 60/89 = 0.6742$. The pooled estimate is $\hat{p} = (23 + 60)/(34 + 89) = 0.6748$.

These data do not show a difference in success by gender: 67.6% of females succeed and 67.4% of males succeed. The test statistic is z = 0.02 with P-value 0.9840. (We really didn't need to compute the P-value; a test statistic of z = 0.02 will never be significant.)


20.29 For the patch only (control) group, we have $\hat{p}_C = 40/244 = 0.1639$, and for the treatment group, we have $\hat{p}_T = 87/245 = 0.3551$. The confidence interval is computed from these estimates.

We're convinced the drug helps smokers quit; between 9.2% and 29.1% more will successfully quit with the drug, with 99% confidence.
20.31 We'll test $H_0: p_C = p_H$ against $H_a: p_H > p_C$, where $p$ is the proportion of offers rejected in each group. Be careful: we'll have to add together the offers accepted and rejected for the total number of offers. From the data, we have $\hat{p}_C = 6/38 = 0.1579$ and $\hat{p}_H = 18/38 = 0.4737$. The pooled estimate is $\hat{p} = (6 + 18)/(38 + 38) = 0.3158$.

Computer offers are less likely to be rejected; the test statistic is z = 2.97 with P-value 0.0015.

20.35 Calling the group who made impulse purchases group 1, we have the alternative hypothesis $p_1 \neq p_2$ because we merely want to know if there is a difference in credit card use. Add the Yes and No answers to find the total number of shoppers for each type of purchase. We have $\hat{p}_1 = 13/31 = 0.419$ and $\hat{p}_2 = 35/66 = 0.530$. For the hypothesis test, we also need the pooled proportion $\hat{p} = (13 + 35)/(31 + 66) = 0.495$. We then compute the test statistic and its P-value. With a test statistic of z = -1.02 and P-value of 0.3077, we cannot say there is a difference in credit card use between planned and impulse purchases.


To find the 95% confidence interval, we compute the margin of error as

$$z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}, \quad \text{or} \quad 1.96 \sqrt{\frac{.419(1 - .419)}{31} + \frac{.530(1 - .530)}{66}} = 0.211.$$

The 95% confidence interval for the difference in the proportion of credit card purchases is -32.2% to 10.0%; since this interval includes 0, we again cannot say there is a difference in credit card use for the different types of purchases.
