
Natasha Nielsen

Math 1040
Zeph Smith
T/R 10:00-11:20
Report Introduction
Through active learning techniques, students are able to demonstrate an understanding of key concepts
learned in Math 1040. The Skittles term project keeps students engaged throughout the term and gives
them hands-on practice with the basic concepts of statistics. During the project, students follow the
three steps of statistics: preparing, analyzing, and concluding. In the preparation phase of this
project, each student in the class provides data, which is then compiled into a larger sample. That
sample is analyzed by way of an observational study, and the results are used to form a conclusion
about the larger population of all Skittles. In this case, the data from the convenience sample is the
number of each color of Skittle found in a 2.17 ounce bag of Skittles candy.
Categorical Data

[Pie chart: Proportion of Each Skittle Color. Red 0.205, Orange 0.183, Yellow 0.195, Green 0.207, Purple 0.210]

Figure 1: Pie Chart showing Color Distribution of Skittles

[Pareto chart: Proportion of Skittle Colors, by Color of Skittle. Purple 0.210, Green 0.207, Red 0.205, Yellow 0.195, Orange 0.183]

Figure 2: Pareto Chart showing Color Distribution of Skittles

After compiling the data, it appears that the distribution of the colors is almost uniform. This can be
seen in both the pie chart and the Pareto chart, which show that the proportion of each color is nearly
identical to that of every other color, ranging from 0.183 for orange to 0.210 for purple. It should
also be noted that the larger the sample size, the closer the sample proportions should be to the true
population proportions. While every bag will have a slightly different count of each color, most bags
are within normal variance. There are a few outliers; however, the bag I selected was within one
standard deviation of the mean in every color.
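
The proportions shown in the two charts can be reproduced from the class totals for each color. The snippet below is a minimal sketch that assumes the totals of 500, 446, 474, 503, and 512 from the "Total Candies in Class" row of the class data; the variable names are illustrative only.

```python
# Class totals per color, taken from the "Total Candies in Class" row (assumed order:
# Red, Orange, Yellow, Green, Purple).
color_totals = {"Red": 500, "Orange": 446, "Yellow": 474, "Green": 503, "Purple": 512}

grand_total = sum(color_totals.values())  # all candies in the class sample

# Proportion of each color, rounded to three decimals as in the charts.
proportions = {
    color: round(count / grand_total, 3) for color, count in color_totals.items()
}

# A Pareto chart orders the categories from most to least frequent.
pareto_order = sorted(proportions, key=proportions.get, reverse=True)

for color in pareto_order:
    print(f"{color:<8}{proportions[color]:.3f}")
```

Sorting by proportion gives the same left-to-right order as the bars in the Pareto chart.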

Counts are listed in the column order Red, Orange, Yellow, Green, Purple. Rows with fewer than five
counts are missing one or more values, so those counts cannot be matched to specific colors.

Name                      Color counts          Total Candies in Bag
Respondent 1              10, 12, 14, 17        61
Respondent 2              13, 20, 11            61
Respondent 3              11, 13, 12, 10, 15    61
Respondent 4              13, 11, 12, 13, 10    59
Respondent 5              14, 16, 13            61
Respondent 6              15, 10, 13, 19        62
Respondent 7              13, 12, 14            45
Respondent 8              18, 14, 11, 13        62
Respondent 9              10, 11, 16, 11        56
Respondent 10             12, 13, 10, 13, 13    61
Respondent 11             22, 17, 12, 15        75
Respondent 12             11, 11, 18, 13, 14    67
Respondent 13             15, 15, 12, 13        64
Respondent 14             26, 24, 21, 22, 21    114
Respondent 15             14, 12, 17            61
Respondent 16             12, 14, 12, 10, 11    59
Respondent 17             12, 13, 11, 13        58
Respondent 18             14, 12, 14, 15        58
Respondent 19             15, 10, 13, 19        62
Respondent 20             17, 11, 10, 15        62
Respondent 21             14, 14, 16            60
Respondent 22             12, 17, 14            60
Respondent 23             11, 12, 14, 17        62
Respondent 24             15, 17, 18            61
Respondent 25             21, 24, 22, 19, 20    106
Respondent 26             19, 10, 14, 11        62
Respondent 27             18, 11, 10, 12        60
Respondent 28             10, 13, 17, 10        58
Respondent 29             14, 15, 10, 13        60
Respondent 30             15, 14, 14, 10        60
Respondent 31             16, 11, 17            56
Respondent 32             14, 12, 15, 11, 10    62
Respondent 33             13, 11, 11, 15        59
Respondent 34             10, 15, 10, 10, 14    59
Respondent 35             16, 12, 16            62
Respondent 36             10, 18, 16            60
Respondent 37             22, 15, 28, 20, 13    98
Respondent 38             11, 14, 14, 14        61
Total Candies in Class    500, 446, 474, 503, 512

Table 1: Table of Class Data

Name             Red   Orange   Yellow   Green   Purple   Total Candies in Bag
Respondent 32    14    12       15       11      10       62

Table 2: Table of Personal Data

Quantitative Data

[Chart: Total Number of Candies for Each Color. x-axis: Color of Skittle (Red, Orange, Yellow, Green, Purple); y-axis: Number of Skittles]

Figure 3: Total Number of Candies for Each Color Histogram

Figure 4: Class 5 Number Summary Box Plot

5 Number Summary for Each Color

Statistic   Red     Orange   Yellow   Green   Purple
Min         0                3
Q2          13      11       12       13      13.5
Q3          15      14       14       16      15
Max         26      24       28       22      21
STD Dev     4.89    4.47     4.72     3.51    3.52
n           38      38       38       38      38

Q1: 10, 10, 11, 11 (one of the five values is missing, so these cannot be matched to specific colors)

Table 3: 5 Number Summary for Each Color Data

                    Red     Orange   Yellow   Green   Purple
Ratio of Candies    0.205   0.183    0.195    0.207   0.210
% of Each Color     20.5    18.3     19.5     20.7    21.0
Total Candies       500     446      474      503     512

Total Bags in Class             38
Avg Candies/bag                 61
Grand Total Candies in Class    2435

Table 4: Ratio, Total for Each Color, and Total Candies and Bags Data

Name             Red   Orange   Yellow   Green   Purple   Total Candies in Bag
Respondent 32    14    12       15       11      10       62

Table 5: Personal Candies Data

After examining the data, it appears that there was more variation in the red and yellow Skittles
(standard deviations of 4.89 and 4.72), followed by the orange Skittles (4.47). The green Skittles had
the least variation (3.51), followed by the purple Skittles (3.52). Looking more closely at the maximum
and minimum values for each color helps explain why the variation differs so much between colors. Red
had a maximum value of 26 and a minimum value of 0. Likewise, yellow had a maximum value of 28 and a
minimum value of 3. This is not what one would expect if the intent of the distributor is to have equal
amounts of all colors in a 2.17 ounce bag of Skittles. The bag I selected, however, was within one
standard deviation of the mean in every color.
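
The claim about the selected bag can be checked numerically. This is a minimal sketch under two assumptions: the standard deviations in the summary table are listed in the order Red, Orange, Yellow, Green, Purple, and the selected bag is the Respondent 32 row with counts 14, 12, 15, 11, 10 in that same order. Each class mean is taken as the color total divided by 38 bags.

```python
N_BAGS = 38

# Class totals and standard deviations per color (assumed column order:
# Red, Orange, Yellow, Green, Purple).
color_totals = {"Red": 500, "Orange": 446, "Yellow": 474, "Green": 503, "Purple": 512}
std_devs = {"Red": 4.89, "Orange": 4.47, "Yellow": 4.72, "Green": 3.51, "Purple": 3.52}

# Counts for the selected bag (assumed to be the Respondent 32 row).
my_bag = {"Red": 14, "Orange": 12, "Yellow": 15, "Green": 11, "Purple": 10}


def within_one_sd(color):
    """Return True if the bag's count is within one standard deviation of the class mean."""
    mean = color_totals[color] / N_BAGS
    return abs(my_bag[color] - mean) <= std_devs[color]


results = {color: within_one_sd(color) for color in color_totals}

for color, ok in results.items():
    mean = color_totals[color] / N_BAGS
    print(f"{color:<8} mean={mean:5.2f} bag={my_bag[color]:>2} within 1 SD: {ok}")
```

Under these assumptions purple is the closest call: the bag's count of 10 sits about 3.47 below the class mean, just inside the standard deviation of 3.52.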
Reflection
The difference between categorical and quantitative data is that categorical data consists of names or
labels that do not represent counts or measurements, whereas quantitative data consists of numbers that
represent counts or measurements (Triola, 2014). For categorical data it makes sense to use graphs such
as bar graphs, pie charts, and Pareto charts. Conversely, for quantitative data it makes sense to use
graphs such as histograms, normal quantile plots, time series graphs, scatterplots, frequency polygons,
dot plots, and stem-and-leaf plots. This is due in large part to the differences in the data being
graphed. As explained above, categorical data does not represent counts or measurements; therefore,
even if the data can be arranged in some order, arithmetic such as subtraction and division is
meaningless. Quantitative data, by contrast, represents counts or measurements, so depending on the
type of data, whether discrete or continuous, arithmetic such as addition, subtraction, multiplication,
and division makes sense and can be calculated for such data.
Confidence Interval Estimates
A confidence interval is a range of values, calculated from sample data, that is used to estimate the
true value of a population parameter such as a proportion, mean, or standard deviation (Triola, 2014).

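99% confidence interval for the true proportion of yellow candies:

Where n = 2435, $\hat{p} = .195$, $\hat{q} = .805$, $z_{\alpha/2} = 2.576$

$E = z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n} = 2.576\sqrt{(.195)(.805)/2435} = .0207$

$\hat{p} - E < p < \hat{p} + E$

.195 - .0207 = .174
.195 + .0207 = .216

$.174 < p < .216$

We can say with 99% confidence that the true proportion of yellow candies is between .174 and .216.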
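
The interval above can be checked with a short calculation. This is a minimal sketch using only the Python standard library; the function name `proportion_ci` is illustrative, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence):
    """Return (lower, upper, margin of error) for a population proportion."""
    # Two-tailed critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# Values from the report: 2435 candies, sample proportion of yellow = .195.
low, high, margin = proportion_ci(0.195, 2435, 0.99)
print(f"E = {margin:.4f}, interval = ({low:.3f}, {high:.3f})")
```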

95% confidence interval for the true mean number of candies per bag:

Where $\bar{x} = 61$, n = 38, s = 13.2, df = 37, $t_{\alpha/2} = 2.026$

$E = t_{\alpha/2} \cdot s/\sqrt{n} = 2.026 \cdot 13.2/\sqrt{38} = 4.34$

61 - 4.34 = 56.66
61 + 4.34 = 65.34

$56.66 < \mu < 65.34$

We can say with 95% confidence that the true mean number of candies per bag is between 56.66 and
65.34.
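
The same interval can be reproduced in a few lines. This sketch takes the critical value 2.026 (df = 37) directly from the report, because the Python standard library has no t-distribution; the function name `mean_ci` is illustrative.

```python
from math import sqrt


def mean_ci(x_bar, s, n, t_crit):
    """Return (lower, upper) for a population mean when sigma is unknown."""
    margin = t_crit * s / sqrt(n)  # margin of error E
    return x_bar - margin, x_bar + margin


# Values from the report: sample mean 61, s = 13.2, n = 38, t critical value 2.026.
low, high = mean_ci(61, 13.2, 38, 2.026)
print(f"({low:.2f}, {high:.2f})")
```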

98% confidence interval for the standard deviation of the number of candies per bag:

Where n = 38, s = 13.2, df = 37

$\sqrt{(n-1)s^2/\chi^2_R} < \sigma < \sqrt{(n-1)s^2/\chi^2_L}$

$11.26 < \sigma < 20.76$

We can say with 98% confidence that the standard deviation of the number of candies per bag is
between 11.26 and 20.76.
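
The calculation behind this interval can be sketched as follows. The report does not show the chi-square critical values it used, so the values below are an assumption: 50.892 and 14.953 are the table values for 30 degrees of freedom (the nearest lower row in many printed tables), chosen only because they reproduce the reported interval. Exact critical values for df = 37 would give a different interval.

```python
from math import sqrt


def std_dev_ci(s, n, chi2_right, chi2_left):
    """Return (lower, upper) for a population standard deviation."""
    numerator = (n - 1) * s ** 2
    return sqrt(numerator / chi2_right), sqrt(numerator / chi2_left)


# s and n are from the report. The critical values are ASSUMED (df = 30 table row),
# not stated in the report.
CHI2_RIGHT = 50.892  # right-tail critical value, area 0.01
CHI2_LEFT = 14.953   # left-tail critical value, area 0.99

low, high = std_dev_ci(13.2, 38, CHI2_RIGHT, CHI2_LEFT)
print(f"({low:.2f}, {high:.2f})")
```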
Hypothesis Tests
A hypothesis test is a procedure for testing a claim about a property of a population (Triola, 2014).

Claim: 20% of all Skittles are red.

$H_0: p = .20$
$H_1: p \neq .20$

Where n = 2435, x = 500 (the observed number of red candies), $\hat{p} = .205$, $\alpha = .05$

1-PropZTest:
z = 0.66
P = .510
Decision: Fail to Reject $H_0$

The P-value is larger than the alpha value; therefore, there is not sufficient evidence to reject the
null hypothesis that 20% of all Skittles are red.
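
A one-proportion z test of this claim can be sketched with the standard library. The sketch assumes x is the observed class count of red candies, 500 out of 2435, rather than the count expected under the claim; using the expected count would force the test statistic to zero. The function name `one_prop_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(x, n, p0):
    """Two-tailed z test of H0: p = p0. Returns (z, p_value)."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Assumed input: 500 observed red candies out of 2435.
z, p_value = one_prop_z_test(500, 2435, 0.20)
alpha = 0.05
decision = "Reject H0" if p_value < alpha else "Fail to reject H0"
print(f"z = {z:.2f}, P = {p_value:.3f}, {decision}")
```

With these inputs the P-value is well above .05, so the decision is to fail to reject the null hypothesis.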

Claim: The mean number of candies in a bag of Skittles is 55.

$H_0: \mu = 55$
$H_1: \mu \neq 55$

Where n = 38, $\bar{x} = 61$, $\alpha = .01$

T-Test:
t = 2.80
P = .008
Decision: Reject $H_0$

The P-value is smaller than the alpha value; therefore, there is sufficient evidence to reject the claim
that the mean number of candies in a bag of Skittles is 55.
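
The test statistic can be verified by hand. This sketch assumes the sample standard deviation s = 13.2 from the confidence interval section and computes only the t statistic; the P-value of .008 is taken from the report, since the Python standard library has no t-distribution.

```python
from math import sqrt


def one_sample_t(x_bar, mu0, s, n):
    """Return the t statistic for a test of H0: mu = mu0."""
    return (x_bar - mu0) / (s / sqrt(n))


# Sample mean 61, claimed mean 55, s = 13.2 (assumed from the interval section), n = 38.
t_stat = one_sample_t(61, 55, 13.2, 38)
print(f"t = {t_stat:.2f}")
```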
Reflection
In order to construct confidence interval estimates and conduct hypothesis tests, certain criteria need
to be met. First, the sample should be a simple random sample. Second, the data should fit the
conditions for a binomial distribution. Lastly, there must be at least five successes and at least five
failures. This can be checked by multiplying n by p (successes) and n by q (failures) to ensure that
both products are at least five. All three intervals and hypothesis tests above meet all of these
requirements.
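
The success and failure check described above is simple to automate. This is a minimal sketch; the function name `meets_success_failure` and the example inputs are illustrative.

```python
def meets_success_failure(n, p, minimum=5):
    """Return True if both n*p (successes) and n*q (failures) are at least `minimum`."""
    q = 1 - p
    return n * p >= minimum and n * q >= minimum


# Yellow proportion interval: n = 2435, p = .195
print(meets_success_failure(2435, 0.195))
# Red proportion test: n = 2435, p = .20
print(meets_success_failure(2435, 0.20))
```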
When reviewing the data gathered for the Skittles project, there appear to be either a few outliers or,
more likely, errors in which the wrong size of Skittles bag was purchased. Another source of error is
that this was a convenience sample, which means it does not necessarily represent the true population
proportions of Skittles. Other errors could also affect the results, such as a sample size that is too
small or incorrect data entry by other students. The sampling method could be improved by removing the
outliers, increasing the sample size, and ensuring a simple random sample.
