
Math 1040 Skittles Term Project

Introduction

The purpose of the study is to determine the proportions of candy colors in bags of
Skittles.
Procedures: 111 students participated. Each student purchased one 2.17-ounce bag of
Original Skittles, counted the number of red, orange, yellow, green and purple candies in
the bag, and recorded the data. All of the data were then collected in an Excel sheet for
further analysis.
Goals: Compare the expected proportions of Skittles colors with the proportions observed
in the student data. Understand random sampling and inference from a sample to a
population. Test whether the students' data generalize to the population.

Data Collection

Number of Skittles in different colors in my bag


Color    Red   Orange   Yellow   Green   Purple
Count    10    14       14       14      9

Proportion (or percentage) of the Skittles of each color


I expect to see about 20% for each color because the candies are put into the bags at
random, so each of the five colors should be equally likely.
Proportion of each color within the overall sample gathered by the class

                       Red       Orange    Yellow    Green     Purple
Expected proportion    20%       20%       20%       20%       20%
Observed proportion    20.060%   20.300%   21.108%   18.638%   19.895%
Organizing and Displaying Categorical Data: Colors
Random sampling means that each individual in the population has an equal chance of being
selected for the sample. Each student purchased a bag of Skittles at a different location
and time, so the bags can be treated as randomly chosen for the study. On that basis, the
class data represent a random sample.
The population is all of the Skittles bags that are produced in the Salt Lake City area.
Because random sampling maximizes the likelihood that the subjects included in the sample
are representative of the population, our sample is probably representative of the
individuals in the population.

                        Red       Orange    Yellow    Green     Purple    Total
My bag (count)          10        14        14        14        9         61
My bag (proportion)     16.393%   22.951%   22.951%   22.951%   14.754%
Class totals (count)    1340      1356      1410      1245      1329      6680
Class (proportion)      20.060%   20.300%   21.108%   18.638%   19.895%
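
The proportions in the table can be reproduced from the raw counts. The short Python sketch below is only an illustration: the dictionary and function names are made up for this example, and the counts are copied from the table above.

```python
# Color counts copied from the table above.
my_bag = {"red": 10, "orange": 14, "yellow": 14, "green": 14, "purple": 9}
class_totals = {"red": 1340, "orange": 1356, "yellow": 1410,
                "green": 1245, "purple": 1329}


def proportions(counts):
    """Return each color's share of the total, as a percentage."""
    total = sum(counts.values())
    return {color: 100 * n / total for color, n in counts.items()}


my_pct = proportions(my_bag)
class_pct = proportions(class_totals)

# Print the two sets of percentages side by side.
for color in my_bag:
    print(f"{color:<7} my bag {my_pct[color]:6.2f}%   class {class_pct[color]:6.2f}%")
```

Dividing each count by its own total is what makes a single 61-candy bag comparable with the 6680-candy class sample.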

The graphs reflect what I expected to see. I expected a proportion of about 20% for each
color, and the observed class proportions are all close to 20%. There are some outliers in
the number of candies per bag, namely bags with more than 65.5 or fewer than 53.5 candies.
If the large bags had a very different color distribution (for example, 50% red) and there
were many of them, the class result could be skewed toward red. The pie chart would then
show a larger red slice and the Pareto chart a taller red bar. Because of these outliers,
the summary statistics alone may not be enough to conclude that the sample generalizes to
the population. The distribution of colors in my own bag is 16.4%, 23.0%, 23.0%, 23.0% and
14.8% for red, orange, yellow, green and purple, respectively, which is not very close to
either the expected distribution or the observed distribution in the class data. I
attribute this to my bag being only a single observation. When Skittles are made, they are
packed randomly into many bags. One bag can easily have a different distribution than the
population, but when the colors are tallied over a large number of bags, the result
approximates the distribution of the population.
Organizing and Displaying Quantitative Data: the Number of Candies per Bag

For total number of candies in each bag


Mean = 60.2
Standard deviation = 7.00
Min = 35
1st Quartile = 58
Median = 59
3rd Quartile = 61
Max = 97
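
The outlier cutoffs of 53.5 and 65.5 candies mentioned earlier are consistent with the usual 1.5 × IQR rule if 58 and 61 are taken as the first and third quartiles. That the cutoffs were obtained this way is an assumption; the sketch below simply checks the arithmetic, and its variable and function names are illustrative.

```python
# Quartiles taken from the summary above (61 is assumed to be the third quartile).
q1, q3 = 58, 61

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # totals below this count as outliers
upper_fence = q3 + 1.5 * iqr     # totals above this count as outliers


def is_outlier(total):
    """Flag a bag total that falls outside the 1.5 x IQR fences."""
    return total < lower_fence or total > upper_fence


print(lower_fence, upper_fence)
print(is_outlier(35), is_outlier(61), is_outlier(97))
```

Under this rule the smallest bag (35) and the largest bag (97) are both outliers, while my bag (61) is not.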

Histogram with width of 10

Histogram of width of 5
The shape of the distribution is relatively symmetric and bell-shaped. The data are
concentrated in the center, with most bags containing between 55 and 65 candies. The
graphs reflect what I expected to see, because I expected every bag to hold a similar
number of candies.
My bag has a total of 61 candies, which is close to the mean and the median of the class
data. In other words, my bag is typical of the overall data collected.

The number of candies from my own bag = 61


Total number of bags in the sample = 111

Difference between categorical and quantitative data

Categorical data are data that can be placed into different categories. Their values are
labels or characteristics of something, so they describe a quality. Bar graphs and pie
charts make sense for categorical data because the data can be separated into groups and
the frequency or proportion of each group can be compared. Stemplots, histograms, and
boxplots do not make sense for this kind of variable because the values are labels rather
than numbers, so quantities the graphs rely on, such as the mean and median, cannot be
computed. Proportions or percentages make sense because they can be used to compare the
relative sizes of the groups. The mean, standard deviation and 5-number summary do not
make sense because the values are not numbers.
Quantitative data are data that have numerical values and usually represent measurements
or counts. They describe a quantity. Stemplots, histograms, and boxplots make sense for
quantitative data because the values are numbers and can be grouped into intervals along a
numeric scale. Bar graphs and pie charts do not make sense because the values lie on a
numeric scale rather than in distinct categories. The mean, standard deviation and
5-number summary make sense because the values are numbers. Proportions or percentages do
not make sense because the numbers are not naturally divided into categories for which a
proportion could be calculated.

Confidence Interval Estimates

A confidence interval expresses the uncertainty in an estimate computed from a sample. It
gives a range of plausible values for the true population parameter. A 95% confidence
level, for example, means that if we repeatedly drew samples and built an interval from
each one, about 95% of those intervals would contain the true population value.
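
One way to see what a confidence level means is to simulate many samples from a population whose proportion is known and count how often the interval captures it. The sketch below is only an illustration: the true proportion of 0.20, the sample size, the number of trials, and the function names are all assumptions chosen for the example, not values from the class data.

```python
import random
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin


def coverage(true_p, n, confidence, trials, seed=0):
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one sample of n candies; each is "yellow" with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = proportion_interval(successes, n, confidence)
        hits += low <= true_p <= high
    return hits / trials


print(coverage(true_p=0.20, n=500, confidence=0.95, trials=2000))
```

The printed fraction should land near 0.95. The confidence level describes how often the procedure succeeds over many samples, not the chance that any one computed interval is correct.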

A 99% confidence interval estimate for the true proportion of yellow candies

n = 6680 (total number of candies)
x = 1410 (total number of yellow candies)
α = 0.01
z_{α/2} = z_{0.005} = 2.576
p̂ = x/n = 1410/6680 = 0.2111
E = z_{α/2} × sqrt(p̂(1 - p̂)/n) = 2.576 × sqrt(0.2111 × (1 - 0.2111)/6680) = 0.01286
Lower bound = p̂ - E = 0.2111 - 0.01286 = 0.19824
Upper bound = p̂ + E = 0.2111 + 0.01286 = 0.22396
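
The same calculation can be checked in Python. This is a sketch using only the standard library. It computes the critical value from the normal distribution instead of reading it from a table and keeps the sample proportion unrounded, so the bounds may differ from the hand calculation in the fifth decimal place. The variable names are illustrative.

```python
from statistics import NormalDist

n = 6680           # total number of candies
x = 1410           # total number of yellow candies
confidence = 0.99

# Critical value z_{alpha/2}; about 2.576 for 99% confidence.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

p_hat = x / n                                    # sample proportion of yellow
margin = z * (p_hat * (1 - p_hat) / n) ** 0.5    # margin of error E
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.4f}, E = {margin:.5f}")
print(f"99% CI: ({lower:.4f}, {upper:.4f})")
```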

A 95% confidence interval estimate for the true mean number of candies per bag

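n = 111 (total number of bags)
s = 7.000257
α = 0.05
df = n - 1 = 110 (the critical value below is the t-table entry for df = 100, the closest
row not exceeding 110)
t_{α/2} = t_{0.025} = 1.984
mean = 6680/111 = 60.1801
E = t_{α/2} × s/sqrt(n) = 1.984 × 7.000257/sqrt(111) = 1.3182
Lower bound = mean - E = 60.1801 - 1.3182 = 58.8619
Upper bound = mean + E = 60.1801 + 1.3182 = 61.4983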
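
A short Python sketch of the same interval follows. The critical value 1.984 is hard-coded from the calculation above because the standard library has no t-distribution quantile function; if SciPy is available, `scipy.stats.t.ppf(0.975, df)` would compute it instead. The variable names are illustrative.

```python
n = 111                 # total number of bags
total_candies = 6680    # total number of candies in the class sample
s = 7.000257            # sample standard deviation of candies per bag
t_crit = 1.984          # t critical value used in the text (hard-coded)

mean = total_candies / n
margin = t_crit * s / n ** 0.5    # margin of error E
lower, upper = mean - margin, mean + margin

print(f"mean = {mean:.4f}, E = {margin:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

Because the mean is not rounded before the bounds are computed, the results agree with the hand calculation to about three decimal places.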

We are 99% confident that the true value of population proportion of yellow Skittles
is between 0.19824 and 0.22396.

We are 95% confident that the true value of population mean number of candies per
Skittles bag is between 58.8619 and 61.4983.

Reflection

The interval estimates require that the sample come from an approximately normal
distribution, which ours probably does not. The sample has to be a random sample, which
ours is. The sample has to be smaller than 10% of the true population, which ours is.
One possible source of error is sampling error, because we collected samples only in Utah.
If the inference were made to the whole nation, the Utah data might not generalize,
because different states may have different manufacturers making Skittles. Certain age
groups may also tend to buy Skittles at specific stores.
The study would be improved by gathering data from different schools and universities
across states, and from more age groups.
Conclusion: the observed proportions of Skittles colors are similar to the expected
proportions.
e-Portfolio

I found it fascinating that even though individual bags have different total numbers of
candies and different color proportions, when a large amount of data is collected, the
observed proportions of Skittles colors can be very close to the expected proportions.

Data analysis and the ability to understand data and graphs are essential when reading
research papers. Since I am a nursing major, I will take classes that may require me to
read about interventions and innovations that are useful for treating patients. Having
done this project, I will understand that reading better. Some of the questions required
deep thinking, and some I did not know how to solve at first. This helped me develop the
skill of searching for answers in past notes and textbooks. It also helped me develop the
skill of screening useful data from the internet and eliminating information that is
unrelated to the topic.

The real world is full of applications of math. I had never thought that the colors of
Skittles could be used to formulate a math project. This project broadened my thinking: I
now look for more applications of math and for study designs that could investigate them.
For example, the frequency of people entering a store or the items people purchase can
both be treated as statistical problems.
