Introduction
The purpose of the study is to determine the proportion of each candy color in Skittles bags.
Procedures: 111 students participated. Each student purchased one 2.17-ounce bag of Original Skittles, counted the red, orange, yellow, green, and purple candies in the bag, and recorded their own data. All of the data were then collected in an Excel spreadsheet for further analysis.
Goals: Compare the expected proportion of each Skittles color with the proportion observed in the student data. Understand random sampling and inference from a sample to a population. Test whether the students' data can be generalized to the population.
Data Collection
Color                     Red      Orange   Yellow   Green    Purple
Expected proportion (%)   20       20       20       20       20
Observed proportion (%)   20.060   20.300   21.108   18.638   19.895
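To make the comparison between expected and observed proportions concrete, the short sketch below computes the difference for each color in percentage points. The percentages are copied from the Data Collection table; the column order (red, orange, yellow, green, purple) is an assumption taken from the order the report uses later for its color breakdown.

```python
# Percentages from the Data Collection table.
# Assumed column order: red, orange, yellow, green, purple.
colors = ["red", "orange", "yellow", "green", "purple"]
expected = [20.0, 20.0, 20.0, 20.0, 20.0]
observed = [20.060, 20.300, 21.108, 18.638, 19.895]

# Observed minus expected, in percentage points:
# positive means the class found more of that color than expected.
differences = {c: o - e for c, o, e in zip(colors, observed, expected)}

for color, diff in differences.items():
    print(f"{color:>6}: {diff:+.3f} percentage points")
```

All differences are within about 1.4 percentage points of zero, which matches the report's reading that the observed proportions are close to 20%.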
Organizing and Displaying Categorical Data: Colors
Random sampling means that each individual in the population has an equal chance of being selected for the sample. Each student purchased a Skittles bag at a different location and time, so the bags were chosen for the study essentially at random. Therefore, the class data represent a random sample.
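As a minimal sketch of the definition above, Python's `random.sample` draws a simple random sample in which every individual has the same chance of being selected. The population of 10,000 numbered bags is purely hypothetical; in the actual study, students bought bags in stores rather than drawing them from a list.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct individuals; every individual has the same chance of selection."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(population, k)

# Hypothetical population: 10,000 numbered Skittles bags (not real data).
bags = list(range(1, 10_001))
sample = simple_random_sample(bags, 111, seed=1)  # 111 bags, one per student
print(len(sample), sample[:5])
```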
The population would be all Skittles bags sold in the Salt Lake City area. Because random sampling maximizes the likelihood that the sample is representative of the population, our sample is probably representative of the bags in the population.
Color                   Red       Orange    Yellow    Green     Purple    Total
My bag (count)          10        14        14        14        9         61
My bag proportion       16.393%   22.951%   22.951%   22.951%   14.754%
Class proportion        20.060%   20.300%   21.108%   18.638%   19.895%
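The "My Bag" proportions follow from dividing each color count by the bag total. A small sketch, using the counts 10, 14, 14, 14, 9 from the table and assuming the color order red, orange, yellow, green, purple that the report states later:

```python
# Counts from "My Bag" (assumed order: red, orange, yellow, green, purple).
my_bag = {"red": 10, "orange": 14, "yellow": 14, "green": 14, "purple": 9}

total = sum(my_bag.values())  # total candies in the bag
percentages = {color: 100 * count / total for color, count in my_bag.items()}

for color, pct in percentages.items():
    print(f"{color:>6}: {pct:.3f}%")
```

The printed values reproduce the 16.393%, 22.951%, and 14.754% entries in the table, and the percentages sum to 100%.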
The graphs reflect what I expected to see: I expected a proportion of about 20% for each color, and the observed class proportions are all close to 20%. There are some outliers, namely bags with more than 65.5 or fewer than 53.5 candies. If the large bags had a very different color distribution (for example, 50% red) and there were many of them, the pooled result would be skewed toward red, so the pie chart would show a larger red slice and the Pareto chart a taller red bar. Because of such outliers, the summary statistics alone may not be enough to conclude that the sample generalizes to the population.
The color distribution in my bag is 16.4%, 23.0%, 23.0%, 23.0%, and 14.8% for red, orange, yellow, green, and purple, respectively, which differs noticeably from both the expected distribution and the observed class distribution. I believe this is because my bag is only a single observation. When Skittles are made, they are packed randomly into many bags, so one bag can have a different distribution from the population; but when the colors are pooled over a large number of bags, the result approximates the population distribution.
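The argument above (a single bag can stray from 20% per color, while many bags pooled together approach the population distribution) can be checked with a small simulation. This is a sketch under stated assumptions: each candy's color is drawn independently with equal probability 1/5, and bag sizes between 55 and 65 are illustrative, not the class data.

```python
import random
from collections import Counter

COLORS = ["red", "orange", "yellow", "green", "purple"]

def fill_bag(rng, size):
    """Hypothetical packing model: each candy's color is drawn uniformly at random."""
    return Counter(rng.choice(COLORS) for _ in range(size))

def proportions(counts):
    """Share of each color among all candies in `counts`."""
    total = sum(counts.values())
    return {c: counts[c] / total for c in COLORS}

rng = random.Random(2024)  # fixed seed so the run is reproducible
bags = [fill_bag(rng, rng.randint(55, 65)) for _ in range(111)]

one_bag = proportions(bags[0])              # a single bag can be far from 20%
pooled = proportions(sum(bags, Counter()))  # all 111 bags combined

print("one bag:", {c: round(p, 3) for c, p in one_bag.items()})
print("pooled: ", {c: round(p, 3) for c, p in pooled.items()})
```

With roughly 6,600 simulated candies, each pooled proportion typically lands within about one percentage point of 0.20, while individual bags vary much more.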
Organizing and Displaying Quantitative Data: the Number of Candies per Bag
Histogram of the number of candies per bag (bin width 5)
The distribution is relatively symmetric and bell-shaped, with the data concentrated in the center; most bags contain about 55 to 65 candies. The graph reflects what I expected to see, because I expected every bag to contain a similar number of candies.
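A histogram with bin width 5 groups the per-bag counts into intervals such as 55-59 and 60-64 and counts how many bags fall in each. The sketch below shows that binning step; the candy counts are hypothetical values for illustration, not the class data, and the bin labels assume integer counts.

```python
from collections import Counter

def histogram_bins(values, width=5, start=None):
    """Count values in half-open bins [start, start+width), [start+width, start+2*width), ..."""
    if start is None:
        start = (min(values) // width) * width  # align the first bin to a multiple of width
    counts = Counter((v - start) // width for v in values)  # bin index for each value
    return {(start + i * width, start + (i + 1) * width): counts[i]
            for i in range(max(counts) + 1)}

# Hypothetical candy counts per bag (not the class data).
candies = [52, 56, 57, 58, 59, 59, 60, 60, 61, 61, 62, 63, 64, 67]
bins = histogram_bins(candies)
for (lo, hi), n in bins.items():
    print(f"{lo}-{hi - 1}: {'*' * n}")  # text histogram, one star per bag
```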
My bag contains 61 candies, which is close to the mean and median of the class data, so my own data agree with the overall data collected.
Categorical data are data that can be placed into categories. Their values are labels or characteristics of something, so they measure a quality. Bar graphs and pie charts make sense for categorical data because the data can be separated into groups whose frequencies or proportions can be compared. Stemplots, histograms, and boxplots do not make sense for this variable because the values are labels rather than numbers, so the quantities these graphs depend on, such as the median and quartiles, cannot be obtained. Proportions or percentages make sense because they can be used to compare the relative sizes of different groups. The mean, standard deviation, and five-number summary do not make sense because the values are not numbers.
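The point that categorical values are labels can be illustrated by tallying raw color labels into frequencies and proportions. In this sketch the list of labels is rebuilt from the "My Bag" counts; recording one label per candy is an assumed format, not necessarily how students wrote their data down.

```python
from collections import Counter

# Assumed raw record: one color label per candy, rebuilt from the "My Bag" counts.
labels = ["red"] * 10 + ["orange"] * 14 + ["yellow"] * 14 + ["green"] * 14 + ["purple"] * 9

tally = Counter(labels)   # frequency of each category (what a bar graph displays)
n = sum(tally.values())
shares = {color: count / n for color, count in tally.items()}  # proportions (pie chart)

print(tally.most_common())
print({color: f"{share:.1%}" for color, share in shares.items()})
# There is no meaningful mean or five-number summary here: the values are labels, not numbers.
```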
Quantitative data are data that have numerical values, usually representing measurements or counts; they measure quantity. Stemplots, histograms, and boxplots make sense for quantitative data because the values are numbers that can be ordered and, for stemplots and histograms, grouped into intervals. Bar graphs and pie charts do not make sense because the values lie on a numerical scale rather than falling into separate categories. The mean, standard deviation, and five-number summary make sense because the values are numbers. Proportions or percentages do not make sense because the values are not categories whose shares of the total could be calculated.
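For a quantitative variable such as candies per bag, the five-number summary and outlier cutoffs can be computed directly. The report does not say how its cutoffs of 53.5 and 65.5 candies were obtained; the sketch below uses the common 1.5 × IQR rule as an assumption, with hypothetical candy counts rather than the class data. Quartiles come from Python's `statistics.quantiles`, whose default method is "exclusive"; other software may use a slightly different quartile convention.

```python
import statistics

def five_number_summary(values):
    """Min, Q1, median, Q3, max (quartiles from statistics.quantiles, default 'exclusive' method)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

def iqr_fences(q1, q3, k=1.5):
    """Assumed 1.5*IQR rule: values below the lower or above the upper fence are outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical candy counts per bag (not the class data).
candies = [48, 56, 57, 58, 59, 59, 60, 60, 61, 61, 62, 63, 64, 72]
lo, q1, med, q3, hi = five_number_summary(candies)
lower, upper = iqr_fences(q1, q3)
outliers = [v for v in candies if v < lower or v > upper]

print(f"mean={statistics.mean(candies):.2f}, sd={statistics.stdev(candies):.2f}")
print("five-number summary:", (lo, q1, med, q3, hi))
print("fences:", (lower, upper), "outliers:", outliers)
```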
A confidence interval expresses the uncertainty in an estimate computed from a sample and gives a range of plausible values for the true population parameter. A 95% confidence interval, for example, means that if we repeated the sampling many times and built an interval the same way each time, about 95% of those intervals would contain the true population value.
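The repeated-sampling meaning of "95% confidence" can be seen in a simulation: draw many samples from a population whose true proportion is known, build an interval from each, and count how often the interval captures the truth. This sketch uses the normal-approximation interval, the same form the report applies to the yellow proportion; the true proportion 0.20 and sample size 500 are hypothetical settings.

```python
import math
import random

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e

def coverage(true_p, n, trials, seed=0):
    """Fraction of simulated samples whose 95% interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < true_p for _ in range(n))  # one simulated sample
        lo, hi = proportion_ci(successes, n)
        hits += lo <= true_p <= hi  # True counts as 1
    return hits / trials

# Hypothetical setting: true yellow proportion 0.20, samples of 500 candies.
print(coverage(true_p=0.20, n=500, trials=1000))  # near 0.95, but not exactly
```

Any single interval either contains the true value or does not; the 95% describes how often the method succeeds across repeated samples.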
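A 99% confidence interval estimate for the true proportion of yellow candies, using the class proportion $\hat{p} = 0.2111$ and $n = 6680$ candies:
$$E = z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2.576\sqrt{\frac{0.2111\,(1-0.2111)}{6680}} \approx 0.01286$$
Therefore, lower bound = 0.2111 - 0.01286 = 0.19824 and upper bound = 0.2111 + 0.01286 = 0.22396.
We are 99% confident that the true population proportion of yellow Skittles is between 0.19824 and 0.22396.
A 95% confidence interval estimate for the true mean number of candies per bag, using the sample mean $\bar{x} = 60.1801$, sample standard deviation $s = 7.000257$, and $n = 111$ bags:
$$E = t^{*}\frac{s}{\sqrt{n}} = 1.984\cdot\frac{7.000257}{\sqrt{111}} \approx 1.3182$$
Therefore, lower bound = 60.1801 - 1.3182 = 58.8619 and upper bound = 60.1801 + 1.3182 = 61.4983.
We are 95% confident that the true population mean number of candies per Skittles bag is between 58.8619 and 61.4983.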
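The two interval calculations can be reproduced in a few lines. The inputs (0.2111, 6680, 2.576 for yellow; 7.000257, 111, 1.984 for the mean) are the values used in the report. The sample mean 60.1801 is not stated directly in the report; it is taken here as the midpoint of the reported interval (58.8619, 61.4983).

```python
import math

def proportion_interval(p_hat, n, z):
    """Margin of error and bounds: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return e, p_hat - e, p_hat + e

def mean_interval(x_bar, s, n, t):
    """Margin of error and bounds: x_bar +/- t * s / sqrt(n)."""
    e = t * s / math.sqrt(n)
    return e, x_bar - e, x_bar + e

# 99% interval for the yellow proportion (inputs from the report).
e_yellow, lo_yellow, hi_yellow = proportion_interval(0.2111, 6680, 2.576)
# 95% interval for mean candies per bag; x_bar assumed to be the interval's midpoint.
e_mean, lo_mean, hi_mean = mean_interval(60.1801, 7.000257, 111, 1.984)

print(f"yellow: E={e_yellow:.5f}, ({lo_yellow:.5f}, {hi_yellow:.5f})")
print(f"mean:   E={e_mean:.4f}, ({lo_mean:.4f}, {hi_mean:.4f})")
```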
Reflection
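The data need to come from an approximately normal distribution, which our sample probably does not exactly; however, with 111 bags (n ≥ 30), the sampling distribution of the mean should still be approximately normal by the central limit theorem. The sampling must be random, which ours is. The sample must also be less than 10% of the true population, which ours is.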
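The conditions above can be written as simple checks. This is a sketch using common textbook rules of thumb that the report does not spell out: at least 10 expected successes and failures for a proportion, n ≥ 30 for a mean, and the sample being at most 10% of the population. The population size is not known from the report, so that check runs only when a value is supplied.

```python
def check_conditions(n_bags, n_candies, p_hat_yellow, population_size=None):
    """Rule-of-thumb inference checks (assumed textbook conventions, not from the report)."""
    checks = {
        # Large counts for a proportion: at least 10 successes and 10 failures.
        "large_counts": n_candies * p_hat_yellow >= 10
                        and n_candies * (1 - p_hat_yellow) >= 10,
        # Central limit theorem guideline for a mean: at least 30 observations.
        "n_at_least_30": n_bags >= 30,
    }
    if population_size is not None:
        # 10% condition: the sample is at most 10% of the population.
        checks["ten_percent"] = n_bags <= 0.10 * population_size
    return checks

print(check_conditions(n_bags=111, n_candies=6680, p_hat_yellow=0.2111))
```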
One limitation of our sampling is that we only collected samples in Utah. If the inference were made to the whole nation, the Utah data might not generalize, because Skittles sold in different states may come from different manufacturing plants. In addition, certain age groups may prefer to buy Skittles at specific stores.
It would be better to gather data from different schools and universities across states, and from more age groups.
Conclusions: the observed proportions of Skittles colors are similar to the expected proportions.
e-Portfolio
I found it fascinating that even though different bags have different total numbers of candies and different color proportions, when a large amount of data is collected, the observed proportions of Skittles colors can be very similar to the expected proportions.
Data analysis, and the ability to understand data and graphs, are essential when reading research papers. Since I am a nursing major, I will take classes that may require me to read about interventions and innovations for treating patients, and this project will help me better understand what I read.
Some of the questions required deep thinking, and some I did not know how to solve. This helped me develop the skill of searching for answers in past notes and textbooks, as well as the skill of screening useful data from the internet.
The real world is full of math applications; I had never thought that the colors of Skittles could be used to formulate a math project. The project broadened my thinking and encouraged me to look for more math applications and to design studies to answer them; for example, how often people enter a store or what people purchase can both be statistical problems.