
Math 1040 Term Project


Skittles Term Project
Amber D. Voorhies


Introduction
For our Math 1040 Statistics final project, our class was instructed to observe
and analyze data collected from bags of Skittles candy that we opened and counted. We
collected data from 40 individual 2.17-ounce bags of Skittles and then analyzed the
proportions of the different colored Skittles as a class. We calculated the mean, standard
deviation, and summary statistics (range, minimum, maximum, quartiles, sum, and count). The
data were then organized into tables and graphs to illustrate the results. We further
investigated our Skittles data by constructing confidence intervals and performing
hypothesis tests. Through this project we are learning about these methods and how to
apply them to real-life examples.
Colors
The pie chart and Pareto chart are good visual aids for the Skittles data our class
collected. Each graph shows the color proportions and percentages and
allows for visual comparisons. The graphs depict the data adequately and show
what I expected to see from the data we collected. The overall data collected by the
whole class does not closely agree with the data from my single bag of candies. The
proportions in my bag vary much more from color to color than those of the class data.
One example is the percentage of green candies. My bag had 4 green candies, which
is 6.557% of my bag's total of 61, whereas the class total of green candies
was 452, which is 18.547% of the class total of 2437. Even so, in both the class data and
my single bag, orange was the most common color and green was the least common.
The graphs and data can be seen below.
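The percentages quoted above can be reproduced from the color counts. The following is a minimal sketch, assuming the counts reported in this project's tables (the helper name `percentages` is only illustrative):

```python
# Color counts reported in this project (single bag and class totals).
my_bag = {"Red": 10, "Orange": 19, "Yellow": 15, "Green": 4, "Purple": 13}
class_total = {"Red": 457, "Orange": 541, "Yellow": 487, "Green": 452, "Purple": 500}


def percentages(counts):
    """Return each color's share of the total as a percentage (3 decimals)."""
    total = sum(counts.values())
    return {color: round(100 * n / total, 3) for color, n in counts.items()}


my_pct = percentages(my_bag)
class_pct = percentages(class_total)

for color in my_bag:
    print(f"{color:<7} my bag: {my_pct[color]:6.3f}%   class: {class_pct[color]:6.3f}%")
```

Printing the two sets of percentages side by side makes it easy to see that the single bag varies far more from color to color than the class totals do.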


[Figure: pie chart, "Skittle Data Sample by Color" (Red 19%, Orange 22%, Yellow 20%, Green 19%, Purple 21%)]

Proportions
Red      .188
Orange   .222
Yellow   .1999 or .200
Green    .185
Purple   .205

[Figure: Pareto chart of the class color percentages, in descending order: Orange 22.199% (.222), Purple 20.517% (.205), Yellow 19.984% (.200), Red 18.753% (.188), Green 18.547% (.185); vertical axis from 0.000% to 25.000%]

Results of My Own Bag of Skittles

                Red       Orange    Yellow    Green     Purple    Total
# of Candies    10        19        15        4         13        61
Percentage      16.393%   31.148%   24.590%   6.557%    21.311%   100.00%

Results of Entire Class Skittle Bag Samples

                Red       Orange    Yellow    Green     Purple    Total
# of Candies    457       541       487       452       500       2437
Percentage      18.753%   22.199%   19.984%   18.547%   20.517%   100.00%

The Number of Candies per Bag

Mean, Standard Deviation, and 5-number Summary of Entire Skittle Class Sample

                     Red      Orange   Yellow   Green    Purple   Total
Mean                 11.4     13.5     12.2     11.3     12.5     60.9
Standard Deviation   2.93     3.65     2.75     3.37     3.06     1.93
Range                12.0     13.0     11.0     14.0     13.0     10.0
Minimum              7.0      7.0      7.0      4.0      6.0      54.0
Maximum              19.0     20.0     18.0     18.0     19.0     64.0
Quartile 1           9.0      11.0     10.0     9.0      11.0     60.0
Quartile 3           13.0     15.5     15.0     14.0     14.3     62.0
Sum                  457.0    541.0    487.0    452.0    500.0    2437.0
Count                40.0     40.0     40.0     40.0     40.0     40.0
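Two rows of the summary table can be checked from the others: each mean is the sum divided by the count, and each range is the maximum minus the minimum. The sketch below assumes only the values reported in the table; the raw per-bag counts are not included in this report, so the standard deviations and quartiles cannot be recomputed here.

```python
# Sums, minimums, and maximums as reported in the class summary table.
sums = {"Red": 457, "Orange": 541, "Yellow": 487,
        "Green": 452, "Purple": 500, "Total": 2437}
minimums = {"Red": 7, "Orange": 7, "Yellow": 7,
            "Green": 4, "Purple": 6, "Total": 54}
maximums = {"Red": 19, "Orange": 20, "Yellow": 18,
            "Green": 18, "Purple": 19, "Total": 64}
count = 40  # number of bags in the class sample

# Mean = sum / count, rounded to one decimal place as in the table.
means = {name: round(total / count, 1) for name, total in sums.items()}

# Range = maximum - minimum.
ranges = {name: maximums[name] - minimums[name] for name in sums}

for name in sums:
    print(f"{name:<7} mean = {means[name]:5.1f}   range = {ranges[name]:2d}")
```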


[Figure: histogram, "Number of Skittle Candies Per Bag"; horizontal axis: Total Number of Skittles Per Bag (bins); vertical axis: Frequency (Number of Individual Skittle bags)]

The shape of this distribution is approximately bell-shaped, close to a normal distribution
with a few outliers. The overall numbers in the single-bag and class data sets do seem to
agree with one another. Their most common (orange) and least common (green) colors
coincide, as do their second least common colors (red). As far as proportions and
percentages go, I would have expected the numbers for the single bag and the class data to
be closer than they were. The total number of candies in my single bag was 61,
compared with a class mean of 60.9 candies per bag. This
shows that the total count from my single bag agrees closely with the data
collected from the entire class.


[Figure: boxplot, "Number of Skittle Candies Per Bag"; axis from 48 to 66]

Reflection
Two types of data we have studied this semester are quantitative and categorical.
According to Elementary Statistics, 12th Edition, quantitative data "consists of numbers
representing counts or measurements," whereas "Categorical data consist of names or
labels not numbers representing counts or measurements."
Graphs that make the most sense for categorical data are pie charts and
Pareto charts or bar graphs. These graphs depict categories very well visually.
Quantitative data are better represented with graphs such as boxplots, histograms,
frequency polygons, ogives, and scatterplots. These graphs depict numbers and show
trends and patterns, so they represent numerical data much better than pie or Pareto charts do.
Calculations that make sense for categorical data are the count for each category and the
total count. From these we get proportions and percentages, which pie and Pareto charts
represent well visually. Calculations that are helpful for quantitative data are
the mean, the standard deviation, and summary values such as the range, minimum,
maximum, and quartiles. The mean is a measure of the center of
our data, the top of our bell curve. The center
is not easily represented on a pie chart, because pie charts do not show the center. This
is why boxplots, histograms, frequency polygons, ogives, and scatterplots are better for
quantitative data: these graphs allow us to see that center. According to Elementary
Statistics, 12th Edition, the standard
deviation "is a measure of how much data values deviate from the mean." This means a
lot numerically because it gives us a standard to evaluate with. Pie and Pareto charts do
not show outliers that fall far from the mean, but boxplots, histograms,
frequency polygons, ogives, and scatterplots do.
To summarize, pie and Pareto charts are good for categorical data, and boxplots,
histograms, frequency polygons, ogives, and scatterplots are better for quantitative data.

Confidence Interval Estimates


Confidence intervals are ranges of values that are used to "estimate the true
value of a population parameter." (Elementary Statistics, pg 325) They give us
boundaries that help us see whether a claimed value for the parameter is reasonable or not.
Discussion of the Three Interval Estimates
The first confidence interval we set up was a 95% confidence interval for the true
proportion of purple candies. Our result, .194 < p < .226, is centered on the sample
proportion of purple candies (500 of the 2437 candies, which we rounded to .21), so we are
95% confident that the true proportion of purple candies lies within this interval.
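The interval for the proportion of purple candies can be sketched with the usual normal-approximation formula, p-hat plus or minus z times sqrt(p-hat(1 - p-hat)/n). This is a minimal sketch, not necessarily the exact procedure used in class; the function name `proportion_ci` is illustrative, and the critical value comes from Python's `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


n = 2437  # total candies in the class sample

# Sample proportion rounded to .21, as in the text.
low_r, high_r = proportion_ci(0.21, n, 0.95)

# Unrounded sample proportion 500/2437 (about .205), for comparison.
low_u, high_u = proportion_ci(500 / n, n, 0.95)

print(f"rounded p-hat:   {low_r:.3f} < p < {high_r:.3f}")
print(f"unrounded p-hat: {low_u:.3f} < p < {high_u:.3f}")
```

With the rounded proportion the sketch reproduces the reported interval. The unrounded proportion gives a slightly lower interval, which shows how much rounding the sample proportion to two decimals affects the result.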

The second confidence interval we constructed, to estimate the true mean
number of candies per bag, is 60.074 < μ < 61.726. Our sample mean of
60.9 lies at the center of this interval.
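The confidence level for this interval is not stated, but the margin of error can be worked backward from the endpoints. The sketch below assumes the sample mean, standard deviation, and count from the summary table and recovers the critical value implied by the interval. The implied value of about 2.71 is close to the t-table value of 2.708 for 39 degrees of freedom at 99% confidence, which suggests, but does not confirm, that a 99% level was used.

```python
from math import sqrt

# Values reported in the text and the summary table.
x_bar, s, n = 60.9, 1.93, 40
lower, upper = 60.074, 61.726

margin = (upper - lower) / 2       # half-width of the interval
center = (upper + lower) / 2       # should equal the sample mean
standard_error = s / sqrt(n)       # estimated standard error of the mean

# Critical value implied by margin = critical * standard_error.
implied_critical = margin / standard_error

print(f"center = {center:.3f}, margin = {margin:.3f}")
print(f"standard error = {standard_error:.4f}")
print(f"implied critical value = {implied_critical:.3f}")
```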

The third confidence interval we constructed was a 98% confidence interval to
estimate the population standard deviation. The standard deviation we calculated from our
class data is 1.93. This falls within the 98% confidence interval 1.510 < σ < 2.560.
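The interval for the standard deviation follows the chi-square formula, sqrt((n - 1)s^2 / chi2_right) < sigma < sqrt((n - 1)s^2 / chi2_left). The sketch below is written under one stated assumption: the critical values are the standard table entries for 40 degrees of freedom (the nearest table row to n - 1 = 39), which reproduce the reported endpoints. Exact critical values for 39 degrees of freedom would give a slightly different interval.

```python
from math import sqrt


def sd_ci(s, n, chi2_right, chi2_left):
    """Confidence interval for a population standard deviation."""
    sum_of_squares = (n - 1) * s ** 2
    return sqrt(sum_of_squares / chi2_right), sqrt(sum_of_squares / chi2_left)


# Assumed table values: chi-square critical values for 98% confidence
# (area 0.01 in each tail), read from the 40 degrees-of-freedom row.
CHI2_RIGHT = 63.691
CHI2_LEFT = 22.164

low, high = sd_ci(s=1.93, n=40, chi2_right=CHI2_RIGHT, chi2_left=CHI2_LEFT)
print(f"{low:.3f} < sigma < {high:.3f}")
```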


Hypothesis Tests
Hypothesis tests are used to test claims made "about a property of
a population." (Elementary Statistics, pg 382) They help us get a good indication of
whether we need to re-evaluate the initial claim.
Discussion
For the first hypothesis test, we used a 0.01 significance level to test the claim that
20% of all Skittles candies are green. We used the null hypothesis that p = .20 and
the alternative hypothesis that p ≠ .20. Our significance
level is α = .01, so α/2 = .005. The critical values for an area of .005 in each tail are
-2.575 and 2.575. We then solved for our test statistic and got -1.79. Because -1.79 is
between our critical values, we fail to reject the null hypothesis.
There is not sufficient evidence to warrant rejection of the claim that 20% of all Skittles
candies are green.
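The test statistic for the green candies can be sketched as z = (p-hat - p0) / sqrt(p0(1 - p0)/n), using the class counts of 452 green candies out of 2437. This is a minimal illustration; the function name `one_proportion_z` is hypothetical, and the critical value is computed with `statistics.NormalDist` instead of being read from a table, so it comes out as about 2.576 rather than the 2.575 quoted above.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(successes, n, p0):
    """z test statistic for a claim about a population proportion."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


alpha = 0.01
z = one_proportion_z(successes=452, n=2437, p0=0.20)

# Two-tailed critical value: area alpha/2 in each tail.
critical = NormalDist().inv_cdf(1 - alpha / 2)

reject_null = abs(z) > critical
print(f"z = {z:.2f}, critical values = +/-{critical:.3f}, reject H0: {reject_null}")
```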


For the second hypothesis test, we used a 0.05 significance level to test the claim that
the mean number of candies in a bag of Skittles is 56. The null hypothesis is that μ = 56
and the alternative hypothesis is that μ ≠ 56. The critical values for our significance
level are -2.023 and 2.023. Our test statistic is 16.057, which is far outside our critical
values. We therefore reject the null hypothesis. We have enough evidence to conclude that
the mean number of candies in a bag is not 56.
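The test statistic for the mean can be reproduced from the summary values with t = (x-bar - mu0) / (s / sqrt(n)). In this sketch the critical value 2.023 is simply the one quoted in the text (two-tailed, 39 degrees of freedom), not something the code derives; the function name `one_sample_t` is illustrative.

```python
from math import sqrt


def one_sample_t(x_bar, mu0, s, n):
    """t test statistic for a claim about a population mean (sigma unknown)."""
    return (x_bar - mu0) / (s / sqrt(n))


# Sample mean, standard deviation, and count from the class summary table.
t_stat = one_sample_t(x_bar=60.9, mu0=56, s=1.93, n=40)

# Two-tailed critical value quoted in the text (alpha = 0.05, df = 39).
T_CRITICAL = 2.023

reject_null = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.3f}, reject H0: {reject_null}")
```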

Reflection
The requirements for estimating a population proportion p are that the sample is
a simple random sample, there is a fixed number of trials, the trials are
independent, there are two categories of outcomes, and the probabilities remain
constant for each trial. There also need to be at least 5 successes and 5 failures.
The class samples do fall within these requirements. The requirements for hypothesis tests
concerning population proportions are np ≥ 5 and nq ≥ 5. Here n = 2437 and p = .21, so
np = 2437 × .21 = 511.77 and nq = 2437 × .79 = 1925.23, both of which are greater than 5.
The sample did fit these requirements.
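The np and nq check is simple arithmetic, sketched here with the rounded proportion .21 used in the text:

```python
n = 2437   # total candies in the class sample
p = 0.21   # rounded sample proportion used in the text
q = 1 - p

expected_successes = n * p
expected_failures = n * q

# Both products must be at least 5 for the normal approximation to apply.
requirement_met = expected_successes >= 5 and expected_failures >= 5

print(f"np = {expected_successes:.2f}, nq = {expected_failures:.2f}, "
      f"requirement met: {requirement_met}")
```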
The requirements for interval estimates for a population mean are that the
sample is a simple random sample and that either the population is normally distributed
or n > 30 (or both). Our sample was a simple random sample and our n is 40, which is
greater than 30. The requirements for a hypothesis test of a population mean are the same,
and the test statistic depends on whether σ is known. If σ is not known, and the
population is normally distributed or n > 30, the equation you use is

$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$

or, if σ is known, and the population is normally distributed or n > 30, you use the
equation

$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

The sample did meet these requirements.

The requirements for interval estimates and hypothesis tests for a population
standard deviation are that the sample is a simple random sample and that the population
is normally distributed, with no exceptions. This sample does meet these
requirements.
One possible error that could have been made using this data is
a type I error, which is "the mistake of rejecting a true null
hypothesis." (Elementary Statistics, pg 393) In our Skittles example, this would be
rejecting the claim that 20% of all Skittles candies are green when, in fact, 20% of all
Skittles candies really are green.
The other possible error that could have been made is a type II error. This is
"the mistake of failing to reject the null hypothesis when it is actually false."
(Elementary Statistics, pg 392) In our Skittles example, this would be failing to reject
the claim that the mean number of candies in a bag of Skittles is 56 when the true mean is
actually different from 56. Our test statistic was well outside our critical values and
the mean of our sample is 60.9, so we rejected that claim. A type II error is failing to
reject a null hypothesis that we ought to reject.
It is best to have a large sample to get the best results. Large samples reduce the
margin of error, and the distribution of sample means gets closer to a normal distribution
as the sample size increases.
Conclusion-Reflective Writing
There are many math skills that we applied in this project that I can see carrying
over to other classes I will take in my school career.
I have recently completed the prerequisite courses required for my nursing school
application. Statistics are used frequently in the medical field. Experiments and trials
are performed in order to get new treatments and medications approved for patient
care, and they are used to increase the quality of
treatments.
Hypothesis tests are used to see whether or not the treatments that health
professionals recommend are effective. These experiments and trials use subjects and
are often conducted using a double-blind study method. In this class, I learned that in a
double-blind study the subjects being tested do not know whether they are
receiving a placebo or the medication being tested, and the administrators or
doctors providing the medication do not know which subjects get the placebo or the
medicine. This type of statistical experiment helps provide unbiased results. Before the
blind study is set up, researchers can construct hypothesis tests that help show
how effective these medications and treatments are. Methods based in statistics allow
people to design informed hypothesis tests and obtain helpful results. This is imperative
in the medical field. When an individual's life depends on an outcome, it is vital that we
have accurate test results so that we can give people the best medical care.
Statistical methods such as confidence levels help medical professionals
explain treatment options and their advantages and disadvantages using evidence derived
from statistical methods and formulas.
Using methods based in statistics allows researchers to pinpoint their successes
and helps them make necessary changes. Researchers were able to do this very thing
when the Salk Vaccine Experiment was conducted in 1954. (See Elementary Statistics,
pages 28-29) It is now 2015 and polio is an illness rarely ever heard of. Children
can be vaccinated to prevent the onset of polio. This is all because experiments were
conducted and the results were analyzed and deemed reliable using methods founded
in statistics.
Taking this course has helped me better understand how the medical community and
governments make decisions about patient care. Decisions are not based on
emotions; they are based on facts and experiments that have been conducted in an
unbiased manner. We have laws and codes that are based on statistical standards.
Treatments must meet requirements based on those standards and
then have to be successfully repeated over and over. Statistics has provided a foundation
of understanding for how things are done, or rather how things ought to be done, in the
medical community. This is extremely helpful for me as I am pursuing a career in the
medical field. It has expanded my understanding.
