Riley Morris

1st
5/28/19
Skittle Project
Report Introduction

In this project we counted the total number of Skittles in each of a group of bags, along with the
number of each color, so that we could summarize and analyze the data using means, confidence
intervals, and hypothesis tests.

Organizing and Displaying Categorical Data: Colors

Results

In my bag:

Red Candies   Orange Candies   Yellow Candies   Green Candies   Purple Candies   Total Candies
11            11               8                13              14               57

Entire Class Results:

Red Candies   Orange Candies   Yellow Candies   Green Candies   Purple Candies   Total Candies
218           229              187              200             228              1062

The totals are fairly close across all the colors, but I was surprised to see that yellow was the
lowest. Yellow was the lowest in my bag (8 of 57) and the lowest for the class too (187 of 1062).
Originally, I thought the proportions would be equal for all five colors, but purple was the most common
color in my bag and was within one candy of orange for the most in the class.
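
To make the comparison with equal proportions concrete, the class counts from the table above can be turned into proportions. This is only a small illustrative sketch; the variable names are my own, and the 20% benchmark is simply the equal-share assumption mentioned in the paragraph above.

```python
# Class color counts, copied from the class results table.
counts = {"Red": 218, "Orange": 229, "Yellow": 187, "Green": 200, "Purple": 228}

total = sum(counts.values())  # total candies counted by the class
proportions = {color: n / total for color, n in counts.items()}

for color, p in proportions.items():
    # Show count, observed share, and difference from an equal 20% share.
    print(f"{color:<7}{counts[color]:>5}{p:>8.3f}{p - 0.20:>+8.3f}")

lowest = min(proportions, key=proportions.get)
highest = max(proportions, key=proportions.get)
print("lowest:", lowest, " highest:", highest)
```

Each color's share can then be read directly against the 20% that equal proportions would give.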

Organizing and Displaying Quantitative Data: The Number of Candies per Bag
             Mean   SD     Min   Q1   Median   Q3   Max
Bag Totals   59     1.37   57    58   59       60   61

The distribution of our bag totals is very symmetric and almost uniform. The graph reflects what I
would expect to see. There are no outliers, and the range is small (57 to 61). My bag's total of 57
agrees with the class data, since it falls within the class range, although it is the minimum. Our
sample consisted of 18 bags of Skittles, and we counted all of them.

Reflection:

Categorical data is sorted into groups based on characteristics like color or type of activity,
while quantitative data consists of numerical measurements or counts, like the number of candies in a
bag. I believe that bar charts are best for displaying categorical data and histograms are best for
quantitative data. Bar charts make it easy to compare categories, so you can see exactly how much more
or less there is of each one. For quantitative data, histograms are best for a similar reason: you can
easily see how the values are distributed and they are easy to read, but box-and-whisker plots are best
if you are looking for the exact middle or want to see the spread. Means and standard deviations only
make sense for quantitative data, not categorical data, so to make categorical data easier for your
audience to read you use a clearly labeled bar chart. Categorical data is best for comparing proportions
across groups.

Confidence Interval Estimates:

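A confidence interval gives a range of plausible values for a population parameter, such as a
mean or a proportion, based on sample data. With a 99% confidence interval, you can be 99% confident
that the true value lies between the two endpoints.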

99% Confidence interval: We can be 99% confident that the mean number of yellow Skittles per bag is
between 9.07 and 11.71.
95% Confidence Interval for mean: We can be 95% confident that the mean number of Skittles per bag is
between 58.366 and 59.634 (58.366 < μ < 59.634).

Hypothesis Testing

The purpose of a hypothesis test is to see whether our sample data are consistent with a claim
about the population, such as the proportion of Skittles of a certain color or the mean number per bag.
The test tells us whether we have enough evidence to reject the null hypothesis. The alternative
hypothesis states that the true value is not equal to the claimed value.

Using a 0.05 significance level to test the claim that 20% of all Skittles are red:

Ho: p = 0.20 (20% of all Skittles are red)

Ha: p ≠ 0.20 (the proportion of red Skittles is not 20%)

I fail to reject the claim that 20% of all Skittles are red.
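
As a rough check of this test, here is a minimal Python sketch of a two-sided one-proportion z test. It assumes the test is run on the pooled class counts from the results table (218 red out of 1062); the variable names are illustrative and are not part of the original project.

```python
from math import sqrt
from statistics import NormalDist

red, n = 218, 1062   # class counts from the results table
p0 = 0.20            # claimed proportion of red candies
alpha = 0.05         # significance level

p_hat = red / n                      # sample proportion of red candies
se = sqrt(p0 * (1 - p0) / n)         # standard error assuming Ho is true
z = (p_hat - p0) / se                # test statistic

# Two-sided p-value, because Ha is "not equal to 0.20".
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject = p_value < alpha
print(f"p_hat={p_hat:.4f}  z={z:.2f}  p-value={p_value:.2f}  reject Ho: {reject}")
```

With these counts the p-value is far above 0.05, which agrees with the decision to fail to reject.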

Using a 0.01 significance level to test the claim that the mean number of candies in a bag of Skittles
is 55:

Ho: μ = 55 (the mean number of Skittles per bag is 55)

Ha: μ ≠ 55 (the mean number of Skittles per bag is not 55)

I reject Ho, the claim that the mean number of Skittles per bag is 55.
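
The decision above can be checked from the summary statistics alone. The sketch below computes a one-sample t statistic from the bag-totals table (mean 59, SD 1.37, 18 bags). The critical value 2.898 is an assumption on my part: it is the standard t-table value for a two-sided 0.01 test with 17 degrees of freedom, not a number computed here.

```python
from math import sqrt

x_bar, s, n = 59, 1.37, 18   # summary statistics from the bag-totals table
mu0 = 55                     # claimed mean number of candies per bag

# One-sample t statistic: how many standard errors x_bar is from mu0.
t_stat = (x_bar - mu0) / (s / sqrt(n))

# Assumption: two-sided critical value for alpha = 0.01 with n - 1 = 17
# degrees of freedom, read from a t-table.
t_crit = 2.898

reject = abs(t_stat) > t_crit
print(f"t = {t_stat:.2f}, critical value = {t_crit}, reject Ho: {reject}")
```

The statistic is far beyond the critical value, so rejecting Ho at the 0.01 level is consistent with the data.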

Reflection:

Interval conditions

1. We are trying to estimate μ, the mean number of Skittles per bag. Our point estimate is x̄ = 59, but
because of sampling variability it is unlikely to equal the true mean exactly, so we will calculate
the 95% confidence interval for the mean.
2. Conditions
a. Random sample (assumed)
b. Sample is less than 5% of the population
c. Sample size n ≥ 30 (no, n = 18, but the population is approximately normal)
3. 95% CI = 59 ± 1.96(1.37/√18) ≈ (58.366, 59.634)
4. Therefore I am 95% confident that the mean number of Skittles per bag is between 58.366 and 59.634.
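
The interval arithmetic can be sketched in a few lines of Python. The reported endpoints are very close to what a normal critical value of about 1.96 gives, so the sketch computes that version first. For comparison it also computes a t-based interval; the value 2.110 is an assumption taken from a standard t-table (95% confidence, 17 degrees of freedom) and does not appear in the report.

```python
from math import sqrt
from statistics import NormalDist

x_bar, s, n = 59, 1.37, 18   # summary statistics from the bag-totals table
se = s / sqrt(n)             # standard error of the mean

# z-based interval: the 95% critical value is about 1.96.
z_star = NormalDist().inv_cdf(0.975)
z_interval = (x_bar - z_star * se, x_bar + z_star * se)

# t-based interval. Assumption: 2.110 is the t-table value for 95%
# confidence with n - 1 = 17 degrees of freedom.
t_star = 2.110
t_interval = (x_bar - t_star * se, x_bar + t_star * se)

print("z interval:", tuple(round(v, 3) for v in z_interval))
print("t interval:", tuple(round(v, 3) for v in t_interval))
```

With only 18 bags the t-based interval is slightly wider, but both versions stay well above 55 candies per bag.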

Hypothesis test conditions

1. The claim is that the true proportion of red Skittles is 0.2, and our class sample gives
p-hat = 218/1062 ≈ 0.205. A difference this small could be due to sampling variability, so we will
conduct the one-sample z test for p (significance level = 0.05).
2. Ho: p = 0.2   Ha: p ≠ 0.2
3. Conditions
a. Random sample (assumed)
b. Sample is less than 5% of the population (yes, assuming there are more than 4000 bags)
c. Large sample size for normality: npq ≥ 10 (yes, 1062(0.2)(0.8) ≈ 170)
4. z = (0.205 - 0.2)/√(0.2(0.8)/1062) ≈ 0.43, so the p-value = 2·P(z > 0.43) ≈ 0.67
5. Since the p-value 0.67 > 0.05, we fail to reject Ho.
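
The two numeric conditions in the list above can also be checked directly. This sketch assumes the sample-size condition is the usual 5% rule, that the sample is the 1062 candies counted by the class, and that the population is the 4000 bags mentioned in the conditions at roughly 59 candies each; all variable names are illustrative.

```python
n = 1062                 # candies counted by the class
p0 = 0.20                # claimed proportion of red candies
candies_per_bag = 59     # class mean bag total
bags_assumed = 4000      # population size assumed in the conditions

# Large-sample condition for the normal approximation: n*p*q >= 10.
npq = n * p0 * (1 - p0)
large_sample_ok = npq >= 10

# 5% condition: the sample should be under 5% of the population.
population = bags_assumed * candies_per_bag
sample_fraction = n / population
five_percent_ok = sample_fraction < 0.05

print(f"npq = {npq:.1f} (ok: {large_sample_ok})")
print(f"sample fraction = {sample_fraction:.4f} (ok: {five_percent_ok})")
```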

Are the conditions met with your sample?

Not all of our conditions were met with this sample because our sample size was too small. Some
possible sources of error are that people might have miscounted their Skittles, or that we could have
gotten bags that were produced at a time when the machine was not calibrated correctly. To improve my
sampling method, I would get bags from different locations at different times and I would use a larger
sample size. My conclusion is that I am 95% confident that the mean number of Skittles per bag is
between 58.366 and 59.634.