
Skittles Project Group 3

Joel Hanes, Alamissi Ouro-Gneni, Virginia Darger, Lily Ratliff


Introduction:
Statistics is the study of how to plan data collection, analyze results, and draw conclusions
from a study. In Intro to Statistics 1040, we were assigned a semester project to help us
understand various statistical concepts and apply them to everyday life. At the beginning of the
semester, each student in the class purchased a 2.17 oz bag of Skittles candy. Our first task was
data collection: each student counted the Skittles of each color in their own bag and the total
number of Skittles in the bag, and the data were compiled into a spreadsheet for analysis. Our
next task was to analyze and interpret the data, which we did by constructing the graphs, charts,
and statistical tables seen below. The goal of this assignment is to better understand statistical
concepts and the key components needed to judge the validity of statistical studies. Understanding
these concepts is useful in everyday life, and the Skittles project made them more relatable.

Data Collection:
Organizing and Displaying Categorical Data: Colors

Observations:

The color counts were relatively close to one another. We expected one or two colors to have
lower counts and perhaps a favored color to lead, which we did see in half the charts, while the
others were more evenly distributed. The data from our group's bags follow the pattern of the
whole class's bags, except for purple, which is higher in ours.
Group Three Bag:
Summary statistics:
Column   n   Mean   Variance   Median   Range   Min   Max   Q1   Q3   Sum
NUMBER   5   48.4   43.3       46       16      44    60    45   47   242

Class Bags:
Summary statistics:

Frequency table results for Total skittles per person:
Count = 26

Total skittles   Frequency   Relative      Percent of   Cumulative
per person                   Frequency     Total        Frequency
54               1           0.038461538   3.8461538    1
55               1           0.038461538   3.8461538    2
56               1           0.038461538   3.8461538    3
58               5           0.19230769    19.230769    8
59               6           0.23076923    23.076923    14
60               6           0.23076923    23.076923    20
61               5           0.19230769    19.230769    25
62               1           0.038461538   3.8461538    26
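
The relative, percent, and cumulative columns of the frequency table can be recomputed from the frequency column alone. The following is a minimal sketch using only the Python standard library; the function name `frequency_table` is our own illustration and is not part of the original StatCrunch output.

```python
from itertools import accumulate

# Candies-per-bag values and their frequencies, copied from the class frequency table.
frequencies = {54: 1, 55: 1, 56: 1, 58: 5, 59: 6, 60: 6, 61: 5, 62: 1}


def frequency_table(freqs):
    """Return rows of (value, frequency, relative frequency, percent, cumulative frequency)."""
    total = sum(freqs.values())
    values = sorted(freqs)
    cumulative = list(accumulate(freqs[v] for v in values))
    return [
        (v, freqs[v], freqs[v] / total, 100 * freqs[v] / total, c)
        for v, c in zip(values, cumulative)
    ]


rows = frequency_table(frequencies)
for value, freq, rel, pct, cum in rows:
    print(f"{value:>5} {freq:>3} {rel:.4f} {pct:6.2f}% {cum:>3}")
```

The last cumulative frequency should equal the number of bags in the sample, which is a quick check that no bag was dropped or counted twice.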

Organizing and Displaying Quantitative Data: The Number of Candies per Bag

Mean: 59.1
Standard Deviation: 1.90
5 Number Summary:
Min: 54
Q1: 58
Q2: 59
Q3: 60
Max: 62
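
As a cross-check, the mean, standard deviation, and five-number summary can be recomputed from the class frequency table. This is a sketch using Python's standard `statistics` module. The quartile method (`method="inclusive"`) is an assumption on our part, since software packages use slightly different quartile conventions.

```python
import statistics

# Candies-per-bag values and their frequencies, copied from the class frequency table.
frequencies = {54: 1, 55: 1, 56: 1, 58: 5, 59: 6, 60: 6, 61: 5, 62: 1}

# Expand the table back into one value per bag (26 bags in total).
bags = sorted(value for value, freq in frequencies.items() for _ in range(freq))

mean = statistics.mean(bags)
stdev = statistics.stdev(bags)  # sample standard deviation (divides by n - 1)

# Assumption: "inclusive" quartiles; other conventions can differ slightly.
q1, median, q3 = statistics.quantiles(bags, n=4, method="inclusive")
five_number = (min(bags), q1, median, q3, max(bags))

print(f"mean = {mean:.1f}, standard deviation = {stdev:.2f}")
print("five-number summary:", five_number)
```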
Observations:

The shape of the distribution is close to a normal bell shape, although it is slightly skewed to
the left. The class data appear roughly normal, which is to be expected given that there are no
extreme outliers. We expected there would be about the same number of candies in each bag,
which wasn't the case. The number of candies from my individual bag was 26 and the total
number of bags in the sample is 26.
Reflection
Categorical data are qualitative variables that consist of names or labels, not numbers that
represent counts or measurements. Pie charts and bar graphs make sense for categorical data
because they compare the frequencies of the categories with one another. Computations, or
arranging values in order such as low to high, do not make sense for categorical data; tallying
how many observations fall in each category, as with survey responses of yes, no, and undecided,
is more appropriate.
Quantitative data are numerical variables consisting of numbers that can be measured,
ordered, or counted. Scatterplots and stem plots make sense for quantitative data because a
scatterplot helps determine whether there is a relationship between two variables, and a stem
plot separates each value into two parts (a stem and a leaf). Computation makes sense for
quantitative data, for example to find the mean, standard deviation, five-number summary, and sum.

Confidence Interval Estimates


A confidence interval is a range of values used to estimate the true value of a population
parameter. Reporting a range instead of a single number shows how precise the estimate is, that
is, how close the calculated estimate is likely to be to the population value. Each confidence
interval is paired with a confidence level (expressed as a percentage), which is the proportion of
such intervals that would contain the population parameter if the sampling were repeated many
times. The confidence level lets us report how confident we are that the population parameter
lies within our interval. In the following section, our group computed three confidence interval
estimates using the class Skittles data.

The 95% confidence interval estimate for the true proportion of purple
candies:

Based on the calculations from our sample data, we are 95% confident that the interval between
0.182 and 0.222 contains the true value of the population proportion of purple candies.
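
A confidence interval for a proportion like the one above can be sketched as follows, assuming the usual normal-approximation formula (sample proportion plus or minus z times the standard error). The function name `proportion_ci` is hypothetical. The purple count is not stated in this report, so the example call uses the green count (311 of 1536) from the hypothesis test section purely as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-tailed critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Illustration only: green count and class total taken from the hypothesis test
# section, because the purple count is not given in the text.
lower, upper = proportion_ci(311, 1536, confidence=0.95)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

Substituting the purple count for 311 would give the interval reported above.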
The 99% confidence interval estimate for the true mean number of candies
per bag:
99% confidence interval results:
μ : Mean of variable

Variable               Sample Mean   Std. Err.    DF   L. Limit    U. Limit
Mean candies per bag   59.076923     0.37178603   25   58.040593   60.113253

Based on the calculations from our data, we are 99% confident that the interval between 58.041
and 60.113 contains the true population mean number of candies per bag.
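
The 99% interval for the mean can be approximated by hand from the class frequency table, as in this sketch. The standard library has no t distribution, so the critical value 2.787 (two-tailed, 25 degrees of freedom) is an assumption read from a standard t table; small rounding differences from the software output are expected.

```python
from math import sqrt
import statistics

# Candies-per-bag values and their frequencies, copied from the class frequency table.
frequencies = {54: 1, 55: 1, 56: 1, 58: 5, 59: 6, 60: 6, 61: 5, 62: 1}
bags = [value for value, freq in frequencies.items() for _ in range(freq)]

n = len(bags)
mean = statistics.mean(bags)
std_err = statistics.stdev(bags) / sqrt(n)  # standard error of the mean

# Assumption: t critical value for 99% confidence with n - 1 = 25 degrees of
# freedom, read from a standard t table.
T_CRIT_99_DF25 = 2.787

margin = T_CRIT_99_DF25 * std_err
lower, upper = mean - margin, mean + margin
print(f"99% CI for the mean: {lower:.3f} to {upper:.3f}")
```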
The 98% confidence interval estimate for the standard deviation of the
number of candies per bag:
98% confidence interval results:
σ : Standard deviation of variable

Variable               Standard Deviation   DF   L. Limit      U. Limit
Mean candies per bag   1.895744234          25   1.423897574   2.792213262

Based on the calculations from our data, we are 98% confident that the interval between 1.424
and 2.792 contains the true value of the population standard deviation of the number of candies
per bag.
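
The interval for the standard deviation follows from the chi-square formula sqrt((n - 1)s² / χ²). The sketch below assumes that formula and takes the two chi-square critical values for 25 degrees of freedom (44.314 and 11.524) from a standard table, since the standard library does not provide a chi-square distribution.

```python
from math import sqrt
import statistics

# Candies-per-bag values and their frequencies, copied from the class frequency table.
frequencies = {54: 1, 55: 1, 56: 1, 58: 5, 59: 6, 60: 6, 61: 5, 62: 1}
bags = [value for value, freq in frequencies.items() for _ in range(freq)]

n = len(bags)
variance = statistics.variance(bags)  # sample variance (divides by n - 1)

# Assumption: chi-square critical values for 98% confidence, df = 25,
# read from a standard chi-square table.
CHI2_RIGHT = 44.314  # 0.01 of the area lies to the right of this value
CHI2_LEFT = 11.524   # 0.99 of the area lies to the right of this value

lower = sqrt((n - 1) * variance / CHI2_RIGHT)
upper = sqrt((n - 1) * variance / CHI2_LEFT)
print(f"98% CI for the standard deviation: {lower:.3f} to {upper:.3f}")
```

Note that the larger critical value produces the lower limit, because it appears in the denominator.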

Hypothesis tests
A hypothesis is an assumption or claim about some aspect of a population. The population
parameters most often involved in hypothesis testing are the mean, standard deviation,
proportion, and variance. Hypothesis tests are used to test a claim (hypothesis) made about a
property of a population. In the following section, our group performed two hypothesis tests on
our class's Skittles candy data.
Test the claim that 20% of all Skittles are green (class bags):
Hypothesis test results:
p : Proportion of successes
H0 : p = 0.2
HA : p ≠ 0.2

Proportion   Count   Total   Sample Prop.   Std. Err.     Z-Stat       P-value
p            311     1536    0.20247396     0.010206207   0.24239742   0.8085

Variable                  Sample Mean   Std. Err.   DF   T-Stat      P-value
Candies per bag group 3   60.5          0.8660254   3    5.1961524   0.0138

Since our p-value of 0.8085 was greater than α, we fail to reject H0. There is not sufficient
evidence to warrant rejection of the claim that 20% of all Skittles candies are green.
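
The test of the 20% green claim can be reproduced with a one-proportion z test. The following is a sketch using the counts reported for the class bags (311 green out of 1536); the function name `one_proportion_z_test` is our own and not part of the original output.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    std_err = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_proportion_z_test(311, 1536, 0.20)
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
```

The standard error here uses the claimed proportion 0.20 rather than the sample proportion, because the test statistic is computed under the assumption that H0 is true.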
The mean number of candies in a bag of Skittles is 56 (class bags):
Hypothesis test results:
μ : Mean of variable
H0 : μ = 56
HA : μ ≠ 56

Variable                Sample Mean   Std. Err.    DF   T-Stat      P-value
Candies per bag class   59.076923     0.37178603   25   8.2760589   <0.0001

Since our p-value of less than 0.0001 was less than α, we reject the claim that the mean number
of candies in a bag of Skittles is 56. There is sufficient evidence to warrant rejection of the
claim.
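
The t statistic for the mean test is the difference between the sample mean and the claimed mean, divided by the standard error. The sketch below recomputes it for the class bags from the frequency table. It also checks the reported group 3 summary values, under the assumption that the group 3 test used the same claimed mean of 56. The p-value is omitted because it needs a t distribution, which the standard library lacks; if SciPy is available, `scipy.stats.t.sf` could supply it.

```python
from math import sqrt
import statistics


def t_statistic(sample_mean, std_err, mu0):
    """One-sample t statistic for H0: mu = mu0."""
    return (sample_mean - mu0) / std_err


# Candies-per-bag values and their frequencies, copied from the class frequency table.
frequencies = {54: 1, 55: 1, 56: 1, 58: 5, 59: 6, 60: 6, 61: 5, 62: 1}
bags = [value for value, freq in frequencies.items() for _ in range(freq)]

std_err = statistics.stdev(bags) / sqrt(len(bags))
t_class = t_statistic(statistics.mean(bags), std_err, mu0=56)

# Assumption: the group 3 test also used mu0 = 56; the mean and standard
# error are the summary values reported for "Candies per bag group 3".
t_group3 = t_statistic(60.5, 0.8660254, mu0=56)

print(f"class t = {t_class:.4f} (df = {len(bags) - 1})")
print(f"group 3 t = {t_group3:.4f} (df = 3)")
```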

Reflection:
The conditions for doing interval estimates and hypothesis tests:
The sample must be a simple random sample.

The population needs to have a normal distribution or the sample size n must be > 30.

The data for our sample met both requirements: the class sample was a simple random sample,
and although our sample size was less than 30, the data were approximately normally distributed,
as shown in the histogram from the previous section.
An error could have occurred because, although the distribution is approximately normal, it
is slightly skewed, and the sample size is less than 30, which could distort the results.
The sampling method could be improved by making the sample more random. For example, we
would likely get more accurate results if the sample were taken from students in statistics
classes throughout Utah.
