Statsfinalproject

Mikaela Ingraham
Math 1040 Signature Project

Throughout this project I learned how realistic everyday situations can
be applied to calculations in statistics. This was a very interesting project to
me because I did not realize that collecting data on Skittles could be used in
a variety of statistical calculations, such as building graphs, confidence
intervals, and margins of error. To summarize, this project helped me realize
that statistics calculations can be used to solve all kinds of real-world
problems, as long as data is collected. The following information includes
data I collected and summarized during my studies of statistics, beginning
with sample data collection.
Based on limited information about the manufacturing process, we
assumed the colors would appear in roughly equal proportions, about 20%
each. If you look at the results of the class data set, you can see that the
variation from 20% ranges from +0.014 to -0.010. With a sample size of
2,268, a 20% share would mean a count of about 454 of each color. We think
that the estimate of 20% per color is valid and relatively accurate. We
resisted the temptation to make an inference from our own bags of Skittles
because we considered the sample size to be too small, although there is no
reason to believe it is not random.
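The deviations from 20% can be reproduced from the class counts (464, 439, 485, 449, 431). A minimal Python sketch, assuming the equal 20% share per color stated above; the variable names are only illustrative:

```python
# Class color counts from this project's data table.
class_counts = {"Red": 464, "Orange": 439, "Yellow": 485, "Green": 449, "Purple": 431}
EXPECTED_SHARE = 0.20  # assumption from the text: five roughly equally common colors

total = sum(class_counts.values())        # 2268 candies in the class sample
expected_count = EXPECTED_SHARE * total   # 453.6, which rounds to 454 per color

deviations = {}
for color, count in class_counts.items():
    share = count / total                 # observed proportion of this color
    deviations[color] = share - EXPECTED_SHARE
    print(f"{color:<7} share = {share:.4f}  deviation = {deviations[color]:+.4f}")

print(f"expected count per color: {expected_count:.1f}")
print(f"largest deviation:  {max(deviations.values()):+.3f}")
print(f"smallest deviation: {min(deviations.values()):+.3f}")
```

Yellow gives the largest positive deviation (about +0.014) and purple the largest negative one (about -0.010).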
If you consider the overall class data as the sample, then the population
would be all Skittles manufactured. If for experimental purposes you
designate only the Skittles purchased by our class as the population, then
every individual bag would be a sample. In either case, unless there are
distribution factors we are unaware of, each sample is random.
                Red  Orange  Yellow  Green  Purple  Total
Class counts    464     439     485    449     431   2268
In my bag         8      19      12      8       9     56
The graphs in some ways reflected what I expected to see. I had heard
a rumor that yellows and oranges were by far the most common colors in a
bag of Skittles, but based on our data, the colors are fairly evenly matched.
Orange appears to be the second least common color, which didn't match my
bag's data, although yellow is the most common color, as I predicted. It was
also a surprise that red Skittles were the second most common color; I
expected them to be less common since I feel I never get many red Skittles in
a bag.
There were a few outliers, like my abnormally high orange Skittles
count, and I saw that someone else's data had a high purple Skittles count.
An outlier like this inflates the class total for that color, which can make
the color look more common than it would be in an average bag.
I think the class Skittles distribution mostly matches my data, except
for the orange count. Yellow Skittles were not my highest count, as the class
data would suggest, nor did I have a large number of red Skittles.
The next part of the project involved organizing the data into graphs
of the numerical data. To build these graphs, we had to calculate the mean
and standard deviation of the number of candies per bag. The results of our
calculations and graphs are included.
Mean number of candies per bag: 59.5
Standard deviation of the number of candies per bag: 2.9
5-number summary for the number of candies per bag (minimum, Q1, median, Q3, maximum): 53, 58, 59.5, 61, 66
The shape of the distribution is symmetrical, or bell shaped. I was a
little surprised, because glancing over the data, everyone's number of
Skittles in each bag seemed completely random. However, seeing the data in
a graph made it apparent that the most common number of Skittles in a bag
was around 60. Both the frequency histogram and the box-and-whiskers plot
agreed with this amount. I had 56 Skittles in my bag, so out of the class's 38
bags, my bag agreed with the rest of the data. My bag would fall in the first
quartile, below Q1 = 58.
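One way to check where a single bag falls is to compare it with the reported quartiles and compute its z-score. This sketch uses only the summary values reported for the class; the function name is hypothetical, and the rule for a value landing exactly on a quartile is an assumption.

```python
# Summary values reported for the class's bags.
minimum, q1, median, q3, maximum = 53, 58, 59.5, 61, 66
mean, sd = 59.5, 2.9
my_bag = 56

def quartile_group(x):
    """Return which quarter (1 to 4) of the class data x falls in.
    Assumption: a value exactly on a quartile goes into the higher group."""
    if x < q1:
        return 1
    if x < median:
        return 2
    if x < q3:
        return 3
    return 4

z = (my_bag - mean) / sd  # number of standard deviations from the mean
print(f"My bag ({my_bag}) is in quarter {quartile_group(my_bag)}; z = {z:.2f}")
```

For 56 candies this reports the lowest quarter (below Q1 = 58) and z of about -1.21, so the bag sits roughly 1.2 standard deviations below the mean.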
Categorical data is qualitative data that can be sorted into groups, such
as the color of a Skittle. Counting how many candies fall into each group,
such as the number of yellow Skittles in a bag, as we did in this project,
gives a frequency for each category. Categorical data is organized into
groups for organizational and statistical purposes. In our project, we
individually recorded how many Skittles of each color we had in a bag and
combined our data into a class sample. We grouped the data into categories
because we wanted to see how the colors compared to each other in the
sample. For categorical data, a pie chart makes sense because it shows each
category's share of the whole. A Pareto chart is also useful for categorical
data because the categories are arranged in descending order of frequency.
Pareto charts put focus on the most significant part of the data, for
instance, if we want to know the most common Skittles color. For categorical
data, which focuses on frequency, we would want to calculate a frequency
distribution and a relative frequency distribution. The relative frequencies
always add up to one, which is ideal for pie chart representations.
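As a sketch of those two distributions, the Python below computes each color's relative frequency from the class counts and sorts the colors into Pareto (descending) order. Only the counts come from the project; the names and printed layout are illustrative.

```python
# Class color counts from this project's data table.
class_counts = {"Red": 464, "Orange": 439, "Yellow": 485, "Green": 449, "Purple": 431}
total = sum(class_counts.values())

# Relative frequency = category count / total; these always sum to 1.
relative = {color: count / total for color, count in class_counts.items()}

# Pareto order: categories sorted from most to least frequent.
pareto = sorted(class_counts, key=class_counts.get, reverse=True)

for color in pareto:
    print(f"{color:<7} {class_counts[color]:>4}  {relative[color]:.3f}")
print(f"sum of relative frequencies: {sum(relative.values()):.3f}")
```

This gives the order yellow, red, green, orange, purple, and the relative frequencies sum to one up to floating-point rounding.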
Quantitative data is numerical data that can be ordered and measured.
We would want to use this data for comparing measurements. Graphs used
for quantitative data include frequency histograms and box-and-whiskers
plots. A histogram groups quantitative data into numerical intervals, and a
box plot summarizes it with the five-number summary, so both are easy to
organize and read. These graphs also make it easy to see whether the data is
skewed or symmetrical. Common calculations for quantitative data include
the mean, mode, range, standard deviation, quartiles, and five-number
summary. We can also use lower and upper fence calculations to determine
whether there is an outlier in the data set.
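Applied to the five-number summary reported for candies per bag, the fence calculation can be sketched as follows. This assumes the common 1.5 * IQR rule; the project does not state which rule or quartile method the class used, and `is_outlier` is a hypothetical helper.

```python
# Five-number summary for candies per bag, as reported in this project.
minimum, q1, median, q3, maximum = 53, 58, 59.5, 61, 66

iqr = q3 - q1                 # interquartile range: 61 - 58 = 3
lower_fence = q1 - 1.5 * iqr  # 58 - 4.5 = 53.5
upper_fence = q3 + 1.5 * iqr  # 61 + 4.5 = 65.5

def is_outlier(x):
    """Flag values outside the 1.5 * IQR fences (assumed rule)."""
    return x < lower_fence or x > upper_fence

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print("minimum flagged:", is_outlier(minimum))
print("maximum flagged:", is_outlier(maximum))
```

With these quartiles the fences are 53.5 and 65.5, so the 53-candy and 66-candy bags fall just outside them; a different quartile method could shift the fences enough to change that.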
The final part of the project involved calculating proportions,
margins of error, and confidence intervals for different proportions of
Skittles. These calculations included confidence intervals for the
proportion of yellow Skittles and for the true population mean number of
candies per bag.
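For the yellow proportion, a minimal sketch of the standard large-sample interval p-hat ± z* · sqrt(p-hat(1 - p-hat)/n) is below. The 95% confidence level is an assumption, since the level used in class is not stated here; Python's `statistics.NormalDist` supplies the critical value z*.

```python
from math import sqrt
from statistics import NormalDist

yellow, total = 485, 2268   # class counts from the project's data table
confidence = 0.95           # assumption: the confidence level used in class is not stated

p_hat = yellow / total      # sample proportion of yellow Skittles
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
margin = z_star * sqrt(p_hat * (1 - p_hat) / total)       # margin of error E

low, high = p_hat - margin, p_hat + margin
print(f"p_hat = {p_hat:.4f}, E = {margin:.4f}, CI = ({low:.3f}, {high:.3f})")
```

With 485 yellows out of 2,268 candies this gives roughly (0.197, 0.231), an interval that contains 0.20.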
A confidence interval provides a range of values that is likely to contain
the population parameter of interest, expressing the degree of uncertainty
associated with a statistic. In statistics it is important to know how well a
sample statistic estimates the population value. Specifically, a confidence
interval is an interval estimate combined with a probability statement.
Confidence intervals are typically preferred to point estimates because they
show the uncertainty and precision of the estimate.
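To show how an interval adds precision information to a point estimate, here is a sketch of a t interval for the mean number of candies per bag, x-bar ± t* · s/sqrt(n), using the reported mean (59.5), standard deviation (2.9), and 38 bags. The 95% level and the table value t* of about 2.026 for 37 degrees of freedom are assumptions; Python's standard library has no t distribution, so the value is hard-coded.

```python
from math import sqrt

mean, sd, n = 59.5, 2.9, 38  # reported mean, standard deviation, and number of bags
t_star = 2.026               # assumption: 95% critical t value for n - 1 = 37 df, from a t table

margin = t_star * sd / sqrt(n)   # margin of error E = t* * s / sqrt(n)
low, high = mean - margin, mean + margin
print(f"E = {margin:.2f}, 95% CI for mean candies per bag = ({low:.2f}, {high:.2f})")
```

This gives about 59.5 ± 0.95, or roughly (58.55, 60.45) candies per bag.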
During my time in Math 1040 I learned exactly how data is gathered,
organized, and turned into readable information that is simple to scan and
understand. This class expanded my understanding of statistics as a whole
and of why gathering data about a population is important in real-world
applications. To summarize, I learned that statistics is about gathering
information in a relatively simple and reliable way and using the results to
improve existing conditions.
Prior to beginning this class I had no idea what statistics entailed. I had
an image in my head of a call center gathering data for personal interest, but
I didn't know how that data was organized or used. I learned in this class that
there are diverse ways of sampling data from a population, such as simple
random or systematic sampling. Statistics is also not a science used only by
call centers, as I had previously believed; it can be used in very real-world
settings, such as using customer responses to improve a company or
optimizing standardized testing in schools. One of my favorite problems in
this class involved a teacher who was missing an exam score but was able to
find it using the mean and the number of exam scores. It seemed simple, but
it had never occurred to me that there could be a way to figure that out.
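The missing-score problem rests on one identity: the sum of all n scores equals the mean times n. A small sketch, with a hypothetical function name and made-up scores for illustration:

```python
def missing_score(mean, n, known_scores):
    """Recover one missing score: the total of all n scores is mean * n,
    so the missing one is that total minus the scores still on record."""
    if len(known_scores) != n - 1:
        raise ValueError("expected exactly one missing score")
    return mean * n - sum(known_scores)

# Hypothetical example: five exams with a mean of 82, one score lost.
known = [78, 90, 85, 70]
print(missing_score(82, 5, known))
```

Here 82 × 5 = 410 and the four known scores sum to 323, so the missing score is 87.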
In this project itself I found it interesting that we could take a vast
amount of data, the number of each color in a bag of Skittles for each person
in the class, and organize it in so many different ways. I liked arranging the
data into graphs that held a lot of information but were visually easy to scan
and read. Applying what we learned about confidence intervals to the project
also helped me realize that confidence interval calculations can be used and
interpreted with a variety of data. Overall, this project helped
me understand that statistics calculations can be applied to many different
real-world applications.
