
Nash Stewart

Statistics Term Project

The graphs reflect what I expected to see. They seem to represent a fairly even
distribution of colors. I was surprised that there were so many orange and so
few green, and I was surprised to see that some people had quite a few more
Skittles in their individual bags.
There was one person with 110 total Skittles, while the average seemed to be
between 60 and 70. Outliers like this pull the mean in the positive
direction. The median, however, stays nearly the same.
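
The effect of one large bag on the mean and the median can be sketched in
Python. The bag totals below are hypothetical values chosen only for
illustration; they are not the class data, apart from the 110 mentioned above.

```python
from statistics import mean, median

# Hypothetical bag totals for illustration only (not the class data).
typical_bags = [58, 59, 60, 60, 61, 62]
# Add one unusually large bag, like the 110-candy bag mentioned in the text.
with_outlier = typical_bags + [110]

mean_before, median_before = mean(typical_bags), median(typical_bags)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
```

In this made-up example the mean rises noticeably while the median does not
move, which is the pattern described above.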

I had a much higher proportion of purple than the overall class proportion of
purple, but everything else appears to be about normal.

The graph of the total candies in each bag is fairly symmetrical,
perhaps slightly left skewed. However, the results are not surprising. I would
have guessed that most people would receive about the same amount, with
fewer people receiving more or less than the mean. The mean for the class
was 59.45 and I got 60, which was the median. So my results were as normal
as they could have been. I did not have any results that were outliers. The
overall class data reflects the results of my bag almost perfectly. The class
total had 64 bags of candies, and most people in the class had around 60
candies per bag. I had 60 candies in my bag, so my bag agrees with the class
sample.
Quantitative data are data that can be measured or counted and on which
arithmetic (adding, subtracting, multiplying, dividing, etc.) is meaningful.
Categorical data are data that can be sorted into different categories and
include variables that cannot be measured, such as gender, hair color, or
college major. The graphs generally used for categorical data are bar graphs
and pie graphs. These graphs organize the data in a readable way: they show
how many of the observational units fit into each category. Graphs like the
dot plot, stem plot, and histogram are preferred for quantitative data, which
is measured in units, because they can display any number of distinct
numerical values and show how those values are distributed, for example how
far they differ from each other. A pie graph or a bar graph only shows the
different categories in relation to each other, whereas the graphs and plots
used for quantitative data can show the differences between the measured
values of the individual observational units.
Z-tests, means, medians, and standard deviations are all things that can
be computed from one trial of a study using quantitative data. To do tests like
these on categorical data you would have to run many different trials of the
same study repeatedly to turn the categorical data into a quantitative form
that is conducive to these calculations. Ratios and proportions are the things
you would care about calculating with categorical data, since it is much more
important to measure each category against the whole sample than it is
to compare it to other variables. Also, categorical data cannot be measured
or manipulated mathematically in the same way that quantitative data can,
because the values are category names rather than numbers.

A confidence interval is a range of plausible values for a population
parameter, built from a sample statistic and its margin of error. If the
confidence level is 95%, then about 95% of the intervals constructed this way
from repeated samples would contain the actual population parameter. The
confidence interval is the observed statistic ± the margin of error, where the
margin of error is the critical value times the standard error of the
statistic. The purpose of the confidence interval is to indicate how accurate
the estimate is. However, there is a tradeoff: raising the degree of confidence
that the parameter falls within the interval widens the interval, which
decreases its precision.
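
This tradeoff can be illustrated with a short Python sketch. It assumes the
yellow-candy counts used later in this report (698 yellow out of 3567 candies)
and the usual large-sample formula for a proportion; the function name
margin_of_error is only illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Class counts taken from the yellow-candy problem later in this report.
yellow, n = 698, 3567
p_hat = yellow / n

def margin_of_error(confidence):
    """Margin of error for a proportion at the given confidence level."""
    # Two-sided critical value z_{alpha/2} from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)

margins = {level: margin_of_error(level) for level in (0.90, 0.95, 0.99)}
for level, e in margins.items():
    print(f"{level:.0%} confidence: interval width = {2 * e:.4f}")
```

Each step up in confidence level gives a wider, and therefore less precise,
interval.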
Construct a 99% confidence interval estimate for the population proportion
of yellow candies. Show your work, including the computations for the
margin of error and the critical value.
Using the formula for estimating a population proportion $p$ from the sample
proportion $\hat{p}$:

$\hat{p} - E < p < \hat{p} + E$, where $E = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$

$\hat{p} = 698/3567 = 0.1957$, and for 99% confidence $z_{\alpha/2} = 2.58$.

When everything is plugged in, $E = 2.58\sqrt{0.1957(1-0.1957)/3567} = 0.0171$.

Therefore the confidence interval is:
(0.1785, 0.2128)
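
As a check on the arithmetic, the same computation can be sketched in Python.
The counts and the critical value 2.58 come from the work above; the variable
names are my own.

```python
from math import sqrt

yellow, total = 698, 3567   # yellow candies and all candies in the class sample
z_crit = 2.58               # critical value for 99% confidence, as used above

p_hat = yellow / total
# Margin of error: E = z * sqrt(p_hat * (1 - p_hat) / n)
E = z_crit * sqrt(p_hat * (1 - p_hat) / total)
lower, upper = p_hat - E, p_hat + E

print(f"p_hat = {p_hat:.4f}, E = {E:.4f}")
print(f"99% CI: ({lower:.4f}, {upper:.4f})")
```

The endpoints are computed from the unrounded sample proportion, which is why
the lower endpoint rounds to 0.1785 rather than 0.1957 - 0.0171 = 0.1786.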

Construct a 95% confidence interval estimate for the population mean
number of candies per bag. Show your work, including the computations for
the margin of error and the critical value.
$\bar{x} = 59.45$
$\bar{x} - E < \mu < \bar{x} + E$, where $E = t_{\alpha/2}\, s/\sqrt{n}$
$E = 1.96(2.6/\sqrt{60}) = 0.6579$

$(59.45 - 0.6579,\ 59.45 + 0.6579) = (58.7921, 60.1079)$
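
The same interval can be reproduced with a few lines of Python. This sketch
uses the values from the work above, including the critical value 1.96. As a
side note, the t critical value for 59 degrees of freedom is about 2.001, so a
strict t interval would be slightly wider; the variable names are my own.

```python
from math import sqrt

x_bar = 59.45   # sample mean number of candies per bag
s = 2.6         # sample standard deviation
n = 60          # number of bags used in the calculation above
crit = 1.96     # critical value used in the work above

# Margin of error: E = critical value * s / sqrt(n)
E = crit * s / sqrt(n)
lower, upper = x_bar - E, x_bar + E

print(f"E = {E:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```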

Construct a 98% confidence interval estimate for the population standard
deviation of the number of candies per bag. Show your work, including the
computations and the critical values.

$\sqrt{(n-1)s^2/\chi^2_R} < \sigma < \sqrt{(n-1)s^2/\chi^2_L}$

$\sqrt{(60-1)2.6^2/88.379} < \sigma < \sqrt{(60-1)2.6^2/37.485}$

(2.1243, 3.2619)
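
The endpoints can be checked with a short Python sketch. The chi-square
critical values 88.379 and 37.485 are taken directly from the work above
rather than computed; the variable names are my own.

```python
from math import sqrt

n = 60                # number of bags
s = 2.6               # sample standard deviation
chi2_right = 88.379   # right-tail critical value used above
chi2_left = 37.485    # left-tail critical value used above

# Interval for sigma: sqrt((n-1)s^2 / chi2_R) < sigma < sqrt((n-1)s^2 / chi2_L)
lower = sqrt((n - 1) * s**2 / chi2_right)
upper = sqrt((n - 1) * s**2 / chi2_left)

print(f"98% CI for sigma: ({lower:.4f}, {upper:.4f})")
```

The square roots are needed because the chi-square formula gives an interval
for the variance, and the problem asks for the standard deviation.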

Discuss and interpret (with complete sentences) the results of each of your
three interval estimates.
The 99% confidence interval estimate for the population proportion of
yellow candies means that there is 99% confidence that the actual
proportion of yellow candies is between 0.1785 and 0.2128. The 95%
confidence interval estimate for the population mean number of candies per
bag means that there is 95% confidence that the actual mean of the number
of candies per bag is between 58.7921 and 60.1079. The 98% confidence
interval estimate for the population standard deviation of the number of
candies per bag means that there is 98% confidence that the actual standard
deviation for number of candies per bag is between 2.1243 and 3.2619.

In this project I learned how to work with a team to analyze and determine
statistics for a sample. In this case, bags of Skittles and their color
contents were the sample. The mathematics and statistics skills I have learned
in this project will help me solve problems more accurately when I write
statistics-based programs in my computer science courses. Also, the
collaboration and cooperation with teammates will greatly benefit me in the
development of software in my computer science projects. As I am tasked with
writing scientific, research-based programs, the statistical knowledge that I
gained through this course and this project will be useful in writing code to
simulate these processes or calculate the theory-based models.
Most of the problem-solving skill development from this project
occurred during group work and while researching statistical procedures.
During group work, the team had to figure out how to meet together or
communicate online in a way that allowed for critical thinking and real-time
collaboration. The group had to learn how to delegate tasks and work as a
team to find the best ways to calculate the statistics and present them.
This project showed me that real-world statistics are much more
complicated than I originally thought. For example, I was not aware of
how carefully a confidence interval must be considered and calculated in
order for the statistic to gain legitimacy. I originally thought that
confidence intervals were simply calculated, and I did not realize that the
confidence increased as the interval widened.
