Topics
Usefulness of ND
Many things are normally distributed or very close to it
e.g. height, IQ scores, and so on
Easy to work with mathematically
There is a very strong connection between the size of sample N and the extent to
which a sampling distribution approaches the normal form
Errors in measurement or in production are often approximately normally distributed
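The link between sample size N and the shape of the sampling distribution can be explored with a small simulation. This is only a sketch under assumptions that are not in the notes: the population is uniform on (0, 1), the function name `fraction_within_one_sd` is made up for illustration, and the fixed seed is there only to make the run repeatable. For a normal shape, about 68% of values lie within one standard deviation of the center, so the closer the printed fraction is to 0.68, the more normal the sample means look.

```python
import random
import statistics

def fraction_within_one_sd(sample_size, repetitions=5000, seed=0):
    """Draw `repetitions` samples of `sample_size` uniform(0, 1) values,
    take the mean of each sample, and return the fraction of those means
    that lie within one standard deviation of their overall mean."""
    rng = random.Random(seed)  # fixed seed: an assumption, for repeatability
    means = []
    for _ in range(repetitions):
        sample = [rng.random() for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    center = statistics.fmean(means)
    spread = statistics.pstdev(means)
    inside = sum(1 for m in means if abs(m - center) <= spread)
    return inside / repetitions

# N = 1 is just the raw (flat) population; N = 30 averages 30 values per sample.
frac_n1 = fraction_within_one_sd(1)
frac_n30 = fraction_within_one_sd(30)
print("N = 1 :", frac_n1)
print("N = 30:", frac_n30)
```

With N = 1 the fraction stays well below 0.68 because the uniform population is flat, while with N = 30 it should land near 0.68.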
Characteristics of ND
Symmetric and bell-shaped
One peak
Arithmetic mean, median and mode are equal
Total area under the curve is 1.00
Normal distributions are denser in the center and sparser in the tails
A normal distribution is defined by two parameters:
the mean “mu” (μ) and the standard deviation “sigma” (σ)
ND Orientation
The symmetrical bell-shaped curve represents the probability density function of a
normal distribution
The area of a vertical section of the curve represents the probability that the random
variable lies between the values which delimit the section.
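The tails of the curve extend to the left and right of the peak. Skew describes a curve whose tails are unequal; a normal distribution is symmetric, so its skew is zero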
μ is the location parameter
σ is the scale parameter.
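The statements about area can be checked numerically. The sketch below uses the standard formula for the normal density and a simple trapezoidal sum; the function names, the integration range of ±8 standard deviations, and the example values μ = 10 and σ = 2 are assumptions chosen for illustration, not part of the notes.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with location mu and scale sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_curve(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Trapezoidal approximation of the area under the density between a and b,
    i.e. the probability that the random variable lies between a and b."""
    width = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * width, mu, sigma)
    return total * width

# Total area (the tails beyond 8 standard deviations are negligible).
total_area = area_under_curve(-8.0, 8.0)
# Area within one standard deviation of the mean, for mu = 0 and sigma = 1.
central_area = area_under_curve(-1.0, 1.0)
# Same section for a shifted and stretched curve (hypothetical mu = 10, sigma = 2).
shifted_area = area_under_curve(8.0, 12.0, mu=10.0, sigma=2.0)
print(total_area, central_area, shifted_area)
```

Changing μ moves the curve and changing σ stretches it, but the area between μ − σ and μ + σ stays the same, which is why μ and σ are called the location and scale parameters.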
Normal Distribution
Statistics is used to organize data, which is important if we want to analyze and draw
general conclusions or make predictions about the data. For example, imagine we have a
collection of crayons and want to know how many crayons of a certain colour exist in the
crayon box. After collecting our information (that is, the number of crayons for each colour or
the frequency of appearance for each colour) we can organize and illustrate our data on a
graph, such as a bar graph or pie chart.
The graphs can be helpful in deciding whether we need to buy more crayons of a particular
colour or, if we were to randomly pick a crayon from the box, which colours would be more
likely to be picked. Suppose there are 60 blue crayons out of a total of 100 crayons in the box. That means
60% of the crayons are blue. If we were to pick out a random sample of 10 crayons from the
box, we could get any number of blue crayons ranging from 0 to 10. Since 60% of the crayons
are blue, we would expect about 6 blue crayons per sample, but the exact number
of blue crayons per trial will vary. However, the average number of blue crayons obtained
from a large number of trials should be close to the expected value of 6, and this average will
tend to get closer to 6 as more trials are performed. Let’s take a look at this example
graphically. The following graph shows the number of blue crayons per sample over 25 trials:
If we now organize the data so that we are illustrating the number of trials that had a specific
number of blue crayons (or the frequency of when we get a particular number of blue crayons)
we get the following graph:
In this case, over 25 trials, the average number of blue crayons per trial is 5.3. If we
keep performing more trials, the average will tend to move closer to our expected
value of 6. If you compare the appearances of the above graphs, you’ll notice that data can
be distributed in a few different ways. Sometimes a peak may be skewed to the left or
to the right of a graph, or the data may show no clear pattern at all. We often find that a data set
follows a particular type of distribution where the data are concentrated mostly around a
central value.
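The crayon experiment above can be imitated in code. This is a minimal sketch, assuming each trial draws 10 crayons without replacement from a box of 60 blue and 40 other crayons; the function name and the fixed seed are illustrative choices, and the simulated numbers will not match the 25 trials described in the text.

```python
import random
from collections import Counter

def simulate_blue_counts(trials, seed=1):
    """Return the number of blue crayons in each of `trials` random
    samples of 10 crayons drawn from a box of 60 blue and 40 other."""
    rng = random.Random(seed)  # fixed seed: an assumption, for repeatability
    box = ["blue"] * 60 + ["other"] * 40
    return [rng.sample(box, 10).count("blue") for _ in range(trials)]

counts_25 = simulate_blue_counts(25)
counts_10000 = simulate_blue_counts(10_000)

mean_25 = sum(counts_25) / len(counts_25)
mean_10000 = sum(counts_10000) / len(counts_10000)
print("average over 25 trials:    ", mean_25)
print("average over 10,000 trials:", mean_10000)

# Frequency of each blue-crayon count over the 25 trials, as a text bar chart.
frequency = Counter(counts_25)
for blue in sorted(frequency):
    print(f"{blue:>2} blue | {'#' * frequency[blue]}")
```

The average over 25 trials can sit noticeably away from 6, while the average over 10,000 trials should be very close to it.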
The above graph illustrates what is called a normal distribution of data. A normal distribution is symmetric, so
50% of the data points in the set lie on each side of the central value. The central value in a
normal distribution is the value that occurs most often in the data set (i.e. the mode). It is
also the average value in the data set (i.e. the mean). Furthermore, if you were to rank all
the data values in the set in ascending order, the central value would also be the value that is
in the middle of the ordered set (i.e. the median).
We can also draw a curve through the data points. Notice that the curve’s shape
resembles a bell, which is why it is often called a bell curve. Graphs of normal distributions
will look different depending on the mean value (which determines the location of the center
of the graph) and the standard deviation (which is the measure of how spread out the data
values are). When the standard deviation is large, the data values are spread out far from the
mean, so the graph looks flatter and wider. For example, the first graph has a larger standard
deviation than the second graph:
If a normal distribution has a mean of 0 and a standard deviation of 1, its graph
illustrates the standard normal distribution.
The above graph tells us that:
about 68% of all the data values are within 1 standard deviation of the mean
about 95% of all the data values are within 2 standard deviations of the mean
about 99.7% of all the data values are within 3 standard deviations of the mean
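The three percentages can be verified with Python's standard library. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` is available; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Share of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

The computed shares are roughly 68.27%, 95.45% and 99.73%, so the 68, 95 and 99.7 figures are convenient roundings.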
Example: Let’s say we want to plant some shrubs along the exterior of a building but we want
to make sure that the plants are unlikely to grow taller than the windows. We find out that 95%
of shrubs of a particular type grow to a maximum height between 1.1 m and 1.7 m.
Assuming the heights follow a normal distribution, we can calculate the mean and standard
deviation.
The mean is the halfway point in the data:
Mean = (1.1 m + 1.7 m) ÷ 2 = 1.4 m
95% of the data lies within 2 standard deviations on either side of the mean value. Therefore, the difference between
1.1 m and 1.7 m can be divided by 4 to determine the value of 1 standard deviation:
1 standard deviation = (1.7 m – 1.1 m) ÷ 4 = 0.15 m
From this data, we now know that the average height of that type of shrub is 1.4 m and that, for any particular shrub,
there is a 68% probability that its height will be within 0.15 m of the average (i.e. between 1.25 m and 1.55 m).
The “standard score” or “z-score” describes the number of standard deviations a particular value x is from
the mean.
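The shrub calculation can be written as a small helper. This is a sketch that relies on the same rule of thumb used in the text (a 95% range spans 2 standard deviations on each side of the mean); the function name `normal_from_95_interval` is hypothetical.

```python
def normal_from_95_interval(low, high):
    """Estimate the mean and standard deviation of a normal distribution
    from the interval that holds the middle 95% of values, using the
    rule of thumb that this interval is 4 standard deviations wide."""
    mean = (low + high) / 2
    sd = (high - low) / 4
    return mean, sd

mean_height, sd_height = normal_from_95_interval(1.1, 1.7)
# About 68% of shrubs fall within one standard deviation of the mean.
one_sd_range = (mean_height - sd_height, mean_height + sd_height)
print(f"mean = {mean_height:.2f} m, standard deviation = {sd_height:.2f} m")
print(f"68% range = {one_sd_range[0]:.2f} m to {one_sd_range[1]:.2f} m")
```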
If we have a shrub that is at a height of 1.85 m, according to the above graph it would be
3 standard deviations from the average. That is, the z-score for that shrub is 3. If we have
another shrub that is at a height of 1.8 m, how many standard deviations is it from the mean?
To solve this, we need to calculate the difference between the shrub height and the mean,
and then divide that value by the standard deviation:
z-score = (1.8 m – 1.4 m) ÷ 0.15 m = 2.67
Let’s say we have a particular shrub that is 1.2 m tall. What is its z-score?
z-score = (1.2 m – 1.4 m) ÷ 0.15 m = -1.33
This shrub is 1.33 standard deviations below the average. A negative z-score indicates that the shrub is
shorter than the average. If we have another shrub that is exactly the same height as the
average, the z-score would be 0. If we were to create another graph of the data using the
z-scores instead of height values, we would end up with the standard normal distribution
graph.
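The z-score arithmetic for the shrubs can be reproduced as follows. This is a minimal sketch using the mean of 1.4 m and standard deviation of 0.15 m from the example; the function names are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def value_from_z(z, mean, sd):
    """Inverse of z_score: the value that sits z standard deviations from the mean."""
    return mean + z * sd

MEAN_HEIGHT = 1.4  # metres, from the shrub example
SD_HEIGHT = 0.15   # metres, from the shrub example

for height in (1.85, 1.8, 1.2, 1.4):
    print(f"{height} m -> z = {z_score(height, MEAN_HEIGHT, SD_HEIGHT):.2f}")
```

Converting every height this way rescales the data to a mean of 0 and a standard deviation of 1, which is why the resulting graph is the standard normal distribution.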
Practice Questions
1) The class average for a test is 75% and the standard deviation is 4%. Duncan has a test score
of 83%. Assuming the test scores follow a normal distribution, what percentage of the class
did better on the test than him?
2) It takes Leia an average of 45 minutes (standard deviation
is 7 minutes) to travel from her home to her office. If she wakes up late one morning and only
has 38 minutes to get to work, what is the probability she will get there on time assuming her
travel times follow a normal distribution?
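Answers
1) z-score = (83% – 75%) ÷ 4% = 2
About 95% of the scores lie within 2 standard deviations of the mean, so the remaining 5% is split equally between the two tails. If we look at a standard normal distribution graph,
about 2.5% of the test scores are greater than Duncan’s score.
2) z-score = (38 min – 45 min) ÷ 7 min = -1
About 68% of the travel times lie within 1 standard deviation of the mean, so about 16% lie below a z-score of -1. If we look at a standard normal distribution graph, about 84% of the travel times are above a
z-score of -1. This means that about 84% of Leia’s travel times are longer than 38 minutes, that is,
she has about an 84% chance of being late. So Leia only has about a 16% chance of getting to work on time.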
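Both answers can be cross-checked against the normal cumulative distribution function instead of reading values off a graph. This sketch assumes Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Question 1: share of the class scoring above Duncan's 83%.
test_scores = NormalDist(mu=75, sigma=4)
share_above_duncan = 1 - test_scores.cdf(83)

# Question 2: probability that Leia's trip takes 38 minutes or less.
travel_times = NormalDist(mu=45, sigma=7)
p_on_time = travel_times.cdf(38)

print(f"share above Duncan: {share_above_duncan:.1%}")
print(f"chance of being on time: {p_on_time:.1%}")
```

The computed values are roughly 2.3% and 15.9%, slightly below the 2.5% and 16% obtained from the rounded 68 and 95 percentages.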
Test for Normality
Graphical Method
Analytical Test Procedures
Graphical Method
Graphical methods provide powerful diagnostic tools for confirming assumptions
They give quick summaries of essential data
They are typically used together with quantitative statistical evaluations
Histogram
The easiest and simplest graphical plot
This test simply consists of looking at the histogram and judging whether it
approximates the bell curve
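A histogram for this kind of visual check can be built with a few lines of code. This is a rough sketch: the data values are made up for illustration, the function name `histogram_counts` is hypothetical, and comparing the bar heights with their mirror image is only a crude stand-in for judging the shape by eye.

```python
import math
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    return Counter(math.floor(v / bin_width) * bin_width for v in values)

# Hypothetical data, made up for illustration.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]
counts = histogram_counts(sample, bin_width=1)

# Text histogram: one row per bin, one '#' per value in the bin.
for start in sorted(counts):
    print(f"{start:>3} | {'#' * counts[start]}")

# Crude symmetry check: bar heights read the same left-to-right and right-to-left.
bars = [counts[start] for start in sorted(counts)]
looks_symmetric = bars == bars[::-1]
print("symmetric:", looks_symmetric)
```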
Stem and Leaf Plot
is a special table where each data value is split into a "stem" (the first digit or digits) and
a "leaf" (usually the last digit).
As with the histogram, look for symmetry or the bell-curve shape
Example:
Stem "1" Leaf "5" means 15
Stem "1" Leaf "6" means 16
Stem "2" Leaf "1" means 21
Etc
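The stem and leaf split can be sketched in code as follows, assuming non-negative whole numbers where the leaf is the last digit and the stem is everything before it; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all digits but the last),
    listing the leaves (last digits) of each stem in ascending order."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf([15, 16, 21])
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

For the three values in the example this prints one row for stem 1 with leaves 5 and 6, and one row for stem 2 with leaf 1. With more data, the rows form a sideways histogram that can be inspected for symmetry.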