
Normal Distribution

Topics

 Introduction to the Normal Distribution


 Usefulness of ND
 Characteristics of ND
 Area under the ND
 Test for Normality
Prepared by: Angelee Mandreza
Normal Distribution
 a specific distribution that is symmetric about the mean
 also known as the bell curve
 also called the Gaussian curve
 the most important and most widely used distribution in statistics

Usefulness of ND
 Many things are normally distributed, or very close to it
 e.g., height, IQ scores, and so on
 Easy to work with mathematically
 There is a very strong connection between the sample size N and the extent to
which a sampling distribution approaches the normal form
 Errors in measurement or in production

Characteristics of ND
 symmetric and bell shaped
 one peak
 arithmetic mean, median, and mode are equal
 total area under the curve is 1.00
 NDs are denser in the center and less dense in the tails
 A normal distribution is defined by two parameters:
 the mean "mu" (μ) and the standard deviation "sigma" (σ)

[Figure: bell curve annotated "one peak", "denser center", "symmetrical"]

ND Orientation
 The symmetrical bell-shaped curve representing the probability density function of a
normal distribution
 The area of a vertical section of the curve represents the probability that the random
variable lies between the values which delimit the section.
 The asymmetry in the tails of the curve is called skew
 μ is the location parameter
 σ is the scale parameter.

Normal Distribution

Statistics is used to organize data, which is important if we want to analyze it and draw
general conclusions or make predictions. For example, imagine we have a collection of
crayons and want to know how many crayons of a certain colour exist in the crayon box.
After collecting our information (that is, the number of crayons of each colour, or the
frequency of appearance of each colour), we can organize and illustrate our data on a
graph, such as a bar graph or pie chart.

The graphs can help us decide whether we need to buy more crayons of a particular
colour, or which colours we would most likely pick if we drew a crayon from the box at
random. Suppose there are 60 blue crayons out of a total of 100 crayons in the box. That
means 60% of the crayons are blue. If we were to pick out a random sample of 10 crayons from the
box, we could get any number of blue crayons ranging from 0 to 10. Since 60% of the crayons
are blue, we would expect to get 6 blue crayons with each sample trial, but the exact number
of blue crayons per trial will vary. However, the average number of blue crayons obtained
from a large number of trials should be close to the expected value of 6, and this average will
tend to become closer as more trials are performed. Let’s take a look at this example
graphically. The following graph shows the number of blue crayons per sample over 25 trials:
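The crayon experiment described above can be simulated in a few lines of Python. This is a sketch, not part of the original example: the seed is fixed only so the run is reproducible, and the box contents match the text (60 blue crayons out of 100).

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical box matching the text: 60 blue crayons out of 100 total.
box = ["blue"] * 60 + ["other"] * 40

# 25 trials: each draws a sample of 10 crayons (without replacement)
# and counts how many are blue.
counts = [random.sample(box, 10).count("blue") for _ in range(25)]
average = sum(counts) / len(counts)

print(counts)
print(round(average, 1))  # should land near the expected value of 6
```

Running more trials pulls the average ever closer to 6, which is exactly the behaviour the text describes.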
If we now organize the data so that we are illustrating the number of trials that had a specific
number of blue crayons (or the frequency of when we get a particular number of blue crayons)
we get the following graph:

In this case, over 25 trials, the average number of blue crayons per trial is 5.3. If we
keep performing more trials, we will find that the average becomes closer to our expected
value of 6. If you compare the appearances of the above graphs, you'll notice that data can
be distributed in a few different ways. Sometimes a peak may be skewed more to the left or
to the right of a graph, or the data may have a random distribution. We often find that a
data set follows a particular type of distribution where the peak is concentrated mostly
around a central value.
The above graph illustrates what is called a normal distribution of data, which means that
50% of the data points in the set are on either side of the central value. The central value in a
normal distribution is the value that occurs most often in the data set (i.e. the mode). It is
also the average value in the data set (i.e. the mean). Furthermore, if you were to rank all
the data values in the set in ascending order, the central value would also be the value that is
in the middle of the ordered set (i.e. the median).

We can also draw a curve through the data points. Notice that the curve’s shape
resembles a bell, which is why it is often called a bell curve. Graphs of normal distributions
will look different depending on the mean value (which determines the location of the center
of the graph) and the standard deviation (which is the measure of how spread out the data
values are). When the standard deviation is large, the data values are spread out from the
mean so the graph will look more flat. For example, the first graph has a larger standard
deviation than the second graph:

If we have a graph with a normal distribution whose mean value is equal to 0 and whose
standard deviation is 1, then the graph illustrates a standard normal distribution.
The above graph tells us that 68% of all the data values are within 1 standard
deviation of the mean, 95% of all the data values are within 2 standard deviations of the
mean, and 99.7% of all data values are within 3 standard deviations of the mean.
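These 68/95/99.7 percentages can be checked with Python's standard library; `statistics.NormalDist` provides the cumulative distribution function of a normal distribution:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

for k in (1, 2, 3):
    # probability of falling within k standard deviations of the mean
    p = z.cdf(k) - z.cdf(-k)
    print(f"within {k} sd: {p:.1%}")
```

This prints 68.3%, 95.4%, and 99.7%, matching the rounded figures quoted above.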

Example: Let’s say we want to plant some shrubs along the exterior of a building but we want
to make sure that the plants will not likely grow taller than the windows. We find out that 95%
of a particular type of shrub grows to a maximum height between 1.1m and 1.7m tall.
Assuming the data is a normal distribution, we can calculate the mean and standard
deviation.

The mean is the halfway point in the data: Mean = (1.1 m + 1.7 m) ÷ 2 = 1.4 m. The 95%
range spans 2 standard deviations on either side of the mean value, so the difference
between 1.1 m and 1.7 m can be divided by 4 to determine the value of 1 standard deviation:
1 standard deviation = (1.7 m – 1.1 m) ÷ 4 = 0.15 m. From this data, we now know that the
average height of that type of shrub is 1.4 m and that for any particular shrub, there is a
68% probability that it will be within 0.15 m of the average (i.e. between 1.25 m and
1.55 m). A "standard score" or "z-score" is used to describe the number of standard
deviations a particular value x is from the mean.
If we have a shrub that is at a height of 1.85 m, according to the above graph it would be
3 standard deviations from the average. That is, the z-score for that shrub is 3. If we have
another shrub that is at a height of 1.8 m, how many standard deviations is it from the mean?
To solve this, we need to calculate the difference between the shrub height and the mean,
and then divide that value by the standard deviation:

z-score = (1.8 m – 1.4 m) ÷ 0.15 m = 2.67. Let's say we have a particular shrub that is
1.2 m tall. What is its z-score? z-score = (1.2 m – 1.4 m) ÷ 0.15 m = −1.33. This shrub is
1.33 standard deviations below the average; a negative z-score indicates that the height of
the shrub is shorter than the average. If we have another shrub that is exactly the same
height as the average, its z-score would be 0. If we were to create another graph of the
data using the z-scores instead of height values, we would end up with the standard normal
distribution graph.
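The z-score calculations above can be collected into a small helper function (the function name is our own; the numbers come from the shrub example):

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

# Shrub example from the text: mean 1.4 m, standard deviation 0.15 m.
print(round(z_score(1.85, 1.4, 0.15), 2))  # 3.0
print(round(z_score(1.8, 1.4, 0.15), 2))   # 2.67
print(round(z_score(1.2, 1.4, 0.15), 2))   # -1.33
```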

Practice Questions
1) The class average for a test is 75% and the standard deviation is 4%. Duncan has a test
score of 83%. Assuming the test scores follow a normal distribution, what percentage of the
class did better on the test than he did?
2) It takes Leia an average of 45 minutes (standard deviation is 7 minutes) to travel from
her home to her office. If she wakes up late one morning and only has 38 minutes to get to
work, what is the probability she will get there on time, assuming her travel times follow
a normal distribution?
Answers
1) z-score = (83% – 75%) ÷ 4% = 2. If we look at a standard normal distribution graph,
2.5% of the test scores are greater than Duncan's score.
2) z-score = (38 min – 45 min) ÷ 7 min = −1. If we look at a standard normal distribution
graph, 84% of the travel times are above a z-score of −1. This means that 84% of Leia's
travel times are longer than 38 minutes, that is, she has an 84% chance of being late. So
Leia only has a 16% chance of getting to work on time.
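Both answers can be checked with `statistics.NormalDist`. (The exact cdf gives 2.3% for question 1; the 2.5% above comes from the rounded 95% empirical rule.)

```python
from statistics import NormalDist

std = NormalDist()

# Question 1: Duncan's z-score is (83 - 75) / 4 = 2.
z1 = (83 - 75) / 4
better = 1 - std.cdf(z1)  # fraction of the class scoring above him
print(f"{better:.1%}")  # about 2.3%

# Question 2: Leia's z-score is (38 - 45) / 7 = -1.
z2 = (38 - 45) / 7
on_time = std.cdf(z2)  # fraction of travel times of at most 38 minutes
print(f"{on_time:.1%}")  # about 15.9%, i.e. roughly a 16% chance
```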
Test for Normality
 Graphical Method
 Analytical Test Procedures

Graphical Method
 provide powerful diagnostic tools for confirming assumptions
 quick summaries of essential data
 typically used with quantitative statistical evaluations

Types of Graphical Method


 Histogram
 Stem and Leaf Plot
 Box and Whisker Plot
 Normal Percent-Percent (P–P) Plot
 Normal Quantile-Quantile (Q–Q) Plot
 Empirical Cumulative Distribution Function Plot

Histogram
 The easiest and simplest graphical method
 This test simply consists of looking at the histogram and discerning whether it
approximates the bell curve
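As a rough sketch of this idea, a text histogram can be built with a few lines of Python. The data, bin width, and function name below are illustrative assumptions, not part of the original notes:

```python
from collections import Counter

def histogram_counts(data, bin_width):
    """Count how many values fall into each bin of the given width."""
    return Counter((x // bin_width) * bin_width for x in data)

# Illustrative data, roughly mounded around the middle.
heights = [52, 55, 61, 64, 66, 68, 70, 71, 73, 75, 78, 84, 89]
width = 10
for left, count in sorted(histogram_counts(heights, width).items()):
    print(f"{left}-{left + width - 1}: {'*' * count}")
```

The bars rise toward the middle bins and fall off on both sides, the rough bell shape the test looks for.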
Stem and Leaf Plot
is a special table where each data value is split into a "stem" (the first digit or digits)
and a "leaf" (usually the last digit).
As with a histogram, look for symmetry or the bell-curve shape.

Example:
Stem "1" Leaf "5" means 15
Stem "1" Leaf "6" means 16
Stem "2" Leaf "1" means 21
etc.
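A minimal stem-and-leaf builder, assuming two-digit integer data as in the example above (the function name and sample data are our own):

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Split each value into a stem (all but the last digit) and a leaf
    (the last digit), collecting sorted leaves under each stem."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

print(stem_and_leaf([15, 16, 21, 23, 23, 26, 30]))
# {1: [5, 6], 2: [1, 3, 3, 6], 3: [0]}
```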

Box and Whisker Plot


displays the five-number summary of a data set, which is:
minimum – the smallest data point
1st quartile – the middle value between the smallest data point and the median
median – the middle of the data
3rd quartile – the middle value between the median and the largest data point
maximum – the largest data point

a box plot suggests a normal distribution if:


 the mean and median are equal
 the whiskers are the same length
 ideally, the plot is based on at least 20 data points
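The five-number summary behind a box plot can be computed directly. This sketch follows the convention used in the definitions above, where each quartile is the median of the half of the data on its side of the overall median:

```python
from statistics import median

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum - the values a box plot displays."""
    s = sorted(data)
    mid = len(s) // 2
    lower = s[:mid]                                  # values below the median
    upper = s[mid + 1:] if len(s) % 2 else s[mid:]   # values above the median
    return min(s), median(lower), median(s), median(upper), max(s)

print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))
# (1, 3, 7, 11, 13)
```

Note that other quartile conventions exist (e.g. Python's `statistics.quantiles` defaults to the "exclusive" method), so different tools may report slightly different Q1/Q3 values.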

Normal Percent Percent Plot


a P–P plot (probability–probability plot or percent–percent plot or P value plot) is a probability plot for assessing how
closely two data sets agree
Normal Quantile-Quantile Plot
is a graphical tool to help us assess whether a set of data plausibly came from some
theoretical distribution, such as a normal or an exponential.
The q-q plot is formed by:
Vertical axis: Estimated quantiles from data set 1
Horizontal axis: Estimated quantiles from data set 2
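A q-q comparison against the normal can be sketched without plotting: compute the theoretical normal quantiles and check how linearly they track the sorted sample. The plotting positions (i + 0.5)/n are one common convention, and using the correlation as a numeric stand-in for eyeballing the plot is our own shortcut here:

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
sample = sorted(random.gauss(0, 1) for _ in range(200))
n = len(sample)

# Theoretical standard-normal quantiles at plotting positions (i + 0.5) / n.
std = NormalDist()
theoretical = [std.inv_cdf((i + 0.5) / n) for i in range(n)]

# Under normality the points (theoretical[i], sample[i]) lie near a
# straight line; their correlation is a quick numeric check of that.
mx, my = mean(theoretical), mean(sample)
sx, sy = stdev(theoretical), stdev(sample)
r = sum((x - mx) * (y - my)
        for x, y in zip(theoretical, sample)) / ((n - 1) * sx * sy)
print(round(r, 3))  # close to 1 for (approximately) normal data
```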
Empirical Cumulative Distribution Function Plot
 A cumulative distribution function (CDF) plot shows the empirical cumulative
distribution function of the data.
 serves a similar function to a probability plot
 BUT does not form a straight line
 instead it traces an S-shaped curve under normality
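Computing an ECDF takes only a few lines; the pairs returned below are exactly the points such a plot would show (the function name is our own):

```python
def ecdf(data):
    """Return (sorted values, cumulative fractions) for an ECDF plot."""
    s = sorted(data)
    n = len(s)
    return s, [(i + 1) / n for i in range(n)]

xs, ys = ecdf([3, 1, 4, 1, 5])
print(xs)  # [1, 1, 3, 4, 5]
print(ys)  # [0.2, 0.4, 0.6, 0.8, 1.0]
```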

Analytical Test Procedures


 tests based on the empirical distribution function (EDF)
 tests based on descriptive measures
Empirical Distribution Function (EDF) Tests

Kolmogorov-Smirnov Test
 Definition: a way to tell whether a random sample comes from a normal distribution
 Normal if: the test is not significant; if the test is significant, the
distribution is non-normal

Shapiro-Wilk Test
 Definition: the Shapiro-Wilk test for normality is one of three general normality
tests designed to detect all departures from normality
 Normal if: the test rejects the hypothesis of normality when the p-value is less
than or equal to 0.05. Failing the normality test allows you to state with 95%
confidence that the data do not fit the normal distribution. Passing the normality
test only allows you to state that no significant departure from normality was found.

Anderson-Darling Test
 Definition: a statistical test of whether a given sample of data is drawn from a
given probability distribution
 Hypotheses: H0: the data follow the normal distribution; H1: the data do not
follow the normal distribution. The null hypothesis is that the data are normally
distributed; the alternative hypothesis is that the data are non-normal.

Tests Based on Descriptive Measures

 D’Agostino-Pearson Omnibus Test
 Jarque-Bera Test

D’Agostino-Pearson Omnibus Test
 Definition: first computes the skewness and kurtosis to quantify how far from
Gaussian the distribution is in terms of asymmetry and shape; it then calculates how
far each of these values differs from the value expected with a Gaussian
distribution, and computes a single p-value from the sum of these discrepancies
 Normal if: the (excess) kurtosis is close to 0; such distributions are called
mesokurtic. If the kurtosis is less than zero, the distribution has light tails and
is called a platykurtic distribution. If the kurtosis is greater than zero, the
distribution has heavier tails and is called a leptokurtic distribution.

Jarque-Bera Test
 Definition: a goodness-of-fit test of whether sample data have skewness and
kurtosis matching a normal distribution. It is usually used for large data sets,
because other normality tests are not reliable when n is large. It is not necessary
to know the mean or the standard deviation of the data in order to run the test.
 Normal if: the sample skew is close to zero (a normal distribution is perfectly
symmetrical around the mean) and the kurtosis is close to three; kurtosis tells you
how much data is in the tails and gives an idea of how "peaked" the distribution is.
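Since the Jarque-Bera statistic only needs the sample skewness and kurtosis, it can be sketched in pure Python using JB = n/6 · (S² + (K − 3)²/4). The two synthetic samples below are our own illustration, not data from the notes:

```python
import random

def jarque_bera(data):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4), where S is the
    sample skewness and K the (non-excess) kurtosis. Values near 0 are
    consistent with normality; large values indicate departure from it."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

random.seed(2)
normal_sample = [random.gauss(0, 1) for _ in range(1000)]     # scores low
skewed_sample = [random.expovariate(1) for _ in range(1000)]  # scores high
print(round(jarque_bera(normal_sample), 2))
print(round(jarque_bera(skewed_sample), 2))
```

The normal sample yields a small statistic, while the exponential sample (skew 2, kurtosis 9) yields a very large one, which is how the test flags non-normality.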
