This handout was written by students and is not intended to substitute the
official University materials. Its purpose is to be a useful instrument for
exam preparation, but it does not give complete coverage of the program of the
course it relates to, as the materials on the university website or from the
professors do.
First question: what is statistics? There is no single precise definition of
this subject; generally speaking, we can say that its purpose is to study a
particular phenomenon quantitatively and qualitatively under conditions of
uncertainty.
1.2 CLASSIFICATION OF VARIABLES
A VARIABLE is a specific characteristic of an individual or object. Variables can
be classified in several ways:
1. CATEGORICAL VARIABLES vs NUMERICAL VARIABLES.
CATEGORICAL VARIABLES produce responses that belong to groups or
categories (e.g. responses to yes/no questions).
NUMERICAL VARIABLES are defined by numbers:
i. DISCRETE: the variable takes a finite (countable) number of values;
ii. CONTINUOUS: the variable may take any value within a given
range of real numbers and arises from a measurement process.
N.B. All variables concerning money are treated as continuous.
2. MEASUREMENT LEVELS
We have again two groups:
QUALITATIVE DATA:
i. NOMINAL DATA: words describe the categories or classes of responses.
ii. ORDINAL DATA: the amount of information is greater than in the
NOMINAL DATA; we can also rank the categories.
QUANTITATIVE DATA:
i. INTERVAL DATA: a greater amount of information with respect to
ORDINAL DATA; differences between values are meaningful, but the
zero point is arbitrary (there is no true zero).
ii. RATIO DATA: more information with respect to INTERVAL DATA; we
have both an order and distances measured from a true zero.
TYPOLOGY | fi | pi
P        | 3  | 0,3
FF       | 4  | 0,4
E        | 3  | 0,3
TOT      | 10 | 1
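The table above can be built mechanically: count each category (fi) and divide by the total number of observations (pi). A minimal sketch in Python; the raw list of 10 observations below is a hypothetical reconstruction consistent with the table:

```python
# Building a frequency distribution table for a categorical variable.
from collections import Counter

# Hypothetical raw data: 10 observations with the same counts as the table.
data = ["P", "FF", "E", "P", "FF", "E", "P", "FF", "E", "FF"]

def frequency_table(values):
    """Return {category: (absolute frequency fi, relative frequency pi)}."""
    counts = Counter(values)
    n = len(values)
    return {cat: (fi, fi / n) for cat, fi in counts.items()}

table = frequency_table(data)
# table["FF"] gives (4, 0.4), matching the row FF | 4 | 0,4
```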
PIE CHART: [pie chart of the relative frequencies: P = 0,3; FF = 0,4; E = 0,3]
HISTOGRAM: [bar chart of the absolute frequencies: P = 3, FF = 4, E = 3]
Through the absolute frequencies we can see which typology is the most
frequent.
The price class will be, on the other hand, a QUALITATIVE ORDINAL
VARIABLE, that will be represented by a frequency distribution table and a bar
chart.
FREQUENCY DISTRIBUTION TABLE: We rank the price class in ascending
order
PRICE CLASS | fi | pi
B   | 4 | 0,4
M-B | 3 | 0,3
M-A | 2 | 0,2
A   | 1 | 0,1
BAR CHART: [bar chart of the absolute frequencies: B = 4, M-B = 3, M-A = 2, A = 1]
EXAMPLE N.2: we now use the PARETO DIAGRAM; this diagram is useful to
separate the relevant factors from the non-relevant ones. It is a mixed graph,
formed by a bar chart combined with a line chart.
CAUSES OF DELAY OF TRAINS | ABSOLUTE FREQUENCIES fi
Maintenance    | 12
Strikes        | 26
Natural Causes | 5
Other          | 2
Damages        | 35
We now compute the Frequency Distribution Table
FREQUENCY DISTRIBUTION TABLE:
PARETO DIAGRAM: it is a graph which represents the relative importance of the
different causes of a phenomenon. It has both bars and a line: each factor is
represented by a bar, with the bars ranked in decreasing order of frequency,
and the line is the cumulative distribution (called the Lorenz curve).
[Pareto diagram of the delay causes; the cumulative line reaches 76,25% after
the first two causes]
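The Pareto computation for the train-delay data above can be sketched as follows: rank the causes by absolute frequency, then accumulate the relative frequencies to obtain the cumulative line:

```python
# Pareto diagram data: ranked relative frequencies and their cumulative sum.
freqs = {"Damages": 35, "Strikes": 26, "Maintenance": 12,
         "Natural Causes": 5, "Other": 2}

def pareto(freq_dict):
    """Return [(cause, pi, cumulative Fi)] ranked by decreasing frequency."""
    total = sum(freq_dict.values())
    ranked = sorted(freq_dict.items(), key=lambda kv: kv[1], reverse=True)
    rows, cum = [], 0.0
    for cause, fi in ranked:
        cum += fi / total
        rows.append((cause, fi / total, cum))
    return rows

rows = pareto(freqs)
# First bar: Damages with pi = 35/80 = 0.4375; after the first two causes the
# cumulative line is (35 + 26)/80 = 0.7625, i.e. the 76,25% of the diagram.
```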
[line chart of quarterly values, Q1-Q4 of 2014 and 2015]
1.5 GRAPHS TO DESCRIBE NUMERICAL VARIABLES
In case we have NUMERICAL DISCRETE VARIABLES with few distinct values
(such as the number of smartphones owned by a single person), we will use:
1. FREQUENCY TABLE
2. STICK CHART
3. STEP FUNCTION CHART
We now compute the distribution function.
EXAMPLE:
FREQUENCY TABLE (with distribution function):
N° CARS OWNED | fi  | pi   | Fi
1             | 32  | 0,32 | 0,32
2             | 48  | 0,48 | 0,80
3             | 16  | 0,16 | 0,96
5             | 4   | 0,04 | 1
TOT           | 100 | 1    |
STICK CHART: with the stick chart you can represent either the absolute
frequencies or the relative ones.
STEP FUNCTION CHART: [step plot of the distribution function Fi against the
number of cars owned]
50-100         | 7-8
101-500        | 8-10
501-1000       | 10-11
1001-5000      | 11-14
More than 5000 | 14-20
EXAMPLE 1:
OBSERVATIONS | fi | pi   | Fi   | Wi | Ci = pi/Wi
[10;30)      | 2  | 0,1  | 0,1  | 20 | 0,005
[30;50)      | 3  | 0,15 | 0,25 | 20 | 0,0075
[50;70)      | 5  | 0,25 | 0,5  | 20 | 0,0125
[70;90)      | 6  | 0,3  | 0,8  | 20 | 0,015
[90;110)     | 4  | 0,2  | 1    | 20 | 0,01
TOT          | 20 | 1    |      |    |
Ci is the FREQUENCY DENSITY and it is equal to the ratio between the relative
frequency pi and the class width Wi.
HISTOGRAM: if the classes have equal width we can plot either pi or Ci on the
vertical axis, while if the widths differ we must use Ci; otherwise the chart
will be wrong.
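The density column of the table above can be sketched as a small computation (Ci = pi / Wi for each class):

```python
# Frequency density Ci = pi / Wi for the classes of Example 1.
classes = [((10, 30), 2), ((30, 50), 3), ((50, 70), 5),
           ((70, 90), 6), ((90, 110), 4)]

def density_table(classes):
    """Return [((lower, upper), pi, Ci)] where Ci = pi / class width."""
    n = sum(fi for _, fi in classes)
    rows = []
    for (lo, hi), fi in classes:
        pi = fi / n
        rows.append(((lo, hi), pi, pi / (hi - lo)))
    return rows

rows = density_table(classes)
# For [70;90): pi = 6/20 = 0.3 and Ci = 0.3 / 20 = 0.015, as in the table.
```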
EXAMPLE N.2:
Range    | fi | pi   | Fi   | Wi | Ci
[0;5)    | 15 | 0,3  | 0,3  | 5  | 0,06
[5;15)   | 12 | 0,24 | 0,54 | 10 | 0,024
[15;30)  | 10 | 0,2  | 0,74 | 15 | 0,0133
[30;50)  | 5  | 0,1  | 0,84 | 20 | 0,005
[50;100) | 8  | 0,16 | 1    | 50 | 0,0032
TOT      | 50 | 1    |      |    |
HISTOGRAM: now the height of each bar is Ci, so that the area of each
rectangle is equal to the relative frequency pi of its class.
OGIVE:
SCATTER PLOT:
We can prepare a SCATTER PLOT, which is another kind of graph, by locating
one point for each observed pair of the two variables in the data set. The
scatter plot provides a picture of the data, including the following:
1. The range of each variable;
2. The pattern of values over the range;
3. A suggestion as to a possible relationship between the two variables;
4. An indication of outliers.
EXAMPLE: consider the following pairs of observations (x; y):
(50; 172), (7; 34), (33; 125), (250; 700), (14; 52), (70; 65), (134; 211),
(56; 69), (98; 70)
[scatter plot of the pairs above]
As we can see, there is a positive relationship between the two variables,
because in almost every case when one variable increases, the other does the
same.
PARTICULAR CASE: SIMPSON’S PARADOX
Let's assume a situation in which, at the same age, the unemployment rate
among high-school and college graduates is half that of the population without
a high-school diploma. Let's also consider that, historically, the older
generations have fewer high-school graduates and that, for other reasons, the
unemployment rate is higher among the young than among the old.
Starting from these statistics:
Workers | Without diploma | With diploma | Total
Young   | 20              | 80           | 100
Old     | 120             | 30           | 150
Total   | 140             | 110          | 250
Within each age group, the unemployment rate of the people without a diploma
is roughly double that of the graduates. We can then compute the number of
unemployed:
Unemployed | Without diploma | With diploma | Total
Young      | 6               | 12           | 18
Old        | 6               | 1            | 7
Total      | 12              | 13           | 25
Computing then the unemployment rate for the ones with diploma and for the
ones without, we’ll have these values:
%Unemployed without Diploma= 12/140 = 8,6%
%Unemployed with Diploma = 13/110 = 11,8%
This situation occurs when an essential variable is omitted (in this case the
heterogeneous distribution across age classes), so the analysis of the
aggregate frequencies can lead to wrong conclusions.
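The paradox above can be checked numerically: within each age group the graduates do better, but the aggregate ordering reverses.

```python
# Numerical check of the Simpson's paradox example from the tables above.
workers = {"young": {"without": 20, "with": 80},
           "old":   {"without": 120, "with": 30}}
unemployed = {"young": {"without": 6, "with": 12},
              "old":   {"without": 6, "with": 1}}

def rate(group, diploma):
    """Unemployment rate within one age group."""
    return unemployed[group][diploma] / workers[group][diploma]

def aggregate_rate(diploma):
    """Unemployment rate ignoring the age classes."""
    u = sum(unemployed[g][diploma] for g in unemployed)
    w = sum(workers[g][diploma] for g in workers)
    return u / w

# Within each age group, graduates have the LOWER unemployment rate...
assert rate("young", "with") < rate("young", "without")   # 15% < 30%
assert rate("old", "with") < rate("old", "without")       # 3,3% < 5%
# ...but in the aggregate the ordering is reversed:
assert aggregate_rate("with") > aggregate_rate("without") # 11,8% > 8,6%
```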
CENTRAL TENDENCY
In this group we have: AVERAGE, MEDIAN, MODE, QUANTILES. Each of them
summarizes the values of the variable with a different indicator.
AVERAGE: the average is a numerical value that describes a set of data. It
can be computed on a population or on a sample.
FOR NON-GROUPED DATA
1. ARITHMETIC MEAN: the arithmetic mean is the most common kind of
mean. It summarizes with a single number a set of data on a measurable
phenomenon. You sum all the data and divide the result by the number of
observations. It is used for QUANTITATIVE VARIABLES and the final value
will be a number between the MIN and the MAX. It is highly influenced by
outliers.
ON A POPULATION: µ = (∑ᵢ xᵢ)/N, with the sum over i = 1, …, N
ON A SAMPLE: x̄ = (∑ᵢ xᵢ)/n, with the sum over i = 1, …, n
2. APPROXIMATE MEAN (for data grouped in k classes), where mᵢ is the
midpoint of class i and fᵢ its absolute frequency:
ON A POPULATION: µ = (∑ᵢ mᵢ·fᵢ)/N
ON A SAMPLE: x̄ = (∑ᵢ mᵢ·fᵢ)/n
MEDIAN: the MEDIAN is the central value of an ordered set of data. In order to
compute it the set of data must be ranked. The Median is not influenced by the
outliers. Moreover, it has to be highlighted that it is not a position, but a value.
1. NON-GROUPED DATA: rank the data in ascending order, then compute the
median position through the formula 0,5·(n+1). If n is odd, this
position is an integer and the median is the value found there; if n is
even, the position is intermediate and the median is the average of the
two observations surrounding it.
2. GROUPED DATA:
DISCRETE DATA: the median is the first value for which the
distribution function exceeds 0,5
EXAMPLE:
Values | fi | pi   | Fi
1      | 32 | 0,32 | 0,32
2      | 48 | 0,48 | 0,8   <- MEDIAN (ME)
3      | 16 | 0,16 | 0,96
5      | 4  | 0,04 | 1
CONTINUOUS DATA (classes): the median class is the first one whose
distribution function exceeds 0,5; within that class the median is obtained
through the frequency density:
ME = x_inf + (0,5 − F_prev)/cᵢ
where x_inf is the lower bound of the median class, F_prev the value of the
distribution function before that class and cᵢ its frequency density.
[histogram of the frequency densities for the classes [0;30), [30;50), [50;100)]
For the classes of the histogram the median class is [30;50), so:
ME = 30 + (0,5 − 0,1667)/0,025 = 43,33
MODE: the MODE is the most frequent value in a series of data, and it is
mainly used with QUALITATIVE DATA. If no value occurs more than once, there
is no mode.
QUANTILES: quantiles are further position indicators. There are three types:
QUARTILES: they divide the ordered series of data into 4 equal parts.
1. FIRST QUARTILE (Q1): 25% of the values lie at its left and
75% of the values at its right;
2. SECOND QUARTILE (Q2 = ME): 50% of the values at its left and
50% of the values at its right;
3. THIRD QUARTILE (Q3): 75% of the values at its left and
25% of the values at its right;
DECILES: data are divided into 10 equal parts;
PERCENTILES: they divide data into 100 equal parts.
In order to compute the values, we find the position of the desired quartile
(for example, for the first quartile, through the formula Q1 position =
0,25·(n+1); substitute 0,25 with 0,5 or 0,75 if you want the second or the
third one) and then read off the corresponding values.
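The q·(n+1) position rule above can be sketched as follows, interpolating between neighbouring ranked observations when the position is fractional:

```python
# Quantile of a data set via the position rule q*(n+1).
def quantile(values, q):
    ranked = sorted(values)
    pos = q * (len(ranked) + 1)        # 1-based position in the ranked data
    lo = int(pos)
    frac = pos - lo                    # fractional part, used to interpolate
    if lo < 1:
        return ranked[0]
    if lo >= len(ranked):
        return ranked[-1]
    return ranked[lo - 1] + frac * (ranked[lo] - ranked[lo - 1])

data = [1, 2, 3, 4, 5, 6, 7]
# n = 7, so Q1 sits at position 0,25*8 = 2 -> value 2; Q2 at position 4 -> 4.
```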
GRAPHIC REPRESENTATION OF THE QUARTILES
We use five numbers: MIN, Q1, Q2, Q3, MAX. Through these numbers you can
give a synthetic description of the distribution of the data, using the BOX
AND WHISKERS PLOT.
2. Q1 – MIN > MAX – Q3
3. MEDIAN – Q1 > Q3 – MEDIAN
OUTLIERS
The outliers in a data distribution are those values considered atypical,
far from the bulk of the distribution. They affect the average.
An observation xᵢ is defined as atypical if it is not included in the range
(Q1 − 1,5·(Q3 − Q1); Q3 + 1,5·(Q3 − Q1)):
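The atypical-observation rule above can be sketched with the standard library; note that `statistics.quantiles` with its default `'exclusive'` method uses the same q·(n+1) positions as the handout:

```python
# Flagging atypical observations outside (Q1 - 1.5*IQR, Q3 + 1.5*IQR).
import statistics

def outliers(values):
    q1, _, q3 = statistics.quantiles(values, n=4)   # default method: q*(n+1)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < lo or x > hi]

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 100]
# 100 lies far above Q3 + 1.5*IQR and is flagged as an outlier.
```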
SAMPLE: s² = (1/(n−1)) · ∑ᵢ (xᵢ − x̄)²
SHORTCUT FORMULAS:
POPULATION: σ² = (∑ᵢ xᵢ²)/N − µ²
SAMPLE: s² = (n/(n−1)) · [(∑ᵢ xᵢ²)/n − x̄²]
PROOF (population shortcut):
σ² = (1/N) · ∑ᵢ (xᵢ − µ)² = (1/N) · ∑ᵢ (xᵢ² − 2µ·xᵢ + µ²); you now split the
sum into three separate sums, so as to have each element separated from the
others:
= (1/N) · [∑ᵢ xᵢ² + ∑ᵢ µ² − ∑ᵢ 2µ·xᵢ]; you will have ∑ᵢ µ² equal to N·µ²,
and moreover ∑ᵢ 2µ·xᵢ = 2µ·∑ᵢ xᵢ = 2µ·N·µ = 2N·µ², so:
= (∑ᵢ xᵢ²)/N + µ² − 2µ² = (∑ᵢ xᵢ²)/N − µ²
FOR GROUPED DATA:
SAMPLE (distinct values xᵢ with frequencies fᵢ): s² = (1/(n−1)) · ∑ᵢ fᵢ·(xᵢ − x̄)²
SAMPLE (classes, with midpoints mᵢ): s² = (1/(n−1)) · ∑ᵢ fᵢ·(mᵢ − x̄)²
SHORTCUT FORMULA: σ² = (∑ᵢ fᵢ·mᵢ²)/N − µ²
STANDARD DEVIATION: the standard deviation is the positive square root of the
variance and it is defined as follows:
POPULATION: σ = √σ² | SAMPLE: s = √s²
COEFFICIENT OF VARIATION: the coefficient of variation is a measure of
relative dispersion, equal to the ratio between the standard deviation and the
absolute value of the mean (often expressed as a percentage):
POPULATION: CV = σ/|µ| | SAMPLE: CV = s/|x̄|
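The shortcut formula can be checked against the definition; the data set below is a hypothetical example (not from the handout):

```python
# Population variance: definition vs shortcut formula, plus the CV.
def var_definition(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def var_shortcut(xs):
    n = len(xs)
    mu = sum(xs) / n
    return sum(x * x for x in xs) / n - mu ** 2     # (sum of x^2)/N - mu^2

def coeff_variation(xs):
    mu = sum(xs) / len(xs)
    return var_definition(xs) ** 0.5 / abs(mu)      # sigma / |mu|

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]    # hypothetical data, mu = 5
# Both formulas give sigma^2 = 4, so sigma = 2 and CV = 2/5 = 0.4.
```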
We also have a particular case, called the EMPIRICAL RULE, which provides an
estimate of the approximate percentage of observations contained within one,
two, or three standard deviations of the mean:
INTERVAL | Percentage (Empirical Rule) | Percentage (Chebyshev)
µ ± σ    | 68%                         | At least 0%
µ ± 2σ   | 95%                         | At least 75%
µ ± 3σ   | 99,73%                      | At least 88,9%
We then have the z-score, a standardized value that indicates the number of
standard deviations a value is from the mean. A z-score greater than zero
indicates that the value is greater than the mean; a z-score less than zero
indicates that the value is less than the mean; a z-score of zero indicates
that the value is equal to the mean. The z-score is z = (xᵢ − µ)/σ.
For data grouped in classes: s² = (∑ᵢ fᵢ·(mᵢ − x̄)²)/(n−1)
SAMPLE: COV(X,Y) = s_xy = (1/(n−1)) · ∑ᵢ (xᵢ − x̄)·(yᵢ − ȳ), where x̄ and ȳ
are the sample means.
SHORTCUT FORMULAS
POPULATION: σ_xy = (∑ᵢ xᵢ·yᵢ)/N − µx·µy
SAMPLE: s_xy = (n/(n−1)) · [(∑ᵢ xᵢ·yᵢ)/n − x̄·ȳ]
CORRELATION COEFFICIENT: the linear CORRELATION COEFFICIENT is a symmetric,
relative measure of the linear relationship between two quantitative
variables. Mathematically it is equal to the ratio between the covariance of
X and Y and the product of the standard deviations of X and Y.
This index normalizes the covariance, so as to give more precise information
about the linear relationship between the two variables.
POPULATION: ρ_xy = COV(X,Y)/(σx·σy)
SAMPLE: r_xy = COV(X,Y)/(sx·sy)
The correlation coefficient can assume values in [−1; +1]; if the index
assumes the values ±1, then we have a perfect linear correlation: all data
points lie on a line. If the index is equal to 0, there is no linear
correlation; if it is greater than 0 there is a positive linear correlation,
whereas if it is smaller than 0 we have a negative linear correlation.
Normally we call weak a linear correlation with an index between −0,3 and
+0,3, whereas if the index is greater than +0,7 or smaller than −0,7 we have
a strong linear correlation. These limits are conventional and not
universally accepted; they are useful only as an indication. It can happen
that data with nothing in common still show a strong linear correlation: for
this reason, CORRELATION does not imply CAUSATION (the presence of a causal
relationship).
N.B.: we also have to point out that LINEAR INDEPENDENCE does not imply
STATISTICAL INDEPENDENCE. If COV(X,Y) = 0, so if there is LINEAR
INDEPENDENCE, it is not certain that there is also STATISTICAL INDEPENDENCE.
The reverse, on the other hand, is true.
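The sample covariance and correlation coefficient defined above can be sketched as follows; the data pairs are a hypothetical example with y = 2x, so the correlation is exactly +1:

```python
# Sample covariance and linear correlation coefficient.
def sample_cov(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    sx = sample_cov(xs, xs) ** 0.5      # sample standard deviation of x
    sy = sample_cov(ys, ys) ** 0.5
    return sample_cov(xs, ys) / (sx * sy)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]               # hypothetical data on the line y = 2x
# sample_corr(xs, ys) is 1.0 (up to floating point): perfect positive
# linear correlation, since all points lie on a line.
```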
REGRESSION LINE: through the regression line we can describe the relationship
between two variables at the same time. It measures the asymmetric
relationship between the two variables, i.e. how much Y changes when X
changes. This line is defined by an equation with parameters β0 and β1.
The equation will then be:
y = β0 + β1*x | x = independent variable | y = dependent variable
We'll have that β0 is the vertical intercept and β1 is the slope of the line,
so it indicates how much Y increases when X increases. We will be able to
forecast the value of Y if we know the line's equation and any value of X.
In order to find β0 and β1 we have to reason on the forecast. Considering ŷᵢ
as the value that y would assume according to the forecast, and knowing also
the true value yᵢ, the difference eᵢ = yᵢ − ŷᵢ is the error between the
observed values and the ones forecast by the line.
It is necessary to find the β0 and β1 that minimize the sum of all the
squared errors. You then use ORDINARY LEAST SQUARES (OLS), which finds the
minimum value of the sum of the squares of the errors:
min ∑ᵢ eᵢ²
You can then compute the estimates b0 and b1 of β0 and β1 with these
formulas:
b1 = s_xy/s_x² = COV(X,Y)/VAR(X) = r_xy·(s_y/s_x)
b0 = ȳ − b1·x̄
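The OLS estimates above can be sketched directly from their formulas; the four data points are a hypothetical example lying exactly on y = 1 + 2x:

```python
# OLS estimates: b1 = Sxy / Sx^2 and b0 = y-bar - b1 * x-bar.
def ols(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)   # sample variance of x
    b1 = sxy / sxx                                   # slope
    b0 = my - b1 * mx                                # vertical intercept
    return b0, b1

# Hypothetical data on the exact line y = 1 + 2x, so b0 = 1 and b1 = 2.
b0, b1 = ols([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```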
PROBABILITY
SUMMARY OF APPLIED MATHEMATICS PROBABILITY
RANDOM EXPERIMENT: a process which leads to two or more outcomes without the
possibility of forecasting which one will happen.
EVENT: an event is any subset of the sample space Ω of a random experiment.
SAMPLE SPACE: the set of all possible outcomes.
RANDOM VARIABLE: a variable which assumes numerical values depending on the
results of a random experiment.
The Probability Axioms:
POSITIVITY: 0 ≤ P(E) ≤ 1
CERTAINTY: P(Ω) = 1
UNION: for mutually exclusive events E₁, …, Eₙ, P(⋃ᵢ Eᵢ) = ∑ᵢ P(Eᵢ)
EXPECTED VALUE: for a random number X(ω): Ω → R, the expected value, if it
exists, is defined by E(X) = µ = ∑ₓ x·P(x). In statistics the expected value
corresponds to the mean.
VARIANCE: for a random number X(ω): Ω → R with expected value E[X(ω)] = µ,
the variance Var[X(ω)] is defined by:
Var(X) = ∑ₓ (x − µ)²·P(x)
BERNOULLI RANDOM VARIABLE
The BERNOULLI random variable describes a phenomenon which shows only 2
outcomes:
1 = success, with probability p
0 = failure, with probability 1 − p
PROBABILITY FUNCTION: P(x) = p for x = 1; P(x) = 1 − p for x = 0; P(x) = 0
otherwise. Compactly: P(x) = pˣ·(1 − p)¹⁻ˣ for x = 0, 1.
The expected value and the variance will be:
E(X) = ∑ₓ x·P(x) = 1·p + 0·(1 − p) = p
Var(X) = p·(1 − p)
CHAPTER 4 – DISCRETE PROBABILITY
DISTRIBUTIONS
4.4 BINOMIAL DISTRIBUTION
Let's suppose that an experiment has only two mutually exclusive outcomes,
defined as "success" and "failure", and call p the probability of a success.
If we repeat the experiment n times independently, the distribution of the
number of successes, X = X₁ + X₂ + … + Xₙ, is called the BINOMIAL
DISTRIBUTION: X ~ BIN(n, p). Its PROBABILITY FUNCTION will be:
P(x) = [n!/(x!·(n−x)!)] · pˣ·(1 − p)ⁿ⁻ˣ for x = 0, 1, 2, …, n; P(x) = 0
otherwise.
The quantity n!/(x!·(n−x)!) is called the BINOMIAL COEFFICIENT and can be
indicated with (n choose x); it counts the number of combinations of x
successes in n experiments.
EXAMPLE: let's take as an example a financial operator who has the
opportunity to close up to 6 contracts every day. The probability p of
closing a contract is 0,3, so p = 0,3 and (1 − p) = 0,7. What, then, is the
probability that the operator closes exactly ONE contract out of the six
available that day? We have x = 1 and n = 6:
P(1) = [6!/(1!·5!)] · 0,3¹ · 0,7⁵ = 0,3025 = 30,25%
What is instead the probability that the operator is going to conclude EXACTLY
2 contracts in one day?
We can have various combinations on the contract’s distribution:
COMBINATION 1: 1,1,0,0,0,0 p, p, (1-p), (1-p), (1-p), (1-p), = p2 *(1-p)4
COMBINATION 2: 1,0,0,0,0,1 p, (1-p), (1-p), (1-p), (1-p), p = p2 *(1-p)4
Both the combinations are perfectly equal from the probability’s point of view.
So, we can compute the probability of the event x = 2, n = 6:
P(2) = [6!/(2!·4!)] · 0,3² · 0,7⁴ = 0,3241 = 32,41%
The probability to close 2 contracts in the same day is equal to 32,41%.
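The two probabilities of the example can be re-computed from the binomial probability function:

```python
# Binomial probability function, checked against the contracts example.
import math

def binom_pmf(x, n, p):
    """P(X = x) for X ~ BIN(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

p1 = binom_pmf(1, 6, 0.3)   # exactly one contract closed: ~0,3025
p2 = binom_pmf(2, 6, 0.3)   # exactly two contracts closed: ~0,3241
# The probabilities over all x = 0, ..., 6 sum to 1, as they must.
```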
EXPECTED VALUE AND VARIANCE:
In case in which X1, X2, …, Xn are Bernoulli random variables both independent
and identically distributed with a parameter p, then we’ll have that:
E(x) = n*p
Var (x) = n*p*(1-p)
PROOF: first we state that X = X₁ + X₂ + … + Xₙ = ∑ᵢ Xᵢ ~ BIN(n; p), with
Xᵢ ~ BER(p).
N.B. X ~ BIN (n; p) means that X is a binomial random variable with parameters
n and p. Xi ~ BER(p) means instead that Xi are Bernoulli random variables of
parameter p.
For a CONTINUOUS random variable X with density f(x), the probability
P(X = x₀) = ∫ from x₀ to x₀ of f(x)dx = 0: the probability of a single point
is always equal to 0, because an integral defined over the interval [x₀; x₀]
is always equal to 0.
P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b) = P(X ≤ b) −
P(X ≤ a) = F(b) − F(a): the probability is equal to the difference between
the values that the distribution function F(x) assumes at b and at a.
The DISTRIBUTION FUNCTION F(x) of a continuous random variable X expresses
the probability that X is at most x, so F(x) = P(X ≤ x).
RANDOM VARIABLE AND NORMAL DISTRIBUTION: a normal distribution is symmetric,
with its data concentrated around the mean value and becoming rarer far from
the mean.
A random variable with a normal distribution is indicated with X ~ N(µ; σ²):
X is a random variable with a normal distribution, expected value E(X) = µ
and variance Var(X) = σ². Its density will be:
f(x) = (1/√(2πσ²)) · e^(−(1/2)·((x − µ)/σ)²)
PROPERTIES:
1. The function is bell-shaped and symmetric;
2. The mean µ is at once the expected value E(X), the MEDIAN and the MODE;
3. It is asymptotic with respect to the x-axis;
4. It is increasing for x < µ and decreasing for x > µ;
5. It has two inflection points in µ - σ and in µ + σ;
6. The area under the curve is equal to 1.
EXAMPLE N. 1:
If the standard deviation (and so the variance) assumes a lower value, the
values are going to be concentrated near the mean; conversely, if the
standard deviation increases, the values are going to be less concentrated
and the tails are going to be longer.
EXAMPLE N. 2:
If the variation is on µ, we'll have a shift of the graph to the right or to
the left, depending on whether µ increases or decreases respectively.
The DISTRIBUTION FUNCTION of a normal distribution will be defined by the
graph: it has a horizontal asymptote at F(x) = 1 and an inflection point at
F(x) = 0,5.
EXAMPLE: let's consider the prices of the tickets for Milan-Miami, such that
µ = 500 and σ² = 625, so X ~ N(500; 625) and σ = 25.
Compute the probability that the price is between 500€ and 550€:
We want P(500 < X < 550), which, through standardization, is transformed
into:
P((500 − 500)/25 < Z < (550 − 500)/25)
which results in P(0 < Z < 2) = Fz(2) − Fz(0)
So, through the tables, we will have: 0,9772 − 0,5 = 0,4772 = 47,72%
Compute now the probability that the price is between 470€ and 520€.
The difficulty is that the tables do not directly give the value of Fz for
the standardized value of 470€, because it is smaller than the mean µ. We
proceed in this way:
P(470 < X < 520) with the STANDARDIZATION becomes:
P((470 − 500)/25 < Z < (520 − 500)/25) = P(−1,2 < Z < 0,8) = Fz(0,8) − Fz(−1,2)
We transform: P(Z < −1,2) = P(Z > 1,2) = 1 − Fz(1,2), because the standard
normal distribution is symmetric: the area at the left of −1,2 is equal to
the one at the right of 1,2 and vice versa.
So we have: Fz(0,8) − [1 − Fz(1,2)] = 0,7881 − [1 − 0,8849] = 0,7881 −
0,1151 = 0,6730 = 67,3%
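Both probabilities of the ticket-price example can be checked with the standard normal distribution function, computed here with `math.erf` instead of the printed tables:

```python
# Standard normal CDF via math.erf, applied to the Milan-Miami example.
import math

def phi(z):
    """Standard normal distribution function Fz(z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_prob(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), via standardization."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

p1 = normal_prob(500, 550, 500, 25)   # ~0,4772 = 47,72%
p2 = normal_prob(470, 520, 500, 25)   # ~0,6730 = 67,3%
```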
SUMMARY FOR THE SOLUTION OF STANDARD NORMAL DISTRIBUTIONS
USING THE GRAPH:
P (Z < a) = Fz (a)
P (Z > a) = 1 – Fz (a)
STANDARDIZATION TABLES: [tables of the standard normal distribution function]
NORMAL PROBABILITY PLOT:
1. Arrange the data from low to high values;
2. Find the cumulative normal probabilities for all values;
3. Examine a plot of the observed values vs. the cumulative probabilities
(with the cumulative normal probability on the vertical axis and the
observed data values on the horizontal axis);
4. Evaluate the plot for evidence of linearity.
The normal probability plot is similar to the SCATTER PLOT: it shows on the
y-axis the values of the distribution function for each observed value, so
we obtain various points on the graph. If these points are well distributed,
so that they approximately form a line, then the distribution is
approximately normal; otherwise it is assumed that it is not. In the case
shown, the distribution can be considered approximately normal.
realization of a random variable) will be selected and used to compute sample
statistics such as the sample mean and the sample variance.
Sample statistics are used, instead of collecting all the information from
the population, for two main reasons.
Recall that the expected value of a linear combination of random variables
(such as the one just stated) is the linear combination of the expectations;
it follows that:
E(x̄) = E((1/n)·∑ᵢ xᵢ) =
= (1/n)·(E(x₁) + E(x₂) + … + E(xₙ)) =
= (1/n)·(µ₁ + µ₂ + … + µₙ) =
= (1/n)·n·µ = µ
The mean of the sampling distribution of the sample means (𝐸 (𝑥̅ )) is the
population mean (𝜇).
Var(x̄) = σ²_x̄ = Var((1/n)·x₁ + (1/n)·x₂ + … + (1/n)·xₙ) =
= (1/n²)·Var(x₁) + (1/n²)·Var(x₂) + … + (1/n²)·Var(xₙ) =
= (1/n²)·(σ² + σ² + … + σ²) =
= (1/n²)·n·σ² = σ²/n
When sampling without replacement from a finite population, the finite
population correction applies:
Var(x̄) = (σ²/n)·((N − n)/(N − 1))
Now suppose that the population from which the samples are drawn has a
normal distribution; then the sampling distribution of the sample means also
has a normal distribution. It follows that it is possible to compute the
standard normal Z for the sample mean:
WHENEVER THE SAMPLING DISTRIBUTION OF THE SAMPLE MEANS HAS A NORMAL
DISTRIBUTION, IT IS POSSIBLE TO COMPUTE A STANDARDIZED NORMAL RANDOM
VARIABLE, Z, THAT HAS MEAN EQUAL TO 0 AND VARIANCE EQUAL TO 1:
Z = (x̄ − µ)/σ_x̄ = (x̄ − µ)/(σ/√n)
Given the definition of sample mean and the central limit theorem we can now
introduce the concept of acceptance interval.
Characteristics:
If the sample mean is actually in the interval, then we can accept the
conclusion that the random sample comes from the population;
The probability that the sample mean is within a particular interval can
be computed if the sample means have a distribution that is close to
normal.
The acceptance interval is µ ± z_{α/2}·σ_x̄, provided that x̄ has a normal
distribution and z_{α/2} is the standard normal value whose upper-tail
probability is α/2. The probability that the sample mean x̄ is included in
the interval is 1 − α.
If the sample mean is outside the acceptance interval, then we suspect that
the population mean is not µ.
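The two results E(x̄) = µ and Var(x̄) = σ²/n can be verified exactly, without simulation, by enumerating every possible sample of size n drawn with replacement from a tiny hypothetical population:

```python
# Exhaustive check of E(x-bar) = mu and Var(x-bar) = sigma^2 / n.
import itertools

population = [1.0, 3.0, 5.0, 7.0]     # hypothetical population, N = 4
n = 2                                  # sample size (with replacement)

mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# All N^n equally likely ordered samples, and their sample means:
means = [sum(s) / n for s in itertools.product(population, repeat=n)]
e_xbar = sum(means) / len(means)
var_xbar = sum((m - e_xbar) ** 2 for m in means) / len(means)
# e_xbar equals mu, and var_xbar equals sigma2 / n.
```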
6.3 SAMPLING DISTRIBUTIONS OF SAMPLE PROPORTIONS
As we can use the sample mean to draw inferences about the population mean,
we can also draw inferences about the population proportion using the sample
proportion.
p̂ = X/n
E(p̂) = E(X/n) = (1/n)·E(X) = (1/n)·n·P = P
Var(p̂) = Var(X/n) = (1/n²)·Var(X) = (1/n²)·n·P·(1 − P) = P·(1 − P)/n
As we saw with the sample means, the sample proportion distribution can also
be approximated by a standard normal distribution, provided that the sample
size is large enough, that is, if n·P·(1 − P) > 5. We can therefore define
the random variable as:
Z = (p̂ − P)/σ_p̂ = (p̂ − P)/√(P·(1 − P)/n)
s² = (1/(n−1)) · ∑ᵢ (xᵢ − x̄)²
As for the sample mean and the sample proportion, the expected value of the
sample variance is equal to the population variance:
E(s²) = σ²
It follows that: