
STATISTICS HANDOUTS

Part 1 of 2 II edition a.y. 2017/2018

Written and edited by:


Giuseppe Musillo
Luisa Gibelli
Domenico Cerbone

Sponsored by

This handout has been written by students and is not intended to substitute the
official University materials. Its purpose is to be a useful instrument for
exam preparation, but it does not cover the entire program of the course it
relates to, as the materials on the university website or from the professors
do.

First question: what is statistics? There is no single precise definition of
the subject; generally speaking, we can say that its purpose is to study a
particular phenomenon quantitatively and qualitatively under conditions of
uncertainty.

CHAPTER 1 – USING GRAPHS TO DESCRIBE DATA


1.1 DECISION MAKING IN AN UNCERTAIN ENVIRONMENT
There are two main groups in statistics:
 POPULATION (indicated with N): it is the complete set of all items that
interest an investigator. Population size can be very large or even infinite.
 SAMPLE (indicated with n): it is an observed subset of a population with
sample size given by n
Our aim is to make statements, based on sample data, that have some validity
about the population at large. We therefore need a sample that is
representative of the population. One important principle that we must follow
in sample selection is randomness.
1. RANDOM SAMPLING: Simple random sampling is a procedure used to
select a sample of n objects from a population in such a way that each
member of the population is chosen by chance and is equally likely to be
chosen.
2. SYSTEMATIC SAMPLING: it involves the selection of every j-th item in
the population, where j is the ratio of the population size to the sample
size, j = N/n. Randomly select a number from 1 to j for the first item
selected. The resulting sample is called a systematic sample.
A PARAMETER is a specific characteristic of a population, whereas a
STATISTIC is a specific characteristic of a sample.
Finally, we have two branches of statistics:
1. DESCRIPTIVE STATISTICS: Graphical and numerical procedures to
summarize and process data
2. INFERENTIAL STATISTICS: Using data to make predictions,
forecasts, and estimates to assist decision making

1.2 CLASSIFICATION OF VARIABLES
A VARIABLE is a specific characteristic of an individual or object. Variables can
be classified in several ways:
1. CATEGORICAL VARIABLES vs NUMERICAL VARIABLES
 CATEGORICAL VARIABLES produce responses that belong to groups or
categories (e.g. responses to yes/no questions);
 NUMERICAL VARIABLES: they are defined by numbers
i. DISCRETE: the variable takes a finite (or countable) number of values;
ii. CONTINUOUS: the variable may take any value within
a given range of real numbers and usually arises from a
measurement process.
N.B. All variables which are about money are treated as continuous.
2. MEASUREMENT LEVELS
We have again two groups:
 QUALITATIVE DATA:
i. NOMINAL DATA: if we have words that describe the categories
or classes of responses.
ii. ORDINAL DATA: The amount of information is greater than in
the NOMINAL DATA, we can also rank the variables.
 QUANTITATIVE DATA
i. INTERVAL DATA: a greater amount of information with respect to
ORDINAL DATA; besides the ordering, the distances between values
are meaningful, but there is no true zero.
ii. RATIO DATA: more information with respect to INTERVAL DATA;
we have an order, meaningful distances and also a true zero, so
ratios between values are meaningful.

1.3 GRAPHS TO DESCRIBE CATEGORICAL VARIABLES


We can describe categorical variables using frequency distribution tables and
graphs such as bar charts, pie charts, and Pareto diagrams. These graphs are
commonly used by managers and marketing researchers to describe the data
collected.
If we have nominal data, we will use
1. PIE CHART: it is a chart where the circle represents the total, and the
segments cut from its center depict shares of this total;
2. BAR CHART: it is a chart in which the categories are shown on one axis
and the frequencies on the other, with each bar proportional to its frequency;
3. PARETO DIAGRAM: it is a bar chart that displays the frequency of defect
causes. The bar at the left indicates the most frequent cause and the bars
to the right indicate causes with decreasing frequencies. A Pareto diagram
is used to separate the “vital few” from the “trivial many”;
If we have ordinal data, we have
1. FREQUENCY DISTRIBUTION: it is a table used to organize data. The
left column includes all the possible responses on a variable being studied.
The right one is a list of frequencies, or number of observations, for each
class. A relative frequency distribution is obtained by dividing each
frequency by the number of observations and multiplying the resulting
proportion by 100%.
2. BAR CHART: same as before
EXAMPLE N.1: Let's consider an analysis, on a sample of n = 10, of
restaurants in Milan. For each restaurant we record its typology, whether
it delivers food (1 = YES, 0 = NO) and its price class.

N° RESTAURANT TYPOLOGY DELIVERY PRICE CLASS


1 P 1 B
2 P 1 M-B
3 FF 1 B
4 FF 1 B
5 P 0 M-B
6 E 0 M-B
7 FF 0 M-A
8 E 0 A
9 E 0 B
10 FF 0 M-A

I. The typology of the restaurant is a QUALITATIVE NOMINAL
VARIABLE; for this reason it can be represented through a frequency
distribution table, a pie chart or a histogram.
FREQUENCY DISTRIBUTION TABLE:
FREQUENCY DISTRIBUTION TABLE:

TYPOLOGY fi pi
P 3 0,3
FF 4 0,4
E 3 0,3
TOT 10 1
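The frequency table above can be sketched in a few lines of Python; this is an illustrative script, not part of the handout, using the 10 typology codes from the example.

```python
# Reproduce the frequency distribution table (fi, pi) of the typologies.
from collections import Counter

typologies = ["P", "P", "FF", "FF", "P", "E", "FF", "E", "E", "FF"]

fi = Counter(typologies)                 # absolute frequencies
n = sum(fi.values())
pi = {k: v / n for k, v in fi.items()}   # relative frequencies

print(fi["P"], fi["FF"], fi["E"])  # 3 4 3
```

The relative frequencies sum to 1, matching the TOT row of the table.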

4
PIE CHART:
(pie chart with shares P = 0,3, FF = 0,4, E = 0,3)

HISTOGRAM:
(bar chart with absolute frequencies P = 3, FF = 4, E = 3)

Through the absolute frequencies we can see which typology is the most
frequent.
The price class, on the other hand, is a QUALITATIVE ORDINAL
VARIABLE, which will be represented by a frequency distribution table and a
bar chart.
FREQUENCY DISTRIBUTION TABLE: we rank the price classes in ascending
order
PRICE CLASS fi pi
B 4 0,4
M-B 3 0,3
M-A 2 0,2
A 1 0,1

BAR CHART:
(bar chart with absolute frequencies B = 4, M-B = 3, M-A = 2, A = 1)

EXAMPLE N.2: we now use the PARETO DIAGRAM; this diagram is useful to
separate the relevant factors from the non-relevant ones. It is a mixed graph,
formed by a bar chart and a line chart.
CAUSES OF DELAY OF TRAINS ABSOLUTE FREQUENCIES fi
Maintenance 12
Strikes 26
Natural Causes 5
Other 2
Damages 35
We now compute the frequency distribution table, with the causes ranked by
decreasing frequency.
FREQUENCY DISTRIBUTION TABLE:

CAUSES fi pi (%) Fi (%) - cumulative %
Damages 35 43,75% 43,75%
Strikes 26 32,5% 76,25%
Maintenance 12 15% 91,25%
Natural Causes 5 6,25% 97,5%
Other 2 2,5% 100%

PARETO DIAGRAM: it is a graph which represents the relative importance of the
causes of a certain phenomenon. It has both bars and a line: each factor is
represented by a bar, the bars are ranked in decreasing order of frequency,
and the line is the cumulative distribution (also called Lorenz curve).
(Pareto diagram of the table above: bars at 43,75%, 32,5%, 15%, 6,25%, 2,5%
and cumulative line through 43,75%, 76,25%, 91,25%, 97,5%, 100%)
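The two computations behind the Pareto diagram, ranking the causes and accumulating relative frequencies, can be sketched as follows (an illustrative script using the train-delay data above):

```python
# Rank causes by frequency, then accumulate relative frequencies:
# the bars (pi) and the line (Fi) of the Pareto diagram.
causes = {"Damages": 35, "Strikes": 26, "Maintenance": 12,
          "Natural Causes": 5, "Other": 2}

total = sum(causes.values())
ranked = sorted(causes.items(), key=lambda kv: kv[1], reverse=True)

cumulative = []
running = 0.0
for name, f in ranked:
    running += f / total
    cumulative.append((name, round(running * 100, 2)))

print(cumulative[0])   # ('Damages', 43.75)
print(cumulative[-1])  # ('Other', 100.0)
```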

1.4 GRAPHS TO DESCRIBE TIME-SERIES DATA


Data measured at successive points in time are called time-series data. A
graph of time-series data is called a line-chart or time-series plot.
LINE CHART (TIME-SERIES PLOT): through this graph we can analyze data
plotted at various moments in time.
(line chart of quarterly observations from Q1 2014 to Q4 2015; the plotted
values are 35, 39, 44, 48, 48, 51, 61, 74)

N.B. If a period is missing, it has to be noted on the chart.

1.5 GRAPHS TO DESCRIBE NUMERICAL VARIABLES
In case we have NUMERICAL DICRETE VARIABLES with few different values
(such as the number of smartphones owned by a single person), we will use:
1. FREQUENCY TABLE
2. STICK CHART
3. DIAGRAMMA A SCALINI GRADINI (G)
We now compute the distribution function.

EXAMPLE:
FREQUENCY TABLE (with distribution function):
N° CARS OWNED fi pi Fi
1 32 0,32 0,32
2 48 0,48 0,80
3 16 0,16 0,96
5 4 0,04 1
TOT 100 1

STICK CHART:

With the stick chart you can display both the absolute frequencies and the
relative ones.

STEP FUNCTION CHART:
(step chart of the cumulative distribution function Fi, which jumps at each
observed value)

If we have CONTINUOUS NUMERICAL VARIABLES or DISCRETE NUMERICAL
VARIABLES with a high number of different values, we'll group our data in
classes, and we'll use:
1. FREQUENCY DISTRIBUTION TABLE
2. HISTOGRAM: chart that consists of vertical bars constructed on a
horizontal line that is marked off with intervals for the variable being
displayed. The intervals correspond to the classes in a frequency
distribution table. The height of each bar is proportional to the number of
observations in that interval;
3. OGIVE: also called cumulative line graph, is a line that connects points
that are the cumulative percent of observations below the upper limit of
each interval in a cumulative frequency distribution
We can have EQUAL classes or UNEQUAL classes. These are the steps to follow
in the first case:
1. Rank the data in ascending order (e.g. 1-2-3-4-5-6-7-8-9);
2. Find the range, i.e. the difference between MAX and MIN;
3. Choose the number of classes, depending on the sample size:

SAMPLE SIZE N. CLASSES

Fewer than 50 5-7

50-100 7-8

101-500 8-10

501-1000 10-11

1001-5000 11-14

More than 5000 14-20

4. Determine the Class Width, which is equal to

W = (Largest Observation − Smallest Observation) / Number of Classes
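The four steps above can be sketched as a short script; the data set here is hypothetical, and the number of classes is taken from the table (a sample of 10 falls in the "fewer than 50" row, so 5 to 7 classes):

```python
# Steps 2-4: range, number of classes, class width W.
data = [12, 18, 25, 31, 40, 47, 55, 63, 72, 88]

largest, smallest = max(data), min(data)
value_range = largest - smallest      # step 2: the range
num_classes = 5                       # step 3: chosen from the sample-size table
width = value_range / num_classes     # step 4: class width W

print(width)  # 15.2
```

In practice the width is usually rounded up to a convenient value so that the classes cover all observations.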

EXAMPLE 1:

CLASSES fi pi Fi Wi Ci = pi/Wi
[10;30) 2 0,1 0,1 20 0,005
[30;50) 3 0,15 0,25 20 0,0075
[50;70) 5 0,25 0,5 20 0,0125
[70;90) 6 0,3 0,8 20 0,015
[90;110) 4 0,2 1 20 0,01
TOT 20

Ci is the FREQUENCY DENSITY and it is equal to the ratio between the relative
frequency pi and the class width Wi.

HISTOGRAM: if the classes are equal we can put either pi or Ci on the
vertical axis, while if they are unequal we must put Ci; otherwise the chart
will be misleading.

SYMMETRY: the shape of a distribution is said to be symmetric if the
observations are balanced about its center (for a symmetric distribution the
median equals the mean).

SKEWNESS: a distribution is skewed, or asymmetric, if the observations are


not symmetrically distributed on either side of the center
(Example of a SYMMETRIC histogram and a SKEWED-LEFT one)

EXAMPLE N.2:

FREQUENCY DISTRIBUTION TABLE:

Range fi Pi Fi Wi Ci
[0;5) 15 0,3 0,3 5 0,06
[5;15) 12 0,24 0,54 10 0,024
[15;30) 10 0,2 0,74 15 0,0133
[30;50) 5 0,1 0,84 20 0,005
[50;100) 8 0,16 1 50 0,0032
TOT 50

HISTOGRAM: now the heights on the y-axis will be the densities Ci, so that
the area of each rectangle equals the relative frequency pi of its class.
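The Ci column of Example N.2 can be reproduced with a short script (illustrative, not part of the handout): with unequal widths the bar heights must be the densities Ci = pi / Wi, so that each bar's area equals pi.

```python
# Frequency densities Ci for the unequal classes of Example N.2.
classes = [(0, 5, 15), (5, 15, 12), (15, 30, 10), (30, 50, 5), (50, 100, 8)]
n = sum(f for _, _, f in classes)     # 50 observations

densities = []
for lower, upper, f in classes:
    p = f / n                         # relative frequency pi
    w = upper - lower                 # class width Wi
    densities.append(round(p / w, 4)) # density Ci = pi / Wi

print(densities)  # [0.06, 0.024, 0.0133, 0.005, 0.0032]
```

Note how the first class has the highest bar even though [50;100) is ten times wider: plotting raw frequencies instead of densities would hide this.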

OGIVE:

SCATTER PLOT:
We can prepare a SCATTER PLOT, which is another kind of graph, by locating
one point for each pair of two variables that represent an observation in the data
set. The scatter plot provides a picture of the data, including the following:
1. The range of each variable;
2. The pattern of values over the range;
3. A suggestion as to a possible relationship between the two variables;
4. An indication of outliers.
EXAMPLE:

X (mil$) Y (mil$)
50 172
7 34
33 125
250 700
14 52
70 65
134 211
56 69
98 70

(scatter plot of the nine pairs, X on the horizontal axis from 0 to 300,
Y on the vertical axis from 0 to 800)

As we can see, there is a positive relationship between the two variables: in
nearly every case, when one variable increases the other does the same.

PARTICULAR CASE: SIMPSON'S PARADOX
Let's assume a situation in which, at the same age, the unemployment rate
among high school and college graduates is half of that among the population
without a high school diploma. Let's consider, moreover, that historically the
older generations have fewer high school graduates and that, for other
reasons, the unemployment rate is higher among the young than among the old.
Starting from these statistics:
Workers Without diploma With diploma Total
Young 20 80 100
Old 120 30 150
Total 140 110 250

Unemployment rate Without diploma With diploma


Young 30% 15%
Old 5% 3,33%

In both age groups, then, the unemployment rate of the people without a
diploma is roughly double that of the people with one. We can then compute
the number of unemployed:
Unemployed Without Diploma With Diploma Total
Young 6 12 18
Old 6 1 7
Total 12 13 25

Computing the aggregate unemployment rate for the people with a diploma and
for the ones without, we'll have these values:
% Unemployed without diploma = 12/140 = 8,6%
% Unemployed with diploma = 13/110 = 11,8%
This situation occurs when you don't include an essential variable, which in
this case is the heterogeneous distribution over age classes; the analysis of
aggregate frequencies can then lead to wrong conclusions.
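The reversal above can be verified with a short computation (an illustrative script using the tables from the example):

```python
# Simpson's paradox: within-group rates point one way, aggregates the other.
workers = {"young": {"no_dip": 20, "dip": 80},
           "old":   {"no_dip": 120, "dip": 30}}
unemployed = {"young": {"no_dip": 6, "dip": 12},
              "old":   {"no_dip": 6, "dip": 1}}

# Within each age group, the no-diploma rate is roughly double.
young_no = unemployed["young"]["no_dip"] / workers["young"]["no_dip"]   # 0.30
young_dip = unemployed["young"]["dip"] / workers["young"]["dip"]        # 0.15

# Aggregating over age groups reverses the comparison.
agg_no = (6 + 6) / (20 + 120)    # 12/140
agg_dip = (12 + 1) / (80 + 30)   # 13/110

print(round(agg_no, 3), round(agg_dip, 3))  # 0.086 0.118
```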

CHAPTER 2 – USING NUMERICAL MEASURES TO DESCRIBE DATA


2.1 MEASURES OF CENTRAL TENDENCY AND LOCATION
Another way to describe and analyze a group of data is through numerical
measures. There are several of them.
Describing Data Numerically:
o CENTRAL TENDENCY
o VARIATION

CENTRAL TENDENCY
We have in this group: AVERAGE, MEDIAN, MODE, QUANTILES. Each one of
them expresses a different value of the variable group.
AVERAGE: The average is a numerical value that describes a set of data. It can
be computed on a population and on a sample.
FOR NOT GROUPED DATA
1. ARITHMETIC MEAN: The arithmetic mean is the most common kind of
mean. It is employed to summarize with a single number a set of data on a
measurable phenomenon. You sum all the data and divide the result by the
number of observations. It is used for QUANTITATIVE VARIABLES
and the final value will be a number between the MIN and the MAX. It is
highly influenced by outliers.
 ON A POPULATION: µ = (1/N) · Σ xi
 ON A SAMPLE: x̄ = (1/n) · Σ xi

2. WEIGHTED MEAN: The weighted mean is computed by summing the
products of each observation with its weight, and dividing by the sum of the
weights:
 x̄ = (Σ xi·wi) / (Σ wi)

FOR GROUPED DATA

1. AVERAGE: being X a discrete variable, it will assume k different
values, with absolute frequencies fi:
 ON A POPULATION: µ = (1/N) · Σ xi·fi
 ON A SAMPLE: x̄ = (1/n) · Σ xi·fi

2. APPROXIMATE MEAN (for data grouped in classes), with mi = midpoint of
the i-th class:
 ON A POPULATION: µ ≈ (1/N) · Σ mi·fi
 ON A SAMPLE: x̄ ≈ (1/n) · Σ mi·fi

MEDIAN: the MEDIAN is the central value of an ordered set of data; in order
to compute it, the data must be ranked. The median is not influenced by
outliers. Moreover, it has to be highlighted that it is a value, not a
position.
1. NOT GROUPED DATA: rank the data in ascending order, then compute the
median position through the formula 0,5·(n+1). If n is odd this position
corresponds to a single observation, which is the median; if n is even the
position is intermediate and the median is the average of the two
observations around it.

2. GROUPED DATA:
 DISCRETE DATA: the median is the first value for which the
cumulative distribution function Fi reaches or exceeds 0,5
EXAMPLE:
Values fi pi Fi
1 32 0,32 0,32
2 48 0,48 0,8 MEDIAN (ME)
3 16 0,16 0,96
5 4 0,04 1

 QUANTITATIVE DATA GROUPED IN CLASSES: we have to find the
median class of the data series, i.e. the class where the cumulative
frequency crosses 0,5. The median then solves F(i−1) + Ci·(ME − x_lower) = 0,5,
where x_lower is the lower limit of the median class, so
ME = x_lower + (0,5 − F(i−1)) / Ci.
EXAMPLE:
Values fi pi Fi Wi Ci
[0;30) 4 0,1667 0,1667 30 0,0056
[30;50) 12 0,5 0,6667 20 0,0250
[50;100) 8 0,3333 1 50 0,0067
TOT 24 1

(histogram of the three classes: the median class is [30;50), where the
cumulative frequency crosses 0,5)

ME = 30 + (0,5 − 0,1667) / 0,0250 = 43,33
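The interpolation above can be sketched in code (illustrative, using the three classes of the example): find the class where Fi crosses 0,5, then apply ME = lower + (0,5 − F_prev) / Ci.

```python
# Median of data grouped in classes, by linear interpolation.
classes = [(0, 30, 4), (30, 50, 12), (50, 100, 8)]   # (lower, upper, fi)
n = sum(f for _, _, f in classes)                    # 24

F_prev = 0.0
for lower, upper, f in classes:
    p = f / n
    if F_prev + p >= 0.5:              # median class found
        c = p / (upper - lower)        # frequency density Ci
        median = lower + (0.5 - F_prev) / c
        break
    F_prev += p

print(round(median, 2))  # 43.33
```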

MODE: The MODE is the most frequent value in a series of data, and it is
mainly used with QUALITATIVE DATA. If no value occurs more than once,
there is no mode.

QUANTILES: Quantiles are other location indicators. There are three types:
 QUARTILES: they divide the series of data in 4 equal parts.
1. FIRST QUARTILE (Q1) = 25% of the values at
its left and 75% of the values at its right;
2. SECOND QUARTILE (Q2 = ME) = 50% of the
values at its left and 50% of the values at its right;
3. THIRD QUARTILE (Q3) = 75% of the values
at its left and 25% of the values at its right;
 DECILES: data are divided in 10 equal parts;
 PERCENTILES: they divide data in 100 equal parts.
In order to compute the values, we find the position of the desired quartile
(for example, for the first quartile, through the formula Q1 position =
0,25·(n+1); substitute 0,25 with 0,5 or 0,75 for the second or the third one)
and then read off the corresponding values.
GRAPHIC REPRESENTATION OF THE QUARTILES
We use five numbers: MIN, Q1, Q2, Q3, MAX. Through these numbers you can
have a synthetic distribution of the data. We use the BOX AND WHISKERS PLOT.

PROPERTIES OF THE DISTRIBUTION SHAPE:


A) We have an asymmetric positive distribution when:
1. AVERAGE > MEDIAN
2. Q1 – MIN < MAX – Q3
3. MEDIAN – Q1 < Q3 – MEDIAN
B) A distribution is symmetric when:
1. AVERAGE = MEDIAN
2. Q1 - MIN = MAX – Q3
3. MEDIAN – Q1 = Q3 – MEDIAN
C) A distribution is negatively skewed when:
1. AVERAGE < MEDIAN

2. Q1 – MIN > MAX – Q3
3. MEDIAN – Q1 > Q3 – MEDIAN

OUTLIERS
The outliers in a data distribution are those values which are considered
atypical and far from the rest of the distribution. They affect the average.
An observation xi is defined atypical if it is not included in the range
(Q1 – 1,5·(Q3 – Q1); Q3 + 1,5·(Q3 – Q1)).
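The quartile positions and the outlier fences above can be sketched together (illustrative code on a hypothetical data set; the quartiles are located with the 0,25·(n+1) position rule from section 2.1):

```python
# Five-number summary fences: flag observations outside Q1/Q3 +/- 1.5*IQR.
data = sorted([3, 5, 7, 8, 9, 11, 13, 14, 15, 40])

def quartile(xs, q):
    pos = q * (len(xs) + 1) - 1        # 0-based position, may be fractional
    lo = int(pos)
    frac = pos - lo
    if lo + 1 < len(xs):
        return xs[lo] + frac * (xs[lo + 1] - xs[lo])
    return xs[lo]

q1, q3 = quartile(data, 0.25), quartile(data, 0.75)
iqr = q3 - q1
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outliers = [x for x in data if not fences[0] <= x <= fences[1]]
print(outliers)  # [40]
```

These are exactly the whisker limits of the box-and-whiskers plot mentioned above.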

2.2 MEASURES OF VARIABILITY


There are 5 types of measures of variability: RANGE, INTERQUARTILE
RANGE, VARIANCE, STANDARD DEVIATION, COEFFICIENT OF
VARIATION.
RANGE: The range is the difference between the largest and smallest
observation
INTERQUARTILE RANGE (IQR): the interquartile range is equal to the
difference between the third quartile and the first quartile (Q3 – Q1). It
measures the variability of the central 50% of the data.
VARIANCE: The variance is the sum of squared differences between each
observation and the mean, divided by the population size N (or by n − 1 for
a sample).
VARIANCE FOR NOT-GROUPED DATA:
 POPULATION: σ² = (1/N) · Σ (xi − µ)²
 SAMPLE: s² = (1/(n−1)) · Σ (xi − x̄)²
SHORT FORMULAS:
 POPULATION: σ² = (Σ xi²)/N − µ²
 SAMPLE: s² = (n/(n−1)) · [(Σ xi²)/n − x̄²]

PROOF FOR THE SHORTCUT FORMULA FOR THE POPULATION

σ² = (1/N) · Σ (xi − µ)²  → you expand the square of the binomial;
= (1/N) · Σ (xi² − 2µ·xi + µ²)  → you now split the sum into three separate
sums, so as to have each element separated from the others;
= (1/N) · [Σ xi² + Σ µ² − 2µ · Σ xi]  → Σ µ² is equal to N·µ², and 2µ can be
taken out of the sum;
= (1/N) · [Σ xi² + N·µ² − 2µ · Σ xi]  → dividing each term by N, N·µ²/N
simplifies to µ², and (Σ xi)/N is equal to µ;
= (Σ xi²)/N + µ² − 2µ² = (Σ xi²)/N − µ²
𝑁 𝑁

1) VARIANCE FOR GROUPED DATA
 POPULATION: σ² = (1/N) · Σ fi·(xi − µ)²
 SAMPLE: s² = (1/(n−1)) · Σ fi·(xi − x̄)²

EXAMPLE (with x̄ = 1,96):

N° CARS OWNED fi xi − x̄ (xi − x̄)² fi·(xi − x̄)²
1 32 −0,96 0,9216 29,4912
2 48 0,04 0,0016 0,0768
3 16 1,04 1,0816 17,3056
5 4 3,04 9,2416 36,9664
TOT 100   83,84

s² = 83,84 / 99 = 0,8469

SHORTCUT FORMULA: σ² = (Σ fi·xi²)/N − µ²
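The example above can be checked with a few lines of code (illustrative, reproducing s² = 0,8469 for the cars table, n = 100):

```python
# Sample variance for grouped discrete data.
values = [1, 2, 3, 5]
freqs = [32, 48, 16, 4]
n = sum(freqs)
mean = sum(x * f for x, f in zip(values, freqs)) / n   # 1.96

s2 = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / (n - 1)
print(round(s2, 4))  # 0.8469
```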

2) VARIANCE FOR DATA GROUPED IN CLASSES, with mi = midpoint of the
class:
 POPULATION: σ² = (1/N) · Σ fi·(mi − µ)²
 SAMPLE: s² = (1/(n−1)) · Σ fi·(mi − x̄)²

SHORTCUT FORMULA: σ² = (Σ fi·mi²)/N − µ²
𝑁
STANDARD DEVIATION: The standard deviation is the positive square root of
the variance:
POPULATION: σ = √σ²  |  SAMPLE: s = √s²
COEFFICIENT OF VARIATION: The coefficient of variation is a measure of
relative dispersion that expresses the standard deviation as a proportion
(or percentage) of the mean:
POPULATION: CV = σ / |µ|  |  SAMPLE: CV = s / |x̄|


CHEBYSHEV'S THEOREM
For any population with mean µ, standard deviation σ, and k > 1, the
percentage of observations that lie within the interval [µ ± kσ] is at least
100·[1 − 1/k²]%, where k is the number of standard deviations.

 We also have a particular case, called the EMPIRICAL RULE, which
provides an estimate of the approximate percentage of observations that are
contained within one, two, or three standard deviations of the mean:
INTERVAL Percentage for Percentage
Empirical Rule For Chebychev
µ±σ 68% At least 0%
µ ± 2σ 95% At least 75%
µ ± 3σ 99,73% At least 88,9%

We then have the z-score, which is a standardized value that indicates the
number of standard deviations a value is from the mean. A z-score greater
than zero indicates that the value is greater than the mean; a z-score less
than zero indicates that the value is less than the mean; and a z-score of
zero indicates that the value is equal to the mean. The z-score is
z = (xi − µ) / σ
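The z-score and the Chebyshev bound above can be sketched as two small functions (illustrative code, hypothetical numbers):

```python
# z-score and Chebyshev's minimum-fraction bound.
def z_score(x, mu, sigma):
    return (x - mu) / sigma

def chebyshev_lower_bound(k):
    # Minimum fraction of observations within mu +/- k*sigma, for k > 1.
    return 1 - 1 / k ** 2

print(z_score(120, 100, 10))      # 2.0: two standard deviations above the mean
print(chebyshev_lower_bound(2))   # 0.75: at least 75%, as in the table above
```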

2.3 WEIGHTED MEAN AND MEASURES OF GROUPED DATA

The WEIGHTED MEAN of a set of data is x̄ = (Σ wi·xi)/n, where wi = weight of
the i-th observation and n = Σ wi.
Suppose then that the data are grouped into K classes, with frequencies
f1, f2, …, fK, and that the midpoints of these classes are m1, m2, …, mK.
Then the sample mean and sample variance of grouped data are approximated in
the following manner: the mean is
x̄ = (Σ fi·mi) / n
where n = Σ fi, and the variance is
s² = [Σ fi·(mi − x̄)²] / (n − 1)

2.4 MEASURES OF RELATIONSHIPS BETWEEN VARIABLES


We can find three indexes in this class of measures of relationships, which are
fundamental to understand the degree of relationship between two different
variables. They are: COVARIANCE, CORRELATION COEFFICIENT,
REGRESSION LINE. Each one of them allows us to understand the relationship
between the two variables, answering questions such as: "Is there a
relationship between the two variables?", "Is it positive or negative?",
"Is this relationship strong?".
COVARIANCE: The Covariance is a measure of the linear relationship between
two variables. If the value is positive we have a direct or increasing linear
relationship, whereas if it is negative we have a decreasing linear relationship.
 POPULATION: COV(X, Y) = σxy = (1/N) · Σ (xi − µx)·(yi − µy)
 SAMPLE: COV(X, Y) = sxy = (1/(n−1)) · Σ (xi − x̄)·(yi − ȳ), where x̄ and ȳ
are the sample means
SHORTCUT FORMULAS
 POPULATION: σxy = (1/N) · Σ (xi·yi) − µx·µy
 SAMPLE: sxy = (n/(n−1)) · [(Σ xi·yi)/n − x̄·ȳ]

PROPERTIES OF THE COVARIANCE:


 COV (X, Y) = COV (Y, X);
 If COV (X, Y) > 0 then we have a POSITIVE LINEAR RELATIONSHIP;
 If COV (X, Y) < 0 then we have a NEGATIVE LINEAR RELATIONSHIP;
 If COV (X, Y) = 0 then there is no LINEAR relationship (but this does not
exclude the presence of other relationships);
 COV (X, X) = VAR(X)

CORRELATION COEFFICIENT: The linear CORRELATION COEFFICIENT is a
symmetric, relative measure of the linear relationship between two
quantitative variables. Mathematically, it is equal to the ratio between the
covariance of X and Y and the product of the standard deviations of X and Y.
This index normalizes the covariance, so as to give more precise information
about the linear relationship between the two variables:
 POPULATION: ρxy = COV(X, Y) / (σx·σy)
 SAMPLE: rxy = COV(X, Y) / (sx·sy)

The correlation coefficient can assume values in [−1; +1]; if the index
assumes values of ±1 we will have a perfect linear correlation, with all the
data distributed on a line. If the index is equal to 0 there is no linear
correlation; if it is greater than 0 there is a positive linear correlation,
whereas if it is smaller than 0 there is a negative linear correlation.
Normally we call the correlation weak when the index is between −0,3 and
+0,3, whereas if the index is greater than +0,7 or smaller than −0,7 we have
a strong linear correlation. These limits are relative and not universally
accepted; they are useful as an indication of a possible relationship. It can
happen that data with nothing in common still show a strong linear
correlation: for this reason, CORRELATION does not imply CAUSATION (i.e. the
presence of a causal relationship).
N.B. We also have to point out that LINEAR INDEPENDENCE does not imply
STATISTICAL INDEPENDENCE: if COV(X, Y) = 0, so if there is linear
independence, it is not certain that there is also statistical independence.
The reverse, on the other hand, is true.
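The sample covariance and correlation coefficient above can be sketched on real numbers; this illustrative script reuses the nine (X, Y) pairs from the scatter-plot example in section 1.5:

```python
# Sample covariance and linear correlation coefficient.
xs = [50, 7, 33, 250, 14, 70, 134, 56, 98]
ys = [172, 34, 125, 700, 52, 65, 211, 69, 70]
n = len(xs)

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
sx = (sum((x - mx) ** 2 for x in xs) / (n - 1)) ** 0.5
sy = (sum((y - my) ** 2 for y in ys) / (n - 1)) ** 0.5
r = cov / (sx * sy)

print(round(r, 2))  # positive and above 0.7: a strong linear correlation
```

Consistently with the scatter plot, the coefficient comes out positive and strong.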
REGRESSION LINE: Through the regression line, we can describe two different
variables at the same time. It measures the asymmetric relationship of the 2
variables, so how much Y changes when X changes. This line will be defined by
an equation, defined by the parameters β0 and β1.

The equation will then be:
y = β0 + β1*x | x = independent variable | y = dependent variable
We’ll have that β0 is the vertical intercept, β1 is
the slope of the line, so it will point how much
Y increases when X is increasing. We will be
able to forecast the movement of Y if we know
the line’s equation and any value of X.
In order to find β0 and β1 we have to reason on the forecast. Considering ŷ1
as the value that y would assume according to the forecast, and knowing also
the true value y1, we will have that the difference e1 = y1 − ŷ1 is the error
between the value the variable actually assumes and the one forecast by the
line.
It is necessary to find β0 and β1 in order
to minimize the sum of all the errors.
You then use the ORDINARY LEAST
SQUARES (OLS) for which you find the
minimum value of the sum of the
square of the errors.
MIN Σ ei²
You can then compute the estimates of β0 and β1 thanks to these formulas:
β1 estimate: b1 = sxy / sx² = COV(X, Y) / VAR(X) = rxy · (sy / sx)
β0 estimate: b0 = ȳ − b1·x̄
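The OLS estimates above can be verified on a small hypothetical data set that lies exactly on the line y = 2x + 1 (illustrative code, not from the handout):

```python
# OLS estimates: b1 = Cov(X, Y) / Var(X), b0 = mean(y) - b1 * mean(x).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
n = len(xs)

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)

b1 = cov / var_x
b0 = my - b1 * mx
print(b1, b0)  # 2.0 1.0, recovering the true line
```

Because the points are perfectly collinear, the residuals ei are all zero and OLS recovers the slope and intercept exactly.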

PROBABILITY
SUMMARY OF APPLIED MATHEMATICS: PROBABILITY
RANDOM EXPERIMENT: a process which leads to two or more outcomes without
the possibility of forecasting which one will happen.
EVENT: an event is any subset of the sample space Ω of a random experiment.
SAMPLE SPACE: set of all possible outcomes
RANDOM VARIABLE: variable which assumes numerical values depending on
the results of a random experiment.
The Probability Axioms:
 POSITIVITY: 0 ≤ P(E) ≤ 1
 CERTAINTY: P(Ω) = 1
 UNION: for pairwise disjoint events E1, …, En, P(⋃ Ei) = Σ P(Ei)
EXPECTED VALUE: the expected value of a random number X(ω): Ω → R is the
number, if it exists, defined by E(X) = µ = Σx x·P(x). In statistics the
expected value plays the role of the mean.
VARIANCE: for a random number X(ω): Ω → R with expected value E[X(ω)] = µ,
the variance Var[X(ω)] is defined by
Var(X) = Σx (x − µ)²·P(x)
RANDOM BERNOULLI VARIABLE
The BERNOULLI random variable describes a phenomenon which shows only 2
outcomes:
 1 = success → probability = p
 0 = failure → probability = 1 − p
PROBABILITY FUNCTION: P(x) = p^x · (1−p)^(1−x) for x = 0, 1, and 0
otherwise; that is, P(1) = p and P(0) = 1 − p.
The expected value and the variance will be:
E(X) = Σx x·P(x) = 1·p + 0·(1−p) = p
Var(X) = p·(1−p)

CHAPTER 4 – DISCRETE PROBABILITY
DISTRIBUTIONS
4.4 BINOMIAL DISTRIBUTION
Let’s suppose that an experiment could have only two outcomes, mutually
exclusive, defined as “success” and “failure”, and we call p the probability for
the success to happen. If we repeat the experiment n times, in an independent
way, the distribution of the number of successes, X = X1 + X2 + … + Xn, is called
BINOMIAL DISTRIBUTION. Its PROBABILITY FUNCTION will be: X ~ BIN(n, p)
P(x) = [n! / (x!·(n−x)!)] · p^x · (1−p)^(n−x) for x = 0, 1, 2, …, n,
and 0 otherwise.
n! / (x!·(n−x)!) is called the BINOMIAL COEFFICIENT and can be indicated
with (n choose x); it counts the number of combinations of x successes in n
experiments.
EXAMPLE: Let's take as an example a financial operator who has the
opportunity to close up to 6 contracts every day. The probability p of
closing a contract is equal to 0,3, so we'll have p = 0,3 and (1−p) = 0,7.
What, then, is the probability that the operator closes exactly ONE contract
out of the six available that day? We'll have x = 1 and n = 6:
P(1) = [6! / (1!·5!)] · 0,3¹ · 0,7⁵ = 0,3025 = 30,25%
What is instead the probability that the operator is going to conclude EXACTLY
2 contracts in one day?
We can have various combinations on the contract’s distribution:
COMBINATION 1: 1,1,0,0,0,0  p, p, (1-p), (1-p), (1-p), (1-p), = p2 *(1-p)4
COMBINATION 2: 1,0,0,0,0,1  p, (1-p), (1-p), (1-p), (1-p), p = p2 *(1-p)4
Both the combinations are perfectly equal from the probability’s point of view.
So, we can compute the probability of the event x = 2, n = 6
P(2) = [6! / (2!·4!)] · 0,3² · 0,7⁴ = 0,3241 = 32,41%
The probability to close 2 contracts in the same day is equal to 32,41%.
EXPECTED VALUE AND VARIANCE:
In case in which X1, X2, …, Xn are Bernoulli random variables both independent
and identically distributed with a parameter p, then we’ll have that:
E(x) = n*p
Var (x) = n*p*(1-p)
 PROOF: first we note that X = X1 + X2 + … + Xn = Σ Xi ~ BIN(n; p),
with each Xi ~ BER(p). Then:
 E(X) = E(X1 + X2 + … + Xn) = E(X1) + E(X2) + … + E(Xn)
= p + p + … + p = n·p
 Var(X) = Var(X1 + X2 + … + Xn) = Var(X1) + Var(X2) + … + Var(Xn)
= p·(1−p) + p·(1−p) + … + p·(1−p) = n·p·(1−p), using the independence of
the Xi.

N.B. X ~ BIN (n; p) means that X is a binomial random variable with parameters
n and p. Xi ~ BER(p) means instead that Xi are Bernoulli random variables of
parameter p.
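The binomial probability function and its moments can be sketched in code, reproducing the contract example above (n = 6, p = 0,3); this is an illustrative script, not part of the handout:

```python
# Binomial pmf via the binomial coefficient, plus a check of E(X) = n*p.
from math import comb

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

p1 = binomial_pmf(1, 6, 0.3)
p2 = binomial_pmf(2, 6, 0.3)
print(round(p1, 4), round(p2, 4))  # 0.3025 0.3241

# E(X) computed from the pmf matches n*p.
mean = sum(x * binomial_pmf(x, 6, 0.3) for x in range(7))
print(round(mean, 4))  # 1.8
```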

CHAPTER 5 – CONTINUOUS PROBABILITY


DISTRIBUTION
5.3 THE NORMAL DISTRIBUTION
DEFINITION: A continuous random variable is a variable which can assume
any value in a determined interval.
As we have seen before, if you want to represent a continuous random variable
you employ a histogram. In this case, though, we’ll have to put determined
conditions which will modify the graph. If on the y-axis of the histogram we had
the frequency density Ci, now we’ll have the probability density function f(x).
Moreover we must outline that the widths W of each class tend to 0, so that we
can define the function point by point.
THE PROBABILITY DENSITY FUNCTION f(x) shows
different properties:
 f(x) ≥ 0;
 the probability that X falls in an interval
is the area under the graph in that interval;
 the area under the graph over the whole
interval of admitted values of X is equal to 1.

So, the probability of a single point, P(X = x0) = ∫ from x0 to x0 of
f(x) dx = 0, will always be equal to 0, because an integral over the
degenerate interval [x0; x0] is always 0.
P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b) = P(X ≤ b) −
P(X ≤ a) = F(b) − F(a)  → so it is equal to the difference between the value
the distribution function F(x) assumes at b and the value it assumes at a.
The DISTRIBUTION FUNCTION F(x), for a continuous random variable X,
expresses the probability that X is at most x, so F(x) = P(X ≤ x).

The blue area will then be equal to F(a); the sum of the blue and the red
area is F(b); so the F(b) − F(a) area, which is equal to P(a ≤ X ≤ b), is
the red area and indicates the probability that X is between a and b.

RANDOM VARIABLE AND NORMAL DISTRIBUTION: a normal distribution
must be symmetric, and its data must be concentrated at the mean value, and
then being lower when they are far from the mean.
A random variable which has a normal distribution will be indicated with X ~ N
(µ; σ2), so X will be a random variable with a normal distribution, expected value
E(X) = µ and variance VAR(X) = σ2. Its formula will be:

f(x) = [1 / √(2πσ²)] · e^(−(1/2)·((x−µ)/σ)²)

PROPERTIES:
1. The function is bell-shaped and symmetric;
2. The mean µ is the expected value E(X), MEDIAN and MODE altogether;
3. It is asymptotic with respect to the x-axis;
4. It is increasing for x < µ and decreasing for x > µ;
5. It has two inflection points in µ - σ and in µ + σ;
6. The area under the curve is equal to 1.
EXAMPLE N.1:
If the standard deviation (and so the variance) assumes a lower value, the
values are going to be concentrated near the mean; otherwise, if the standard
deviation increases, the values are going to be less concentrated and the
tails are going to be longer.

EXAMPLE N. 2:
If the variation is in µ, we'll have a shift of the graph to the right or to the left depending on whether µ increases or decreases, respectively.

The DISTRIBUTION FUNCTION of a normal distribution will be defined by the graph: it will have a horizontal asymptote at F(x) = 1 and an inflection point at F(x) = 0,5.

STANDARDIZATION OF A RANDOM VARIABLE

Let's call X a random variable with mean µ and variance σ². The random variable

Z = (X − µ)/σ

is called STANDARDIZED RANDOM VARIABLE and we also know its expected value and variance:

E(Z) = E((X − µ)/σ) = 0
VAR(Z) = VAR((X − µ)/σ) = 1

Z is called a STANDARD NORMAL RANDOM VARIABLE if the starting variable is a normal. If X ~ N (µ; σ²), we'll have that Z = (X − µ)/σ ~ N (0; 1).
Its density will be equal to:

f(z) = 1/√(2π) ∗ e^(−(1/2) ∗ z²)

To find the value of the area under the graph we use the NORMAL DISTRIBUTION TABLES. If, for example, z = 0,3, then the area under the graph up to that point will be equal to 0,6179.
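The table values can be reproduced without a table: the standard normal distribution function can be written with the error function, Fz(z) = 0,5 ∗ (1 + erf(z/√2)). A minimal sketch:

```python
import math

def phi(z):
    # Standard normal distribution function F_Z(z), via the error function:
    # F_Z(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# The table value for z = 0.3, rounded to four decimals as in the tables
assert round(phi(0.3), 4) == 0.6179

# At z = 0 exactly half the area lies to the left, by symmetry
assert abs(phi(0.0) - 0.5) < 1e-12
```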
The standard normal is employed to compute the probability P (a < X < b) with X ~ N (µ; σ²). To compute this probability, a procedure divided into three parts is used:

1) STANDARDIZATION: we have to replace the computation on the random variable X with a computation on the random variable Z:

P (a < X < b) = P ((a − µ)/σ < Z < (b − µ)/σ)

2) COMPUTING THE DISTRIBUTION FUNCTION:

P ((a − µ)/σ < Z < (b − µ)/σ) = Fz ((b − µ)/σ) – Fz ((a − µ)/σ)

3) READING THE TABLES: we have to read the tables knowing some of their properties:
 The tables only list positive values of z;
 The tables do not directly give the probability to the right of a value  no P (Z > z).

EXAMPLE: let's take into consideration the prices of the tickets for Milan-Miami, such that µ = 500 and σ² = 625. X ~ N (500; 625)
 Compute the probability that the price is between 500€ and 550€:
We want P (500 < X < 550), which, through the standardization, will be transformed into:

P ((500 − 500)/25 < Z < (550 − 500)/25)

Which will result in P (0 < Z < 2) = Fz (2) – Fz (0)
So, through the tables, we will have: 0,9772 – 0,5 = 0,4772 = 47,72%
 Compute now the probability that the price is between 470€ and 520€.
The problem in computing this probability is that through the tables we cannot directly read the value corresponding to 470€, because it is smaller than the mean µ. We'll proceed in this way:
P (470 < X < 520)  with the STANDARDIZATION becomes:

P ((470 − 500)/25 < Z < (520 − 500)/25) = P (−1,2 < Z < 0,8) = Fz (0,8) – Fz (−1,2)

We transform: P (Z < −1,2) = P (Z > 1,2) = 1 – Fz (1,2), because the standard normal distribution is symmetric. The area at the left of −1,2, in fact, is equal to the one at the right of 1,2 and vice versa.
So, we then have: Fz (0,8) – [1 – Fz (1,2)] = 0,7881 – [1 – 0,8849] = 0,7881 – 0,1151 = 0,6730 = 67,3%
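Both results of the ticket-price example can be reproduced in code, using the erf-based distribution function in place of the tables (the `prob_between` helper is ours):

```python
import math

def phi(z):
    # Standard normal distribution function F_Z(z)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 500.0, 25.0  # ticket prices: X ~ N(500, 625), so sigma = 25

def prob_between(a, b):
    # P(a < X < b) = F_Z((b - mu)/sigma) - F_Z((a - mu)/sigma)
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

# P(500 < X < 550) = F_Z(2) - F_Z(0), about 0.4772 as computed from the tables
assert abs(prob_between(500, 550) - 0.4772) < 1e-3

# P(470 < X < 520) = F_Z(0.8) - [1 - F_Z(1.2)], about 0.6730
assert abs(prob_between(470, 520) - 0.6730) < 1e-3
```

Note that phi handles negative arguments directly, so the symmetry trick needed for the tables is not required here.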

SUMMARY FOR THE SOLUTION OF STANDARD NORMAL DISTRIBUTIONS
USING THE GRAPH:

P (Z < a) = Fz (a)

P (Z > -a) = P (Z < a) = Fz (a)

P (Z > a) = 1 – Fz (a)

P (Z < -a) = P (Z > a) = 1 – Fz (a)

P (a < Z < b) = Fz (b) – Fz (a)

P (-b < Z < -a) = Fz (b) – Fz (a)

P (-a < Z < b) = Fz (b) + Fz (a) – 1

P (-b < Z < a) = Fz (b) + Fz (a) – 1
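All of the identities above follow from the symmetry Fz(−a) = 1 − Fz(a) and can be checked numerically. A minimal sketch with arbitrary positive values a < b:

```python
import math

def phi(z):
    # Standard normal distribution function F_Z(z)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

a, b = 0.7, 1.5  # arbitrary positive values with a < b
tol = 1e-12

# P(Z > -a) = F_Z(a) and P(Z < -a) = 1 - F_Z(a), by symmetry
assert abs((1 - phi(-a)) - phi(a)) < tol
assert abs(phi(-a) - (1 - phi(a))) < tol

# P(-b < Z < -a) = F_Z(b) - F_Z(a)
assert abs((phi(-a) - phi(-b)) - (phi(b) - phi(a))) < tol

# P(-a < Z < b) = F_Z(b) + F_Z(a) - 1
assert abs((phi(b) - phi(-a)) - (phi(b) + phi(a) - 1)) < tol
```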

STANDARDIZATION TABLES:

NORMAL PROBABILITY PLOT:
 Arrange data from low to high values
 Find cumulative normal probabilities for all values
 Examine a plot of the observed values vs. cumulative probabilities (with the
cumulative normal probability on the vertical axis and the observed data
values on the horizontal axis)
 Evaluate the plot for evidence of linearity
Similar to the SCATTER PLOT, the normal probability plot shows on the y-axis the values of the distribution function for each observed value, so we'll have various points on the graph (for this reason it is similar to the scatter plot). If these points are well-distributed, such that they approximately form a line, then the distribution will be approximately normal; otherwise it is assumed that it is not. In the first graph the points form a line, so the distribution can be defined as a normal; in the second, the data distribution cannot be associated with a normal, because the graph does not look like a line.
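The four steps above can be sketched without a plotting library by computing the plot coordinates directly. Everything here (the seed, sample size, and the N(100, 15²) data) is hypothetical, chosen only to illustrate the procedure:

```python
import math, random

def phi(z):
    # Standard normal distribution function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

random.seed(42)
data = [random.gauss(100, 15) for _ in range(200)]  # hypothetical sample

# Step 1: arrange data from low to high values
data.sort()
n = len(data)

# Step 2: cumulative normal probabilities for each ordered value,
# using the sample mean and standard deviation as estimates
mean = sum(data) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
probs = [phi((x - mean) / sd) for x in data]

# Steps 3-4: the plot points are (data[i], probs[i]); for normal data the
# cumulative probabilities track the empirical fractions (i + 0.5)/n closely,
# which is what "the points form a line" means on the plot
empirical = [(i + 0.5) / n for i in range(n)]
max_gap = max(abs(p - e) for p, e in zip(probs, empirical))
assert max_gap < 0.1  # small gaps suggest approximate normality
```

For clearly non-normal data (e.g. heavily skewed samples) the gap grows and the plotted points bend away from a straight line.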

CHAPTER 6 – DISTRIBUTIONS OF SAMPLE STATISTICS
6.1 SAMPLING FROM A POPULATION
Populations for various statistical studies are modeled as random variables whose probability distributions have unknown mean and variance when the statistical sampling and analysis are conducted. In order to obtain inferences about the population mean and variance, sample observations (also called realizations of a random variable) will be selected and used to compute sample statistics such as the sample mean and the sample variance.

A SIMPLE RANDOM SAMPLE IS CHOSEN BY A PROCESS THAT SELECTS A SAMPLE OF n OBJECTS FROM A POPULATION IN SUCH A WAY THAT EVERY MEMBER OF THE POPULATION HAS THE SAME PROBABILITY OF BEING SELECTED, AND EVERY POSSIBLE SAMPLE OF A GIVEN SIZE, n, HAS THE SAME PROBABILITY OF SELECTION.

Sample statistics are used instead of collecting all the information from the population for two main reasons:

1. It is often difficult to obtain and measure every single item in a population
2. A properly selected sample can provide estimations of population characteristics that are quite close to the actual population values

To work with sample statistics, the properties of the sampling distribution are to be examined.

CONSIDER A RANDOM SAMPLE SELECTED FROM A POPULATION THAT IS USED TO MAKE AN INFERENCE ABOUT SOME POPULATION CHARACTERISTIC, AND CONSIDER ONE CHARACTERISTIC (ex. THE SAMPLE MEAN). EVERY SAMPLE DRAWN FROM THE GIVEN POPULATION WILL GIVE A DIFFERENT VALUE FOR THE CHOSEN CHARACTERISTIC. THE SAMPLING DISTRIBUTION OF THE CHARACTERISTIC (ex. THE MEAN) IS THE PROBABILITY DISTRIBUTION OF THE VALUES OBTAINED FROM ALL POSSIBLE SAMPLES OF THE SAME NUMBER OF OBSERVATIONS DRAWN FROM THE POPULATION.

6.2 SAMPLING DISTRIBUTIONS OF SAMPLE MEANS

CONSIDER A RANDOM SAMPLE OF n OBSERVATIONS FROM A VERY LARGE POPULATION WITH MEAN µ AND VARIANCE σ². THE OBSERVATIONS FROM THE SAMPLE ARE RANDOM VARIABLES, DEFINED AS X1, X2, X3, .., Xn. THE SAMPLE MEAN OF THIS SAMPLE IS DEFINED AS FOLLOWS:

x̄ = (1/n) ∗ Σ xi   (sum over i = 1, .., n)

Recall that the expected value of a linear combination of random variables (such as the one just stated) is the linear combination of the expectations; it follows that:

E(x̄) = E((1/n) ∗ Σ xi) = (1/n) ∗ (E(x1) + E(x2) + .. + E(xn)) = (1/n) ∗ (µ1 + µ2 + .. + µn) = (1/n) ∗ nµ = µ

since every observation has the same mean µ.

The mean of the sampling distribution of the sample means (𝐸 (𝑥̅ )) is the
population mean (𝜇).
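This result can be seen empirically by drawing many samples and averaging their sample means. A sketch with hypothetical parameters (µ = 500, σ = 25, n = 30):

```python
import random

random.seed(0)
mu, sigma, n = 500.0, 25.0, 30

# Draw many random samples of size n and record each sample mean
sample_means = []
for _ in range(20_000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# The mean of the sampling distribution of the sample means is close to mu
grand_mean = sum(sample_means) / len(sample_means)
assert abs(grand_mean - mu) < 0.5
```

Individual sample means scatter around µ, but their average over many repetitions settles on the population mean, which is exactly what E(x̄) = µ states.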

Recall also that the variance of a linear combination of independent random variables is the sum of the squared linear coefficients times the variances of the random variables. It follows that:

Var(x̄) = σx̄² = Var((1/n)x1 + (1/n)x2 + .. + (1/n)xn) = (1/n²)Var(x1) + (1/n²)Var(x2) + .. + (1/n²)Var(xn) = (1/n²)σ² + (1/n²)σ² + .. + (1/n²)σ² = (1/n²) ∗ nσ² = σ²/n

The following characteristics must be borne in mind:

 As the sample size n increases, the variance of the sampling distribution decreases. This means that a larger sample size results in a more concentrated sampling distribution.

 If the sample size, n, is not a small fraction of the population size N, then the individual sample members are not distributed independently of one another. In this case the variance of the sample mean is as follows:

Var(x̄) = (σ²/n) ∗ (N − n)/(N − 1)

 The corresponding standard deviation is called the standard error and is defined as follows:

σx̄ = σ/√n
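The relation Var(x̄) = σ²/n can be checked by simulation. A sketch under hypothetical parameters (µ = 0, σ = 10, n = 25):

```python
import math, random

random.seed(1)
mu, sigma, n = 0.0, 10.0, 25
reps = 40_000

# Empirical variance of the sample mean over many repeated samples
means = []
for _ in range(reps):
    s = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(s) / n)
m = sum(means) / reps
empirical_var = sum((x - m) ** 2 for x in means) / (reps - 1)

# Theory: Var(x_bar) = sigma^2 / n, and the standard error is sigma / sqrt(n)
assert abs(empirical_var - sigma ** 2 / n) < 0.2
assert abs(math.sqrt(sigma ** 2 / n) - sigma / math.sqrt(n)) < 1e-12
```

Doubling n would halve the variance of the sample mean, which is the "more concentrated sampling distribution" described above.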

Now suppose that the population from which the samples are drawn has a normal distribution; then the sampling distribution of the sample means also has a normal distribution. It follows that it is possible to compute the standard normal Z for the sample mean:
WHENEVER THE SAMPLING DISTRIBUTION OF THE SAMPLE MEANS HAS A NORMAL DISTRIBUTION, IT IS POSSIBLE TO COMPUTE A STANDARDIZED NORMAL RANDOM VARIABLE, Z, THAT HAS MEAN EQUAL TO 0 AND VARIANCE EQUAL TO 1:

Z = (x̄ − µ)/σx̄ = (x̄ − µ)/(σ/√n)

This is exactly what the central limit theorem (C.L.T.) shows:

LET X1, X2, …, Xn BE A SET OF n INDEPENDENT RANDOM VARIABLES HAVING IDENTICAL DISTRIBUTIONS WITH MEAN µ AND VARIANCE σ², AND LET X̄ BE THE MEAN OF THESE RANDOM VARIABLES. AS n BECOMES LARGE, THE CENTRAL LIMIT THEOREM STATES THAT THE DISTRIBUTION OF

Z = (x̄ − µ)/σx̄

APPROACHES THE STANDARD NORMAL DISTRIBUTION.

Given the definition of the sample mean and the central limit theorem, we can now introduce the concept of acceptance interval.

AN ACCEPTANCE INTERVAL IS AN INTERVAL WITHIN WHICH A SAMPLE MEAN HAS A HIGH PROBABILITY OF OCCURRING, GIVEN THAT WE KNOW THE POPULATION MEAN AND VARIANCE.

Characteristics:

 If the sample mean is actually in the interval, then we can accept the
conclusion that the random sample comes from the population
 The probability that the sample mean is within a particular interval
can be computed if the sample means have a distribution that is close
to normal

ASSUMING THAT WE KNOW THE POPULATION MEAN AND VARIANCE, WE CAN CONSTRUCT A SYMMETRIC ACCEPTANCE INTERVAL:

µ ± Zα/2 ∗ σx̄

PROVIDED THAT X̄ HAS A NORMAL DISTRIBUTION AND Zα/2 IS THE STANDARD NORMAL VALUE WHOSE UPPER-TAIL PROBABILITY IS α/2. THE PROBABILITY THAT THE SAMPLE MEAN X̄ IS INCLUDED IN THE INTERVAL IS 1 − α.

If the sample mean is outside the acceptance interval, then we suspect that
the population mean is not µ.
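A worked sketch of the acceptance interval, using the hypothetical ticket-price population from the earlier example (X ~ N(500, 625)) with n = 100 and 1 − α = 0,95, for which the tabled value z(α/2) = 1,96:

```python
import math

# Hypothetical population: X ~ N(500, 625), samples of size n = 100
mu, sigma, n = 500.0, 25.0, 100
sigma_xbar = sigma / math.sqrt(n)  # standard error of the sample mean

# For 1 - alpha = 0.95, the upper-tail value z_(alpha/2) is 1.96
z = 1.96
lower, upper = mu - z * sigma_xbar, mu + z * sigma_xbar

# Interval: mu +/- 1.96 * 2.5 = [495.1, 504.9]
assert abs(lower - 495.1) < 0.01 and abs(upper - 504.9) < 0.01

# A sample mean inside the interval is consistent with mu = 500;
# one outside it makes us suspect the population mean is not mu
assert lower < 498.0 < upper
assert not (lower < 494.0 < upper)
```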

6.3 SAMPLING DISTRIBUTIONS OF SAMPLE PROPORTIONS
As we can use the sample mean to obtain inferences about the population mean, we can also obtain inferences about the population proportion using the sample proportion.

LET X BE THE NUMBER OF SUCCESSES IN A BINOMIAL SAMPLE OF n OBSERVATIONS WITH PARAMETER P. THE PARAMETER IS THE PROPORTION OF THE POPULATION MEMBERS THAT HAVE A CHARACTERISTIC OF INTEREST. WE DEFINE THE SAMPLE PROPORTION AS FOLLOWS:

p̂ = X/n

WHERE X IS THE SUM OF A SET OF n INDEPENDENT BERNOULLI RANDOM VARIABLES, EACH WITH PROBABILITY OF SUCCESS P. AS A RESULT, p̂ IS THE MEAN OF A SET OF INDEPENDENT RANDOM VARIABLES.

Recall the formulas for the expected value and variance of a binomial distribution:

E(X) = nP    Var(X) = nP(1 − P)

It follows that the expected value and variance of p̂ are, respectively:

E(p̂) = E(X/n) = (1/n) ∗ E(X) = (1/n) ∗ nP = P

Var(p̂) = Var(X/n) = (1/n²) ∗ Var(X) = (1/n²) ∗ nP(1 − P) = P(1 − P)/n

As we saw with the sample means, the sample proportion distribution can also be approximated by a normal distribution, provided that the sample size is large enough, that is, if nP(1 − P) > 5. We can therefore define the standardized random variable as:

Z = (p̂ − P)/σp̂ = (p̂ − P)/√(P(1 − P)/n)
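A short sketch of this standardization, with a hypothetical population proportion P = 0,4 and sample size n = 200 (which satisfies the nP(1 − P) > 5 condition):

```python
import math

def phi(z):
    # Standard normal distribution function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical example: population proportion P = 0.4, sample size n = 200
P, n = 0.4, 200
assert n * P * (1 - P) > 5  # condition for the normal approximation

sigma_phat = math.sqrt(P * (1 - P) / n)  # standard error of p_hat

# P(p_hat > 0.45) via standardization: Z = (p_hat - P) / sigma_phat
z = (0.45 - P) / sigma_phat
prob = 1 - phi(z)
assert 0.0 < prob < 0.1  # a sample proportion above 0.45 is fairly unlikely
```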

6.4 SAMPLING DISTRIBUTION OF SAMPLE VARIANCES

LET X1, X2, …, Xn BE A RANDOM SAMPLE OF n OBSERVATIONS FROM A POPULATION. THEN:

s² = (1/(n − 1)) ∗ Σ (xi − x̄)²   (sum over i = 1, .., n)

IS CALLED THE SAMPLE VARIANCE AND ITS SQUARE ROOT, s, IS CALLED THE SAMPLE STANDARD DEVIATION.

As for the sample mean and the sample proportion, the expected value of the sample variance is equal to the population variance:

E(s²) = σ²
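The n − 1 divisor is what makes this hold. A simulation sketch (hypothetical population N(0, 4)) comparing the n − 1 divisor with the biased n divisor:

```python
import random

random.seed(7)
mu, sigma2, n = 0.0, 4.0, 10
reps = 50_000

avg_s2_unbiased = 0.0
avg_s2_biased = 0.0
for _ in range(reps):
    sample = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    avg_s2_unbiased += ss / (n - 1)  # divide by n - 1: the sample variance
    avg_s2_biased += ss / n          # divide by n: a biased alternative
avg_s2_unbiased /= reps
avg_s2_biased /= reps

# Dividing by n - 1 gives E(s^2) = sigma^2; dividing by n underestimates it
assert abs(avg_s2_unbiased - sigma2) < 0.1
assert avg_s2_biased < avg_s2_unbiased
```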

CHAPTER 7 – CONFIDENCE INTERVAL ESTIMATION: ONE POPULATION
7.1 POINT ESTIMATOR AND POINT ESTIMATE
In order to understand the difference between point estimator and point estimate, it is essential to keep in mind the definitions of estimator and estimate of a parameter.

AN ESTIMATOR OF A POPULATION PARAMETER IS A RANDOM VARIABLE THAT DEPENDS ON THE SAMPLE INFORMATION. THE VALUE IT ASSUMES PROVIDES APPROXIMATIONS OF THIS UNKNOWN PARAMETER.

AN ESTIMATE IS A SPECIFIC VALUE OF THAT RANDOM VARIABLE.

It follows that:

A POINT ESTIMATOR OF A POPULATION PARAMETER IS A SINGLE RANDOM VARIABLE WHICH IS A FUNCTION OF THE SAMPLE INFORMATION AND PRODUCES A SINGLE APPROXIMATION OF THE CONSIDERED POPULATION PARAMETER (EX. THE MEAN (µ) OR THE PROPORTION (P)).

A POINT ESTIMATE IS THE SINGLE VALUE PRODUCED BY THE POINT ESTIMATOR.

A fundamental characteristic of a point estimator is unbiasedness. A point estimator θ̂ is said to be unbiased if its expected value is equal to the parameter itself: E(θ̂) = θ. Such a characteristic does not imply that a point estimate is exactly the correct value of the parameter but, rather, that the average of the point estimate values over all possible samples (which is exactly the expected value) equals the population parameter. Given a practical problem in which there are different unbiased estimators, the most efficient one is the one whose distribution is most concentrated around the population parameter to be estimated.
