
APPLIED STATISTICS

M.E.E. Alberto Campos Sánchez


*PhD student
ABOUT THE TEACHER.
• Master's degree from ITCG.

• Student of robotics focused on computer vision at CINVESTAV.


A VISUAL-BASED REACTIVE
NAVIGATION OF A SWARM OF
QUADROTORS FOR USAR
MISSIONS.
A BRIEF HISTORY OF VISION.

The big bang of the evolution of species started 543 million years ago, driven by the development of vision in different species.
MACHINE LEARNING.

Hubel & Wiesel (1959) discovered, through experiments with a cat, what the primary visual cortex detects.
MACHINE LEARNING / DEEP
LEARNING
• Machine Learning (ML) is an area of artificial intelligence whose objective is for machines to learn specific tasks through algorithms.

• Deep Learning is an area of ML that uses models based on deep neural networks, such as convolutional neural networks (CNNs).
MACHINE LEARNING VS DEEP
LEARNING
CONVOLUTIONAL NEURAL
NETWORKS (CNNs)

What is a convolutional neural network?


WAYS TO USE A CNN
• From scratch.
• Through Transfer Learning.
• As a feature extractor.
WHAT ABOUT YOU?
• What’s your name?
• What have you learned until today?
• What do you want to be in 5 years?
INTRODUCTION.

Statistics is the science of data. Therefore, we begin our study of statistics by learning the art of examining data. Any data set contains information about a group of individuals. The information is organized in the form of variables.
Individuals are the objects described by a set of data. Individuals can be people, but they can also be animals
or things.

A variable is any characteristic of an individual. A variable can take different values for different individuals.
A database of university students, for example, contains data on each of the students
enrolled. Students are the individuals described by the data set. For each individual,
the data contain values of variables such as date of birth, sex (male or female),
career choice or grades. In practice, any data set is accompanied by general
information that helps to understand it. When planning a statistical study or when
you are faced with a new data set, consider the following questions:
1. Who? Which individuals do the data describe?
How many individuals appear in the data?
2. What? How many variables do the data contain?
What are the exact definitions of these variables?
In which units has each variable been recorded?
Weight, for example, can be expressed in kilograms,
in quintals or in tons.
3. Why? What purpose do the data serve?
Do we want to answer a specific question?
Do we want to draw conclusions about individuals
for whom we do not actually have data?
Some variables, such as sex or profession, simply classify individuals into categories. Others, however,
such as height or annual income, take numerical values with which we can make arithmetic
calculations. The first kind are called categorical variables and the second kind, numerical variables.
EXAMPLE

Name                 Age   Sex      Race    Income   Job

Fleetwood, Delores   39    Female   White   62.1     Director
Perez, Juan          27    Male     White   45.56    Freelance
Wang, Lin            22    Female   Asian   74.1     Directive
Johnson, LaVerne     48    Male     Black   95.2     Technician


EXERCISE
Model              Type     Transmission   Cylinders   City consumption   Road consumption

BMW 3181           Small    Automatic      4           10.8               7.6
BMW 3181           Small    Manual         4           10.3               7.4
Buick Century      Medium   Automatic      6           11.8               8.2
Chevrolet Blazer   Large    Automatic      6           14.8               11.8

(a) What individuals does this data set describe?


(b) For each individual, what variables are given? Which of these variables are categorical and which are numerical?
EXERCISE

The data from a medical study contain values of many variables for each of the subjects of the study. Of the following variables, which are categorical
and which are numerical?

(a) Gender (male or female).


(b) Age (years).
(c) Race (Asian, black, white or other).
(d) Smoker (yes, no).
(e) Blood pressure (in millimeters of mercury).
(f) Concentration of calcium in the blood (in micrograms per liter).
Graphs of distributions

Statistical tools and ideas help us examine data in order to describe their main
characteristics. This examination is called exploratory data analysis. Like an explorer
crossing unknown lands, the first thing we will do is simply describe what we
see.
Categorical variables: bar charts and pie charts

The values of a categorical variable are labels assigned to the categories of
the variable, such as "man" and "woman". The distribution of a categorical
variable lists the categories and gives the count or the percentage of
individuals in each category. For example, here is the distribution of the
number of families by type in Sweden according to Eurostat data from
1991.

Type of family               Count (thousands)   Percentage
Couples without children     1168                53.50
Couples with children        830                 38.02
Man alone with children      27                  1.24
Woman alone with children    158                 7.24
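The percentage column of a distribution can be recomputed from the counts. The following is a minimal sketch in Python; the names `family_counts` and `distribution` are illustrative choices, not part of the course material, and the counts are the ones from the Swedish families table.

```python
# Counts (in thousands) taken from the table of Swedish families by type.
family_counts = {
    "Couples without children": 1168,
    "Couples with children": 830,
    "Man alone with children": 27,
    "Woman alone with children": 158,
}


def distribution(counts):
    """Return the percentage of individuals that falls in each category."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


percentages = distribution(family_counts)
for category, pct in percentages.items():
    # One row per category: label, count, percentage with two decimals.
    print(f"{category:<28}{family_counts[category]:>6}{pct:>8.2f}")
```

Because the percentages describe parts of a single whole, they should add up to 100 apart from rounding.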
Exercise

The percentages of doctorates earned by women in different
disciplines in the USA during 1994 (according to the 1997 Statistical Abstract of
the United States) are the following:

Computing     15.4%
Pedagogy      60.8%
Engineering   11.1%
Biology       40.7%
Psychology    62.2%
Physics       21.7%

(a) Present this data in the form of a bar chart.
(b) Would it also be correct to use a pie chart to show this data? Justify your answer.
Exercise

According to data from the National Institute of Statistics (INE), the most
significant causes of death in Spanish hospitals in 1996 were:

Disorders of the circulatory system       133499
Tumors                                    89204
Disorders of the respiratory system       34718
Immune system disorders                   5504
External causes of trauma and poisoning   16324

(a) Find the percentage of each of the causes of death and express it with integer values. What percentage of deaths
was due to tumors?
(b) Draw a bar chart of the distribution of causes of death in Spanish hospitals. Label each bar clearly.
(c) Would it also be correct to use a pie chart to represent the data? Justify your answer.
Description of distributions with numbers

Ladislao Kubala is possibly the best player F.C. Barcelona has ever had. Here is the number of goals per season that this player
scored while at F.C. Barcelona:
AVERAGE.

The description of a distribution almost always includes a measure of its center.
The most common measure of center is the arithmetic mean, or average.
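For n observations x_1, x_2, ..., x_n, the arithmetic mean is the sum of the observations divided by n. A standard way to write this, using the x̄ notation that appears later in these slides, is:

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i
```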
EXERCISE.

Here are the scores of 18 first-year university students on the SSHA
(Survey of Study Habits and Attitudes):

154 103 109 126 137 126 115 137 152


165 140 165 154 129 178 200 101 148
THE MEDIAN.
The median M is the midpoint of a distribution, that is, it is the
number such that half of the observations are smaller and the other half
are larger. To find the median of a distribution:
1. Order all observations from the minimum to the maximum.
2. If the number of observations n is odd, the median M
is the central observation of the ordered list. Find the position of
the median by counting (n + 1) / 2
observations from the beginning of the list.
3. If the number of observations n is even, the median M is the mean of
the two central observations of the ordered list. The position of the
median is, again, (n + 1) / 2 counting from the beginning of
the list; this value falls halfway between the positions of the two central observations.
EXAMPLE.

Find the median number of goals for Paulino Alcántara while he was at
F.C. Barcelona. First, sort the following data in increasing order:

15 5 42 0 39 8 47 15 21 6 25 33 34 19 6 42
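The three steps for finding the median can be sketched in Python as follows. The function name `median` and the variable `goals` are illustrative choices; the data are the 16 Alcántara values listed above.

```python
def median(observations):
    """Median found by sorting and locating position (n + 1) / 2."""
    ordered = sorted(observations)  # step 1: order from minimum to maximum
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # step 2: odd n, the central observation (1-based position (n + 1) / 2)
        return ordered[middle]
    # step 3: even n, mean of the two central observations
    return (ordered[middle - 1] + ordered[middle]) / 2


goals = [15, 5, 42, 0, 39, 8, 47, 15, 21, 6, 25, 33, 34, 19, 6, 42]
print(sorted(goals))
print(median(goals))
```

With n = 16 the position (n + 1) / 2 = 8.5 falls between the 8th and 9th ordered observations, so the function averages those two values.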
EXERCISE.

A study in Switzerland examined the number of caesarean sections
carried out by 15 (male) doctors during one year. The results were:

27 50 33 25 86 25 85 31 37 44 20 36 59 34 28
(a) Find the mean and median number of caesarean sections.
(b) Find the mean and median number of caesarean sections without the
two atypical observations. Do the results in (a) and (b) illustrate the
robustness of the median and the lack of robustness of the mean?
EXERCISE.

In the United States, the distribution of individual income is strongly
skewed to the right. In 1997, the mean and median incomes of the richest 1%
of Americans were $330,000 and $675,000, in some order. Which
of these values corresponds to the mean and which to the median?
Justify your answer.
THE STANDARD DEVIATION.

The variance s^2 of a set of observations is the sum of the squares of
the deviations of the observations from their mean, divided by n - 1.
Algebraically, the variance of n observations x_1, x_2, ..., x_n is:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

The standard deviation s is the square root of the variance s^2.
EXERCISE.

The metabolic level of a person is the rate at which the body consumes
energy.
This level is important in dietetics studies. Here are the metabolic levels
of 7 men who took part in a diet study (the units are calories per 24 hours;
these are the same calories used to describe the energy content of food).
1792 1666 1362 1614 1460 1867 1439
FIND S^2 AND S.

1792 1666 1362 1614 1460 1867 1439

Observations   Deviations   Deviations squared
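The three columns of this table (observations, deviations from the mean, squared deviations) can be filled in with a short script. This is a sketch that assumes the dot in values such as "1.792" is a thousands separator, so the data are read as 1792, 1666 and so on; the variable names are illustrative.

```python
import math

# Metabolic levels of the 7 men (calories per 24 hours); "1.792" is read as 1792.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(rates)
mean = sum(rates) / n
deviations = [x - mean for x in rates]   # x_i - mean
squared = [d ** 2 for d in deviations]   # (x_i - mean)^2

variance = sum(squared) / (n - 1)        # s^2: sum of squares divided by n - 1
std_dev = math.sqrt(variance)            # s: square root of the variance

print(f"{'Observation':>12}{'Deviation':>12}{'Squared':>12}")
for x, d, sq in zip(rates, deviations, squared):
    print(f"{x:>12}{d:>12.0f}{sq:>12.0f}")
print(f"mean = {mean:.0f}, s^2 = {variance:.2f}, s = {std_dev:.2f}")
```

A useful check on the middle column is that the deviations from the mean always add up to zero.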
EXERCISE.

The concentration of certain substances in the blood influences the
health of people. Here are the measurements of the level of phosphates
in the blood of a patient who made sixteen consecutive visits to a clinic,
expressed in milligrams of phosphate per deciliter of blood.

2.9 7.0 7.5 3.3 5.6 5.2 4.6 4.9 5.7 6.4 4.1 5.9 6.8 8.2 3.9 6.5
EXERCISE.
The great football player Ferenc Puskas, popularly known as Cañoncito Pum,
played from the 1948/49 season to 1956/57 for Kispest in Budapest. In 1956
he fled Hungary when the Hungarian Revolution broke out and spent two
seasons without playing. In the 1958/59 season he signed for Real Madrid and
played for this team in the Spanish league until the 1965/66 season.
Here is the number of goals he scored per season:
EXERCISE.
(a) Find the mean x̄ and the standard deviation s of the number of league goals
from the 1948/49 season to the 1965/66 season.
(b) Find x̄ and s once the 1956/57 and 1957/58 seasons are removed. How
does removing these two seasons affect the values of x̄ and s?
1.- Last year, a small consulting company paid
€22,000 to each of its five managers and €50,000
to each of its two university graduates. Finally, the owner
of the company was paid €270,000.
What is the average salary paid in this company?
How many employees earn less than the average?
What is the median salary?
2.- In 1798 the English scientist Henry Cavendish determined the density of the
Earth very accurately. When complicated measurements are made, it is
advisable to repeat the operation several times and work with the average of all
of them. Cavendish repeated his measurement 29 times. Here are the results
obtained:
5.50 5.61 4.88 5.07 5.26 5.55 5.36 5.29 5.58 5.65 5.57 5.53 5.62 5.29 5.44 5.34
5.79 5.10 5.27 5.39 5.42 5.47 5.63 5.34 5.46 5.30 5.75 5.68 5.85
3.- x̄ and s are not enough. The mean x̄ and the
standard deviation s, as measures of center and
spread, are not a complete description of a
distribution. Data sets of different shapes can have
the same mean and standard deviation. Find s and x̄
for the following data sets:
Dataset A 9.14 8.14 8.74 8.77 9.26 8.10
6.13 3.10 9.13 7.26 4.74
Dataset B 6.58 5.76 7.71 8.84 8.47 7.04
5.25 5.56 7.91 6.89 12.50
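The claim that two differently shaped data sets can share the same mean and standard deviation can be checked numerically. The sketch below uses Python's standard `statistics` module and reads the decimal commas as decimal points; `summarize` is an illustrative helper name.

```python
import statistics

# Decimal commas in the slide are read as decimal points.
dataset_a = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
dataset_b = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]


def summarize(data):
    """Return (mean, sample standard deviation, minimum, maximum)."""
    # statistics.stdev divides by n - 1, matching the definition of s.
    return (statistics.mean(data), statistics.stdev(data), min(data), max(data))


for name, data in (("A", dataset_a), ("B", dataset_b)):
    mean, s, low, high = summarize(data)
    print(f"Dataset {name}: mean = {mean:.2f}, s = {s:.2f}, min = {low}, max = {high}")
```

The minimum and maximum are printed alongside x̄ and s because they already hint at the different shapes; a histogram of each data set would show the difference more clearly.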
4.- The players of the Baltimore Orioles baseball team
were the highest paid in the US league during the 1998
season. Here are their salaries in thousands of dollars.
(For example, 6,495 means $6,495,000.)
6,495 6,486 6,300 6,269 5,442 5,391 3,600 3,600 3,583
3,089 2,850 2,500 1,950 1,663 1,367 1,333 1,150 900 856
800 800 665 650 450 450 170 170
Find the mean, median, s^2 and s.
The mode

The mode is the value that has the highest
frequency. It is represented by Mo. Find the
mode of the distribution:
4, 5, 4, 6, 7, 8, 10, 4, 2, 1, 13
If in a group there are two or more scores
with the same frequency, and that frequency
is the maximum, the distribution is bimodal
or multimodal, that is, it has several
modes.
1, 2, 2, 1, 4, 1, 2, 5, 7, 8, 9, 15, 3, 4, 7, 8, 7,
4
When all the scores of a group have the
same frequency, there is no mode.
If two adjacent scores have the maximum
frequency, the mode is the average of the
two adjacent scores.
1, 3, 4, 5, 9, 1, 2, 8, 10, 1, 12, 3, 9, 16, 3
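Finding the most frequent value or values amounts to counting occurrences. The following sketch uses `collections.Counter` from the Python standard library; the function name `modes` is illustrative, the last list is a made-up example of the "no mode" case, and the rule about averaging two adjacent scores is deliberately not implemented.

```python
from collections import Counter


def modes(scores):
    """Return the sorted list of values that have the highest frequency.

    An empty list is returned when there are several distinct values and
    all of them are equally frequent (the "no mode" case).
    """
    if not scores:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(scores)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


print(modes([4, 5, 4, 6, 7, 8, 10, 4, 2, 1, 13]))
print(modes([1, 2, 2, 1, 4, 1, 2, 5, 7, 8, 9, 15, 3, 4, 7, 8, 7, 4]))
print(modes([2, 3, 5, 7]))  # hypothetical data: every value appears once
```

Returning a list rather than a single number keeps the unimodal, multimodal and no-mode cases in one interface.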
Mode calculation for grouped data

$$M_o = L_i + \frac{f_i - f_{i-1}}{(f_i - f_{i-1}) + (f_i - f_{i+1})} \cdot a_i$$

L_i is the lower limit of the modal class.
f_i is the absolute frequency of the modal class.
f_{i-1} is the absolute frequency of the class immediately before the modal
class.
f_{i+1} is the absolute frequency of the class immediately after the
modal class.
a_i is the width of the class.
Example
Calculate and graph the mode of a
statistical distribution that is given by the
following table:

Class      f_i
(60, 63)   5
(63, 66)   18
(66, 69)   42
(69, 72)   27
(72, 75)   8
Total      100
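The grouped-data formula can be applied to this table with a few lines of Python. This is a sketch under stated assumptions: `grouped_mode` is an illustrative name, and a modal class at either end of the table is treated as having a neighbour with frequency 0, a case the slides do not cover.

```python
def grouped_mode(classes, frequencies):
    """Mode for grouped data.

    Mo = L_i + (f_i - f_prev) / ((f_i - f_prev) + (f_i - f_next)) * a_i

    classes is a list of (lower, upper) limits and frequencies the matching
    absolute frequencies. The modal class must be strictly more frequent
    than at least one neighbour, otherwise the denominator is zero.
    """
    # Index of the (first) class with the highest absolute frequency.
    i = max(range(len(frequencies)), key=frequencies.__getitem__)
    lower, upper = classes[i]
    f_i = frequencies[i]
    f_prev = frequencies[i - 1] if i > 0 else 0                     # assumption at the edge
    f_next = frequencies[i + 1] if i < len(frequencies) - 1 else 0  # assumption at the edge
    width = upper - lower                                           # a_i
    return lower + (f_i - f_prev) / ((f_i - f_prev) + (f_i - f_next)) * width


classes = [(60, 63), (63, 66), (66, 69), (69, 72), (72, 75)]
frequencies = [5, 18, 42, 27, 8]
mode_value = grouped_mode(classes, frequencies)
print(f"Mo = {mode_value:.2f}")
```

Here the modal class is (66, 69), so the computation is 66 + 24 / (24 + 15) * 3.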
Exercise

Class      f_i     Class      f_i     Class      f_i
(20, 27)   7       (2, 12)    10      (40, 46)   15
(27, 34)   18      (12, 22)   13      (46, 52)   28
(34, 41)   20      (22, 32)   27      (52, 58)   42
(41, 48)   33      (32, 42)   32      (58, 64)   27
(48, 55)   46      (42, 52)   42      (64, 70)   8
(55, 62)   12      (52, 62)   20      (70, 76)   45
(62, 69)   50      (62, 72)   11      (76, 82)   26
(69, 76)   2       (72, 82)   9       (82, 88)   23
100
Quantitative variables: histograms

When a quantitative variable takes many
values, the graph of the distribution is
clearer if nearby values are grouped. The
most common graph to describe the
distribution of a quantitative variable is a
histogram.
Exercises.
1.- The data given below correspond to the
weights, in kg, of eighty people:
60;66;77;70;66;68;57;70;66;52;75;65;69;71;58;66;
67;74;61;63;69;80;59;66;70;67;78;75;64;71;81;62;
64;69;68;72;83;56;65;74;67;54;65;65;69;61;67;73;
57;62;67;68;63;67;71;68;76;61;62;63;76;61;67;67;
64;72;64;73;79;58;67;71;68;59;69;70;66;62;63;66
(a) Obtain a distribution of the data in intervals of width 5, the first interval being
[50, 55).
(b) Calculate the percentage of people who weigh less than 65 kg.
(c) How many people weigh 70 kg or more but less than 85 kg?
(d) Create the histogram and find the mean and mode.
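Grouping the weights into intervals of width 5 is the step that a histogram is built on. The sketch below assumes the list reads as the 80 values shown in the code and that each interval includes its lower limit but not its upper limit; `frequency_distribution` is an illustrative name.

```python
weights = [
    60, 66, 77, 70, 66, 68, 57, 70, 66, 52, 75, 65, 69, 71, 58, 66,
    67, 74, 61, 63, 69, 80, 59, 66, 70, 67, 78, 75, 64, 71, 81, 62,
    64, 69, 68, 72, 83, 56, 65, 74, 67, 54, 65, 65, 69, 61, 67, 73,
    57, 62, 67, 68, 63, 67, 71, 68, 76, 61, 62, 63, 76, 61, 67, 67,
    64, 72, 64, 73, 79, 58, 67, 71, 68, 59, 69, 70, 66, 62, 63, 66,
]


def frequency_distribution(data, start, width):
    """Count observations in the intervals [start, start + width), and so on.

    Every observation is assumed to be greater than or equal to start.
    """
    counts = {}
    lower = start
    while lower <= max(data):  # create every interval up to the maximum
        counts[(lower, lower + width)] = 0
        lower += width
    for x in data:
        k = int((x - start) // width)  # index of the interval containing x
        counts[(start + k * width, start + (k + 1) * width)] += 1
    return counts


table = frequency_distribution(weights, start=50, width=5)
for (lower, upper), count in table.items():
    # A row of asterisks per interval gives a rough text histogram.
    print(f"[{lower}, {upper}) {count:>3} {'*' * count}")
```

The counts in each interval are the bar heights of the histogram; dividing them by the number of observations gives the relative frequencies.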
2.- Given the following distribution, construct
a statistical table showing the absolute
frequencies, the relative frequencies and
the cumulative relative
frequencies:
x_i 1 2 3 4 5 6
n_i 5 7 9 6 7 6
Note: the table should contain the columns x_i, n_i, f_i and F_i.
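The columns of such a table follow from one another: each relative frequency is f_i = n_i / N, where N is the total number of observations, and each cumulative relative frequency F_i is the running sum of the f_i. A minimal sketch, using exact fractions to avoid rounding drift in the running sum (the variable names are illustrative):

```python
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6]      # x_i
absolute = [5, 7, 9, 6, 7, 6]    # n_i

total = sum(absolute)                                 # N
relative = [Fraction(n, total) for n in absolute]     # f_i = n_i / N
cumulative = list(accumulate(relative))               # F_i = f_1 + ... + f_i

print(f"{'x_i':>4}{'n_i':>5}{'f_i':>8}{'F_i':>8}")
for x, n, f, big_f in zip(values, absolute, relative, cumulative):
    print(f"{x:>4}{n:>5}{float(f):>8.3f}{float(big_f):>8.3f}")
```

The last cumulative relative frequency must equal 1, which is a quick way to check the table.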
3.- The average temperatures recorded
during the month of May in Madrid are given
by the following table:
Temperature   13 14 15 16 17 18 19 20 21 22
No. of days    1  1  2  3  6  8  4  3  2  1
Construct the corresponding graphical
representation.
4.- Calculation of the arithmetic mean, the
median and the mode. The tax applied to
the purchase of works of art in various
European countries was analyzed.
The results obtained were the following:
COUNTRY TAX
Spain 0.16
Italy 0.20
Belgium 0.06
Netherlands 0.06
Germany 0.07
Portugal 0.17
Luxembourg 0.06
Finland 0.22
5.- A tire manufacturer has collected, from
its different dealers, information on the
number of thousands of kilometers traveled
by a specific tire model before a puncture
or a blowout occurred.
The dealers have provided the
following information:
a) Build a frequency table for these data
using 13 intervals.
b) Build the cumulative
frequency tables.
c) Draw the relative frequency histogram
and the cumulative relative frequency histogram.
d) Calculate the main measures of central
tendency.
Exercises.
1.- Find the mean, median, mode, s and s^2,
and create a histogram.
A)              B)
20   5   4      0.6  0.5  0.3
12  14  11      0.2  1.4  0.5
 3  27  32      1.3  2.1  0.2
 8  20  12      2.4  0.9  1.3
 9  22  20      0.2  1.4  1.8
15   7   8      0.5  1.1  2.2
17   2   2      0.7  0.6  1.1
 8   6   5      2    0.4  0.5
 3   9  10      1.7  1    0.5
2.- A group of 18 subjects (Group 1) was asked
to form as many words as possible from a
scrambled set of letters in 2 minutes. The
number of correct words formed was used as an
indicator of the ability of each subject. The
results were:
6 2 4 4 7 3 6 7 7 5 6 5 6 5 6 1 7 13
Another group of 18 subjects (Group 2)
performed the same task. The results were:
3.- Children, unlike adults, tend to
remember movies and stories as a
succession of actions rather than as a
plot as a whole. When retelling
a movie, for example, they often
use the words "and then ...". A psychologist
with supreme patience asked 50 children to
tell her about a certain movie they had seen. She
recorded the variable "number of times 'and
then ...' was used in the retelling" and obtained the
following data:
As part of the same study, the experimenter obtained the
same type of data from 50 adults. These were:

10 12 5 8 13 10 12 8 7 9
11 10 9 9 11 15 12 17 14 10
9 8 15 16 10 14 7 16 9 1
4 11 12 7 9 10 3 11 14 8
12 5 10 9 7 11 14 10 15 9
For both:

a) Build the frequency table.


b) Calculate the mean, median and mode.
c) Graph both distributions so that they can
be compared.
4.- Here are fifteen numbers:
10 12 13 15 15 17 19 20 20 20 21 25 25 25 25
(a) Find the mode.
(b) Find the median.
(c) Create a histogram.
Quiz.
1.- Belleview College must make a report to
the budget committee about the average
credit hour load a full-time student carries.
(A 12-credit-hour load is the minimum
requirement for full-time status. For the
same tuition, students may take up to 20
credit hours.) A random sample of 40
students yielded the following information
(in credit hours):
17 12 14 17 13 16 18 20 13 12 12 17 16 15
14 12 12 13 17 14 15 12 15 16 12 18 20 19
2.- Barron’s Profiles of American Colleges,
19th Edition, lists average class size for
introductory lecture courses at each of the
profiled institutions. A sample of 20 colleges
and universities in California showed class
sizes for introductory lecture courses to be:
14 20 20 20 20 23 25 30 30 30 35 35 35 40
40 42 50 50 80 80
Find the mean and s.
3.- How old are professional football
players? The 11th Edition of The Pro
Football Encyclopedia gave the following
information. Random sample of pro football
player ages in years:
24 23 25 23 30 29 28 26 33 29 24 37 25 23
22 27 28 25 31 29 25 22 31 29 22 28 27 26
23 21 25 21 25 24 22 26 25 32 26 29
(a) Compute the mean, median, and mode
of the ages.
(b) Create a histogram.
