A variable is any characteristic of an individual. A variable can take different values for different individuals.
A database of university students, for example, contains data on each of the students
enrolled. Students are the individuals described by the data set. For each individual,
the data contain values of variables such as date of birth, sex (male or female),
career choice or grades. In practice, any data set is accompanied by general
information that helps us understand it. When planning a statistical study, or when
you are faced with a new data set, consider the following questions:
1. Who? Which individuals do the data describe?
How many individuals appear in the data?
2. What? How many variables do the data contain?
What are the exact definitions of these variables?
In which units has each variable been recorded?
Weight, for example, can be expressed in kilograms,
in quintals or in tonnes.
3. Why? What purpose do these data serve?
Do we want to answer a specific question?
Do we want to draw conclusions about individuals
for whom we do not actually have data?
Some variables, such as sex or profession, simply classify individuals into categories; these are categorical variables. Others,
such as height or annual income, take numerical values on which we can perform arithmetic
calculations; these are numerical (quantitative) variables.
EXAMPLE
The data from a medical study contain the values of many variables for each subject in the study. Which of the following variables are categorical
and which are numerical?
Statistical tools and ideas help us examine data in order to describe their main
characteristics. This examination is called exploratory data analysis. Like an explorer
crossing unknown lands, the first thing we will do is simply describe what we
see.
Categorical variables: bar charts
Field of study   Percentage
Pedagogy         60.8%
Engineering      11.1%
Biology          40.7%
Psychology       62.2%
Physics          21.7%
(a) Present these data in the form of a bar chart.
(b) Would it also be correct to use a pie chart to show these data?
Justify your answer.
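A minimal Python sketch of part (a), using only the five percentages in the table (the source does not say what they measure). It draws a text bar chart and adds the percentages up; the function name `text_bar_chart` and the scale of 2 percentage points per `#` are illustrative choices, not part of the original exercise.

```python
# Percentages from the table above; what they measure is not stated in the source.
percentages = {
    "Pedagogy": 60.8,
    "Engineering": 11.1,
    "Biology": 40.7,
    "Psychology": 62.2,
    "Physics": 21.7,
}


def text_bar_chart(data, scale=2.0):
    """Return one text line per category; each '#' stands for `scale` percentage points."""
    width = max(len(name) for name in data)
    lines = []
    for name, value in data.items():
        bar = "#" * round(value / scale)
        lines.append(f"{name:<{width}} | {bar} {value}%")
    return lines


for line in text_bar_chart(percentages):
    print(line)

# A pie chart shows parts of one whole, so its slices must add up to 100%.
total = sum(percentages.values())
print(f"Sum of percentages: {total:.1f}%")
```

The sum comes out at 196.5%, not 100%, which is worth keeping in mind when answering part (b).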
Exercise
According to data from the National Institute of Statistics (INE), the most
significant causes of death in Spanish hospitals in 1996 were:
Cause of death                               Deaths
Disorders of the circulatory system         133,499
Tumors                                       89,204
Disorders of the respiratory system          34,718
Immune system disorders                       5,504
External causes of trauma and poisoning      16,324
(a) Find the percentage of deaths due to each cause, rounded to whole numbers. What percentage of deaths
was due to tumors?
(b) Draw a bar chart of the distribution of causes of death in Spanish hospitals. Label each bar clearly.
(c) Would it also be correct to use a pie chart to represent these data? Justify your answer.
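A short Python sketch of part (a). It assumes the percentages are taken relative to the total of the five causes listed; since the table shows only the most significant causes, using the INE total for all causes of death instead would change the denominator and therefore the percentages.

```python
# Deaths by cause in Spanish hospitals, 1996 (INE), as listed in the table above.
deaths = {
    "Disorders of the circulatory system": 133499,
    "Tumors": 89204,
    "Disorders of the respiratory system": 34718,
    "Immune system disorders": 5504,
    "External causes of trauma and poisoning": 16324,
}

# Assumption: the denominator is the total of the five listed causes only.
total = sum(deaths.values())
percent = {cause: round(100 * n / total) for cause, n in deaths.items()}

print(f"Total deaths listed: {total}")
for cause, p in percent.items():
    print(f"{cause:<42} {p:>3}%")
```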
Description of distributions with numbers
Ladislao Kubala is possibly the best player F.C. Barcelona has ever had. Here is the number of goals per season that he
scored while at F.C. Barcelona:
AVERAGE.
Find the median number of goals for Paulino Alcántara while he was at
F.C. Barcelona. First, sort the data in increasing order:
15 5 42 0 39 8 47 15 21 6 25 33 34 19 6 42
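The two steps for the median (sort, then take the middle value, or the mean of the two middle values when the number of observations is even) can be checked with a small Python sketch on the 16 values listed above. It also computes the mean for comparison and cross-checks the median against the standard library's `statistics.median`.

```python
import statistics

goals = [15, 5, 42, 0, 39, 8, 47, 15, 21, 6, 25, 33, 34, 19, 6, 42]

# Step 1: sort the data in increasing order.
ordered = sorted(goals)
print(ordered)

# Step 2: odd n -> the middle value; even n -> the mean of the two middle values.
n = len(ordered)
if n % 2 == 1:
    median = ordered[n // 2]
else:
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

mean = sum(goals) / n
print(f"n = {n}, mean = {mean:.2f}, median = {median}")

# Cross-check against the standard library.
assert median == statistics.median(goals)
```

With 16 values the median is the mean of the 8th and 9th sorted values.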
EXERCISE.
27 50 33 25 86 25 85 31 37 44 20 36 59 34 28
(a) Find the mean and median number of caesarean sections.
(b) Find the mean and median number of caesarean sections without the
two atypical observations. Do the results in (a) and (b) illustrate the
robustness of the median and the lack of robustness of the mean?
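A Python sketch of parts (a) and (b). It assumes the two atypical observations are 85 and 86, the two values that lie far above the rest; the source does not name them explicitly.

```python
import statistics

sections = [27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28]

# Assumption: the two atypical observations are 85 and 86.
atypical = {85, 86}
trimmed = [x for x in sections if x not in atypical]

for label, data in (("all data", sections), ("without 85 and 86", trimmed)):
    print(f"{label:<18} n={len(data):2d} "
          f"mean={statistics.mean(data):6.2f} "
          f"median={statistics.median(data)}")
```

Removing the two large values moves the mean by several units but the median by only one, which is the contrast part (b) asks about.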
EXERCISE.
The metabolic rate of a person is the rate at which the body consumes
energy.
Metabolic rate is important in studies of dieting. Here are the metabolic rates
of 7 men who took part in a diet study (the units are calories per 24 hours;
these are the same calories used to describe the energy content of food).
1792 1666 1362 1614 1460 1867 1439
Find s² and s.
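A minimal Python sketch of the sample variance s² (sum of squared deviations from the mean, divided by n − 1) and the standard deviation s = √s². It reads the source's values such as "1.792" as 1792 calories, since the period there is a thousands separator.

```python
import math
import statistics

# Metabolic rates of the 7 men, in calories per 24 hours.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(rates)
mean = sum(rates) / n

# Sample variance: sum of squared deviations divided by n - 1.
squared_devs = [(x - mean) ** 2 for x in rates]
s2 = sum(squared_devs) / (n - 1)
s = math.sqrt(s2)

print(f"mean = {mean:.1f}")
print(f"s^2 = {s2:.2f}")
print(f"s = {s:.2f}")

# The standard library gives the same results (it also divides by n - 1).
assert math.isclose(s2, statistics.variance(rates))
assert math.isclose(s, statistics.stdev(rates))
```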
2.9 7.0 7.5 3.3 5.6 5.2 4.6 4.9 5.7 6.4 4.1 5.9 6.8 8.2 3.9 6.5
EXERCISE.
The great footballer Ferenc Puskas, popularly known as Cañoncito Pum,
played at Kispest in Budapest from the 1948/49 season to 1956/57. In 1956
he fled Hungary when the Hungarian Revolution broke out and spent two
seasons without playing. In the 1958/59 season he signed for Real Madrid, where he
played in the Spanish league until the 1965/66 season.
Here is the number of goals he scored per season:
(a) Find the mean x̄ and the standard deviation s of the number of league goals
from the 1948/49 season to the 1965/66 season.
(b) Find x̄ and s after removing the 1956/57 and 1957/58 seasons. How
does removing these two seasons affect the values of x̄ and s?
1.- Last year, a small consulting company paid
€22,000 to each of its five administrative staff and €50,000
to each of its two university graduates. The owner
of the company was paid €270,000.
What is the average salary paid in this company?
How many employees earn less than the average?
What is the median salary?
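A Python sketch of the three questions, assuming eight people are paid in total: five at €22,000, two at €50,000 each, and the owner at €270,000 (this reading of the wording is an assumption).

```python
import statistics

# Assumed reading: five staff at 22,000 EUR, two graduates at 50,000 EUR each,
# and the owner at 270,000 EUR (eight people in total).
salaries = [22_000] * 5 + [50_000] * 2 + [270_000]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
below_mean = sum(1 for x in salaries if x < mean)

print(f"mean = {mean} EUR")
print(f"people earning less than the mean: {below_mean} of {len(salaries)}")
print(f"median = {median} EUR")
```

A single very large salary pulls the mean far above what most people earn, while the median stays at the typical salary.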
2.- In 1798 the English scientist Henry Cavendish determined the density of the
Earth very accurately. When complicated measurements are made, it is
advisable to repeat them several times and work with the average of all
the results. Cavendish repeated his measurement 29 times. Here are the results
he obtained:
5.50 5.61 4.88 5.07 5.26 5.55 5.36 5.29 5.58 5.65 5.57 5.53 5.62 5.29 5.44 5.34
5.79 5.10 5.27 5.39 5.42 5.47 5.63 5.34 5.46 5.30 5.75 5.68 5.85
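A Python sketch that summarizes Cavendish's 29 measurements with the mean, the median and the sample standard deviation. It reads the source's decimal commas (e.g. "5,50") as decimal points.

```python
import statistics

# Cavendish's 29 measurements of the density of the Earth, as listed above.
density = [
    5.50, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65,
    5.57, 5.53, 5.62, 5.29, 5.44, 5.34, 5.79, 5.10, 5.27, 5.39,
    5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68, 5.85,
]

print(f"n      = {len(density)}")
print(f"mean   = {statistics.mean(density):.3f}")
print(f"median = {statistics.median(density):.2f}")
print(f"s      = {statistics.stdev(density):.3f}")  # divisor n - 1
```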
3.- x̄ and s are not enough. The mean x̄ and the
standard deviation s, as measures of center and
spread, do not give a complete description of a
distribution. Data sets with different shapes can have
the same mean and standard deviation. Find x̄ and s
for each of the following data sets:
Dataset A: 9.14 8.14 8.74 8.77 9.26 8.10 6.13 3.10 9.13 7.26 4.74
Dataset B: 6.58 5.76 7.71 8.84 8.47 7.04 5.25 5.56 7.91 6.89 12.50
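A Python sketch comparing the two data sets. Besides x̄ and s it prints the median, minimum and maximum, an illustrative choice to show that the shapes differ even though x̄ and s nearly coincide. Decimal commas in the source are read as decimal points.

```python
import statistics

dataset_a = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
dataset_b = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]

for name, data in (("A", dataset_a), ("B", dataset_b)):
    # Mean and sample standard deviation (divisor n - 1).
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    # Median, min and max hint at the shape that x-bar and s alone miss.
    print(f"Dataset {name}: mean = {mean:.2f}, s = {s:.2f}, "
          f"median = {statistics.median(data):.2f}, "
          f"min = {min(data):.2f}, max = {max(data):.2f}")
```

Dataset A has a long tail toward small values, while Dataset B has a single large value (12.50) far above the rest.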
4.- The players of the US baseball team the Baltimore Orioles
were the highest paid in the 1998 US league season.
Here are their salaries in thousands of dollars
(for example, 6,495 means $6,495,000):
6,495 6,486 6,300 6,269 5,442 5,391 3,600 3,600 3,583
3,089 2,850 2,500 1,950 1,663 1,367 1,333 1,150 900 856
800 800 665 650 450 450 170 170
Find the mean, the median, s² and s.
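A Python sketch of the four summaries, with salaries kept in thousands of dollars (so s² is in squared thousands of dollars). It reads "6.495" in the source as 6,495, since the period there is a thousands separator.

```python
import statistics

# 1998 Baltimore Orioles salaries, in thousands of dollars.
salaries = [
    6495, 6486, 6300, 6269, 5442, 5391, 3600, 3600, 3583,
    3089, 2850, 2500, 1950, 1663, 1367, 1333, 1150, 900, 856,
    800, 800, 665, 650, 450, 450, 170, 170,
]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
s2 = statistics.variance(salaries)  # sample variance, divisor n - 1
s = statistics.stdev(salaries)

print(f"n      = {len(salaries)}")
print(f"mean   = {mean:.1f} thousand dollars")
print(f"median = {median} thousand dollars")
print(f"s^2    = {s2:.1f} (thousand dollars)^2")
print(f"s      = {s:.1f} thousand dollars")
```

The mean lies well above the median because a few very large salaries skew the distribution to the right.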
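Statistical mode
For grouped data, the modal class is the class with the highest absolute frequency, and the mode is estimated as:
$$M_0 = L_i + \frac{f_i - f_{i-1}}{(f_i - f_{i-1}) + (f_i - f_{i+1})} \cdot a_i$$
$L_i$ is the lower limit of the modal class.
$f_i$ is the absolute frequency of the modal class.
$f_{i-1}$ is the absolute frequency of the class immediately before the modal
class.
$f_{i+1}$ is the absolute frequency of the class immediately after the
modal class.
$a_i$ is the width (amplitude) of the modal class.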
Example
Calculate and graph the mode of the
statistical distribution given by the
following table:
Class      f_i
(60, 63)     5
(63, 66)    18
(66, 69)    42
(69, 72)    27
(72, 75)     8
Total      100
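A Python sketch of the grouped-data mode formula applied to the example table. The function name `grouped_mode` is hypothetical; it assumes a single modal class and, as a convention the source does not state, treats a missing neighbouring class (modal class at either end) as having frequency 0.

```python
def grouped_mode(classes):
    """Estimate the mode of grouped data.

    `classes` is a list of (lower, upper, frequency) tuples in increasing order.
    Uses M0 = L_i + (f_i - f_{i-1}) / ((f_i - f_{i-1}) + (f_i - f_{i+1})) * a_i.
    Assumption: a missing neighbouring class counts as frequency 0.
    """
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))  # first class with the highest frequency
    lower, upper, f_i = classes[i]
    f_prev = freqs[i - 1] if i > 0 else 0
    f_next = freqs[i + 1] if i + 1 < len(freqs) else 0
    width = upper - lower
    d1, d2 = f_i - f_prev, f_i - f_next
    return lower + d1 / (d1 + d2) * width


# Example table: modal class (66, 69) with f_i = 42, neighbours 18 and 27.
table = [(60, 63, 5), (63, 66, 18), (66, 69, 42), (69, 72, 27), (72, 75, 8)]
print(f"mode = {grouped_mode(table):.2f}")
```

Here the formula gives 66 + 24 / (24 + 15) · 3, slightly above the middle of the modal class because the class after it (27) is fuller than the class before it (18).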
Exercise
Distribution 1       Distribution 2       Distribution 3
Class      f_i       Class      f_i       Class      f_i
(20, 27)     7       (2, 12)     10       (40, 46)    15
(27, 34)    18       (12, 22)    13       (46, 52)    28
(34, 41)    20       (22, 32)    27       (52, 58)    42
(41, 48)    33       (32, 42)    32       (58, 64)    27
(48, 55)    46       (42, 52)    42       (64, 70)     8
(55, 62)    12       (52, 62)    20       (70, 76)    45
(62, 69)    50       (62, 72)    11       (76, 82)    26
(69, 76)     2       (72, 82)     9       (82, 88)    23
Quantitative variables: histograms
10 12 5 8 13 10 12 8 7 9
11 10 9 9 11 15 12 17 14 10
9 8 15 16 10 14 7 16 9 1
4 11 12 7 9 10 3 11 14 8
12 5 10 9 7 11 14 10 15 9
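A Python sketch that turns the 50 values above into a frequency table and a text histogram. The class width of 3 starting at 0 (classes [0, 3), [3, 6), ..., [15, 18)) is an illustrative assumption; the source does not specify the classes.

```python
from collections import Counter

values = [
    10, 12, 5, 8, 13, 10, 12, 8, 7, 9,
    11, 10, 9, 9, 11, 15, 12, 17, 14, 10,
    9, 8, 15, 16, 10, 14, 7, 16, 9, 1,
    4, 11, 12, 7, 9, 10, 3, 11, 14, 8,
    12, 5, 10, 9, 7, 11, 14, 10, 15, 9,
]

# Assumption: classes of width 3 starting at 0, closed on the left.
WIDTH = 3
START = 0
counts = Counter((x - START) // WIDTH for x in values)  # class index -> frequency
n_classes = (max(values) - START) // WIDTH + 1

frequencies = [counts.get(k, 0) for k in range(n_classes)]
for k, f in enumerate(frequencies):
    lo = START + k * WIDTH
    print(f"[{lo:2d}, {lo + WIDTH:2d})  {'*' * f}  {f}")
```

Changing `WIDTH` or `START` changes the look of the histogram, so it is worth trying more than one choice of classes.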
For both: