
Part 1

a. History on Statistics and Probability

STATISTICS
Statistics is concerned with exploring, summarizing, and making inferences about
complex systems. It has indeed become ingrained in our intellectual heritage.
Simple forms of statistics have been used since the beginning of civilization, when
pictorial representations or other symbols were used to record numbers of people,
animals, and inanimate objects on skins, slabs, or sticks of wood and the walls of
caves.

Before 3000 BC, the Babylonians used small clay tablets to record tabulations of
agricultural yields and of commodities bartered or sold. The Egyptians analyzed the
population and material wealth of their country before beginning to build the pyramids
in the 31st century BC. The biblical books of Numbers and Chronicles are primarily
statistical works: Numbers contains two separate censuses of the Israelites, and
Chronicles describes the material wealth of various Jewish tribes. Similar numerical
records existed in China before 2000 BC. The ancient Greeks held censuses to be used
as bases for taxation as early as 594 BC. The Roman Empire was the first government
to gather extensive data about the population, area, and wealth of the territories that it
controlled. In addition, a few comprehensive censuses were made in Europe during the
Middle Ages, or Medieval Period, which lasted from the 5th to the 15th century. The
Carolingian kings ordered surveys of ecclesiastical holdings:

Pepin the Short (r. 758)

Charlemagne (r. 762)

Next, following the Norman Conquest of England in 1066, William I, King of England,
ordered a census to be taken; the information gathered in this census, conducted in
1086, was recorded in the Domesday Book. In the early 16th century, registration of
deaths and births began in England. The first noteworthy statistical study of
population, Observations on the London Bills of Mortality, was written. A similar
study of mortality made in Breslau, Germany, in 1691 was used by the English
astronomer Edmond Halley as a basis for the earliest mortality table. Systematic
collection of data on the population and the economy began in the Italian city-states
of Venice and Florence during the Renaissance.

The term statistics, derived from the word state, was used to refer to a collection
of facts of interest to the state. The idea of collecting data spread from Italy to the
other countries of Western Europe. By the first half of the 16th century, it was common
for European governments to require parishes to register births, marriages, and deaths.
On account of poor public health conditions, these last statistics were of particular
interest.
The high mortality rate in Europe before the 19th century was due mainly to epidemic
diseases, wars, and famines. Among epidemics, the worst were the plagues. Starting
with the Black Plague in 1348, plagues recurred frequently for nearly 400 years. (An
epidemic disease is a disease that spreads rapidly through a population, killing a great
many people, or an outbreak of such a disease.) In 1562, as a way to alert the King's
court to consider moving to the countryside, the City of London began to publish
weekly bills of mortality. Initially these mortality bills listed the places of death and
whether a death had resulted from plague. Beginning in 1625, the bills were expanded
to include all causes of death. In 1662, the English tradesman John Graunt published a
book entitled Natural and Political Observations Made upon the Bills of Mortality. A
table noting the total number of deaths in England and the number due to the plague
for five different plague years is taken from this book. Graunt used the London bills of
mortality to estimate the city's population and to project a figure for all of England. In
his book he noted that these figures would be of interest to the rulers of the country, as
indicators of both the number of men who could be drafted into an army and the
number who could be taxed. Graunt also used the London bills of mortality, along with
some intelligent guesswork as to what diseases killed whom and at what age, to infer
ages at death. Graunt then used this information to compute tables giving the
proportion of the population that dies at various ages. These tables state, for instance,
that of 100 births, 36 people will die before reaching the age of 6, 24 will die between
the ages of 6 and 15, and so on. His work on mortality tables inspired further work by
Edmond Halley in 1693. Halley, the discoverer of the comet bearing his name (and
also the man who was most responsible, by both his encouragement and his financial
support, for the publication of Isaac Newton's famous Principia Mathematica), used
tables of mortality to compute the odds that a person of any age would live to any
other particular age. Halley was influential in convincing the insurers of the time that
an annual life insurance premium should depend on the age of the person being
insured. Following Graunt and Halley, the collection of data steadily increased
throughout the remainder of the 17th century and on into the 18th century.
The term statistics was used until the 18th century as shorthand for the descriptive
science of states. In the 19th century, statistics became increasingly identified with
numbers, and by the 1830s the term was almost universally regarded in Britain and
France as being synonymous with the numerical science of society. This change in
meaning was caused by the wide availability of census records and other tabulations
that began to be systematically collected and published by the governments of Western
Europe and the United States from around 1800. It was not until the late 1800s,
however, that statistics became concerned with inferring conclusions from numerical
data. The movement began with Francis Galton's work on analyzing hereditary genius
through the uses of what we would now call regression and correlation analysis, and it
obtained much of its impetus from the work of Karl Pearson. Pearson, who developed
the chi-square goodness-of-fit test, was the first director of the Galton Laboratory,
endowed by Francis Galton in 1904, and there he originated a research program aimed
at developing new methods of using statistics in inference. In the early 20th century,
two of the most important areas of applied statistics were population biology and
agriculture. This was due to the interest of Pearson and others at his laboratory, as well
as to the remarkable accomplishments of the English scientist Ronald A. Fisher. The
theory of inference was developed further by other pioneers, including among others:

Karl Pearson's son, Egon Pearson

the Polish-born mathematical statistician Jerzy Neyman

After the early years of the 20th century, a rapidly increasing number of people in
science, business, and government began to regard statistics as a tool able to provide
quantitative solutions to scientific and practical problems.

Nowadays, the ideas of statistics are everywhere. Descriptive statistics are featured in
every newspaper and magazine, as well as in electronic media.

Statistical inference has become indispensable to public health and medical research,
engineering and scientific studies, marketing and quality control, education,
accounting, economics, meteorological forecasting, polling and surveys, and any other
research that makes a claim to being scientific.

PROBABILITY

Probability is distinguished from statistics: while statistics deals with data and
inferences from it, probability deals with the stochastic processes which lie behind
data or outcomes. Probability is a branch of mathematics that deals with calculating
the likelihood of a given event's occurrence, which is expressed as a number between
0 and 1.

i. An event with a probability of 1 can be considered a certainty.

ii. An event with a probability of 0.5 can be considered to have equal odds of occurring
or not occurring.

iii. An event with a probability of 0 can be considered an impossibility.

For instance,

i. the probability of a coin toss resulting in either "heads" or "tails" is 1, because
there are no other options, assuming the coin lands flat.

ii. the probability of a coin toss resulting in "heads" is 0.5, because the toss is
equally likely to result in "tails."

iii. the probability that the coin will land (flat) without either side facing up is 0,
because either "heads" or "tails" must be facing up.

Somewhat paradoxically, probability theory applies precise calculations to quantify
uncertain measures of random events.

Calculating probabilities in a situation like a coin toss is straightforward, because the
outcomes are mutually exclusive: either one event or the other must occur. Each coin
toss is an independent event; the outcome of one trial has no effect on subsequent
ones. No matter how many consecutive times one side lands facing up, the probability
that it will do so at the next toss is always 1/2 (0.5). The mistaken idea that a number
of consecutive results (six "heads", for example) makes it more likely that the next
toss will result in a "tails" is known as the gambler's fallacy, one that has led to the
downfall of many a bettor. This independence is easy to check by simulation, as the
sketch below shows.
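The following minimal Python simulation (my own illustrative addition, not part of the
original text) estimates the probability of "heads" on the toss immediately after a run
of consecutive "heads"; the streak length and number of flips are arbitrary choices.

```python
import random

def prob_heads_after_streak(streak_len=6, n_flips=1_000_000):
    """Estimate P(heads) on the flip right after `streak_len` consecutive heads."""
    run = 0        # current run of consecutive heads
    after = []     # outcomes observed immediately after such a run
    for _ in range(n_flips):
        flip = random.random() < 0.5  # True means heads, for a fair coin
        if run >= streak_len:
            after.append(flip)
        run = run + 1 if flip else 0
    return sum(after) / len(after)

print(prob_heads_after_streak())  # ~0.5, regardless of the streak length
```

The estimate stays near 0.5, which is exactly what the gambler's fallacy denies.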

Probability theory had its start in the 17th century, when two French
mathematicians, Blaise Pascal and Pierre de Fermat, carried on a correspondence
discussing mathematical problems dealing with games of chance. Contemporary
applications of probability theory run the gamut of human inquiry, and include
aspects of computer programming, astrophysics, music, weather prediction, and
medicine.

Since participation in games and gambling is as old as mankind, it seems as if
the idea of probability should be almost as old. But the realization that one could
predict an outcome to a certain degree of accuracy was inconceivable until the
sixteenth and seventeenth centuries. The idea that one could determine the
chance of a future event was prompted by the necessity of achieving a predictable
balance between risks taken and the potential for gain. In order to make a profit,
underwriters needed dependable guidelines by which a profit could be
expected, while the gambler was interested in predicting the possibility of gain.
The first recorded evidence of probability theory can be found as early as 1550 in
the work of Cardan. In 1550 Cardan wrote a manuscript in which he addressed
the probability of certain outcomes in rolls of dice and the problem of points, and
presented a crude definition of probability. Had this manuscript not been lost,
Cardan would certainly have been credited with the onset of probability theory.
However, the manuscript was not discovered until 1576 and not printed until 1663,
leaving the door open for independent discovery.

The onset of probability as a useful science is primarily attributed to Blaise
Pascal (1623-1662) and Pierre de Fermat (1601-1665). While contemplating a
gambling problem posed by the Chevalier de Mere in 1654, Pascal and Fermat
laid the fundamental groundwork of probability theory, and are thereby
credited as the fathers of probability. The question posed pertained to the
number of turns required to ensure obtaining a six in the roll of two dice. The
correspondence between Pascal and Fermat concerning this and the problem of
points led to the beginning of the new concepts of probability and expectation.

In the seventeenth century, a shopkeeper, John Graunt (1620-1674), set out to
predict mortality rates by categorizing births and deaths. In the London Life
Table, Graunt made a noteworthy attempt to predict the number of survivors out
of one hundred through increments of ten years. This work, along with his earlier
paper Natural and Political Observations Made upon the Bills of Mortality, the
first known paper to use data in order to draw statistical inferences, gained him
admission into the Royal Society of London.

Graunt's observations and predictions elicited interest in probability from
others, such as the brothers Ludwig and Christiaan Huygens. Beginning with the
interest initially sparked by Graunt's work, and later by the work of Pascal and
Fermat, Christiaan Huygens, a Dutch physicist, became the first to publish a text
on probability theory, entitled De Ratiociniis in Ludo Aleae (On Reasoning in
Games of Chance), in 1657. In this text, Huygens presented the idea of
mathematical expectation. This text was unrivaled until James Bernoulli
(1654-1705) wrote Ars Conjectandi, which was published eight years after his
death.

In Ars Conjectandi, Bernoulli expounded on and provided alternative proofs
to Huygens' De Ratiociniis in Ludo Aleae, presented combinations and
permutations (encompassing most of the results still used today), included a
series of problems on games of chance with explanations, and finally, and most
importantly, revealed the famous Bernoulli theorem, later called the law of
large numbers.

Probability theory continued to grow with Abraham De Moivre's Doctrine of
Chances: or, a Method of Calculating the Probability of Events in Play,
published in 1718, and Pierre Simon Laplace's (1749-1827) Theorie
Analytique des Probabilites, published in 1812. The Theorie Analytique des
Probabilites outlined the evolution of probability theory, providing extensive
explanations of the results obtained. In this book Laplace presented the definition
of probability which we still use today, together with the fundamental theorems of
addition and multiplication of probabilities, along with several problems applying
the Bernoulli process.

The first major accomplishment in the development of probability theory was the
realization that one could actually predict, to a certain degree of accuracy, events
which were yet to come. The second accomplishment, which was primarily addressed
in the 1800s, was the idea that probability and statistics could converge to form a
well-defined, firmly grounded science with seemingly limitless applications and
possibilities. It was the initial work of Pascal, Fermat, Graunt, Bernoulli, De Moivre,
and Laplace that set probability theory, and then statistics, on its way to becoming the
valuable inferential science that it is today.

In short, probability and statistics are related areas of mathematics which concern
themselves with analyzing the relative frequency of events. Both subjects are
important, relevant, and useful. But they are different, and understanding the
distinction is crucial in properly interpreting the relevance of mathematical evidence.
Many a gambler has gone to a cold and lonely grave for failing to make the proper
distinction between probability and statistics.

In summary, probability theory enables us to find the consequences of a given
ideal world, while statistical theory enables us to measure the extent to which our
world is ideal.

b. Measures of Statistical Dispersion

Measures of dispersion are descriptive statistics that describe how similar a set of
scores are to each other. The more similar the scores are to each other, the lower the
measure of dispersion will be; the less similar the scores are to each other, the higher
the measure of dispersion will be. In general, the more spread out a distribution is,
the larger the measure of dispersion will be.

Common examples of measures of statistical dispersion are the variance, standard
deviation and range.

VARIANCE
The variance is a numerical value used to indicate how widely individuals in a
group vary. If individual observations vary greatly from the group mean, the variance
is big, and vice versa. In other words, the variance is the mean of the squares of the
deviations from the arithmetic mean of a data set.

EXAMPLES:

The data set 12, 12, 12, 12, 12 has a variance of zero (the numbers are identical).

The data set 12, 12, 12, 12, 13 has a (population) variance of 0.16; a small change in
the numbers gives a very small variance.

The data set 12, 12, 12, 12, 13013 has a (population) variance of about 27,044,160; a
large change in the numbers gives a very large variance.

It is important to distinguish between the variance of a population and the variance of
a sample. They have different notation, and they are computed differently. The
variance of a population is denoted by σ², and the variance of a sample by s².

The variance of a population can be calculated by using the formula:

$$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$

where

$\sigma^2$ is the population variance;

$\bar{X}$ is the population mean;

$X_i$ is the ith element from the population;

$N$ is the number of elements in the population.

When we are dealing with a sample (that is, a subset of the complete population), we
cannot, of course, compute the mean and variance exactly, but rather estimate them.

The variance of a sample is defined by a slightly different formula:

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$$

where

$s^2$ is the sample variance;

$\bar{X}$ is the sample mean;

$X_i$ is the ith element from the sample;

$N$ is the number of elements in the sample.

Using this formula, the variance of the sample is an unbiased estimate of the variance
of the population.

And finally, the variance is equal to the square of the standard deviation.
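The two formulas can be checked with Python's standard library; here is a minimal
sketch (my own addition, not part of the original project), using the example data set
from above:

```python
import statistics

data = [12, 12, 12, 12, 13]

print(statistics.pvariance(data))  # population variance: 0.16 (divides by N)
print(statistics.variance(data))   # sample variance:     0.2  (divides by N - 1)
```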

STANDARD DEVIATION
Standard deviation is a measure of dispersion in statistics. Dispersion just means how
much your data is spread out. Specifically, the standard deviation shows you how
much your data is spread out around the mean, or average.

A normal distribution graph can represent hundreds of real-life situations that can be
modeled with a bell curve. For instance, people's weights, heights, nutrition habits
and exercise regimens can be modeled with a normal distribution graph.

The standard deviation tells you how tightly your data is clustered around the mean.
When the bell curve is flattened (your data is spread out), you have a large standard
deviation: your data is further away from the mean. When the bell curve is very steep,
your data has a small standard deviation: your data is tightly clustered around the
mean.

It is important to distinguish between the standard deviation of a population and the
standard deviation of a sample. They have different notation, and they are computed
differently. The standard deviation of a population is denoted by σ, and the standard
deviation of a sample by s.

The standard deviation of a population can be calculated by using the formula:

$$\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$$

where

$\sigma$ is the population standard deviation;

$\bar{X}$ is the population mean;

$X_i$ is the ith element from the population;

$N$ is the number of elements in the population.

The standard deviation of a sample is defined by a slightly different formula:

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N - 1}}$$

where

$s$ is the sample standard deviation;

$\bar{X}$ is the sample mean;

$X_i$ is the ith element from the sample;

$N$ is the number of elements in the sample.

And finally, the standard deviation is equal to the square root of the variance.
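Continuing the earlier sketch (again my own addition), the statistics module also
provides pstdev and stdev, and the square-root relationship can be verified directly:

```python
import math
import statistics

data = [12, 12, 12, 12, 13]

pvar = statistics.pvariance(data)  # population variance, 0.16
print(statistics.pstdev(data))     # population standard deviation, 0.4
print(math.sqrt(pvar))             # same value: the sd is the square root of the variance
```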

RANGE
In statistics, the range of a set of data is the difference between the largest and
smallest values:

Range = Maximum value - Minimum value

EXAMPLE:

In {4, 6, 9, 3, 7} the lowest value is 3 and the highest is 9, so the range is
9 - 3 = 6.
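In code, the range is a one-liner (an illustrative addition):

```python
data = [4, 6, 9, 3, 7]
print(max(data) - min(data))  # range: 9 - 3 = 6
```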

PART 2

a. Tabulating data

Lower Secondary Student Height ( cm ) Weight ( kg )

1 151 55

2 147 34

3 154 38

4 160 49

5 155 45

6 155 48

7 149 37

8 153 41

9 163 45

10 169 53

11 160 47

12 145 37

13 156 46

14 159 43

15 146 37

16 152 39

17 167 51

18 153 49

19 164 40

20 161 51

21 166 45

22 159 44

23 165 42

24 159 39

25 163 45

26 150 36

27 157 40

28 161 42

29 160 55

30 158 52

31 156 38

32 162 39

33 170 50

34 149 36

35 160 44

36 159 49

37 171 51

38 164 47

39 156 42

40 167 45

41 148 41

42 153 44

43 152 39

44 165 46

45 159 40

46 163 40

47 166 49

48 159 51

49 161 43

50 167 47

TABLE 1

Upper Secondary Student Height ( cm ) Weight ( kg )

1 165 65

2 145 37

3 158 51

4 156 50

5 174 60

6 168 50

7 162 45

8 177 56

9 168 68

10 172 68

11 168 67

12 150 44

13 148 42

14 167 42

15 157 50

16 157 40

17 153 52

18 178 59

19 165 46

20 175 55

21 170 52

22 166 66

23 151 40

24 144 39

25 161 63

26 154 48

27 163 60

28 159 50

29 152 46

30 172 52

31 155 45

32 164 58

33 151 53

34 155 54

35 151 58

36 160 54

37 161 46

38 175 82

39 150 53

40 169 76

41 165 52

42 153 45

43 155 51

44 155 37

45 156 47

46 171 61

47 178 52

48 158 53

49 163 55

50 165 73

TABLE 2

b. Constructing frequency distribution table

[ LOWER SECONDARY STUDENT ]

Weight ( kg ) Frequency

30-34 1

35-39 11

40-44 14

45-49 15

50-54 7

55-59 2

[ UPPER SECONDARY STUDENT ]

Weight ( kg ) Frequency

35-39 3

40-44 5

45-49 8

50-54 16

55-59 6

60-64 4

65-69 5

70-74 1

75-79 1

80-84 1
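For reference, here is a sketch (my own addition; the function name and class limits
are my choices) of how such a frequency table can be built programmatically:

```python
def frequency_table(weights, low=30, high=85, width=5):
    """Count each weight into classes low..low+width-1, etc. (e.g. 30-34, 35-39)."""
    bins = {(lo, lo + width - 1): 0 for lo in range(low, high, width)}
    for w in weights:
        lo = low + (w - low) // width * width
        bins[(lo, lo + width - 1)] += 1
    return bins

# First five lower secondary weights from Table 1:
for (lo, hi), f in frequency_table([55, 34, 38, 49, 45]).items():
    if f:
        print(f"{lo}-{hi}: {f}")
```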

c. i. Constructing two different statistical graphs for both
categories

[ LOWER SECONDARY STUDENT ]

Weight ( kg ) Frequency Midpoint

25-29 0 27

30-34 1 32

35-39 11 37

40-44 14 42

45-49 15 47

50-54 7 52

55-59 2 57

60-64 0 62

[Figure: frequency polygon and histogram for the lower secondary students;
Frequency (0 to 16) plotted against the class midpoints (27 to 62).]

Weight ( kg ) Frequency Cumulative Frequency Upper Boundary

25-29 0 0 29.5

30-34 1 1 34.5

35-39 11 12 39.5

40-44 14 26 44.5

45-49 15 41 49.5

50-54 7 48 54.5

55-59 2 50 59.5

[Figure: ogive for the lower secondary students; Cumulative Frequency (0 to 50)
plotted against the upper class boundaries (24.5 to 59.5).]
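The two charts above could be reproduced with, for example, matplotlib; the following
sketch (my own addition) uses the midpoint/frequency and boundary/cumulative-frequency
pairs from the tables above:

```python
import matplotlib.pyplot as plt

midpoints = [27, 32, 37, 42, 47, 52, 57, 62]
freq      = [0, 1, 11, 14, 15, 7, 2, 0]
bounds    = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5, 59.5]
cum_freq  = [0, 0, 1, 12, 26, 41, 48, 50]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.bar(midpoints, freq, width=5, edgecolor="black")  # histogram
ax1.plot(midpoints, freq, "o-", color="red")          # frequency polygon
ax1.set(xlabel="Midpoint", ylabel="Frequency")
ax2.plot(bounds, cum_freq, "o-")                      # ogive
ax2.set(xlabel="Upper boundary", ylabel="Cumulative frequency")
plt.show()
```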

[ UPPER SECONDARY STUDENT ]

Weight ( kg ) Frequency Midpoint

35-39 3 37

40-44 5 42

45-49 8 47

50-54 16 52

55-59 6 57

60-64 4 62

65-69 5 67

70-74 1 72

75-79 1 77

80-84 1 82

[Figure: frequency polygon and histogram for the upper secondary students;
Frequency (0 to 18) plotted against the class midpoints (32 to 87).]

Weight ( kg ) Frequency Cumulative Frequency Upper Boundary

30-34 0 0 34.5

35-39 3 3 39.5

40-44 5 8 44.5

45-49 8 16 49.5

50-54 16 32 54.5

55-59 6 38 59.5

60-64 4 42 64.5

65-69 5 47 69.5

70-74 1 48 74.5

75-79 1 49 79.5

80-84 1 50 84.5

[Figure: ogive for the upper secondary students; Cumulative Frequency (0 to 50)
plotted against the upper class boundaries (29.5 to 84.5).]

ii. Calculating the mean, median and mode of the weight of the
students for both categories.

[ LOWER SECONDARY STUDENT ]

Mean:

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{(1)(32) + (11)(37) + (14)(42) + (15)(47) + (7)(52) + (2)(57)}{1 + 11 + 14 + 15 + 7 + 2} = \frac{32 + 407 + 588 + 705 + 364 + 114}{50} = \frac{2210}{50} = 44.2$$

Median (the 25th value lies in the 40-44 class, so $L = 39.5$, $F = 12$, $f_m = 14$, $C = 5$):

$$m = L + \frac{\frac{N}{2} - F}{f_m} \times C = 39.5 + \frac{25 - 12}{14} \times 5 = 44.143$$

Mode (the modal class is 45-49, so $L = 44.5$):

$$\text{mode} = L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times c = 44.5 + \frac{15 - 14}{(15 - 14) + (15 - 7)} \times 5 = 45.056$$

[ UPPER SECONDARY STUDENT ]

Mean:

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{(3)(37) + (5)(42) + (8)(47) + (16)(52) + (6)(57) + (4)(62) + (5)(67) + (1)(72) + (1)(77) + (1)(82)}{3 + 5 + 8 + 16 + 6 + 4 + 5 + 1 + 1 + 1} = \frac{111 + 210 + 376 + 832 + 342 + 248 + 335 + 72 + 77 + 82}{50} = \frac{2685}{50} = 53.7$$

Median (the 25th value lies in the 50-54 class, so $L = 49.5$, $F = 16$, $f_m = 16$, $C = 5$):

$$m = L + \frac{\frac{N}{2} - F}{f_m} \times C = 49.5 + \frac{25 - 16}{16} \times 5 = 52.313$$

Mode (the modal class is 50-54, so $L = 49.5$):

$$\text{mode} = L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times c = 49.5 + \frac{16 - 8}{(16 - 8) + (16 - 6)} \times 5 = 51.72$$
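These grouped-data formulas can be collected into a short Python helper (an
illustrative addition; grouped_stats and its argument names are my own choices):

```python
def grouped_stats(classes, freqs, width=5):
    """Mean, median and mode for grouped data; `classes` holds lower class limits."""
    n = sum(freqs)
    mids = [lo + (width - 1) / 2 for lo in classes]          # class midpoints
    mean = sum(m * f for m, f in zip(mids, freqs)) / n

    cum = 0                                                  # find the median class
    for i, f in enumerate(freqs):
        if cum + f >= n / 2:
            median = (classes[i] - 0.5) + (n / 2 - cum) / f * width
            break
        cum += f

    i = freqs.index(max(freqs))                              # modal class
    d1 = freqs[i] - (freqs[i - 1] if i > 0 else 0)
    d2 = freqs[i] - (freqs[i + 1] if i + 1 < len(freqs) else 0)
    mode = (classes[i] - 0.5) + d1 / (d1 + d2) * width
    return mean, median, mode

# Lower secondary weights, classes 30-34 ... 55-59:
print(grouped_stats([30, 35, 40, 45, 50, 55], [1, 11, 14, 15, 7, 2]))
# (44.2, 44.142857142857146, 45.05555555555556)
```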

Determining the most suitable measure of central tendency

If it is at all possible, all three measures of central tendency should be found, so as to
obtain as much information as possible about the subjects under study. However, if
this is not possible, there are situations in which the mean, median, and mode each
have their specific "advantages".

Based on my answers above, the mean is ordinarily the preferred measure of central
tendency. The mean is the arithmetic average of a distribution and reflects the central
value around which the data seems to cluster. The mean, presented along with the
variance and the standard deviation, is the "best" measure of central tendency for
continuous data.

There are some situations in which the mean is not the "best" measure of central
tendency. In certain situations, the median is the preferred measure. These situations
are as follows:

when you know that a distribution is skewed

when you believe that a distribution might be skewed

when you have a small number of subjects

The purpose of reporting the median in these situations is to combat the effect
of outliers. Outliers affect the distribution because they are extreme scores. For
example, in a distribution of people's incomes, a person who has an income of over a
million dollars would dramatically increase the mean income, whereas in reality most
of the people in the distribution do not make that kind of money. In such a case, the
median is the preferred measure of central tendency.

However, in this particular situation the median is not that suitable, as it only shows
the middle value of the whole data set.

The mode is rarely chosen as the preferred measure of central tendency. It is not
usually used because the largest frequency of scores might not be at the center of the
distribution. The only situation in which the mode may be preferred over the other two
measures of central tendency is when describing discrete categorical data, since the
greatest frequency of responses is then what matters for describing the data.

PART 3

a. Body Mass Index (BMI)

The body mass index (BMI) or Quetelet index is a value derived from the mass
(weight) and height of an individual. The BMI is defined as the body mass divided by
the square of the body height, and is universally expressed in units of kg/m², resulting
from mass in kilograms and height in metres.
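The definition translates directly into code; a minimal sketch (my own addition):

```python
def bmi(weight_kg, height_cm):
    """Body mass index: mass in kilograms divided by the square of height in metres."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2

print(round(bmi(55, 151), 2))  # 24.12, matching student 1 in Table 3 below
```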

The BMI ranges are based on the relationship between body weight and disease and
death. Overweight and obese individuals are at an increased risk for the following
diseases:

Coronary artery disease
Dyslipidemia
Type 2 diabetes
Gallbladder disease
Hypertension
Osteoarthritis
Sleep apnea
Stroke
At least 10 cancers, including endometrial, breast, and colon cancer
Epidural lipomatosis

Experts have expressed uncertainty about relying too heavily on BMI, stressing that it
is not an accurate measure of body fat or health.

BMI fails to take age and sex into account. Women naturally tend to have more body
fat than men of equal BMI, while older people tend to have more body fat than
younger people with the same BMI.

Furthermore, BMI measurements have no way of measuring where body fat is located
in the body. Studies have indicated that belly fat - the fat surrounding abdominal
organs - is more dangerous than peripheral fat beneath the skin in other body areas.

If you are obese according to BMI, you are most likely obese according to body fat
percentage also. When sampling from the general population, over 95% of men and
99% of women identified as obese by BMI were also obese via body fat levels.

Individuals who are very muscular such as bodybuilders or those that have very little
muscle definition may not receive an accurate BMI reading by using height and weight
measurements alone. Muscle weighs more than fat. Hence a muscular person may
appear to have a higher BMI and be perfectly healthy, or a frail, inactive person may
appear to have a lower BMI and in reality have more body fat than is healthy.

Those who have enough lean mass to be classified as obese by BMI but not by body
fat percentage are few and far between. These persons would normally be highly
active athletes, and it is unlikely that sedentary persons or those with infrequent
exercise habits would fall into this category.

If you are normal weight or overweight according to BMI (18.5-29.9), there is still a
chance you are actually obese, primarily due to low levels of lean mass (muscle, water
and glycogen).

BMI also does not account for lactating or pregnant women, or for children and
teenagers who have not reached physical maturity and are still growing, and it ignores
natural differences in height and weight ratios between races.

b. Calculating Body Mass Index (BMI) for lower secondary student
and upper secondary student

Lower Secondary Student Height ( cm ) Weight ( kg ) BMI

1 151 55 24.12

2 147 34 15.73

3 154 38 16.02

4 160 49 19.14

5 155 45 18.73

6 155 48 19.98

7 149 37 16.67

8 153 41 17.51

9 163 45 16.94

10 169 53 18.56

11 160 47 18.36

12 145 37 17.60

13 156 46 18.90

14 159 43 17.01

15 146 37 17.36

16 152 39 16.88

17 167 51 18.29

18 153 49 20.93

19 164 40 14.87

20 161 51 19.68

21 166 45 16.33

22 159 44 17.40

23 165 42 15.43

24 159 39 15.43

25 163 45 16.94

26 150 36 16.00

27 157 40 16.23

28 161 42 16.20

29 160 55 21.48

30 158 52 20.83

31 156 38 15.61

32 162 39 14.86

33 170 50 17.30

34 149 36 16.22

35 160 44 17.19

36 159 49 19.38

37 171 51 17.44

38 164 47 17.47

39 156 42 17.26

40 167 45 16.14

41 148 41 18.72

42 153 44 18.80

43 152 39 16.88

44 165 46 16.90

45 159 40 15.82

46 163 40 15.06
47 166 49 17.78

48 159 51 20.17

49 161 43 16.59

50 167 47 16.85

TABLE 3

Upper Secondary Student Height ( cm ) Weight ( kg ) BMI

1 165 65 23.88

2 145 37 17.60

3 158 51 20.43

4 156 50 20.55

5 174 60 19.82

6 168 50 17.72

7 162 45 17.15

8 177 56 17.87

9 168 68 24.09

10 172 68 22.99

11 168 67 23.74

12 150 44 19.56

13 148 42 19.17

14 167 42 15.06

15 157 50 20.28

16 157 40 16.23

17 153 52 22.21

18 178 59 18.62

19 165 46 16.90

20 175 55 17.96

21 170 52 17.99

22 166 66 23.95

23 151 40 17.54

24 144 39 18.81

25 161 63 24.30

26 154 48 20.24

27 163 60 22.58

28 159 50 19.78

29 152 46 19.91

30 172 52 17.58

31 155 45 18.73

32 164 58 21.56

33 151 53 23.24

34 155 54 22.48

35 151 58 25.44

36 160 54 21.09

37 161 46 17.75

38 175 82 26.78

39 150 53 23.56

40 169 76 26.61

41 165 52 19.10

42 153 45 19.22
43 155 51 21.23

44 155 37 15.40

45 156 47 19.31

46 171 61 20.86

47 178 52 16.41

48 158 53 21.23

49 163 55 20.70

50 165 73 26.81

TABLE 4

BMI (18.5-24.9) Weight (kg)

18.56 51

18.72 47

18.73 42

18.8 45

18.9 41

19.14 44

19.38 39

19.68 46

19.98 40

20.17 40

20.83 49

20.93 51

21.48 43

24.12 47

18.62 50

18.73 40

18.81 52

19.1 59

19.17 46

19.22 55

19.31 52

19.56 66

19.78 40

19.82 39

19.91 63

20.24 48

20.28 60

20.43 50

20.55 46

20.7 52

20.86 45

21.09 58

21.23 53

21.23 54

21.56 58

22.21 54

22.48 46

22.58 82

22.99 53

23.24 76

23.56 52

23.74 45

23.88 51

23.95 37

24.09 47

24.3 61

TABLE: Students with ideal weight

Range = maximum value - minimum value. For these 46 students with ideal weight,
the range of weight is 82 kg - 37 kg = 45 kg, and the range of BMI is
24.30 - 18.56 = 5.74.

$$\text{Mean, } \bar{x} = \frac{\sum x}{N} = \frac{2315}{46} = 50.326 \text{ kg}$$

$$\text{Variance, } \sigma^2 = \frac{\sum x^2}{N} - \bar{x}^2 = \frac{120361}{46} - (50.326)^2 = 83.828$$

$$\text{Standard Deviation, } \sigma = \sqrt{83.828} = 9.156$$

Based on the findings, the mean weight of the students with ideal weight in both
categories is 50.326 kg. The standard deviation of the data is fairly small, at
9.156 kg. This shows that the data set is clustered about the mean, and indicates
that most of the students in both categories fall between weights of about 41 kg and
59 kg (within one standard deviation of the mean).
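A minimal check of these figures (my own addition), using the sums
$\sum x = 2315$ and $\sum x^2 = 120361$ reported above:

```python
import math

n, sum_x, sum_x2 = 46, 2315, 120361

mean = sum_x / n                     # 50.326 kg
variance = sum_x2 / n - mean ** 2    # 83.828, by the shortcut formula
sd = math.sqrt(variance)             # 9.156 kg
print(round(mean, 3), round(variance, 3), round(sd, 3))
```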

c. BMI Classification

Classification | Number of Lower Secondary BMI | Number of Upper Secondary BMI

Underweight (<18.5) 36 14

Normal (18.5-24.9) 14 32

Overweight (25-29.9) 0 4

Obesity (>30) 0 0

TABLE 5

STUDENTS (100)

LOWER SECONDARY STUDENTS (50): Underweight (36), Normal (14), Overweight (0), Obesity (0)

UPPER SECONDARY STUDENTS (50): Underweight (14), Normal (32), Overweight (4), Obesity (0)

Part 4

a. Probability that a randomly chosen student

i. has ideal weight:

$$P(\text{ideal weight}) = \frac{14 + 32}{100} = \frac{46}{100} = 0.46$$

ii. does not have ideal weight:

$$P(\text{not ideal weight}) = \frac{36 + 14 + 4}{100} = \frac{54}{100} = 0.54$$

b. If 3 students are chosen at random, the probability that

i. none of the students has ideal weight:

$$P(X = 0) = \binom{3}{0}(0.46)^0(0.54)^3 = 0.157464$$

ii. only 1 student has ideal weight:

$$P(X = 1) = \binom{3}{1}(0.46)^1(0.54)^2 = 0.402408$$

iii. only 2 students have ideal weight:

$$P(X = 2) = \binom{3}{2}(0.46)^2(0.54)^1 = 0.342792$$

iv. all 3 students have ideal weight:

$$P(X = 3) = \binom{3}{3}(0.46)^3(0.54)^0 = 0.097336$$
[Figure: bar chart of the binomial distribution above; probability (0 to 1) plotted
against the number of students with ideal weight (0 to 3).]
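These four values can be verified with a few lines of Python (my own addition), using
math.comb for the binomial coefficient:

```python
from math import comb

p, n = 0.46, 3   # P(ideal weight) and number of students chosen

for k in range(n + 1):
    prob = comb(n, k) * p**k * (1 - p)**(n - k)
    print(k, round(prob, 6))
# 0 0.157464
# 1 0.402408
# 2 0.342792
# 3 0.097336
```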

c. Mean, Variance, Standard Deviation of the binomial distribution (in my school,
with n = 2727 students)

$$\text{Mean, } \mu = np = (2727)(0.46) = 1254.42$$

$$\text{Variance, } \sigma^2 = npq = (2727)(0.46)(0.54) = 677.3868$$

$$\text{Standard Deviation, } \sigma = \sqrt{npq} = \sqrt{677.3868} = 26.0267$$
Conclusion

After completing the project, I have realized that the concepts of both statistics and
probability are of great importance in everyday life. Probability and statistics are
two closely related mathematical subjects. Both use much of the same terminology,
and there are many points of contact between the two. It is very common to see no
distinction made between probability concepts and statistical concepts.

Both statistical theory and probability theory are significant for us to record past
events and to make predictions about future events. In this project, statistical theory
enables us to measure and record the BMI of the students, and allows us to make
comparisons after recording all the data. On the other hand, probability theory enables
us to predict and make assumptions about the number of students who have ideal
weight. Such data are essential in order to figure out the proportion of students who
are obese in a school, or in any region. As a result, it is also consequential in the fight
against obesity among students.

I have improved my analytical skills by accomplishing this project. Through
collecting, tabulating, classifying and interpreting the information, I have definitely
gained a lot of insight into the concepts of statistics and probability.

Reflection

i. Use of statistics and probability in daily life

1. Planning Around the Weather


Nearly every day you use probability to plan around the weather. Meteorologists can't
predict exactly what the weather will be, so they use tools and instruments to
determine the likelihood that it will rain, snow or hail. For example, if there's a
60-percent chance of rain, then on 60 out of 100 days with similar weather conditions,
it has rained. You may decide to wear closed-toed shoes rather than sandals, or take an
umbrella to work. Meteorologists also examine historical databases to estimate high
and low temperatures and probable weather patterns for that day or week.
Furthermore, computer models are built using statistics that compare prior weather
conditions with current weather to predict future weather.

2. Sports Strategies
Athletes and coaches use probability to determine the best sport strategies for games
and competitions. A baseball coach evaluates a player's batting average when placing
him in the lineup. For example, a player with a .200 batting average has gotten a base
hit two out of every 10 at bats, while a player with a .400 batting average is even more
likely to get a hit: four base hits out of every 10 at bats. Likewise, if a high-school
football kicker makes nine out of 15 field goal attempts from over 40 yards during the
season, he has a 60 percent chance of scoring on his next field goal attempt from that
distance. The equation is: 9 ÷ 15 = 0.60, or 60%.

3. Insurance Options
Probability plays an important role in analysing insurance policies to determine which
plans are best for you or your family and what deductible amounts you need. For
example, when choosing a car insurance policy, you use probability to determine how
likely it is that you'll need to file a claim. If 12 out of every 100 drivers (12 percent of
drivers) in your community have hit a deer over the past year, you'll likely want to
consider comprehensive, not just liability, insurance on your car. You might also
consider a lower deductible if average car repairs after a deer-related incident run
$2,800 and you don't have out-of-pocket funds to cover those expenses.

4. Games and Recreational Activities


You use probability when you play board, card or video games that involve luck or
chance. You must weigh the odds of getting the cards you need in poker or the secret
weapons you need in a video game. The likelihood of getting those cards or tokens
will determine how much risk you're willing to take. For example, the odds are
46.3-to-1 that you'll get three of a kind in your poker hand (approximately a
2-percent chance), according to Wolfram MathWorld, but the odds are approximately
1.4-to-1, or about 42 percent, that you'll get one pair. Probability helps you assess
what's at stake and determine how you want to play the game.

5. Predicting Disease
News reports often quote statistics about a disease. If the reporter simply reports the
number of people who either have the disease or who have died from it, it's an
interesting fact, but it might not mean much for your life. But when statistics become
involved, you have a better idea of how that disease may affect you. For example,
studies have shown that 85 to 95 percent of lung cancers are smoking-related. That
statistic should tell you that almost all lung cancers are related to smoking, and that if
you want to have a good chance of avoiding lung cancer, you shouldn't smoke.

6. Medical Studies
Scientists must show a statistically valid rate of effectiveness before any drug can be
prescribed. Statistics are behind every medical study you hear about.

ii. Brochure
