
GEC 102: MATHEMATICS IN THE MODERN WORLD

STATISTICS
What is Statistics?
Statistics is used in almost every part of daily life, and it is especially useful in the
field of research.
Statistics is a field of mathematics that deals with the Collection, Organization,
Analysis, and Interpretation of quantitative data.

Basic Concepts in Statistics


A population is the group of individuals or entities about which the researcher wants
to gather information or draw conclusions.
A sample is a part of the population, that is, a sub-collection of elements drawn from
the population.
When we say Collection of data we mean the process of gathering relevant
information from the population. When we talk about Organization of data we refer to
the systematic arrangement of data into tables, graphs, or charts so that logical and
statistical conclusions can easily be derived from the collected information. Analysis of
data refers to the process of deducing relevant information from the given data so that
the numerical description can be formulated. Interpretation of data is all about deriving
conclusions from the data that have been analyzed. It also involves making predictions or
forecasts about large groups based on gathered data from small groups.

COLLECT: survey, test, interview, observe, experiment, register, search
ORGANIZE: tables, graphs, texts
ANALYZE: numerical analysis ("most", "how many percent", "least")
INTERPRET: give the meaning or implication of the findings, conclude

Two Fields of Statistics
Statistics may be subdivided into two fields: the Descriptive and the Inferential fields.
1. Descriptive Statistics consists of the collection, organization, summarization, and
presentation of data.
- Here, the statistician tries to describe a given situation.
2. Inferential Statistics is concerned with drawing conclusions about a large group, called
the population, based on selected elements of that population, known as a sample.
- Here, the statistician tries to make inferences from the sample to the population.

MEASURES OF CENTRAL TENDENCY


What is a measure of central tendency?
A measure of central tendency, or measure of central location, is a summary measure
that describes a whole set of data with a single quantity representing the middle or center
of its distribution, that is, the value around which the data cluster. In short, it is a
measure that tells where the center of a data set is located.
The most commonly used measures of central tendency are the mean, median, and
mode.

MEAN
The mean ($\bar{x}$), also called the “average”, “arithmetic average”, or “arithmetic
mean”, is the most commonly used measure of central tendency. It is said to be the most
reliable measure of central tendency and to have the least probable error, but it does not
supply information about the homogeneity of the distribution.
UNGROUPED DATA
A. Simple Mean
Getting the simple mean means that we are giving equal weight to each value in the data set.
To compute the mean of ungrouped data, we use the formula:

$$\bar{x} = \frac{\sum x}{n} \quad \text{or} \quad \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

where x represents each value/observation and n represents the total number of observations.
Examples:
1. The ages of five contestants in a Statistics Quiz Bee are the following: 18, 17, 18, 19, and
18. Find their average age.
Solution:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
Add all the values (ages):
$$\bar{x} = \frac{18 + 17 + 18 + 19 + 18}{5}$$
Then divide the sum by 5:
$$\bar{x} = \frac{90}{5} = 18$$
Therefore, the mean age of the contestants is 18.
2. Six employees are working as call center agents. Their salaries are as follows: Php 23500,
Php 24300, Php 25800, Php 23900, Php 24100, and Php 24950. What is the average salary of
the employees?
Solution:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
$$\bar{x} = \frac{23500 + 24300 + 25800 + 23900 + 24100 + 24950}{6} = \frac{146550}{6}$$
$$\bar{x} = 24425$$

Thus, the mean salary of the employees is Php 24425.
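
The simple mean can also be checked with a few lines of Python. This is a minimal sketch, not part of the original notes; the function name `simple_mean` is an illustrative assumption, and the data are taken from the two examples above.

```python
def simple_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Data from Example 1 (ages) and Example 2 (salaries in Php)
ages = [18, 17, 18, 19, 18]
salaries = [23500, 24300, 25800, 23900, 24100, 24950]

print(simple_mean(ages))      # 18.0
print(simple_mean(salaries))  # 24425.0
```

Every value contributes equally to the sum, which is what "giving equal weight to each value" means in practice.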

B. Weighted Mean
The weighted mean is a mean calculated by giving some values in a data set more influence
according to some attribute of the data. It is an average in which each quantity to be averaged
is assigned a weight, and these weights determine the relative importance of each quantity
in the average.
The formula for the weighted mean is:

$$WM = \frac{\sum wx}{\sum w}$$

where w is the weight of each value and x is the matching value.
Examples:
1. You take three 100-point exams in your Mathematics class and score 80, 80 and 95
respectively. The last exam is much easier than the first two, so your professor has given
it less weight. The weights for the three exams are:
- Exam 1: 40% of your grade
- Exam 2: 40% of your grade
- Exam 3: 20% of your grade
What is your final weighted average for the exam?
Solution:

$$WM = \frac{\sum wx}{\sum w}$$
Multiply the numbers in your data set by the weights:
$$WM = \frac{(0.4)(80) + (0.4)(80) + (0.2)(95)}{0.4 + 0.4 + 0.2}$$
Add the numbers:
$$WM = \frac{32 + 32 + 19}{0.4 + 0.4 + 0.2}$$
Divide:
$$WM = \frac{83}{1} = 83$$
Therefore, your final weighted average for the exams is 83.

2. At the end of the first semester, Mary received her grades as shown on the table below
with their respective weight or units. What is her general weighted average for the first
semester?
SUBJECTS      Grades   Units
Mathematics   2.0      3.0
English       1.75     3.0
Filipino      2.0      3.0
Science       2.5      3.0
Religion      1.25     3.0

Solution:
$$WM = \frac{(2.0)(3) + (1.75)(3) + (2.0)(3) + (2.5)(3) + (1.25)(3)}{3 + 3 + 3 + 3 + 3}$$
$$WM = \frac{6 + 5.25 + 6 + 7.5 + 3.75}{3 + 3 + 3 + 3 + 3}$$
$$WM = \frac{28.5}{15}$$
$$WM = 1.90$$
Therefore, the GWA of Mary is 1.90.
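
The formula WM = Σwx / Σw translates directly into code. The sketch below is illustrative only; the function name `weighted_mean` is an assumption, and the inputs are the exam scores and Mary's grades from the two examples above.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Example 1: three exams weighted 40%, 40%, and 20%
exam_average = weighted_mean([80, 80, 95], [0.4, 0.4, 0.2])

# Example 2: grades weighted by units
gwa = weighted_mean([2.0, 1.75, 2.0, 2.5, 1.25], [3, 3, 3, 3, 3])

print(round(exam_average, 2))  # 83.0
print(round(gwa, 2))           # 1.9
```

Because all of Mary's subjects carry the same number of units, her weighted mean equals the simple mean of her grades.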

MEDIAN
A median (Md) is defined as the middle value or observation in an organized list of
numbers and falls in the middle-most position of the whole data.

UNGROUPED DATA
The median of ungrouped data is determined by first arranging the numbers
in order of value, from lowest to highest or vice versa. If there is an odd number of
values, the median is the middle-most number, with the same number of
values below and above it. If there is an even number of values in the list, the
middle pair must be identified, added together, and divided by two to find the
median. The median can be used to determine an approximate average.
Examples:
1. The numbers of books loaned from the library on each day of the week were 36, 31,
24, 45, and 50. What is the median?
Solution:
a. First, arrange the data in ascending or descending order.
24, 31, 36, 45, 50
b. Next, divide the data set into two equal parts.
24, 31, [36], 45, 50
- Since there is an odd number of values in the data, we take the middle-most
value, which is 36, as the median of the data set.
Therefore, the median of the given data is 36.
2. The typing speeds per minute of ten stenographers are as follows:
Stenographer   1    2    3    4    5    6    7    8    9    10
Speed          121  110  120  119  112  121  118  115  107  115
Determine the median of the speeds of the stenographers.
Solution:
a. First, arrange the data in ascending or descending order.
107, 110, 112, 115, 115, 118, 119, 120, 121, 121
b. Next, divide the data set into two equal parts.
107, 110, 112, 115, [115, 118], 119, 120, 121, 121
- Since there is an even number of values in the data, we take the two middle-most
values, which are 115 and 118.
c. Get the average of the two values.
115 + 118 = 233
233/2 = 116.5

Therefore, the median speed of the stenographers is 116.5 per minute.
This implies that the stenographers whose speed is above the median are in the faster
half of the group, while those whose speed lies below the median are in the slower half.
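
The odd and even cases described above can be sketched in Python as follows. This is an illustrative example rather than part of the original notes; the function name `median` is an assumption, and the data are taken from the two examples.

```python
def median(values):
    """Return the middle value of the sorted data (mean of the middle pair if even)."""
    ordered = sorted(values)          # step a: arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                    # odd count: take the middle-most value
        return ordered[mid]
    # even count: average the two middle-most values
    return (ordered[mid - 1] + ordered[mid]) / 2


books = [36, 31, 24, 45, 50]
speeds = [121, 110, 120, 119, 112, 121, 118, 115, 107, 115]

print(median(books))   # 36
print(median(speeds))  # 116.5
```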

MODE (Mo)
The mode is the number/value/observation in a data set that appears the most number of
times. If no number in the list is repeated, then there is no mode for the list. However, it
is also possible to have more than one mode for the same distribution of data (bi-modal,
tri-modal, or multi-modal).

UNGROUPED DATA
To find the mode of ungrouped data, find the frequency of each
number/value/observation in the given data set. Then choose the
number/value/observation with the highest frequency as the mode.
MODE = number/value/observation with the highest frequency
Examples:
1. Find the mode of the given data set: 15, 28, 25, 48, 22, 43, 39, 44, 43, 49, 34, 22, 33,
27, 25, 22, 30.
Solution:
First, arrange the data set in ascending or descending order.
15, 22, 22, 22, 25, 25, 27, 28, 30, 33, 34, 39, 43, 43, 44, 48, 49
Next, determine the number that appears the most number of times.
- In the given data set, the number that appears the most number of times is
22, which occurs three times.
Therefore, the mode of the data set is 22.
2. The typing speeds per minute of ten stenographers are as follows:
Stenographer   1    2    3    4    5    6    7    8    9    10
Speed          121  110  120  119  112  121  118  115  107  115

Find the mode of the data set.

Solution:
First, arrange the data in ascending or descending order.
107, 110, 112, 115, 115, 118, 119, 120, 121, 121
Next, determine the number that appears the most number of times.
- In the given data, the numbers that appear the most number of times are 115
and 121, which occur twice each.

Thus, the data set has two modes: 115 and 121. The data set is said to be bi-modal.
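
Counting frequencies and picking the highest can be sketched in Python with `collections.Counter`. This is an illustrative example, not part of the original notes; the function name `modes` is an assumption, and returning an empty list when no value repeats follows the "no mode" rule stated above.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency, or [] if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)                 # frequency of each value
    highest = max(counts.values())
    if highest == 1:
        return []                            # no repeated value, so no mode
    return sorted(v for v, c in counts.items() if c == highest)


data = [15, 28, 25, 48, 22, 43, 39, 44, 43, 49, 34, 22, 33, 27, 25, 22, 30]
speeds = [121, 110, 120, 119, 112, 121, 118, 115, 107, 115]

print(modes(data))    # [22]
print(modes(speeds))  # [115, 121]
```

A result with two entries corresponds to a bi-modal data set, three entries to a tri-modal one, and so on.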
GROUPED DATA
Consider the following 40 observations:
17 25 30 33 25 45 19 23
27 35 45 48 20 38 18 39
44 22 46 26 36 29 21 15
50 47 34 26 37 25 49 33
22 33 44 38 46 41 32 37
Compute the mean, median, and mode.

MEAN

$$\bar{x} = \frac{\sum fX}{n}$$
Where:
f = frequency
X = class mark or midpoint
n = total number of observations
Frequency Distribution Table
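a. Compute the range. The range is the difference between the highest value and the
lowest value.
R = highest value - lowest value
R = 50 - 15
R = 35
b. Compute the class interval (ci), which is the range divided by the number of classes k
(log is the base-10 logarithm):
$$ci = \frac{\text{Range}}{k}$$
k = 1 + [(3.3) log(n)]
k = 1 + [(3.3) log(40)]
k = 1 + [(3.3)(1.602059991)]
k = 1 + 5.286797971
k = 6.286797971
k ≈ 6
$$ci = \frac{35}{6} = 5.8333 \approx 6$$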
FREQUENCY DISTRIBUTION TABLE
Class Interval   Tally         Frequency (f)   Midpoint (X)   fX
45-50            IIIII-III     8               47.5           380
39-44            IIII          4               41.5           166
33-38            IIIII-IIIII   10              35.5           355
27-32            IIII          4               29.5           118
21-26            IIIII-IIII    9               23.5           211.5
15-20            IIIII         5               17.5           87.5
TOTAL                          n = 40                         $\sum fX = 1318$
c. The midpoint of each class is the average of its class limits. For the class 45-50:
$$\text{Midpoint} = \frac{45 + 50}{2} = 47.5$$

$$\bar{x} = \frac{\sum fX}{n}$$
$$\bar{x} = \frac{1318}{40}$$
$$\bar{x} = 32.95$$
Thus, the mean of the grouped data is 32.95.
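
The steps above (range, number of classes, class interval, frequency table, grouped mean) can be reproduced with a short Python sketch. It is illustrative only: the variable names are assumptions, and it assumes, as in the table, that the first class starts at the lowest value and that k and ci are rounded to the nearest whole number.

```python
import math

data = [
    17, 25, 30, 33, 25, 45, 19, 23,
    27, 35, 45, 48, 20, 38, 18, 39,
    44, 22, 46, 26, 36, 29, 21, 15,
    50, 47, 34, 26, 37, 25, 49, 33,
    22, 33, 44, 38, 46, 41, 32, 37,
]

n = len(data)
value_range = max(data) - min(data)        # 50 - 15 = 35
k = round(1 + 3.3 * math.log10(n))         # number of classes: 6.29 rounds to 6
class_width = round(value_range / k)       # class interval: 5.83 rounds to 6

# Build the table from the lowest class upward: (lower, upper, frequency, midpoint)
table = []
for i in range(k):
    lower = min(data) + i * class_width
    upper = lower + class_width - 1
    frequency = sum(lower <= x <= upper for x in data)
    midpoint = (lower + upper) / 2
    table.append((lower, upper, frequency, midpoint))

sum_fx = sum(f * x for _, _, f, x in table)
grouped_mean = sum_fx / n

for lower, upper, frequency, midpoint in table:
    print(f"{lower}-{upper}: f = {frequency}, X = {midpoint}, fX = {frequency * midpoint}")
print(sum_fx, round(grouped_mean, 2))      # 1318.0 32.95
```

The table is printed from the lowest class to the highest, which is the reverse of the order used in the notes; the totals are the same either way.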

MEDIAN

$$Md = \text{LBMC} + \left(\frac{\frac{n}{2} - {<}CF_b}{f_m}\right) ci$$

Where:
n/2 = the value used to locate the median class, where n is the total number of observations
LBMC = lower boundary of the median class
<CFb = cumulative frequency of the class immediately below the median class
fm = frequency of the median class
ci = class interval

Class Interval   Frequency   Cumulative frequency (<CF)
45-50            8           40
39-44            4           32
33-38            10          28
27-32            4           18
21-26            9           14
15-20            5           5
TOTAL            n = 40
Note:
Determine the median class by computing the value of n/2. Locate this value in the <CF
column: the median class is the interval with the smallest <CF that is greater than or
equal to n/2.
$$\frac{n}{2} = \frac{40}{2} = 20$$
The value 20 is first reached in the class 33-38 (<CF = 28), so 33-38 is the median class,
with fm = 10 and <CFb = 18.

LBMC = 33 - 0.5 = 32.5
ci = 6
$$Md = \text{LBMC} + \left(\frac{\frac{n}{2} - {<}CF_b}{f_m}\right) ci$$
$$Md = 32.5 + \left(\frac{\frac{40}{2} - 18}{10}\right) 6$$
$$Md = 32.5 + \left(\frac{20 - 18}{10}\right) 6$$
$$Md = 32.5 + \left(\frac{2}{10}\right) 6$$
$$Md = 32.5 + (0.2)6$$
$$Md = 32.5 + 1.2$$
$$Md = 33.7$$

Therefore, the median is 33.7.
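
The grouped-median formula can be sketched in Python as shown below. This is an illustrative example under stated assumptions: the function name `grouped_median` is hypothetical, the classes are listed from lowest to highest, the class limits are whole numbers (so the lower boundary is the lower limit minus 0.5), and all classes have the same width.

```python
def grouped_median(classes, frequencies):
    """Md = LBMC + ((n/2 - <CFb) / fm) * ci for classes ordered lowest to highest."""
    n = sum(frequencies)
    half = n / 2
    class_width = classes[0][1] - classes[0][0] + 1   # e.g. 20 - 15 + 1 = 6
    cumulative = 0                                    # <CF of the classes below
    for (lower, _upper), f in zip(classes, frequencies):
        if f > 0 and cumulative + f >= half:          # first class whose <CF reaches n/2
            lower_boundary = lower - 0.5
            return lower_boundary + (half - cumulative) / f * class_width
        cumulative += f
    raise ValueError("the frequency table is empty")


classes = [(15, 20), (21, 26), (27, 32), (33, 38), (39, 44), (45, 50)]
frequencies = [5, 9, 4, 10, 4, 8]

print(round(grouped_median(classes, frequencies), 2))  # 33.7
```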


MODE
$$Mo = \text{LBMC} + \left(\frac{d_1}{d_1 + d_2}\right) ci$$
Where:
LBMC = lower boundary of the modal class
d1 = difference between the frequency of the modal class and the frequency of the class immediately below it
d2 = difference between the frequency of the modal class and the frequency of the class immediately above it
ci = class interval
Class Interval Frequency Cumulative frequency (<CF)
45-50 8 40
39-44 4 32
33-38 10 28
27-32 4 18
21-26 9 14
15-20 5 5
TOTAL n=40
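Note:
Identify the modal class by determining the interval with the highest frequency. Here the
modal class is 33-38, with a frequency of 10; the classes immediately below and above it
(27-32 and 39-44) each have a frequency of 4.
$$Mo = \text{LBMC} + \left(\frac{d_1}{d_1 + d_2}\right) ci$$
LBMC = 33 - 0.5 = 32.5
ci = 6
$$Mo = 32.5 + \left(\frac{10 - 4}{(10 - 4) + (10 - 4)}\right) 6$$
$$Mo = 32.5 + \left(\frac{6}{6 + 6}\right) 6$$
$$Mo = 32.5 + \left(\frac{6}{12}\right) 6$$
$$Mo = 32.5 + (0.5)6$$
$$Mo = 32.5 + 3$$
$$Mo = 35.5$$
Therefore, the mode of the data is 35.5.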
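
The grouped-mode formula can be sketched in the same way. This is an illustrative example, not the notes' own method: the function name `grouped_mode` is hypothetical, the classes are listed from lowest to highest with whole-number limits, and two choices that the notes do not address are assumptions made here (a missing neighbouring class is treated as frequency 0, and the first class with the highest frequency is taken as the modal class).

```python
def grouped_mode(classes, frequencies):
    """Mo = LBMC + (d1 / (d1 + d2)) * ci for classes ordered lowest to highest."""
    class_width = classes[0][1] - classes[0][0] + 1
    # Index of the modal class (assumption: the first one if several tie)
    m = max(range(len(frequencies)), key=lambda i: frequencies[i])
    f_modal = frequencies[m]
    # Assumption: a neighbouring class that does not exist has frequency 0
    f_below = frequencies[m - 1] if m > 0 else 0
    f_above = frequencies[m + 1] if m + 1 < len(frequencies) else 0
    d1 = f_modal - f_below
    d2 = f_modal - f_above
    if d1 + d2 == 0:
        raise ValueError("the modal class is not higher than its neighbours")
    lower_boundary = classes[m][0] - 0.5
    return lower_boundary + d1 / (d1 + d2) * class_width


classes = [(15, 20), (21, 26), (27, 32), (33, 38), (39, 44), (45, 50)]
frequencies = [5, 9, 4, 10, 4, 8]

print(grouped_mode(classes, frequencies))  # 35.5
```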
Exercises:
A.) Find the mean, median, and mode of the data set below:
157, 133, 232, 267, 289, 274, 321, 348, 188, 432

B.) Consider the frequency distribution below. Determine the mean, median, and mode of
the grouped data.
The heights of 40 Grade 6 pupils in a certain grade school are presented in the
frequency distribution below:
Height of a class of 40 students
Class Interval Frequency
78-82 2
73-77 6
68-72 6
63-67 8
58-62 7
53-57 7
48-52 4
n = 40
Measures of Variability
Variability refers to the spread of scores in a set of data.
The measures of variability describe numerically the extent to which individual
observations in a data set are scattered about the average.
There are four measures of variability: the range, the mean absolute deviation,
variance, and standard deviation.

FOUR MEASURES OF VARIABILITY


1. Range (R) - The range is the difference between the highest value and the lowest
value. It is the simplest measure of variability to calculate. This is often used when a
quick look at the spread of observations in a data set is needed based only on the extreme
values in the set. The formula is as follows:
R = highest value - lowest value
Examples:
a. Find the range of the following group of numbers: 10, 12, 5, 16, 7, 13, 4.
Solution:
- The highest number is 16, and the lowest number is 4, so 16 - 4 = 12.
Therefore, the range is 12.
b. A data set has 10 numbers: 199, 145, 123, 167, 145, 191, 182, 178, 162, 151. What is
the range?
Solution:
- The highest number is 199, and the lowest number is 123, so 199 - 123 = 76.
Therefore, the range is 76.
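
As a quick check of both examples, the range can be computed with Python's built-in `max` and `min`. The function name `value_range` is an illustrative assumption.

```python
def value_range(values):
    """Return the highest value minus the lowest value."""
    return max(values) - min(values)


print(value_range([10, 12, 5, 16, 7, 13, 4]))                             # 12
print(value_range([199, 145, 123, 167, 145, 191, 182, 178, 162, 151]))    # 76
```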
2. Mean Absolute Deviation (MAD) is the average distance between each observation
and the mean. It gives us an idea about the variability in a data set.
Steps in calculating the Mean Absolute Deviation (MAD)
Step 1: Calculate the mean.
Step 2: Calculate how far away each value/observation is from the mean using
positive distance. These are called absolute deviations.
Step 3: Add those deviations together.
Step 4: Divide the sum by the number of values/observation.
Ungrouped Data:
$$MAD = \frac{\sum |x - \bar{x}|}{n}$$
Where MAD = mean absolute deviation
x = raw score
$\bar{x}$ = mean score
n = number of observations

Grouped Data:
$$MAD = \frac{\sum f|X - \bar{x}|}{n}$$
Where MAD = mean absolute deviation
X = midpoint
f = frequency
$\bar{x}$ = mean score of grouped data
n = number of observations
3. Variance is the average of the squared deviation of the set of observations from the
mean. It measures how far a data set is spread out. A variance of zero indicates that all of
the data values are identical. A small variance indicates that the data points tend to be
very close to the mean, and to each other. A high variance indicates that the data points
are very spread out from the mean, and from one another.
Ungrouped Data
Sample Variance:
$$S_{n-1}^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Where $S_{n-1}^2$ = sample variance
x = raw score
$\bar{x}$ = sample mean
n = number of observations

Population Variance:
$$\sigma_N^2 = \frac{\sum (x - \mu)^2}{N}$$
Where $\sigma_N^2$ = population variance
x = raw score
$\mu$ = population mean
N = number of observations

Grouped Data
Sample Variance:
$$S_{n-1}^2 = \frac{n\sum fX^2 - \left(\sum fX\right)^2}{n(n - 1)}$$
Where $S_{n-1}^2$ = sample variance
f = frequency
X = class mark or midpoint
n = number of observations

Population Variance:
$$\sigma_N^2 = \frac{\sum f(X - \mu)^2}{N}$$
Where $\sigma_N^2$ = population variance
f = frequency
X = class mark or midpoint
$\mu$ = population mean
N = number of observations

Note: Population variance refers to the value of variance that is calculated from
population data, and sample variance is the variance calculated from sample data.

4. Standard Deviation is a measure of the dispersion of a set of data from its mean. It is
determined by calculating the positive square root of the variance. A large standard
deviation indicates that the data points are far from the mean (heterogeneous) and a
small standard deviation indicates that they are clustered closely around the mean
(homogeneous).
Ungrouped and Grouped Data
Sample Standard Deviation: $S_{n-1} = \sqrt{S_{n-1}^2}$

Population Standard Deviation: $\sigma_N = \sqrt{\sigma_N^2}$

- By comparing the value of 2i (i = class interval) with the computed standard
deviation s, we can determine the homogeneity of the given data. If 2i > s, then the
data is homogeneous. If 2i < s, then the data is heterogeneous.
Example (Ungrouped Data)

A group of mountaineers went hiking on Mt. Pulag, Philippines, to study the different
species of plants existing in that area. The ages of the mountaineers are 34, 35, 45, 46, 49,
and 32. Find the MAD, variance, and standard deviation of their ages.

Solution:

A. MAD
$$\text{Mean age } (\bar{x}) = \frac{34 + 35 + 45 + 46 + 49 + 32}{6} = \frac{241}{6} \approx 40.17$$

x     $x - \bar{x}$   $|x - \bar{x}|$
34    -6.17           6.17
35    -5.17           5.17
45    4.83            4.83
46    5.83            5.83
49    8.83            8.83
32    -8.17           8.17
                      $\sum |x - \bar{x}| = 39$

$$MAD = \frac{\sum |x - \bar{x}|}{n} = \frac{39}{6} = 6.5$$

Therefore, the mean absolute deviation is 6.5 years, or about 7 years when rounded to the
nearest whole year.


B. Sample Variance
($\bar{x} = 40.17$)

x     $x - \bar{x}$   $(x - \bar{x})^2$
34    -6.17           38.0689
35    -5.17           26.7289
45    4.83            23.3289
46    5.83            33.9889
49    8.83            77.9689
32    -8.17           66.7489
n = 6                 $\sum (x - \bar{x})^2 = 266.8334$

$$S_{n-1}^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{266.8334}{6 - 1} = \frac{266.8334}{5} = 53.36668 \approx 53.37$$

Thus, the sample variance is 53.37.

C. Standard Deviation
$$S_{n-1}^2 = 53.37$$

$$S_{n-1} = \sqrt{S_{n-1}^2} = \sqrt{53.37} = 7.305477397 \approx 7.31$$

Therefore, the standard deviation is 7.31.
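
The three results for the mountaineers' ages can be checked with the short Python sketch below. It is illustrative only; the variable names are assumptions. It works with the unrounded mean, so intermediate values differ slightly from the hand computation, but the rounded results agree.

```python
import math

ages = [34, 35, 45, 46, 49, 32]
n = len(ages)

mean = sum(ages) / n                                             # 241 / 6
mad = sum(abs(x - mean) for x in ages) / n                       # mean absolute deviation
sample_variance = sum((x - mean) ** 2 for x in ages) / (n - 1)   # divide by n - 1
sample_sd = math.sqrt(sample_variance)                           # positive square root

print(round(mean, 2))             # 40.17
print(round(mad, 2))              # 6.5
print(round(sample_variance, 2))  # 53.37
print(round(sample_sd, 2))        # 7.31
```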


Example (Grouped Data)
Given the frequency distribution table below, compute the MAD, sample variance, and
standard deviation.
Scores of 40 Students in a 60-point Quiz
Class Interval   Frequency
53-58            3
47-52            4
41-46            1
35-40            2
29-34            10
23-28            11
17-22            4
11-16            3
5-10             2
n = 40
A.) MAD
Class Interval   Frequency (f)   X      fX      $|X - \bar{x}|$   $f|X - \bar{x}|$
53-58            3               55.5   166.5   25.2              75.6
47-52            4               49.5   198     19.2              76.8
41-46            1               43.5   43.5    13.2              13.2
35-40            2               37.5   75      7.2               14.4
29-34            10              31.5   315     1.2               12
23-28            11              25.5   280.5   4.8               52.8
17-22            4               19.5   78      10.8              43.2
11-16            3               13.5   40.5    16.8              50.4
5-10             2               7.5    15      22.8              45.6
                 n = 40                 $\sum fX = 1212$          $\sum f|X - \bar{x}| = 384$

$$\bar{x} = \frac{\sum fX}{n} = \frac{1212}{40} = 30.3$$
$$MAD = \frac{\sum f|X - \bar{x}|}{n}$$
$$MAD = \frac{384}{40} = 9.6$$
Therefore, the mean absolute deviation of the scores of the students is 9.6.

B.) Sample Variance

C.I.     f    X      fX      $X^2$     $fX^2$
53-58    3    55.5   166.5   3080.25   9240.75
47-52    4    49.5   198     2450.25   9801
41-46    1    43.5   43.5    1892.25   1892.25
35-40    2    37.5   75      1406.25   2812.50
29-34    10   31.5   315     992.25    9922.50
23-28    11   25.5   280.5   650.25    7152.75
17-22    4    19.5   78      380.25    1521
11-16    3    13.5   40.5    182.25    546.75
5-10     2    7.5    15      56.25     112.50
         n = 40      $\sum fX = 1212$  $\sum fX^2 = 43002$

$$S_{n-1}^2 = \frac{n\sum fX^2 - \left(\sum fX\right)^2}{n(n - 1)}$$
$$S_{n-1}^2 = \frac{(40)(43002) - (1212)^2}{40(40 - 1)}$$
$$S_{n-1}^2 = \frac{1720080 - 1468944}{(40)(39)}$$
$$S_{n-1}^2 = \frac{251136}{1560}$$
$$S_{n-1}^2 = 160.9846154 \approx 160.98$$

Therefore, the sample variance is 160.98.


C.) Standard Deviation
$$S_{n-1} = \sqrt{S_{n-1}^2}$$
$$S_{n-1} = \sqrt{160.98}$$
$$S_{n-1} = 12.68778941 \approx 12.69$$

Therefore, the standard deviation is 12.69.

Is the data homogeneous or heterogeneous?

- Compare the value of 2i with the standard deviation (i = 6).
2i = 2(6) = 12
12 < 12.69

- Since 2i is less than the standard deviation, the data is heterogeneous.
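
The grouped-data computations for the quiz scores can be reproduced with the Python sketch below. It is illustrative only; the variable names are assumptions, the classes are listed from lowest to highest, and the homogeneity check applies the 2i rule stated in these notes.

```python
import math

classes = [(5, 10), (11, 16), (17, 22), (23, 28), (29, 34),
           (35, 40), (41, 46), (47, 52), (53, 58)]
frequencies = [2, 3, 4, 11, 10, 2, 1, 4, 3]

midpoints = [(lower + upper) / 2 for lower, upper in classes]   # class marks X
n = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x ** 2 for f, x in zip(frequencies, midpoints))

mean = sum_fx / n
mad = sum(f * abs(x - mean) for f, x in zip(frequencies, midpoints)) / n
sample_variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
sample_sd = math.sqrt(sample_variance)

class_width = 6                                  # i, the class interval
is_homogeneous = 2 * class_width > sample_sd     # the 2i rule from the notes

print(sum_fx, sum_fx2)              # 1212.0 43002.0
print(round(mean, 2))               # 30.3
print(round(mad, 2))                # 9.6
print(round(sample_variance, 2))    # 160.98
print(round(sample_sd, 2))          # 12.69
print(is_homogeneous)               # False
```

The standard deviation here is taken from the unrounded variance, which still rounds to 12.69; `False` corresponds to the "heterogeneous" conclusion above.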

Exercise:
Find the MAD, sample variance, and standard deviation of the grouped data below,
and determine whether the data is homogeneous or heterogeneous.
Weight of 50 women in a fitness club
Class Interval Frequency
129-136 2
121-128 7
113-120 6
105-112 5
97-104 10
89-96 12
81-88 8
n = 50
