STATISTICS IN RESEARCH
UNIVERSITY OF BATANGAS GRADUATE SCHOOL
MERCEDES A. MACARANDANG, ED.D.
Meaning of Statistics
Today, statistics is defined in three senses, namely: the singular, the plural, and the general meaning.
In its singular sense, the word statistics refers to the branch of mathematics which deals with the
systematic collection, tabulation, presentation, analysis, and interpretation of quantitative data
which are collected in a methodical manner without bias.
In its plural sense, statistics means a set of quantitative data or facts.
In the more general (common) usage, statistics has two meanings: First, it refers to numerical
facts.
The second meaning of statistics refers to the field or discipline of study. In this sense, the word
statistics is defined as “a group of methods that are used to collect, organize, present, analyze, and
interpret data to make decisions.”
Generally, statistics is divided into statistical methods and statistical theory or mathematical statistics.
Statistical methods refer to those procedures and techniques used in the collection, presentation,
analysis, and interpretation of quantitative data. Statistical theory or mathematical statistics, on the
other hand, deals with the development and exposition of the theories that constitute the bases of the
statistical methods.
Scope of Statistics
The use of statistics is spread through all fields, namely: fisheries, agriculture, commerce, trade and
industry, health, education, nursing, medicine, biology, economics, psychology, sociology, engineering,
chemistry, physics and many others. It is said that statistics is the “tool” of all sciences. It is called the
“language of research”.
In education, statistics is a vital tool in evaluating the achievements of students and the performance of
mentors, staff, and administrators. Statistical results serve as a basis for the promotion and retention of
students. Statistical treatment determines the effectiveness or ineffectiveness of instruction, research,
extension, and production.
Functions of Statistics
To provide investigators means of measuring scientifically the conditions that may be involved in
a given problem and assessing the way in which they are related.
To show the laws underlying facts and events that cannot be determined by individual
observations.
To show relations of cause and effect that otherwise may remain unknown.
To find out trends and behavior in related conditions which otherwise may remain ambiguous.
[Figure 1: The wheel of science — theory and observations linked in a continuous cycle]
Figure 1 graphically represents the role of statistics in the research process. The diagram is based on the
thinking of Walter Wallace and illustrates how the knowledge base of any scientific enterprise grows and
develops. One point the diagram makes is that scientific theory and research continually shape each
other. Statistics are one of the most important means by which research and theory interact.
Since the figure is circular, with no beginning or end, we could begin the discussion at any point.
Researchers observe phenomena and, in seeking to understand these phenomena, they develop
explanations. The explanation of any phenomenon is provided by a theory.
A hypothesis is a statement about the relationship between variables that, while logically derived
from the theory, is much more specific and exact.
Observations may come from different data gathering procedures like surveys, questionnaires,
experiments, etc.
Results of observations are analyzed and subjected to statistical procedures and then
conclusions are made which may either accept or reject the given hypothesis.
As statistical analysis comes to an end, we would move on to the next stage of the process. In this
phase, we would primarily be concerned with assessing our theory, but we would also look for other trends
in the data. As we developed tentative explanations, we might begin to revise or elaborate the theory. If
we change the theory to take these findings into account, however, a new research project designed to
test the revised theory is called for, and the wheel of science would begin to turn again.
In summary, statistics permit us to analyze data, to identify and probe trends and relationships, to
develop generalizations, and to revise and improve our theories. They are an indispensable part of
the research enterprise. Without statistics, the interaction between theory and research would become
extremely difficult, and the progress of our disciplines would be severely impeded.
Statistics is a set of tools used to organize and analyze data. Data must either be numeric in
origin or transformed by researchers into numbers. For instance, statistics could be used to analyze
percentage scores English students receive on a grammar test: the percentage scores ranging from 0 to
100 are already in numeric form. Statistics could also be used to analyze grades on an essay by
assigning numeric values to the letter grades, e.g., A=4, B=3, C=2, D=1, and F=0.
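As an illustrative sketch (not part of the original module), the letter-grade transformation described above can be carried out in Python; the A=4 through F=0 coding is the one given in the text, while the sample grades are hypothetical:

```python
# Transforming letter grades into numeric data so they can be analyzed
# statistically (coding taken from the text: A=4, B=3, C=2, D=1, F=0).
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def encode_grades(letter_grades):
    """Map each letter grade to its numeric value."""
    return [GRADE_POINTS[g] for g in letter_grades]

# Hypothetical essay grades for five students
essay_grades = ["A", "B", "B", "C", "F"]
numeric = encode_grades(essay_grades)
print(numeric)                       # [4, 3, 3, 2, 0]
print(sum(numeric) / len(numeric))   # mean grade point: 2.4
```

Once the grades are in numeric form, any of the statistical procedures discussed below can be applied to them.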
Employing statistics serves two purposes: (1) description and (2) prediction. Statistics are used to
describe the characteristics of groups. These characteristics are referred to as variables. Data is
gathered and recorded for each variable. Descriptive statistics can then be used to reveal the
distribution of the data in each variable.
Statistics is also frequently used for purposes of prediction. Prediction is based on the concept of
generalizability: if enough data is compiled about a particular context (e.g., students studying writing in a
specific set of classrooms), the patterns revealed through analysis of the data collected about that context
can be generalized (or predicted to occur in) similar contexts. The prediction of what will happen in a
similar context is probabilistic. That is, the researcher is not certain that the same things will happen in
other contexts; instead, the researcher can only reasonably expect that the same things will happen.
Prediction is a method employed by individuals throughout daily life. For instance, if writing students begin
class every day for the first half of the semester with a five-minute freewriting exercise, then they will likely
come to class the first day of the second half of the semester prepared to again freewrite for the first five
minutes of class. The students will have made a prediction about the class content based on their
previous experiences in the class: Because they began all previous class sessions with freewriting, it
would be probable that their next class session will begin the same way. Statistics is used to perform the
same function; the difference is that precise probabilities are determined in terms of the percentage
chance that an outcome will occur, complete with a range of error. Prediction is a primary goal of
inferential statistics.
Descriptive Statistics. The general function of statistics is to manipulate data so that the
original research question(s) can be answered. Depending on the research situation, the researcher can
call upon two general classes of statistical techniques to accomplish this task. The first class of
techniques is called descriptive statistics and is relevant (1) when the researcher needs to summarize or
describe the distribution of a single variable and (2) when the researcher wishes to understand the
relationship between two or more variables. If we are concerned with describing a single variable, our
goal is to arrange the values or scores of that variable so that the relevant information can be quickly
understood and appreciated. Percentages, graphs, and charts can all be used as single-variable
descriptive statistics. The process of allowing a few numbers to summarize many numbers is called data
reduction and is the basic goal of single-variable descriptive statistical procedures. Descriptive statistics
is thus devoted to the summarization and description of data sets; it includes the measures of central
tendency, measures of variability, and measures of correlation. It consists of methods for organizing,
displaying, and describing data by using tables, graphs, and summary measures.
The second type of descriptive statistics is designed to help the investigator understand the
relationship between two or more variables. These statistics, called measures of association or
correlation, allow the researcher to quantify the strength and direction of a relationship. They are
very useful because they enable us to investigate two matters of central theoretical and practical
importance to any science: causation and prediction. These techniques help us trace the ways in which
some variables might have causal influence on others and, depending on the strength of the relationship,
enable us to predict the scores on one variable from the scores on another.
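One widely used measure of association is the Pearson correlation coefficient. As a sketch only (the study-hours and test-score figures below are invented for illustration), it can be computed as:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient: its magnitude gives the strength,
    and its sign the direction, of the linear relationship between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-deviation
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))          # spread of x
    sy = math.sqrt(sum((b - my) ** 2 for b in y))          # spread of y
    return cov / (sx * sy)

# Hypothetical data: hours of study vs. test score for five students
hours = [1, 2, 3, 4, 5]
score = [50, 55, 65, 70, 80]
print(round(pearson_r(hours, score), 3))   # close to +1: strong positive relationship
```

A value near +1 or -1 indicates a strong relationship (and supports prediction of one variable from the other); a value near 0 indicates little or no linear relationship.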
Descriptive Statistics are used to describe the basic features of the data gathered from an
experimental study in various ways. They provide simple summaries about the sample and the measures.
Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data.
It is necessary to be familiar with primary methods of describing data in order to understand phenomena
and make intelligent decisions. Various techniques that are commonly used are classified as:
Graphical displays of the data in which graphs summarize the data or facilitate
comparisons.
Tabular description in which tables of numbers summarize the data.
In general, statistical data can be briefly described as a list of subjects or units and the data associated
with each of them. Although most research uses many data types for each unit, this introduction treats
only the simplest case.
1. To choose a statistic that shows how different units seem similar. Statistical textbooks call one
solution to this objective a measure of central tendency.
2. To choose another statistic that shows how they differ. This kind of statistic is often called a
measure of statistical variability.
When summarizing a quantity like length or weight or age, it is common to answer the first question
with the arithmetic mean, the median, or the mode. Sometimes, we choose specific values from the
cumulative distribution function called quantiles.
The most common measures of variability for quantitative data are the variance; its square root, the
standard deviation; the range; interquartile range; and the average absolute deviation (average
deviation).
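The common measures of central tendency and variability named above can be sketched with Python's standard `statistics` module; the ages used here are hypothetical:

```python
import statistics

ages = [18, 19, 19, 20, 21, 21, 21, 24, 30]   # hypothetical ages of 9 respondents

# Measures of central tendency
mean = statistics.mean(ages)
print(mean)                          # arithmetic mean
print(statistics.median(ages))       # middle score: 21
print(statistics.mode(ages))         # most frequent score: 21

# Measures of variability
print(max(ages) - min(ages))         # range: 12
print(statistics.pvariance(ages))    # population variance
print(statistics.pstdev(ages))       # population standard deviation

# Average absolute deviation from the mean
avg_dev = sum(abs(a - mean) for a in ages) / len(ages)
print(avg_dev)
```

Together, one measure of central tendency and one measure of variability give a compact two-number summary of the whole distribution.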
3. Summarize data
4. Present data
Inferential Statistics. The second class of statistical techniques becomes relevant when we
wish to generalize our findings from a sample to a population. It is concerned with making decisions
about a large body of data in the population of interest by using samples, and it consists of methods that
use sample results to help make predictions. A population is the total collection of all cases in which the
researcher is interested and wishes to understand. A population is usually too large to be measured, and
social scientists almost never have the resources or time to test every case in it. Hence the need for
inferential statistics, which involves using information from samples (carefully chosen subsets of the
defined population) to make inferences about populations. Samples are, of course, much cheaper to
assemble, and if proper techniques are followed, generalizations based on these samples can be very
accurate representations of the population.
One of the puzzling aspects of studying statistics is learning when to use which statistic. There
are guidelines which should be remembered. The first of these concerns discrete and continuous
variables; the second concerns level of measurement.
A variable is said to be discrete if its basic unit of measurement cannot be subdivided. The
measurement process for discrete variables involves accurately counting the number of units per case.
For example, the number of people per household is a discrete variable.
Levels of Measurement
Every statistical technique involves performing some mathematical operation, such as adding
scores or ranking cases. Before you can properly use a technique, the variable being processed must be
measured in a way that justifies the required mathematical operations.
The three levels of measurement are nominal, ordinal, and interval-ratio. All measurement
involves classification as a minimum. In nominal measurement, classification into categories is the only
measurement permitted. The categories are not numerical and can be compared to each other only in
terms of the number of cases classified in them, although numerical labels are at times used to identify
the categories of a variable measured at the nominal level. The only mathematical operation permissible
with nominal variables is counting the number of occurrences that have been classified into the various
categories.
The Interval-Ratio level of measurement. The categories of nominal-level variables have no
numerical quality to them. Ordinal-level variables have categories that can be arrayed along a scale from
high to low, but the exact distances between categories are unknown. Variables measured at the
interval-ratio level allow not only classification and ranking but also the distance from category to category
(score to score) to be exactly defined. Interval-ratio variables are measured in units that have equal
intervals and a true zero point. For example, measuring the ages of your respondents produces
interval-ratio data because the unit of measurement (years) has equal intervals (the distance from year to
year is 365 days) and a true zero point (it is possible to be zero years old). Other examples of
interval-ratio variables are income, number of children, weight, test scores, and years married.
In statistics, we always deal with data either from a population or from a sample.
Population refers to the totality of observations of the entire universe of people or factors. Examples: all
teachers in Metro Manila, all government employees in the Philippines, etc.
Sample refers to a subset of the total population. Example: selected teachers in Metro Manila, selected
employees in the Philippines.
Representative Sample is a sample that represents the characteristics of the population as closely as
possible.
Random Sample is a sample drawn in such a way that each element of the population has equal
chances of being selected.
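The idea of a random sample, in which every element has an equal chance of selection, can be sketched in Python; the population of labeled "teachers" below is entirely hypothetical:

```python
import random

# Hypothetical population of 500 teachers, each identified by a label
population = [f"Teacher {i:03d}" for i in range(1, 501)]

random.seed(42)   # fixed seed so the sketch is reproducible
# random.sample draws without replacement; every element of the
# population has an equal chance of being selected
sample = random.sample(population, k=50)

print(len(sample))        # 50 elements drawn
print(len(set(sample)))   # 50 distinct elements: no duplicates
```

Statistics computed on such a sample can then be used, via inferential techniques, to make statements about the whole population.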
An element or member of a sample or population is a specific subject or object (for example, a person, a
firm, an item, a state, or a country) about which information is collected.
Variable is a characteristic under study that assumes different values for different elements. In contrast
to a variable, the value of a constant is fixed.
Data are numbers or measurements collected as a result of observation, interview, questionnaire,
experimentation, testing, and so forth.
Types of Data
There are two general types of data: (1) numerical and (2) categorical data.
Numerical data are those that are expressed in numerical values, such as 5, 212, 5.34, etc. These are
classified into: discrete data and continuous data.
Discrete data are always expressed in whole numbers; they cannot be expressed in fractions
or decimals. Ex.: 12 brothers, 29 students.
Continuous data are those which can be expressed in decimals or fractions. Ex.: 5.36 ft., 70.526
lbs., 7½ meters.
Categorical data are classificatory data. They are not expressed in numerical values but are merely
labeled and classified into categories for statistical analysis.
Measurement refers to the assignment of numbers to observations made of objects or persons in such a
way that the numbers can be subjected to statistical analysis by manipulating or using the needed
operations according to mathematical rules of correspondence.
Variable refers to a factor, property, attribute, characteristic, or behavior that differentiates a group of
persons, a set of things, events, etc., which takes on two or more dimensions, categories, or levels with
descriptive or numerical values that can be measured qualitatively and/or quantitatively. Ex.: sex (male/
female), socio-economic status (high/middle/low), geographic location (urban/rural), etc.
Types of variables:
Independent variable refers to the factor, property, or attribute that is introduced, manipulated, or
treated to determine whether it influences or causes a change in the dependent variable. It is the
antecedent, cause, or stimulus that is introduced at the outset of the investigation. Ex.: a method of
teaching, a kind of fertilizer.
Dependent variable is the factor, property, characteristic or attribute that is measured and made
the object of analysis. It is the consequent, effect, criterion, response or output that is analyzed and
treated statistically during the investigation for purposes of hypothesis testing.
Quantitative variable is a variable which can be measured quantitatively. The data collected are
called quantitative data.
Qualitative or categorical variable is a variable which cannot assume a numerical value but can
be classified into two or more categories. The data collected are called qualitative data.
SCALES OF MEASUREMENT
Nominal Scale applies to data that are divided into different categories and these are used only
for identification purposes. Ex. Names of companies, cars, gender, marital status, etc.
Ordinal Scale applies to data that are divided into different categories that could be ranked.
Interval Scale applies to data that can be ranked and for which the difference between two
values can be calculated and interpreted.
Ratio Scale applies to data that can be ranked and for which all arithmetic operations (addition,
subtraction, multiplication, and division) can be done.
Exercise 1:
Ordinal variables
Ratio variables
Discrete variables
Continuous variables
Quantitative variables
Qualitative variables
4. Below are some items from a public-opinion survey. For each item, indicate the level of measurement
and whether the variable will be discrete or continuous.
c. If you were asked to use one of these four names for your social class, which would you say
you belonged in? ____ Upper _____ Middle _____ Working _____ Lower
h. The only way to deal with the drug problem is to legalize all drugs.
5. Read 3 theses using quantitative approach. Identify the research problems. Based on Chapter III of
these researches, identify the statistical measures used in each of the problems. Show how the findings
were presented.
MODULE 2 – BASIC DESCRIPTIVE STATISTICS: – Percentages, Ratios and Rates, Tables, Charts
and Graphs
Lesson Objectives: At the end of the lesson, the students shall be able to:
Introduction
Research results do not speak for themselves. They must be arranged in ways that allow the
researcher (and his or her readers) to comprehend their meaning quickly. The primary concern of
descriptive statistics is to present research results clearly and concisely. Researchers use a process
called data reduction to organize data into presentable form. Data reduction involves using a few
numbers, a table, or a graph to summarize or stand for a larger array of data.
Data reduction may lose important information such as precision and detail, so summarizing
statistics might present a misleading picture of research results. This can be minimized, if not totally
avoided, if the researcher weighs several decisions in choosing among summarizing techniques: how to
present the data, what kind of information to lose, and how much detail can safely be obscured.
In this lesson, we will consider several commonly used techniques for presenting research
results: percentages and proportions, ratios and rates, tables, charts and graphs.
Percentages and proportions supply a frame of reference for reporting research results in the
sense that they standardize the raw data: percentages to the base of 100 and proportions to the base of
1.00. The mathematical definitions of proportions and percentages are:

Proportion = f / N
Percentage (%) = (f / N) x 100

where f is the frequency (the number of cases in a category) and N is the total number of cases.
Example: Of the 80 graduates of Bachelor in Secondary Education, 70 took the Licensure Examination.
Out of this, 59 passed. What are the percentages and proportions of takers, and passers?
Percentage (%) of takers = (f/N) x 100 = (70/80) x 100 = (0.875) x 100 = 87.5%
Percentage (%) of passers = (f/N) x 100 = (59/70) x 100 = (0.843) x 100 = 84.3%
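As an illustrative sketch, the computation above can be expressed in Python using the figures from this example:

```python
def proportion(f, n):
    """Standardize a frequency f to the base of 1.00."""
    return f / n

def percentage(f, n):
    """Standardize a frequency f to the base of 100."""
    return 100 * f / n

graduates, takers, passers = 80, 70, 59
print(proportion(takers, graduates))             # 0.875
print(round(percentage(takers, graduates), 1))   # 87.5 (% of takers)
print(round(percentage(passers, takers), 1))     # 84.3 (% of passers among takers)
```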
Example: Given the data presented in the following tables, we will see the advantage of
presenting them in percentages.
Table 1.1 – DECLARED MAJOR FIELDS OF STUDY IN THE TWO PROGRAMS OF THE COLLEGE OF
EDUCATION
Major Field           Program 1    Program 2
English               46           39
Filipino              36           29
Mathematics           52           49
Physical Education    23           18
General Science       50           42
If frequencies are the only figures given in a set of data, making comparisons is difficult because
the total enrollments are different. To make comparisons easier, the difference in size can be
effectively eliminated by standardizing both distributions to the common base of 100, as shown in Table
1.2.
Table 1.2 – DECLARED MAJOR FIELDS OF STUDY IN THE TWO PROGRAMS OF THE COLLEGE OF
EDUCATION
The percentages in Table 1.2 make it easier to identify both differences and similarities between
the two programs.
2. Always report the number of observations along with proportions and percentages. This permits
the reader to judge the adequacy of the sample size and, conversely, helps to prevent the
researcher from lying with statistics.
Ratios and rates provide two additional ways in which the distribution of a variable can be simply
and dramatically summarized. Ratios are especially useful for comparing categories in terms of relative
frequency. Instead of standardizing the distribution of a variable to the base of 100 or 1.00, as we did in
computing percentages and proportions, we determine the ratios by dividing the frequency of one
category by the frequency in another. Mathematically, a ratio can be defined as:

Ratio = f1 / f2

where f1 is the frequency of cases in the first category and f2 is the frequency of cases in the second.
To illustrate the use of ratios, suppose you were interested in the relative sizes of male and
female students in the College of Education and found out that there are 225 female and 58 male
students in the college. To find the ratio of female students (f1) to male students (f2), divide 225 by 58.
The resultant ratio is 3.88. This number would mean that for every male student in the College of
Education, there are 3.88 female students.
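As a quick Python sketch of this computation (illustrative only, using the figures from the example):

```python
def ratio(f1, f2):
    """How many units of the first category there are per unit of the second."""
    return f1 / f2

female, male = 225, 58
print(round(ratio(female, male), 2))   # 3.88 female students per male student
```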
Note that ratios can be very economical ways of expressing the relative predominance of two
categories. In our example, the predominance of female students in the College of Education is obvious
from the raw data. Ratios are a precise measure of the relative frequency of one category per unit of the
other category. They tell us in an exact way the extent to which one category outnumbers the other.
Rates provide still another way of summarizing the distribution of a single variable. Rates are
defined as the number of actual occurrences of some phenomenon divided by the number of possible
occurrences per some unit of time. Rates are usually multiplied by some power of 10 to eliminate decimal
points. For example, the crude death rate for a population is defined as the number of deaths in that
population (actual occurrences) divided by the number of people in the population (possible occurrences)
per year. This quantity is then multiplied by 1000. The formula for the crude death rate can be expressed
as:

Crude death rate = (number of deaths / total population) x 1,000

If there were 100 deaths during a given year in a town of 7,000, the crude death rate for that year
would be:

Crude death rate = (100 / 7,000) x 1,000 = 14.29

That is, there were about 14.29 deaths for every 1,000 people in the town during that year.
By the same token, if a school with an enrolment of 8,700 experienced 120 dropouts during a
particular academic year, the dropout rate would be:

Dropout rate = (120 / 8,700) x 1,000 = 13.79

That is, for every 1,000 enrollees, 13.79 students stopped schooling during the academic year in
question.
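The rate computations in this lesson all share one pattern, which can be sketched in Python as follows (using the figures from the two examples above):

```python
def rate_per_1000(actual, possible):
    """Actual occurrences per 1,000 possible occurrences:
    (actual / possible) multiplied by 1,000 to clear the decimals."""
    return 1000 * actual / possible

print(round(rate_per_1000(100, 7000), 2))   # crude death rate: 14.29
print(round(rate_per_1000(120, 8700), 2))   # dropout rate: 13.79
```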
So far, we have considered three techniques (proportions and percentages, ratios, and rates) for
describing and summarizing data. All three express clearly and concisely the distribution of a single
variable; they represent different ways of presenting information so that it can be quickly appreciated.
Tabular Presentation
In many research activities data are gathered from different sources. These collected data
through various methods need to be organized. To give meaning to these raw data, appropriate tables
and graphs are used. In this lesson, we will consider tabular presentation through frequency distribution
and different methods of graphical presentation.
Frequency Distribution
Raw data can be tabulated or organized into a frequency distribution headed by a number and a
title. A frequency distribution is defined as the arrangement of the gathered data by categories together
with their corresponding frequencies and class marks or midpoints. It has a class frequency containing
the number of observations belonging to each class interval. Each class interval is a grouping defined by
its lower and upper limits; the true dividing points between adjacent intervals are called class boundaries.
Table 1
Enrolment in the College of Education During the Academic Year 2007 – 2008
Year Level       Enrolment
First Year       66
Second Year      62
Third Year       71
Fourth Year      87
Total            286
This is an example of a table presenting nominal data. The table consists of two columns: the
first pertains to the categories being presented, and the second to the frequencies of each category. In
this table the data in the nominal scale are labeled.
Frequency Distribution of Ordinal Data
Table 2
Strongly agree 58
Agree 45
Moderately agree 39
Disagree 26
Strongly disagree 20
Total 188
Table 2 presents an example of the tabular presentation of ordinal data. For ordinal data, the
distributions are scaled or graded so that the score values represent the degree of the particular
characteristic of the variable. It is for this reason that this type of data is always presented in order,
arranged from highest to lowest or vice versa.
A frequency distribution provides the classroom teacher with a systematic arrangement of raw
scores by tallying the frequency of occurrence of each score in the interval or in some instances score
values that have been grouped.
1. Arrange the scores from highest to lowest in a column headed X. The X represents the raw
score.
2. Head the second column Tallies and record a slash or tally mark for each score. If a score value
appears twice, this column will have two slashes; three occurrences give three slashes, and so on.
3. Count the slash marks and place the total number of tallies for each raw score value in a third
column headed f. The f column represents the frequency of occurrence of each score.
4. Sum the f column and record the number of scores (N) as a total.
32 39 40 25 29 35 39 28 41 29 37 30
27 32 29 29
41 / 1
40 / 1
39 // 2
37 / 1
35 / 1
32 // 2
30 / 1
29 //// 4
28 / 1
27 / 1
25 / 1
N=16
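The four steps above can be sketched in Python with `collections.Counter`, reproducing the tally table for the same sixteen raw scores:

```python
from collections import Counter

scores = [32, 39, 40, 25, 29, 35, 39, 28, 41, 29, 37, 30, 27, 32, 29, 29]

freq = Counter(scores)                 # steps 2-3: tally and count each score
for score in sorted(freq, reverse=True):           # step 1: X, highest to lowest
    print(score, "/" * freq[score], freq[score])   # tallies and f columns
print("N =", sum(freq.values()))                   # step 4: N = 16
```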
When the interval between the lowest and highest scores exceeds about 30 units, grouping
scores into intervals may aid in the analysis. Grouped data condenses the scores into a smaller number
of categories which may aid in interpretation of a large number of scores or a set of scores with a wide
range.
A grouped (class or step) frequency distribution is produced by placing scores in scaled groups
called classes or steps. A class/step is a group of a specified number of consecutive scores or measures.
The number of consecutive scores that a class/step contains is called the interval. The lower end-number
of the class is called the lower limit, and the upper end-number the upper limit.
1. Find the highest and the lowest scores. Get their difference (Range).
2. Determine the class interval by dividing the range by the number of steps or classes desired.
The ideal number of steps or classes ranges from 10 to 20, depending on the number of scores
or measures. There is no fixed rule, but the more scores there are, the more classes there
should be.
3. Determine the lowest limit. This is done by looking into the lowest score. The lowest score can
be the lowest limit, but it is advisable that the lower limits be exactly divisible by the desired
interval. If the lowest score is 40 and the interval is 3, the lowest limit will be 39. Forty is not
exactly divisible by 3, so look for the number which is nearest the lowest score and exactly
divisible by 3. That number is 39.
4. Determine the succeeding lower limits by adding the interval to the previous lower limit.
5. Determine the upper limits of each lower limit until reaching the highest score or including the
highest score.
6. Tally each raw score according to the interval in which it falls.
7. Get the frequencies of the tallies in each of the class or step intervals.
8. Find the sum of the frequencies (N).
Example:
47 32 58 37 24 28 55 38 35 44 49
Page 94
47 51 38 33 29 27 42 39 53 46 40
28 30 47 50 45 39 32 36 36 51 47
39 33 38 36 45 43 33 44 42 36 41
44 41 36 34
1. The highest score is 58 and the lowest score is 24. The range is 34.
2. To find the class interval, divide the range, 34, by 10 (the desired number of step or class
intervals). The answer is 3.4, so 3 is the step interval.
3. The lowest score is 24. Since 24 is exactly divisible by 3, it is the lowest limit.
4. The resulting frequency distribution would be:
Step Distribution
57 - 59 / 1
54 - 56 / 1
51 - 53 /// 3
48 - 50 // 2
45 - 47 /////-// 7
42 - 44 /////-/ 6
39 - 41 /////-/ 6
36 - 38 /////-//// 9
33 – 35 ///// 5
30 - 32 /// 3
27 - 29 //// 4
24 - 26 / 1
N= 48
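The grouped distribution above can be rebuilt programmatically. This sketch applies the same parameters (interval 3, lowest limit 24) to the 48 raw scores and prints each class with its tallies and frequency:

```python
from collections import Counter

scores = [47, 32, 58, 37, 24, 28, 55, 38, 35, 44, 49,
          47, 51, 38, 33, 29, 27, 42, 39, 53, 46, 40,
          28, 30, 47, 50, 45, 39, 32, 36, 36, 51, 47,
          39, 33, 38, 36, 45, 43, 33, 44, 42, 36, 41,
          44, 41, 36, 34]

interval = 3        # step 2: range 34 / 10 desired classes, rounded to 3
lowest_limit = 24   # step 3: lowest score, exactly divisible by 3

# Steps 6-7: assign each score to the lower limit of its class, then count
freq = Counter(lowest_limit + interval * ((s - lowest_limit) // interval)
               for s in scores)

for lower in sorted(freq, reverse=True):
    upper = lower + interval - 1
    print(f"{lower} - {upper}", "/" * freq[lower], freq[lower])
print("N =", sum(freq.values()))   # step 8: N = 48
```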
To work with the distribution of a variable as if it were continuous, statisticians use real class
limits. To find the real class limits of any class interval, begin with the limits stated in the frequency
distribution (the stated class limits): subtract 0.5 from the stated lower limit and add 0.5 to the stated
upper limit.
Note that, when conceptualized with real limits, the class intervals overlap with each other and the
distribution can be seen as continuous.
In addition to real limits, you will need to work with midpoints of the class intervals to construct
some types of graphs. Midpoints are defined as the points exactly halfway between the real upper and
real lower limits and can be found by dividing the sum of the real upper and lower limits by two.
Example:
Stated Limits    Real Limits      Midpoints
57 – 59          56.5 – 59.5      58
54 – 56          53.5 – 56.5      55
51 – 53          50.5 – 53.5      52
48 – 50          47.5 – 50.5      49
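The real-limit and midpoint rules can be sketched as two small Python functions (illustrative only):

```python
def real_limits(lower, upper):
    """Real (true) limits of a stated class interval:
    subtract 0.5 from the lower limit, add 0.5 to the upper limit."""
    return lower - 0.5, upper + 0.5

def midpoint(lower, upper):
    """Point exactly halfway between the real lower and real upper limits."""
    real_lower, real_upper = real_limits(lower, upper)
    return (real_lower + real_upper) / 2

print(real_limits(57, 59))   # (56.5, 59.5)
print(midpoint(57, 59))      # 58.0
print(midpoint(48, 50))      # 49.0
```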
Two commonly used adjuncts to the basic frequency distribution for interval-ratio data are the
cumulative frequency and percentage distributions. Their primary purpose is to allow the researcher (and
his or her audience) to tell at a glance how many cases fall below a given score or class interval in the
distribution.
To construct a cumulative frequency (cf) column, begin with the lowest class interval in the
distribution. The entry in the cf column for that interval will be the same as the number of cases in the
interval. For the next higher interval, the cf will be all the cases in that interval plus all the cases in the
first interval, and so on.
The percentage column is determined by dividing the frequency of each class interval by the total
number of cases and multiplying the quotient by 100.
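These two constructions can be sketched in Python, using the frequencies of the grouped distribution from the earlier example (listed from the lowest class to the highest):

```python
intervals = ["24-26", "27-29", "30-32", "33-35", "36-38", "39-41",
             "42-44", "45-47", "48-50", "51-53", "54-56", "57-59"]
f = [1, 4, 3, 5, 9, 6, 6, 7, 2, 3, 1, 1]   # frequencies, lowest class first
n = sum(f)

# Cumulative frequency: cases in each interval plus all cases below it
cf = []
running = 0
for count in f:
    running += count
    cf.append(running)

# Percentage: each frequency divided by N, times 100
pct = [round(100 * count / n, 2) for count in f]

for row in zip(intervals, f, cf, pct):
    print(*row)
```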
Class Interval   f    cf    cf (from top)   %
57 - 59          1    48    1               2.08
54 - 56          1    47    2               2.08
51 - 53          3    46    5               6.25
48 - 50          2    43    7               4.17
45 - 47          7    41    14              14.58
42 - 44          6    34    20              12.50
39 - 41          6    28    26              12.50
36 - 38          9    22    35              18.75
33 - 35          5    13    40              10.42
30 - 32          3    8     43              6.25
27 - 29          4    5     47              8.33
24 - 26          1    1     48              2.08
N = 48
Researchers frequently use charts and graphs to present their data in ways that are visually more
dramatic than frequency distributions. These devices are particularly useful for conveying an impression
of the overall shape of a distribution and for highlighting the clustering of cases in a particular range of
scores. The most common techniques are the pie and bar charts, histogram and line chart or frequency
polygon. The first two are appropriate for discrete variables at any level of measurement and the last two
[Pie chart of marital status: 50% married, 35% single, 15% divorced]

Status      Frequency   %
Single      10          50
Married     7           35
Divorced    3           15
N = 20                  100%
Bar Charts. Like pie charts, bar charts are relatively straightforward. Conventionally, the
categories of the variable are arrayed along the horizontal axis (abscissa) and frequencies or percentages
along the vertical axis (ordinate). For each category of the variable, construct or draw a rectangle of
constant width, with height corresponding to the number of cases in the category.
[Bar chart of the grouped scores: class intervals from 24 through 56 on the horizontal axis, frequencies from 0 to 10 on the vertical axis]
FIGURE 4: FREQUENCY POLYGON OF GROUPED SCORES
[Frequency polygon of the grouped scores: class intervals 24 – 26 through 57 – 59 on the horizontal axis, frequencies from 0 to 10 on the vertical axis]
Ranking of Scores
One way of arranging scores is by ranking. Rank is the position of an observation, score, or
individual in relation to the others in the group according to some characteristic such as magnitude,
quality, or importance.
Ranking is the process of determining the relative position of values, measures, or scores
according to some basis such as magnitude, worth, quality, importance, or chronology. It is an
arrangement of values or scores from the highest to the lowest.
The following scores were obtained from a 60-item test in Assessment of Learning administered to
36 students:
56 44 32 34 22 52 21 18 40 38
30 41 50 30 47 30 49 36 20 46
30 50 27 40 33 49 36 27 48 33
41 25 36 48 24 19
Procedure:
1. Arrange the scores in a descending order, that is, from the highest to the lowest, in a vertical
column X. Write each score as many times as it occurs.
2. Number the scores consecutively from the highest to the lowest under the symbol CN
(consecutive number).
3. Assign ranks under the symbol R. The rank of a score occurring once is the same as its
consecutive number. To find the ranks of scores occurring twice or more, find the average
of their consecutive numbers.
X CN R X CN R X CN R
56 1 1 40 13 13.5 30 25 24.5
52 2 2 40 14 13.5 30 26 24.5
50 3 3 38 15 15 27 27 27.5
49 4 4.5 36 16 17 27 28 27.5
49 5 4.5 36 17 17 25 29 29
48 6 6.5 36 18 17 24 30 30.5
48 7 6.5 34 19 19 24 31 30.5
47 8 8 33 20 20 21 32 32.5
46 9 9 32 21 21.5 21 33 32.5
44 10 10 32 22 21.5 20 34 34
41 11 11.5 30 23 24.5 19 35 35
41 12 11.5 30 24 24.5 18 36 36
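The tied-rank rule above (average of the consecutive numbers) can be sketched as a small Python function. The function name is my own, and the usage line takes the top scores from the table:

```python
# Sketch of the ranking procedure above: scores are ordered from highest
# to lowest, and tied scores share the average of their consecutive numbers.

def rank_scores(scores):
    """Return {score: rank}, ranking highest-to-lowest with ties averaged."""
    ordered = sorted(scores, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # consecutive numbers i+1 .. j receive their average as the rank
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks

r = rank_scores([56, 52, 50, 49, 49, 48, 48, 47, 46])
print(r[49])   # 4.5, the average of consecutive numbers 4 and 5
```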
Ranking is used to indicate the relative position of a pupil or student in a group to which he/she
belongs. By ranking test scores, it is possible to compare the achievement of a pupil with those of the
others in the same group. A report of a student’s rank is a very good indication of individual performance
compared to general group performance. Ranking, however, does not consider the extent of the difference
between successive test scores. From the ranks, the percentage of pupils that surpass a pupil, or that
are surpassed by him, can be determined. Ranks are generally well understood by students and parents.
1. The ranks of scores 56, 52, 50, 47, and 46 are their consecutive numbers, namely 1, 2, 3, 8 and 9.
These scores appear only once, so their consecutive numbers are their ranks.
2. The rank of score 49 is 4.5, that is, the average of 4 and 5 (4 added to 5, divided by 2).
3. The rank of 30 is 24.5,that is the average of numbers 23, 24, 25 and 26.
4. Score 33 has a rank of 20. Nineteen students, or 52.78 percent of the class, surpassed the
student who got this score. This student surpassed 16, or 44.44 percent, of his classmates.
EXERCISE 2
1. At St. Mercy College, the number of males and females in the various fields of study are as follows:

Field of Study     Males   Females
Humanities         117     83
Social Sciences    97      132
Natural Sciences   72      20
Business           156     139
Nursing            250     375
Education          48      239
Read each of the following problems carefully before constructing the fraction and solving for the
answer.
2. Fifty high school students completed a class to prepare them for the College Board. Their scores
are as follows:
A. Display this data in a frequency distribution with columns for frequencies and percentages.
51 42 33 66 43 44 42 51 54 60
46 38 45 21 33 42 57 38 48 26
56 54 37 27 31 33 35 38 64 44
55 32 45 51 52 46 40 59 27 46
51 54 61 58 58 57 52 49 36 45
B. What are the real limits and midpoints of each class interval?
C. Add columns to the table to display the percentage distribution, cumulative frequency and
cumulative percentages.
F.1. What are the ranks of scores 45, 38, 51, 27 and 60?
F.2. What % of the class surpassed the student whose score is 46?
F.3. What % of the class is surpassed by the student/s whose score is 54?
Objectives: At the end of the lesson the students shall be able to:
1. Define mean, median and mode;
2. Compute the mean, median and mode for ungrouped and grouped data;
Central tendency relates to a point in a distribution around which the scores tend to center. This
point can be used as the most representative value for a distribution of scores. A measure of central
tendency is helpful in showing where the average or typical score falls. The teacher can see how an
individual student performance relates to the average value or make comparisons about two or more
classes that took the same test.
The benefit of frequency distributions, graphs, and charts is that they summarize the
overall shape of a distribution of scores in a way that can be quickly comprehended.
However, it is often necessary to report more detailed information about the distribution.
Two kinds of statistics are useful; they are (1) measures of central tendency and (2)
measures of dispersion.
Three commonly used measures of central tendency are : the mode, median, and
mean.
These three summarize an entire distribution of scores by describing the most
common score (the mode), the middle case (the median), or the typical score of the cases
(the mean) of that distribution.
These statistics are powerful because they can reduce huge arrays of data to a single,
easily understood number.
The central purpose of descriptive statistics is to summarize or “reduce”
data.
Median
E.g., in the set of scores 61, 75, 80, 87, 93, the median is 80.
How to find the median: first, the cases must be placed in order from the highest to
the lowest score. Once this is done, find the central or middle case.
When the number of cases (N) is odd, the value of the median is unambiguous
because there will always be a single middle case; in this situation, the median is the
score of that middle case.
If the number is even, there will be two middle scores. The median will be the average
of the scores of the two middle cases.
Since the median requires that scores be ranked from high to low, it cannot be
calculated for variables measured at the nominal level.
The scores of nominal-level variables cannot be ordered: the scores are different from
each other but do not form a mathematical scale of any sort.
Therefore, the median can be found only for ordinal or interval-ratio data, but it is generally
more appropriate for the former (the ordinal).
The median is the most exact measure of central tendency. Extreme low or high scores do
not much affect the median. The value of the median depends on the number of scores,
not so much on the magnitude of the scores. If most of the scores are high, the median is
high; if most of the scores are low, the median is low.
Example: When the number of cases is odd, arrange the scores from highest to lowest or vice versa.
Write down all the scores; the median is the middlemost score.
20 21 19 19 18 22 23 16 15 22 21 18 25
Arranged: 25 23 22 22 21 21 20 19 19 18 18 16 15. The median is 20.
Example: When the number of cases is even.
37 40 35 24 19 38 27 36 18 20 39 28 22 32
The middlemost scores are 32 and 28. The average of these two numbers is 30. So the median
is 30 .
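The odd/even rules above can be sketched as a small Python helper (the function name is my own; the data are the document's two example sets):

```python
# Sketch: median of ungrouped scores, following the rules above
# (middle score when N is odd, average of the two middle scores when even).

def median(scores):
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                      # single middle case
    return (s[mid - 1] + s[mid]) / 2       # average of the two middle cases

odd = [20, 21, 19, 19, 18, 22, 23, 16, 15, 22, 21, 18, 25]
even = [37, 40, 35, 24, 19, 38, 27, 36, 18, 20, 39, 28, 22, 32]
print(median(odd))    # 20
print(median(even))   # 30.0
```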
X F
90 – 94 1
85 – 89 2
80 – 84 7
75 – 79 9
70 – 74 11
65 – 69 8
60 – 64 5
55 – 59 5
50 – 54 1
45 – 49 1
N = 50
Formula:

             (N/2 – F)
Mdn = LL + ___________ x i
                 f

Where:
LL = the real lower limit of the interval where the median lies
F = the partial sum (the cumulative frequency below that interval)
f = the frequency of the interval where the median lies
i = the interval
2. Find the values of the symbols:
a. N/2 = 50/2 = 25
b. F = Add the frequencies of the scores from the lower score end upward until reaching
half the sum but not exceeding it (1 + 1 + 5 + 5 + 8 = 20). Twenty (20) is the partial sum from
the lower limit. The median (25th score) lies in the step-interval 70 – 74, whose frequency
is 11.
c. The value of f is 11 ( the frequency of the interval where the median lies)
d. LL is 69.5 (the real lower limit of 70 – 74, the interval where the median lies)
e. i, the interval of the class limits, is 5.
3. Substitute the values for the symbols in the formula and solve.
             (25 – 20)
Mdn = 69.5 + _________ x 5
                11

    = 69.5 + (5/11) x 5
    = 69.5 + (0.4545) x 5
    = 69.5 + 2.27
Mdn = 71.77
             (N/2 – Fu)
Mdn = UL – ____________ x i
                 f

in which : UL = real upper limit of the interval where the median lies
Fu = Add the frequencies of the scores from the upper score end downward until
reaching half the sum but not exceeding it (1 + 2 + 7 + 9 = 19). Nineteen (19) is the partial sum from the
upper limit. The median (25th score) lies in the step-interval 70 – 74, whose frequency is 11.
The value of f is 11 (the frequency of the interval where the median lies).
UL is 74.5 (the real upper limit of 70 – 74, the interval where the median lies).
             (25 – 19)
Mdn = 74.5 – _________ x 5
                11

    = 74.5 – (6/11) x 5
    = 74.5 – 2.73
Mdn = 71.77
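Both interpolation formulas give 71.77. A Python sketch of the lower-limit version, using the distribution above (the function name and interval representation are my own):

```python
# Sketch: grouped median via Mdn = LL + ((N/2 - F) / f) * i, with
# intervals given as stated limits, lowest first; real lower limit
# is the stated lower limit minus 0.5.

def grouped_median(intervals, freqs, width):
    """intervals: list of (lower, upper) stated limits, lowest first."""
    n = sum(freqs)
    half = n / 2
    cum = 0
    for (lower, _), f in zip(intervals, freqs):
        if cum + f >= half:
            ll = lower - 0.5          # real lower limit of the median interval
            return ll + ((half - cum) / f) * width
        cum += f

intervals = [(45, 49), (50, 54), (55, 59), (60, 64), (65, 69),
             (70, 74), (75, 79), (80, 84), (85, 89), (90, 94)]
freqs = [1, 1, 5, 5, 8, 11, 9, 7, 2, 1]
print(round(grouped_median(intervals, freqs, 5), 2))   # 71.77
```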
Mean
The mean or the arithmetic mean is referred to as the average of scores or measures. It is
considered the best measure of central tendency due to the following qualities:
1. Each score contributes its proportionate share in computing the mean. The mean is
more stable than the median or the mode.
2. Since the mean means average, it is best understood and more widely used measure of
central tendency.
3. It is used as basis in computing other statistical measures like the average deviation,
standard deviation, coefficient of variability, coefficient of correlation, etc.
4. It is the arithmetic average of all the scores in the distribution.
5. It reports the average score of a distribution, and its calculation is straightforward
To compute the mean, add the scores and then divide by the number of scores.
Formula:
M = ∑X / N
The use of the mean is fully justified only when working with interval-ratio data.
∑(Xi – M)² = minimum: if the differences between the scores and the mean are squared
and then added, the resultant sum will be less than the sum of the squared differences
between the scores and any other point in the distribution.
Every score in the distribution affects the mean. The mode and median are not so
affected. This quality is both an advantage and a disadvantage. The mean utilizes all the
available information—every score in the distribution affects the mean. On the other hand,
when a distribution has a few extreme cases (very high or very low scores), the mean may
become very misleading as a measure of centrality.
The median and mean will be the same when a distribution is symmetrical (they share the same point).
When a distribution has some extremely high score (the positive skew), the mean will always
have a greater numerical value than the median.
If the distribution has some very low scores (a negative skew), the mean will be lower in value
than the median.
The relationships between medians and means also have a practical value; i.e. a quick
comparison of the median and mean will always tell you if a distribution is skewed and the
direction of the skew.
For the good and honest researcher, the selection of a measure of central tendency for a badly
skewed distribution will hinge on what he or she wishes to show and, in most cases, either both
statistics or the median alone should be reported.
Computation of the Mean from Ungrouped Data ( When the number of cases is less than 30)
68 70 56 45 60 54 63 48 35 29
45 63 36 49 36 55 47
∑X = 859
M = ∑X/N = 859/17 = 50.529 or 50.53
M = AM + (∑fd/N) x i
Where:
AM = assumed mean
∑fd = the algebraic sum of the products of the frequencies and their corresponding deviations
from the assumed mean
2.2 Assume a mean. The assumed mean can be in any part of the frequency distribution, but it is
advisable to get the midpoint of the class-interval at the middle of the distribution, that one with the
highest frequency.
2.3 Fill column D starting from the step where the assumed mean lies, assign this a 0 deviation.
From 0, number the steps upward 1,2, 3 4, and downward 1,2, 3, 4 etc. All deviations above the
assumed mean have positive signs and all deviations below the assumed mean have negative signs.
2.4 Multiply the frequency by the deviation for each step to get the fd column, and get the sum of fd.
This is the algebraic sum of the fd column.
2.5 Divide ∑fd by N, multiply the quotient by the class interval, (∑fd/N) x i, and add the result
to the assumed mean.
X f d fd
90 – 94 1 4 4
85 – 89 2 3 6
80 – 84 7 2 14
75 – 79 9 1 9 +33
70 – 74 11 0 0
65 – 69 8 -1 -8
60 – 64 5 -2 -10
55 – 59 5 -3 -15
50 – 54 1 -4 -4
45 – 49 1 -5 -5 -42
N = 50 ∑fd = -9
1. Assume a mean. Get the midpoint of the interval where the assumed mean lies.
AM = 72
2. Fill in Column d (deviation). The deviation is the spread of the score from a point of origin
3. Fill in Column fd . The sum of the positive values is +33 and that of the negative values is – 42.
The sum of fd is -9.
M = 72 + (-9/50) 5
M = 72 + (-0.18) 5
M = 72 + (-0.9)
M = 72 – 0.9
M = 71.10
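The coded (short) method above can be sketched in Python; the midpoints, frequencies, and assumed mean are the ones used in the table:

```python
# Sketch of the short (coded) method: deviations d are counted in whole
# steps from the interval holding the assumed mean, and
# M = AM + (sum(f*d) / N) * i.

midpoints = [92, 87, 82, 77, 72, 67, 62, 57, 52, 47]   # highest interval first
freqs     = [1, 2, 7, 9, 11, 8, 5, 5, 1, 1]
i = 5                    # class interval width
am_index = 4             # assume the mean at midpoint 72 (highest frequency)

# d = +1, +2, ... above the assumed mean; -1, -2, ... below it
d = [am_index - k for k in range(len(midpoints))]
sum_fd = sum(f * dev for f, dev in zip(freqs, d))
N = sum(freqs)

AM = midpoints[am_index]
M = AM + (sum_fd / N) * i
print(sum_fd, M)   # -9 71.1
```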
X         f    d    fd
90 – 94   1    5    5
85 – 89   2    4    8
80 – 84   7    3    21
75 – 79   9    2    18
70 – 74   11   1    11   +63
65 – 69   8    0    0
60 – 64   5    -1   -5
55 – 59   5    -2   -10
50 – 54   1    -3   -3
45 – 49   1    -4   -4   -22
N = 50              ∑fd = +41
Given:
AM = 67
∑fd = +41
i = 5
N = 50
M = 67 + (+41/50) 5
M = 67 + (0.82) 5
M = 67 + 4.1
M = 71.10
Another method of computing the mean is through the midpoint method. The formula is:
M = ∑fM / N
X f M fM
90 – 94 1 92 92
85 – 89 2 87 174
80 – 84 7 82 574
75 – 79 9 77 693
70 – 74 11 72 792
65 – 69 8 67 536
60 – 64 5 62 310
55 – 59 5 57 285
50 – 54 1 52 52
45 – 49 1 47 47
N = 50 ∑fM = 3555
Procedure: Find the midpoint (M) of each class interval, multiply each frequency by its midpoint,
and find the sum of the fM column. Divide this sum by N.
M = 3555/50 = 71.10
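A sketch of the midpoint method in Python, using the same table:

```python
# Sketch of the midpoint method above: M = sum(f * M) / N, where M is
# the midpoint of each class interval.

midpoints = [92, 87, 82, 77, 72, 67, 62, 57, 52, 47]
freqs     = [1, 2, 7, 9, 11, 8, 5, 5, 1, 1]

N = sum(freqs)                                         # 50
sum_fM = sum(f * m for f, m in zip(freqs, midpoints))  # 3555
print(sum_fM / N)   # 71.1
```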
THE MODE
The mode of any distribution is the value that occurs most frequently.
For example, in the set of scores 98, 92, 90, 90, 84, 64, the mode is 90 because it occurs
twice.
It is a simple statistic, most useful when there is a need for a “quick and easy” indicator
of central tendency and when working with nominal-level variables.
If a researcher desires to report only the most popular or common value of a distribution, or
if the variable under consideration is nominal, then the mode is the appropriate measure of
central tendency.
Limitations of the mode: (1) some distributions have no mode at all or so many modes
that the statistic loses all meaning. (2) With ordinal and interval-ratio data, the modal score
may not be central to the distribution as a whole. That is, most common does not
necessarily mean “typical” in the sense of identifying the center of the distribution.
Example: Among freshman major instruments at Soochow University in 1999, the single
largest category (the mode of the distribution) is those who major in piano.
Example:
Category Frequency
Male 20
Female 20
(This distribution has no single mode; both categories occur equally often.)
Example:
Score (% correct) Frequency
58 2
60 2
62 3
64 2
66 3
67 4
68 1
69 1
70 1
93 5
N=24
The mode of the distribution is 93, but this is not the majority of the scores. It is not appropriate for
the instructor to summarize this distribution by reporting only the modal score, because doing so would
not convey an accurate picture of the distribution as a whole.
Procedure: Arrange the scores from highest to lowest; the mode is the score that occurs most
frequently.
Data:
25 30 37 41 52 52 30 37 42 37
Arranged: 52 52 42 41 37 37 37 37 30 30 30 25
Mode = 37
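A quick Python sketch of finding the mode with the standard library, using the arranged scores above:

```python
# Sketch: the mode as the most frequent value, via collections.Counter.
from collections import Counter

scores = [52, 52, 42, 41, 37, 37, 37, 37, 30, 30, 30, 25]
counts = Counter(scores)
mode, freq = counts.most_common(1)[0]
print(mode, freq)   # 37 4
```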
Determining the Crude Mode from Grouped Scores (Frequency Distribution). The crude mode is
the midpoint of the interval with the highest frequency.
X F
90 – 94 1
85 – 89 2
80 – 84 7
75 – 79 9
70 – 74 11 Crude Mode = 72
65 – 69 8
60 – 64 5
55 – 59 5
50 – 54 1
45 – 49 1
N = 50
When a group of scores has two different scores with the same highest frequency, the group is
said to be bi-modal. If there are three different scores with the same highest frequency, the group is tri-
modal, four, quadri-modal, etc.
The mode may also be computed from the median and the mean with the formula:
Mo = 3Mdn – 2M
in which:
Mo = the mode
Mdn = the median
M = the mean
In the Frequency Distribution given above, where the median is 71.77 and the mean is 71.10, the mode
is:
Mo = 3(71.77) – 2(71.10)
   = 215.31 – 142.2
   = 73.11
The mode is merely the most typical value or the most frequent measure. It is computed when a quick
method of finding the most typical and approximate measure of central tendency is all that is needed.
Use the median when you want to report the central score: the median always lies at the exact center
of a distribution.
Use the mean when variables are measured at the interval-ratio level (except for highly skewed
distributions) or when you want to report the typical score: the mean is "the fulcrum that exactly
balances all of the scores."
The measures of location or point measures are the quartiles, deciles and percentiles. The
quartiles (Q1, Q2, and Q3) are points dividing the distribution into four equal parts. The deciles (D1,
D2, D3, . . . D9) are points which divide the total number of cases in a frequency distribution into ten
equal parts. The percentiles (P1, P2, P3, . . . P99) are points which divide the score distribution into one
hundred equal parts.
The procedure in finding the point measures is almost the same as that of the median.
Quartiles
The first quartile (Q1) is located at one-fourth of the number of cases, such that 25% of all the
cases lie at or below it and 75% at or above it.
The value of the third quartile corresponds to the value of the seventy-fifth percentile. Seventy-
five percent of all the cases lie at or below it and 25% lie at or above it.
The value of the second quartile is equal to the value of the median, such that 50% of all the
cases lie at or below it and 50% lie at or above it.
Formula:

            (N/4 – F)
Q1 = LL + ___________ x i
                f

Where:
LL = the real lower limit of the interval where Q1 lies
F = the partial sum (the cumulative frequency below that interval)
f = the frequency of the interval where Q1 lies
i = the interval
Finding Q1
X F CM
90 – 94 1 50
85 – 89 2 49
80 – 84 7 47
75 – 79 9 40
70 – 74 11 31
65 – 69 8 20
60 – 64 5 12
55 – 59 5 7
50 – 54 1 2
45 – 49 1 1
N = 50
Procedure:
1. Add Column CM in the Frequency Distribution. It stands for the cumulative frequencies; this is done
by adding the frequencies from the lower score end upward.
2. Find N/4: 50/4 = 12.5. The 12.5th score lies in the interval 65 – 69.
3. Determine the partial sum (F), that is, the sum of the frequencies from the bottom which approaches
12.5 (N/4) without exceeding it. In the given distribution, the partial sum (F) is 12.
(12.5 – 12)
Q1 = 64.5 + _______ x 5
8
(0.5)
Q1 = 64.5 + ___ x 5
8
Q1 = 64.5 + (0.06) x 5
Q1 = 64.5 + .30
Q1 = 64.80
Third Quartile
Formula:
(3N/4 - F)
Q3 = LL + ______ I
f
LL = 74.5
3N/4 = 37.5
F = 31
f = 9
i = 5
Q3 = 74.5 + (6.5/9) x 5
Q3 = 74.5 + (.72) x 5
Q3 = 74.5 + 3.6
Q3 = 78.1
Percentiles
Percentiles are points dividing the distribution into 100 equal parts.
Formula:

            (NPx – F)
Px = LL + ___________ x i
                f

where:
Px = the percentile desired; NPx = the percentile sum (N x the proportion desired); F =
the partial sum (the number of scores falling below the interval where the desired percentile
lies); f = the frequency of the interval where the desired percentile lies;
LL = the exact lower limit of the interval where the desired percentile lies; i =
the interval.
X F CM
90 – 94 1 50
85 – 89 2 49
80 – 84 7 47
75 – 79 9 40
70 – 74 11 31
65 – 69 8 20
60 – 64 5 12
55 – 59 5 7
50 – 54 1 2
45 – 49 1 1
N = 50
Procedure: Finding P20. NPx = 50 x .20 = 10; F = 1 + 1 + 5 = 7; P20 lies in the interval 60 – 64;
the real lower limit (LL) is 59.5; f = 5; and the interval is 5. Substituting in the formula:

             (10 – 7)              (3)
P20 = 59.5 + ________ x 5 = 59.5 + ___ x 5 = 59.5 + (.6) x 5 = 59.5 + 3.0 = 62.5
                5                   5
Finding P65: NPx = 50 x .65 = 32.5; F = 1 + 1 + 5 + 5 + 8 + 11 = 31; P65 lies in the interval 75 – 79;
the real lower limit (LL) is 74.5; the frequency of this interval (f) is 9; and the interval is 5. Substituting in
the formula:
(32.5 – 31 ) (1.5)
P65 = 74.5 + _________ x 5 = 74.5 + ____ x 5 = 74.5 + (.17) x 5 = 74.5 + .85 =75.35
9 9
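All the point measures share one interpolation formula, so a single Python sketch covers quartiles, deciles, and percentiles (the function name and interval representation are my own; results may differ from the text's in the last decimal because the text rounds intermediate quotients):

```python
# Sketch: a general point-measure routine for grouped data, following
# Px = LL + ((N*p - F) / f) * i.  Quartiles and deciles are special
# cases (Q1 = P25, D3 = P30, and so on).

def grouped_percentile(intervals, freqs, width, p):
    """intervals: (lower, upper) stated limits, lowest first; p in (0, 1)."""
    n = sum(freqs)
    target = n * p
    cum = 0
    for (lower, _), f in zip(intervals, freqs):
        if cum + f >= target:
            ll = lower - 0.5      # real lower limit of the interval
            return ll + ((target - cum) / f) * width
        cum += f

intervals = [(45, 49), (50, 54), (55, 59), (60, 64), (65, 69),
             (70, 74), (75, 79), (80, 84), (85, 89), (90, 94)]
freqs = [1, 1, 5, 5, 8, 11, 9, 7, 2, 1]

print(round(grouped_percentile(intervals, freqs, 5, 0.20), 2))  # 62.5  (P20)
print(round(grouped_percentile(intervals, freqs, 5, 0.65), 2))  # 75.33 (P65)
```

Here P65 comes out as 75.33 rather than the text's 75.35, because the text rounds 1.5/9 to .17 before multiplying by 5; likewise Q1 (p = 0.25) comes out as 64.81 versus the rounded 64.80.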
Deciles: the points that divide a distribution of scores into 10ths
Mean: the arithmetic average of the scores. M represents the mean of a sample, and μ
is the mean of a population.
Median (Md): the point in a distribution of scores above and below which exactly half of
the cases fall.
Mode: the most common value in a distribution or the largest category of a variable.
Percentile: a point below which a specific percentage of the cases fall.
Skew: the extent to which a distribution of scores has a few scores that are extremely
high (positive skew) or extremely low (negative skew).
EXERCISE 3
1. Differentiate the mean from the mode and median. Discuss their uses and importance.
2. Find the mean, median and mode of the following set of scores:
89 77 63 99 92 93 94 65 62 82 86 76
3. Find the mean, median and mode of the following set of scores in Philippine History:
82 43 72 74 69 68 67 87 86 73
85 75 65 60 35 57 52 59 40 42
61 57 70 50 45 68 62 49 69 58
61 65 60 81 63 48 54 46 54 44
67 66 49 58 67 60 60 68 58 62
Objectives: At the end of the lessons, the students shall be able to:
1. define variability, index of qualitative variation, range, standard deviation, average deviation
and quartile deviation;
2. describe skewness and kurtosis and their use in interpretation of test scores; and
3. define, compute, compare and explain the appropriate uses of standard scores and how to
make test scores comparable.
Introduction
The measures of central tendency represented by the mean, median and mode are valuable
statistical measures, but they describe only the typical score representing the whole distribution. They
describe only the tendency of the scores to pile up at or near the middle of the distribution. The measures
of variability or dispersion are important. They show the tendency of the scores to spread or scatter
above or below the center point of the distribution. They show how close or how far the scores are from
each other. These measures also show the homogeneity or heterogeneity of different sets of scores.
The lower the measure of variability, the more homogeneous the group; the higher the measure of
variability, the more heterogeneous the group.
What is Variability?
Variability refers to how "spread out" a group of scores is. To see what we mean by spread out,
consider graphs in Figure 1. These graphs represent the scores on two quizzes. The mean score for each
quiz is 7.0. Despite the equality of means, you can see that the distributions are quite different.
Specifically, the scores on Quiz 1 are more densely packed and those on Quiz 2 are more spread out.
The differences among students were much greater on Quiz 2 than on Quiz 1.
[Figure 1: Distributions of scores on Quiz 1 and Quiz 2, each with a mean of 7.0]
The terms variability, spread, and dispersion are synonyms, and refer to how spread out a
distribution is. Just as in the section on central tendency we discussed measures of the center of a
distribution of scores, in this chapter we will discuss measures of the variability of a distribution. There are
four frequently used measures of variability: the range, interquartile range, variance, and standard
deviation. In the next few paragraphs, we will look at each of these four measures of variability in more
detail.
Range
The range is the simplest measure of variability to calculate, and one you have probably encountered
many times in your life. The range is simply the highest score minus the lowest score. Let’s take a few
examples. What is the range of the following group of numbers: 10, 2, 5, 6, 7, 3, 4? Well, the highest
number is 10, and the lowest number is 2, so 10 - 2 = 8. The range is 8. Let’s take another example.
Here’s a dataset with 10 numbers: 99, 45, 23, 67, 45, 91, 82, 78, 62, 51. What is the range? The highest
number is 99 and the lowest number is 23, so 99 - 23 equals 76; the range is 76. It provides a quick
approximation of the spread of the scores, but it is not a dependable measure of variability because it is
calculated from only two values.
The index of qualitative variation (IQV) is essentially the ratio of the amount of variation actually
observed in a distribution of scores to the maximum variation that could exist in that distribution. The index varies from 0.00
(no variation) to 1.00 (maximum variation) and is used commonly with variables measured at the nominal
level. However, IQV can be used with any variable when scores have been grouped into a frequency
distribution.
Assume that a researcher is interested in comparing the racial heterogeneity of three small
groups of neighborhoods. By inspection, you see that Neighborhood A is the least heterogeneous of the
three. Neighborhood B is more heterogeneous than A, and Neighborhood C is the most heterogeneous of
the three. The computational formula for IQV is:

        k (N² – ∑f²)
IQV = ______________
        N² (k – 1)

where k is the number of categories, N is the total number of cases, and ∑f² is the sum of the squared
frequencies. For Neighborhood A, where all cases fall in a single category (with N = 90, ∑f² = N² = 8,100):

IQV = 3(8,100 – 8,100)/16,200
IQV = 3(0)/16,200
IQV = 0.00
Thus, the IQV, in a quantitative and precise way, substantiates our impressions. Neighborhood A exhibits
no variation on the variable “race”, Neighborhood B has substantial variation and Neighborhood C has the
maximum amount of variation.
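A Python sketch of the IQV formula; the two frequency lists are hypothetical illustrations of no variation and maximum variation for k = 3 categories and N = 90 (the 16,200 denominator above implies N = 90):

```python
# Sketch of the IQV: k * (N^2 - sum of squared frequencies) / (N^2 * (k - 1)).
# The frequency lists below are hypothetical, chosen so the first shows
# no variation and the second the maximum for k = 3, N = 90.

def iqv(freqs):
    k = len(freqs)
    n = sum(freqs)
    return k * (n**2 - sum(f**2 for f in freqs)) / (n**2 * (k - 1))

print(iqv([90, 0, 0]))     # 0.0  (all cases in one category)
print(iqv([30, 30, 30]))   # 1.0  (cases split evenly)
```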
Variance
Variability can also be defined in terms of how close the scores in the distribution are to the middle of the
distribution. Using the mean as the measure of the middle of the distribution, the variance is defined as
the average squared difference of the scores from the mean. The data from Quiz 1 are shown in Table 1.
The mean score is 7.0. Therefore, the column "Deviation from Mean" contains the score minus 7. The
column "Squared Deviation" is simply the previous column squared.
Scores   Deviation from Mean   Squared Deviation
9 2 4
9 2 4
9 2 4
8 1 1
8 1 1
8 1 1
8 1 1
7 0 0
7 0 0
7 0 0
7 0 0
7 0 0
6 -1 1
6 -1 1
6 -1 1
6 -1 1
6 -1 1
6 -1 1
5 -2 4
5 -2 4
Means:   7   0   1.5
One thing that is important to notice is that the mean deviation from the mean is 0. This will always
be the case. The mean of the squared deviations is 1.5. Therefore, the variance is 1.5. Analogous
calculations with Quiz 2 show that its variance is 6.7. The formula for the variance is:

σ² = ∑(X – μ)² / N

where σ² is the variance, μ is the mean, and N is the number of numbers. For Quiz 1, μ = 7 and N = 20.
If the variance in a sample is used to estimate the variance in a population, then the previous
formula underestimates the variance and the following formula should be used:

s² = ∑(X – M)² / (N – 1)

where s² is the estimate of the variance and M is the sample mean. Note that M is the mean of a sample
taken from a population with a mean of μ. Since, in practice, the variance is usually computed in a
sample, this formula is most often used. The simulation "estimating variance" illustrates the bias in the
formula with N in the denominator.
Let's take a concrete example. Assume the scores 1, 2, 4, and 5 were sampled from a larger
population. To estimate the variance in the population you would compute s 2 as follows:
M = (1 + 2 + 4 + 5)/4 = 12/4 = 3.
s² = [(1 – 3)² + (2 – 3)² + (4 – 3)² + (5 – 3)²]/(4 – 1) = (4 + 1 + 1 + 4)/3 = 10/3 = 3.33
There are alternate formulas that can be easier to use if you are doing your calculations with a hand
calculator:

σ² = [∑X² – (∑X)²/N] / N

and

s² = [∑X² – (∑X)²/N] / (N – 1)

For this example,
∑X² = 1² + 2² + 4² + 5² = 46
s² = (46 – 144/4)/3 = (46 – 36)/3 = 10/3 = 3.33
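The three variance formulas can be sketched in Python (function names are my own); the shortcut reproduces the hand-calculator result (46 – 36)/3:

```python
# Sketch: population variance (divide by N) versus the sample estimate
# (divide by N - 1), plus the computational shortcut shown above.

def pop_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_variance_shortcut(xs):
    n = len(xs)
    sum_x2 = sum(x * x for x in xs)              # sum of X squared
    return (sum_x2 - sum(xs) ** 2 / n) / (n - 1)

xs = [1, 2, 4, 5]
print(sample_variance(xs))            # ≈ 3.33
print(sample_variance_shortcut(xs))   # same value: (46 - 144/4) / 3
```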
Standard Deviation
The standard deviation is simply the square root of the variance. This makes the standard
deviations of the two quiz distributions 1.225 and 2.588. The standard deviation is an especially useful
measure of variability when the distribution is normal or approximately normal because the proportion of
the distribution within a given number of standard deviations from the mean can be calculated. For
example, 68% of the distribution is within one standard deviation of the mean and approximately 95% of
the distribution is within two standard deviations of the mean. Therefore, if you had a normal distribution
with a mean of 50 and a standard deviation of 10, then 68% of the distribution would be between 50 - 10
= 40 and 50 +10 =60. Similarly, about 95% of the distribution would be between 50 - 2 x 10 = 30 and 50 +
2 x 10 = 70. The symbol for the population standard deviation is σ; the symbol for an estimate computed
in a sample is s. Figure 2 shows two normal distributions. Both distributions have means of 50. The blue
distribution has a standard deviation of 5; the red distribution has a standard deviation of 10. For the blue
distribution, 68% of the distribution is between 45 and 55; for the red distribution, 68% is between 40 and
60.
Figure 2. Normal distributions with standard
deviations of 5 and 10.
Standard Deviation
The standard deviation is the square root of the mean of the squared deviations of all scores from the
mean. It is basically a measure of how far each score is from the mean. Since the standard deviation is
based on deviations from the mean, these two statistics are used together to give meaning to test scores.

SD = √[ ∑(X – M)² / N ]
Procedure:
1. List the scores in Column X and compute the mean.
2. Place Column X – M (deviations); get the values by subtracting the mean from each of the scores.
When a score is less than the mean, the negative sign precedes the difference between the raw
score and the mean.
3. Square each deviation.
4. Find the sum of the squared deviations and divide it by the number of cases.
5. Extract the square root of the quotient.
Example: Given this set of scores: 43, 41, 40, 38, 37, 33, 31, 29, 26, 24, 22
X    (X – X̄)    (X – X̄)²
43 7 49
41 5 25
40 4 16
38 2 4
37 1 1
33 -3 9
30 -6 36
29 -7 49
24 - 12 144
24 - 12 144
21 - 15 225
∑X = 360    ∑(X – X̄)² = 702
N = 10
X̄ = 36
SD = √(702/10)
SD = √70.2 = 8.38
The formula for standard deviation using the short method is:

SD = i √[ ∑fd²/N – (∑fd/N)² ]

i is the class interval
∑fd² is the sum of the products of the frequencies by the squared deviations of the scores from the
assumed mean
∑fd is the sum of the products of the frequencies by the deviations of the scores from the assumed
mean
Example:
X    f    d    fd    fd²
90 – 94 1 4 4 16
85 – 89 2 3 6 18
80 – 84 7 2 14 28
75 – 79 9 1 9 +33 9
70 – 74 11 0 0 0
65 – 69 8 -1 -8 8
60 – 64 5 -2 -10 20
55 – 59 5 -3 -15 45
50 – 54 1 -4 -4 16
45 – 49 1 -5 -5 -42 25
N = 50    ∑fd = -9    ∑fd² = 185
SD = 5 √[ 185/50 – (-9/50)² ]
SD = 5 √[ 3.7 – (-0.18)² ]
SD = 5 √( 3.7 – 0.0324 )
SD = 5 √3.6676
SD = 5 x 1.9150
SD = 9.575
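A Python sketch of the short-method SD, using the fd and fd² sums from the table above:

```python
# Sketch of the short method above:
# SD = i * sqrt(sum(f*d**2)/N - (sum(f*d)/N)**2)
import math

freqs = [1, 2, 7, 9, 11, 8, 5, 5, 1, 1]      # highest interval first
d     = [4, 3, 2, 1, 0, -1, -2, -3, -4, -5]  # step deviations from AM = 72
i = 5

N = sum(freqs)
sum_fd  = sum(f * dev for f, dev in zip(freqs, d))          # -9
sum_fd2 = sum(f * dev ** 2 for f, dev in zip(freqs, d))     # 185

sd = i * math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2)
print(round(sd, 3))   # 9.575
```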
The mean deviation is not very much used in statistical work. Nevertheless, there are times when
it becomes necessary to compute the mean or average deviation. The mean deviation is the average
of the absolute values of the differences between the mean and the raw scores.
Example:
X     |X – M|
43    7
41    5
40    4
38    2
37    1
33    3
30    6
29    7
24    12
24    12
21    15
∑X = 360    ∑|X – M| = 74
N = 10
X = 36
AD = 74/10 = 7.4
When using the statistics of percentiles, deciles, quartiles, or the median which are based on the
order of the scores, the standard deviation cannot be used as a measure of variability, since the
deviations used in calculating the standard deviation are based on the mean. The variability of a
distribution of scores can instead be described by using the two points Q3 and Q1. A measure of the
variability of the middle 50 percent of the scores is considered to be a good estimate, because extreme
scores or erratic spacing between scores in the upper 25 percent and lower 25 percent are excluded in
the computation. This is the quartile deviation, the value that is equal to half the distance from Q1 to Q3.
(Q3 – Q1)
Q = ______
2
Using the frequency distribution given earlier (N = 50), the computations above gave Q1 = 64.80 and
Q3 = 78.1. Thus:

    (Q3 – Q1)   (78.1 – 64.80)
Q = _________ = ______________ = 6.65
        2             2
Interquartile Range
The interquartile range (IQR) is the range of the middle 50% of the scores in a distribution. It is computed
as follows: IQR = Q3 – Q1.
The 75th percentile is sometimes called the upper hinge and the 25th percentile the lower hinge. Using
this terminology, the interquartile range is referred to as the H-spread.
A related measure of variability is called the semi-interquartile range. The semi-interquartile range is
defined simply as the interquartile range divided by 2. If a distribution is symmetric, the median plus or
minus the semi-interquartile range contains half the scores in the distribution.
The range is the quick measure of variability although it is the crudest measure. When the
median is used as the measure of central tendency, the quartile deviation is used as the measure of
variability in test interpretation. The quartile deviation, like the median, is unaffected by a few extreme
scores in a distribution. The most widely used measure of variability is the standard deviation, since it is
the most stable and varies less from one sample to another than other measures.
Characteristics/Properties of Distributions
To describe a frequency distribution by reporting its characteristics, a teacher will need to give at
least one measure of central tendency and at least one measure of variability. In addition to these two
values, further description requires information about the skewness and kurtosis of the distribution.
Skewness is the degree of symmetry of the scores. Kurtosis is the degree of peakedness or flatness of
the distribution curve.
Skewness refers to the degree of symmetry attached to the occurrence of the scores along the
score interval. When the scores tend to center around one point with those on both sides of that point
balancing each other, the distribution is said to have no skewness. If there are some scores in the
distribution that are so atypical of the group that the distribution becomes asymmetrical, then that
distribution is said to be skewed. If the atypical scores are above the measure of central tendency (in the
positive direction), the distribution is said to be positively skewed. Likewise, if the atypical scores are
below the measure of central tendency (in the negative direction), the distribution is said to be negatively
skewed.
Sk = 3(M – Md) / SD
where M is the mean, Md the median, and SD the standard deviation.
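The skewness formula above can be sketched in Python. The score list here is made up for illustration (it is not one of the module's data sets), and the population standard deviation is assumed.

```python
# Sketch: Pearson skewness Sk = 3(Mean - Median) / SD on a small
# hypothetical score list (not taken from the module's tables).
import statistics

scores = [2, 3, 3, 4, 10]            # hypothetical raw scores
mean = statistics.mean(scores)       # 4.4
median = statistics.median(scores)   # 3
sd = statistics.pstdev(scores)       # population standard deviation

sk = 3 * (mean - median) / sd        # positive => positively skewed
```

The single atypically high score (10) pulls the mean above the median, so Sk comes out positive, matching the text's description of a positively skewed distribution.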
Distributions also differ from each other in terms of how large or "fat" their tails are. Figure 11 shows two
distributions that differ in this respect. The upper distribution has relatively more scores in its tails; its
shape is called leptokurtic. The lower distribution has relatively fewer scores in its tails; its shape is called
platykurtic.
The characteristic of kurtosis is very closely related to the characteristic of variability. It can give
an indication of the degree of homogeneity of the group being tested in regard to the characteristic being
measured. If students tend to be much alike, the scores will generate a leptokurtic frequency polygon; if
students are very different, a platykurtic distribution is generated. A mesokurtic distribution is neither
platykurtic nor leptokurtic.
The kurtosis for the normal distribution is approximately 0.263. Hence if the Ku is greater than 0.263, the
distribution is most likely platykurtic, while if the Ku is less than 0.263, the distribution is most likely
leptokurtic (Garrett, 1973).
Ku = Q / (P90 – P10)
where Q is the quartile deviation and P90 and P10 are the 90th and 10th percentiles.
STANDARD SCORES
A standard score is one of many derived scores used in testing today. Derived scores are
valuable to the classroom teacher. Since scores from different tests are expressed on different scales, the teacher can make them
comparable by expressing them in the same scale. For norm-referenced tests, it is meaningful to
interpret classroom test scores by locating a student’s score with reference to the average for the class
and to describe the distance between the score and the average in terms of the spread of the scores in
the distribution.
Tristan’s raw score on an English achievement test was 50. In the same class of students Tristan
scored 70 on the Mathematics achievement test. To compare the raw score on one test with a raw
score on another test to obtain a total or average score is meaningless. The units are not comparable
because the tests may have different possible total scores, means, and standard deviations. By
converting raw scores on both tests to standard scores, the units become comparable, and can be
interpreted properly.
Using the deviation of a score from the mean (X – X̄) and the standard deviation (SD), a teacher
can build what is called a z-score.
z = (X – X̄) / SD
where:
z = the standard score
X = any raw score
X̄ = the mean
SD = the standard deviation
For example, the means and standard deviations for Tristan’s two test scores are as follows:
Test    Raw score    Mean    SD
English test    50    45    5.6
Mathematics test    70    75    7
Comparison can be made between the two scores because the scores were earned in the same group of
students. Substituting the formula:
z (English) = (50 – 45) / 5.6 = 0.89        z (Mathematics) = (70 – 75) / 7 = –0.71
The two scores of Tristan can now be compared. Even though he got a higher raw score in Mathematics than in
English, he actually did better in English, as shown by the higher value of the standard score in that subject.
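The comparison above can be sketched in a few lines of Python, using the figures from the example:

```python
# Sketch: converting Tristan's raw scores to z-scores so that scores
# from two different tests can be compared on a common scale.

def z_score(x, mean, sd):
    """Standard score: distance from the mean in SD units."""
    return (x - mean) / sd

z_english = z_score(50, 45, 5.6)   # about +0.89: above the class mean
z_math = z_score(70, 75, 7)        # about -0.71: below the class mean

# Although 70 > 50 as raw scores, the z-scores show Tristan did
# better in English relative to his classmates.
```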
EXERCISE 4
1. Find the standard deviation and average deviation of the following set of scores:
32 39 40 25 29 35 39 28 41 29 37 30
27 32 29 26
2. Find the standard deviation, quartile deviation, skewness and kurtosis. Illustrate your answer.
ci f
54 – 56 3
51 – 53 3
48 – 50 1
45 – 47 5
42 – 44 6
39 – 41 9
36 – 38 5
33 – 35 7
30 – 32 4
27 – 29 4
24 – 26 2
3. Vinn’s score in the midterm test in Statistics was 48 and 56 in the final test. The mean of the first test is 42 and the
standard deviation is 5. In the second test the mean is 60 and the standard deviation is 6. In which test did Vinn do
better?
Above average 15 11 10 13
Average 20 28 18 21
Below Average 6 10 12 16
Objectives: At the end of the lessons, the students shall be able to:
1. define correlation;
What is Correlation?
Correlation is a way to measure how associated or related two variables are. The researcher
looks at things that already exist and determines if and in what way those things are related to each other.
The purpose of doing correlations is to allow us to make a prediction about one variable based on what
we know about another variable.
Correlation is a measure of the relation between two or more variables. The measurement scales
used should be at least interval scales, but other correlation coefficients are available to handle other
types of data. Correlation coefficients can range from -1.00 to +1.00. The value of -1.00 represents a
perfect negative correlation while a value of +1.00 represents a perfect positive correlation. A value of
0.00 represents a lack of correlation.
For example, there is a correlation between income and education. We find that people with
higher income have more years of education. (You can also phrase it that people with more years of
education have higher income.) When we know there is a correlation between two variables, we can
make a prediction. If we know a group’s income, we can predict their years of education.
Positive correlation
In a positive correlation, as the values of one of the variables increase, the values of the second
variable also increase. Likewise, as the value of one of the variables decreases, the value of the other
variable also decreases. The example above of income and education is a positive correlation. People
with higher incomes also tend to have more years of education. People with fewer years of education
tend to have lower income.
1. SAT scores and college achievement—among college students, those with higher SAT scores also
have higher grades
2. Happiness and helpfulness—as people’s happiness level increases, so does their helpfulness
(conversely, as people’s happiness level decreases, so does their helpfulness)
This table shows some sample data. Each person reported income and years of education.
Participant    Income    Years of Education
#1 125,000 19
#2 100,000 20
#3 40,000 16
#4 35,000 16
#5 41,000 18
#6 29,000 12
#7 35,000 14
#8 24,000 12
#9 50,000 16
#10 60,000 17
We can make a graph, which is called a scatterplot. On the scatterplot below, each point represents one
person’s answers to questions about income and education. The line is the best fit to those points. All
positive correlations have a scatterplot that looks like this. The line will always go in that direction if the
correlation is positive.
Negative correlation
In a negative correlation, as the values of one of the variables increase, the values of the second variable
decrease. Likewise, as the value of one of the variables decreases, the value of the other variable
increases.
This is still a correlation. It is like an “inverse” correlation. The word “negative” is a label that shows the
direction of the correlation.
There is a negative correlation between TV viewing and class grades—students who spend more time
watching TV tend to have lower grades (or phrased as students with higher grades tend to spend less
time watching TV).
1. Education and years in jail—people who have more years of education tend to have fewer years in jail
(or phrased as people with more years in jail tend to have fewer years of education)
2. Crying and being held—among babies, those who are held more tend to cry less (or phrased as babies
who are held less tend to cry more)
We can also plot the grades and TV viewing data, shown in the table below. The scatterplot below shows
the sample data from the table. The line on the scatterplot shows what a negative correlation looks like.
Any negative correlation will have a line with that direction.
Participant    GPA    TV in hours per week
#1 3.1 14
#2 2.4 10
#3 2.0 20
#4 3.8 7
#5 2.2 25
#6 3.4 9
#7 2.9 15
#8 3.2 13
#9 3.7 4
#10 3.5 21
Strength
Correlations, whether positive or negative, range in their strength from weak to strong.
Positive correlations will be reported as a number between 0 and 1. A score of 0 means that there
is no correlation (the weakest measure). A score of 1 is a perfect positive correlation, which does not
really happen in the “real world.” As the correlation score gets closer to 1, it is getting stronger. So, a
correlation of .8 is stronger than .6; but .6 is stronger than .3.
The correlation of the sample data above (income and years of education) is .79.
Negative correlations will be reported as a number between 0 and -1. Again, a 0 means no
correlation at all. A score of –1 is a perfect negative correlation, which does not really happen. As the
correlation score gets close to -1, it is getting stronger. So, a correlation of -.7 is stronger than -.5; but -.5
is stronger than -.2.
Remember that the negative sign does not indicate anything about strength. It is a symbol to tell
you that the correlation is negative in direction. When judging the strength of a correlation, just look at the
number and ignore the sign.
The correlation of the sample data above (TV viewing and GPA) is -.63.
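The two correlations quoted above (.79 for income and education, -.63 for TV viewing and GPA) can be checked with a short Python sketch of the Pearson formula, using the two sample tables given earlier:

```python
# Sketch: Pearson r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)),
# applied to the two sample data sets shown in the tables above.
import math

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

income = [125000, 100000, 40000, 35000, 41000,
          29000, 35000, 24000, 50000, 60000]
education = [19, 20, 16, 16, 18, 12, 14, 12, 16, 17]
gpa = [3.1, 2.4, 2.0, 3.8, 2.2, 3.4, 2.9, 3.2, 3.7, 3.5]
tv = [14, 10, 20, 7, 25, 9, 15, 13, 4, 21]

r_income_edu = pearson_r(income, education)   # positive, about .79
r_gpa_tv = pearson_r(gpa, tv)                 # negative, about -.63
```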
Imagine reading four correlational studies, each reporting a different correlation score, and wanting to
decide which study had the strongest results.
In this example, -.8 is the strongest correlation. The negative sign means that its direction is negative.
Advantage
An advantage of the correlation method is that we can make predictions about things when we
know about correlations. If two variables are correlated, we can predict one based on the other. For
example, we know that SAT scores and college achievement are positively correlated. So when college
admission officials want to predict who is likely to succeed at their schools, they will choose students with
high SAT scores.
We know that years of education and years of jail time are negatively correlated. Prison officials
can predict that people who have spent more years in jail will need remedial education, not college
classes.
Disadvantage
The problem that most students have with the correlation method is remembering that correlation does
not measure cause. Take a minute and chant to yourself: Correlation is not Causation! Correlation is not
Causation!
We know that education and income are positively correlated. We do not know if one caused the
other. It might be that having more education causes a person to earn a higher income. It might be that
having a higher income allows a person to go to school more. It might also be some third variable.
A correlation tells us that the two variables are related, but we cannot say anything about whether
one caused the other. This method does not allow us to come to any conclusions about cause and effect.
Reminders:
Anybody who wants to interpret the result of the coefficient of correlation should be guided by the
following:
1. The relationship of two variables does not necessarily mean that one is the cause or the effect of the
other variable. It does not imply a cause-effect relationship.
2. When the computed r is high, it does not necessarily mean that one factor is strongly dependent on
the other. This is shown by the height and intelligence of people. Making a correlation here does not make any
sense at all.
On the other hand, when the computed r is small, it does not necessarily mean that one factor has
no dependence on the other factor. This may be applicable to I.Q. and grades in school. A low grade
would suggest that a student did not make use of his time in studying.
3. If there is a reason to believe that the two variables are related and the computed r is high, these two
variables can really be regarded as associated.
On the other hand, if the computed correlation is low (though the variables are theoretically related),
other factors might be responsible for such a small association.
4. Lastly, the correlation coefficient simply informs us that when two variables change together, there
may be a strong or weak relationship taking place.
Measures of Correlation
Correlation Coefficient, r :
The quantity r, called the linear correlation coefficient, measures the strength and the direction of
a linear relationship between two variables. The linear correlation
coefficient is sometimes referred to as the Pearson product moment correlation coefficient in honor of its
developer Karl Pearson.
The value of r is such that -1 ≤ r ≤ +1. The + and – signs are used for positive
linear correlations and negative linear correlations, respectively.
Positive correlation: If x and y have a strong positive linear correlation, r is close to +1. An r
value of exactly +1 indicates a perfect positive fit. Positive values indicate a relationship between x and
y variables such that as values for x increase, values for y also increase.
Negative correlation: If x and y have a strong negative linear correlation, r is close to -1. An r
value of exactly -1 indicates a perfect negative fit. Negative values
indicate a relationship between x and y such that as values for x increase, values
for y decrease.
No correlation: If there is no linear correlation or a weak linear correlation, r is
close to 0. A value near zero means that there is little or no linear relationship
between the two variables (a nonlinear relationship may still exist).
Note that r is a dimensionless quantity; that is, it does not depend on the units
employed.
A perfect correlation of ±1 occurs only when the data points all lie exactly on a
straight line. If r = +1, the slope of this line is positive. If r = -1, the slope of this line is negative.
A correlation greater than 0.8 is generally described as strong, whereas a correlation less than
0.5 is generally described as weak. These values can vary based upon the "type" of data being
examined. A study utilizing scientific data may require a stronger correlation than a study using social
science data.
Interpreting Pearson's r
There are other, more sound ways of judging the meaningfulness of a correlation, such as
hypothesis testing.
The coefficient of determination, r², is useful because it gives the proportion of the variance
(fluctuation) of one variable that is predictable from the other variable. It is a measure that allows us to
determine how certain one can be in making predictions from a certain model/graph.
The coefficient of determination is the ratio of the explained variation to the total
variation.
The coefficient of determination is such that 0 ≤ r² ≤ 1, and denotes the strength
of the linear association between x and y.
The coefficient of determination represents the percent of the data that is closest to the line of
best fit. For example, if r = 0.922, then r² = 0.850, which means that 85% of the total variation in y
can be explained by the linear relationship between x and y (as described by the regression
equation). The other 15% of the total variation in y remains unexplained.
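The r² arithmetic in the example above is a one-liner; this small sketch simply repeats it:

```python
# Sketch: coefficient of determination r^2 from the r = 0.922 example.
r = 0.922
r_squared = r ** 2                     # proportion of variance in y explained by x
explained_pct = r_squared * 100        # about 85%
unexplained_pct = 100 - explained_pct  # about 15% remains unexplained
```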
Spearman Rank Correlation Coefficient
The Spearman rank correlation coefficient is computed from the ranked data as:
rs = 1 – (6∑d²) / (n(n² – 1))
Table 1 below gives the ages of ten cars (A–J) and their stopping distances:
Car    Age    Stopping distance
A 9 28.4
B 15 29.3
C 24 37.6
D 30 36.2
E 38 36.5
F 46 35.3
G 53 36.2
H 60 44.1
I 64 44.8
J 76 47.2
These figures form the basis for the scatter diagram, below, which shows a reasonably strong positive
correlation - the older the car, the longer the stopping distance.
Graph 1: Car age and Stopping distance (data from Table 1 above)
To process this information we must, firstly, place the ten pieces of data into order, or rank them
according to their age and ability to stop. It is then possible to process these ranks.
Car    Age    Stopping distance    Age rank    Stopping rank
A 9 28.4 1 1
B 15 29.3 2 2
C 24 37.6 3 7
D 30 36.2 4 4.5
E 38 36.5 5 6
F 46 35.3 6 3
G 53 36.2 7 4.5
H 60 44.1 8 8
I 64 44.8 9 9
J 76 47.2 10 10
Notice that the ranking is done here in such a way that the youngest car and the best stopping
performance are rated top and vice versa. There is no strict rule here other than the need to be consistent
in your rankings. Notice also that there were two values the same in terms of the stopping performance of
the cars tested. They occupy 'tied ranks' and must share, in this case, ranks 4 and 5. This means they are
each ranked as 4.5, which is the mean of the two ranking places. It is important to remember that
this works regardless of the number of items sharing tied ranks. For instance, if five items shared ranks 5, 6, 7,
8 and 9, then they would each be ranked 7 - the mean of the tied ranks.
Now we can start to process these ranks to produce the following table:
Car    Age    Stopping distance    Age rank    Stopping rank    d    d²
A 9 28.4 1 1 0 0
B 15 29.3 2 2 0 0
C 24 37.6 3 7 4 16
D 30 36.2 4 4.5 0.5 0.25
E 38 36.5 5 6 1 1
F 46 35.3 6 3 -3 9
G 53 36.2 7 4.5 -2.5 6.25
H 60 44.1 8 8 0 0
I 64 44.8 9 9 0 0
J 76 47.2 10 10 0 0
∑d² = 32.5
Note that the two extra columns introduced into the new table are Column 6, 'd', the difference between
stopping distance rank and age rank; and Column 7, 'd²', the Column 6 entries squared. These squared
figures are summed at the foot of Column 7.
r = 1 – (6∑d²) / (n(n² – 1)) = 1 – (6 × 32.5) / (10 × (10² – 1)) = 1 – 195/990 = 0.80
What does this tell us? When interpreting the Spearman rank correlation coefficient, it is usually enough
to say that the closer the value is to +1 or –1, the stronger the relationship. This is the case whether r is
positive or negative. In our case of car ages and stopping distance
performance, we can say that there is a strong positive correlation between the two variables.
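The whole Spearman computation, including the mean-rank treatment of ties, can be sketched in Python on the car data above:

```python
# Sketch: Spearman rs = 1 - 6*sum(d^2) / (n(n^2 - 1)) for the car age /
# stopping distance data, with tied values sharing the mean rank.

def ranks(values):
    """Rank from smallest (1) to largest; ties share the mean rank."""
    result = []
    for v in values:
        smaller = sum(1 for w in values if w < v)
        ties = sum(1 for w in values if w == v)
        # mean of the rank positions smaller+1 .. smaller+ties
        result.append(smaller + (ties + 1) / 2)
    return result

age = [9, 15, 24, 30, 38, 46, 53, 60, 64, 76]
distance = [28.4, 29.3, 37.6, 36.2, 36.5, 35.3, 36.2, 44.1, 44.8, 47.2]

n = len(age)
d2 = sum((ra - rd) ** 2
         for ra, rd in zip(ranks(age), ranks(distance)))   # 32.5 here
rs = 1 - 6 * d2 / (n * (n ** 2 - 1))                       # about 0.80
```

The two 36.2 stopping distances both receive the mean rank 4.5, reproducing the tied ranks shown in the table.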
Gamma
An alternative to rank-order correlation is Goodman and Kruskal’s gamma (G). The value of
one variable can be estimated or predicted from the other variable when you have the knowledge of their
values. The gamma can also be used when ties are found in the ranking of the data.
G = (Ns – N1) / (Ns + N1)
Where:
G = the difference between the proportion of pairs ordered in the parallel direction and the
proportion of pairs ordered in the opposite direction
Ns = the number of pairs ordered in the parallel direction
N1 = the number of pairs ordered in the opposite direction
1. Arrange the ordering for one of the two characteristics from the highest to the lowest (or
vice versa) from top to bottom through the rows, and for the other characteristic from the
highest to the lowest (or vice versa) from left to right through the columns.
2. Compute Ns by multiplying the frequency in every cell by the sum of the frequencies in
all of the other cells which are both to the right of and below the original cell, and then
summing the products obtained.
3. To solve for N1, you simply reverse part of the process described in Step 2: multiply
the frequency of every cell by the sum of the frequencies in all the cells to the left of and
below the original cell, and then sum the products obtained.
Example: Compute the gamma for the data shown in the following table:
Socio-Economic Status    Educational Status (ordered high → low)    Total
Upper 24 19 5 48
Middle 12 54 29 95
Lower 9 26 25 60
Total 45 99 59 203
Solution:
Ns = 24(54 + 29 + 26 + 25) + 19(29 + 25) + 12(26 + 25) + 54(25) = 6204
N1 = 19(12 + 9) + 5(12 + 54 + 9 + 26) + 54(9) + 29(9 + 26) = 2405
G = (Ns – N1) / (Ns + N1)
  = (6204 – 2405) / (6204 + 2405)
  = 3799 / 8609 = .44
A gamma coefficient of +.44 indicates a moderately small positive correlation between socio-
economic status and educational qualification. The result suggests a correlation based on a dominance
of the parallel direction of the two variables. This means that there is a 44 percent greater chance of a
parallel direction than of an opposite direction for the variables of socio-economic status and educational
status. If the gamma coefficient were -.44, it would instead indicate a moderately small negative correlation
based on a dominance of the opposite direction.
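The counting in Steps 2 and 3 can be sketched in Python for the same 3x3 table, reproducing Ns = 6204, N1 = 2405 and G ≈ .44:

```python
# Sketch: Goodman and Kruskal's gamma for the 3x3 status table above.
# Rows and columns are both ordered highest -> lowest, as in the text.

def gamma_stat(table):
    rows, cols = len(table), len(table[0])
    ns = n1 = 0
    for i in range(rows):
        for j in range(cols):
            # cells below and to the right: parallel (same) ordering
            below_right = sum(table[a][b] for a in range(i + 1, rows)
                              for b in range(j + 1, cols))
            # cells below and to the left: opposite ordering
            below_left = sum(table[a][b] for a in range(i + 1, rows)
                             for b in range(j))
            ns += table[i][j] * below_right
            n1 += table[i][j] * below_left
    return ns, n1, (ns - n1) / (ns + n1)

table = [[24, 19, 5],    # Upper
         [12, 54, 29],   # Middle
         [9, 26, 25]]    # Lower

ns, n1, g = gamma_stat(table)   # Ns = 6204, N1 = 2405, G about .44
```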
Lambda
The lambda coefficient is represented by the lower case Greek letter λ, and is also known as
Guttman’s coefficient of probability. It is defined as a proportionate reduction in error measure, an index
of how much error is reduced when predicting the values of one variable from the values of another. It is
also another way of measuring to what degree the accuracy of the prediction can be
improved. If you have a lambda of .80, you have minimized the error of your prediction about the values
of the dependent variable by 80 percent; if your lambda is .30, you have minimized the error of your
prediction by only 30 percent. The lambda coefficient is a measure of association for comparing several
groups or categories at the nominal level.
Formula:
λc = (∑Fbi – Mbc) / (N – Mbc)
Where:
λc = the lambda coefficient
∑Fbi = the sum of the biggest cell frequencies in each row (with the sum taken over all the
rows)
Mbc = the biggest of the column totals
N = the total number of cases
Example: Compute the λc and λr for the data on the following table:
Political Party
Catholic 49 25 18 92
Protestant 26 25 20 71
Solution:
λc = (∑Fbi – Mbc) / (N – Mbc)
   = ((49 + 72 + 26) – 122) / (290 – 122)
   = (147 – 122) / 168
   = 25 / 168 = .15
λr = (∑Fbj – Mbr) / (N – Mbr)
where ∑Fbj is the sum of the biggest cell frequencies in each column and Mbr is the biggest of the row totals.
   = ((49 + 72 + 21) – 127) / (290 – 127)
   = (142 – 127) / 163
   = 15 / 163 = .09
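A λc sketch in Python follows. Note an assumption: the example table in the text prints only its first two rows, so the third row used below (34, 72, 21) is a reconstruction chosen to be consistent with the printed solution (N = 290, Mbc = 122, biggest cells 49, 26, 72) and should be treated as assumed data.

```python
# Sketch: Guttman's lambda (column form). The third row of the table is
# reconstructed, not printed in the source - treat it as assumed data.

def lambda_c(table):
    """(sum of row maxima - largest column total) / (N - largest column total)."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    mbc = max(col_totals)                 # biggest column total
    fbi = sum(max(row) for row in table)  # sum of biggest cell per row
    return (fbi - mbc) / (n - mbc)

table = [[49, 25, 18],   # Catholic
         [26, 25, 20],   # Protestant
         [34, 72, 21]]   # (third row: reconstructed values)

lc = lambda_c(table)     # (147 - 122) / (290 - 122) = 25/168, about .15
```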
Point Biserial Correlation
This procedure is used when you are interested in getting the degree of relationship between two
variables where one variable is continuous such as test scores and the other is dichotomous nominal
variable such as gender. The question you may have in mind is, “Is gender related to intelligence?” In this
case the most appropriate statistical technique is the point biserial correlation, rpbi.
Formula:
rpbi = (N∑FpY – ∑Fp∑FY) / √(∑Fp∑Fw(N∑FY² – (∑FY)²))
Example:
IQ Scores
Y     Y²    Fp    Fw    F     FY    FY²   FpY
12    144   3     1     4     48    576   36
11    121   5     1     6     66    726   55
10    100   7     0     7     70    700   70
9     81    8     0     8     72    648   72
8     64    9     3     12    96    768   72
7     49    7     5     12    84    588   49
6     36    6     7     13    78    468   36
5     25    3     2     5     25    125   15
4     16    2     8     10    40    160   8
3     9     1     9     10    30    90    3
2     4     0     5     5     10    20    0
Total       51    41    92    619   4869  416
Solution:
rpbi = (92 × 416 – 51 × 619) / √(51 × 41 × (92 × 4869 – 619²))
     = (38272 – 31569) / √(2091 × (447948 – 383161))
     = 6703 / √135,469,617 = 6703 / 11,639.1 = .58
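The computational form implied by the printed solution (the source's formula line is missing, so this form is inferred from the numbers 38272, 31569, 2091, 447948 and 383161 above) can be sketched directly from the column totals:

```python
# Sketch: point biserial r from the grouped totals in the table above,
# using the computational form inferred from the printed solution:
# rpbi = (N*sum(FpY) - sum(Fp)*sum(FY))
#        / sqrt(sum(Fp)*sum(Fw)*(N*sum(FY^2) - sum(FY)^2))
import math

N, sum_fp, sum_fw = 92, 51, 41
sum_fy, sum_fy2, sum_fpy = 619, 4869, 416

numerator = N * sum_fpy - sum_fp * sum_fy        # 38272 - 31569 = 6703
denominator = math.sqrt(sum_fp * sum_fw * (N * sum_fy2 - sum_fy ** 2))
r_pbi = numerator / denominator                  # about .58
```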
Phi Coefficient
If both variables instead are nominal and dichotomous, the Pearson formula simplifies even further. First,
perhaps, we need to introduce contingency tables. A contingency table is a two dimensional table
containing frequencies by category. For this situation it will be two by two since each variable can only
take on two values, but each dimension will exceed two when the associated variable is not dichotomous.
In addition, column and row headings and totals are frequently appended so that the contingency table
ends up being n + 2 by m + 2, where n and m are the number of values each variable can take on. The
label and total row and column typically are outside the gridded portion of the table, however.
As an example, consider the following data organized by gender and employee classification (faculty/staff).
Staff 10 5 15
Faculty 5 10 15
Totals: 15 15 30
Contingency tables are often coded as below to simplify calculation of the Phi coefficient.
Y\X 0 1 Totals
1 A B A+B
0 C D C+D
Totals: A + C B+D N
The phi coefficient is computed as phi = (AD – BC) / √((A + B)(C + D)(A + C)(B + D)). For this example we
obtain: phi = (25 – 100)/√(15•15•15•15) = -75/225 = -0.33, indicating a slight negative correlation (which
category is coded 1 only affects the sign). Please note that this is the Pearson correlation coefficient, just
calculated in a simplified manner. However, the extreme values of |r| = 1 can only be realized when the two row totals are equal
and the two column totals are equal. There are thus ways of computing the maximal values, if desired.
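A sketch of the phi computation follows. One assumption: the example does not state which row was coded Y = 1; coding the faculty row as Y = 1 reproduces the -0.33 in the text, while the opposite coding gives +0.33 with the same magnitude.

```python
# Sketch: phi coefficient from the 2x2 staff/faculty table, coded as in
# the text (A, B on the Y = 1 row; C, D on the Y = 0 row). The faculty
# row is coded Y = 1 here to reproduce the sign of the worked example.
import math

A, B = 5, 10    # faculty row (coded Y = 1)
C, D = 10, 5    # staff row (coded Y = 0)

phi = (A * D - B * C) / math.sqrt(
    (A + B) * (C + D) * (A + C) * (B + D))
# (25 - 100) / sqrt(15*15*15*15) = -75/225, about -0.33
```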
Coefficient of Concordance, W
The coefficient of concordance, W, measures the agreement among m judges who rank the same set of n items.
Formula:
W = 12∑D² / (m²(n)(n² – 1))
Example: Do the supervisors differ in their rankings of the employees’ individual project studies?
Employee    Supervisor 1    2    3    4    Sum of Ranks    D    D²
A 2 3 3 2 10 -12 144
B 9 10 10 10 39 17 289
C 1 2 1 3 7 -15 225
D 3 1 2 1 7 -15 225
E 4 5 4 5 18 -4 16
F 7 7 6 6 26 4 16
G 5 6 5 4 20 -2 4
H 6 8 8 8 30 8 64
I 8 9 7 7 31 9 81
J 10 4 9 9 32 10 100
Computation:
W = 12∑D² / (m²(n)(n² – 1))
  = 12(1164) / (4²(10)(10² – 1))
  = 13968 / (16(10)(99))
  = 13968 / 15840 = 0.88
This implies that there is a high degree of agreement among the supervisors in their ratings of the
employees’ projects.
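The W computation above can be sketched in Python from the ranking table; D is each employee's rank sum minus the mean rank sum (220/10 = 22):

```python
# Sketch: coefficient of concordance W = 12*sum(D^2) / (m^2 * n * (n^2 - 1))
# for the four supervisors' rankings of ten project studies (table above).

rankings = {                       # employee: ranks from supervisors 1-4
    "A": [2, 3, 3, 2],  "B": [9, 10, 10, 10], "C": [1, 2, 1, 3],
    "D": [3, 1, 2, 1],  "E": [4, 5, 4, 5],    "F": [7, 7, 6, 6],
    "G": [5, 6, 5, 4],  "H": [6, 8, 8, 8],    "I": [8, 9, 7, 7],
    "J": [10, 4, 9, 9],
}

m = 4                                   # number of judges
n = len(rankings)                       # number of ranked items
rank_sums = [sum(r) for r in rankings.values()]
mean_sum = sum(rank_sums) / n           # 220 / 10 = 22
d2 = sum((s - mean_sum) ** 2 for s in rank_sums)   # sum(D^2) = 1164
w = 12 * d2 / (m ** 2 * n * (n ** 2 - 1))          # about 0.88
```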
EXERCISES:
1. Find the coefficient of correlation of the following midterm (x) and final (y) grades:
X 75 70 65 90 85 85 80 70 65 90
Y 80 75 65 95 90 85 90 75 70 90
2. Compute the Spearman rank correlation coefficient for the following paired scores of nine participants:
1 19 25
2 18 27
3 15 29
4 12 25
5 11 21
6 9 19
7 7 14
8 5 13
9 6 12
3. Compute the gamma coefficient to determine the degree of association between socio-economic
status and use of local goods.
Socio-Economic Status
Use of Local Goods    High    Medium    Low
Very Good 32 16 18
Good 22 18 24
Poor 12 12 36
Classroom Placement
    Special Class    Tutor    Regular Class
Mentally Retarded 40 25 15
Learning Disabled 20 40 35
No Label 10 25 50
Total 70 90 100
Y    Fp    Fw
15 5 3
14 7 3
13 9 5
12 10 8
11 11 5
10 9 6
9 8 9
8 6 10
7 5 6
6 4 5
MODULE 6 – INFERENTIAL STATISTICS
Objectives: At the end of the lessons, the students shall be able to:
1. define inferential statistics, hypothesis, hypothesis testing and other terms relative to
doing experimental researches.
2. discuss the importance of inferential statistics and hypothesis testing in doing
experimental researches;
3. differentiate directional from non-directional tests, and the null hypothesis from the
alternative hypothesis;
4. explain the uses of inferential statistics;
5. list and explain the steps in hypothesis testing; and
6. compute the Z-test, t-test, F-test and Chi-square test for given problems.
Inferential Statistics
Inferential statistics deals with the analysis and interpretation of data. This branch of statistics consists of
different statistical tools/tests used in the analysis of interval, ratio, nominal and ordinal data. These tests
are used in making inferences about, or conclusions on, larger groups or populations, or generalizations
about them, on the basis of the information obtained from the study of one or more samples. The extent
to which these statistics can be used with accuracy depends on the goodness of the samples. The
sampling techniques/procedures are also of great importance with regard to the use of these different
statistical tests.
Making Predictions Using Inferential Statistics
Inferential statistics are used to draw conclusions and make predictions based on the descriptions of data.
In this section, we explore inferential statistics by using an extended example of experimental studies.
Key concepts used in our discussion are probability, populations, and sampling.
Experiments
A typical experimental study involves collecting data on the behaviors, attitudes, or actions of two or more
groups and attempting to answer a research question (often called a hypothesis). Based on the analysis
of the data, a researcher might then attempt to develop a causal model that can be generalized to populations.
A question that might be addressed through experimental research might be "Does grammar-based
writing instruction produce better writers than process-based writing instruction?" Because it would be
impossible and impractical to observe, interview, survey, etc. all first-year writing students and instructors
in classes using one or the other of these instructional approaches, a researcher would study a sample –
or a subset – of a population. Sampling – or the creation of this subset of a population – is used by many
researchers who desire to make sense of some phenomenon.
To analyze differences in the ability of student writers who are taught in each type of classroom, the
researcher would compare the writing performance of the two groups of students. Two key concepts used
to conduct the comparison are:
Dependent Variables
Independent Variables
Dependent Variables
In an experimental study, a variable whose score depends on (or is determined or caused by) another
variable is called a dependent variable. For instance, an experiment might explore the extent to which the
writing quality of final drafts of student papers is affected by the kind of instruction they received. In this
case, the dependent variable would be writing quality of final drafts.
Independent Variables
In an experimental study, a variable that determines (or causes) the score of a dependent variable is
called an independent variable. For instance, an experiment might explore the extent to which the writing
quality of final drafts of student papers is affected by the kind of instruction they received. In this case, the
independent variable would be the kind of instruction students received.
Probability
Beginning researchers most often use the word probability to express a subjective judgment about the
likelihood, or degree of certainty, that a particular event will occur. People say such things as: "It will
probably rain tomorrow." "It is unlikely that we will win the ball game." It is possible to assign a number to
the event being predicted, a number between 0 and 1, which represents degree of confidence that the
event will occur. For example, a student might say that the likelihood an instructor will give an exam next
week is about 90 percent, or .9. Where 100 percent, or 1.00, represents certainty, .9 would mean the
student is almost certain the instructor will give an exam. If the student assigned the number .6, the
likelihood of an exam would be just slightly greater than the likelihood of no exam. A rating of 0 would
indicate complete certainty that no exam would be given (Shoeninger, 1971).
The probability of a particular outcome or set of outcomes is called a p-value. In our discussion, a p-value
will be symbolized by a p followed by parentheses enclosing a symbol of the outcome or set of outcomes.
For example, p(X) should be read, "the probability of a given X score" (Shoeninger). Thus p(exam) should
be read, "the probability an instructor will give an exam next week."
Population
A population is a group which is studied. In educational research, the population is usually a group of
people. Researchers seldom are able to study every member of a population. Usually, they instead study
a representative sample – or subset – of a population. Researchers then generalize their findings about
the sample to the population as a whole.
Sampling
Sampling is performed so that a population under study can be reduced to a manageable size. This can
be accomplished via random sampling, discussed below, or via matching.
Random sampling is a procedure used by researchers in which all samples of a particular size have an
equal chance to be chosen for an observation, experiment, etc. (Runyon and Haber, 1976). There is no
predetermination as to which members are chosen for the sample. This type of sampling is done in order
to minimize scientific biases and offers the greatest likelihood that a sample will indeed be representative
of the larger population. The aim here is to make the sample as representative of the population as
possible. Note that the closer a sample distribution approximates the population distribution, the more
generalizable the results of the sample study are to the population. Notions of probability apply here.
Random sampling provides the greatest probability that the distribution of scores in a sample will closely
approximate the distribution of scores in the overall population.
Matching
Matching is a method used by researchers to gain accurate and precise results of a study so that they
may be applicable to a larger population. After a population has been examined and a sample has been
chosen, a researcher must then consider variables, or extrinsic factors, that might affect the study.
Matching methods apply when researchers are aware of extrinsic variables before conducting a study.
Two methods used to match groups are:
Precision Matching
Frequency Distribution Matching
Although, in theory, matching tends to produce valid conclusions, a rather obvious difficulty arises in
finding subjects which are compatible. Researchers may even believe that experimental and control
groups are identical when, in fact, a number of variables have been overlooked. For these reasons,
researchers tend to reject matching methods in favor of random sampling.
Methods
Statistics can be used to analyze individual variables, relationships among variables, and differences
between groups. In this section, we explore a range of statistical methods for conducting these analyses.
Hypothesis Testing
Meaning of Hypothesis
Null Hypothesis
The null hypothesis, H0, represents a theory that has been put forward, either because it is believed to be
true or because it is to be used as a basis for argument, but has not been proved. For example, in a
clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than
the current drug. We would write

H0: there is no difference between the two drugs, on average.
We give special consideration to the null hypothesis. This is due to the fact that the null hypothesis relates
to the statement being tested, whereas the alternative hypothesis relates to the statement to be accepted
if / when the null is rejected.
The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We
either "Reject H0 in favour of H1" or "Do not reject H0"; we never conclude "Reject H1", or even "Accept
H1".
If we conclude "Do not reject H0", this does not necessarily mean that the null hypothesis is true, it only
suggests that there is not sufficient evidence against H0 in favour of H1. Rejecting the null hypothesis
then, suggests that the alternative hypothesis may be true.
Alternative Hypothesis
The alternative hypothesis, H1, is a statement of what a statistical hypothesis test is set up to establish.
For example, in a clinical trial of a new drug, the alternative hypothesis might be that the new drug has a
different effect, on average, compared to that of the current drug. We would write

H1: the two drugs have different effects, on average.
The alternative hypothesis might also be that the new drug is better, on average, than the current drug. In
this case we would write
H1: the new drug is better than the current drug, on average.
The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We
either "Reject H0 in favour of H1" or "Do not reject H0". We never conclude "Reject H1", or even "Accept
H1".
If we conclude "Do not reject H0", this does not necessarily mean that the null hypothesis is true, it only
suggests that there is not sufficient evidence against H0 in favour of H1. Rejecting the null hypothesis
then, suggests that the alternative hypothesis may be true.

Setting up and testing hypotheses is an essential part of statistical inference. In order to formulate such
a test, usually some theory has been put forward, either because it is believed to be true or because it is
to be used as a basis for argument, but
has not been proved, for example, claiming that a new drug is better than the current drug for treatment of
the same symptoms.
In each problem considered, the question of interest is simplified into two competing claims / hypotheses
between which we have a choice; the null hypothesis, denoted H0, against the alternative hypothesis,
denoted H1. These two competing claims / hypotheses are not however treated on an equal basis:
special consideration is given to the null hypothesis.
There are different ways of stating a hypothesis. Let us consider an experiment involving two groups, an
experimental group, and a control group. The experimenter likes to test whether the treatment (values
clarification lessons) will improve the self-concept of the experimental group. The same treatment is not
given to the control group. It is presumed that any difference between the two groups after the treatment
can be attributed to the experimental treatment with a certain degree of confidence.
Ho: There will be no significant difference in self-concept between the group that will be exposed
to values clarification lessons and the group which will not be exposed to the same.
H1: The self-concept of the group that will be exposed to values clarification lessons will differ
from that of the control group.
Ho: There will be no significant effect of the values clarification lessons on the self-concept of the
students.
H1: Values clarification lessons will have a significant effect on the self-concept of the students.
Ho: The self-concept of the students will not relate to the values clarification lessons conducted
on them.
H1: The self-concept of the students will be related to the values clarification lessons
conducted on them.

These hypotheses simply state the existence of a difference between the two means. In this case, when the direction of the
difference is not stated, the test is considered non-directional. A non-directional test makes use of
both tails (two sides) of the statistical model or distribution.
If the direction of the difference is stated, that is, the self-concept of one group is more positive
than that of the other group, the test becomes directional. Accordingly, the hypothesis is stated in its alternative
form, to wit, H1: μ1 > μ2 or H1: μ1 < μ2. The former uses only the positive end of the distribution while the latter
uses the negative end in the rejection of H0. When comparing your statistical results with the
distributions in specified tables, be sure to note whether you used a one-tailed test or a two-tailed test.
One-sided Test
A one-sided test is a statistical hypothesis test in which the values for which we can reject the null
hypothesis, H0 are located entirely in one tail of the probability distribution.
In other words, the critical region for a one-sided test is the set of values less than the critical value of the
test, or the set of values greater than the critical value of the test.
The choice between a one-sided and a two-sided test is determined by the purpose of the investigation or
prior reasons for using a one-sided test.
Example
Suppose we wanted to test a manufacturer's claim that there are, on average, 50 matches in a box. We
could set up the following hypotheses:

H0: µ = 50,

against

H1: µ < 50 or H1: µ > 50
Either of these two alternative hypotheses would lead to a one-sided test. Presumably, we would want to
test the null hypothesis against the first alternative hypothesis since it would be useful to know if there is
likely to be less than 50 matches, on average, in a box (no one would complain if they get the correct
number of matches in a box or more).
Yet another alternative hypothesis could be tested against the same null, leading this time to a two-sided
test:
H0: µ = 50,

against

H1: µ ≠ 50
Here, nothing specific can be said about the average number of matches in a box; only that, if we could
reject the null hypothesis in our test, we would know that the average number of matches in a box is likely
to be less than or greater than 50.
Two-Sided Test
A two-sided test is a statistical hypothesis test in which the values for which we can reject the null
hypothesis, H0 are located in both tails of the probability distribution.
In other words, the critical region for a two-sided test is the set of values less than a first critical value of
the test and the set of values greater than a second critical value of the test.
The choice between a one-sided test and a two-sided test is determined by the purpose of the
investigation or prior reasons for using a one-sided test.
Type I Error
In a hypothesis test, a type I error occurs when the null hypothesis is rejected when it is in fact true; that
is, H0 is wrongly rejected.
For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better,
on average, than the current drug; i.e.

H0: there is no difference between the two drugs, on average.
The following table gives a summary of possible results of any hypothesis test:

                           Decision
  Truth         Reject H0            Do not reject H0
  H0 true       Type I error         Correct decision
  H0 false      Correct decision     Type II error
A type I error is often considered to be more serious, and therefore more important to avoid, than a type II
error. The hypothesis test procedure is therefore adjusted so that there is a guaranteed 'low' probability of
rejecting the null hypothesis wrongly; this probability is never 0. The probability of a type I error can be
precisely computed as

P(type I error) = significance level = α
If we do not reject the null hypothesis, it may still be false (a type II error) as the sample may not be big
enough to identify the falseness of the null hypothesis (especially if the truth is very close to hypothesis).
For any given set of data, type I and type II errors are inversely related; the smaller the risk of one, the
higher the risk of the other.
Type II Error
In a hypothesis test, a type II error occurs when the null hypothesis H0, is not rejected when it is in fact
false. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no
better, on average, than the current drug; i.e.
A type II error would occur if it was concluded that the two drugs produced the same effect, i.e. there is no
difference between the two drugs on average, when in fact they produced different ones.
The probability of a type II error is generally unknown, but is symbolised by β and written

P(type II error) = β
Test Statistic
A test statistic is a quantity calculated from our sample of data. Its value is used to decide whether or not
the null hypothesis should be rejected in our hypothesis test.
The choice of a test statistic will depend on the assumed probability model and the hypotheses under
question.
Critical Value(s)
The critical value(s) for a hypothesis test is a threshold to which the value of the test statistic in a sample
is compared to determine whether or not the null hypothesis is rejected.
The critical value for any hypothesis test depends on the significance level at which the test is carried out,
and whether the test is one-sided or two-sided.
Critical Region
The critical region CR, or rejection region RR, is a set of values of the test statistic for which the null
hypothesis is rejected in a hypothesis test. That is, the sample space for the test statistic is partitioned
into two regions; one region (the critical region) will lead us to reject the null hypothesis H0, the other will
not. So, if the observed value of the test statistic is a member of the critical region, we conclude "Reject
H0"; if it is not a member of the critical region then we conclude "Do not reject H0".
Significance Level
The significance level of a statistical hypothesis test is a fixed probability of wrongly rejecting the null
hypothesis H0, if it is in fact true.
It is the probability of a type I error and is set by the investigator in relation to the consequences of such
an error. That is, we want to make the significance level as small as possible in order to protect the null
hypothesis and to prevent, as far as possible, the investigator from inadvertently making false claims.
P-Value
The probability value (p-value) of a statistical hypothesis test is the probability of getting a value of the test
statistic as extreme as or more extreme than that observed by chance alone, if the null hypothesis H0, is
true.
It is equal to the significance level of the test for which we would only just reject the null hypothesis. The
p-value is compared with the actual significance level of our test and, if it is smaller, the result is
significant. That is, if the null hypothesis were to be rejected at the 5% significance level, this would be
reported as "p < 0.05".
Small p-values suggest that the null hypothesis is unlikely to be true. The smaller it is, the more
convincing is the rejection of the null hypothesis. It indicates the strength of evidence for say, rejecting the
null hypothesis H0, rather than simply concluding "Reject H0' or "Do not reject H0".
Power
The power of a statistical hypothesis test measures the test's ability to reject the null hypothesis when it is
actually false - that is, to make a correct decision.
In other words, the power of a hypothesis test is the probability of not committing a type II error. It is
calculated by subtracting the probability of a type II error from 1, usually expressed as:

Power = 1 − P(type II error) = 1 − β
The maximum power a test can have is 1, the minimum is 0. Ideally we want a test to have high power,
close to 1.
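The relationship between power and the type II error probability can be shown with a short, hypothetical calculation (the β value of 0.20 is assumed purely for illustration):

```python
# Hypothetical example: suppose the probability of a type II error
# (failing to reject a false H0) for a given test is beta = 0.20.
beta = 0.20

# Power is the probability of correctly rejecting a false H0.
power = 1 - beta

print(power)  # 0.8
```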
Hypothesis Testing
When you are evaluating a hypothesis, you need to account for both the variability in your
sample and how large your sample is.
Hypothesis testing is generally used when you are comparing two or more groups.
For example, you might want to determine the effectiveness of a new teaching method, say
Method B. To evaluate it, there is a need to compare the results obtained with the new method against those
obtained with the method currently being used (Method A). The usual method of teaching is used with the control
group while the method being introduced is used with the experimental group. The teaching results in these
two groups are then compared.
Based on this information, you'd like to make an assessment of whether any differences you see are
meaningful, or if they are likely just due to chance. This is formally done through a process called
hypothesis testing.
Step 1: Specify the Null Hypothesis
The null hypothesis (H0) is a statement of no effect, relationship, or difference between two or more
groups or factors. In research studies, a researcher is usually interested in disproving the null hypothesis.
Examples:

The performance of students using Method B does not differ from the performance of
students using Method A.
Step 2: Specify the Alternative Hypothesis

The alternative hypothesis (H1) is the statement that there is an effect or difference. This is usually
the hypothesis the researcher is interested in proving. The alternative hypothesis can be one-sided (only
provides one direction, e.g., lower) or two-sided. We often use two-sided tests even when our true
hypothesis is one-sided because they require more evidence against the null hypothesis to accept the
alternative hypothesis.
Examples:
The performance of students using Method B differs with the performance of students
using Method A. (two-sided).
The performance of students using Method B is lower than the performance of students
who were taught using Method A. (one-sided).
Step 3: Set the Significance Level

The significance level (denoted by the Greek letter alpha, α) is generally set at 0.05. This means that
there is a 5% chance that you will accept your alternative hypothesis when your null hypothesis is
actually true. The smaller the significance level, the greater the burden of proof needed to reject the null
hypothesis, or in other words, to support the alternative hypothesis.
Step 4: Calculate the Test Statistic and Corresponding P-Value

In another section we present some basic test statistics to evaluate a hypothesis. Hypothesis testing
generally uses a test statistic that compares groups or examines associations between variables.
When describing a single sample without establishing relationships between variables, a confidence
interval is commonly used.
The p-value describes the probability of obtaining a sample statistic as or more extreme by chance
alone if your null hypothesis is true. This p-value is determined based on the result of your test statistic.
Your conclusions about the hypothesis are based on your p-value and your significance level.
Example:
P-value = 0.01 This will happen 1 in 100 times by pure chance if your null hypothesis is
true. Not likely to happen strictly by chance.
Example:
P-value = 0.75 This will happen 75 in 100 times by pure chance if your null hypothesis is
true. Very likely to occur strictly by chance.
Your sample size directly impacts your p-value. Large sample sizes produce small p-values even
when differences between groups are not meaningful. You should always verify the practical
relevance of your results. On the other hand, a sample size that is too small can result in a failure to
identify a difference when one truly exists.
Plan your sample size ahead of time so that you have enough information from your sample to show a
meaningful relationship or difference if one exists.
Example:
Average ages were significantly different between the two groups (16.2 years vs. 16.7
years; p = 0.01; n=1,000). Is this an important difference? Probably not, but the large
sample size has resulted in a small p-value.
Example:
Average ages were not significantly different between the two groups (10.4 years vs.
16.7 years; p = 0.40, n=10). Is this an important difference? It could be, but because the
sample size is small, we can't determine for sure if this is a true difference or just happened
due to the natural variability in age within these two groups.
Step 5: Draw a Conclusion

1. P-value <= significance level => Reject your null hypothesis in favor of your
alternative hypothesis. Your result is statistically significant.

2. P-value > significance level => Fail to reject your null hypothesis. Your result is not
statistically significant.
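These two decision rules can be sketched as a small helper function; the function name and the example p-values (0.01 and 0.75, taken from the earlier examples) are illustrative only:

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: compare the p-value with the significance level."""
    if p_value <= alpha:
        return "Reject H0: result is statistically significant"
    return "Fail to reject H0: result is not statistically significant"

print(decide(0.01))  # Reject H0: result is statistically significant
print(decide(0.75))  # Fail to reject H0: result is not statistically significant
```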
Hypothesis testing is not set up so that you can absolutely prove a null hypothesis. Therefore,
when you do not find evidence against the null hypothesis, you fail to reject the null hypothesis. When you
do find strong enough evidence against the null hypothesis, you reject the null hypothesis. Your
conclusions also translate into a statement about your alternative hypothesis. When presenting
the results of a hypothesis test, include the descriptive statistics in your conclusions as well. Report exact
p-values rather than a range. For example, "The intubation rate differed significantly by patient
age, with younger patients having a lower rate of successful intubation (p=0.02)." Here are two more
examples with the conclusion stated in several different ways.
Example:
H0: There is no difference in survival between the intervention and control group.
H1: There is a difference in survival between the intervention and control group.
α = 0.05; 20% increase in survival for the intervention group; p-value = 0.002
Conclusion:
The difference in survival between the intervention and control group was statistically
significant.
There was a 20% increase in survival for the intervention group compared to control
(p=0.002).
Example:
H0: There is no difference in survival between the intervention and control group.
H1: There is a difference in survival between the intervention and control group.
α = 0.05; 5% increase in survival between the intervention and control group; p-value =
0.20
Conclusion:
The difference in survival between the intervention and control group was not statistically
significant.
There was no significant increase in survival for the intervention group compared to
control (p=0.20).
Z-test
The Z-test is a statistical test used in inference which determines if the difference between a sample
mean and the population mean is large enough to be statistically significant, that is, if it is unlikely to have
occurred by chance.
The Z-test is used primarily with standardized testing to determine if the test scores of a particular sample
of test takers are within or outside of the standard performance of test takers.
Notation and mathematics
In order for the Z-test to be reliable, certain conditions must be met. The most important is that since the
Z-test uses the population standard deviation, it must be known. The sample must be a simple random
sample of the population. If the sample came from a different sampling method, a different formula must
be used. It must also be known that the population varies normally (i.e., the sampling distribution of the
probabilities of possible values fits a standard normal curve). If it is not known that the population varies
normally, it suffices to have a sufficiently large sample, generally agreed to be ≥ 30 or 40.
In actuality, knowing the true σ of a population is unrealistic except for cases such as standardized testing
in which the entire population is known. In cases where it is impossible to measure every member of a
population it is more realistic to use a t-test, which uses the standard error obtained from the sample
along with the t-distribution.
SE = σ / √n

The formula for calculating the z score for the Z-test is as follows:

Z = (x̄ − μ) / SE

where:

x̄ = the sample mean
μ = the population mean
SE = the standard error of the mean (σ = the population standard deviation, n = the sample size)
Finally, the z score is compared to a Z table, a table which contains the percent of area under the normal
curve between the mean and the z score. Using this table will indicate whether the calculated z score is
within the realm of chance or if the z score is so different from the mean that the sample mean is unlikely
to have happened by chance.
Example
In a U.S. school district, a standardized reading test is used to test the performance of fifth grade students
in an elementary school against the national norm for fifth grade students. The number of fifth grade
students in this elementary school taking the test is 55 students.
The national norm test score, the population mean, for this particular standardized test is 100 points. The
population standard deviation for the year under study is 12.
The scores of the fifth grade students of the elementary school in this school district are a sample of the
total population of fifth grade students in the U.S. which have also taken the test.
The school district is told that the mean for their particular school is 96, which is lower than the national
mean. Parents of the students become upset when they learn their school is below the national norm for
the reading test. The school district administration points out that the test scores are actually pretty close
to the population mean though they are lower.
The real question is this: is the school's mean test score sufficiently lower than the national norm as to
indicate a problem, or is the school's mean test score within acceptable parameters? We will use the Z-test
to see.
Remember that a z score is the distance from the population mean in units of the population standard
deviation. This means that in our example, a mean score of 96 is −2.47 standard deviation units from the
population mean. The negative means that the sample mean is less than the population mean. Since the
normal curve is symmetric the Z table is always expressed in positive z scores so if the calculated z score
is negative, look it up in the table as if it were non-negative.
Next we look up the z score in a Z table and find that a z score of 2.47 corresponds to 49.32%. This means that
the area under the normal curve between the population mean and our sample mean is 49.32%.
What this tells us is that 49.32% plus 50% or 99.32% of the time, a randomly selected group of 55
students have a higher average score than these 55 students had. This is because our z score is
negative so we are below the population mean. So not only do we include the distance between our
sample mean and the population mean, we also include the area under the normal curve which is greater
than the population mean.
If our sample mean had been 104 rather than 96, then our z score would have been 2.47 which would
have indicated that our sample mean was above the population mean. That would have indicated that the
fifth grade students in our sample were in the top 0.7% of the nation.
But let's get back to our original question. Is there a problem with the reading program at our elementary
school? Our question can be reformulated to say, is the mean from our elementary school, a sample from
the general population of fifth grade students, far enough outside of the norm that we need to take a
corrective action to improve the reading program?
Let's put this in the form of a hypothesis which we are going to test with our statistical analysis. Our
alternative hypothesis is that our sample mean is significantly different from the population mean and that
corrective action is necessary. Our null hypothesis is that the difference is purely attributable to chance
and no action is necessary.
To answer this question, we need to determine the significance level we want
to use. Typically a 0.05 significance level is used, meaning that if the null hypothesis is true we stand only a
5% chance of rejecting it anyway.
In the case of our sample mean, the z score of −2.47 which provides us a value of 49.32% means that
49.32% plus 50% or 99.32% of the time, a randomly selected group of 55 students have a higher average
score than the 55 students in our sample had. To test our null hypothesis, we have to conduct a two-sided
test. The probability of a sample mean this far from the population mean in either direction is only
2 × 0.68% = 1.36%. Since 1.36% is less than 5%, our significance level, we have to reject the null hypothesis.
Therefore we can conclude, at the 95% confidence level, that the test performance of the students in our
sample was not within the normal variation.
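The arithmetic of this example can be checked with a short script. This is only a sketch: it approximates the normal curve with the error function from Python's standard library instead of a printed Z table, so the areas can differ from the table values in the fourth decimal place:

```python
import math

def normal_cdf(z):
    # Cumulative area under the standard normal curve up to z,
    # computed from the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n, sample_mean = 100, 12, 55, 96  # values from the example

se = sigma / math.sqrt(n)          # standard error of the mean
z = (sample_mean - mu) / se        # z score for the school's mean

area = normal_cdf(abs(z)) - 0.5    # area between the mean and z
p_two_sided = 2 * (1 - normal_cdf(abs(z)))

print(round(z, 2))                 # -2.47
print(round(area, 4))              # ≈ 0.4933 (the printed Z table gives 0.4932 for z = 2.47)
print(round(p_two_sided, 4))       # ≈ 0.0134, well below the 0.05 significance level
```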
Example problem using the Z-test in the process to test statistical hypotheses for a research
problem
Research Problem: We randomly select a group of 9 subjects from a population with a mean IQ of 100.
The research question for this experiment is: Does training subjects with the Get Smart training program
increase their IQ significantly over the average IQ for the general population? We will use the six-step
process to test statistical hypotheses for this research problem.
Since this problem involves comparing a single group's mean with the population mean and the
standard deviation for the population is known, the proper statistical test to use is the Z-test.
Z = 2.6
We need to find the value of Z that will only be exceeded 5% of the time, since we have set our
alpha level at .05. Since the Z score is normally distributed (or has the Z distribution), we can find
this 5% level by looking at the Z table. The associated Z-score would be 1.64 (or 1.65).
6. Statement of results: The average IQ of the group taking the Get Smart training program is
significantly higher than that of the general population.
If we reject the null hypothesis, we accept the alternative hypothesis. The statement of results
then states the alternative hypothesis which is the research question stated in the affirmative
manner.
We mentioned that we use the Z-test to compare the mean of a sample with the population mean when
the population standard deviation is known. We will now turn to the statistic to use when the standard
deviation of the population is not known, the one-sample t-test.
The T-test
The t-test is one type of inferential statistics. It is used to determine whether there is a significant
difference between the means of two groups. With all inferential statistics, we assume the dependent
variable fits a normal distribution. When we assume a normal distribution exists, we can identify the
probability of a particular outcome. We specify the level of probability (alpha level, level of significance, p)
we are willing to accept before we collect data (p < .05 is a common value that is used). After we collect
data we calculate a test statistic with a formula. We compare our test statistic with a critical value found
on a table to see if our results fall within the acceptable level of probability.
When the difference between two population averages is being investigated, a t-test is used. In
other words, a t-test is used when we wish to compare two means (the scores must be measured on an
interval or ratio scale). We would use a t-test if we wished to compare the reading achievement of boys
and girls. With a t-test, we have one independent variable and one dependent variable. The independent
variable (gender in this case) can only have two levels (male and female). The dependent variable would
be reading achievement. If the independent variable had more than two levels, then we would use a one-way
analysis of variance (ANOVA).
The test statistic that a t-test produces is a t-value. Conceptually, t-values are an extension of z-
scores. In a way, the t-value represents how many standard units the means of the two groups are apart.
With a t-test, the researcher wants to state with some degree of confidence that the obtained
difference between the means of the sample groups is too great to be a chance event and that some
difference also exists in the population from which the sample was drawn. In other words, the difference
that we might find between the boys' and girls' reading achievement in our sample might have occurred
by chance, or it might exist in the population. If our t-test produces a t-value that results in a probability
of .01, we say that the likelihood of getting the difference we found by chance would be 1 in 100.
We could say that it is unlikely that our results occurred by chance and the difference we found in the
sample probably exists in the populations from which it was drawn.
Five factors contribute to whether the difference between two groups' means can be considered
significant:
1. How large is the difference between the means of the two groups? Other factors being equal, the
greater the difference between the two means, the greater the likelihood that a statistically
significant mean difference exists. If the means of the two groups are far apart, we can be fairly
confident that a real difference exists.
3. How many subjects are in the two samples? The size of the sample is extremely important in
determining the significance of the difference between means. With increased sample size,
means tend to become more stable representations of group performance. If the difference we
find remains constant as we collect more and more data, we become more confident that we can
trust the difference we are finding.
4. What alpha level is being used to test the mean difference (how confident do you want to be
about your statement that there is a mean difference)? A larger alpha level requires less
difference between the means. It is much harder to find differences between groups when you
are only willing to have your results occur by chance 1 out of a 100 times (p < .01) as compared
to 5 out of 100 times (p < .05).
3. The scores in the populations have the same variance (σ1 = σ2). Note: We use a different
calculation for the standard error if they do not.
T-test for Correlated Samples

This test is concerned with the difference between the average scores of a single sample of
individuals who are assessed at two different times (such as before treatment and after
treatment). It can also compare average scores of samples of individuals who are paired in some
way (such as siblings, mothers and daughters, or persons who are matched in terms of a particular
characteristic).
t = D̄ / √[ (∑D² − (∑D)² / n) / (n (n − 1)) ]

Where: D̄ = the mean difference between the pretest and posttest
∑D² = the sum of the squared differences between the pretest and posttest
∑D = the sum of the differences between the pretest and posttest
n = the number of paired scores
Example: An experimental study was conducted on the effect of programmed materials in English on the
performance of 20 selected college students. Before the program was implemented the pretest was
administered and after 5 months the same instrument was used to get the posttest result. The following
is the result of the experiment.
Pretest   Posttest   D      D²
20        25         -5     25
30        35         -5     25
10        25         -15    225
15        25         -10    100
20        20         0      0
10        20         -10    100
18        22         -4     16
14        20         -6     36
15        20         -5     25
20        15         5      25
18        30         -12    144
15        10         5      25
15        16         -1     1
20        25         -5     25
18        10         8      64
40        45         -5     25
10        15         -5     25
10        10         0      0
12        18         -6     36
20        25         -5     25
                     ∑D = -81   ∑D² = 947
Solution:
1. Problem: Is there a significant difference between the pretest and posttest means on the use of programmed materials in English?
2. Hypothesis:
Ho: There is no significant difference between the pretest and posttest means; that is, the use of the programmed materials did not affect the students' performance in English.
3. Level of Significance:
α = .05 (one-tailed)
df = n – 1 = 20 – 1 = 19
t.05 = 1.729
4. Computation:
D̄ = ∑D / n = -81 / 20 = -4.05
t = -4.05 / √[ ( 947 – (-81)² / 20 ) / ( 20(20 – 1) ) ]
= -4.05 / √[ ( 947 – 328.05 ) / 380 ]
= -4.05 / √1.6288
= -4.05 / 1.2762
= -3.17
5. Decision: Since the absolute value of the computed t (3.17) exceeds the critical value of t (1.729), reject the null hypothesis.
6. Conclusion: There is a significant difference between the pretest and posttest means. The posttest result is higher than the pretest result, implying that the use of programmed materials in English is effective.
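As a check on the arithmetic, the paired t computation above can be reproduced in a few lines of Python using only the standard library. This is a sketch for illustration; the scores are taken from the table above.

```python
from math import sqrt

# Pretest and posttest scores for the 20 students (from the table above)
pre  = [20, 30, 10, 15, 20, 10, 18, 14, 15, 20,
        18, 15, 15, 20, 18, 40, 10, 10, 12, 20]
post = [25, 35, 25, 25, 20, 20, 22, 20, 20, 15,
        30, 10, 16, 25, 10, 45, 15, 10, 18, 25]

d = [a - b for a, b in zip(pre, post)]   # D = pretest - posttest
n = len(d)
sum_d = sum(d)                            # ∑D  = -81
sum_d2 = sum(x * x for x in d)            # ∑D² = 947
d_bar = sum_d / n                         # mean difference = -4.05

# t = D̄ / sqrt( (∑D² - (∑D)²/n) / (n(n-1)) )
t = d_bar / sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))
print(round(t, 2))  # -3.17
```

The printed value matches the hand computation, and |t| = 3.17 exceeds the one-tailed critical value of 1.729 at df = 19.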
T-test for Uncorrelated Samples
We often want to know whether the means of two populations on some outcome differ. For example, there are many questions in which we want to compare two categories of some categorical variable (e.g., compare males and females) or two populations receiving different treatments in the context of an experiment. The two-sample t-test is a hypothesis test for answering questions about the mean where the data are collected from two random samples of independent observations, each from an underlying normal distribution.
The steps of conducting a two-sample t-test are quite similar to those of the one-sample test. In
this example we will examine a program's effect by comparing the birthweights of babies born to women
who participated in an intervention with the birthweights of a group that did not.
A comparison of this sort is very common in medicine and social science. To evaluate the effects
of some intervention, program, or treatment, a group of subjects is divided into two groups. The group
receiving the treatment to be evaluated is referred to as the treatment group, while those who do not are
referred to as the control or comparison group. In this example, mothers who take part in the prenatal care program to reduce the likelihood of low birthweight form the treatment group, while the control group is composed of women who do not take part in the program.
Returning to the two-sample t-test, the steps to conduct the test are similar to those of the one-
sample test.
Establish hypotheses
The first step to examining this question is to establish the specific hypotheses we wish to
examine. Specifically, we want to establish a null hypothesis and an alternative hypothesis to be
evaluated with data.
In this case:
The null hypothesis is that the difference between the two groups is 0. Another way of stating it: the difference between the mean birthweight of the treatment-group (program) babies and the mean birthweight of the control-group babies is zero.
The alternative hypothesis is that the difference between the observed mean birthweight for program babies and the mean birthweight for control babies is not zero.
From hospital records, we obtain the following values for these components:
Treatment Control
SD 420 425
N 75 75
Having calculated the t-statistic, we compare the t-value with a standard table of t-values to determine whether it reaches the threshold of statistical significance. With so high a t-score, the p-value is 0.001, which forms our basis to reject the null hypothesis and conclude that the prenatal care program made a difference.
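A minimal sketch of the two-sample t computation in Python. The SDs and sample sizes come from the table above, but the group means were not reproduced in the source, so the mean birthweights below are hypothetical values chosen for illustration only.

```python
from math import sqrt

# SDs and Ns are from the birthweight example above; the two means
# are HYPOTHETICAL (the source table does not show them).
mean_treat, mean_ctrl = 3450.0, 3250.0   # hypothetical mean birthweights (grams)
sd_treat, sd_ctrl = 420.0, 425.0
n_treat, n_ctrl = 75, 75

# Standard error of the difference between two independent means
se = sqrt(sd_treat ** 2 / n_treat + sd_ctrl ** 2 / n_ctrl)
t = (mean_treat - mean_ctrl) / se
df = n_treat + n_ctrl - 2
print(round(t, 2))  # 2.9
```

The resulting t would then be compared against the critical t for df = 148 at the chosen alpha level, exactly as in the one-sample case.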
ANOVA – Analysis of Variance
Our lesson on the t-test demonstrated how to compare differences of means between two
groups, such as comparing outcomes between control and treatment groups in an experimental study.
The t-test is a useful tool for comparing the means of two groups; however, the t-test is not good in
situations calling for the comparison of three or more groups. It can only compare one group's mean to a
known distribution or compare the means of two groups. With three or more groups, the t-test is not an
effective statistical tool. On a practical level, using the t-test to compare many means is a cumbersome
process in terms of the calculations involved. On a statistical level, using the t-test to compare multiple
means can lead to biased results.
Yet there are many kinds of questions in which we might want to compare the means of several
different groups at once. For example, in evaluating the effects of a particular social program, we might
want to compare the mean outcomes of several different program sites. Or we might be interested in
examining the relative performance of different members of a corporate sales team in terms of their
monthly or annual sales records. Alternatively, in an organization with several different sales managers,
we might ask whether some sales managers get more out of their sales staff than others.
With questions such as these, the preferred statistical tool is ANOVA (Analysis of Variance).
There are some similarities between the t-test and ANOVA. Like the t-test, ANOVA is used to test
hypotheses about differences in the average values of some outcome between two groups; however,
while the t-test can be used to compare two means or one mean against a known distribution, ANOVA
can be used to examine differences among the means of several different groups at once. More
generally, ANOVA is a statistical technique for assessing how nominal independent variables influence a
continuous dependent variable.
This module describes and explains the one-way ANOVA, a statistical tool that is used to
compare multiple groups of observations, all of which are independent but may have a different mean for
each group. A test of importance for many kinds of questions is whether or not all the averages of a set of
groups are equal. There is another form of ANOVA that examines how two explanatory variables affect an
outcome variable; however, this application is not discussed in this module.
Assumptions
1. The standard deviations (SD) of the populations for all groups are equal - this is sometimes referred to as an assumption of the homogeneity of variance. We can represent this assumption for groups 1 through n as σ1 = σ2 = … = σn.
One application of the one-way ANOVA that might be of interest has to do with students'
performance in an introductory course in statistics. At the University of Technology (UTech), there are
three sections of an introductory statistics course offered: one in the morning, another in the afternoon,
and a third in the evening. These courses are taught by different instructors; however, given the
importance of statistics in the university's sequence of courses and recent efforts to implement a standard
curriculum, all three courses cover exactly the same material.
Karl Rousseau has recently been hired as the new chair of the statistics department at UTech. In taking
on his duties as the department head, he's interested in whether there's any variation in how well students
do in the course, based on whether they enroll in the morning, afternoon, or evening course. A morning
man himself, Prof. Rousseau has some doubt that there's much learning going on in the evening course;
however, given his position as chair, he is very interested in making sure that all three sections are getting
the same high-quality education. Moreover, he's too much of an empiricist to allow this idea to go
untested. So he proposes that at semester's end students in all three sections take the National
Assessment of Statistical Knowledge (NASK) to determine whether there are differences in student
performance.
He starts by generating a null hypothesis that all three groups will have the same mean score on the test. In formula terms, if we use the symbol μ to represent the average score, the null hypothesis is expressed through the following notation:
μ1 = μ2 = μ3
Notice in the graph that all three groups have the same average score (all three points are on the dashed
line) and all three groups have the same SD (noted by the fact that the line around the mean point for
each group is the same size). So the null hypothesis is that all three groups will have the same average
score on the NASK.
The alternative hypothesis is that all means are not the same. It's important to point out that this is not the same as saying that all means are different (i.e., μ1 ≠ μ2 ≠ μ3). It is possible that some of the means could be the same; yet if they are not all identical, we would reject the null hypothesis. Rather, the alternative hypothesis is that not all means are equal.
Figure 2 highlights some important features and one of the keys to understanding ANOVA. ANOVA
allows us to separate the total variability in the outcome (in this case, the variability in scores on the
NASK) into two parts: variability within groups and variability between groups. As you can see from the
graphs, there are differences within groups, with scores on the NASK ranging from roughly the same
amount above and below the mean for each group. But there are also differences between the groups,
with the evening group having somewhat higher scores than those of the afternoon group, although the
means of both groups are lower than the mean of the morning group.
Note: The standard deviation (SD) for each group is the same: 1.3.
With these data, we can calculate an ANOVA statistic to evaluate Prof. Rousseau's hypothesis. This is
done in multiple steps, as described below.
The first step is to calculate the variation between groups by comparing the mean of each group (or, in
this example, the mean of each of the three classes) with the mean of the overall sample (the mean score
on the test for all students in this sample). This measure of between-group variance is referred to as
"between sum of squares" or BSS. BSS is calculated by adding up, for all groups, the squared difference between the group's mean and the overall mean, multiplied by the number of cases in the group. In formula terms:
BSS = ∑ nj ( X̄j – X̄ )², where nj is the number of cases in group j, X̄j is the mean of group j, and X̄ is the overall mean.
This sum of squares has a number of degrees of freedom equal to the number of groups minus 1. In this
case, dfB = (3-1) = 2
We divide the BSS figure by the number of degrees of freedom to get our estimate of the variation between groups, referred to as the "Between Mean Squares":
BMS = BSS / dfB
To measure the variation within groups, we find the sum of the squared deviations between scores on the exam and the group average, calculating separate measures for each group, then summing the group values. This sum is referred to as the "within sum of squares" or WSS. In formula terms, this is expressed as:
WSS = ∑j ∑i ( Xij – X̄j )², where Xij is the score of case i in group j.
As in step 1, we need to adjust the WSS to transform it into an estimate of population variance, an adjustment that involves a value for the number of degrees of freedom within. To calculate this, we take a value equal to the number of cases in the total sample (N), minus the number of groups (k). In formula terms:
dfW = N – k
WMS = WSS / dfW
This calculation is relatively straightforward: simply divide the Between Mean Squares, the value obtained in step 1, by the Within Mean Squares, the value calculated in step 2:
F = BMS / WMS
Then compare this value to a standard table of the F distribution to determine the significance level for the F value. In this case, the significance level is less than .01. This is extremely strong evidence against the null hypothesis, indicating that students' performance varies significantly across the three classes.
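The three steps above can be sketched in Python. The scores below are hypothetical (the actual NASK data are not reproduced here); only the formulas for BSS, WSS, and F = BMS / WMS come from the text.

```python
# One-way ANOVA computed step by step, using small HYPOTHETICAL score
# lists for the three sections (the actual NASK data are not shown above).
groups = {
    "morning":   [7, 8, 9, 8, 8],
    "afternoon": [5, 6, 7, 6, 6],
    "evening":   [6, 7, 8, 7, 7],
}

all_scores = [x for g in groups.values() for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# Step 1: between sum of squares, then Between Mean Squares
bss = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
df_between = len(groups) - 1                 # k - 1
bms = bss / df_between

# Step 2: within sum of squares, then Within Mean Squares
wss = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)
df_within = len(all_scores) - len(groups)    # N - k
wms = wss / df_within

# Step 3: the F statistic
f_stat = bms / wms
print(f_stat)  # 10.0
```

The resulting F would then be compared against the F distribution with (df_between, df_within) degrees of freedom to obtain the significance level.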
Recap
To calculate an ANOVA, it is often convenient to arrange the statistics needed for calculation into a table
such as the one below:
To fill in this table with the data from the problem above, we have:
The Chi-Square Test
The chi-square test (Snedecor and Cochran, 1989) is used to test if a sample of data came from a population with a specific distribution.
An attractive feature of the chi-square goodness-of-fit test is that it can be applied to any univariate
distribution for which you can calculate the cumulative distribution function. The chi-square goodness-of-
fit test is applied to binned data (i.e., data put into classes). This is actually not a restriction since for non-
binned data you can simply calculate a histogram or frequency table before generating the chi-square
test. However, the value of the chi-square test statistic is dependent on how the data are binned. Another disadvantage of the chi-square test is that it requires a sufficient sample size in order for the chi-square approximation to be valid.
Additional discussion of the chi-square goodness-of-fit test is contained in the product and process comparisons chapter.
H0: The data follow the specified distribution.
Ha: The data do not follow the specified distribution.
For the chi-square goodness-of-fit computation, the data are divided into k bins and the test statistic is defined as
χ² = ∑ (Oi – Ei)² / Ei, summed over the k bins,
where Oi is the observed frequency for bin i and Ei is the expected frequency for bin i. The expected frequency is calculated by
Ei = N ( F(Yu) – F(Yl) )
where F is the cumulative distribution function for the distribution being tested, Yu is the upper limit for class i, Yl is the lower limit for class i, and N is the sample size.
This test is sensitive to the choice of bins. There is no optimal choice for the bin width (since the
optimal bin width depends on the distribution). Most reasonable choices should produce similar,
but not identical, results. Dataplot uses 0.3*s, where s is the sample standard deviation, for the
class width. The lower and upper bins are at the sample mean plus and minus 6.0*s,
respectively. For the chi-square approximation to be valid, the expected frequency should be at
least 5. This test is not valid for small samples, and if some of the counts are less than five, you
may need to combine some bins in the tails.
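The goodness-of-fit computation above can be sketched in Python for already-binned data. The die-roll counts below are hypothetical, invented purely for illustration; the formula χ² = ∑ (Oi – Ei)² / Ei comes from the text.

```python
# Goodness-of-fit sketch with pre-binned data: do 60 HYPOTHETICAL die
# rolls fit a fair (uniform) die? Expected frequency is N/6 = 10 per face.
observed = [8, 9, 12, 11, 13, 7]
n = sum(observed)
expected = [n / 6] * 6          # each E_i is at least 5, as required

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1          # k - 1 bins
print(round(chi_sq, 1))  # 2.8
```

With df = 5 the critical value at the .05 level is 11.07, so these hypothetical counts would not lead us to reject the fair-die hypothesis.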
Generally speaking, the chi-square test is a statistical test used to examine differences with categorical
variables. There are a number of features of the social world we characterize through categorical
variables - religion, political preference, etc. To examine hypotheses using such variables, use the chi-square test. The chi-square test is used in two circumstances:
a. for estimating how closely an observed distribution matches an expected distribution - we'll refer to this as the goodness-of-fit test
b. for estimating whether two random variables are independent.
1. Establish hypotheses.
2. Calculate chi-square statistic. Doing so requires knowing:
o Expected values
o Observed values
3. Assess significance level. Doing so requires knowing the number of degrees of freedom.
Testing Independence
The other primary use of the chi-square test is to examine whether two variables are independent or not.
What does it mean to be independent, in this sense? It means that the two factors are not related.
Typically in social science research, we're interested in finding factors that are related - education and
income, occupation and prestige, age and voting behavior. In this case, the chi- square can be used to
assess whether two variables are independent or not.
More generally, we say that variable Y is "not correlated with" or "independent of" the variable X if more of
one is not associated with more of another. If two categorical variables are correlated their values tend to
move together, either in the same direction or in the opposite.
Example
Return to the example discussed at the introduction to chi-square, in which we want to know whether
boys or girls get into trouble more often in school. Below is the table documenting the percentage of boys
and girls who got into trouble in school:
Got in Trouble / Did Not Get in Trouble / Total
Boys 46 71 117
Girls 37 83 120
Total 83 154 237
To examine statistically whether boys got in trouble in school more often, we need to frame the question
in terms of hypotheses.
1. Establish Hypotheses
As in the goodness-of-fit chi-square test, the first step of the chi-square test for independence is to
establish hypotheses. The null hypothesis is that the two variables are independent - or, in this particular
case that the likelihood of getting in trouble is the same for boys and girls. The alternative hypothesis to
be tested is that the likelihood of getting in trouble is not the same for boys and girls.
Cautionary Note
It is important to keep in mind that the chi-square test only tests whether two variables are independent. It
cannot address questions of which is greater or less. Using the chi-square test, we cannot evaluate
directly the hypothesis that boys get in trouble more than girls; rather, the test (strictly speaking) can only
test whether the two variables are independent or not.
As with the goodness-of-fit example described earlier, the key idea of the chi-square test for
independence is a comparison of observed and expected values. How many of something were expected
and how many were observed in some process? In the case of tabular data, however, we usually do not
know what the distribution should look like (as we did with rolls of dice). Rather, in this use of the chi-
square test, expected values are calculated based on the row and column totals from the table.
The expected value for each cell of the table can be calculated using the following formula:
Expected value = (row total × column total) / grand total
For example, in the table comparing the boys and girls in trouble, the expected count for the number of boys who got in trouble is:
(117 × 83) / 237 = 40.97
The first step, then, in calculating the chi-square statistic in a test for independence is generating the expected value for each cell of the table. Presented below are the expected values (in parentheses) for each cell:
Boys 46 (40.97) 71 (76.03) 117
Girls 37 (42.03) 83 (77.97) 120
Total 83 154 237
Lastly, to determine the significance level we need to know the "degrees of freedom." In the case of the
chi-square test of independence, the number of degrees of freedom is equal to the number of columns in
the table minus one multiplied by the number of rows in the table minus one.
In this table, there were two rows and two columns. Therefore, the number of degrees of freedom is:
df = (2 – 1)(2 – 1) = 1
We then compare the chi-square value calculated from the formula above (χ² ≈ 1.87) to a standard set of tables. With df = 1, the corresponding p-value is about .17, well above the .05 threshold. Thus, we cannot reject the null hypothesis: boys are not significantly more likely to get in trouble in school than girls.
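The full computation for this example can be carried out in a few lines of Python, using the observed counts from the table above; this is a sketch of the expected-value formula and the chi-square statistic described in the text.

```python
# Chi-square test of independence for the boys/girls trouble table above.
table = [[46, 71],   # boys:  got in trouble, did not
         [37, 83]]   # girls: got in trouble, did not

row_totals = [sum(row) for row in table]         # [117, 120]
col_totals = [sum(col) for col in zip(*table)]   # [83, 154]
grand_total = sum(row_totals)                    # 237

chi_sq = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        # expected = (row total x column total) / grand total
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)      # (2-1)(2-1) = 1
print(round(chi_sq, 2))  # 1.87
```

Since 1.87 falls short of the .05 critical value of 3.84 for df = 1, the data do not let us reject the hypothesis that gender and getting in trouble are independent.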
Recap
1. Establish hypotheses.
2. Calculate expected values for each cell of the table.
3. Calculate the chi-square statistic. Doing so requires knowing:
a. Expected values
b. Observed values
4. Assess significance level. Doing so requires knowing the number of degrees of freedom.
Exercise:
1. Ten subjects were given an attitude test on a controversial issue. Then they were shown a film favourable to the issue, and the same attitude test was administered. Make a directional test at the .05 level of significance.
Pretest Posttest
16 20
18 20
16 24
24 28
20 20
25 30
22 23
18 24
15 19
15 15
2. The following are data on the number of minutes that patients had to wait for their appointment with 5 doctors. Use the F-test at the .05 level of significance to test the hypothesis that the means of the populations sampled are equal.
Doctors
A B C D E
21 9 18 9 29
20 11 17 11 30
21 15 16 28 24
30 12 15 30
26 28 18 20 15
25
3. A random sample of 300 voters classified according to their political affiliation were asked if they were in favour of the ongoing peace negotiation in Mindanao. Use the chi-square test at the .05 level of significance to test whether opinion is independent of political affiliation.
In Favour / Not in Favour / Total
Lakas 40 60 100
Laban 50 50 100
L.P 70 30 100
TOTAL 160 140 300
4. Two groups of high school students are matched for initial ability in a Social Science test. Group A is taught by the lecture method while Group B is taught by the experimental method. The data are presented below. Formulate hypotheses and test them by applying the appropriate statistical test.
Group A Group B
N 60 50
Pretest Mean 42.5 42.75
SD of the Pretest 5.5 5.3
Posttest mean 72.0 78.0
SD of the Posttest 6.5 6.1
REFERENCES:
Devore, J. et al. Applied Statistics for Engineers and Scientists. Brooks/Cole, Inc., 2005.
Esllen, B. et al. Basic Statistics Textbook – Workbook. Graduation Publishing, 2005.
Punzalan, Twila and Gabriel Uriarte. Statistics Made Simple. Manila: Rex Book Store, 1990.
Snedecor, George W. and William G. Cochran. Statistical Methods, 8th ed. Iowa State University Press, 1989.
Sweeney, B. Essentials of Statistics for the Behavioral Sciences. Thomson Publishing, 2006.