
MODULES IN STATISTICS IN RESEARCH

UNIVERSITY OF BATANGAS
GRADUATE SCHOOL

MERCEDES A. MACARANDANG, ED.D.

(Modified January 2020)


MODULE 1 - MEANING AND IMPORTANCE OF STATISTICS

Objectives: At the end of the module, the students shall be able to:

1. state the meaning and origin of the word “statistics;”


2. explain the scope of statistics;
3. discuss the functions of statistics;
4. relate statistics with research;
5. give the importance of statistics to research;
6. discuss the role of statistics in scientific inquiry;
7. name and define the two branches of statistics; and
8. give the definitions of basic terms used in statistics.

Meaning of Statistics

The word statistics is derived from:

 the Italian word statista, meaning statesman;
 the Latin word status, which means condition; and
 state, which means territory.

Today, statistics is defined in three senses, namely: the singular, the plural, and the general meanings.

 In its singular sense, the word statistics refers to the branch of mathematics which deals with the
systematic collection, tabulation, presentation, analysis, and interpretation of quantitative data
which are collected in a methodical manner without bias.
 In its plural sense, statistics means a set of quantitative data or facts.
 In the more general (common) usage, statistics has two meanings. First, it refers to numerical
facts. Second, it refers to the field or discipline of study; in this sense, the word statistics is
defined as "a group of methods that are used to collect, organize, present, analyze, and
interpret data to make decisions."

Generally, statistics is divided into statistical methods and statistical theory or mathematical statistics.
Statistical methods refer to those procedures and techniques used in the collection, presentation,
analysis and interpretation of quantitative data. Statistical theory or mathematical statistics, on the
other hand, deals with the development and exposition of the theories which constitute the bases of the
statistical methods.

Scope of Statistics

The use of statistics is spread through all fields, namely: fisheries, agriculture, commerce, trade and
industry, health, education, nursing, medicine, biology, economics, psychology, sociology, engineering,
chemistry, physics and many others. It is said that statistics is the “tool” of all sciences. It is called the
“language of research”.

In education, statistics is a vital tool in evaluating the achievements of students and the performance of
mentors, staff, and administrators. Statistical results serve as a basis for the promotion and retention of
students. Statistical treatment determines the effectiveness or ineffectiveness of instruction, research,
extension and production.
Functions of Statistics

 To provide investigators with means of measuring scientifically the conditions that may be
involved in a given problem and of assessing the way in which they are related.
 To show the laws underlying facts and events that cannot be determined by individual
observations.
 To show relations of cause and effect that otherwise may remain unknown.
 To find out trends and behavior in related conditions which otherwise may remain ambiguous.

Importance of Statistics to Research

 Statistics permits the most exact kind of description.


 Statistics forces the researcher to be definite and exact in his procedures and in his thinking.
 Statistics enables the researcher to summarize the results in a meaningful and convenient form.
 Statistics enables the researcher to draw general conclusions: the process of extracting
conclusions is carried out according to accepted rules.
 Statistics enables the researcher to predict “how much” of a thing will happen under conditions he
knows and has measured.

Researcher’s Objectives in Studying Statistics

 To comprehend the logic of statistics;


 To find out where to apply statistical tools in different research problems and where not to apply
them;
 To interpret statistical results correctly and vividly;
 To determine the basic mathematics of statistics; and
 To master the language of statistics.

The Role of Statistics in Scientific Inquiry

[Diagram: a circle linking THEORY → HYPOTHESIS → OBSERVATIONS → EMPIRICAL
GENERALIZATIONS → and back to THEORY]

Figure 1 – The Role of Statistics in Scientific Inquiry

Figure 1 graphically represents the role of statistics in the research process. The diagram is based on the
thinking of Walter Wallace and illustrates how the knowledge base of any scientific enterprise grows and
develops. One point the diagram makes is that scientific theory and research continually shape each
other. Statistics are one of the most important means by which research and theory interact.

Because the figure is circular, with no beginning or end, we could begin our discussion at any point.

 A theory is an explanation of the relationships between phenomena. In their attempt to
understand phenomena, researchers develop explanations; the explanation of any phenomenon
is provided by a theory.
 A hypothesis is a statement about the relationship between variables that, while logically derived
from the theory, is much more specific and exact.

 Observations may come from different data gathering procedures like surveys, questionnaires,
experiments, etc.

 Results of observations are analyzed and subjected to statistical procedures, and conclusions
are then drawn which either support or reject the given hypothesis.

Without statistics, quantitative research is impossible. Without quantitative research, the


development of the social sciences would be severely impaired and perhaps arrested. Only by
application of statistical techniques can mere data help us shape and refine our theories and understand
the social world better. But it must be remembered that before any statistical analysis can legitimately be
applied, the preceding phases of the process must have been successfully completed.

As statistical analysis comes to an end, we would move on to the next stage of the process. In this
phase, we would primarily be concerned with assessing our theory, but we would also look for other trends
in the data. As we developed tentative explanations, we might begin to revise or elaborate our theory. If
we change the theory to take into account these findings, however, a new research project designed to
test the revised theory is called for, and the wheel of science would begin to turn again.

In summary, statistics permit us to analyze data, to identify and probe trends and relationships, to
develop generalizations, and to revise and improve our theories. They are also an indispensable part of
the research enterprise. Without statistics, the interaction between theory and research would become
extremely difficult and the progress of our disciplines would be severely retarded.

DESCRIPTIVE AND INFERENTIAL STATISTICS

There are two branches of statistics: descriptive and inferential statistics.

Statistics is a set of tools used to organize and analyze data. Data must either be numeric in
origin or transformed by researchers into numbers. For instance, statistics could be used to analyze
percentage scores English students receive on a grammar test: the percentage scores ranging from 0 to
100 are already in numeric form. Statistics could also be used to analyze grades on an essay by
assigning numeric values to the letter grades, e.g., A=4, B=3, C=2, D=1, and F=0.
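
The coding step just described can be sketched in Python; this is an illustrative sketch (the grade list is invented) using the A=4 to F=0 scheme above:

```python
# Hypothetical sketch: encoding letter grades as numbers so that
# statistical tools can work on them (A=4, B=3, C=2, D=1, F=0).
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def encode_grades(letters):
    """Map each letter grade to its numeric equivalent."""
    return [GRADE_POINTS[g] for g in letters]

essay_grades = ["A", "B", "B", "C", "F"]    # invented sample data
print(encode_grades(essay_grades))          # [4, 3, 3, 2, 0]
```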

Employing statistics serves two purposes: (1) description and (2) prediction. Statistics are used to
describe the characteristics of groups. These characteristics are referred to as variables. Data is
gathered and recorded for each variable. Descriptive statistics can then be used to reveal the
distribution of the data in each variable.

Statistics is also frequently used for purposes of prediction. Prediction is based on the concept of
generalizability: if enough data is compiled about a particular context (e.g., students studying writing in a
specific set of classrooms), the patterns revealed through analysis of the data collected about that context
can be generalized (or predicted to occur in) similar contexts. The prediction of what will happen in a
similar context is probabilistic. That is, the researcher is not certain that the same things will happen in
other contexts; instead, the researcher can only reasonably expect that the same things will happen.

Prediction is a method employed by individuals throughout daily life. For instance, if writing students begin
class every day for the first half of the semester with a five-minute freewriting exercise, then they will likely
come to class the first day of the second half of the semester prepared to again freewrite for the first five
minutes of class. The students will have made a prediction about the class content based on their
previous experiences in the class: Because they began all previous class sessions with freewriting, it
would be probable that their next class session will begin the same way. Statistics is used to perform the
same function; the difference is that precise probabilities are determined in terms of the percentage
chance that an outcome will occur, complete with a range of error. Prediction is a primary goal of
inferential statistics.

Descriptive Statistics. The general function of statistics is to manipulate data so that the
original research question(s) can be answered. The researcher can call upon two general classes of
statistical techniques that, depending on the research situation, are available to accomplish the task.
The first class of techniques, called descriptive statistics, is relevant (1) when the researcher needs
to summarize or describe the distribution of a single variable and (2) when the researcher wishes to
understand the relationship between two or more variables. If we are concerned with describing a single
variable, then our goal will be to arrange the values or scores of that variable so that the relevant
information can be quickly understood and appreciated. Percentages, graphs, and charts can all be used
as single-variable descriptive statistics. The process of allowing a few numbers to summarize many
numbers is called data reduction and is the basic goal of single-variable descriptive statistical
procedures. Descriptive statistics is devoted to the summarization and description of data sets; its
topics include the measures of central tendency, measures of variability, and measures of correlation.
It consists of methods for organizing, displaying, and describing data by using tables, graphs, and
summary measures.

The second type of descriptive statistics is designed to help the investigator understand the
relationship between two or more variables. These statistics, called measures of association or
correlation, allow the researcher to quantify the strength and direction of a relationship. These statistics are
very useful because they enable us to investigate two matters of central theoretical and practical
importance to any science: causation and prediction. These techniques help us trace the ways by which
some variables might have causal influence on others, and depending on the strength of the relationship,
they enable us to predict the scores on one variable from the scores of another.

Descriptive Statistics are used to describe the basic features of the data gathered from an
experimental study in various ways. They provide simple summaries about the sample and the measures.
Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data.
It is necessary to be familiar with primary methods of describing data in order to understand phenomena
and make intelligent decisions. Various techniques that are commonly used are classified as:

 Graphical displays of the data in which graphs summarize the data or facilitate
comparisons.
 Tabular description in which tables of numbers summarize the data.

 Summary statistics (single numbers) which summarize the data.

In general, statistical data can be briefly described as a list of subjects or units and the data associated
with each of them. Although most research uses many data types for each unit, this introduction treats
only the simplest case.

There may be two objectives for formulating a summary:

1. To choose a statistic that shows how different units seem similar. Statistical textbooks call one
solution to this objective, a measure of central tendency.
2. To choose another statistic that shows how they differ. This kind of statistic is often called a
measure of statistical variability.

When summarizing a quantity like length or weight or age, it is common to answer the first question
with the arithmetic mean, the median, or the mode. Sometimes, we choose specific values from the
cumulative distribution function called quantiles.
The most common measures of variability for quantitative data are the variance; its square root, the
standard deviation; the range; interquartile range; and the average absolute deviation (average
deviation).
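
These summary measures can be sketched with Python's standard-library statistics module; the data values here are invented for illustration:

```python
# Minimal sketch of the common summary statistics named above,
# computed on an invented set of ages.
import statistics

ages = [21, 23, 23, 25, 28, 30, 34]

mean_age   = statistics.mean(ages)        # arithmetic mean
median_age = statistics.median(ages)      # middle value of the ordered data
mode_age   = statistics.mode(ages)        # most frequently occurring value
variance   = statistics.pvariance(ages)   # population variance
std_dev    = statistics.pstdev(ages)      # standard deviation (square root of variance)
data_range = max(ages) - min(ages)        # range: highest minus lowest

print(median_age, mode_age, data_range)   # 25 23 13
```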

Steps in descriptive statistics


1. Collect data
2. Classify data

3. Summarize data

4. Present data

Proceed to inferential statistics if there are enough data to draw a conclusion

Inferential Statistics. The second class of statistical techniques becomes relevant when we
wish to generalize our findings from a sample to a population. It is concerned with making decisions
about a large body of data in the population of interest by using samples. It consists of methods that use
sample results to help make predictions. A population is the total collection of all cases in which the
researcher is interested and wishes to understand. A population is usually too large to be measured, and
social scientists almost never have the resources or time to test every case in the population. Hence
the need for inferential statistics, which involves using information from samples (carefully chosen
subsets of the defined population) to make inferences about populations. Samples are, of course, much
cheaper to assemble, and if proper techniques are followed, generalizations based on these samples can
be very accurate representations of the population.

Discrete and Continuous Variables

One of the puzzling aspects of studying statistics is learning when to use which statistics. There
are guidelines which should be remembered. The first of these concerns discrete and continuous
variables; the second concerns levels of measurement.

A variable is said to be discrete if it has a basic unit of measurement that cannot be subdivided. The
measurement process for discrete variables involves accurately counting the number of units per case.
For example, the number of people per household is a discrete variable.

A variable is continuous if its measurement can be subdivided infinitely – at least in a theoretical
sense. A good example of such a variable would be time, which can be measured in nanoseconds
(billionths of a second) or even smaller units. In a sense, when we measure a continuous variable, we
are always approximating and rounding off scores.

Levels of Measurement

Every statistical technique involves performing some mathematical operations such as adding
scores or ranking cases. Before you can properly use a technique, the variable being processed must be
measured in a way that justifies the required mathematical operations.

The three levels of measurement are nominal, ordinal and interval-ratio. All measurement
involves classification as a minimum. In nominal measurement, classification into categories is the only
measurement permitted. The categories are not numerical and can be compared to each other only in
terms of the number of cases classified in them. At times, numerical labels are used to identify the
categories of a variable measured at the nominal level, but these labels are names only, not
quantities. The only mathematical operation permissible
with nominal variables is counting the number of occurrences that have been classified into the various
categories of the variable.


The Ordinal level of measurement. Variables measured at the ordinal level allow the
categories to be ranked with respect to how much of the trait being measured they possess. The
categories form a kind of numerical scale that can be ordered from "high" to "low". For example, the
variable socio-economic status (SES) is usually measured at the ordinal level in the social sciences.

The Interval-Ratio level of measurement. The categories of nominal-level variables have no
numerical quality to them. Ordinal-level variables have categories that can be arrayed along a scale from
high to low, but the exact distances between categories are unknown. Variables measured at the
interval-ratio level allow not only classification and ranking but also allow the distance from category to
category (score to score) to be exactly defined. Interval-ratio variables are measured in units that have
equal intervals and a true zero point. For example, measuring the ages of your respondents produces
interval-ratio data because the unit of measurement (years) has equal intervals (the distance from year
to year is 365 days) and a true zero point (it is possible to be zero years old). Other examples of
interval-ratio variables would be income, number of children, weight, test scores, and years married.

Other Basic Terms

Population and Sample

In statistics, we always deal with data either from a population or from a sample.

Population refers to the totality of observations of the entire universe of people or factors. Examples: all
teachers in Metro Manila, all government employees in the Philippines, etc.

Sample refers to a subset of the total population. Examples: selected teachers in Metro Manila, selected
employees in the Philippines.

Representative Sample is a sample that represents the characteristics of the population as closely as
possible.

Random Sample is a sample drawn in such a way that each element of the population has equal
chances of being selected.

Element or member of a sample or population is a specific subject or object (for example, a person, a
firm, an item, a state or a country) about which information is collected.

Variable is a characteristic under study that assumes different values for different elements. In contrast
to a variable, the value of a constant is fixed.

Observation or measurement is the value of a variable for an element.

Statistic is the number that describes a characteristic of a sample.

Parameter is any characteristic of a population that is measurable.

Data are numbers or measurements that are collected as a result of observation, interview,
questionnaire, experimentation, test, and so forth.

Types of Data

There are two general types of data: (1) numerical and (2) categorical data.
Numerical data are those that are expressed in numerical values, such as 5, 212, 5.34, etc. These are
classified into: discrete data and continuous data.

Discrete data are always expressed in whole numbers. They cannot be expressed in fractions
or decimals. Ex. 12 brothers, 29 students

Continuous data are those which can be expressed in decimals or fractions. Ex. 5.36 ft., 70.526
lbs., 7 1/2 meters

Categorical data are classificatory data. They are not expressed in numerical values. They are merely
labeled and classified into categories for statistical analysis.

Measurement refers to the assignment of numbers to observations made of objects or persons in such a
way that the numbers can be subjected to statistical analysis by manipulating or using the needed
operations according to mathematical rules of correspondence.

Variable refers to a factor, property, attribute, characteristic, or behavior that differentiates a group of
persons, a set of things, events, etc., which takes on two or more dimensions, categories or levels with
descriptive or numerical values that can be measured qualitatively and/or quantitatively. Ex. Sex (male/
female), socio-economic status (high/middle/low), geographic location (urban/rural), etc.

Types of variables:

Independent Variable refers to the factor, property, or attribute that is introduced, manipulated, or
treated to determine if it influences or causes a change in the dependent variable. It is the antecedent,
cause, or stimulus that is introduced at the outset of the investigation. Ex: a method of teaching, a kind
of fertilizer.

Dependent variable is the factor, property, characteristic or attribute that is measured and made
the object of analysis. It is the consequent, effect, criterion, response or output that is analyzed and
treated statistically during the investigation for purposes of hypothesis testing.

Quantitative variable is a variable which can be measured quantitatively. The data collected are
called quantitative data.

Qualitative or categorical variable is a variable which cannot assume a numerical value but can
be classified into two or more categories. The data collected are called qualitative data.

SCALES OF MEASUREMENT

Nominal Scale applies to data that are divided into different categories and these are used only
for identification purposes. Ex. Names of companies, cars, gender, marital status, etc.

Ordinal Scale applies to data that are divided into different categories that could be ranked.

Interval Scale applies to data that can be ranked and for which the difference between two
values can be calculated and interpreted.

Ratio Scale applies to data that can be ranked and for which all arithmetic operations (addition,
subtraction, multiplication and division) can be done.
Exercise 1:

1. Cite 5 instances where statistical techniques are applied.


2. Why is statistics called the “tool of all the sciences” and the “language of research”?
3. Give 2 examples of the following:
Nominal variables

Ordinal variables

Ratio variables

Discrete variables

Continuous variables

Quantitative variables

Qualitative variables

4. Below are some items from a public-opinion survey. For each item, indicate the level of measurement
and whether the variable will be discrete or continuous.

a. What is your occupation? __________________

b. How many years of school have you completed? ____________________

c. If you were asked to use one of these four names for your social class, which would you say
you belonged in? ____ Upper _____ Middle _____ Working _____ Lower

d. What is your age? ____________________

e. In what province were you born? __________________

f. What is your grade-point average? __________________

g. What is your major area? ______________________

h. The only way to deal with the drug problem is to legalize all drugs.

______ strongly agree ______ agree

______ undecided ______ disagree

_____ strongly disagree

i. What is your astrological sign? _______________

j. How many brothers and sisters do you have? ___________

5. Read 3 theses using quantitative approach. Identify the research problems. Based on Chapter III of
these researches, identify the statistical measures used in each of the problems. Show how the findings
were presented.

MODULE 2 – BASIC DESCRIPTIVE STATISTICS: Percentages, Ratios and Rates, Tables, Charts
and Graphs
Lesson Objectives: At the end of the lesson, the students shall be able to:

1. differentiate percentages from proportions; ratios from rates;


2. find the percentages, proportions, ratios and rates of given data;
3. show how data can be presented in tables, charts and graphs; and
4. explain the importance of tabular and graphical representations of data;
5. rank given sets of scores;
6. give the meaning of ranks;
7. organize a set of scores into a frequency distribution; and
8. construct a graphic representation by histogram or frequency polygon for a frequency
distribution.

Introduction

Research results do not speak for themselves. They must be arranged in ways that allow the
researcher (and his or her readers) to comprehend their meaning quickly. The primary concern of
descriptive statistics is to present research results clearly and concisely. Researchers use a process
called data reduction to organize data into presentable form. Data reduction involves using a few
numbers, a table, or a graph to summarize or stand for a larger array of data.

Data reduction may lose important information like precision and detail, so summarizing
statistics might present a misleading picture of research results. This can be avoided, if not totally
eradicated, if the researcher takes into consideration several decisions in the choice of summarizing
techniques: how to present the data, what kind of information to lose, and how much detail can safely
be obscured.

In this lesson, we will consider several commonly used techniques for presenting research
results: percentages and proportions, ratios and rates, tables, charts and graphs.

Percentages and Proportions

Percentages and proportions supply a frame of reference for reporting research results in the
sense that they standardize the raw data: percentages to the base of 100 and proportions to the base of
1.00. The mathematical definitions of proportions and percentages are:

Proportion (p) = f/N

Percentage (%) = (f/N) x 100

Example: Of the 80 graduates of Bachelor in Secondary Education, 70 took the Licensure Examination.
Out of this number, 59 passed. What are the percentages and proportions of takers and passers?

Percentage (%) of takers = (f/N) x 100 = (70/80) x 100 = (0.875) x 100 = 87.5%

Percentage (%) of passers = (f/N) x 100 = (59/70) x 100 = (0.843) x 100 = 84.3%

Both results could also have been expressed as proportions.



Proportion of takers = f/N = 70/80 = 0.875

Proportion of passers = f/N = 59/70 = 0.843
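
The computations above can be sketched as two small Python helpers; this is an illustrative sketch, not part of the module, using the figures from the licensure-exam example:

```python
# Proportion and percentage as defined by the formulas above:
# p = f/N and % = (f/N) x 100, where f is the part and N is the base.
def proportion(f, n):
    return f / n

def percentage(f, n):
    return 100 * f / n

print(proportion(70, 80))                # 0.875  (proportion of takers)
print(round(percentage(70, 80), 1))      # 87.5   (percentage of takers)
print(round(percentage(59, 70), 1))      # 84.3   (percentage of passers)
```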


Percentages and proportions are easier to read and comprehend than frequencies. This
advantage is particularly obvious when attempting to compare groups of different sizes. To make
comparison easier, the difference in size can be effectively eliminated by standardizing distributions to the
common base of 100 (or, in other words, by computing percentages for the distributions).

Example: Given the data presented in the following tables, we will see the advantage of
presenting them in percentages.

Table 1.1 – DECLARED MAJOR FIELDS OF STUDY IN THE TWO PROGRAMS OF THE COLLEGE OF
EDUCATION

Major Subjects BSED BEED

English 46 39

Filipino 36 29

Mathematics 52 49

Physical Education 23 18

General Science 50 42

Total 207 177

If only the frequencies are given in a set of data, making comparisons is difficult because
the total numbers of enrollments are different. To make comparisons easier, the difference in size can be
effectively eliminated by standardizing both distributions to the common base of 100, as shown in Table
1.2.

Table 1.2 – DECLARED MAJOR FIELDS OF STUDY IN THE TWO PROGRAMS OF THE COLLEGE OF
EDUCATION

Major Subjects BSED BEED

English 22.22% 22.03%

Filipino 17.39% 16.38%

Mathematics 25.12% 27.68%

Physical Education 11.11% 10.17%

General Science 24.15% 23.73%

Total 100% 100%

The percentages in Table 1.2 make it easier to identify both differences and similarities between
the two programs.
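
The standardization of Table 1.1 into Table 1.2 can be sketched in Python; this is an illustrative sketch using the BSED column copied from Table 1.1:

```python
# Converting the BSED frequency column of Table 1.1 to percentages
# (base 100), reproducing the BSED column of Table 1.2.
bsed = {"English": 46, "Filipino": 36, "Mathematics": 52,
        "Physical Education": 23, "General Science": 50}

total = sum(bsed.values())   # 207, the total BSED enrolment
bsed_pct = {major: round(100 * f / total, 2) for major, f in bsed.items()}

for major, pct in bsed_pct.items():
    print(f"{major}: {pct}%")
```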

Some further rules on the use of percentages and proportions:


1. When working with a small number of cases (say, fewer than 20), it is usually preferable to report
the actual frequencies rather than percentages or proportions. With a small number of cases, the
percentages can change drastically with relatively minor changes in the data.

2. Always report the number of observations along with proportions and percentages. This permits
the reader to judge the adequacy of the sample size and, conversely, helps to prevent the
researcher from lying with statistics.

Ratios and Rates

Ratios and rates provide two additional ways in which the distribution of a variable can be simply
and dramatically summarized. Ratios are especially useful for comparing categories in terms of relative
frequency. Instead of standardizing the distribution of a variable to the base of 100 or 1.00, as we did in
computing percentages and proportions, we determine the ratios by dividing the frequency of one
category by the frequency in another. Mathematically, a ratio can be defined as:

          f1
Ratio = ------        where f1 = the frequency of the first category
          f2                f2 = the frequency of the second category

To illustrate the use of ratios, suppose you were interested in the relative sizes of male and
female students in the College of Education and found out that there are 225 female and 58 male
students in the college. To find the ratio of female students (f1) to male students (f2), divide 225 by 58.
The resultant ratio is 3.88. This number would mean that for every male student in the College of
Education, there are 3.88 female students.

Note that ratios can be very economical ways of expressing the relative predominance of two
categories. In our example, the predominance of female students in the College of Education is obvious
from the raw data. Ratios are a precise measure of the relative frequency of one category per unit of the
other category. They tell us in an exact way the extent to which one category outnumbers the other.
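
The ratio computation above can be sketched in Python; an illustrative sketch using the College of Education figures:

```python
# Ratio as defined above: divide the frequency of the first category
# by the frequency of the second.
def ratio(f1, f2):
    return f1 / f2

female_students, male_students = 225, 58
print(round(ratio(female_students, male_students), 2))   # 3.88
```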

Rates provide still another way of summarizing the distribution of a single variable. Rates are
defined as the number of actual occurrences of some phenomenon divided by the number of possible
occurrences per some unit of time. Rates are usually multiplied by some power of 10 to eliminate decimal
points. For example, the crude death rate for a population is defined as the number of deaths in that
population (actual occurrences) divided by the number of people in the population (possible occurrences)
per year. This quantity is then multiplied by 1000. The formula for the crude death rate can be expressed
as:

Number of deaths in a year


Crude death rate = ----------------------------------- x 1000
Total population

If there were 100 deaths during a given year in a town of 7000, the crude death rate for that year
would be
Crude death rate = (100/7000) x 1000 = (.01429) x 1000 = 14.29


Or, for every 1000 people, there were 14.29 deaths during this particular year.

By the same token, if a school with an enrolment of 8,700 experienced 120 dropouts during a
particular academic year, the dropout rate would be:

Dropout rate = (120/8,700) x 1000 = 13.79

Or, for every 1000 enrolees, there were 13.79 students who stopped schooling during the academic year
in question.
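
Both rate calculations above can be sketched as one Python helper; an illustrative sketch, not part of the module:

```python
# Rate as defined above: actual occurrences divided by possible
# occurrences, multiplied by a power of 10 (here 1,000).
def rate_per_1000(actual, possible):
    return 1000 * actual / possible

print(round(rate_per_1000(100, 7000), 2))   # 14.29, the crude death rate
print(round(rate_per_1000(120, 8700), 2))   # 13.79, the dropout rate
```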

So far, we have considered three techniques (proportions and percentages, ratios, and rates) for
describing and summarizing data. All three techniques express, clearly and concisely, the distribution of
a single variable. They represent different ways of expressing information so that it can be quickly
appreciated.

Tabular Presentation

In many research activities, data are gathered from different sources. These data, collected
through various methods, need to be organized. To give meaning to the raw data, appropriate tables
and graphs are used. In this lesson, we will consider tabular presentation through the frequency
distribution and different methods of graphical presentation.

Frequency Distribution

Raw data can be tabulated or organized into a frequency distribution headed by a number and a
title. A frequency distribution is defined as the arrangement of the gathered data by categories with their
corresponding frequencies and class marks or midpoints. It has a class frequency containing the number
of observations belonging to each class interval. Each class interval is a grouping defined by its limits,
called the lower and upper limits. The values that separate adjacent class intervals are called class
boundaries.

Frequency Distribution of Nominal Data

Table 1

Enrolment in the College of Education During the Academic Year 2007 – 2008

Year Level Frequency (f)

First Year 66

Second Year 62

Third Year 71

Fourth Year 87

Total 286

This is an example of a table presenting nominal data. The table consists of two columns: the
first pertains to the categories being presented, and the second to the frequencies of each of the
categories. In this table the data in the nominal scale are labeled.
Frequency Distribution of Ordinal Data

Table 2

Frequency Distribution of Faculty Perceptions Toward Failing College Students

Perceptions Frequency (f)

Strongly agree 58

Agree 45

Moderately agree 39

Disagree 26

Strongly disagree 20

Total 188

Table 2 presents an example of the tabular presentation of ordinal data. For ordinal data, the
distributions are scaled or graded so that the score values represent the degree of the particular
characteristic of the variable. It is for this reason that this type of data is always presented in order,
arranged from highest to lowest or vice versa.

Frequency Distribution of Interval Data

A frequency distribution provides the classroom teacher with a systematic arrangement of raw
scores, tallying the frequency of occurrence of each score or, in some instances, of score values that
have been grouped.

Steps in Setting Up a Frequency Distribution for Ungrouped Data

1. Arrange the scores from highest to lowest in a column headed X. The X represents the raw
score.
2. Head the second column Tallies and record a slash or tally mark for each score. If a score value
appears twice, this column will have two slashes; if it appears three times, three slashes; and so on.
3. Count the slash marks and place the total number of tallies for each raw score value in the third
column, headed f. The f column represents the frequency of occurrence of each score.
4. Sum the f column and record the number of scores (N) as a total.

Given the following set of scores:

32 39 40 25 29 35 39 28 41 29 37 30
27 32 29 29

The Frequency distribution will be:


X Tallies Frequency (f)

41 / 1
40 / 1
39 // 2
37 / 1
35 / 1
32 // 2
30 / 1
29 //// 4
28 / 1
27 / 1
25 / 1
N=16
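The four steps above can be sketched with the standard library's `collections.Counter` (a minimal illustration, not part of the original module), using the same 16 scores:

```python
from collections import Counter

# Tally each raw score, then list score (X), tallies, and frequency (f).
scores = [32, 39, 40, 25, 29, 35, 39, 28, 41, 29, 37, 30, 27, 32, 29, 29]

freq = Counter(scores)
for x in sorted(freq, reverse=True):      # highest to lowest, as in step 1
    print(x, "/" * freq[x], freq[x])      # X, tally marks, f

print("N =", sum(freq.values()))          # N = 16
```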

Frequency Distribution for Grouped Scores

When the interval between the lowest and highest scores exceeds about 30 units, grouping
scores into intervals may aid in the analysis. Grouped data condense the scores into a smaller number
of categories, which aids the interpretation of a large number of scores or a set of scores with a wide
range.

Group / Class / Step Frequency Distribution is the process of placing scores in scaled groups
called classes or steps. A class or step is a grouping of a specified number of consecutive single scores
or measures. The specified number of consecutive scores that a class/step contains is called the interval.
The lower end number of the class is called the lower limit and the upper end number of the class is called
the upper limit.

Procedure in Grouping of Scores or Making a Frequency Distribution

1. Find the highest and the lowest scores. Get their difference (Range).
2. Determine the number of classes or steps by dividing the range by the number of steps or
classes desired. The ideal number of steps or classes ranges from 10 to 20 depending upon the
number of scores or measures. There is no fixed rule but the more scores the more number of
classes there should be.
3. Determine the lowest limit. This is done by looking into the lowest score. The lowest score can
be the lowest limit, but it is advisable that the lower limits be exactly divisible by the desired
interval. If the lowest score is 40 and the interval is 3, the lowest limit will be 39. Forty is not
exactly divisible by 3, so look for the number which is nearest the lowest score and exactly
divisible by 3. That number is 39.
4. Determine the succeeding lower limits by adding the interval to the previous lower limit.
5. Determine the upper limit of each class, continuing until the highest score is included.
6. Tally each raw score according to the interval in which it falls.
7. Get the frequencies of the tallies in each of the class or step intervals.
8. Find the sum of the frequencies (N).
Example:

The following are test scores in a test in Philippine History:

47 32 58 37 24 28 55 38 35 44 49

47 51 38 33 29 27 42 39 53 46 40
28 30 47 50 45 39 32 36 36 51 47

39 33 38 36 45 43 33 44 42 36 41

44 41 36 34

1. The highest score is 58 and the lowest score is 24. The range is 34.
2. To find the class interval, divide the range, 34, by 10 (the desired number of step or class
intervals). The answer is 3.4, so 3 is the step interval.
3. The lowest score is 24. Since 24 is exactly divisible by 3, it is the lowest limit.
4. The resulting frequency distribution would be:

Step Distribution

Class-interval Tallies Frequency

57 - 59 / 1
54 - 56 / 1
51 - 53 /// 3
48 - 50 // 2
45 - 47 /////-// 7
42 - 44 /////-/ 6
39 - 41 /////-/ 6
36 - 38 /////-//// 9
33 – 35 ///// 5
30 - 32 /// 3
27 - 29 //// 4
24 - 26 / 1

N= 48
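The grouping procedure above can be sketched in Python (an illustration only; the variable names are my own, not from the module). It reproduces the step distribution for the 48 Philippine History scores:

```python
# Group the 48 test scores into classes of interval i = 3, starting from the
# lowest limit that is exactly divisible by 3 (here, 24).
scores = [47, 32, 58, 37, 24, 28, 55, 38, 35, 44, 49,
          47, 51, 38, 33, 29, 27, 42, 39, 53, 46, 40,
          28, 30, 47, 50, 45, 39, 32, 36, 36, 51, 47,
          39, 33, 38, 36, 45, 43, 33, 44, 42, 36, 41,
          44, 41, 36, 34]

i = 3
low = min(scores) - (min(scores) % i)      # lowest limit divisible by i -> 24
classes = {}
for lower in range(low, max(scores) + 1, i):
    upper = lower + i - 1
    f = sum(lower <= s <= upper for s in scores)   # tally scores in this class
    classes[(lower, upper)] = f

for (lo, hi), f in sorted(classes.items(), reverse=True):
    print(f"{lo} - {hi}: {f}")
print("N =", sum(classes.values()))        # N = 48
```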

Real Class Limits and Midpoints

To work with the distribution of a variable as if it were continuous, statisticians use real class
limits. To find the real class limits of any class interval, begin with the limits stated in the frequency
distribution (the stated class limits): subtract 0.5 from the stated lower limit and add 0.5 to the stated
upper limit.

Stated Limits Real Limits


57 – 59 56.5 – 59.5
54 – 56 53.5 – 56.5
51 – 53 50.5 – 53.5
48 – 50 47.5 – 50.5

Note that, when conceptualized with real limits, adjacent class intervals meet each other and the
distribution can be seen as continuous.

In addition to real limits, you will need to work with the midpoints of the class intervals to construct
some types of graphs. Midpoints are defined as the points exactly halfway between the real lower and
upper limits and can be found by dividing the sum of the real lower and upper limits by two.

Example:
Stated Limits Real Limits Midpoints

57 – 59 56.5 – 59.5 58
54 – 56 53.5 – 56.5 55
51 – 53 50.5 – 53.5 52
48 – 50 47.5 – 50.5 49

Cumulative Frequency and Percentage Distribution

Two commonly used adjuncts to the basic frequency distribution for interval-ratio data are the
cumulative frequency and percentage distributions. Their primary purpose is to allow the researcher (and
his or her audience) to tell at a glance how many cases fall below a given score or class interval in the
distribution.

To construct a cumulative frequency (cf) column, begin with the lowest class interval in the
distribution. The entry in the cf column for that interval will be the same as the number of cases in the
interval. For the next higher interval, the cf will be all the cases in that interval plus all the cases in the
first interval, and so on.

The percentage column is determined by dividing the frequency of each class interval by the total
number of cases and multiplying the quotient by 100.

Class-interval Frequency CF (Up) CF (Down) Percentage (%)

57 - 59 1 48 1 2.08
54 - 56 1 47 2 2.08
51 - 53 3 46 5 6.25
48 - 50 2 43 7 4.17
45 - 47 7 41 14 14.58
42 - 44 6 34 20 12.50
39 - 41 6 28 26 12.50
36 - 38 9 22 35 18.75
33 - 35 5 13 40 10.42
30 - 32 3 8 43 6.25
27 - 29 4 5 47 8.33
24 - 26 1 1 48 2.08

N = 48
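The cumulative and percentage columns above can be built programmatically (a minimal sketch, not part of the original module; it assumes Python 3.7+ dictionary ordering):

```python
# Class interval -> frequency, highest interval listed first (N = 48).
freqs = {
    (57, 59): 1, (54, 56): 1, (51, 53): 3, (48, 50): 2,
    (45, 47): 7, (42, 44): 6, (39, 41): 6, (36, 38): 9,
    (33, 35): 5, (30, 32): 3, (27, 29): 4, (24, 26): 1,
}
N = sum(freqs.values())

# CF (Down): cumulate from the top interval downward.
cf_down, running = {}, 0
for interval in freqs:
    running += freqs[interval]
    cf_down[interval] = running

# CF (Up): cumulate from the bottom interval upward.
cf_up, running = {}, 0
for interval in reversed(list(freqs)):
    running += freqs[interval]
    cf_up[interval] = running

# Percentage: (f / N) x 100, rounded to two decimals.
pct = {k: round(f / N * 100, 2) for k, f in freqs.items()}

print(cf_up[(36, 38)], cf_down[(36, 38)], pct[(36, 38)])   # 22 35 18.75
```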

CHARTS AND GRAPHS

Researchers frequently use charts and graphs to present their data in ways that are visually more
dramatic than frequency distributions. These devices are particularly useful for conveying an impression
of the overall shape of a distribution and for highlighting the clustering of cases in a particular range of
scores. The most common techniques are the pie and bar charts, the histogram, and the line chart or
frequency polygon. The first two are appropriate for discrete variables at any level of measurement and
the last two are used with interval-ratio variables.


Pie Charts. To construct a pie chart, begin by computing the percentage of all cases that fall
into each category of the variable. Then divide a circle (the pie) into segments (slices) proportional to the
percentage distribution. Be sure that the chart and all segments are clearly labeled.

FIGURE 1 - SAMPLE PIE CHART: MARITAL STATUS OF RESPONDENTS (N = 20)

[Pie chart with three slices: Single 50%, Married 35%, Divorced 15%]

TABLE 3 : MARITAL STATUS OF RESPONDENTS

Status Frequency (f) Percentage (%)

Single 10 50
Married 7 35
Divorced 3 15

Total N = 20 100

Bar Charts. Like pie charts, bar charts are relatively straightforward. Conventionally, the
categories of the variable are arrayed along the horizontal axis (abscissa) and frequencies or percentages
along the vertical axis (ordinate). For each category of the variable, construct or draw a rectangle of
constant width with height corresponding to the number of cases in the category.

FIGURE 2 - SAMPLE BAR CHART: MARITAL STATUS OF RESPONDENTS

[Bar chart with one bar per category: Single 10 (50%), Married 7 (35%), Divorced 3 (15%)]

Graphic Representation of Frequency Distribution

The histogram is a graphical representation of a frequency distribution. Through a histogram,
the classroom teacher may present how students scored on a test. The histogram is prepared by placing
the test score values on a horizontal axis or baseline with the scores increasing in magnitude from left to
right. The scale for the vertical axis on the left side of the graph is used to indicate the number of students
earning the scores. The vertical axis begins with 0 and moves to the highest frequency appearing for any
score or score interval. The data on a histogram are shown in the form of bars. The width of the base of
each bar represents the score(s) in an interval and the height represents the number of student scores
falling within the interval.

FIGURE 3: HISTOGRAM OF GROUPED SCORES

[Histogram of the grouped scores: one bar per class interval from 24 – 26 up to 57 – 59, with bar
heights equal to the class frequencies]

The frequency polygon is a graphical representation of a frequency distribution. It aids in the
understanding of the characteristics of a distribution through the visual representation of the frequency of
scores associated with designated points on the baseline. A frequency polygon is constructed by
locating the midpoint of each interval and recording a dot to represent the number of scores falling in that
interval. The points are plotted at the midpoints of the intervals and then connected by lines.
FIGURE 4: FREQUENCY POLYGON OF GROUPED SCORES

[Frequency polygon of the grouped scores: frequencies plotted at the midpoint of each class interval
from 24 – 26 up to 57 – 59 and connected by lines]

Statistical Organization of Test Scores

Statistical organization of scores is a systematic arrangement or grouping of scores. Its purpose
is to bring out their significant meaning. The results of tests in the form of scores may have very little
meaning if they are not organized in any way. Only after statistical organization may scores show some
kind of significance.

Ranking of Scores

One way of arranging scores is by ranking. Rank is the position of an observation, score, or
individual in relation to the others in the group according to some characteristic such as magnitude,
quality, or importance.

Ranking is the process of determining the relative position of values, measures, or scores
according to some basis such as magnitude, worth, quality, importance, or chronology. It is an
arrangement of values or scores from the highest to the lowest.

The following scores are obtained from a 60 item test in Assessment of Learning administered to
36 students:

56 44 32 34 22 52 21 18 40 38
30 41 50 30 47 30 49 36 20 46
30 50 27 40 33 49 36 27 48 33
41 25 36 48 24 19
Procedure:

1. Arrange the scores in a descending order, that is, from the highest to the lowest, in a vertical
column X. Write each score as many times as it occurs.
2. Number the scores consecutively from the highest to the lowest under the symbol N.
3. Assign ranks under the symbol R. The rank of a score occurring once is the same as its
consecutive number. To find the rank of a score occurring twice or more, find the average
of its consecutive numbers.

Ranking of the data above:

X CN R X CN R X CN R

56 1 1 40 13 13.5 30 25 24.5

52 2 2 40 14 13.5 30 26 24.5

50 3 3 38 15 15 27 27 27.5

49 4 4.5 36 16 17 27 28 27.5

49 5 4.5 36 17 17 25 29 29

48 6 6.5 36 18 17 24 30 30.5

48 7 6.5 34 19 19 24 31 30.5

47 8 8 33 20 20 21 32 32.5

46 9 9 32 21 21.5 21 33 32.5

44 10 10 32 22 21.5 20 34 34

41 11 11.5 30 23 24.5 19 35 35

41 12 11.5 30 24 24.5 18 36 36

Ranking is used to indicate the relative position of a pupil or student in a group to which he/she
belongs. By ranking test scores, it is possible to compare the achievement of a pupil with those of the
others in the same group. A report of a student’s rank is a very good indication of individual performance
compared to general group performance. Ranking however does not consider the extent of the difference
between successive test scores. From the ranks, the percentage of pupils that surpasses a pupil or that
are surpassed by him can be determined. Ranks are generally well understood by students and parents.

In the above data:

1. The ranks of scores 56, 52, 50, 47, and 46 are their consecutive numbers, namely 1, 2, 3, 8, and
9. Since these scores appear only once, their consecutive numbers are their ranks.
2. The rank of score 49 is 4.5, that is, the average of 4 and 5: 4 added to 5, divided by 2.
3. The rank of 30 is 24.5, that is, the average of the numbers 23, 24, 25, and 26.
4. Score 33 has a rank of 20. Nineteen students, or 52.78 percent of the class, surpassed the
student who got this score. This student surpassed 16, or 44.44 percent, of his classmates.

EXERCISE 2
1. At St. Mercy College, the number of males and females in the various fields of study are as follows:

Major Males Females

Humanities 117 83
Social Sciences 97 132
Natural Sciences 72 20
Business 156 139
Nursing 250 375
Education 48 239

Read each of the following problems carefully before constructing the fraction and solving for the
answer.

a. What percentage of social science majors are male?
b. What proportion of business majors are female?
c. For the humanities, what is the ratio of males to females?
d. What percentage of the student body are social science majors?
e. What is the ratio of males to females for the entire sample?
f. What proportion of the nursing majors are male?
g. What percentage of the sample are natural science majors?
h. What is the ratio of humanities majors to business majors?
i. What is the ratio of female business majors to female nursing majors?
j. What proportion of the males are education majors?

2. Twenty high school students completed a class to prepare them for the College Board. Their scores
are as follows:

420 345 560 650 459 499 500 657
467 480 505 555 480 520 530 589
500 550 545 600

A. Display these data in a frequency distribution with columns for frequencies and percentages.

B. Construct a histogram and frequency polygon for these data.

3. Given is a set of test scores in Social Studies 1:

51 42 33 66 43 44 42 51 54 60

46 38 45 21 33 42 57 38 48 26

56 54 37 27 31 33 35 38 64 44

55 32 45 51 52 46 40 59 27 46

51 54 61 58 58 57 52 49 36 45

A. Construct a frequency distribution to display these data.



B. What are the real limits and midpoints of each class interval?
C. Add columns to the table to display the percentage distribution, cumulative frequency and
cumulative percentages.

D. Construct a histogram and a frequency polygon to display these data.

E. Write a paragraph summarizing this distribution of data.

F. Rank the scores.

F.1. What are the ranks of scores 45, 38, 51, 27 and 60?

F.2. What % of the class surpassed the student whose score is 46?

F.3. What % of the class is surpassed by the student/s whose score is 54?

Module 3: MEASURES OF CENTRAL TENDENCY AND CENTRAL LOCATION



Objectives: At the end of the lesson the students shall be able to:
1. Define mean, median and mode;

2. Compute the mean, median and mode for ungrouped and grouped data;

3. Compute the percentiles and quartiles of given sets of scores; and

4. Compare and explain the appropriate uses of measures of central tendency.

MEASURES OF CENTRAL TENDENCY

Central tendency relates to a point in a distribution around which the scores tend to center. This
point can be used as the most representative value for a distribution of scores. A measure of central
tendency is helpful in showing where the average or typical score falls. The teacher can see how an
individual student performance relates to the average value or make comparisons about two or more
classes that took the same test.

 The benefit of frequency distributions, graphs, and charts is that they summarize the
overall shape of a distribution of scores in a way that can be quickly comprehended.
However, it is often necessary to report more detailed information about the distribution.

 Two kinds of statistics are useful: (1) measures of central tendency and (2) measures of
dispersion.

 Three commonly used measures of central tendency are the mode, median, and mean.

 These three summarize an entire distribution of scores by describing the most common
score (the mode), the middle case (the median), or the typical score of the cases (the mean)
of that distribution.

 These statistics are powerful because they can reduce huge arrays of data to a single,
easily understood number.

 The central purpose of descriptive statistics is to summarize or "reduce" data.

Median

 The median (Md) always represents the exact center of a distribution of scores.

 It is the score of the case that is in the exact middle of a distribution: half the cases
have scores higher and half the cases have scores lower than the case with the median
score.

 E.g., in the set of scores 61, 75, 80, 87, 93, the median is 80.

 How to find the median: first, the cases must be placed in order from the highest to
the lowest score. Once this is done, find the central or middle case.

 When the number of cases (N) is odd, the value of the median is unambiguous
because there will always be a single middle case; the median is the score of that case.

 When the number of cases is even, there will be two middle cases. The median is then
the score exactly halfway between the scores of the two middle cases, that is, the average
of their scores.

 Since the median requires that scores be ranked from high to low, it cannot be
calculated for variables measured at the nominal level.

 The scores of nominal-level variables cannot be ordered: the scores are different from
each other but do not form a mathematical scale of any sort.

 Therefore, the median can be found only for ordinal or interval-ratio data, but it is generally
more appropriate for the former (the ordinal).

 The median is the most exact measure of central tendency. Extreme low or high scores do
not much affect the median. The value of the median depends on the number of scores,
not so much on the magnitude of the scores. If most of the scores are high, the median is
high; if most of the scores are low, the median is low.


Calculation of the Median for Ungrouped Scores.



When the number of cases is odd, arrange the scores from highest to lowest or vice versa. Write
down all the scores, the median is the middlemost score.
Example: When the number of cases is odd.

20 21 19 19 18 22 23 16 15 22 21 18 25

The median is 20 : 25 23 22 22 21 21 20 19 19 18 18 16 15

When the number of cases is even:

37 40 35 24 19 38 27 36 18 20 39 28 22 32

Arranging the scores: 40 39 38 37 36 35 32 28 27 24 22 20 19 18

The middlemost scores are 32 and 28. The average of these two numbers is 30. So the median
is 30.
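The two worked examples above can be verified with the standard library (a quick check, not part of the original module):

```python
import statistics

# The two ungrouped examples above.
odd = [20, 21, 19, 19, 18, 22, 23, 16, 15, 22, 21, 18, 25]    # 13 scores
even = [37, 40, 35, 24, 19, 38, 27, 36, 18, 20, 39, 28, 22, 32]  # 14 scores

print(statistics.median(odd))    # 20   (the middlemost of 13 scores)
print(statistics.median(even))   # 30.0 (the average of 32 and 28)
```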

Computation of the Median for Grouped Data

Given this frequency distribution / grouped data:

X F

90 – 94 1
85 – 89 2
80 – 84 7
75 – 79 9
70 – 74 11
65 – 69 8
60 – 64 5
55 – 59 5
50 – 54 1
45 – 49 1

N = 50

1. Use the formula:

Mdn = LL + ((N/2 – F) / f) x i

Where:

LL = the real lower limit of the median class
N/2 = half sum
F = partial sum (the cumulative frequency below the median class)
f = frequency of the class interval where the median lies
N = the number of cases
i = the class interval
2. Find the values of the symbols:

a. N/2 = 50/2 = 25
b. Fl = Add the frequencies of the score from the lower score end upward until reaching
half sum but not exceeding it. ( 1+ 1+ 5+ 5+ 8 = 20) Twenty (20) is the partial sum from
the lower limit. The median (25th score lies in the step-interval 70 – 74 and its frequency
is 11)
c. The value of f is 11 ( the frequency of the interval where the median lies)
d. LL is 69.9 ( the real lower limit of 70 – 74 = the interval where the median lies)
e. i, the interval of the class limits, is 5.

3. Substitute the values for the symbols in the formula and solve.

Mdn = 69.5 + ((25 – 20) / 11) x 5

= 69.5 + (5/11) x 5

= 69.5 + (.4545) x 5

= 69.5 + 2.2725

Mdn = 71.77

4. Check the answer by using the formula:

Mdn = UL – ((N/2 – Fu) / f) x i

in which:

UL = the real upper limit of the interval where the median lies
N/2 = half sum
Fu = partial sum from the upper limit
f = frequency of the class interval where the median lies
N = the number of cases
i = the class interval

4.1 Find the values of the symbols and solve.

4.1.1 N/2 = 50/2 = 25

4.1.2 Fu = Add the frequencies from the upper score end downward until reaching the half
sum but not exceeding it (1 + 2 + 7 + 9 = 19). Nineteen (19) is the partial sum from the
upper limit. The median (the 25th score) lies in the step interval 70 – 74, whose frequency is 11.

4.1.3 The value of f is 11 (the frequency of the interval where the median lies).

4.1.4 UL is 74.5 (the real upper limit of 70 – 74, the interval where the median lies).

4.1.5 i, the class interval, is 5.


Mdn = 74.5 – ((25 – 19) / 11) x 5

Mdn = 74.5 – (6/11) x 5

Mdn = 74.5 – (.5454) x 5

Mdn = 74.5 – 2.727

Mdn = 71.77
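The grouped-median formula above can be written as a small function (a sketch only; the function and variable names are my own). It reproduces the answer of 71.77 for the distribution used in this section:

```python
# Mdn = LL + ((N/2 - F) / f) * i, applied to a list of
# (lower_limit, upper_limit, frequency) tuples, lowest class first.
def grouped_median(intervals):
    N = sum(f for _, _, f in intervals)
    cum = 0                               # partial sum F, accumulated upward
    for lower, upper, f in intervals:
        if cum + f >= N / 2:              # this is the median class
            LL = lower - 0.5              # real lower limit
            i = upper - lower + 1         # class interval
            return LL + ((N / 2 - cum) / f) * i
        cum += f

dist = [(45, 49, 1), (50, 54, 1), (55, 59, 5), (60, 64, 5), (65, 69, 8),
        (70, 74, 11), (75, 79, 9), (80, 84, 7), (85, 89, 2), (90, 94, 1)]

print(round(grouped_median(dist), 2))   # 71.77
```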

Mean

The mean or the arithmetic mean is referred to as the average of scores or measures. It is
considered the best measure of central tendency because of the following qualities:

1. Each score contributes its proportionate share in computing the mean. The mean is
more stable than the median or the mode.
2. Since the mean is the average, it is the best understood and most widely used measure
of central tendency.
3. It is used as the basis for computing other statistical measures such as the average
deviation, standard deviation, coefficient of variability, and coefficient of correlation.
4. It reports the average score of a distribution, and its calculation is straightforward.

To compute the mean, add the scores and then divide by the number of scores.

Formula: M = ∑X / N

where ∑X = the sum of the scores and N = the number of scores.

 The use of the mean is fully justified only when working with interval-ratio data.

Characteristics of the mean


 The mean is always the center of any distribution of scores. It is the point around which all of
the scores cancel out: ∑(Xi – M) = 0. This algebraic relationship between the scores and the
mean indicates that the mean is a good descriptive measure of the centrality of scores.

 ∑(Xi – M)² = minimum. If the differences between the scores and the mean are squared
and then added, the resultant sum will be less than the sum of the squared differences
between the scores and any other point in the distribution.

 Every score in the distribution affects the mean. The mode and median are not so
affected. This quality is both an advantage and a disadvantage. The mean utilizes all the
available information: every score in the distribution affects it. On the other hand,
when a distribution has a few extreme cases (very high or very low scores), the mean may
become very misleading as a measure of centrality.

 The median and mean will be the same when a distribution is symmetrical (they share the same point).

 When a distribution has some extremely high score (the positive skew), the mean will always
have a greater numerical value than the median.

 If the distribution has some very low scores (a negative skew), the mean will be lower in value
than the median.

 The relationships between medians and means also have a practical value; i.e. a quick
comparison of the median and mean will always tell you if a distribution is skewed and the
direction of the skew.

 For the good and honest researcher, the selection of a measure of central tendency for a badly
skewed distribution will hinge on what he or she wishes to show and, in most cases, either both
statistics or the median alone should be reported.

Computation of the Mean from Ungrouped Data ( When the number of cases is less than 30)

1. Use the formula: M = ∑X/N (The sum of X divided by N)


2. Write the scores in a column. They can be in any order.
3. Count the number of scores to get N.
4. Add the scores to get the sum.
5. Divide the sum by the number of cases.

Example: Given are the scores in an English test administered to 17 students.

68 70 56 45 60 54 63 48 35 29
45 63 36 49 36 55 47

The mean is:

X
68 M = 859/17
70 M = 50.529 or 50.53
56
45
60

54
63
48
35
29
45
63
36
49
36
55
47
∑X = 859

Computation of the Mean for Grouped Data

1. The formula in finding the Mean for Grouped Data is:

M = AM + (∑fd/N) x i

Where:

AM = assumed mean

∑fd = the algebraic sum of the products of the frequencies and their corresponding deviations
from the assumed mean

N = the number of cases

i = the class interval

2. Steps in the Computation of the Mean:

2.1 Prepare a table of frequency or frequency distribution.

2.2 Assume a mean. The assumed mean can be in any part of the frequency distribution, but it is
advisable to get the midpoint of the class-interval at the middle of the distribution, that one with the
highest frequency.

2.3 Fill in column d starting from the step where the assumed mean lies; assign this a 0 deviation.
From 0, number the steps upward 1, 2, 3, 4, etc., and downward 1, 2, 3, 4, etc. All deviations above the
assumed mean have positive signs and all deviations below the assumed mean have negative signs.

2.4 Multiply the frequency by the deviation for each step to get the fd column, and get the sum of fd.
This is the algebraic sum of the fd column.

2.5 Divide summation fd by N and multiply by the class interval

(∑fd/N) x i

2.6 Add the product to the assumed mean.

2.7 Check the answer by assuming another mean.


Example:

X f d fd
90 – 94 1 4 4
85 – 89 2 3 6
80 – 84 7 2 14
75 – 79 9 1 9 +33
70 – 74 11 0 0
65 – 69 8 -1 -8
60 – 64 5 -2 -10
55 – 59 5 -3 -15
50 – 54 1 -4 -4
45 – 49 1 -5 -5 -42
N = 50 ∑fd = -9

1. Assume a mean. Get the midpoint of the interval where the assumed mean lies.
AM = 72

2. Fill in Column d (deviation). The deviation is the spread of the score from a point of origin

3. Fill in Column fd . The sum of the positive values is +33 and that of the negative values is – 42.
The sum of fd is -9.

4. Substituting the formula:

M = 72 + (-9/50) 5

M = 72 + (-0.18) 5

M = 72 + (-0.9)

M = 72 – 0.9

M = 71.10

5. Check your answer by assuming another mean.

X f d fd
90 – 94 1 5 5
85 – 89 2 4 8
80 – 84 7 3 21
75 – 79 9 2 18
70 – 74 11 1 11 +63
65 – 69 8 0 0
60 – 64 5 -1 -5
55 – 59 5 -2 -10
50 – 54 1 -3 -3
45 – 49 1 -4 -4 -22

N = 50 ∑fd = +41

Given:

AM = 67
∑fd = +41
i = 5
N = 50

M = 67 + (+41/50) 5
M = 67 + (0.82) 5
M = 67 + 4.1
M = 71.10

Another method of computing the mean is through the midpoint method. The formula is:

M = ∑fM / N

X f M fM

90 – 94 1 92 92
85 – 89 2 87 174
80 – 84 7 82 574
75 – 79 9 77 693
70 – 74 11 72 792
65 – 69 8 67 536
60 – 64 5 62 310
55 – 59 5 57 285
50 – 54 1 52 52
45 – 49 1 47 47

N = 50 ∑fM = 3555

Procedure:

1. Prepare a frequency distribution.



2. Place Column M which represents the midpoints of each class interval.


3. Fill in Column fM by multiplying each frequency by each corresponding midpoint.

4. Find the sum of the data in Column fM.

5. Divide this by N.

M = 3555/50 = 71.10
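The midpoint method above can be sketched in a few lines (an illustration only, using the same distribution; the variable names are my own):

```python
# M = sum(f * midpoint) / N for the grouped distribution above.
dist = [(90, 94, 1), (85, 89, 2), (80, 84, 7), (75, 79, 9), (70, 74, 11),
        (65, 69, 8), (60, 64, 5), (55, 59, 5), (50, 54, 1), (45, 49, 1)]

N = sum(f for _, _, f in dist)                          # 50
total = sum(f * (lo + hi) / 2 for lo, hi, f in dist)    # sum of fM = 3555
mean = total / N

print(mean)   # 71.1
```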

THE MODE

  The mode of any distribution is the value that occurs most frequently.
   For example, in the set of scores 98, 92, 90, 90, 84, 64, the mode is 90 because it occurs
twice.

 It is a simple statistic, most useful when there is a need to have a “quick and easy” indicator
of central tendency and when it is worked with nominal-level variables.

 If a researcher desires to report only the most popular or common value of a distribution, or
if the variable under consideration is nominal, then the mode is the appropriate measure of
central tendency.

 Limitations of the mode: (1) some distributions have no mode at all or so many modes
that the statistic loses all meaning. (2) With ordinal and interval-ratio data, the modal score
may not be central to the distribution as a whole. That is, most common does not
necessarily mean “typical” in the sense of identifying the center of the distribution.

Example :

The table below shows freshman major instruments at Soochow University in 1999. The mode of
this distribution, the single largest category, is those who major in piano.

Musical Instruments Frequency


Piano 10
Voice 6
Violin 5
Viola 3
Cello 5
Double Bass 1
Clarinet 2
Oboe 1
Flute 2
Bassoon 1
French Horn 2
Trumpet 1
Trombone 3
Tuba 2
Percussion 3
  N=47

Example:
Distribution Frequency
Male 20
Female 20

In this case, there is no mode at all.

Example:

% of correct Frequency
58 2
60 2
62 3
64 2
66 3
67 4
68 1
69 1
70 1
93 5
  N=24

The mode of the distribution is 93. But this is not the majority of the scores. It is not appropriate for
the instructor to summarize this distribution by reporting only the modal score because he won't be able to
convey an accurate picture of the distribution as a whole.

Determining the Mode from Ungrouped Scores (Crude or Rough Mode)

Procedure:

1. Arrange the scores from highest to lowest.

2. The score that occurs most often is the crude mode.

Data:

25 30 37 41 52 52 30 37 42 37

52
52
42
41
37 Mode = 37
37
37
30
30
25

Determining the Crude Mode from Grouped Scores (Frequency Distribution). The crude mode is
the midpoint of the interval with the highest frequency.

X F

90 – 94 1
85 – 89 2
80 – 84 7
75 – 79 9
70 – 74 11 Crude Mode = 72
65 – 69 8
60 – 64 5
55 – 59 5
50 – 54 1
45 – 49 1

N = 50

When a group of scores has two different scores with the same highest frequency, the group is
said to be bi-modal. If there are three different scores with the same highest frequency, the group is
tri-modal; four, quadri-modal; etc.

Computation of the True Mode

The formula for the True Mode is:

Mo = 3Mdn – 2M

In which;

Mo = the mode

Mdn = the median

M = the mean

In the Frequency Distribution given above where the median is 71.77 and the mean is 71.10, the mode
is:
3 (71.77) – 2 (71.10)

= 215.31 – 142.2

= 73.11

The mode is merely the most typical value or the most frequent measure. It is computed when a quick
method of computing the most typical and approximate measure of central tendency is all that is needed.

Choosing a Measure of Central Tendency

 The selection should be based on level-of-measurement considerations and on an evaluation of
what each of the three statistics shows.
 The mode, median, and mean will be the same value only under certain specific conditions:
for symmetrical distributions with one mode.

Tips for selecting

Use the mode when...

 variables are measured at the nominal level

 you want a quick and easy measure for ordinal and interval-ratio variables

 you want to report the most common score

Use the median when...

 variables are measured at the ordinal level

 variables measured at the interval-ratio level have highly skewed distributions

 you want to report the central score. The median always lies at the exact center of a distribution.

Use the mean when...

 variables are measured at the interval-ratio level (except for highly skewed distributions)

 you want to report the typical score. The mean is "the fulcrum that exactly balances all of the
scores."

 you anticipate additional statistical analysis.

THE MEASURES OF CENTRAL LOCATION OR POINT MEASURES

The measures of location or point measures are the quartiles, deciles and percentiles. The
quartiles (Q1, Q2 and Q3) are three points dividing the distribution into four equal parts. The deciles (D1,
D2, D3, . . . D9) are nine points which divide the total number of cases in a frequency distribution into ten
equal parts. The percentiles (P1, P2, P3, . . . P99) are ninety-nine points which divide the score
distribution into one hundred equal parts.

The procedure in finding the point measures is almost the same as that of the median.

Quartiles
The first quartile (Q1) is located at one-fourth of the number of cases, such that 25% of all the
cases lie at or below it and 75% at or above it.

The value of the third quartile corresponds to the value of the seventy-fifth percentile. Seventy-
five percent of all the cases lie at or below it and 25% lie at or above it.

The value of the second quartile is equal to the value of the median, such that 50% of all the
cases lie at or below it and 50% lie at or above it.

Formula:

(N/4 – F)
Q1 = LL + ________ x I
f

LL = is real lower limit of the interval where Q1 lies

N/4 = Number of cases divided by 4

F = partial sum

f = frequency of the interval where the Q1 lies

I = interval

Finding Q1

X F CM

90 – 94 1 50
85 – 89 2 49
80 – 84 7 47
75 – 79 9 40
70 – 74 11 31
65 – 69 8 20
60 – 64 5 12
55 – 59 5 7
50 – 54 1 2
45 – 49 1 1
N = 50

Procedure:

1. Add Column CM in the Frequency Distribution. It stands for the cumulative frequencies, this is done
by adding the scores from the lower score end upward.

2. Find N/4. 50/4 = 12.5. The 12.5th case lies in the interval 65 – 69.

3. Determine the partial sum (F). That is the sum of the frequencies from the bottom upward that comes
closest to N/4 (12.5) without exceeding it. In the given distribution, the partial sum (F) is 12.

4. The value of f is 8 since it is the frequency of the interval where Q1 lies.


5. The value of LL or lower limit is 64.5

Substituting the formula:

(12.5 – 12)
Q1 = 64.5 + _______ x 5
8

(0.5)
Q1 = 64.5 + ___ x 5
8
Q1 = 64.5 + (0.06) x 5
Q1 = 64.5 + .30
Q1 = 64.80

Third Quartile

Formula:

(3N/4 - F)
Q3 = LL + ______ I
f

3N/4 = (3 x 50)/4 = 150/4 = 37.5

LL = 74.5
F = 31
f= 9
I=5

Q3 = 74.5 + ( 37.5 – 31) x 5


9

Q3 = 74.5 + (6.5/9) x 5

Q3 = 74.5 + (.72) x 5

Q3 = 74.5 + 3.6

Q3 = 78.1
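The interpolation used for Q1 and Q3 can be sketched in Python. The helper name `grouped_quantile` and the `(real lower limit, width, frequency)` encoding are my own, not part of the module:

```python
def grouped_quantile(classes, point):
    """Score below which `point` cases fall, by linear interpolation
    inside the class interval that contains that case."""
    cum = 0  # running partial sum F
    for lower_limit, width, freq in classes:
        if cum + freq >= point:
            return lower_limit + (point - cum) / freq * width
        cum += freq
    raise ValueError("point exceeds total frequency")

# The distribution above, from the lowest class upward: (LL, I, f).
classes = [(44.5, 5, 1), (49.5, 5, 1), (54.5, 5, 5), (59.5, 5, 5),
           (64.5, 5, 8), (69.5, 5, 11), (74.5, 5, 9), (79.5, 5, 7),
           (84.5, 5, 2), (89.5, 5, 1)]
n = 50
print(round(grouped_quantile(classes, n / 4), 2))      # Q1: 64.81
print(round(grouped_quantile(classes, 3 * n / 4), 2))  # Q3: 78.11
```

Carrying full precision gives Q1 = 64.81 and Q3 = 78.11; the hand computations above round 0.5/8 to .06 and 6.5/9 to .72 midway, giving 64.80 and 78.1.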

Percentiles

Percentiles are points dividing the distribution into 100 equal parts.

Formula:

(NPx – F)
Px = LL + _________ x I
f

where:
Px = the desired percentile;
NPx = the percentile sum (N x the proportion desired);
F = partial sum (the number of scores falling below the interval containing the desired percentile);
f = the frequency of the interval where the desired percentile lies;
LL = the exact lower limit of the interval where the desired percentile lies;
I = interval.

Finding the Percentiles

X F CM

90 – 94 1 50
85 – 89 2 49
80 – 84 7 47
75 – 79 9 40
70 – 74 11 31
65 – 69 8 20
60 – 64 5 12
55 – 59 5 7
50 – 54 1 2
45 – 49 1 1

N = 50

Procedure:

1. Determine the desired percentile. E.g. P20


2. Find the percentile sum by multiplying the number of cases (N) by the percentage
desired. 20% of 50 = 50 * .20 = 10
3. Find the partial sum by adding the frequencies of the scores from the lower score end
upward until reaching the percentile sum but not exceeding it. ( 1 + 1 + 5= 7 ). Percentile
20 or the 10th score lies at interval 60 – 64.
4. Determine f = the frequency of 60 – 64 is 5.
5. Determine LL. The exact or real lower limit of 60 – 64 is 59.5.
6. The interval is 5.
7. Substitute the formula.

(10 – 7) (3)

P20 = 59.5 + _______ x 5 = 59.5 + ___ x 5 = 59.5 + ( .6) x 5 = 59.5 + 3.0 = 62.5
5 5

Another example: Find P65.

NPx = 50 x .65 = 32.5 ; F = 1+ 1+ 5+5+8+11 = 31; P65 lies at the interval 75 – 79;
The real lower limit (LL) is 74.5; the frequency of this interval (f) is 9; and the interval is 5. Substituting the
formula:

(32.5 – 31 ) (1.5)

P65 = 74.5 + _________ x 5 = 74.5 + ____ x 5 = 74.5 + (.17) x 5 = 74.5 + .85 =75.35
9 9
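The same interpolation produces any percentile. A minimal Python sketch (the helper name and the data encoding are my own):

```python
def grouped_percentile(classes, n, p):
    """P_p for grouped data; `classes` lists (real lower limit,
    width, frequency) from the lowest class upward."""
    target = n * p / 100  # percentile sum, N x proportion desired
    cum = 0               # partial sum F
    for lower_limit, width, freq in classes:
        if cum + freq >= target:
            return lower_limit + (target - cum) / freq * width
        cum += freq

classes = [(44.5, 5, 1), (49.5, 5, 1), (54.5, 5, 5), (59.5, 5, 5),
           (64.5, 5, 8), (69.5, 5, 11), (74.5, 5, 9), (79.5, 5, 7),
           (84.5, 5, 2), (89.5, 5, 1)]
print(grouped_percentile(classes, 50, 20))            # 62.5
print(round(grouped_percentile(classes, 50, 65), 2))  # 75.33
```

At full precision P65 = 75.33; the module's 75.35 reflects rounding 1.5/9 to .17 midway.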

Something you need to know about the measures of central tendency…

 Deciles: the points that divide a distribution of scores into 10ths.

 Mean: the arithmetic average of the scores. M represents the mean of a sample, and μ is the
mean of a population.

 Measures of central tendency: statistics that summarize a distribution of scores by reporting the
most typical or representative value of the distribution.

 Median (Md): the point in a distribution of scores above and below which exactly half of the
cases fall.

 Mode: the most common value in a distribution or the largest category of a variable.

 Percentile: a point below which a specific percentage of the cases fall.

 Quartiles: the points that divide a distribution into quarters.

 Σ (sigma): the summation of.

 Skew: the extent to which a distribution of scores has a few scores that are extremely high
(positive skew) or extremely low (negative skew).

 Xi (X sub i): any score in a distribution.

EXERCISE 3

1. Differentiate the mean from the mode and median. Discuss their uses and importance.

2. Find the mean, median and mode of the following set of scores:

89 77 63 99 92 93 94 65 62 82 86 76

3. Find the mean, median and mode of the following set of scores in Philippine History:

82 43 72 74 69 68 67 87 86 73

85 75 65 60 35 57 52 59 40 42

61 57 70 50 45 68 62 49 69 58

61 65 60 81 63 48 54 46 54 44

67 66 49 58 67 60 60 68 58 62

4. Find Q1 and Q3.


5. Find P43, P50, P80, P90, P10.

MODULE 4 – MEASURES OF VARIABILITY OR DISPERSION

Objectives: At the end of the lessons, the students shall be able to:

1. define variability, index of qualitative variation, range, standard deviation, average deviation
and quartile deviation;

2. compute the different measures of variability;

3. explain the appropriate uses of measures of variability;

4. describe skewness and kurtosis and their use in interpretation of test scores; and

5. define, compute, compare and explain the appropriate uses of standard scores and how to
make test scores comparable.

Introduction

The measures of central tendency represented by the mean, median and mode are valuable
statistical measures, but they describe only the typical score representing the whole distribution. They
describe only the tendency of the scores to pile up at or near the middle of the distribution. The measures
of variability or dispersion are important. They show the tendency of the scores to spread or scatter
above or below the center point of the distribution. They show how close or how far the scores are from
each other. These measures also show the homogeneity or heterogeneity of different sets of scores.
The higher the measure of variability, the more heterogeneous the group; the lower the measure of
variability, the more homogeneous the group.


The most common measures of variability are the index of qualitative variation, the range, the
standard deviation, the mean deviation and the quartile deviation. The most important and most often
used in measurement, research and advanced statistics is the standard deviation.

What is Variability?
Variability refers to how "spread out" a group of scores is. To see what we mean by spread out,
consider graphs in Figure 1. These graphs represent the scores on two quizzes. The mean score for each
quiz is 7.0. Despite the equality of means, you can see that the distributions are quite different.
Specifically, the scores on Quiz 1 are more densely packed and those on Quiz 2 are more spread out.
The differences among students were much greater on Quiz 2 than on Quiz 1.

Figure 1. Bar charts of two quizzes.



The terms variability, spread, and dispersion are synonyms, and refer to how spread out a
distribution is. Just as in the section on central tendency we discussed measures of the center of a
distribution of scores, in this chapter we will discuss measures of the variability of a distribution. There are
four frequently used measures of variability: the range, interquartile range, variance, and standard
deviation. In the next few paragraphs, we will look at each of these four measures of variability in more
detail.

Range
The range is the simplest measure of variability to calculate, and one you have probably encountered
many times in your life. The range is simply the highest score minus the lowest score. Let’s take a few
examples. What is the range of the following group of numbers: 10, 2, 5, 6, 7, 3, 4? Well, the highest
number is 10, and the lowest number is 2, so 10 - 2 = 8. The range is 8. Let’s take another example.
Here’s a dataset with 10 numbers: 99, 45, 23, 67, 45, 91, 82, 78, 62, 51. What is the range? The highest
number is 99 and the lowest number is 23, so 99 - 23 equals 76; the range is 76. It provides a quick
approximation of the spread of the scores, but it is not a dependable measure of variability because it is
calculated from only two values.
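In Python the range is a one-liner (the function name is mine), using the two examples just given:

```python
def value_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)

print(value_range([10, 2, 5, 6, 7, 3, 4]))                    # 8
print(value_range([99, 45, 23, 67, 45, 91, 82, 78, 62, 51]))  # 76
```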

Index of Qualitative Variation (IQV)

The index of qualitative variation (IQV) is essentially the ratio of the amount of variation actually
observed in a distribution of scores to the maximum variation that could exist in that distribution. The
index varies from 0.00
(no variation) to 1.00 (maximum variation) and is used commonly with variables measured at the nominal
level. However, IQV can be used with any variable when scores have been grouped into a frequency
distribution.

Assume that a researcher is interested in comparing the racial heterogeneity of three small
neighborhoods. By inspection of Table 5, you see that Neighborhood A is the least heterogeneous of the
three, Neighborhood B is more heterogeneous than A, and Neighborhood C is the most heterogeneous of
the three. The computational formula for IQV is:

k (N² – Σf²)
IQV = ___________
N² (k – 1)

Where: k = the number of categories

N = the number of cases

Σf² = the sum of the squared frequencies

TABLE 5: RACIAL COMPOSITION OF THREE NEIGHBORHOODS



Neighborhood A              Neighborhood B              Neighborhood C

Race    Frequency           Race    Frequency           Race    Frequency
White   90                  White   60                  White   30
Black   0                   Black   20                  Black   30
Other   0                   Other   10                  Other   30
        N = 90                      N = 90                      N = 90

TABLE 5.1 FINDING THE SUM OF THE SQUARED FREQUENCIES

        Neighborhood A          Neighborhood B          Neighborhood C

Race    Frequency  Squared f    Frequency  Squared f    Frequency  Squared f
White   90         8100         60         3600         30         900
Black   0          0            20         400          30         900
Other   0          0            10         100          30         900

IQV for Neighborhood A = 3(8100 – 8100) / 8100(2)

IQV = 3(0)/16,200
IQV = 0.00

IQV for Neighborhood B = 3(8100 – 4100) / 8100(2)

IQV = 12,000 / 16,200

IQV = 0.74

IQV for Neighborhood C = 3(8100 – 2700) / 8100(2)

IQV = 16,200 / 16,200

IQV = 1.00

Thus, the IQV, in a quantitative and precise way, substantiates our impressions. Neighborhood A exhibits
no variation on the variable “race”, Neighborhood B has substantial variation and Neighborhood C has the
maximum amount of variation.
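The IQV computation translates directly into Python (the function name is mine); the three calls reproduce the neighborhood results above:

```python
def iqv(frequencies):
    """Index of qualitative variation:
    IQV = k(N^2 - sum of squared frequencies) / (N^2 (k - 1))."""
    k = len(frequencies)
    n = sum(frequencies)
    return k * (n ** 2 - sum(f ** 2 for f in frequencies)) / (n ** 2 * (k - 1))

print(iqv([90, 0, 0]))              # Neighborhood A: 0.0
print(round(iqv([60, 20, 10]), 2))  # Neighborhood B: 0.74
print(iqv([30, 30, 30]))            # Neighborhood C: 1.0
```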

Variance
Variability can also be defined in terms of how close the scores in the distribution are to the middle of the
distribution. Using the mean as the measure of the middle of the distribution, the variance is defined as
the average squared difference of the scores from the mean. The data from Quiz 1 are shown in Table 1.
The mean score is 7.0. Therefore, the column "Deviation from Mean" contains the score minus 7. The
column "Squared Deviation" is simply the previous column squared.

Scores    Deviation from Mean    Squared Deviation
9 2 4

9 2 4

9 2 4

8 1 1

8 1 1

8 1 1

8 1 1

7 0 0

7 0 0

7 0 0

7 0 0

7 0 0

6 -1 1

6 -1 1

6 -1 1

6 -1 1

6 -1 1

6 -1 1

5 -2 4

5 -2 4

Means:    7    0    1.5
One thing that is important to notice is that the mean deviation from the mean is 0. This will always
be the case. The mean of the squared deviations is 1.5. Therefore, the variance is 1.5. Analogous
calculations with Quiz 2 show that it's variance is 6.7. The formula for the variance is:

σ² = Σ(X – μ)² / N

where σ² is the variance, μ is the mean, and N is the number of numbers. For Quiz 1, μ = 7 and N = 20.
If the variance in a sample is used to estimate the variance in a population, then the previous
formula underestimates the variance and the following formula should be used:

s² = Σ(X – M)² / (N – 1)

where s² is the estimate of the variance and M is the sample mean. Note that M is the mean of a sample
taken from a population with a mean of μ. Since, in practice, the variance is usually computed in a
sample, this formula is most often used.
Let's take a concrete example. Assume the scores 1, 2, 4, and 5 were sampled from a larger
population. To estimate the variance in the population you would compute s 2 as follows:

 M = (1 + 2 + 4 + 5)/4 = 12/4 = 3.

s² = [(1 – 3)² + (2 – 3)² + (4 – 3)² + (5 – 3)²]/(4 – 1)

   = (4 + 1 + 1 + 4)/3 = 10/3 = 3.333

There are alternate formulas that can be easier to use if you are doing your calculations with a hand
calculator:

σ² = (ΣX² – (ΣX)²/N) / N    and    s² = (ΣX² – (ΣX)²/N) / (N – 1)
For this example,

ΣX² = 1² + 2² + 4² + 5² = 46

(ΣX)²/N = (1 + 2 + 4 + 5)²/N = 144/4 = 36

σ² = (46 – 36)/4 = 2.5 and

s2 = (46 - 36)/3 = 3.333 as with the other formula.
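Python's standard library implements both formulas: `statistics.pvariance` divides by N, while `statistics.variance` gives the sample estimate dividing by N − 1:

```python
import statistics

data = [1, 2, 4, 5]
print(statistics.pvariance(data))           # population variance: 2.5
print(round(statistics.variance(data), 3))  # sample estimate: 3.333
```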

Standard Deviation
The standard deviation is simply the square root of the variance. This makes the standard
deviations of the two quiz distributions 1.225 and 2.588. The standard deviation is an especially useful
measure of variability when the distribution is normal or approximately normal because the proportion of
the distribution within a given number of standard deviations from the mean can be calculated. For
example, 68% of the distribution is within one standard deviation of the mean and approximately 95% of
the distribution is within two standard deviations of the mean. Therefore, if you had a normal distribution
with a mean of 50 and a standard deviation of 10, then 68% of the distribution would be between 50 - 10
= 40 and 50 +10 =60. Similarly, about 95% of the distribution would be between 50 - 2 x 10 = 30 and 50 +
2 x 10 = 70. The symbol for the population standard deviation is σ; the symbol for an estimate computed
in a sample is s. Figure 2 shows two normal distributions. Both distributions have means of 50. The blue
distribution has a standard deviation of 5; the red distribution has a standard deviation of 10. For the blue
distribution, 68% of the distribution is between 45 and 55; for the red distribution, 68% is between 40 and
60.

Figure 2. Normal distributions with standard
deviations of 5 and 10.

Standard Deviation

The standard deviation is the square root of the mean of the squared deviations of all scores from the
mean. It is basically a measure of how far each score is from the mean. Since the standard deviation is
based on deviations from the mean, these two statistics are used together to give meaning to test scores.

Computation of the Standard Deviation from Ungrouped Scores

SD = √ [ Σ(X – M)² / N ]

Procedure:

1. List the scores under X column.

2. Find the mean of the scores.

3. Place Column X – M (deviations); get the values by subtracting the mean from each of the scores.
When the scores are less than the mean, the negative sign precedes the difference between the raw
score and the mean.

4. Place column (X – M)²; square each of the deviations.

5. Find the sum of the squared deviations, divide it by the number of cases, and extract the square root.
Example: Given this set of scores: 43, 41, 40, 38, 37, 33, 30, 29, 24, 24, 21

X    (X – X̄)    (X – X̄)²

43 7 49
41 5 25
40 4 16
38 2 4
37 1 1
33 -3 9
30 -6 36
29 -7 49
24 - 12 144
24 - 12 144
21 - 15 225
∑X = 360        ∑(X – X̄)² = 702
N = 10
X = 36

SD = √(702/10)

SD = √70.2 = 8.38

Standard Deviation from Grouped Scores

The formula for standard deviation using the short method is:

SD = I √ [ Σfd²/N – (Σfd/N)² ]

Where SD is standard deviation

I is class interval

∑fd² is the sum of the products of the frequencies and the squared deviations (d) of the scores from the
assumed mean.

∑fd is the sum of the products of the frequencies and the deviations (d) of the scores from the assumed
mean.

N is the number of cases.

Example:

X F d fd fd2

90 – 94 1 4 4 16
85 – 89 2 3 6 18
80 – 84 7 2 14 28
75 – 79 9 1 9 +33 9
70 – 74 11 0 0 0
65 – 69 8 -1 -8 8
60 – 64 5 -2 -10 20
55 – 59 5 -3 -15 45
50 – 54 1 -4 -4 16
45 – 49 1 -5 -5 -42 25
N = 50 ∑fd = - 9 ∑fd 2 = 185

SD = 5 √ [ 185/50 – (-9/50)² ]

SD = 5 √ [ 3.7 – (-0.18)² ]

SD = 5 √ (3.7 – 0.0324)

SD = 5 √ 3.6676

SD = 5 x 1.9150

SD = 9.575
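The short method can be sketched in Python (the names are mine); each row pairs a class frequency with its coded deviation d from the assumed-mean interval, as in the table above:

```python
from math import sqrt

def grouped_sd(rows, interval_width):
    """Short-method SD: I * sqrt(sum(fd^2)/N - (sum(fd)/N)^2)."""
    n = sum(f for f, d in rows)
    sum_fd = sum(f * d for f, d in rows)
    sum_fd2 = sum(f * d * d for f, d in rows)
    return interval_width * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

rows = [(1, 4), (2, 3), (7, 2), (9, 1), (11, 0),
        (8, -1), (5, -2), (5, -3), (1, -4), (1, -5)]
print(round(grouped_sd(rows, 5), 3))  # 9.575
```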

Mean Deviation or Average Deviation.

The mean deviation is not very much used in statistical work. Nevertheless, there are times when
it becomes necessary to compute the mean or average deviation. The mean deviation is the mean of the
absolute values of the differences between the raw scores and the mean.

MD = Σ|X – M| / N        (the bars | | mean that the signs are disregarded)

Example:

X    |X – M|

43   7
41   5
40   4
38   2
37   1
33   3
30   6
29   7
24   12
24   12
21   15

∑X = 360    ∑|X – M| = 74

N = 10
X = 36
AD = 74/10 = 7.4
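As a generic Python sketch (the function name is mine, with a small made-up data set so the arithmetic is easy to follow):

```python
def mean_deviation(scores):
    """Mean of the absolute deviations from the mean."""
    m = sum(scores) / len(scores)
    return sum(abs(x - m) for x in scores) / len(scores)

# Illustrative data: the mean is 5, absolute deviations are 3, 1, 1, 3.
print(mean_deviation([2, 4, 6, 8]))  # 2.0
```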

Quartile Deviation (Q)

When using the statistics of percentiles, deciles, quartiles, or the median which are based on the
order of the scores, the standard deviation cannot be used as a measure of variability, since the
deviations used in calculation of the standard deviation are based on the mean. The variability of a
distribution of scores can be described by using the two points, Q3 and Q1. A measure of the variability of the
middle 50 percent of the scores is considered to be a good estimate, because extreme scores or erratic
spacing between scores in the upper 25 percent and lower 25 percent are excluded in the computation.
This is the quartile deviation. This is the value that is equal to half the distance from Q1 to Q3.

(Q3 – Q1)
Q = ______
2

Where: Q = quartile deviation


Q3 = 75th percentile
Q1 = 25th percentile

Finding the Quartile Deviation

X F CM

90 – 94 1 50
85 – 89 2 49
80 – 84 7 47
75 – 79 9 40
70 – 74 11 31
65 – 69 8 20
60 – 64 5 12
55 – 59 5 7
50 – 54 1 2
45 – 49 1 1
N = 50

(12.5 – 12)
Q1 = 64.5 + _______ x 5
8

(0.5)
Q1 = 64.5 + ___ x 5
8

Q1 = 64.5 + (0.06) x 5

Q1 = 64.5 + .30

Q1 = 64.80
Third Quartile

Formula:

(3N/4- F)
Q3 = LL + ______ I
f

3N/4 = (3 x 50)/4 = 150/4 = 37.5

LL = 74.5; F = 31; f= 9; I=5

Q3 = 74.5 + ( 37.5 – 31) x 5


9

Q3 = 74.5 + (6.5/9) x 5

Q3 = 74.5 + (.72) x 5

Q3 = 74.5 + 3.6

Q3 = 78.1        Q = (78.1 – 64.8)/2 = 13.3/2 = 6.65

Interquartile Range
The interquartile range (IQR) is the range of the middle 50% of the scores in a distribution. It is computed
as follows:

IQR = 75th percentile - 25th percentile

The 75th percentile is sometimes called the upper hinge and the 25th percentile the lower hinge. Using
this terminology, the interquartile range is referred to as the H-spread.
A related measure of variability is called the semi-interquartile range. The semi-interquartile range is
defined simply as the interquartile range divided by 2. If a distribution is symmetric, the median plus or
minus the semi-interquartile range contains half the scores in the distribution.
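Both measures follow directly from Q1 and Q3; here is a Python sketch (the function name is mine) using the quartiles computed earlier in this module:

```python
def quartile_measures(q1, q3):
    """Return (interquartile range, quartile deviation)."""
    iqr = q3 - q1        # H-spread
    return iqr, iqr / 2  # semi-interquartile range

iqr, q = quartile_measures(64.8, 78.1)
print(round(iqr, 2), round(q, 2))  # 13.3 6.65
```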

Comparison of Measures of Variability

The range is the quickest measure of variability to obtain, although it is the crudest. When the
median is used as the measure of central tendency, the quartile deviation is used as the measure of
variability in test interpretation. The quartile deviation, like the median, is unaffected by a few extreme
scores in a distribution. The most widely used measure of variability is the standard deviation, since it is
the most stable and varies less from one sample to another than other measures.

Characteristics/Properties of Distributions

To describe a frequency distribution by reporting its characteristics, a teacher will need to give at
least one measure of central tendency and at least one measure of variability. In addition to these two
values, further description requires information about the skewness and kurtosis of the distribution.
Skewness is the degree of symmetry of the scores. Kurtosis is the degree of peakedness or flatness of
the distribution curve.

Skewness refers to the degree of symmetry attached to the occurrence of the scores along the
score interval. When the scores tend to center around one point with those on both sides of that point
balancing each other, the distribution is said to have no skewness. If there are some scores in the
distribution that are so atypical of the group that the distribution becomes asymmetrical, then that
distribution is said to be skewed. If the atypical scores are above the measure of central tendency (in the
positive direction), the distribution is said to be positively skewed. Likewise, if the atypical scores are
below the measure of central tendency (in the negative direction), the distribution is said to be negatively
skewed.

Sk = 3 (M – Md) / SD
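With the values found earlier for the frequency distribution in this module (M = 71.10, Mdn = 71.77, SD = 9.575), the skewness formula can be evaluated in Python (the function name is mine):

```python
def skewness(mean, median, sd):
    """Pearson's skewness coefficient: Sk = 3(M - Md) / SD."""
    return 3 * (mean - median) / sd

# Mean below median, so the distribution is slightly negatively skewed.
print(round(skewness(71.10, 71.77, 9.575), 2))  # -0.21
```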

Distributions also differ from each other in terms of how large or "fat" their tails are. The figures below
show two distributions that differ in this respect. The upper distribution has relatively more scores in its
tails; its shape is called leptokurtic. The lower distribution has relatively fewer scores in its tails; its shape
is called platykurtic.
The characteristic of kurtosis is very closely related to the characteristic of variability. It can give
an indication of the degree of homogeneity of the group being tested in regard to the characteristic being
measured. If students tend to be much alike, the scores will generate a leptokurtic frequency polygon; if
students are very different, a platykurtic distribution is generated. A mesokurtic distribution is neither
platykurtic nor leptokurtic.

The kurtosis for the normal distribution is approximately 0.263. Hence if the Ku is greater than 0.263 , the
distribution is most likely platykurtic; while if the Ku is less than 0.263, the distribution is most likely
leptokurtic (Garrett, 1973).

Ku = Q / (P90 – P10)        where Q is the quartile deviation


STANDARD SCORES

A standard score is one of many derived scores used in testing today. Derived scores are
valuable to the classroom teacher. Since scores differ from different tests, the teacher can make them
comparable by expressing them in the same scale. For norm-referenced tests, it is meaningful to
interpret classroom test scores by locating a student’s score with reference to the average for the class
and to describe the distance between the score and the average in terms of the spread of the scores in
the distribution.

Tristan’s raw score on an English achievement test was 50. In the same class of students Tristan
scored 70 on the Mathematics achievement test. To compare the raw score on one test with a raw
score on another test to obtain a total or average score is meaningless. The units are not comparable
because the tests may have different possible total scores, means, and standard deviations. By
converting raw scores on both tests to standard scores, the units become comparable, and can be

interpreted properly.
Using the deviation of a score from the mean (X – X̄) and the standard deviation (SD), a teacher
can build what is called a z-score.

z = (X – X̄) / SD

z = a standard score
X = any raw score
X̄ = the mean
SD = the standard deviation

For example, the means and standard deviation for Tristan’s two test scores are as follows:

Tristan’s Raw Score Mean Standard Deviation

English test 50 45 5.6

Mathematics test 70 75 7

Comparison can be made between the two scores because the scores were earned in the same group of
students. Substituting the formula:

For English For Mathematics

Z = 50 - 45 Z = 70 - 75
5.6 7

Z = 5/5.6 = .89 Z = -5/7 = -0.71

The two scores of Tristan can now be compared. Even if he got a higher score in Mathematics than in
English, he still did well in English as shown by the higher value of the standard score in that subject.
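Tristan's comparison can be reproduced in Python (the function name is mine):

```python
def z_score(raw, mean, sd):
    """z = (X - mean) / SD."""
    return (raw - mean) / sd

english = z_score(50, 45, 5.6)
mathematics = z_score(70, 75, 7)
print(round(english, 2), round(mathematics, 2))  # 0.89 -0.71
```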

EXERCISE 4

1. Find the standard deviation and average deviation of the following set of scores:

32 39 40 25 29 35 39 28 41 29 37 30
27 32 29 26

2. Find the standard deviation, quartile deviation, skewness and kurtosis. Illustrate your answer.

ci f

54 – 56 3
51 – 53 3
48 – 50 1
45 – 47 5
42 – 44 6
39 – 41 9
36 – 38 5
33 – 35 7

30 – 32 4
27 – 29 4
24 – 26 2
3. Vinn’s score in the midterm test in Statistics was 48 and 56 in the final test. The mean of the first test is 42 and the
standard deviation is 5. In the second test the mean is 60 and the standard deviation is 6. In which test did Vinn do
better?

4. Compute the Index of Qualitative Variation of four sections of Nursing Students.

Section A Section B Section C Section D

Above average 15 11 10 13
Average 20 28 18 21
Below Average 6 10 12 16

MODULE 5 – CORRELATIONAL STATISTICS

Objectives: At the end of the lessons, the students shall be able to:

1. define correlation;

2. differentiate between positive and negative correlation;

3. discuss the strengths, advantages and disadvantages of correlation statistics;

4. compute the following correlation measures: Pearson Product-Moment Coefficient,
Coefficient of Variation, Gamma Coefficient, Lambda Coefficient, Phi Coefficient,
Spearman rho, Kendall's Tau, Coefficient of Concordance, and Point-Biserial
Coefficient.

What is Correlation

The correlation is a way to measure how associated or related two variables are. The researcher
looks at things that already exist and determines if and in what way those things are related to each other.
The purpose of doing correlations is to allow us to make a prediction about one variable based on what
we know about another variable.  

Correlation is a measure of the relation between two or more variables. The measurement scales
used should be at least interval scales, but other correlation coefficients are available to handle other
types of data. Correlation coefficients can range from -1.00 to +1.00. The value of -1.00 represents a
perfect negative correlation while a value of +1.00 represents a perfect positive correlation. A value of
0.00 represents a lack of correlation.

For example, there is a correlation between income and education. We find that people with
higher income have more years of education. (You can also phrase it that people with more years of
education have higher income.) When we know there is a correlation between two variables, we can
make a prediction. If we know a group’s income, we can predict their years of education. 

Positive correlation

In a positive correlation, as the values of one of the variables increase, the values of the second
variable also increase. Likewise, as the value of one of the variables decreases, the value of the other
variable also decreases. The example above of income and education is a positive correlation. People
with higher incomes also tend to have more years of education. People with fewer years of education
tend to have lower income.

Here are some examples of positive correlations:

1. SAT scores and college achievement—among college students, those with higher SAT scores also
have higher grades

2. Happiness and helpfulness—as people’s happiness level increases, so does their helpfulness
(conversely, as people’s happiness level decreases, so does their helpfulness)

This table shows some sample data. Each person reported income and years of education.

Participant        Income        Years of Education

#1 125,000 19
#2 100,000 20

#3 40,000 16

#4 35,000 16

#5 41,000 18

#6 29,000 12

#7 35,000 14

#8 24,000 12

#9 50,000 16

#10 60,000 17

In this sample, the correlation is .79.

We can make a graph, which is called a scatterplot. On the scatterplot below, each point represents one
person’s answers to questions about income and education. The line is the best fit to those points. All
positive correlations have a scatterplot that looks like this. The line will always go in that direction if the
correlation is positive
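The correlation for this sample can be reproduced with a deviation-score implementation of Pearson's r (a sketch; the function name is mine, and the formula itself is introduced formally at the end of this module):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment r from deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# The income and years-of-education data from the table above.
income = [125000, 100000, 40000, 35000, 41000,
          29000, 35000, 24000, 50000, 60000]
education = [19, 20, 16, 16, 18, 12, 14, 12, 16, 17]
print(round(pearson_r(income, education), 2))  # 0.79
```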

Negative correlation
In a negative correlation, as the values of one of the variables increase, the values of the second variable
decrease. Likewise, as the value of one of the variables decreases, the value of the other variable
increases.

This is still a correlation. It is like an “inverse” correlation. The word “negative” is a label that shows the
direction of the correlation.

There is a negative correlation between TV viewing and class grades—students who spend more time
watching TV tend to have lower grades (or phrased as students with higher grades tend to spend less
time watching TV).

Here are some other examples of negative correlations:

1. Education and years in jail—people who have more years of education tend to have fewer years in jail
(or phrased as people with more years in jail tend to have fewer years of education)

2. Crying and being held—among babies, those who are held more tend to cry less (or phrased as babies
who are held less tend to cry more)

We can also plot the grades and TV viewing data, shown in the table below. The scatterplot below shows
the sample data from the table. The line on the scatterplot shows what a negative correlation looks like.
Any negative correlation will have a line with that direction.

Participant        GPA        TV in hours per week

#1 3.1 14

#2 2.4 10

#3 2.0 20

#4 3.8 7

#5 2.2 25

#6 3.4 9

#7 2.9 15

#8 3.2 13

#9 3.7 4
#10 3.5 21

In this sample, the correlation is -.63.

Strength

Correlations, whether positive or negative, range in their strength from weak to strong.

Positive correlations will be reported as a number between 0 and 1. A score of 0 means that there
is no correlation (the weakest measure). A score of 1 is a perfect positive correlation, which does not
really happen in the “real world.” As the correlation score gets closer to 1, it is getting stronger. So, a
correlation of .8 is stronger than .6; but .6 is stronger than .3.

The correlation of the sample data above (income and years of education) is .79.

Negative correlations will be reported as a number between 0 and -1. Again, a 0 means no
correlation at all. A score of –1 is a perfect negative correlation, which does not really happen. As the
correlation score gets close to -1, it is getting stronger. So, a correlation of -.7 is stronger than -.5; but -.5
is stronger than -.2.

Remember that the negative sign does not indicate anything about strength. It is a symbol to tell
you that the correlation is negative in direction. When judging the strength of a correlation, just look at the
number and ignore the sign.

The correlation of the sample data above (TV viewing and GPA) is -.63.
 Imagine reading four correlational studies with the following scores. You want to decide which study had
the strongest results:

-.3  -.8   .4    .7

In this example, -.8 is the strongest correlation. The negative sign means that its direction is negative.

Advantage

An advantage of the correlation method is that we can make predictions about things when we
know about correlations. If two variables are correlated, we can predict one based on the other. For
example, we know that SAT scores and college achievement are positively correlated. So when college
admission officials want to predict who is likely to succeed at their schools, they will choose students with
high SAT scores.

We know that years of education and years of jail time are negatively correlated. Prison officials
can predict that people who have spent more years in jail will need remedial education, not college
classes.

Disadvantage

The problem that most students have with the correlation method is remembering that correlation does
not measure cause. Take a minute and chant to yourself: Correlation is not Causation! Correlation is not
Causation!

We know that education and income are positively correlated. We do not know if one caused the
other. It might be that having more education causes a person to earn a higher income. It might be that
having a higher income allows a person to go to school more. It might also be some third variable.

A correlation tells us that the two variables are related, but we cannot say anything about whether
one caused the other. This method does not allow us to come to any conclusions about cause and effect.

Reminders:

Anybody who wants to interpret the result of the coefficient of correlation should be guided by the
following:

1. A relationship between two variables does not necessarily mean that one is the cause or the effect of the other variable. Correlation does not imply a cause-and-effect relationship.

2. When the computed r is high, it does not necessarily mean that one factor is strongly dependent on the other. This is shown by the height and intelligence of people: computing a correlation here does not make any sense at all.

On the other hand, when the computed r is small, it does not necessarily mean that one factor has no dependence on the other. This may apply to I.Q. and grades in school, where a low grade may simply mean that a student did not make good use of his study time.

3. If there is reason to believe that the two variables are related and the computed r is high, the two variables can be regarded as genuinely associated.

On the other hand, if the correlation between two theoretically related variables turns out to be low, other factors might be responsible for the small association.

4. Lastly, the correlation coefficient simply tells us that when two variables change together, the relationship between them may be strong or weak.

Measures of Correlation

Pearson’s Correlation Coefficient (r):
The quantity r, called the linear correlation coefficient, measures the strength and the direction of
a linear relationship between two variables. The linear correlation
coefficient is sometimes referred to as the Pearson product moment correlation coefficient in honor of its
developer Karl Pearson.

The mathematical formula for computing r is:

r = [n∑xy – (∑x)(∑y)] / √{[n∑x² – (∑x)²][n∑y² – (∑y)²]}

where n is the number of pairs of data.
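As a quick sketch of how this formula is applied, the snippet below computes r for a small set of hypothetical (x, y) pairs (the data are invented purely for illustration):

```python
import math

# Hypothetical paired data (x, y); n = number of pairs
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sx2 = sum(a * a for a in x)
sy2 = sum(b * b for b in y)

# r = [n*Sxy - (Sx)(Sy)] / sqrt([n*Sx2 - (Sx)^2][n*Sy2 - (Sy)^2])
r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
print(round(r, 2))  # 0.77
```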

The value of r is such that -1 ≤ r ≤ +1. The + and – signs are used for positive linear correlations and negative linear correlations, respectively.

 Positive correlation: If x and y have a strong positive linear correlation, r is close to +1. An r value of exactly +1 indicates a perfect positive fit. Positive values indicate a relationship between the x and y variables such that as values for x increase, values for y also increase.

 Negative correlation: If x and y have a strong negative linear correlation, r is close to -1. An r value of exactly -1 indicates a perfect negative fit. Negative values indicate a relationship between x and y such that as values for x increase, values for y decrease.

 No correlation: If there is no linear correlation or a weak linear correlation, r is close to 0. A value near zero means that there is a random, nonlinear relationship between the two variables.

 Note that r is a dimensionless quantity; that is, it does not depend on the units employed.

 A perfect correlation of ±1 occurs only when the data points all lie exactly on a straight line. If r = +1, the slope of this line is positive. If r = -1, the slope of this line is negative.

  A correlation greater than 0.8 is generally described as strong, whereas a correlation less than
0.5 is generally described as weak.  These values can vary based upon the    "type" of data being
examined.  A study utilizing scientific data may require a stronger correlation than a study using social
science data.  

Interpreting Pearson's r

Correlations between Are said to be

±.8 and ±1.0 Very strong

±.6 and ±.79 Strong

±.4 and ±.59 Moderate

±.2 and ±.39 Weak

0 and ±.19 Very weak

Strong vs. Meaningful Relationships:

 A strong correlation is not the same as a significant correlation


o Statistical Correlation does not always mean meaningful correlation

 Just “eyeballing” the correlation coefficient is not enough

 There are other, more sound ways of judging the meaningfulness of a correlation

o The coefficient of determination

o Hypothesis testing

Coefficient of Determination, r² or R²:

  The coefficient of determination, r², is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable. It is a measure that allows us to determine how certain one can be in making predictions from a certain model/graph.
  The coefficient of determination is the ratio of the explained variation to the total variation.
  The coefficient of determination is such that 0 ≤ r² ≤ 1, and denotes the strength of the linear association between x and y.

The coefficient of determination represents the percent of the data that is closest to the line of best fit. For example, if r = 0.922, then r² = 0.850, which means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation). The other 15% of the total variation in y remains unexplained.
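For instance, the 85% figure in this example can be checked directly:

```python
# Coefficient of determination for the example r = 0.922
r = 0.922
r_squared = r ** 2
print(round(r_squared, 2))  # 0.85: about 85% of the variation in y is
                            # explained; the remaining 15% is unexplained
```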

  The coefficient of determination is a measure of how well the regression line


represents the data.  If the regression line passes exactly through every point on the
scatter plot, it would be able to explain all of the variation. The further the line is away from the points, the
less it is able to explain.

  Spearman's Rank Correlation Coefficient

In calculating this coefficient, we use the Greek letter ρ ('rho'), though it is often written simply as r.

The formula used to calculate this coefficient is:

r = 1 - (6∑d²) / n(n² - 1)

To illustrate this, consider the following worked example:


Researchers at the European Centre for Road Safety Testing are trying to find out how the age of cars
affects their braking capability. They test a group of ten cars of differing ages and find out the minimum
stopping distances that the cars can achieve. The results are set out in the table below:

Table : Car ages and stopping distances

Car   Age (months)   Minimum stopping distance at 40 kph (metres)

A 9 28.4

B 15 29.3

C 24 37.6

D 30 36.2

E 38 36.5

F 46 35.3

G 53 36.2

H 60 44.1
I 64 44.8
J 76 47.2

These figures form the basis for the scatter diagram, below, which shows a reasonably strong positive
correlation - the older the car, the longer the stopping distance.

Graph 1: Car age and Stopping distance (data from Table 1 above)

To process this information we must, firstly, place the ten pieces of data into order, or rank them
according to their age and ability to stop. It is then possible to process these ranks.

Table 2: Ranked data from Table 1 above

Car   Age (months)   Stopping distance at 40 kph (metres)   Age rank   Stopping rank

A 9 28.4 1 1

B 15 29.3 2 2

C 24 37.6 3 7

D 30 36.2 4 4.5

E 38 36.5 5 6

F 46 35.3 6 3
G 53 36.2 7 4.5
H 60 44.1 8 8

I 64 44.8 9 9

J 76 47.2 10 10

Notice that the ranking is done here in such a way that the youngest car and the best stopping
performance are rated top and vice versa. There is no strict rule here other than the need to be consistent
in your rankings. Notice also that there were two values the same in terms of the stopping performance of
the cars tested. They occupy 'tied ranks' and must share, in this case, ranks 4 and 5. This means they are
each ranked as 4.5, which is the mean average of the two ranking places. It is important to remember that
this works despite the number of items sharing tied ranks. For instance, if five items shared ranks 5, 6, 7,
8 and 9, then they would each be ranked 7 - the mean of the tied ranks.

Now we can start to process these ranks to produce the following table:

Table 3: Differential analysis of data from Table 2

Car   Age (mths)   Stopping distance   Age rank   Stopping rank   d   d²

A 9 28.4 1 1 0 0

B 15 29.3 2 2 0 0

C 24 37.6 3 7 4 16

D 30 36.2 4 4.5 0.5 0.25

E 38 36.5 5 6 1 1
F 46 35.3 6 3 -3 9
G 53 36.2 7 4.5 -2.5 6.25

H 60 44.1 8 8 0 0

I 64 44.8 9 9 0 0

J 76 47.2 10 10 0 0

∑d² = 32.5

Note that the two extra columns introduced into the new table are Column 6, 'd', the difference between stopping distance rank and age rank; and Column 7, 'd²', the Column 6 entries squared. These squared figures are summed at the foot of Column 7.

Calculation of the Spearman Rank Correlation Coefficient (r) is:

r = 1 - (6∑d²) / n(n² - 1)

Number in sample (n) = 10

r = 1 - (6 x 32.5) / 10(10² - 1)
r = 1 - (195 / 990)
r = 1 - 0.197
r = 0.803

What does this tell us? When interpreting the Spearman Rank Correlation Coefficient, it is usually enough
to say that:

 for values of r of 0.9 to 1, the correlation is very strong.


 for values between 0.7 and 0.9, correlation is strong.

 and for values between 0.5 and 0.7, correlation is moderate.

This is the case whether r is positive or negative. In our case of car ages and stopping distance
performance, we can say that there is a strong correlation between the two variables.
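This result can be verified with a short Python sketch that ranks the raw data from the table (averaging tied ranks, as described above) and applies the formula:

```python
# Car ages (months) and stopping distances (metres) from the table above
ages = [9, 15, 24, 30, 38, 46, 53, 60, 64, 76]
dists = [28.4, 29.3, 37.6, 36.2, 36.5, 35.3, 36.2, 44.1, 44.8, 47.2]

def ranks(values):
    """Rank values from smallest to largest, averaging tied ranks."""
    out = []
    for v in values:
        below = sum(1 for w in values if w < v)
        ties = sum(1 for w in values if w == v)
        # average of the rank positions occupied by tied values
        out.append(below + (ties + 1) / 2)
    return out

ra, rd = ranks(ages), ranks(dists)
d2 = sum((a - b) ** 2 for a, b in zip(ra, rd))
n = len(ages)
rho = 1 - (6 * d2) / (n * (n ** 2 - 1))
print(d2, round(rho, 3))  # 32.5 0.803
```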

Gamma

An alternative to rank-order correlation is Goodman and Kruskal’s gamma (G). When you know the values of one variable, gamma measures how well the order of the other variable can be estimated or predicted. The gamma can also be used when ties are found in the ranking of the data.

The formula for gamma is :


G = (Ns – N1) / (Ns + N1)

Where:

Ns = the number of pairs ordered in the parallel direction

N1 = the number of pairs in the opposite direction

G = the difference between the proportion of pairs ordered in the parallel direction and the
proportion of pairs ordered in the opposite direction.

Steps in solving for Gamma:

1. Arrange the ordering for one of the two characteristics from the highest to the lowest (or vice versa) from top to bottom through the rows, and for the other characteristic from the highest to the lowest (or vice versa) from left to right through the columns.

2. Compute Ns by multiplying the frequency in every cell by the sum of the frequencies in all of the other cells that are both below and to the right of the original cell, and then summing up the products obtained.

3. To solve for N1, you partially reverse the process described in Step 2: multiply the frequency of every cell by the sum of the frequencies in all the cells that are both below and to the left of the original cell, and then sum up the products obtained.

4. Apply the gamma formula.

Example: Compute the gamma for the data shown in the following table:

TABLE: EMPLOYEES RANKED ON SOCIO-ECONOMIC STATUS AND EDUCATIONAL


QUALIFICATION

Educational Status

Socio-Economic Upper Middle Lower Total


Status

Upper 24 19 5 48

Middle 12 54 29 95

Lower 9 26 25 60

Total 45 99 59 203

Solution:

Ns = 24(54) + 24(29) + 24(26) + 24(25) + 19(29) + 19(25) + 12(26) + 12(25) + 54(25)

= 1296 + 696 + 624 + 600 + 551 + 475 + 312 + 300 + 1350

= 6204

The procedure can also be written as:

Ns = 24(54 + 29 + 26 + 25) + 19(29 + 25) + 12(26 + 25) + 54(25) = 6204

N1 = 5(12 + 54 + 9 + 26) + 19(12 + 9) + 29(9 + 26) + 54(9)

= 60 + 270 + 45 + 130 + 228 + 171 + 261 + 754 + 486

= 2405

G= Ns – N1
Ns + N1

= 6204 – 2405
6204 + 2405

= 3799/8609 = .44

A gamma coefficient of +.44 indicates a moderately small positive correlation between socio-economic status and educational qualification. The results suggest a correlation based on a dominance of the parallel direction of the two variables. This means that there is a 44 percent greater chance of a parallel direction than of an opposite direction for the variables of socio-economic status and educational status. If the gamma coefficient were -.44, it would instead indicate a moderately small negative correlation based on a dominance of the opposite direction.
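The concordant/discordant counting in Steps 2 and 3 can be automated; the sketch below recomputes Ns, N1 and G for the table above:

```python
# Frequency table from above: rows = socio-economic status (high to low),
# columns = educational status (high to low)
table = [
    [24, 19, 5],
    [12, 54, 29],
    [9, 26, 25],
]

rows, cols = len(table), len(table[0])
ns = n1 = 0
for i in range(rows):
    for j in range(cols):
        f = table[i][j]
        # concordant pairs: cells below and to the right
        ns += f * sum(table[r][c] for r in range(i + 1, rows)
                                  for c in range(j + 1, cols))
        # discordant pairs: cells below and to the left
        n1 += f * sum(table[r][c] for r in range(i + 1, rows)
                                  for c in range(j))

gamma = (ns - n1) / (ns + n1)
print(ns, n1, round(gamma, 2))  # 6204 2405 0.44
```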

Correlation Between Nominal Data

Lambda

The lambda coefficient is represented by the lower-case Greek letter λ and is also known as Guttman’s coefficient of predictability. It is defined as a proportionate-reduction-in-error measure: an index of how much the error in predicting the values of one variable is reduced by knowing the values of another. It is also a way of measuring to what degree the accuracy of the prediction can be improved. If you have a lambda of .80, you have reduced the error of your prediction about the values of the dependent variable by 80 percent; if your lambda is .30, you have reduced the error of your prediction by only 30 percent. The lambda coefficient is a measure of association for comparing several groups or categories at the nominal level.

Formula:

λc = (∑Fbi – Mbc) / (N – Mbc)

Where: λ = the lambda coefficient
Fbi = the biggest cell frequency in the ith row (with the sum taken over all the rows)
Mbc = the biggest of the column totals
N = the total number of observations


However, if your dependent variable is regarded as the row variable, the formula to use is:

λr = (∑Fbj – Mbr) / (N – Mbr)

Where: λ = the lambda coefficient
Fbj = the biggest cell frequency in the jth column (with the sum taken over all of the columns)
Mbr = the biggest of the row totals
N = the total number of observations

Example: Compute the λc and λr for the data on the following table:

TABLE : A SEGMENT OF THE FILIPINO ELECTORATE ACCORDING TO RELIGION AND POLITICAL PARTY

Political Party

Religion Liberal Lakas United Party Total

Catholic 49 25 18 92

Iglesia ni Cristo 34 72 21 127

Protestant 26 25 20 71

Total 109 122 59 290

Solution:

λc = ∑Fbi - Mbc
N – Mbc

= (49 + 72 + 26) - 122
290 – 122

= 147 – 122
168

= 25 / 168 = .1488095 = .15 Answer

λr = ∑Fbj - Mbr
N – Mbr

= (49 + 72 + 21) – 127
290 – 127

= 142 – 127
163

= 15 / 163 = .0920245 = .09 Answer


The obtained lambda coefficient of .15 indicates that when religion is treated as the independent variable, the error in predicting political party is reduced (the accuracy is increased) by 15 percent. The obtained lambda coefficient of .09 indicates that when political party is treated as the independent variable, the error in the prediction is reduced (the accuracy is increased) by 9 percent. These results show that religion predicts political party more accurately than political party predicts religion.
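The two lambdas can be recomputed from the frequency table with a short Python sketch (the variable names are mine, not from the text):

```python
# Frequency table from above: rows = religion, columns = political party
table = [
    [49, 25, 18],
    [34, 72, 21],
    [26, 25, 20],
]
n = sum(map(sum, table))                     # 290 observations

# Column variable (political party) as the dependent variable:
f_bi = sum(max(row) for row in table)        # biggest frequency in each row
m_bc = max(sum(col) for col in zip(*table))  # biggest column total
lambda_c = (f_bi - m_bc) / (n - m_bc)

# Row variable (religion) as the dependent variable:
f_bj = sum(max(col) for col in zip(*table))  # biggest frequency in each column
m_br = max(sum(row) for row in table)        # biggest row total
lambda_r = (f_bj - m_br) / (n - m_br)

print(round(lambda_c, 2), round(lambda_r, 2))  # 0.15 0.09
```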

The Point Biserial Coefficient of Correlation , rpbi

This procedure is used when you are interested in getting the degree of relationship between two
variables where one variable is continuous such as test scores and the other is dichotomous nominal
variable such as gender. Your question in mind perhaps is, “Is gender related to intelligence?” In this
case the most appropriate statistical technique is the point biserial correlation, rpbi.

Formula:

rpbi = [∑F(∑FpY) – ∑Fp(∑FY)] / √[∑Fp∑Fw(∑F∑FY² – (∑FY)²)]

Example:

IQ Scores

Y    Y²   Fp   Fw   F    FY   FY²   FpY

12   144   3    1    4   48   576    36
11   121   5    1    6   66   726    55
10   100   7    0    7   70   700    70
9     81   8    0    8   72   648    72
8     64   9    3   12   96   768    72
7     49   7    5   12   84   588    49
6     36   6    7   13   78   468    36
5     25   3    2    5   25   125    15
4     16   2    8   10   40   160     8
3      9   1    9   10   30    90     3
2      4   0    5    5   10    20     0

Totals:   51   41   92   619   4869   416

Solution:

rpbi = [92(416) – 51(619)] / √[51(41)(92(4869) – (619)²)]

= (38272 – 31569) / √[2091(447948 – 383161)]

= 6703 / √[2091(64787)]

= 6703 / (45.73)(254.53) = 6703 / 11639.66 = .58 Answer


Legend: Fp = frequency of males in the distribution of IQ scores
Fw = frequency of females in the distribution of IQ scores
F = frequency of both males and females
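As a check on the arithmetic above, the same computation can be sketched in Python using the frequency columns from the table (Fp = males, Fw = females):

```python
import math

# Frequency distribution of IQ scores from the table above
Y  = [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2]
Fp = [3, 5, 7, 8, 9, 7, 6, 3, 2, 1, 0]
Fw = [1, 1, 0, 0, 3, 5, 7, 2, 8, 9, 5]
F  = [p + w for p, w in zip(Fp, Fw)]  # combined frequencies

sFp, sFw, sF = sum(Fp), sum(Fw), sum(F)
sFY  = sum(f * y for f, y in zip(F, Y))
sFY2 = sum(f * y * y for f, y in zip(F, Y))
sFpY = sum(p * y for p, y in zip(Fp, Y))

num = sF * sFpY - sFp * sFY
den = math.sqrt(sFp * sFw * (sF * sFY2 - sFY ** 2))
print(round(num / den, 2))  # 0.58
```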

Phi Coefficient

If both variables instead are nominal and dichotomous, the Pearson simplifies even further. First,
perhaps, we need to introduce contingency tables. A contingency table is a two dimensional table
containing frequencies by category. For this situation it will be two by two since each variable can only
take on two values, but each dimension will exceed two when the associated variable is not dichotomous.
In addition, column and row headings and totals are frequently appended so that the contingency table
ends up being n + 2 by m + 2, where n and m are the number of values each variable can take on. The
label and total row and column typically are outside the gridded portion of the table, however.

As an example, consider the following data organized by gender and employee classification (faculty/staff).

Class.\Gender Female (0) Male (1) Totals

Staff 10 5 15

Faculty 5 10 15

Totals: 15 15 30

Contingency tables are often coded as below to simplify calculation of the Phi coefficient.

Y\X 0 1 Totals

1 A B A+B

0 C D C+D

Totals: A + C B+D N

With this coding: phi = (BC - AD)/√((A+B)(C+D)(A+C)(B+D)).

For this example we obtain: phi = (25 - 100)/√(15•15•15•15) = -75/225 = -0.33, indicating a modest negative correlation. Please note that this is the Pearson correlation coefficient, just calculated in a simplified manner. However, the extreme values of |r| = 1 can only be realized when the two row totals are equal and the two column totals are equal. There are thus ways of computing the maximal values, if desired.
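A minimal Python sketch of this calculation, using the coding above and cell values chosen to match the faculty/staff example (A = 10, B = 5, C = 5, D = 10):

```python
import math

# 2x2 contingency table coded as in the text:
#        X=0  X=1
# Y=1:    A    B
# Y=0:    C    D
A, B, C, D = 10, 5, 5, 10

phi = (B * C - A * D) / math.sqrt((A + B) * (C + D) * (A + C) * (B + D))
print(round(phi, 2))  # -0.33
```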

Kendall’s Coefficient of Concordance W


To determine the degree of agreement or concordance among subgroups in ranking a number of sets of items or aspects on the variables of interest, Kendall's coefficient of concordance W is used.

Formula: W = 12∑D² / m²(n)(n² – 1)

Where: W = Kendall's coefficient of concordance
12 = a constant number
∑D² = summation of squares of the deviations from the average of the rank sums
m = number of subgroups or judges
n = number of items / projects ranked

Example: Do the supervisors differ in their rankings of the employees’ individual project studies?

TABLE: DATA FOR COMPUTING THE KENDALL’s COEFFICIENT OF CONCORDANCE W

Employees’ Project Study (n)   Supervisor Ranks (m)   Sum of Ranks   Deviation from the Average Rank (D)   D²

                               1   2   3   4

A 2 3 3 2 10 -12 144

B 9 10 10 10 39 17 289

C 1 2 1 3 7 -15 225

D 3 1 2 1 7 -15 225

E 4 5 4 5 18 -4 16

F 7 7 6 6 26 4 16

G 5 6 5 4 20 -2 4

H 6 8 8 8 30 8 64

I 8 9 7 7 31 9 81

J 10 4 9 9 32 10 100

Average rank sum = 220/10 = 22          ∑D² = 1164

Computation:

W = 12∑D² / m²(n)(n² – 1)

= 12(1164) / 4²(10)(10² – 1)

= 13968 / 16(10)(99) = 13968 / 15840 = 0.88

This implies that there is a high degree of agreement among the supervisors in their ratings of the employees' projects.
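The computation above can be verified with a short Python sketch using the supervisors' ranks from the table:

```python
# Ranks given by m = 4 supervisors to n = 10 project studies (table above)
ranks = {
    'A': [2, 3, 3, 2],  'B': [9, 10, 10, 10], 'C': [1, 2, 1, 3],
    'D': [3, 1, 2, 1],  'E': [4, 5, 4, 5],    'F': [7, 7, 6, 6],
    'G': [5, 6, 5, 4],  'H': [6, 8, 8, 8],    'I': [8, 9, 7, 7],
    'J': [10, 4, 9, 9],
}
m = 4
n = len(ranks)

sums = [sum(r) for r in ranks.values()]
mean = sum(sums) / n                      # 220 / 10 = 22
d2 = sum((s - mean) ** 2 for s in sums)   # 1164

W = (12 * d2) / (m ** 2 * n * (n ** 2 - 1))
print(round(W, 2))  # 0.88
```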

EXERCISES:

1. Find the coefficient of correlation of the following midterm (x) and final (y) grades:

X 75 70 65 90 85 85 80 70 65 90
Y 80 75 65 95 90 85 90 75 70 90

2. Compute the Spearman Rank-Order Correlation

Student Test X Test Y

1 19 25
2 18 27
3 15 29
4 12 25
5 11 21
6 9 19
7 7 14
8 5 13
9 6 12

3. Compute the gamma coefficient to determine the degree of association between socio-economic
status and use of local goods.

                      Socio-Economic Status
                      High   Medium   Low
Use of Local Goods     f      f        f
Very Good              32     16       18
Good                   22     18       24
Poor                   12     12       36

4. Compute for Lambda for placement in a particular school.

                      Classroom Placement
                      Special Class   Tutor   Regular Class
Mentally Retarded          40           25         15
Learning Disabled          20           40         35
No Label                   10           25         50

5. Find the Phi Coefficient:


Marriage Practice Non-Christian Christian Total
One (1) wife 10 35 45

Two(2) wives or more 21 12 33

Total 31 47 78

6. Find the Point Biserial Coefficient

Y    Fp   Fw
15    5    3
14    7    3
13    9    5
12   10    8
11   11    5
10    9    6
9     8    9
8     6   10
7     5    6
6     4    5

7. Find the Kendall’s Coefficient of Concordance W

Contestants Judge 1 Judge 2 Judge 3 Judge 4 Judge 5


1    88   93   85   90   94
2    80   85   79   88   80
3    96   94   86   87   90
4    83   85   80   88   82
5    78   82   85   80   79
7    86   89   90   87   83
8    90   89   93   94   88
9    82   85   87   86   80
10   75   79   80   78   85

MODULE 6 – INFERENTIAL STATISTICS

Objectives: At the end of the lessons, the students shall be able to:

1. define inferential statistics, hypothesis, hypothesis testing and other terms relative to
doing experimental researches.
2. discuss the importance of inferential statistics and hypothesis testing in doing
experimental researches;
3. differentiate directional from non-directional tests, null hypothesis from alternative
hypothesis;
4. explain the uses of inferential statistics;
5. list and explain the steps in hypothesis testing; and
6. compute the Z-test, t-test, F-test and Chi-square test for given problems.
Inferential Statistics
Inferential statistics deals with the analysis and interpretation of data. It consists of the different statistical tools/tests used in the analysis of interval, ratio, nominal and ordinal data. These tests are used in making inferences or conclusions about larger groups or populations, or generalizations about them, on the basis of the information obtained from the study of one or more samples. The extent to which these statistics can be used with accuracy depends on the goodness of the samples. The sampling techniques/procedures are also of great importance with regard to the use of these different statistical tests.
Making Predictions Using Inferential Statistics

Inferential statistics are used to draw conclusions and make predictions based on the descriptions of data.
In this section, we explore inferential statistics by using an extended example of experimental studies.
Key concepts used in our discussion are probability, populations, and sampling.

Experiments
A typical experimental study involves collecting data on the behaviors, attitudes, or actions of two or more groups and attempting to answer a research question (often called a hypothesis). Based on the analysis of the data, a researcher might then attempt to develop a causal model that can be generalized to populations.

A question that might be addressed through experimental research might be "Does grammar-based
writing instruction produce better writers than process-based writing instruction?" Because it would be
impossible and impractical to observe, interview, survey, etc. all first-year writing students and instructors
in classes using one or the other of these instructional approaches, a researcher would study a sample –
or a subset – of a population. Sampling – or the creation of this subset of a population – is used by many
researchers who desire to make sense of some phenomenon.

To analyze differences in the ability of student writers who are taught in each type of classroom, the
researcher would compare the writing performance of the two groups of students. Two key concepts used
to conduct the comparison are:

 Dependent Variables
 Independent Variables

Dependent Variables

In an experimental study, a variable whose score depends on (or is determined or caused by) another
variable is called a dependent variable. For instance, an experiment might explore the extent to which the
writing quality of final drafts of student papers is affected by the kind of instruction they received. In this
case, the dependent variable would be writing quality of final drafts.

Independent Variables

In an experimental study, a variable that determines (or causes) the score of a dependent variable is
called an independent variable. For instance, an experiment might explore the extent to which the writing
quality of final drafts of student papers is affected by the kind of instruction they received. In this case, the
independent variable would be the kind of instruction students received.

Probability

Beginning researchers most often use the word probability to express a subjective judgment about the
likelihood, or degree of certainty, that a particular event will occur. People say such things as: "It will
probably rain tomorrow." "It is unlikely that we will win the ball game." It is possible to assign a number to
the event being predicted, a number between 0 and 1, which represents degree of confidence that the
event will occur. For example, a student might say that the likelihood an instructor will give an exam next
week is about 90 percent, or .9. Where 100 percent, or 1.00, represents certainty, .9 would mean the
student is almost certain the instructor will give an exam. If the student assigned the number .6, the
likelihood of an exam would be just slightly greater than the likelihood of no exam. A rating of 0 would
indicate complete certainty that no exam would be given (Shoeninger, 1971).

The probability of a particular outcome or set of outcomes is called a p-value. In our discussion, a p-value
will be symbolized by a p followed by parentheses enclosing a symbol of the outcome or set of outcomes.
For example, p(X) should be read, "the probability of a given X score" (Shoeninger). Thus p(exam) should
be read, "the probability an instructor will give an exam next week."

Population
A population is a group which is studied. In educational research, the population is usually a group of
people. Researchers seldom are able to study every member of a population. Usually, they instead study
a representative sample – or subset – of a population. Researchers then generalize their findings about
the sample to the population as a whole.

Sampling

Sampling is performed so that a population under study can be reduced to a manageable size. This can
be accomplished via random sampling, discussed below, or via matching.

Random sampling is a procedure used by researchers in which all samples of a particular size have an
equal chance to be chosen for an observation, experiment, etc (Runyon and Haber, 1976). There is no
predetermination as to which members are chosen for the sample. This type of sampling is done in order
to minimize scientific biases and offers the greatest likelihood that a sample will indeed be representative
of the larger population. The aim here is to make the sample as representative of the population as
possible. Note that the closer a sample distribution approximates the population distribution, the more
generalizable the results of the sample study are to the population. Notions of probability apply here.
Random sampling provides the greatest probability that the distribution of scores in a sample will closely
approximate the distribution of scores in the overall population.
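As a small illustration (not from the original text), a simple random sample can be drawn in Python so that every member has an equal chance of selection; the population of 500 student IDs below is hypothetical:

```python
import random

# A hypothetical population of 500 student IDs; draw a simple random
# sample of 50 so that all samples of that size are equally likely
population = list(range(1, 501))
sample = random.sample(population, 50)  # sampling without replacement

print(len(sample), len(set(sample)))  # 50 50: all sampled members distinct
```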

Matching

Matching is a method used by researchers to gain accurate and precise results of a study so that they
may be applicable to a larger population. After a population has been examined and a sample has been
chosen, a researcher must then consider variables, or extrinsic factors, that might affect the study.
Matching methods apply when researchers are aware of extrinsic variables before conducting a study.
Two methods used to match groups are:

 Precision Matching
 Frequency Distribution

Although, in theory, matching tends to produce valid conclusions, a rather obvious difficulty arises in
finding subjects which are compatible. Researchers may even believe that experimental and control
groups are identical when, in fact, a number of variables have been overlooked. For these reasons,
researchers tend to reject matching methods in favor of random sampling.

Methods

Statistics can be used to analyze individual variables, relationships among variables, and differences
between groups. In this section, we explore a range of statistical methods for conducting these analyses.


Hypothesis Testing

Meaning of Hypothesis

Simply defined, a hypothesis is a tentative explanation for certain events, phenomena or behaviours. In statistical language, a hypothesis is a statement of prediction of the relationship between or among variables. Plainly stated, a hypothesis is the most specific statement of a problem. It is a requirement that these variables are measurable and that the statement specifies how they are related. Furthermore, a hypothesis is testable, which means that the relationship between the variables can be put to the test by applying an appropriate statistical test to the data gathered about the variables.

Null Hypothesis

The null hypothesis, H0, represents a theory that has been put forward, either because it is believed to be
true or because it is to be used as a basis for argument, but has not been proved. For example, in a
clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than
the current drug. We would write

H0: there is no difference between the two drugs on average.

We give special consideration to the null hypothesis. This is due to the fact that the null hypothesis relates
to the statement being tested, whereas the alternative hypothesis relates to the statement to be accepted
if / when the null is rejected.

The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We
either "Reject H0 in favour of H1" or "Do not reject H0"; we never conclude "Reject H1", or even "Accept
H1".

If we conclude "Do not reject H0", this does not necessarily mean that the null hypothesis is true, it only
suggests that there is not sufficient evidence against H0 in favour of H1. Rejecting the null hypothesis
then, suggests that the alternative hypothesis may be true.

Alternative Hypothesis

The alternative hypothesis, H1, is a statement of what a statistical hypothesis test is set up to establish.
For example, in a clinical trial of a new drug, the alternative hypothesis might be that the new drug has a
different effect, on average, compared to that of the current drug. We would write

H1: the two drugs have different effects, on average.

The alternative hypothesis might also be that the new drug is better, on average, than the current drug. In
this case we would write

H1: the new drug is better than the current drug, on average.

Setting up and testing hypotheses is an essential part of statistical inference. In order to formulate such a test, usually some theory has been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved, for example, claiming that a new drug is better than the current drug for treatment of the same symptoms.

In each problem considered, the question of interest is simplified into two competing claims / hypotheses
between which we have a choice; the null hypothesis, denoted H0, against the alternative hypothesis,
denoted H1. These two competing claims / hypotheses are not however treated on an equal basis:
special consideration is given to the null hypothesis.

There are different ways of stating a hypothesis. Let us consider an experiment involving two groups, an
experimental group, and a control group. The experimenter likes to test whether the treatment (values
clarification lessons) will improve the self-concept of the experimental group. The same treatment is not
given to the control group. It is presumed that any difference between the two groups after the treatment
can be attributed to the experimental treatment with a certain degree of confidence.

The hypothesis for this experiment can be stated in various ways:

1. No existence or existence of a difference between groups

Ho: There will be no significant difference in self-concept between the group that will be exposed
to values clarification lessons and the group which will not be exposed to the same.

H1: The self-concept of the group that will be exposed to values clarification lessons will differ
from that of the control group.

2. No existence or existence of an effect of the treatment

Ho: There will be no significant effect of the values clarification lessons on the self-concept of the
students.

H1: Values clarification lessons will have a significant effect on the self-concept of the students.

3. No existence or existence of relationship between the variables

Ho: The self-concept of the students will not relate to the values clarification lessons conducted
on them.

H1: The self-concept of the students will be related to the values clarification lessons they will be exposed to.

Directional and Non-Directional Tests of the Hypothesis

The null hypothesis is associated with a hypothesis of no difference, no effect, or no relationship.

In operational form, the Ho can be stated as Ho: µ1 – µ2 = 0. This can be put to test against the alternative hypothesis, H1: µ1 – µ2 ≠ 0. This means that when Ho is rejected, H1 is accepted, indicating the existence of a difference between the two means. In this case, when the direction or nature of the difference is not stated, the test is considered non-directional. The non-directional test makes use of the two tails or two sides of the statistical model or distribution.

If the direction of the difference is stated, that is, the self-concept of one group is more positive than that of the other group, the test becomes directional. Accordingly, the hypothesis is in its alternative form, to wit, H1: µ1 > µ2 or H1: µ1 < µ2. The former uses only the positive end of the distribution while the latter uses the negative end in the rejection of the Ho. When comparing your statistical results with the distributions in specified tables, be sure to note whether you used a one-tailed test or a two-tailed test.
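To see the difference numerically, here is a minimal sketch (ours, not the module's; it assumes the test statistic follows the standard normal distribution, and the function names are hypothetical):

```python
import math

def normal_cdf(z):
    # Cumulative distribution function of the standard normal curve.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value(z, tails=2):
    """One-tailed tests use a single end of the distribution;
    two-tailed tests use both ends, doubling the tail area."""
    one_tail = 1.0 - normal_cdf(abs(z))
    return one_tail if tails == 1 else 2.0 * one_tail

print(round(p_value(1.80, tails=1), 4))  # 0.0359
print(round(p_value(1.80, tails=2), 4))  # 0.0719
```

With z = 1.80 the result would be significant at the .05 level under a one-tailed (directional) test but not under a two-tailed (non-directional) test, which is why the direction must be chosen before the data are examined.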

One-sided Test

A one-sided test is a statistical hypothesis test in which the values for which we can reject the null
hypothesis, H0 are located entirely in one tail of the probability distribution.

In other words, the critical region for a one-sided test is the set of values less than the critical value of the
test, or the set of values greater than the critical value of the test.

A one-sided test is also referred to as a one-tailed test of significance.

The choice between a one-sided and a two-sided test is determined by the purpose of the investigation or
prior reasons for using a one-sided test.

Example

Suppose we wanted to test a manufacturer's claim that there are, on average, 50 matches in a box. We could set up the following hypotheses:

H0: µ = 50,

against

H1: µ < 50 or H1: µ > 50

Either of these two alternative hypotheses would lead to a one-sided test. Presumably, we would want to
test the null hypothesis against the first alternative hypothesis since it would be useful to know if there is
likely to be less than 50 matches, on average, in a box (no one would complain if they get the correct
number of matches in a box or more).

Yet another alternative hypothesis could be tested against the same null, leading this time to a two-sided
test:

H0: µ = 50,

against

H1: µ ≠ 50

Here, nothing specific can be said about the average number of matches in a box; only that, if we could
reject the null hypothesis in our test, we would know that the average number of matches in a box is likely
to be less than or greater than 50.

Two-Sided Test
A two-sided test is a statistical hypothesis test in which the values for which we can reject the null
hypothesis, H0 are located in both tails of the probability distribution.

In other words, the critical region for a two-sided test is the set of values less than a first critical value of
the test and the set of values greater than a second critical value of the test.

A two-sided test is also referred to as a two-tailed test of significance.

The choice between a one-sided test and a two-sided test is determined by the purpose of the
investigation or prior reasons for using a one-sided test.

Example

Consider again the manufacturer's claim that there are, on average, 50 matches in a box. Testing the same null hypothesis against an alternative that lets the mean differ in either direction leads this time to a two-sided test:

H0: µ = 50,

against

H1: µ ≠ 50

Here, nothing specific can be said about the average number of matches in a box; only that, if we could
reject the null hypothesis in our test, we would know that the average number of matches in a box is likely
to be less than or greater than 50.

Type I Error

In a hypothesis test, a type I error occurs when the null hypothesis is rejected when it is in fact true; that
is, H0 is wrongly rejected.

For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better,
on average, than the current drug; i.e.

H0: there is no difference between the two drugs on average.
A type I error would occur if we concluded that the two drugs produced different effects when in fact there
was no difference between them.

The following table gives a summary of the possible results of any hypothesis test:

                          Decision
                 Reject H0           Don't reject H0
 Truth    H0     Type I error        Right decision
          H1     Right decision      Type II error

A type I error is often considered to be more serious, and therefore more important to avoid, than a type II
error. The hypothesis test procedure is therefore adjusted so that there is a guaranteed 'low' probability of
rejecting the null hypothesis wrongly; this probability is never 0. This probability of a type I error can be
precisely computed as

P(type I error) = significance level = α

The exact probability of a type II error is generally unknown.

If we do not reject the null hypothesis, it may still be false (a type II error), as the sample may not be big enough to identify the falseness of the null hypothesis (especially if the truth is very close to the hypothesis).

For any given set of data, type I and type II errors are inversely related; the smaller the risk of one, the
higher the risk of the other.

A type I error can also be referred to as an error of the first kind.

Type II Error

In a hypothesis test, a type II error occurs when the null hypothesis H0, is not rejected when it is in fact
false. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no
better, on average, than the current drug; i.e.

H0: there is no difference between the two drugs on average.

A type II error would occur if it was concluded that the two drugs produced the same effect, i.e. there is no
difference between the two drugs on average, when in fact they produced different ones.

A type II error is frequently due to sample sizes being too small.

The probability of a type II error is generally unknown, but is symbolised by β and written

P(type II error) = β

A type II error can also be referred to as an error of the second kind.

Test Statistic
A test statistic is a quantity calculated from our sample of data. Its value is used to decide whether or not
the null hypothesis should be rejected in our hypothesis test.

The choice of a test statistic will depend on the assumed probability model and the hypotheses under
question.

Critical Value(s)

The critical value(s) for a hypothesis test is a threshold to which the value of the test statistic in a sample
is compared to determine whether or not the null hypothesis is rejected.

The critical value for any hypothesis test depends on the significance level at which the test is carried out,
and whether the test is one-sided or two-sided.

Critical Region

The critical region CR, or rejection region RR, is a set of values of the test statistic for which the null
hypothesis is rejected in a hypothesis test. That is, the sample space for the test statistic is partitioned
into two regions; one region (the critical region) will lead us to reject the null hypothesis H0, the other will
not. So, if the observed value of the test statistic is a member of the critical region, we conclude "Reject
H0"; if it is not a member of the critical region then we conclude "Do not reject H0".

Significance Level

The significance level of a statistical hypothesis test is a fixed probability of wrongly rejecting the null
hypothesis H0, if it is in fact true.

It is the probability of a type I error and is set by the investigator in relation to the consequences of such
an error. That is, we want to make the significance level as small as possible in order to protect the null
hypothesis and to prevent, as far as possible, the investigator from inadvertently making false claims.

The significance level is usually denoted by α:

Significance Level = P(type I error) = α

Usually, the significance level is chosen to be 0.05 (or equivalently, 5%).

P-Value

The probability value (p-value) of a statistical hypothesis test is the probability of getting a value of the test
statistic as extreme as or more extreme than that observed by chance alone, if the null hypothesis H0, is
true.

It is the probability of wrongly rejecting the null hypothesis if it is in fact true.



It is equal to the significance level of the test for which we would only just reject the null hypothesis. The
p-value is compared with the actual significance level of our test and, if it is smaller, the result is
significant. That is, if the null hypothesis were to be rejected at the 5% significance level, this would be
reported as "p < 0.05".

Small p-values suggest that the null hypothesis is unlikely to be true. The smaller it is, the more
convincing is the rejection of the null hypothesis. It indicates the strength of evidence for say, rejecting the
null hypothesis H0, rather than simply concluding "Reject H0' or "Do not reject H0".

Power

The power of a statistical hypothesis test measures the test's ability to reject the null hypothesis when it is
actually false - that is, to make a correct decision.

In other words, the power of a hypothesis test is the probability of not committing a type II error. It is
calculated by subtracting the probability of a type II error from 1, usually expressed as:

Power = 1 − P(type II error) = 1 − β

The maximum power a test can have is 1, the minimum is 0. Ideally we want a test to have high power,
close to 1.
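As an illustration (our sketch, not part of the module; the IQ-style numbers are hypothetical), the power of an upper-tailed Z-test against a specific alternative mean can be computed with the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power = 1 - P(type II error) of an upper-tailed Z-test of
    H0: mu = mu0 against the specific alternative mu = mu1 (> mu0)."""
    se = sigma / n ** 0.5
    # Smallest sample mean that leads to rejecting H0 at level alpha.
    cutoff = NormalDist(mu0, se).inv_cdf(1 - alpha)
    # Probability the sample mean lands beyond that cutoff when
    # the true mean is really mu1.
    return 1 - NormalDist(mu1, se).cdf(cutoff)

# For the same true difference, a larger sample gives higher power:
print(power_one_sided_z(100, 105, 15, 9) <
      power_one_sided_z(100, 105, 15, 36))  # True
```

This mirrors the point made later about sample size: a test that is too small may simply lack the power to detect a real difference.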

Hypothesis Testing


Hypothesis testing is generally used when you are comparing two or more groups. 

For example, you might want to determine the effectiveness of a teaching method, say Method B. To evaluate it, there is a need to compare the teaching results using the method introduced with those using the method currently in use (Method A). The usual method of teaching is used in the control group while the method being introduced is used in the experimental group. The teaching results in these two groups are then compared.

When you are evaluating a hypothesis, you need to account for both the variability in your
sample and how large your sample is.  Based on this information, you'd like to make an assessment of
whether any differences you see are meaningful, or if they are likely just due to chance.  This is formally
done through a process called hypothesis testing.

Five Steps in Hypothesis Testing:

1. Specify the Null Hypothesis

2. Specify the Alternative Hypothesis

3. Set the Significance Level (α)

4. Calculate the Test Statistic and Corresponding P-Value

5. Draw a Conclusion

Step 1: Specify the Null Hypothesis

The null hypothesis (H0) is a statement of no effect, relationship, or difference between two or more
groups or factors.  In research studies, a researcher is usually interested in disproving the null hypothesis.

Examples:

• There is no difference in the teaching results using Method A and Method B.

• Method B has no significant effect on the performance of the students.

• There is no association between method of teaching and performance of students.

Step 2: Specify the Alternative Hypothesis

The alternative hypothesis (H1) is the statement that there is an effect or difference.  This is usually
the hypothesis the researcher is interested in proving.  The alternative hypothesis can be one-sided (only
provides one direction, e.g., lower) or two-sided.  We often use two-sided tests even when our true
hypothesis is one-sided because it requires more evidence against the null hypothesis to accept the
alternative hypothesis.

Examples:

• The performance of students using Method B differs from the performance of students using Method A (two-sided).

• The performance of students using Method B is lower than the performance of students who were taught using Method A (one-sided).

• There is an association between performance and teaching method (two-sided).

Step 3: Set the Significance Level 

The significance level (denoted by the Greek letter alpha, α) is generally set at 0.05. This means that there is a 5% chance that you will accept your alternative hypothesis when your null hypothesis is actually true. The smaller the significance level, the greater the burden of proof needed to reject the null hypothesis, or in other words, to support the alternative hypothesis.

Step 4: Calculate the Test Statistic and Corresponding P-Value

In another section we present some basic test statistics to evaluate a hypothesis. Hypothesis testing
generally uses a test statistic that compares groups or examines associations between variables. 
When describing a single sample without establishing relationships between variables, a confidence
interval is commonly used.
The p-value describes the probability of obtaining a sample statistic as or more extreme by chance
alone if your null hypothesis is true.  This p-value is determined based on the result of your test statistic. 
Your conclusions about the hypothesis are based on your p-value and your significance level. 

Example:

• P-value = 0.01: This will happen 1 in 100 times by pure chance if your null hypothesis is true. Not likely to happen strictly by chance.

Example:

• P-value = 0.75: This will happen 75 in 100 times by pure chance if your null hypothesis is true. Very likely to occur strictly by chance.

Cautions About P-Values

Your sample size directly impacts your p-value.  Large sample sizes produce small p-values even
when differences between groups are not meaningful.  You should always verify the practical
relevance of your results.  On the other hand, a sample size that is too small can result in a failure to
identify a difference when one truly exists. 

Plan your sample size ahead of time so that you have enough information from your sample to show a
meaningful relationship or difference if one exists.

Example:

• Average ages were significantly different between the two groups (16.2 years vs. 16.7 years; p = 0.01; n = 1,000). Is this an important difference? Probably not, but the large sample size has resulted in a small p-value.

Example:

• Average ages were not significantly different between the two groups (10.4 years vs. 16.7 years; p = 0.40, n = 10). Is this an important difference? It could be, but because the sample size is small, we can't determine for sure if this is a true difference or just happened due to the natural variability in age within these two groups.
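The cautions above can be reproduced with a toy calculation (ours; it assumes a known standard deviation of 6 years so a simple z comparison applies, and all numbers are hypothetical):

```python
from statistics import NormalDist

def two_tailed_p(diff, sd, n):
    """Two-tailed p-value for an observed mean difference `diff`,
    treating the standard deviation as known (z approximation)."""
    z = diff / (sd / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same half-year difference in average age:
print(two_tailed_p(0.5, 6.0, 1000) < 0.05)  # True: large n, small p
print(two_tailed_p(0.5, 6.0, 10) < 0.05)    # False: small n, large p
```

An identical mean difference is "significant" with n = 1,000 but not with n = 10, which is exactly why practical relevance must be judged separately from the p-value.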

Step 5: Draw a Conclusion

1. P-value <= significance level  => Reject your null hypothesis in favor of your
alternative hypothesis.  Your result is statistically significant.

2. P-value > significance level  => Fail to reject your null hypothesis.  Your result is not
statistically significant.
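The decision rule amounts to a single comparison; a minimal helper (ours, with hypothetical wording for the messages) makes it explicit:

```python
def draw_conclusion(p_value, alpha=0.05):
    """Step 5: compare the p-value with the significance level."""
    if p_value <= alpha:
        return "Reject H0: the result is statistically significant"
    return "Fail to reject H0: the result is not statistically significant"

print(draw_conclusion(0.01))  # Reject H0: the result is statistically significant
print(draw_conclusion(0.75))  # Fail to reject H0: the result is not statistically significant
```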

Hypothesis testing is not set up so that you can absolutely prove a null hypothesis.  Therefore,
when you do not find evidence against the null hypothesis, you fail to reject the null hypothesis. When you
do find strong enough evidence against the null hypothesis, you reject the null hypothesis.  Your
conclusions also translate into a statement about your alternative hypothesis.  When presenting
the results of a hypothesis test, include the descriptive statistics in your conclusions as well.  Report exact
p-values rather than a certain range. For example, "The intubation rate differed significantly by patient age, with younger patients having a lower rate of successful intubation (p=0.02)." Here are two more examples with the conclusion stated in several different ways.

Example:

• H0: There is no difference in survival between the intervention and control group.

• H1: There is a difference in survival between the intervention and control group.

• α = 0.05; 20% increase in survival for the intervention group; p-value = 0.002

Conclusion:

• Reject the null hypothesis in favor of the alternative hypothesis.

• The difference in survival between the intervention and control group was statistically significant.

• There was a 20% increase in survival for the intervention group compared to control (p=0.002).

Example:

• H0: There is no difference in survival between the intervention and control group.

• H1: There is a difference in survival between the intervention and control group.

• α = 0.05; 5% increase in survival between the intervention and control group; p-value = 0.20

Conclusion:

• Fail to reject the null hypothesis.

• The difference in survival between the intervention and control group was not statistically significant.

• There was no significant increase in survival for the intervention group compared to control (p=0.20).

Z-test

The Z-test is a statistical test used in inference which determines if the difference between a sample
mean and the population mean is large enough to be statistically significant, that is, if it is unlikely to have
occurred by chance.

The Z-test is used primarily with standardized testing to determine if the test scores of a particular sample
of test takers are within or outside of the standard performance of test takers.
Notation and mathematics

In order for the Z-test to be reliable, certain conditions must be met. The most important is that since the
Z-test uses the population standard deviation, it must be known. The sample must be a simple random
sample of the population. If the sample came from a different sampling method, a different formula must
be used. It must also be known that the population varies normally (i.e., the sampling distribution of the probabilities of possible values fits a standard normal curve). If it is not known that the population varies normally, it suffices to have a sufficiently large sample, generally agreed to be ≥ 30 or 40.

In actuality, knowing the true σ of a population is unrealistic except for cases such as standardized testing
in which the entire population is known. In cases where it is impossible to measure every member of a
population it is more realistic to use a t-test, which uses the standard error obtained from the sample
along with the t-distribution.

The test requires the following to be known:

• σ (the standard deviation of the population)

First calculate the standard error (SE) of the mean:

SE = σ / √n

The formula for calculating the z score for the Z-test is as follows:

Z = (x̄ – µ) / SE

where:

• x̄ is the sample mean to be standardized

• µ is the mean of the population

Finally, the z score is compared to a Z table, a table which contains the percent of area under the normal
curve between the mean and the z score. Using this table will indicate whether the calculated z score is
within the realm of chance or if the z score is so different from the mean that the sample mean is unlikely
to have happened by chance.
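The computation can be sketched in code (ours; `statistics.NormalDist` from the Python standard library stands in for the Z table by supplying areas under the normal curve):

```python
from statistics import NormalDist

def z_test(sample_mean, mu, sigma, n):
    """Z score of a sample mean against a known population mean mu
    and population standard deviation sigma."""
    se = sigma / n ** 0.5                  # standard error of the mean
    z = (sample_mean - mu) / se
    # Area under the normal curve between the mean and z,
    # which is what the Z table described above reports:
    table_area = NormalDist().cdf(abs(z)) - 0.5
    return z, table_area

z, area = z_test(96, 100, 12, 55)  # figures from the school example below
print(round(z, 2))  # -2.47
```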

Example

Let's take a look at using the Z-test with standardized testing.

In a U.S. school district, a standardized reading test is used to test the performance of fifth grade students
in an elementary school against the national norm for fifth grade students. The number of fifth grade
students in this elementary school taking the test is 55 students.

The national norm test score, the population mean, for this particular standardized test is 100 points. The
population standard deviation for the year under study is 12.

The scores of the fifth grade students of the elementary school in this school district are a sample of the
total population of fifth grade students in the U.S. which have also taken the test.
The school district is told that the mean for their particular school is 96, which is lower than the national
mean. Parents of the students become upset when they learn their school is below the national norm for
the reading test. The school district administration points out that the test scores are actually pretty close
to the population mean though they are lower.

The real question is this: is the school's mean test score sufficiently lower than the national norm as to indicate a problem, or is it within acceptable parameters? We will use the Z-test to see.

First of all calculate the standard error of the mean:

SE = σ/ √n = 12 / √55 = 12 / 7.42 = 1.62

Next calculate the z score:

Z = (M – µ) / SE = (96 – 100) / 1.62 = −2.47

Remember that a z score is the distance from the population mean in units of the population standard
deviation. This means that in our example, a mean score of 96 is −2.47 standard deviation units from the
population mean. The negative means that the sample mean is less than the population mean. Since the
normal curve is symmetric the Z table is always expressed in positive z scores so if the calculated z score
is negative, look it up in the table as if it were non-negative.

Next we look the z score up in a Z table and we find that a z score of −2.47 is 49.32%. This means that
the area under the normal curve between the population mean and our sample mean is 49.32%.

What this tells us is that 49.32% plus 50% or 99.32% of the time, a randomly selected group of 55
students have a higher average score than these 55 students had. This is because our z score is
negative so we are below the population mean. So not only do we include the distance between our
sample mean and the population mean, we also include the area under the normal curve which is greater
than the population mean.

If our sample mean had been 104 rather than 96, then our z score would have been 2.47 which would
have indicated that our sample mean was above the population mean. That would have indicated that the
fifth grade students in our sample were in the top 0.7% of the nation.

But let's get back to our original question. Is there a problem with the reading program at our elementary
school? Our question can be reformulated to say, is the mean from our elementary school, a sample from
the general population of fifth grade students, far enough outside of the norm that we need to take a
corrective action to improve the reading program?

Let's put this in the form of a hypothesis which we are going to test with our statistical analysis. Our
alternative hypothesis is that our sample mean is significantly different from the population mean and that
corrective action is necessary. Our null hypothesis is that the difference is purely attributable to chance
and no action is necessary.

To answer this question, we need to determine the significance level we want to use. Typically a 0.05 significance level is used, meaning that if the null hypothesis is true we stand only a 5% chance of rejecting it anyway.

In the case of our sample mean, the z score of −2.47 gives a table value of 49.32%, meaning that 49.32% plus 50%, or 99.32%, of the time a randomly selected group of 55 students would have a higher average score than the 55 students in our sample had. To test our null hypothesis, we have to conduct a two-sided test. The tail area beyond our sample mean is 100% − 99.32% = 0.68%, and doubling it for the two tails gives 1.36%. Since 1.36% is less than 5%, our significance level, we have to reject the null hypothesis.

Therefore we can conclude with a 95% confidence level that the test performance of the students in our sample was not within the normal variation.


Example problem using the Z-test in the process to test statistical hypotheses for a research
problem

Research Problem: We randomly select a group of 9 subjects from a population with a mean IQ of 100 and a standard deviation of 15 (µ = 100, σ = 15).
We give the subjects intensive "Get Smart" training and then administer an IQ test. The sample mean IQ
is 113 and the sample standard deviation is 10. Did the training result in a significant increase in IQ
score?

The research question for this experiment is: Does training subjects with the Get Smart training program increase their IQ significantly over the average IQ for the general population? We will use the six-step process to test statistical hypotheses for this research problem.

1. State the null hypothesis and alternative hypothesis:

H0: µ = 100   H1: µ > 100

2. Set the alpha level:

α = .05

3. Calculate the value of the proper statistic:

Since this problem involves comparing a single group's mean with the population mean and the standard deviation for the population is known, the proper statistical test to use is the Z-test.

Z = (113 – 100) / (15 / √9) = 13 / 5 = 2.6

4. State the rule for rejecting the null hypothesis:

We need to find the value of Z that will only be exceeded 5% of the time since we have set our
alpha level at .05. Since the Z score is normally distributed (or has the Z distribution), we can find
this 5% level by looking at the Z / T table. The associated Z-score would be 1.64 (or 1.65).
Our rejection rule then would be: Reject H0 if Z ≥ 1.64.

5. Decision: Reject H0, p < .05, one-tailed.

Our decision rule said reject H0 if the Z value is equal to or greater than 1.64. Our Z value was 2.6, and 2.6 is greater than 1.64, so we reject H0. We also add to the decision the alpha level (p < .05) and the tailedness of the test (one-tailed).

6. Statement of results: The average IQ of the group taking the Get Smart training program is
significantly higher than that of the general population.

If we reject the null hypothesis, we accept the alternative hypothesis. The statement of results
then states the alternative hypothesis which is the research question stated in the affirmative
manner.
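The six steps for this problem can be mirrored in a short script (our sketch; the critical value of about 1.64 is recovered from the normal distribution at α = .05, one-tailed):

```python
from statistics import NormalDist

mu, sigma, n, sample_mean, alpha = 100, 15, 9, 113, 0.05

se = sigma / n ** 0.5                     # 15 / 3 = 5.0
z = (sample_mean - mu) / se               # 13 / 5 = 2.6
z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.64

print(z)            # 2.6
print(z >= z_crit)  # True -> reject H0, p < .05, one-tailed
```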

We mentioned that we use the Z-test to compare the mean of a sample with the population mean when
the population standard deviation is known. We will now turn to the statistic to use when the standard
deviation of the population is not known, the one-sample t-test.

The T-test

The t-test is one type of inferential statistics. It is used to determine whether there is a significant
difference between the means of two groups. With all inferential statistics, we assume the dependent
variable fits a normal distribution. When we assume a normal distribution exists, we can identify the
probability of a particular outcome. We specify the level of probability (alpha level, level of significance, p)
we are willing to accept before we collect data (p < .05 is a common value that is used). After we collect
data we calculate a test statistic with a formula. We compare our test statistic with a critical value found
on a table to see if our results fall within the acceptable level of probability.

When the difference between two population averages is being investigated, a t-test is used. In
other words, a t-test is used when we wish to compare two means (the scores must be measured on an
interval or ratio scale). We would use a t-test if we wished to compare the reading achievement of boys
and girls. With a t-test, we have one independent variable and one dependent variable. The independent
variable (gender in this case) can only have two levels (male and female). The dependent variable would
be reading achievement. If the independent variable had more than two levels, then we would use a one-way analysis of variance (ANOVA).

The test statistic that a t-test produces is a t-value. Conceptually, t-values are an extension of z-
scores. In a way, the t-value represents how many standard units the means of the two groups are apart. 

With a t-test, the researcher wants to state with some degree of confidence that the obtained
difference between the means of the sample groups is too great to be a chance event and that some
difference also exists in the population from which the sample was drawn. In other words, the difference
that we might find between the boys' and girls' reading achievement in our sample might have occurred
by chance, or it might exist in the population. If our t-test produces a t-value that results in a probability
of .01, we say that the likelihood of getting the difference we found by chance would be 1 in a 100 times.
We could say that it is unlikely that our results occurred by chance and the difference we found in the
sample probably exists in the populations from which it was drawn.

Five factors contribute to whether the difference between two groups' means can be considered
significant:
1. How large is the difference between the means of the two groups? Other factors being equal, the
greater the difference between the two means, the greater the likelihood that a statistically
significant mean difference exists. If the means of the two groups are far apart, we can be fairly
confident that there is a real difference between them.

2. How much overlap is there between the groups? This is a function of the variation within the
groups. Other factors being equal, the smaller the variances of the two groups under
consideration, the greater the likelihood that a statistically significant mean difference exists. We
can be more confident that two groups differ when the scores within each group are close
together.

3. How many subjects are in the two samples? The size of the sample is extremely important in
determining the significance of the difference between means. With increased sample size,
means tend to become more stable representations of group performance. If the difference we
find remains constant as we collect more and more data, we become more confident that we can
trust the difference we are finding.

4. What alpha level is being used to test the mean difference (how confident do you want to be
about your statement that there is a mean difference). A larger alpha level requires less
difference between the means. It is much harder to find differences between groups when you
are only willing to have your results occur by chance 1 out of a 100 times (p < .01) as compared
to 5 out of 100 times (p < .05).

5. Is a directional (one-tailed) or non-directional (two-tailed) hypothesis being tested? Other factors being equal, smaller mean differences result in statistical significance with a directional hypothesis. For our purposes we will use non-directional (two-tailed) hypotheses.

Assumptions Underlying the t-Test

1. The samples have been randomly drawn from their respective populations.

2. The scores in the population are normally distributed.

3. The scores in the populations have the same variance (σ1² = σ2²). Note: We use a different calculation for the standard error if they are not.

Three Types of t-tests

• Pair-difference t-test (a.k.a. t-test for dependent groups, correlated t-test): df = n (number of pairs) – 1

This is concerned with the difference between the average scores of a single sample of individuals who are assessed at two different times (such as before treatment and after treatment). It can also compare average scores of samples of individuals who are paired in some way (such as siblings, mothers, daughters, persons who are matched in terms of a particular characteristic).

• t-test for Independent Samples (with two options)
This is concerned with the difference between the averages of two populations. Basically, the
procedure compares the averages of two samples that were selected independently of each
other, and asks whether those sample averages differ enough to believe that the populations
from which they were selected also have different averages. An example would be comparing
math achievement scores of an experimental group with a control group.
1. Equal variance (pooled-variance t-test): df = n (total of both groups) - 2. Note: used
when both samples have the same number of subjects or when s1 = s2 (Levene
or F-max tests have p > .05).

2. Unequal variance (separate-variance t-test): df depends on a formula, but a rough
estimate is one less than the smaller group. Note: used when the samples have
different numbers of subjects and they have different variances, s1 ≠ s2 (Levene
or F-max tests have p < .05).
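The two independent-samples options differ only in how the standard error is computed. A minimal sketch in Python (the sample data here are illustrative, not taken from the module):

```python
import math

def independent_t(x, y):
    """Return (pooled-variance t, separate-variance t) for two samples."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)  # sample variance of x
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)  # sample variance of y
    # Pooled-variance t: assumes equal population variances (s1 = s2)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    t_pooled = (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    # Separate-variance (Welch) t: no equal-variance assumption
    t_welch = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return t_pooled, t_welch
```

When the two samples have equal sizes and equal variances the two statistics coincide; they diverge as the variances or sample sizes differ, which is why the Levene or F-max check matters.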
Page 94

How do I decide which type of t-test to use?


The t-Test for Correlated Samples
The t-test for correlated samples is used when comparing the means before and after a treatment. It
is also used to compare the means of a pretest and a posttest. The formula is:

t = D̄ / √{ [∑D² – (∑D)²/n] / [n(n – 1)] }

Where: D̄ = the mean difference between the pretest and the posttest

∑D² = the sum of the squared differences between the pretest and posttest

∑D = the sum of the differences between the pretest and the posttest

n = the sample size

Example: An experimental study was conducted on the effect of programmed materials in English on the
performance of 20 selected college students. Before the program was implemented the pretest was
administered and after 5 months the same instrument was used to get the posttest result. The following
is the result of the experiment.

Pretest Posttest D D²

20 25 -5 25
30 35 -5 25
10 25 -15 225
15 25 -10 100
20 20 0 0
10 20 -10 100
18 22 -4 16
14 20 -6 36
15 20 -5 25
20 15 5 25
18 30 -12 144
15 10 5 25
15 16 -1 1
20 25 -5 25
18 10 8 64
40 45 -5 25
10 15 -5 25
10 10 0 0
12 18 -6 36
20 25 -5 25

∑D = -81 ∑D² = 947

D̄ = -81 / 20 = -4.05
Solution:

1. Problem: Is there a significant difference between the pretest and the posttest means on the use of
programmed materials in English?

2. Hypothesis:

Ho : There is no significant difference between the pretest and posttest means; that is, the use of the
programmed materials did not affect the students' performance in English.

H1 : There is a significant difference between the pretest and posttest means.

3. Level of Significance

df = n – 1

= 20 – 1 = 19

t.05 = 2.093 (two-tailed)

4. Statistics : t-test for correlated samples.

5. Computation:

t = -4.05
√ [947 – (-81)²/20]
20 (20 – 1)
= -4.05
√ (947 – 328.05)
20(19)
= -4.05
√ (618.95 / 380)
= -4.05
√1.6288

= -4.05 / 1.2762

= -3.17

6. Decision: Since the absolute value of the computed t exceeds the critical value of t, reject the null
hypothesis.

7. Conclusion: There is a significant difference between the pretest and the posttest means. The
posttest mean is higher than the pretest mean, implying that the use of programmed materials in
English is effective.
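As a check on the arithmetic, the correlated-samples formula can be computed directly in Python from the twenty pretest/posttest pairs of the example:

```python
import math

# Pretest and posttest scores for the 20 students in the example
pre  = [20, 30, 10, 15, 20, 10, 18, 14, 15, 20, 18, 15, 15, 20, 18, 40, 10, 10, 12, 20]
post = [25, 35, 25, 25, 20, 20, 22, 20, 20, 15, 30, 10, 16, 25, 10, 45, 15, 10, 18, 25]

d = [a - b for a, b in zip(pre, post)]         # D = pretest - posttest
n = len(d)
sum_d, sum_d2 = sum(d), sum(v * v for v in d)  # -81 and 947
d_bar = sum_d / n                              # mean difference, -4.05
t = d_bar / math.sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))
print(round(t, 2))  # -3.17
```

The computed value matches the worked solution above, t = -3.17.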
The t-Test for Uncorrelated Samples

We often want to know whether the means of two populations on some outcome differ. For
example, there are many questions in which we want to compare two categories of some categorical
variable (e.g., compare males and females) or two populations receiving different treatments in context of
an experiment. The two-sample t-test is a hypothesis test for answering questions about the mean where
the data are collected from two random samples of independent observations, each from an underlying
normal distribution.

The steps of conducting a two-sample t-test are quite similar to those of the one-sample test. In
this example we will examine a program's effect by comparing the birthweights of babies born to women
who participated in an intervention with the birthweights of a group that did not.

A comparison of this sort is very common in medicine and social science. To evaluate the effects
of some intervention, program, or treatment, a group of subjects is divided into two groups. The group
receiving the treatment to be evaluated is referred to as the treatment group, while those who do not are
referred to as the control or comparison group. In this example, mothers who take part in the prenatal care
program intended to reduce the likelihood of low birthweight form the treatment group, with a control group
comprised of women who do not take part in the program.

Returning to the two-sample t-test, the steps to conduct the test are similar to those of the one-
sample test.

Establish hypotheses

The first step to examining this question is to establish the specific hypotheses we wish to
examine. Specifically, we want to establish a null hypothesis and an alternative hypothesis to be
evaluated with data.

In this case:

 Null hypothesis - the difference between the two groups is 0. Another way of stating the null
hypothesis is that the mean birthweight of babies born to women in the program and the mean
birthweight of babies born to women outside the program are equal.
 Alternative hypothesis - the difference between the observed mean birthweight of program
babies and the mean birthweight of babies born to women outside the program is not zero.

Calculate test statistic

Calculation of the test statistic requires three components:

1. The averages of both samples (observed averages), represented statistically as x̄1 and x̄2

2. The standard deviations (SD) of both samples, represented as s1 and s2

3. The numbers of observations in both samples, represented as n1 and n2

From hospital records, we obtain the following values for these components:

  Treatment Control

Average Weight 3100 g 2750 g

SD 420 425

N 75 75

With these pieces of information, we calculate the following statistic, t:

t = (x̄1 – x̄2) / √(s1²/n1 + s2²/n2) = (3100 – 2750) / √(420²/75 + 425²/75) = 350 / 69.0 ≈ 5.07
Use this value to determine p-value

Having calculated the t-statistic, compare the t-value with a standard table of t-values to determine
whether the t-statistic reaches the threshold of statistical significance.

With a t-score this high, the p-value is less than 0.001, which forms our basis to reject the null hypothesis
and conclude that the prenatal care program made a difference.
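A sketch in Python of the same calculation from the summary statistics (the separate-variance form of the standard error is used here):

```python
import math

# Summary statistics from the hospital records table
m1, s1, n1 = 3100, 420, 75  # treatment group: mean, SD, size
m2, s2, n2 = 2750, 425, 75  # control group

# Two-sample t statistic: difference in means over its standard error
se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
t = (m1 - m2) / se
print(round(t, 2))  # 5.07
```

With equal group sizes the pooled-variance form gives essentially the same value, so the conclusion is unchanged either way.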
ANOVA – Analysis of Variance

About the ANOVA Test

Our lesson on the t-test demonstrated how to compare differences of means between two
groups, such as comparing outcomes between control and treatment groups in an experimental study.
The t-test is a useful tool for comparing the means of two groups; however, the t-test is not good in
situations calling for the comparison of three or more groups. It can only compare one group's mean to a
known distribution or compare the means of two groups. With three or more groups, the t-test is not an
effective statistical tool. On a practical level, using the t-test to compare many means is a cumbersome
process in terms of the calculations involved. On a statistical level, using the t-test to compare multiple
means can lead to biased results.

Yet there are many kinds of questions in which we might want to compare the means of several
different groups at once. For example, in evaluating the effects of a particular social program, we might
want to compare the mean outcomes of several different program sites. Or we might be interested in
examining the relative performance of different members of a corporate sales team in terms of their
monthly or annual sales records. Alternatively, in an organization with several different sales managers,
we might ask whether some sales managers get more out of their sales staff than others.

With questions such as these, the preferred statistical tool is ANOVA (Analysis of Variance).
There are some similarities between the t-test and ANOVA. Like the t-test, ANOVA is used to test
hypotheses about differences in the average values of some outcome between two groups; however,
while the t-test can be used to compare two means or one mean against a known distribution, ANOVA
can be used to examine differences among the means of several different groups at once. More
generally, ANOVA is a statistical technique for assessing how nominal independent variables influence a
continuous dependent variable.

This module describes and explains the one-way ANOVA, a statistical tool that is used to
compare multiple groups of observations, all of which are independent but may have a different mean for
each group. A test of importance for many kinds of questions is whether or not all the averages of a set of
groups are equal. There is another form of ANOVA that examines how two explanatory variables affect an
outcome variable; however, this application is not discussed in this module.

Assumptions

Analysis of Variance methods have in common a set of two assumptions:

1. The standard deviations (SD) of the populations for all groups are equal - this is sometimes
referred to as the assumption of the homogeneity of variance. We can represent this
assumption for groups 1 through n as σ1 = σ2 = … = σn.

2. The samples are randomly selected from the population.


The One-Way ANOVA

One application of the one-way ANOVA that might be of interest has to do with students'
performance in an introductory course in statistics. At the University of Technology (UTech), there are
three sections of an introductory statistics course offered: one in the morning, another in the afternoon,
and a third in the evening. These courses are taught by different instructors; however, given the
importance of statistics in the university's sequence of courses and recent efforts to implement a standard
curriculum, all three courses cover exactly the same material.

Karl Rousseau has recently been hired as the new chair of the statistics department at UTech. In taking
on his duties as the department head, he's interested in whether there's any variation in how well students
do in the course, based on whether they enroll in the morning, afternoon, or evening course. A morning
man himself, Prof. Rousseau has some doubt that there's much learning going on in the evening course;
however, given his position as chair, he is very interested in making sure that all three sections are getting
the same high-quality education. Moreover, he's too much of an empiricist to allow this idea to go
untested. So he proposes that at semester's end students in all three sections take the National
Assessment of Statistical Knowledge (NASK) to determine whether there are differences in student
performance.

He starts by generating a null hypothesis that all three groups will have the same mean score on the test.
In formula terms, if we use the symbol μ to represent the average score, the null hypothesis is expressed
through the following notation:

H0: μ1 = μ2 = μ3
Graphically, the null hypothesis can be represented in the following manner:

Notice in the graph that all three groups have the same average score (all three points are on the dashed
line) and all three groups have the same SD (noted by the fact that the line around the mean point for
each group is the same size). So the null hypothesis is that all three groups will have the same average
score on the NASK.

The alternate hypothesis is that the means are not all the same. It's important to point out that this is not
the same as saying that all the means are different (i.e., μ1 ≠ μ2 ≠ μ3). It is possible that some of the
means could be equal, yet if they are not all identical, we would reject the null hypothesis. Rather, the
alternative hypothesis is that not all means are equal.

Graphically, the alternate hypothesis might look like this:

Figure 2 highlights some important features and one of the keys to understanding ANOVA. ANOVA
allows us to separate the total variability in the outcome (in this case, the variability in scores on the
NASK) into two parts: variability within groups and variability between groups. As you can see from the
graphs, there are differences within groups, with scores on the NASK ranging from roughly the same
amount above and below the mean for each group. But there are also differences between the groups,
with the evening group having somewhat higher scores than those of the afternoon group, although the
means of both groups are lower than the mean of the morning group.

In table form, the scores for students look like this:

  Average NASK Score Number of Students

Morning 4.12 313

Afternoon 3.99 340

Evening 4.37 297

Overall 4.15 950

Note: The standard deviation (SD) for each group is the same: 1.3.

With these data, we can calculate an ANOVA statistic to evaluate Prof. Rousseau's hypothesis. This is
done in multiple steps, as described below.

1. Calculate the Variation Between Groups

The first step is to calculate the variation between groups by comparing the mean of each group (or, in
this example, the mean of each of the three classes) with the mean of the overall sample (the mean score
on the test for all students in this sample). This measure of between-group variance is referred to as
"between sum of squares" or BSS. BSS is calculated by adding up, for all groups, the squared difference
between the group's mean and the overall mean, multiplied by the number of cases in the group. In
formula terms:

BSS = Σ nj (x̄j – x̄)²

Plugging in the values, we get the following:

BSS = 313(4.12 – 4.15)² + 340(3.99 – 4.15)² + 297(4.37 – 4.15)² ≈ 23.36

This sum of squares has a number of degrees of freedom equal to the number of groups minus 1. In this
case, dfB = (3-1) = 2

We divide the BSS figure by the number of degrees of freedom to get our estimate of the variation
between groups, referred to as "Between Mean Squares":

BMS = BSS / dfB = 23.36 / 2 = 11.68
2. Calculate the Variation Within Groups

To measure the variation within groups, we find the sum of the squared deviations between scores on the
exam and the group average, calculating separate measures for each group, then summing the group
values. This is a sum referred to as the "within sum of squares" or WSS. In formula terms, this is
expressed as:

WSS = Σj Σi (xij – x̄j)²

Because every group has SD = 1.3, each group contributes (nj – 1)(1.3)², so:

WSS = (312 + 339 + 296)(1.69) = 1600.43

As in step 1, we need to adjust the WSS to transform it into an estimate of population variance, an
adjustment that involves a value for the number of degrees of freedom within. To calculate this, we take a
value equal to the number of cases in the total sample (N), minus the number of groups (k). In formula
terms:

dfW = N – k = 950 – 3 = 947

Then we can calculate the value for "Within Mean Squares" as:

WMS = WSS / dfW = 1600.43 / 947 = 1.69


3. Calculate the F test statistic

This calculation is relatively straightforward. Simply divide the Between Mean Squares, the value obtained
in step 1, by the Within Mean Squares, the value calculated in step 2:

F = BMS / WMS = 11.68 / 1.69 ≈ 6.91

Then compare this value to a standard table with values for the F distribution to calculate the significance
level for the F value. In this case, with 2 and 947 degrees of freedom, the significance level is less than
.01. This is extremely strong evidence against the null hypothesis, indicating that students' performance
varies significantly across the three classes.

Recap

To calculate an ANOVA, it is often convenient to arrange the statistics needed for calculation into a table
such as the one below:

Source Sum of Squares Degrees of Freedom Mean Squares

Between BSS dfB BSS/dfB

Within WSS dfW WSS/dfW

Total TSS = BSS + WSS

To fill in this table with the data from the problem above, we have:

Source Sum of Squares Degrees of Freedom Mean Squares

Between 23.36 2 11.68

Within 1600.43 947 1.69

Total 1623.79 949
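The whole table can be reproduced from the group summary statistics alone; a sketch in Python, where the grand mean is computed as the weighted average of the group means:

```python
# One-way ANOVA from the summary statistics in the example above
means = [4.12, 3.99, 4.37]   # morning, afternoon, evening NASK means
sds   = [1.3, 1.3, 1.3]      # group standard deviations
ns    = [313, 340, 297]      # group sizes

N, k = sum(ns), len(ns)
grand_mean = sum(n * m for n, m in zip(ns, means)) / N

# Between sum of squares: weighted squared deviations of group means
bss = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
# Within sum of squares: each group contributes (n - 1) * SD^2
wss = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))

f = (bss / (k - 1)) / (wss / (N - k))
print(round(bss, 2), round(wss, 2), round(f, 2))  # 23.36 1600.43 6.91
```

The values match the ANOVA table, and the F ratio of about 6.91 on (2, 947) degrees of freedom is significant at the .01 level.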


Chi-Square Goodness-of-Fit Test

The chi-square test (Snedecor and Cochran, 1989) is used to test if a sample of data came from a
population with a specific distribution.

An attractive feature of the chi-square goodness-of-fit test is that it can be applied to any univariate
distribution for which you can calculate the cumulative distribution function. The chi-square goodness-of-
fit test is applied to binned data (i.e., data put into classes). This is actually not a restriction since for non-
binned data you can simply calculate a histogram or frequency table before generating the chi-square
test. However, the value of the chi-square test statistic depends on how the data are binned. Another
disadvantage of the chi-square test is that it requires a sufficient sample size in order for the chi-square
approximation to be valid.

The chi-square test is an alternative to the Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit
tests. The chi-square goodness-of-fit test can be applied to discrete distributions such as the binomial and
the Poisson. The Kolmogorov-Smirnov and Anderson-Darling tests are restricted to continuous
distributions.

Additional discussion of the chi-square goodness-of-fit test is contained in the product and process
comparisons chapter.

The chi-square test is defined for the hypotheses:

H0: The data follow a specified distribution.

Ha: The data do not follow the specified distribution.

Test statistic: For the chi-square goodness-of-fit computation, the data are divided into k bins and the
test statistic is defined as

χ² = Σ (Oi – Ei)² / Ei

where Oi is the observed frequency for bin i and Ei is the expected frequency for bin i. The
expected frequency is calculated by

Ei = N (F(Yu) – F(Yl))

where F is the cumulative distribution function for the distribution being tested, Yu is the upper
limit for class i, Yl is the lower limit for class i, and N is the sample size.

This test is sensitive to the choice of bins. There is no optimal choice for the bin width (since the
optimal bin width depends on the distribution). Most reasonable choices should produce similar,
but not identical, results. Dataplot uses 0.3*s, where s is the sample standard deviation, for the
class width. The lower and upper bins are at the sample mean plus and minus 6.0*s,
respectively. For the chi-square approximation to be valid, the expected frequency should be at
least 5. This test is not valid for small samples, and if some of the counts are less than five, you
may need to combine some bins in the tails.

Generally speaking, the chi-square test is a statistical test used to examine differences with categorical
variables. There are a number of features of the social world we characterize through categorical
variables - religion, political preference, etc. To examine hypotheses using such variables, we use the
chi-square test.

The chi-square test is used in two similar but distinct circumstances:

a. for estimating how closely an observed distribution matches an expected distribution - we'll refer
to this as the goodness-of-fit test
b. for estimating whether two random variables are independent.

To recap the steps used in calculating a goodness-of-fit test with chi-square:

1. Establish hypotheses.
2. Calculate chi-square statistic. Doing so requires knowing:

o The number of observations

o Expected values

o Observed values

3. Assess significance level. Doing so requires knowing the number of degrees of freedom.

4. Finally, decide whether to accept or reject the null hypothesis.
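As a small illustration of these steps, consider hypothetical data: 90 rolls of a die tested against a fair (uniform) distribution:

```python
# Goodness-of-fit: do 90 hypothetical die rolls fit a fair die?
observed = [16, 18, 16, 14, 12, 14]        # counts for faces 1..6
expected = [sum(observed) / 6] * 6         # 15 expected per face

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                     # k - 1 = 5 degrees of freedom
print(round(chi2, 2), df)  # 1.47 5
```

Since 1.47 is well below the .05 critical value for 5 degrees of freedom (11.07), we would not reject the hypothesis that the die is fair.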

Testing Independence

The other primary use of the chi-square test is to examine whether two variables are independent or not.
What does it mean to be independent, in this sense? It means that the two factors are not related.
Typically in social science research, we're interested in finding factors that are related - education and
income, occupation and prestige, age and voting behavior. In this case, the chi-square can be used to
assess whether two variables are independent or not.

More generally, we say that variable Y is "not correlated with" or "independent of" the variable X if more of
one is not associated with more of another. If two categorical variables are correlated, their values tend to
move together, either in the same direction or in opposite directions.

Example

Return to the example discussed at the introduction to chi-square, in which we want to know whether
boys or girls get into trouble more often in school. Below is the table documenting the number of boys
and girls who got into trouble in school:

  Got in Trouble No Trouble Total

Boys 46 71 117

Girls 37 83 120

Total 83 154 237

To examine statistically whether boys got in trouble in school more often, we need to frame the question
in terms of hypotheses.

1. Establish Hypotheses

As in the goodness-of-fit chi-square test, the first step of the chi-square test for independence is to
establish hypotheses. The null hypothesis is that the two variables are independent - or, in this particular
case that the likelihood of getting in trouble is the same for boys and girls. The alternative hypothesis to
be tested is that the likelihood of getting in trouble is not the same for boys and girls.

Cautionary Note

It is important to keep in mind that the chi-square test only tests whether two variables are independent. It
cannot address questions of which is greater or less. Using the chi-square test, we cannot evaluate
directly the hypothesis that boys get in trouble more than girls; rather, the test (strictly speaking) can only
test whether the two variables are independent or not.

2. Calculate the expected value for each cell of the table

As with the goodness-of-fit example described earlier, the key idea of the chi-square test for
independence is a comparison of observed and expected values. How many of something were expected
and how many were observed in some process? In the case of tabular data, however, we usually do not
know what the distribution should look like (as we did with rolls of dice). Rather, in this use of the chi-
square test, expected values are calculated based on the row and column totals from the table.

The expected value for each cell of the table can be calculated using the following formula:

Expected value = (row total × column total) / grand total

For example, in the table comparing boys and girls in trouble, the expected count for the number of boys
who got in trouble is:

(117 × 83) / 237 = 40.97

The first step, then, in calculating the chi-square statistic in a test for independence is generating the
expected value for each cell of the table. Presented in the table below are the expected values (in
parentheses) for each cell:

  Got in Trouble No Trouble Total

Boys 46 (40.97) 71 (76.03) 117

Girls 37 (42.03) 83 (77.97) 120

Total 83 154 237

3. Calculate Chi-square statistic

With these sets of figures, we calculate the chi-square statistic as follows:

χ² = Σ (observed – expected)² / expected

In the example above, we get a chi-square statistic equal to:

χ² = (46 – 40.97)²/40.97 + (71 – 76.03)²/76.03 + (37 – 42.03)²/42.03 + (83 – 77.97)²/77.97 ≈ 1.87
4. Assess significance level

Lastly, to determine the significance level we need to know the "degrees of freedom." In the case of the
chi-square test of independence, the number of degrees of freedom is equal to the number of columns in
the table minus one multiplied by the number of rows in the table minus one.

In this table, there were two rows and two columns. Therefore, the number of degrees of freedom is:

df = (2 – 1)(2 – 1) = 1

We then compare the value calculated in the formula above to a standard set of tables. For χ² ≈ 1.87 with
1 degree of freedom, the p-value is about .17, well above the .05 threshold. Thus, we cannot reject the
null hypothesis and conclude that boys are not significantly more likely to get in trouble in school than
girls.
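The full test for the table above can be sketched in Python; the expected count for each cell comes from the row and column totals:

```python
# Chi-square test of independence for the boys/girls trouble table
table = [[46, 71],   # boys: got in trouble, no trouble
         [37, 83]]   # girls

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand  # (row x col) / N
        chi2 += (obs - exp) ** 2 / exp

df = (len(table) - 1) * (len(table[0]) - 1)          # (2-1)(2-1) = 1
print(round(chi2, 2), df)  # 1.87 1
```

Comparing 1.87 against the chi-square distribution with 1 degree of freedom reproduces the conclusion above: the null hypothesis of independence cannot be rejected.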

Recap

To recap the steps used in calculating a chi-square test of independence:

1. Establish hypotheses
2. Calculate expected values for each cell of the table.

3. Calculate chi-square statistic. Doing so requires knowing:

a. The number of observations

b. Observed values

4. Assess significance level. Doing so requires knowing the number of degrees of freedom.
Exercises:

1. Ten subjects were given an attitude test on a controversial issue. Then they were shown a film
favourable to the issue, and the same attitude test was administered again. Make a directional test
at the .05 level of significance.

Pretest Posttest

16 20
18 20
16 24
24 28
20 20
25 30
22 23
18 24
15 19
15 15

2. The following are data on the number of minutes that patients had to wait for their appointments
with 5 doctors. Use the F-test at the .05 level of significance to test the hypothesis that the means of
the populations sampled are equal.

Doctors

A B C D E
21 9 18 9 29
20 11 17 11 30
21 15 16 28 24
30 12 15 30
26 28 18 20 15
25

3. A random sample of 300 voters classified according to their political affiliation were asked if they were
in favour of the ongoing peace negotiations in Mindanao. Use the chi-square test at the .05 level of
significance to test whether opinion is independent of political affiliation.

Political Affiliation Favor Not in Favor Total

Lakas 40 60 100
Laban 50 50 100
L.P. 70 30 100
TOTAL 160 140 300

4. Two groups of high school students are matched for initial ability in a Social Science test. Group A is
taught by the lecture method while Group B is taught by the experimental method. The data are
presented below. Formulate hypotheses and test them by applying the appropriate statistical test.

Group A Group B
N 60 50
Pretest Mean 42.5 42.75
SD of the Pretest 5.5 5.3
Posttest mean 72.0 78.0
SD of the Posttest 6.5 6.1

REFERENCES:

Alferez, M. MSA Statistics and Probability. MSA AAI, 2006.

Broto, Antonio S. Statistics Made Simple. University of Eastern Philippines, 2003.

Daleon, Sixto. Fundamentals of Statistics. Metro Manila: National Bookstore, 2000.

Devore, J. et al. Applied Statistics for Engineers and Scientists. Brooks/Cole, Inc., 2005.

Esllen B. et al. Basic Statistics Textbook – Workbook. Graduation, Publishing, 2005.

Hogg, R. Probability and Statistical Inference. Pearson Publishing, 2006.

Punzalan, Twila and Gabriel Uriarte. Statistics Made Simple. Manila: Rex Book Store, 1990.

Snedecor, G. W. and Cochran, W. G. Statistical Methods, 8th ed. Iowa State University Press, 1989.

Sweeney, B. Essentials of Statistics for the Behavioral Sciences. Thomson Publishing, 2006.
