
Course notes in

Statistics

Geneveve M. Parreño, D.Sc. (Statistics)


Overview
Research that uses statistical analysis clearly has an impact on society, both in our everyday lives and in more abstract situations. On television we see commercials that report research “demonstrating” that “brand A is three times as effective as brand X”. In national magazines and newspapers, we read the results of surveys of public opinion and attitudes toward politicians. Many magazines include special sections designed to disseminate to the public at large the results of research in the physical and behavioral sciences.

As our society becomes more technologically complex, greater demands are being placed on professionals to understand and use the results of research designed to solve applied problems. This generally requires a working understanding of statistical methods.

Knowledge of statistical analysis also helps to foster new and creative ways of thinking about problems. Several colleagues have remarked on the new insights they gained when they approached a problem from the perspective of statistical analysis. Statistical thinking can be a useful aid in suggesting alternative answers to questions and in posing new ones. In addition, statistics helps to develop one’s skills in critical thinking, with both inductive and deductive inference. These skills can be applied to any area of inquiry and hence are extremely useful.
Definition of Basic Statistical Terms
Statistics refers to any numerical data or quantitative analysis. It is also a certain kind of measure used to evaluate a selected property of the collection of items under consideration.
As a branch of science, statistics is concerned with the scientific methods of collecting, organizing, summarizing, presenting, and analyzing data, as well as drawing valid conclusions and making reasonable decisions on the basis of such analysis.
Definitions
Sampling – the selection of a part, or representative cross section, of the population
Representative – a property of a portion of the population: the portion is representative if it reflects the characteristics of the whole population
Survey – the collection of information on a defined population to satisfy a definite need
Parameter – a value calculated from a population distribution (e.g., the population mean)
Statistic – a value calculated from a sample distribution (e.g., the sample mean)
Constant – a property whereby the members of the group do not differ from one another
Variable – any quantity, measure, or characteristic that may take different numerical values or categories
Observation – a realized value of a variable
Data – a collection of observations
Example 1. Below are illustrations of variables together with their possible values.

Variable                                      Possible Observations
S = sex of students                           Male, Female
E = employment status of an employee          Temporary, Permanent, Contractual
I = monthly income of a person, in pesos      i ≥ 0
N = number of children of a teacher           n = 0, 1, 2, 3, …
Population and Sample

Population is the totality of all actual or conceivable objects of a certain class under consideration. It can be finite or infinite.
Sample is a finite number of objects or persons selected from the population. It is a set of measurements that constitutes part of the totality of all possible measurements of the same quantities.
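To make the population/sample and parameter/statistic distinctions concrete, here is a minimal Python sketch. The population of 1,000 monthly incomes, the seed, and the sample size of 50 are hypothetical values chosen for illustration only. The population mean is a parameter; the mean of a simple random sample drawn from that population is a statistic that estimates it.

```python
import random
import statistics

# Hypothetical population: 1,000 monthly incomes in pesos (illustrative only).
rng = random.Random(2015)  # fixed seed so the run is reproducible
population = [rng.randint(8_000, 60_000) for _ in range(1_000)]

# Parameter: computed from the whole population.
mu = statistics.mean(population)

# Sampling: a simple random sample of n = 50, drawn without replacement.
sample = rng.sample(population, 50)

# Statistic: the same measure computed from the sample only.
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

Drawing a different sample (for example, with another seed) changes the statistic, while the parameter stays fixed as long as the population does.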
Example: Identify the population under study and the variable(s) of interest.
a. The Office of Admissions is studying the relationship between the score in the entrance examination taken during application and the general weighted average (GWA) upon graduation among graduates of the university from 2010 to 2015.
Population: the collection of all graduates of the university from 2010 to 2015
Variables of interest: score in the entrance examination and GWA
b. The research division of a certain pharmaceutical company is investigating the effectiveness of a new diet pill in reducing the weight of adult females.
Population: the set of all adult females who will use the diet pill
Variables of interest: weight before taking the diet pill and weight after taking the diet pill
c. The Department of Health is interested in determining the percentage of children below 12 years old infected by the Hepatitis B virus in Iloilo in 2015.
Population: the set of all children below 12 years old in Iloilo City in 2015
Variable of interest: whether or not the child has ever been infected by the Hepatitis B virus
Aims of Statistics
Statistics aims to uncover structure in data, to explain variation…

• Descriptive
• Inferential
Types of Statistics
1. Descriptive statistics is the method of collecting, organizing, and utilizing numerical data derived from the empirical world. It is the phase of statistics that seeks to describe and analyze a given group without drawing any conclusions or inferences about a larger group. It is concerned with
- characterizing what is “typical” or common in a group
- indicating how widely the individuals in the group vary
- presenting other aspects of the distribution of values with respect to the variable(s) being considered.

Examples: frequency, percentages, proportions, mean, standard deviation, correlation coefficient, construction of tables, charts and graphs
2. Inferential statistics comprises methods concerned with analyzing a subset of the data (a sample) that lead to predictions or inferences about the entire set of data (the population). Among the common types of analysis are:
- testing for the existence of an association between variables
- identifying the form of an observed relationship
- refining observed associations into causal relationships
- generalizing and predicting on the basis of observed data.
Examples: estimation, hypothesis testing
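As a small illustration of estimation, the sketch below estimates a population proportion from a sample, in the spirit of Example c (the percentage of children infected). The counts are hypothetical, and the function name `estimate_proportion` is made up for this sketch. The interval uses the common normal-approximation (Wald) formula p_hat ± 1.96 × sqrt(p_hat(1 − p_hat)/n), which is one standard choice among several and is reasonable only for fairly large samples.

```python
import math

def estimate_proportion(successes: int, n: int, z: float = 1.96):
    """Point estimate and approximate 95% Wald interval for a population proportion."""
    p_hat = successes / n                      # sample proportion (a statistic)
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error of p_hat
    return p_hat, (p_hat - z * se, p_hat + z * se)

# Hypothetical survey: 36 of 400 sampled children had been infected (made-up numbers).
p_hat, (low, high) = estimate_proportion(36, 400)
print(f"estimate: {p_hat:.3f}, approx. 95% interval: ({low:.3f}, {high:.3f})")
```

The sample proportion is a statistic; the interval expresses how far the unknown population percentage (the parameter) might plausibly lie from it.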
Uses of Statistics
• Political and Economic Leaders
- Use statistical indicators to determine the socioeconomic performance of the different provinces/regions
- Detect possible problems that need immediate attention
- Predict the results of certain policies that they are planning to implement

• Medicine
- Researchers rely on statistics to develop new drugs and to test their effectiveness
- Statistics helps in understanding the spread of diseases and in studying their prevention, diagnosis, and treatment
• Economics
- Helps economists analyze international and local markets through estimation of important indicators such as the unemployment rate, foreign exchange rates, total exports and imports, and GNP/GDP
- Helps forecast economic fluctuations and trends, verify economic theory, and formulate policies such as control of oil prices and importation of agricultural products
• Business Sector
- Market researchers conduct surveys, feasibility studies, and tests before marketing a new product
- Manufacturers use statistics in quality control to ensure that their products reach consumers in excellent condition
- Stock analysts use statistics to compare stock market averages so they can determine whether individual stocks are overvalued or undervalued
- Auditors use statistical sampling techniques to examine the books of their clients
- Forecasting techniques help businesses formulate policies based on conditions expected in the future
• Educators
- Use statistical methods to determine the validity and reliability of testing procedures
- Use statistics to compare different teaching techniques and to evaluate the performance of students and teachers

• Tourism
- People in the tourism industry apply statistical techniques to estimate the number of tourist arrivals, determine top tourist destinations, measure the proportion of people who are actively involved in travel, and identify performance indicators for transportation and accommodation establishments
Types of Variables
(Diagram: VARIABLES divide into Qualitative and Quantitative; Quantitative variables divide into Discrete and Continuous.)
Qualitative variable
• differs in quality
• takes non-numerical values
Quantitative variable
• differs in quantity
• takes numerical values
  a. Discrete – countable
  b. Continuous – measurable
  c. Constant
Variables according to Functional Relationships

Dependent Variable
- a factor, property, characteristic, or attribute that is measured and made the object of analysis
- the consequent, effect, criterion, response, or output that is analyzed and treated statistically during the investigation for the purpose of the study

Independent Variable
- a factor, property, attribute, characteristic, or approach that is introduced, manipulated, or treated to determine whether it influences or causes change in the dependent variable
Exercises:
1. Identify whether each variable is qualitative or quantitative.
• Civil status of a survey respondent
• Brand of soap used by a survey respondent
• Highest educational level attained by a survey respondent
• Total annual income of a survey respondent
• Number of children in the household
• Time spent listening to a radio station
2. Identify whether each variable is categorical or numerical.

A survey of households was conducted in an exclusive village, collecting the following information:

• Number of members of a household who are working
• Ownership of a cellphone
• Length of the longest call made on a cellphone
• Amount spent on food by a household head
• Occupation of the household head
• Total family income
• Highest educational attainment of the household head
Variables according to Continuity of Values
Continuous variable – a variable that can theoretically assume any value between two given values or within a specified range. It answers the question “How much…” and can be expressed in whole numbers, fractions, or decimals (e.g., height, weight, length, and width).

Discrete variable – a characteristic that can assume only designated values. It answers the question “How many…” and is always expressed in whole numbers (e.g., size of the family, number of buildings).
Variables according to Level of Measurement

1. NOMINAL VARIABLE
• a property of the members of a group defined by an operation that allows making statements only of equality or difference.
• It classifies items or individuals into two or more categories. Numerals may be assigned to label objects or persons, but these numbers cannot be ordered or added.
• Numbers or symbols assigned to each category of a variable merely identify the class. They do not indicate anything other than that the classes are different.
2. ORDINAL VARIABLE
• a property whereby members of a particular group are ranked.
• It specifies the relative position of items or individuals with respect to a given characteristic, with no indication of the distance between positions.
• The basic requirement is that one must be able to determine whether an item has more, the same, or less of the attribute being considered than the other items.
Different ways to measure the same variable
Nominal level
Question: Are you currently in pain? Yes, No
Question: How would you characterize the type of pain? Sharp, Dull, Throbbing
Ordinal level
Question: How bad is the pain right now? None, Mild, Moderate, Severe
Question: Compared with yesterday, is the pain less severe, about the same, or more severe?
3. INTERVAL VARIABLE
• a property defined by an operation that permits making statements of equality of intervals, rather than just statements of sameness or difference and of greater than or less than.
• It does not have a “true” zero point, although 0 may be arbitrarily assigned (e.g., temperature in degrees Celsius, where 0 °C does not mean an absence of temperature).
4. RATIO VARIABLE
• a property defined by an operation that permits making statements of equality of ratios, in addition to statements of sameness or difference, greater than or less than, and equality or inequality of differences.
• Numbers on a ratio scale indicate the actual amounts of the characteristic being measured.
• This is the only scale that has an absolute or natural zero, the point of origin being a fixed one.
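One practical consequence of the four levels is which summary measures make sense. The sketch below encodes the usual textbook convention for Stevens' levels (mode at every level, median from ordinal up, mean from interval up, ratio comparisons only at the ratio level). The list `LEVELS`, the function `meaningful_summaries`, and the pain ratings are hypothetical illustrations, not part of these notes.

```python
import statistics

# Ordered from lowest to highest level; each level allows everything below it.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

def meaningful_summaries(level: str) -> list[str]:
    """Summaries usually considered meaningful at a given level of measurement."""
    rank = LEVELS.index(level)
    summaries = ["mode"]                       # counting categories works at every level
    if rank >= 1:
        summaries.append("median")             # needs a meaningful order
    if rank >= 2:
        summaries.append("mean")               # needs equal intervals
    if rank >= 3:
        summaries.append("ratio comparisons")  # needs a true zero
    return summaries

# Ordinal example: pain ratings coded by rank (the codes show order, not distance).
PAIN = ["None", "Mild", "Moderate", "Severe"]
ratings = ["Mild", "Severe", "Moderate", "Mild", "None"]
median_code = statistics.median_low([PAIN.index(r) for r in ratings])
print(meaningful_summaries("ordinal"), PAIN[median_code])
```

`median_low` is used so the result is always one of the actual category codes; averaging two ordinal codes would assume equal distances between categories, which an ordinal scale does not guarantee.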
Examples of Variables:
• Sex – male, female
• Socio-Economic Status – high, middle, low
• Geographical Location – urban, rural
• Grade Level – nursery, kindergarten, primary, intermediate
• Teacher Behavior – cognitive, affective, innovative, motivational
• Academic Achievement – mean test scores
• Academic Performance – GPA: O, VS, S, US
• Job Satisfaction – strongly agree, agree, uncertain, disagree, strongly disagree
• Managerial Style – people-oriented, task-oriented
• Investment Climate – favorable, unfavorable
• Teacher Problem – instructional materials, instructional approaches and strategies
Sources of Data
1. Documentary Sources
- data obtained from published and unpublished documents (e.g., reports, manuscripts, letters, and diaries)

2. Field Sources
- living persons who have fundamental knowledge about, or have been in intimate contact with, social conditions and changes over a considerable period of time. This source is more personal and direct.
Documentary Sources

a. Primary Sources
- first-hand data for which the responsibility for compilation and promulgation remains with the same authority that originally gathered them.

b. Secondary Sources
- data that have been transcribed or compiled from original sources.
Agencies where a researcher can obtain primary data

• The Central Bank (CB) is a primary source of data on banking and finance.
• The Philippine Statistics Authority is a primary source of data on population, housing, and establishments.
• Pulse Asia is a primary source of data on the opinions or sentiments of the people on current issues.
Examples of secondary data

• The data compiled by the United Nations for its yearbook, which were originally gathered by the government statistical agencies of different countries
• A medical researcher’s documented data for his research paper, which were originally collected by the Department of Health
• The documented data of a congressman’s research team for its report, which were originally collected by the Department of Education and the Commission on Higher Education
• The documented data of a student for his thesis, which were originally collected by the Department of Labor and Employment
Methods Used in the Collection of Data
1. Direct or Interview Method
- a method of person-to-person exchange between the interviewer and the interviewee.
- It provides consistent and more precise information, since clarifications can be made during the interview.
- Questions may be repeated or modified to suit each interviewee’s level of understanding. However, this method is time-consuming, expensive, and limited in coverage.
2. Indirect or Questionnaire Method
- written responses are given to prepared questions.

Questionnaire – a list of questions intended to elicit answers to the problem of the study. It can be mailed or handed to the informant with minimal explanation. It can ensure anonymity.
Interview – allows for greater flexibility in eliciting information, since the interviewer and the person interviewed are both present when the questions are asked and answered.
3. Observation
- recording behavior at the time of occurrence. The investigator observes the behavior of individuals or organizations and their outcomes. It is usually used when the subjects cannot talk or write.

4. Registration Method or Use of Available Records
- information is gathered under requirements enforced by law (e.g., registration of births, deaths, motor vehicles, marriages, and licenses).
5. Experimental Method
- used when the objective is to determine the cause-and-effect relationship of certain phenomena under controlled conditions.
Presentation of Data
Three Ways of Presenting Data

1. Textual Presentation
- the data is presented in paragraph form.
- the writer can emphasize the importance of some figures or call attention to the relevance of others. It consists of describing sample data in expository form. The text should be arranged according to the importance of the data, emphasizing important figures, and it should also justify or explain irregularities in the figures.
2. Tabular Presentation
- the data is presented in rows and columns, that is, in tables.

3. Graphical Presentation
- the data is presented in visual form. It can give a clear picture of numerical data. It also simplifies concepts that would otherwise take many words to express.
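As a small illustration of tabular presentation, here is a Python sketch that turns raw categorical responses into a frequency table with counts and percentages. The responses (employment statuses, echoing Example 1) and the helper name `frequency_table` are hypothetical.

```python
from collections import Counter

# Hypothetical employment-status responses from 8 employees (illustrative only).
responses = ["Permanent", "Temporary", "Permanent", "Contractual",
             "Permanent", "Temporary", "Permanent", "Contractual"]

def frequency_table(values):
    """Return (category, count, percent) rows, most frequent category first."""
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in Counter(values).most_common()]

rows = frequency_table(responses)

# Print the rows and columns of the table, with a total row at the bottom.
print(f"{'Status':<12}{'Frequency':>10}{'Percent':>9}")
for cat, n, pct in rows:
    print(f"{cat:<12}{n:>10}{pct:>8.1f}%")
print(f"{'Total':<12}{len(responses):>10}{100:>8.1f}%")
```

Sorting by frequency puts the most common category first, which is one common way to arrange a table so that the important figures stand out.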
Kinds of Statistical Charts
• Line diagrams or curves
• Area charts
• Bar charts
• Pie charts
• Pictographs
• Statistical maps
Sample graphs (figures not reproduced): line graphs, pie chart, scatterplot, bar graphs, Gantt chart, box plots
Misleading Graphs

Learn to recognize misleading graphs and statistics. Graphs and statistics are often used to persuade. Advertisers and others may accidentally or intentionally present information in a misleading way.

For example, art is often used to make a graph more interesting, but it can distort the relationships in the data.
The following things are important to consider when looking at a graph:

1. Title
2. Labels on both axes of a line or bar chart and on all sections of a pie chart
3. Source of the data
4. Key to a pictograph
5. Uniform size of a symbol in a pictograph
6. Scale: Does it start with zero? If not, is a break shown?
7. Scale: Are the numbers equally spaced?
The Exaggerated Use of Area or Volume
(Figure: two pictographs comparing Dog, Cat, and Horse, each with a key in which a symbol stands for 50.)
This is bad: the variously sized pictures distort the graph.
This is much better: all pictures must be the same size (area).
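The area distortion can be quantified. If a picture is meant to show twice the value and both its width and height are doubled, its area, which is what the eye compares, grows by a factor of 4. The sketch below computes that misleading area ratio and the linear scale factor that would keep area proportional to value; both function names are hypothetical.

```python
import math

def area_when_both_sides_scaled(value_ratio: float) -> float:
    """Area ratio if width and height are both multiplied by value_ratio (misleading)."""
    return value_ratio ** 2

def side_scale_for_proportional_area(value_ratio: float) -> float:
    """Factor for width and height so that area grows in proportion to the value."""
    return math.sqrt(value_ratio)

# A category with twice the count of another:
print(area_when_both_sides_scaled(2))        # 4: the picture looks four times as big
print(side_scale_for_proportional_area(2))   # about 1.414
```

Even square-root scaling relies on readers judging areas accurately, so repeating a uniform symbol, as recommended above, remains the safer choice.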
Displaced Axes
(Figure: two bar charts of Number of Votes by Favourite Food for Pizza, Hot Dogs, and Hamburgers. The left chart's vertical axis runs from 950 to 1000; the right chart's runs from 0 to 1000.)

In the left chart, the differences in the number of votes appear significant; in the right chart, they appear much less significant.
(Figure: two bar charts of Average House Price by Year for 1998 and 1999, under the headline “Massive Increase In House Prices this Year”. The left chart's vertical axis runs from 80 000 to 82 000; the right chart's runs from 0 to 80 000.)

The increase in price was not as big as it appeared to be.

In another graph (not reproduced), the years 1993, 1996, and 1998 are missing.
What is misleading about this graph?
(Figure: “Comparison of Class Averages”, a chart of Period 1 and Period 2 averages for Ch 5 to Ch 8, with a vertical axis running from 90 to 95.)

Identifying Misleading Graphs

Explain why each graph is misleading.

• The graph suggests that the stock will continue to increase through 2020, but there is no way to foresee the future.
• Because the scale leaves out 0 to 100, the bar heights make it appear that the sixth grade sold about three times as many tickets as either of the other two grades. In fact, the sixth grade sold only about 20% more.
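The arithmetic behind the ticket example can be checked directly. When a bar chart's axis starts at a baseline b instead of 0, the visible bar heights are (a − b) and (c − b), so the apparent ratio is (a − b)/(c − b) rather than the true ratio a/c. The ticket counts below are hypothetical values chosen to reproduce the “three times versus about 20%” effect, since the original graph's numbers are not given here; the function name is also hypothetical.

```python
def apparent_vs_actual(a: float, c: float, baseline: float) -> tuple[float, float]:
    """Ratio of visible bar heights vs. true ratio when the axis starts at `baseline`."""
    apparent = (a - baseline) / (c - baseline)  # what the bar heights suggest
    actual = a / c                              # what the data say
    return apparent, actual

# Hypothetical ticket sales: sixth grade 132, another grade 110, axis starting at 100.
apparent, actual = apparent_vs_actual(132, 110, baseline=100)
print(f"bars suggest {apparent:.1f}x, data show {actual:.2f}x ({(actual - 1) * 100:.0f}% more)")
```

With an axis starting at 0, `apparent_vs_actual(132, 110, baseline=0)` returns two equal ratios, which is why a zero baseline (or a clearly marked break) matters.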
• The scale is so compressed that it is hard to see any difference among the brands.
• (Graph: “% of Return on Investment” for periods 1 to 6, with periods 5 and 6 marked as projected; vertical axis from 0 to 60.) The graph suggests that the rate of investment return will continue to increase, but there are no guarantees.
• (Graph: “Preferred Juice Flavors” for Grape, Cherry, and Apple; vertical axis from 140 to 150.) The graph appears to indicate that significantly more people prefer grape drink over the others when, in fact, the margin of difference is small. The range 0 to 140 is not graphed.
• (Graph: “Drink Sales” for Brand X, Brand Y, and Brand Z; vertical axis from 0 to 120, with no data from 50 to 120.) This graph is too compressed to see much difference between the brands, suggesting that they are fairly equal.