Statistics

Overview

Research that uses statistical analysis is

clearly having an impact on society, both in our

everyday lives and in more abstract situations. On

television we see commercials that report

research “demonstrating” that “brand A is three

times as effective as brand X”. In national

magazines and newspapers, we read results of

surveys of public opinion and attitudes toward

politicians. Many magazines include special

sections designed to disseminate to the public at

large the results of research in the physical and

behavioral sciences.

As our society becomes more

technologically complex, greater demands

are being placed on professionals to

understand and use the results of research

designed to solve applied problems. This

generally requires a working understanding

of statistical methods.

Knowledge of statistical analysis also helps to

foster new and creative ways of thinking about

problems. Several colleagues have remarked on

the new insights they developed when they

approached a problem from the perspective of

statistical analysis. Statistical thinking can be a

useful aid in suggesting alternative answers to

questions and posing new ones. In addition,

statistics helps to develop one’s skills in critical

thinking, with both inductive and deductive

inference. These skills can be applied to any area

of inquiry and hence are extremely useful.

Definition of basic statistics terms

Statistics is any numerical data or quantitative

analysis. It is also a certain kind of measure

used to evaluate a selected property of the

collection of items under consideration.

As a branch of science, it is concerned with the

scientific methods of collecting, organizing,

summarizing, presenting and analyzing data, as

well as drawing valid conclusions and making

reasonable decisions on the basis of such

analysis.

Sampling – selection of a part of the population that serves as a representative cross-section of it

Representative – property of a portion of the population whereby that portion reflects the characteristics of the population

Survey – the collection of information on a defined population to satisfy a definite need

Parameter – a value calculated from a

population distribution

Statistic – a value calculated from a sample

distribution

Constant – a property whereby the members

of the group do not differ from one another

Variable - any quantity or measure or

characteristics which may possess different

numerical values or categories

Observation – a realized value of a variable

Data – a collection of observations

Example 1. Below are illustrations of variables together with their possible values.

Variable                                   Possible values
S = sex of students                        Male, Female
E = employment status of an employee       Temporary, Permanent, Contractual
I = monthly income of a person in pesos    i ≥ 0
N = number of children of a teacher        n = 0, 1, 2, 3, …

Population and Sample

Population (of size N) is the totality of all conceivable objects of a certain class under consideration. It can be finite or infinite.

Sample (of size n) is a finite number of objects or persons selected from the population. It is a set of measurements that constitute part of the totality of all possible measurements of the same quantities.
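
The difference between a parameter (calculated from the population) and a statistic (calculated from a sample) can be sketched in Python. The income figures, the seed, and the sample size below are made up for illustration only; `random.Random.sample` draws a simple random sample without replacement.

```python
import random
import statistics

# Hypothetical population: monthly incomes (in pesos) of N = 10 persons.
population = [12000, 15000, 9000, 30000, 18000,
              22000, 11000, 25000, 14000, 16000]

# Parameter: a value calculated from the whole population.
population_mean = statistics.mean(population)

# Sample: n = 4 persons selected at random from the population.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample = rng.sample(population, 4)

# Statistic: the same kind of value, calculated from the sample only.
sample_mean = statistics.mean(sample)

print("N =", len(population), "| parameter (population mean) =", population_mean)
print("n =", len(sample), "| statistic (sample mean) =", sample_mean)
```

The sample mean generally differs from the population mean, and it changes from one sample to another, while the parameter stays fixed.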

Example: Identify the population under study and the variable/s of interest.

a. The Office of Admissions is studying the relationship between the score in the entrance examination during application and the general weighted average (GWA) upon graduation among graduates of the university from 2010-2015.

Population: graduates of the university from the years 2010-2015

Variables of interest: score in the entrance examination and GWA

b. The research division of a certain pharmaceutical company is investigating the effectiveness of a new diet pill in reducing weight among female adults.

Population: female adults who will use the diet pill

Variables of interest: weight before taking the diet pill, weight after taking the diet pill

c. The Department of Health is interested in determining the percentage of children below 12 years old infected by the Hepatitis B virus in Iloilo in 2015.

Population: children below 12 years old in Iloilo City in 2015

Variable of interest: whether the child has ever been infected by the Hepatitis B virus

Aims of Statistics

Statistics aims to uncover structure in data,

to explain variation…

Descriptive

Inferential

Types of statistics

1. Descriptive statistics is the method of collecting,

organizing, and utilizing numerical data derived from the

empirical world. It is the phase of statistics that seeks to

describe and analyze a given group without drawing any

conclusions or inferences about a larger group. It is

concerned with

- characterizing what is “typical” or common in a group

- indicating how widely the individuals in the group vary

- presenting other aspects of the distribution of values with respect to the variable(s) being considered.

Examples: standard deviation, correlation coefficient, construction of tables, charts and graphs
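
As a minimal sketch of descriptive statistics, the code below computes a typical value (the mean), a measure of spread (the standard deviation) and a correlation coefficient. The study-hours and score data are hypothetical, and `pearson_r` is an illustrative helper written from the textbook definition of Pearson's r.

```python
import math
import statistics

# Hypothetical data: hours studied and test scores of six students.
hours = [2, 4, 5, 7, 8, 10]
scores = [55, 60, 68, 74, 80, 89]

# What is "typical" in the group: the mean.
mean_score = statistics.mean(scores)

# How widely the individuals vary: the sample standard deviation.
sd_score = statistics.stdev(scores)

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(hours, scores)
print(f"mean = {mean_score:.2f}, sd = {sd_score:.2f}, r = {r:.3f}")
```

These numbers describe only the six students in the data; no conclusion about a larger group is drawn.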

2. Inferential statistics comprises some methods

concerned with the analysis of a subset of data

leading to predictions or inferences about the entire

set of data. Among the common types of analysis

are:

- testing for the existence of an association between

variables

- identifying the form of an observed relationship

- refining observed associations into causal

relationships

- generalizing and predicting on the basis of observed

data.

Examples: estimation, hypothesis testing
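
Estimation can be sketched as follows: a sample is used to infer the mean of the population it came from. The sample is simulated, and the 95% interval uses the normal approximation, which is an assumption of this sketch (for small samples a t-based interval is usually preferred).

```python
import math
import random
import statistics

# Hypothetical sample: weights (kg) of n = 40 persons from a larger population.
rng = random.Random(1)  # fixed seed so the run is repeatable
sample = [rng.gauss(60, 8) for _ in range(40)]

n = len(sample)
mean = statistics.mean(sample)                      # point estimate
std_error = statistics.stdev(sample) / math.sqrt(n)

# Approximate 95% confidence interval for the population mean,
# based on the normal approximation (an assumption of this sketch).
z = statistics.NormalDist().inv_cdf(0.975)          # about 1.96
lower, upper = mean - z * std_error, mean + z * std_error

print(f"point estimate = {mean:.2f}")
print(f"95% confidence interval = ({lower:.2f}, {upper:.2f})")
```

The interval is a statement about the whole population, inferred from a subset of it, which is what separates inferential from descriptive statistics.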

Uses of Statistics

• Political and Economic Leaders

- Use statistical indicators to determine the

socioeconomic performance of the different

provinces/regions

- To detect possible problems that need immediate

attention

- To predict the results of certain policies that they are

planning to implement

• Medicine

- Researchers rely on the use of statistics to develop

new drugs and to test their effectiveness

- To understand the spread of diseases and study their

prevention, diagnosis, and treatment

• Economics

- Helps economists analyze international and local

markets through estimation of important indicators

such as unemployment rate, foreign exchange rates,

total amount of exports and imports, and GNP/GDP

economic theory, and formulate policies such as

control in oil prices and importation of agricultural

products

• Business Sector

- Market researchers conduct surveys, feasibility

studies, and tests before marketing a new product

- Manufacturers use statistics in quality control to

ensure that their products reach their consumers in

excellent condition

- Stock analysts use statistics to compare stock market

averages so they can determine whether individual

stocks are overvalued or undervalued

- Auditors use sampling techniques in statistics to

examine the books of their clients

- Forecasting techniques are useful in business in the

formulation of policies based on conditions expected

to come across in the future

• Educators

- Use statistical methods to determine the validity and

reliability of testing procedures

- Use statistics to compare different teaching

techniques and evaluate the performance of students

and teachers

• Tourism

- People in the tourism industry apply statistical techniques to

estimate the number of tourist arrivals, determine top

tourist destinations, measure the proportion of

people who are actively involved in travel, and identify

performance indicators for transportation and

accommodation establishments

Types of Variables

Qualitative variable

- differ in quality

- non-numerical values

Quantitative variable

- differ in quantity

- numerical values

a. Discrete: countable

b. Continuous: measurable

c. Constant

Variables according to Functional Relationships

Dependent variable

- a factor, property, characteristic or attribute that is measured and made the object of analysis.

- consequent, effect, criterion, response, or output that is analyzed and treated statistically during investigation for the purpose of the study.

Independent variable

- a factor, property, attribute, characteristic or approach that is introduced, manipulated or treated to determine if it influences or causes change in the dependent variable.

Exercises:

1. Identify if each variable is qualitative or quantitative

• Civil status of a survey respondent

• Brand of soap being used by a survey

respondent

• Highest educational level attained by a survey

respondent

• Total annual income of a survey respondent

• No. of children in household

• Time consumed listening to a radio station

Identify if each variable is categorical or numerical

village with the following information:

• Ownership of cellphone

• Length of longest call made on a cellphone

• Amount spent on food by a household head

• Occupation of household head

• Total family income

• Highest educational attainment of household head

Variables according to Continuity of Values

continuous variable – a variable which can theoretically assume any value between two given values or within a specified range. It answers the question “How much…” and can be expressed in whole numbers, fractions, or decimals (e.g. height, weight, length and width)

discrete variable – a variable which can only assume designated values. It answers the question “How many…” and is always expressed in whole numbers (e.g. size of the family, number of buildings)

Variables according to Level of Measurements

1. NOMINAL VARIABLE

a property of the members of the group defined by

an operation which allows making of statements

only of equality or difference.

It classifies items or individuals into two or more

categories. Numerals are assigned to label objects

or persons but these numbers cannot be ordered

or added.

Numbers or symbols assigned to each category of

a variable merely identify the class. They do not

indicate anything other than that they are different.

2. ORDINAL VARIABLE

-a property whereby members of a particular

group are ranked.

-specifies the relative position of items or

individuals with respect to a given characteristic

with no indication as to the distance between

positions.

-the basic requirement is that one must be able to

determine whether an item has more, the same, or

less of the attribute being considered than the

other items.

Different ways to measure the

same variable

Nominal level

Question: Are you currently in pain? Yes, No

Question: How would you characterize the type of

pain? Sharp, Dull, Throbbing

Ordinal level

Question: How bad is the pain right now?

None, Mild, Moderate, Severe

Question: Compared with yesterday, is the pain

less severe, about the same, or more severe?

Variables according to Level of Measurements

3. INTERVAL VARIABLE

-a property defined by an operation which permits

making of statements of equality of intervals rather

than just statements of sameness or difference

and greater than or less than.

- It does not have a “true” zero point, although 0

may be arbitrarily assigned.

4. RATIO VARIABLE

-a property defined by an operation which permits

making of statements of equality of ratios in

addition to statements of sameness or difference,

greater than or less than, and equality or inequality

of differences.

-Numbers on a ratio scale indicate the actual

amounts of the characteristics being measured.

-This is the only scale that has an absolute or

natural zero, the point of origin being a fixed one.
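
The practical difference between an interval and a ratio scale can be sketched with temperature, an example chosen here for illustration and not taken from the slides. Celsius has an arbitrarily assigned zero, while Kelvin has an absolute zero.

```python
# Interval scale: Celsius has an arbitrary zero, so differences are
# meaningful but ratios are not.
c_low, c_high = 10.0, 20.0
difference_c = c_high - c_low   # a 10-degree interval is meaningful
naive_ratio = c_high / c_low    # 2.0, yet 20 °C is NOT "twice as hot" as 10 °C

# Ratio scale: Kelvin has an absolute zero, so ratios are meaningful.
k_low, k_high = c_low + 273.15, c_high + 273.15
true_ratio = k_high / k_low     # about 1.035

print("interval (°C):", difference_c)
print("naive ratio on the interval scale:", naive_ratio)
print("ratio on the ratio scale:", round(true_ratio, 3))
```

The interval between the two readings is the same on both scales, but only the Kelvin ratio describes the actual amounts being measured.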

Examples of Variables:

Sex – male, female

Socio-Economic Status – high, middle, low

Geographical location – urban, rural

Grade level - nursery, kindergarten,

primary, intermediate

Teacher Behavior – cognitive, affective,

innovative, motivational

Academic Achievement – Mean test

scores

Academic Performance - GPA’s – O, VS, S,

US

Job satisfaction- strongly agree, agree,

uncertain, disagree, strongly disagree

Managerial Style – people oriented, task

oriented

Investment Climate – favorable, unfavorable

Teacher problem – instructional materials,

instructional approaches and strategies

Sources of Data

1. Documentary Sources

-data obtained in published and unpublished

documents.

e.g. reports, manuscripts, letters and diaries

2. Field Sources

-include living persons who have the fundamental

knowledge about or have been in intimate contact

with social conditions and changes over a

considerable period of time. Source is more

personal and direct.

Documentary Sources

a.Primary Sources

-first-hand data wherein the responsibility for

their compilation and promulgation remains

under the same authority that originally

gathered them.

b. Secondary Sources

-data that have been transcribed or compiled

from original sources.

Agencies where a researcher can obtain

primary data

data on banking and finance

Philippine Statistics Authority is a primary

source of data on population, housing, and

establishments

Pulse Asia is a primary source of data on

opinions or sentiments of the people on

current issues

Example of secondary data

yearbook, which were originally gathered

by government statistical agencies of

different countries

for his research paper, which were

originally collected by the Department of

Health

The documented data of the research

team of a congressman for its report which

were originally collected by the

Department of Education and Commission

on Higher Education

thesis, which were originally collected by

the Department of Labor and Employment

Methods Used in the Collection of Data

1. Direct or Interview Method

- a method of person to person exchange

between the interviewer and interviewee.

- It provides consistent and more precise

information since clarification may be given by

the interviewee.

- Questions may be repeated or modified to suit

each interviewee’s level of understanding.

However, this method is time consuming,

expensive and has limited field of coverage.

2. Questionnaire Method

-written responses are given to prepared questions.

Questionnaire- a set of questions intended to elicit answers to the problem of study. It can be mailed or handed to the informant with minimum explanation. It ensures anonymity.

Interview- allows for greater flexibility in eliciting

information since the interviewer and the person

interviewed are both present when the questions

are asked and answered.

3. Observation

- Recording of the behavior at the time of

occurrence. The investigator observes the behavior

of individuals or organizations and their outcomes. It

is usually used when the subjects can’t talk or write.

4. Records

-the method of gathering information is enforced by

certain laws (e.g. registration of births, deaths, motor

vehicles, marriages and licenses).

5. Experimental Method

-is used when the objective is to determine the cause-and-effect

relationship of certain phenomena under controlled

conditions.

Presentation of Data

Three Ways of Presenting Data

1. Textual Presentation

-the data is presented in paragraph form.

-the writer can emphasize the importance of some

figures or can call attention to the relevance of other

figures. It consists of describing sample data in

expository form. It should be arranged according to

data importance emphasizing important figures, and

it should also justify or explain irregularities in

figures.

2. Tabular Presentation

-the data is presented in rows and columns or

through tables.

3.Graphical Presentation

-the data is presented in visual form. It can present a

clear picture of numerical data. It also simplifies

concepts that would otherwise have been

expressed in so many words.
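
A tabular presentation can be sketched as a frequency table with one row per category and one column per quantity. The ten observations below are hypothetical; they reuse the employment-status categories from Example 1.

```python
from collections import Counter

# Hypothetical observations: employment status of 10 employees.
observations = ["Permanent", "Temporary", "Permanent", "Contractual", "Permanent",
                "Temporary", "Permanent", "Contractual", "Permanent", "Temporary"]

counts = Counter(observations)
total = sum(counts.values())

# One row per category: (category, frequency, percent of total).
rows = [(status, freq, 100 * freq / total)
        for status, freq in counts.most_common()]

print(f"{'Employment status':<20}{'Frequency':>10}{'Percent':>10}")
for status, freq, pct in rows:
    print(f"{status:<20}{freq:>10}{pct:>10.1f}")
print(f"{'Total':<20}{total:>10}{100.0:>10.1f}")
```

The same `rows` could feed a bar or pie chart, which is the step from tabular to graphical presentation.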

Kinds of Statistical charts

Line diagrams or curves

Area charts

Bar charts

Pie chart

Pictographs

Statistical Maps

Sample graphs (figures not reproduced): line graph, pie chart, scatterplot, bar graph, Gantt chart, box plot

Misleading Graphs

graphs and statistics.

Graphs and statistics are

often used to persuade.

Advertisers and others

may accidentally or

intentionally present

information in a

misleading way.

graph more interesting, but it can distort

the relationships in the data.

The following things are important

to consider when looking at a

graph:

1.Title

2.Labels on both axes of a line or bar chart

and on all sections of a pie chart

3.Source of the data

4.Key to a pictograph

5.Uniform size of a symbol in a pictograph

6.Scale: Does it start with zero? If not, is

there a break shown?

7.Scale: Are the numbers equally spaced?

The Exaggerated Use of Area or Volume

[Figure: two pictographs for Dog, Cat and Horse, with key values of 50]

The various sized pictures distort the graph.

This is much better. All pictures must be the same size. (Area)
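
The distortion can be quantified with a short sketch. It assumes a picture is enlarged in both width and height in proportion to the value it represents; `area_ratio` is an illustrative helper, and the factor of 2 is a made-up example.

```python
def area_ratio(value_ratio: float) -> float:
    """Area ratio seen by the reader when BOTH the width and the height
    of a picture are scaled by value_ratio."""
    return value_ratio ** 2

# Hypothetical pictograph: one category is twice as large as another.
value_ratio = 2.0
print("ratio in the data:", value_ratio)
print("ratio of the drawn areas:", area_ratio(value_ratio))
```

A value that is only twice as large is drawn with four times the area, which is why a pictograph should repeat symbols of uniform size.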

Displaced Axes

[Figure: two bar charts of Number of Votes by Favourite Food (Pizza, Hot Dogs, Hamburgers). The vertical axis of the left chart runs from 950 to 1000 in steps of 5; that of the right chart runs from 0 to 1000 in steps of 100.]

Left chart caption: “… significantly here.” Right chart caption: “… not so significantly here.”

Displaced Axes

[Figure: two charts of Average House Price by Year (1998, 1999), titled “Massive Increase In House Prices this Year”. The vertical axis of the left chart runs from 80 000 to 82 000; that of the right chart runs from 0 to 80 000.]

… appeared to be.

1993, 1996 and 1998 are missing.

What is misleading about this graph?

[Figure: chart comparing Period 1 and Period 2 across Ch 5, Ch 6, Ch 7 and Ch 8; the vertical axis runs from 90 to 95.]

Identifying Misleading Graphs

Explain why the graph is misleading.

The graph suggests that the stock will continue to increase through 2020, but there’s no way to foresee the future.

Because the scale leaves out 0 to 100, the bar heights make it appear that the sixth grade sold about three times as many tickets as either of the other two grades. In fact, the sixth grade sold only about 20% more.
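
The effect of a truncated scale can be checked with a little arithmetic. The ticket counts below are hypothetical, chosen only to be consistent with the description above (axis starting at 100, bars that look about three times as tall, a true difference of about 20%); `bar_height_ratio` is an illustrative helper.

```python
def bar_height_ratio(a: float, b: float, axis_start: float) -> float:
    """Ratio of the drawn bar heights when the vertical axis starts at axis_start."""
    return (a - axis_start) / (b - axis_start)

# Hypothetical ticket counts (not taken from the original graph).
sixth_grade, other_grade = 130, 110

apparent = bar_height_ratio(sixth_grade, other_grade, axis_start=100)
actual = bar_height_ratio(sixth_grade, other_grade, axis_start=0)

print(f"apparent ratio with the axis starting at 100: {apparent:.2f}")
print(f"actual ratio with the axis starting at 0: {actual:.2f}")
```

Starting the axis at zero makes the drawn heights proportional to the data, so the two ratios agree.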

Explain why the graph is misleading.

The scale is so compressed that it’s hard to see any difference among the brands.

Explain why each graph is misleading.

[Figure: chart of % of Return on Investment for periods 1 to 6; the vertical axis runs from 0 to 60, and periods 5* and 6* are marked “* projected”.]

The graph suggests that the rate of investment return will continue to increase, but there are no guarantees.

Explain why each graph is misleading.

[Figure: bar chart for Grape, Cherry and Apple; the vertical axis runs from 140 to 150.]

The graph seems to indicate that significantly more people prefer grape drink over the others when in fact there is a small margin of difference. (0 to 140 is not graphed)

Explain why each graph is misleading.

[Figure: bar chart titled “Drink Sales” for Brand X, Brand Y and Brand Z; the vertical axis runs from 0 to 120, with the annotation “No data from 50 to 120”.]

This graph is too compressed to see much difference between the brands, indicating that they are fairly equal.

