Research Methodology Lecture 6 : Methods of Data Collection


Data collection

After a researcher defines the things, phenomena, or variables to be studied, a problem and hypothesis are formulated.

The next step is for the researcher to determine how the variables or things being studied must be measured, observed, or recorded.

Appropriate data collection is essential to the validity of a study.

Data collection: definition

Data collection is a term used to describe the process of preparing and collecting data.

The purpose of data collection is to obtain information to keep on record, to make decisions about important issues, and to pass information on to others.

Primarily, data are collected to provide information regarding a specific topic.

Data collection plan

Pre-collection activity: agree on goals, target data, definitions, and methods

Collection: data collection

Presenting findings: usually involves some form of sorting, analysis, and/or presentation

Methods of Data collection

Qualitative

typically involves qualitative data, i.e., data obtained through methods such as interviews, on-site observations, and focus groups, that are in narrative rather than numerical form

Quantitative

uses numerical and statistical processes to answer specific questions. Statistics are used in a variety of ways to support inquiry or program assessment/evaluation.

Methods of Data collection

Qualitative data collection

they tend to be open-ended and have less structured protocols (i.e., researchers may change the data collection strategy by adding, refining, or dropping techniques or informants)

they rely more heavily on interactive interviews; respondents may be interviewed several times to follow up on a particular issue, clarify concepts, or check the reliability of data

Methods of Data collection

Qualitative data collection

findings are not generalizable to any specific population

Data collection in a qualitative study takes a great deal of time.

The researcher needs to record any potentially useful data.

The qualitative methods most commonly used in evaluation can be classified into:

in-depth interview

observation methods

document review

Methods of Data collection

Quantitative data collection

They produce results that are easy to summarize, compare, and generalize.

Participants may be randomly assigned to different treatments.

Researchers collect data on participant and situational characteristics in order to statistically control for their influence on the dependent, or outcome, variable.

Methods of Data collection

Quantitative data collection

To generalize from the research participants to a larger population, the researcher will employ probability sampling to select participants.
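A minimal sketch of probability sampling, here a simple random sample in which every member of the sampling frame has the same chance of selection. The function name, the frame of 1,000 ID numbers, and the sample size of 50 are illustrative assumptions, not part of the lecture.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; each member is equally likely to be chosen."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical sampling frame: ID numbers of 1,000 potential participants.
sampling_frame = list(range(1, 1001))
sample = simple_random_sample(sampling_frame, n=50, seed=42)

# 50 participants were selected, and none was selected twice.
print(len(sample), len(set(sample)))  # prints: 50 50
```

Fixing the seed is only a convenience for documenting how the sample was drawn; other probability designs (stratified, cluster) would need a different selection step.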

Typical quantitative data gathering strategies include:

Experiments/trials.

Observing and recording well-defined events (e.g., counting the number of patients waiting in emergency at specified times of the day).

Obtaining relevant data from management information systems.

Questionnaires

Administering surveys with closed-ended questions (e.g., face-to-face and telephone interviews, questionnaires, etc.)

Methods of Data collection: type of study

Census. Data from every member of a population. In most studies, a census is not practical, because of the cost and/or time required.

Sample survey. Data from a subset of a population, in order to estimate population attributes.

Experiment. A controlled study in which the researcher attempts to understand cause-and-effect relationships. The study is "controlled" in the sense that the researcher controls (1) how subjects are assigned to groups and (2) which treatments each group receives.

Observational study. Like experiments, observational studies attempt to understand cause-and-effect relationships. However, unlike experiments, the researcher is not able to control (1) how subjects are assigned to groups and/or (2) which treatments each group receives.

Methods of Data collection: Pros and cons

Resources. When the population is large, a sample survey has a big resource advantage over a census. A well-designed sample survey can provide very precise estimates of population parameters - quicker, cheaper, and with less manpower than a census.

Generalizability. Generalizability refers to the appropriateness of applying findings from a study to a larger population. Generalizability requires random selection. If participants in a study are randomly selected from a larger population, it is appropriate to generalize study results to the larger population; if not, it is not appropriate to generalize.

Observational studies do not feature random selection; so it is not appropriate to generalize from the results of an observational study to a larger population.

Causal inference. Cause-and-effect relationships can be teased out when subjects are randomly assigned to groups. Therefore, experiments, which allow the researcher to control assignment of subjects to treatment groups, are the best method for investigating causal relationships.
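Random assignment of subjects to groups can be sketched as follows. This is one simple way to do it (shuffle, then deal subjects out in turn), assuming 20 hypothetical subject IDs and two groups named "treatment" and "control"; none of these names come from the lecture.

```python
import random


def randomly_assign(subjects, groups=("treatment", "control"), seed=None):
    """Shuffle the subjects, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for position, subject in enumerate(shuffled):
        assignment[groups[position % len(groups)]].append(subject)
    return assignment


# Hypothetical subject identifiers S01 .. S20.
subjects = [f"S{i:02d}" for i in range(1, 21)]
assignment = randomly_assign(subjects, seed=1)

print({group: len(members) for group, members in assignment.items()})
# prints: {'treatment': 10, 'control': 10}
```

Because chance alone decides who lands in which group, participant characteristics should, on average, be balanced across groups, which is what supports the causal interpretation.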

Where do data come from?

Take a step back – if we’re starting from baseline, how do we collect / find data?

Secondary data

data someone else has collected

Primary data

data you collect

Secondary Data: Sources

County transportation departments

Vital statistics: birth and death certificates

Private and foundation databases

City and county governments

Surveillance data from state government programs

Federal agency statistics - Census, etc.

Secondary Data: Limitations

When was it collected? For how long?

May be out of date for what you want to analyze.

May not have been collected long enough for detecting trends.

Is the data set complete?

There may be missing information on some observations.

Unless such missing information is caught and corrected for, the analysis may be biased.
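A first completeness check on a secondary data set can be as simple as counting missing values per field. The sketch below assumes records held as dictionaries and treats `None`, an empty string, or an absent key as missing; the county records are made-up illustrative values.

```python
def missing_report(records, fields):
    """Count, for each field, how many records have no usable value."""
    counts = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            # .get() returns None when the key is absent from the record.
            if record.get(field) in (None, ""):
                counts[field] += 1
    return counts


# Hypothetical vital-statistics extract with gaps.
records = [
    {"county": "A", "births": 120, "deaths": 95},
    {"county": "B", "births": None, "deaths": 80},
    {"county": "C", "births": 140, "deaths": ""},
    {"county": "D", "births": 101},  # "deaths" was never recorded
]

print(missing_report(records, ["births", "deaths"]))
# prints: {'births': 1, 'deaths': 2}
```

Knowing where the gaps are is only the first step; how to correct for them depends on why the values are missing.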

Secondary Data: Limitations

Are there confounding problems?

Sample selection bias?

Source choice bias?

In time series, did some observations drop out over time?

Are the data consistent/reliable?

Did variables drop out over time?

Did variables change in definition over time?

Is the information exactly what you need?

In some cases, you may have to use “proxy variables”: variables that approximate something you really wanted to measure.

Are they reliable?

Do they correlate with what you actually want to measure?

Secondary Data: Advantages

No need to reinvent the wheel.

If someone has found the data, take advantage of it.

It will save you money.

Even if you have to pay for access, it is often cheaper than collecting your own data.

It will save you time.

Primary data collection is very time consuming.

It may be very accurate.

Especially when a government agency has collected the data.

It has great exploratory value.

Exploring research questions and formulating hypotheses to test.

Primary Data - Examples

Surveys
Focus groups

Questionnaires

Diaries

Personal interviews

Biophysiologic Measures (in vivo/in vitro)

Experiments and observational studies

Questionnaires

Advantages:

Can be posted, e-mailed or faxed with a wide geographic coverage

Can cover a large number of people or organizations.

Relatively cheap.

Avoids embarrassment on the part of the respondent.

Possible anonymity of respondent.

No interviewer bias.

Questionnaires

Disadvantages:

Design problems.

Questions have to be relatively simple.

Historically low response rate (although inducements may help).

Time delay whilst waiting for responses to be returned.

Require a return deadline.

Several reminders may be required.

International validity

Not possible to give assistance if required.

Problems with incomplete questionnaires.

Respondent can read all questions beforehand and then decide whether to complete the questionnaire or not.

Personal Interviews

(structured; semistructured; unstructured)

Advantages:

Serious approach by respondent resulting in accurate information.

Good response rate.

Complete and immediate.

Interviewer is in control and can give help if there is a problem.

Can investigate motives and feelings.

Can use recording equipment.

If one interviewer is used, uniformity of approach.

Used to pilot other methods.

Personal Interviews

(structured; semistructured; unstructured)

Disadvantages:

Need to set up interviews; time consuming, with geographic limitations.

Can be expensive.

Normally need a set of questions.

Respondent bias: tendency to please or impress, create a false personal image, or end the interview quickly.

Embarrassment possible if questions are personal.

If many interviewers, training required.

Phone interviews

Advantages:

Relatively cheap and quick.

Can cover reasonably large numbers of people or organisations.

Wide geographic coverage.

High response rate.

Help can be given to the respondent.

Can tape answers.

Phone interviews

Disadvantages:

Questionnaire required.

Not everyone has a telephone.

Repeat calls are inevitable (on average 2.5 calls to reach someone).

Time is wasted.

Respondent has little time to think.

Cannot use visual aids.

Can cause irritation.

Good telephone manner is required.

Primary Data - Limitations

Do you have the time and money for:

Designing your collection instrument?

Selecting your population or sample?

Administration of the instrument?

Entry/collection of data?

Uniqueness

May not be able to compare to other populations

Researcher error

Sample bias

Other confounding factors

Data collection: Take-home message

Data collection is essential for study validity

Experimental research is mostly based on quantitative methods of data collection

Will the data answer my research question?

If the data exist in secondary form, then use them to the extent you can, keeping in mind limitations.

But if they do not, and you are able to fund primary collection, then it is the method of choice.

Quantitative Methods

Experiment: Research situation with at least one independent variable, which is manipulated by the researcher

Dependent and Independent Variable

Independent Variable: The variable under consideration in the study; the cause of the outcome of the study.

Dependent Variable: The variable being affected by the independent variable; the effect observed in the study.

y = f(x)

Which is which here?

Key Factors for High Quality Experimental Design

Data should not be contaminated by poor measurement or errors in procedure.

Eliminate confounding variables from the study or minimize their effects on the variables.

Representativeness: Does your sample represent the population you are studying? Must use random sampling techniques.

What Makes a Good Quantitative Research Design?

4 Key Elements

Freedom from Bias

Freedom from Confounding

Control of Extraneous Variables

Statistical Precision to Test Hypothesis

Bias, Confounding and Extraneous Variables

Bias: When observations favour some individuals in the population over others.

Confounding: When the effects of two or more variables cannot be separated.

Extraneous Variables: Any variable, other than the independent variable under study, that has an effect on the dependent variable.

Need to identify and minimize these variables, e.g., when studying erosion potential as a function of clay content, rainfall intensity, vegetation, and duration would be considered extraneous variables.

Precision versus accuracy

"Precise" means sharply defined or measured.

"Accurate" means truthful or correct.

[Figure: four panels labelled "Both accurate and precise", "Accurate, not precise", "Not accurate, but precise", and "Neither accurate nor precise"]
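The four cases can be told apart numerically. In the sketch below, accuracy is summarized as bias (mean reading minus the true value) and precision as the spread of repeated readings (sample standard deviation). The true value of 10.0 and all readings are invented for illustration, and this is only one common way of quantifying the two ideas.

```python
from statistics import mean, stdev


def describe_measurements(readings, true_value):
    """Return (bias, spread): bias reflects accuracy, spread reflects precision."""
    bias = mean(readings) - true_value  # near 0 means accurate
    spread = stdev(readings)            # near 0 means precise
    return bias, spread


true_value = 10.0  # hypothetical known reference value
scenarios = {
    "accurate and precise": [9.9, 10.0, 10.1, 10.0],
    "accurate, not precise": [8.0, 12.0, 9.0, 11.0],
    "precise, not accurate": [12.9, 13.0, 13.1, 13.0],
    "neither": [11.0, 16.0, 12.0, 15.0],
}

for label, readings in scenarios.items():
    bias, spread = describe_measurements(readings, true_value)
    print(f"{label}: bias={bias:+.2f}, spread={spread:.2f}")
```

A small bias with a large spread corresponds to "accurate, not precise"; a large bias with a small spread corresponds to "precise, not accurate".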

Interpreting Results of Experiments

Goal of research is to draw conclusions. What did the study mean?

What, if any, is the cause and effect of the outcome?

Introduction to Sampling

Sampling is the problem of accurately acquiring the necessary data in order to form a representative view of the problem.

This is much more difficult to do than is generally realized.

Overall Methodology:

State the objectives of the survey

Define the target population

Define the data to be collected

Define the variables to be determined

Define the required precision & accuracy

Define the measurement 'instrument'

Define the sample size & sampling method, then select the sample

Sampling

Distributions:

When you form a sample, you often show it as a plotted distribution known as a histogram.

A histogram shows the frequency of occurrence of a variable within specified ranges.

Not a bar graph, which looks very similar.
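The counting step behind a histogram can be sketched in a few lines: each value is placed in a range (bin) and the number of values per bin is tallied. The data values, the bin width of 5, and the function name are illustrative assumptions.

```python
from collections import Counter


def histogram(values, bin_width, start=0):
    """Map each bin (low, high) to the number of values with low <= v < high."""
    counts = Counter((v - start) // bin_width for v in values)
    return {
        (start + k * bin_width, start + (k + 1) * bin_width): counts[k]
        for k in sorted(counts)
    }


# Small illustrative data set.
values = [1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16, 19]

# Text rendering: one '#' per observation in each bin.
for (low, high), count in histogram(values, bin_width=5).items():
    print(f"[{low:>2}, {high:>2}): {'#' * count}")
```

Unlike a bar graph of categories, the bins here are adjacent numeric ranges, so changing the bin width changes the shape of the picture.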

Interpreting quantitative findings

Descriptive statistics: mean, median, mode, frequencies

Error analyses

Mean

In science the term mean is really the arithmetic mean

Given by the equation

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Or more simply put, the sum of the values divided by the number of values summed
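As a minimal sketch, the arithmetic mean is a direct translation of "sum of the values divided by the number of values". The six data values are an arbitrary illustration.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values summed."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


values = [4, 8, 15, 16, 23, 42]  # illustrative data; the sum is 108
print(arithmetic_mean(values))   # prints: 18.0
```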

Median

Consider the set

1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16, 19

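In this case there are 13 values, so the median is the middle value, at position (n+1) / 2

(13+1) / 2 = 7, and the 7th value is 7

Consider the set

1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16

In the second case there are 12 values, so position (n+1) / 2 falls between the two middle values, and the median is their mean

(12 + 1) / 2 = 6.5, i.e., between the 6th value (6) and the 7th value (7)

(6+7) / 2 = 6.5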
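The two cases can be checked with a short sketch. `median_by_position` is a hypothetical helper that follows the position rule for sorted data; the standard library's `statistics.median` is used as a cross-check.

```python
from statistics import median

odd_set = [1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16, 19]  # n = 13
even_set = [1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16]     # n = 12


def median_by_position(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2  # zero-based index of the upper middle value
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_by_position(odd_set), median(odd_set))    # prints: 7 7
print(median_by_position(even_set), median(even_set))  # prints: 6.5 6.5
```

Note that (n+1) / 2 gives the position of the median in the sorted list, not its value; in these two sets the position and the value happen to coincide.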

Mode

The most frequent value in a data set

Consider the set

1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 13, 14, 16, 19

In this case the mode is 1 because it is the most common value

There may be cases where there is more than one mode, as in the following set

Consider the set

1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 11, 13, 14, 16, 19

In this case there are two modes (bimodal): 1 and 11, because both occur 4 times in the data set.
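Both examples can be reproduced with `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every value that shares the highest frequency. The variable names are illustrative.

```python
from statistics import multimode

one_mode = [1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 13, 14, 16, 19]
two_modes = [1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 11, 13, 14, 16, 19]

print(multimode(one_mode))   # prints: [1]
print(multimode(two_modes))  # prints: [1, 11]
```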

USES AND MISUSES OF STATISTICS

Uses of Statistics

Describe data

Compare two or more data sets

Determine if a relationship exists between variables

Test hypotheses (educated guesses)

Make estimates about population characteristics

Predict past or future behavior of data

Use of statistics can be impressive to employers.

Sources of Misuse

There are two main sources of misuse of statistics:

Evil intent on the part of a dishonest researcher

Unintentional errors (stupidity) on the part of a researcher who does not know any better

Misuses of Statistics

Samples

Voluntary-response sample (or self-selected sample)

One in which the subjects themselves decide whether to be included---creates built-in bias

Telephone call-in polls (radio)

Mail-in polls

Internet polls

Small Samples

Too few subjects used

Convenience

Not representative, since subjects are chosen only because they can be easily accessed

Misuses of Statistics

Graphs

Can be drawn inappropriately, leading to false conclusions

Watch the “scales”

Omission of labels or units on the axes

Exaggeration of a one-dimensional increase by using a two-dimensional graph

Misuses of Statistics

Survey Questions

Loaded Questions---intentional wording to elicit a desired response

Order of Questions

Nonresponse (Refusal)---subject refuses to answer questions

Self-Interest ---Sponsor of the survey could enjoy monetary gains from the results

Misuses of Statistics

Missing Data (Partial Pictures)

Detached Statistics ---no comparison is made

Percentages --

Precise Numbers

People believe this implies accuracy

Implied Connections

Correlation and Causality---when we find a statistical association between two variables, we cannot conclude that one of the variables is the cause of (or directly affects) the other variable
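A small sketch of why association alone does not establish cause: two variables can be strongly correlated because both follow a third one. The monthly figures below are invented for illustration, and `pearson_r` is a hand-written helper for the Pearson correlation coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical monthly data: both series rise with temperature.
temperature = [10, 14, 18, 22, 26, 30]
ice_cream_sales = [20, 31, 39, 52, 60, 71]
drownings = [1, 2, 2, 4, 5, 6]

r = pearson_r(ice_cream_sales, drownings)
print(f"r = {r:.2f}")
# The association is strong, yet neither variable causes the other:
# in this made-up example both simply track temperature.
```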