
QUANTITATIVE DATA ANALYSIS

Chapter 10
Objectives
After completing this chapter, you should be able to:
Code and enter interview responses
Edit interview responses
Handle omission
Transform data
Create a data file
Get a feel for the data
Test the goodness of data
Getting the data ready for analysis
Coding and data entry
Data coding involves assigning a number to the participants'
responses so they can be entered into a database.
Coding the responses - this is done for categorical data,
for example gender (Male = 1, Female = 2); race (Malay = 1,
Chinese = 2, Indian = 3 and Others = 4). Non-responses can
be assigned any code not used for a valid answer, generally 9 or 999.
Data entry - Raw data can be entered through any
software program. Each row of the editor represents a case
or observation; each column represents a variable.
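The coding step can be sketched in a few lines of Python. This is only an illustration: the codebook, the missing-value code 9, and the helper name `code_response` are assumptions, not something prescribed by the chapter or by SPSS.

```python
# Hypothetical codebook mirroring the gender example (Male = 1, Female = 2).
GENDER_CODES = {"male": 1, "female": 2}
MISSING = 9  # assumed code reserved for non-responses


def code_response(answer, codebook, missing=MISSING):
    """Return the numeric code for a raw answer, or the missing code."""
    if answer is None or not answer.strip():
        return missing
    # Answers that are not in the codebook are also treated as missing here.
    return codebook.get(answer.strip().lower(), missing)


# Each list element is one case (row); the list as a whole is one variable (column).
raw_answers = ["Male", "female", "", None, "FEMALE"]
coded = [code_response(a, GENDER_CODES) for a in raw_answers]
print(coded)
```

Keeping the codebook in one place makes it easy to check later that every stored number corresponds to a documented category.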
Pre-Data Analysis
Data obtained through questionnaires, interviews,
observation, or secondary sources need to be edited.
For example:
Blank responses, if any, have to be handled in some way.
Data have to be coded and keyed into some kind of software (like
SPSS).
Pre-Data Analysis
Editing data
Data from interviews and open-ended questionnaires have to be
deciphered and coded (made sense of and defined).
Questionnaires need to be checked for completeness and
inconsistencies; for example, a respondent who left marital status
blank but reported that she has been married for 12 years.
Pre-Data Analysis
Handling blank responses
Answers may be left blank due to several possible reasons:
did not understand the question
did not know the answer
simply did not want to answer
If a respondent did not answer many of the questions in the
questionnaire (say 25%) then it may be better to just throw away
the questionnaire
If only 2 or 3 questions were left unanswered (out of 30 questions)
we need to decide how these blank responses are to be handled
If the sample size is big, we may choose to omit this case to ensure
the validity of the research
Pre-Data Analysis
Handling blank responses
One way to handle a blank response to an interval-scaled item
with a midpoint would be to assign the midpoint of the scale
as the response to that particular item
Another way is to allow the computer to ignore the blank response
when the analyses are done
Another way is to assign the item the mean value of the
responses of all those who responded to that particular item
Another way is to assign the item the mean value of the
responses of this particular respondent to all the other questions
measuring this variable
Another way is to assign the item a random number within the
range of the scale
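The options above can be sketched as small Python helpers. The 5-point scale, the sample responses, and all function names are assumptions made for illustration; the option of letting the software ignore the blank corresponds to simply skipping `None` values during analysis.

```python
import random
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point interval scale

# Rows are respondents, columns are items measuring one variable; None = blank.
responses = [
    [4, 5, None],
    [2, 2, 3],
    [5, 4, 4],
]


def fill_midpoint(value):
    """Replace a blank with the midpoint of the scale."""
    return (SCALE_MIN + SCALE_MAX) / 2 if value is None else value


def fill_item_mean(rows, col):
    """Mean of everyone who did answer this item."""
    return mean(r[col] for r in rows if r[col] is not None)


def fill_respondent_mean(row):
    """Mean of this respondent's answers to the other items of the variable."""
    return mean(v for v in row if v is not None)


def fill_random(rng):
    """Random whole number within the range of the scale."""
    return rng.randint(SCALE_MIN, SCALE_MAX)


blank_row, blank_col = 0, 2  # position of the blank in the sample data
print(fill_midpoint(responses[blank_row][blank_col]))
print(fill_item_mean(responses, blank_col))
print(fill_respondent_mean(responses[blank_row]))
print(fill_random(random.Random(0)))
```

Whichever rule is chosen, it should be applied consistently and reported, because each one changes the item's mean and variance in a different way.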
Pre-Data Analysis
Coding
Responses to demographic variables can be coded
using numbers (1-male, 2-female)
For interval data (for example, captured using a Likert-type
scale) we can use 1 for Strongly Agree, 2 for
Agree, and so on up to 5 for Strongly Disagree
For negatively worded questions, we need to reverse
the order of the values given (or we can use the
computer to reverse the value once they are keyed in)
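Reversing a negatively worded item after data entry is a one-line transformation: subtract the keyed value from the sum of the lowest and highest scale points. A minimal sketch, assuming a 1 to 5 scale and a hypothetical helper name:

```python
def reverse_code(value, scale_min=1, scale_max=5):
    """Reverse-score an item: on a 1-5 scale, 1 becomes 5, 2 becomes 4, and so on."""
    return scale_min + scale_max - value


keyed = [1, 2, 3, 4, 5]
reversed_scores = [reverse_code(v) for v in keyed]
print(reversed_scores)  # [5, 4, 3, 2, 1]
```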
Getting the data ready for analysis
Editing data: data editing deals with detecting and
correcting illogical, inconsistent, or illegal data and
omissions in the information returned by the participants of
the study.

Example 1:

I prefer online tests/assessment to be:

__________ include as part of final results (high
percentage 20% or more contributes to final results)

__________ just for learning purposes (perhaps only
10% contribution to final results)
Editing the data (good or badly designed questionnaire?)
Please rank your preference (1 -10) for the following online assessment
tools:
Rank 1-10
i. Multiple-choice questions
ii. Short answer questions
iii. Formative assessment activity
iv. Simulation (business games, virtual lab etc.)
v. Discussion /bulletin boards
vi. Online collaboration (discussion with lecturers and colleagues
online)
vii. E-mail submission /e-learn submission of assignment
viii. True-false questions
ix. Reflection journals
x. Projects / E-portfolio
Editing the data (good or badly designed questionnaire?)
Please tick your preference for the following online assessment tools
Scale: Least preferred (1), (2), Somewhat preferred (3), (4), Most preferred (5)

i. Multiple-choice questions
ii. Short answer questions
iii. Formative assessment activity
iv. Simulation (business games, virtual lab etc.)
v. Discussion /bulletin boards
vi. Online collaboration (discussion with lecturers and
colleagues online)
vii. E-mail submission /e-learn submission of assignment
viii. True-false questions
ix. Reflection journals
x. Projects / E-portfolio
Getting a feel for the data: descriptive statistics
Data transformation: a variation of data coding, this is
the process of changing the original numerical
representation of a quantitative value to another value.

Frequencies: the number of times various
subcategories of a certain phenomenon occur (with the % and
the cumulative %).

Bar charts and pie charts: frequencies can also be
displayed as bar charts, histograms or pie charts.
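A frequency table with percentages and cumulative percentages can be built with the standard library alone. The coded values below are hypothetical, and `frequency_table` is an illustrative helper rather than the output format of any particular package.

```python
from collections import Counter

# Hypothetical coded responses (1 = male, 2 = female).
coded_gender = [1, 2, 2, 1, 2, 2, 1, 2]


def frequency_table(values):
    """Return (category, count, percent, cumulative percent) for each category."""
    counts = Counter(values)
    total = len(values)
    table = []
    cumulative = 0.0
    for category in sorted(counts):
        percent = 100 * counts[category] / total
        cumulative += percent
        table.append((category, counts[category], percent, cumulative))
    return table


for category, n, pct, cum in frequency_table(coded_gender):
    print(f"{category}\t{n}\t{pct:.1f}\t{cum:.1f}")
```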
Inferential statistics
Inferential statistics are the procedures used to make
inferences from a sample to the population. After we collect the
data from a sample, we would like to use these data to
describe the population (Figure 6.4). For example, if the
sample mean is 95 (statistic, $\bar{X}$), we want to know the
probability that the sample may have come from a population
with a mean of 100 (parameter, $\mu$). The common
procedures used to test hypotheses are the chi-squared
test ($\chi^2$-test), t-test, and F-test.
Descriptive statistics
Descriptive statistics are the procedures used
to describe data distributions and relationships
between variables, either in the population or in the
sample. The common statistics used to describe
data distributions are the measures of central
tendency (mean, median & mode), measures of
variability (range, standard deviation & variance),
and measures of relationships between variables
(correlation & regression coefficients).
Cross Statistic
Example
Normality : Data distribution must be normal
Measures of Central Tendency
Definition: the score that shows the centre/concentration
of a set of data.
The mean, or the average, is the sum of the individual
observations divided by the number of
observations: $\bar{X} = (X_1 + X_2 + X_3 + \dots + X_N)/N$

The median is the central item in a group of
observations when they are arrayed in either
ascending or descending order.

The mode is the most frequently
occurring value.
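The three measures can be computed with Python's `statistics` module. The exam scores below are made up for illustration.

```python
from statistics import mean, median, mode

# Hypothetical exam scores for five students.
scores = [40, 45, 45, 50, 70]

print(mean(scores))    # sum of the observations divided by their number
print(median(scores))  # middle value once the scores are sorted
print(mode(scores))    # most frequently occurring score
```

Here the single high score of 70 pulls the mean above the median, which is why the median is often preferred for skewed data.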
Measures of Dispersion: defined as the score that shows
the degree of spread of a set of data.

Range refers to the spread between the extreme values in a set of
observations, i.e. the difference between the highest
and the lowest value.

Variance is calculated by subtracting the mean from
each of the observations in the data set, taking the
square of this difference, and dividing the total of these
by the number of observations:
$\text{Variance} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_N - \bar{X})^2}{N}$
where $\bar{X}$ is the mean and $N$ is the number of observations.
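As a worked illustration of the range and the variance described above, the sketch below follows the verbal definition step by step and then cross-checks it against `statistics.pvariance`, which also divides by the number of observations. The data are hypothetical.

```python
from statistics import pvariance

# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: difference between the highest and the lowest value.
data_range = max(data) - min(data)

# Variance: mean of the squared deviations from the mean (divides by N).
n = len(data)
data_mean = sum(data) / n
variance = sum((x - data_mean) ** 2 for x in data) / n

print(data_range)
print(variance)
print(pvariance(data))  # library result for comparison
```

Note that many packages report the sample variance by default, which divides by N - 1 instead of N and therefore gives a slightly larger value.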

Measures of Dispersion
Standard deviation: a measure of
dispersion for interval and ratio scaled data that offers an
index of the spread of a distribution or the variability of
the data. It is the square root of the variance.

In a normal distribution:
1. Practically all observations fall within three standard
deviations of the mean.
2. More than 90% of the observations are within two
standard deviations of the mean.
3. More than half of the observations are within one
standard deviation of the mean.
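The three statements about the normal distribution can be checked numerically with `statistics.NormalDist`. This is a minimal sketch; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
```

The printed shares are roughly 68%, 95% and 99.7%, which is consistent with "more than half", "more than 90%" and "practically all".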
Standard Deviation
Interpretation of descriptive analysis from SPSS

                     N     Minimum   Maximum   Mean    Std. Deviation
Final Exam           154   0         74        48.21   12.806
Age                  154   20        27        22.62   1.036
Valid N (listwise)   154

Answer the following questions:

1. What is the total number of respondents?
2. What is the range for final exam and age of the respondents?
3. What is the average age of the respondents and the average mark the
students obtained for their final exam?
4. From the value for the standard deviation, interpret what it means.
Age vs Frequency
From the histogram shown, calculate (estimate) the number of
respondents who are 20, 21, 22, 23, 24, 25, 26 and 27.
Age and Final Exam

                     Age 21   Age 22   Age 23   Age 24   Age 25   Age 27
Final Exam (mean)    40.80    47.26    49.12    50.94    59       48.50
Minimum              0        23       21       13       50       42
Maximum              66       74       70       72       68       55
Standard Deviation   26.809   11.481   12.427   15.622   12.72    9.122

From the above data, describe the relationship between age and the mean for
final exam, and the relationship between the range for each age and the standard
deviation.
What does standard deviation mean?
Hypothesis Testing
Hypothesis testing determines the validity of an assumption (the null
hypothesis).

t-test: used to judge the significance of a sample
mean or of the difference between the means of two
samples when the sample size is small and the population
standard deviation is not known.

F-test: used to compare the variances of two independent
samples. It is also used in the form of ANOVA, where
it is known as the F-ratio, when the significance of the differences among more than two
sample means has to be judged at the same time.
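To make the t-test concrete, the sketch below computes an independent-samples t statistic with a pooled variance, which assumes the two groups have equal variances. The function name and the tiny data sets are hypothetical; in practice the resulting value is compared with a critical value from the t distribution, or software such as SPSS reports the significance directly.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical scores for two small groups.
group_a = [2, 4, 6]
group_b = [1, 2, 3]
t_value, df = pooled_t(group_a, group_b)
print(round(t_value, 3), df)
```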
Table 3: T-test for perceived risk towards online shopping
between online shoppers and non-shoppers

Dependent Variable        Group         n     Mean    Std. Deviation   t-value   Sig.
Perceived Sales Risk      Non-shopper   76    3.579   1.142            3.567     0.000
                          Shopper       164   2.990   1.150
Perceived Security Risk   Non-shopper   76    2.691   0.907            3.035     0.003
                          Shopper       164   2.271   1.100

Develop the null and alternative hypotheses for the t-test.

From the above, interpret the results based on the hypotheses developed.
Table 2: ANOVA for perceived risk towards online shopping

Dependent Variable        Group   n    Mean    Std. Deviation   F-value   Sig.
Perceived Sales Risk      Sony    52   3.199   1.142            0.150     0.699
                          Rio     56   3.133   1.150
Perceived Security Risk   Sony    52   2.375   0.907            0.002     0.964
                          Rio     58   2.384   1.100

Develop the null and alternative hypotheses for the ANOVA test.

From the above, interpret the results based on the hypotheses developed.
Relationships between variables (Relationship
between two nominal variables)
The chi-square ($\chi^2$) test of significance
helps us to see whether or not two nominal
variables are related.

Correlations: a Pearson correlation matrix will
indicate the direction, strength and significance
of the bivariate relationships among all the
variables that were measured at an interval or
ratio level.
Relationships between variables (Relationship
between two nominal variables)

Theoretically, there could be a perfect
positive correlation between two variables,
which is represented by 1.0 (plus 1), or a
perfect negative correlation, which would be
-1.0 (minus 1).
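The two extreme cases can be reproduced with a short implementation of the Pearson correlation coefficient. The data are hypothetical and `pearson_r` is an illustrative helper, not a library function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations, and the root sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)


x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))  # y rises exactly in step with x
print(pearson_r(x, [8, 6, 4, 2]))  # y falls exactly in step with x
```

Real survey data almost never reach these extremes; values near 0 indicate little or no linear relationship.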
Relationship between independent variables (age,
feedback, learning flexibility and attitude) versus
dependent variable (use of online assessment tools)

                      Age     Feedback   Learning Flexibility   Attitude
Pearson correlation   0.035   0.170*     0.205*                 0.212**
Sig. (2-tailed)       0.668   0.036      0.011                  0.008
N                     154     154        154                    154

Write possible hypotheses to be tested for the above (null and alternative
hypotheses).
If the results obtained are as above, which of the hypotheses are accepted?
Testing goodness of data
Reliability is established by testing for both
consistency and stability. Consistency indicates
how well the items measuring a concept hang
together as a set. Cronbach's alpha is a
reliability coefficient that indicates how well the
items in a set are positively correlated to one
another. The closer Cronbach's alpha is to 1,
the higher the internal consistency reliability.
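Cronbach's alpha can be computed from the item variances and the variance of the respondents' total scores, using the standard formula alpha = k/(k-1) x (1 - sum of item variances / variance of totals), where k is the number of items. The sketch below is illustrative only: the item scores are invented and `cronbach_alpha` is a hypothetical helper.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding every respondent's score."""
    k = len(item_scores)
    # Total score per respondent across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of four respondents to three items of one scale.
items = [
    [4, 3, 5, 2],
    [5, 3, 4, 2],
    [4, 2, 5, 3],
]
print(round(cronbach_alpha(items), 3))
```

When every item orders the respondents in exactly the same way, the coefficient reaches 1; unrelated items push it towards 0.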
Testing goodness of data
Validity: the extent to which an instrument measures
what it is supposed to measure.
Factorial validity can be established by
submitting the data for factor analysis. The
results of the factor analysis will confirm
whether or not the theorized dimensions
emerge.
Criterion-related validity can be established
by testing for the power of the measure to
differentiate individuals who are known to be
different. E.g. (a new mathematics test
and a science test): use the correlation coefficient
between the scores on the new instrument and the scores on
the other instrument obtained at the same time.
Testing goodness of data
Convergent validity can be established when
there is a high degree of correlation between two
different sources responding to the same
measure.
It is another way of expressing internal consistency;
highly reliable scales contain convergent validity.
Discriminant validity can be established when
two distinctly different concepts are not correlated
with each other, i.e. when a measure does not correlate
with a measure of a different construct.
Testing goodness of data
Construct validity: the extent to which an instrument
measures the construct/concept based on
theories or observations (use expert advice in a
particular area of study, based on certain theories
or observations, and also correlate the scores
from the new inventory with the scores from an
established/standard inventory).
Content validity is the extent to which an instrument measures
the content/cognitive processes/abilities as prescribed
by a course. This validity is more relevant to tests
(instruments) that measure the construct. It is usually
logically assessed by experts in a particular field of study,
based on the learning outcomes of a subject/course. One
way to assess the validity is to compare test questions with
learning outcomes.
EXHIBIT 13.7 Reliability and Validity on Target

2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Data Analysis