
Multivariate Data Analysis

Text Book:
Multivariate Data Analysis, 6th/7th edition, by Joseph F. Hair, Jr.; William C. Black; Barry J. Babin; and Ronald L. Tatham. Published by Dorling Kindersley, Pearson Education in South Asia.

REFERENCE BOOKS AND MATERIAL

Darren George and Paul Mallery: SPSS for Windows Step by Step, 8th Edition, Pearson, 2008.
Walpole: Introduction to Statistics, 3rd Edition.
Francis, A. (2004). Business Mathematics and Statistics (6th ed.). Int. Thomson Business Press.

Nadeem Talib

PhD. (Scholar)-Management Sci.


MS (Mgt. Sci.),
MBA
M.Sc .(Maths),
DCS

Software/Statistical Packages

SPSS AND AMOS

Tree

Stress

Minor root

Research
Use:
Basic: produces new knowledge
Applied: produces answers and solutions
Purpose:
Exploratory: uncovers new elements/relationships
Descriptive: gives a detailed picture
Explanatory: examines causal relationships

Research

Time
Cross-Sectional: observations at one time point
Longitudinal: observations across time points
Design
Qualitative: open questions and verbal data
Quantitative: specific questions and numeric data
Choices are guided by your question and resources

What is Multivariate analysis

Multivariate analysis refers to all statistical techniques that simultaneously analyze multiple measurements on the individuals or objects under investigation. Here multivariate simply means examining relationships between or among more than two variables.
Thus, any simultaneous analysis of more than two variables can loosely be considered multivariate analysis.
Many multivariate techniques are extensions of univariate analysis (analysis of single-variable distributions) and bivariate analysis (cross-classification, correlation, analysis of variance, and simple regression used to analyze two variables).

Why
The need for multivariate analysis comes from the
simple fact of life that just about everything is in some
way interrelated with other things.
Inflation, for instance, is related to taxes, interest
rates, the money supply, oil prices, the business
cycle, foreign wars, and a good deal more. Buyers'
reactions to an advertisement are related to the price
of the item, competitors, warranty terms, previous
experiences with the product, conversations with
neighbors, credibility of the actor used in the
commercial, and season of the year.

What is Multivariate analysis

For example, simple regression (with one predictor variable) is extended in the multivariate case to include several predictor variables.
Similarly, the single dependent variable found in analysis of variance is extended to include multiple dependent variables in multivariate analysis of variance.
Univariate analysis: analysis of a single variable's distribution.
Bivariate analysis: correlation or simple regression used to analyze two variables.
Some multivariate techniques: multiple regression, multivariate analysis of variance, factor analysis, and discriminant analysis. Factor analysis identifies the structure underlying a set of variables, and discriminant analysis differentiates among groups based on a set of variables.

Basic Concepts in
Multivariate Analysis
The Variate: The variate is a linear combination of variables with empirically determined weights. The variables are specified by the researcher, and the weights are determined by the multivariate technique to meet a specific objective.
Variate value = w1x1 + w2x2 + w3x3 + ... + wnxn
Here xn are the observed variables and wn are the weights determined by the multivariate technique.

The result is a single value representing a combination of the entire set of variables that best achieves the objective of the specific multivariate analysis.
Example:
In multiple regression, the variate is determined in a manner that maximizes the correlation between the multiple independent variables and the single dependent variable.
In factor analysis, the variates are formed to best represent the underlying structure or patterns of the variables as represented by their inter-correlations.
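A minimal sketch of the multiple regression case, assuming NumPy is available. The six respondents, their incomes, family sizes, and credit card counts are made-up illustration data, not from the textbook. Least squares picks the weights, the variate is the weighted sum, and its correlation with the dependent variable should be at least as high as that of any other weighting (equal weights are tried here for comparison).

```python
import numpy as np

# Hypothetical data: 6 respondents, two predictors (income in $1000s, family size).
X = np.array([
    [30.0, 2.0],
    [45.0, 3.0],
    [50.0, 4.0],
    [62.0, 2.0],
    [75.0, 5.0],
    [90.0, 4.0],
])
# Hypothetical dependent variable: number of credit cards held.
y = np.array([2.0, 3.0, 4.0, 4.0, 6.0, 7.0])

# Add an intercept column, then estimate the weights by least squares.
design = np.column_stack([np.ones(len(X)), X])
weights, *_ = np.linalg.lstsq(design, y, rcond=None)

# The variate value for each respondent is the weighted sum of its variables.
variate = design @ weights

# Correlation between the variate and the dependent variable (multiple R).
multiple_r = np.corrcoef(variate, y)[0, 1]

# A different weighting (equal weights) for comparison.
equal_weight_variate = X @ np.array([1.0, 1.0])
equal_weight_r = np.corrcoef(equal_weight_variate, y)[0, 1]

print("weights:", np.round(weights, 3))
print("multiple R:", round(float(multiple_r), 3))
print("equal-weight r:", round(float(equal_weight_r), 3))
```

The comparison with equal weights is only there to make the "empirically determined weights" idea concrete; statistical packages such as SPSS report the same weights as regression coefficients.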

How to choose appropriate statistical test

Exploratory
Descriptive
Hypothesis Testing
Multivariate Methods:
Parametric vs. Non-Parametric
Level of Measurement

Hypothesis Testing
Measure of Difference
Measure of Association

Measure of Difference
(IR branch, or "no" branch if the test variable is not normal)

One Group
IR: One-sample t-test
no: Chi-square one-sample test, K-S one-sample test, Run test

Two Groups, Related Samples
IR: Paired-sample t-test
no: Sign test, Wilcoxon matched-pair test

Two Groups, Independent Samples
IR: Independent-sample t-test
no: Mann-Whitney test

Multiple Groups
IR: ANOVA
no: K-W
TSFD in subcategories
Post hoc: Tukey test (homo), Dunnett's T3 test (hetero)
Measure of Association

Dependency (causal)
One dependent: Simple regression; Multiple regression (stepwise, forward, backward)
Multiple dependents: MANOVA, MANCOVA

Interdependency
Correlation: Pearson correlation, Spearman, Partial

Level of Measurements

Nominal:
Binomial
Chi-square one-sample test
McNemar
Fisher exact & chi-square two-samples test
Chi-square for k samples

Ordinal:
K-S one-sample test
Run test
Wilcoxon matched-pair test
Mann-Whitney U test & K-S
Friedman two-way ANOVA
Kruskal-Wallis test

Interval and Ratio:
t-test
t-test for paired samples
t-test
Repeated-measures ANOVA
One-way ANOVA
MANOVA
Simple regression
Multiple regression

Level of Measurement
The process of assigning numbers to
objects
Measurement is used to capture some
construct
For example, if research is needed
on the construct of depression, it
is likely that some systematic
measurement tool will be needed
to assess depression.

Types of Measurement Scales

1. Nominal
2. Ordinal
3. Interval
4. Ratio

Types of Measurement Scales

Nominal Scales - A type of categorical data in which objects fall into unordered categories. Therefore, no comparison can be made in terms of one category being higher than the other.
Examples:
Country of origin
Biological sex (male or female)
Hair color (blonde, brown, red, black, etc.)
Race (Caucasian, African-American, Asian, etc.)
Smoking status (smoker, non-smoker)

Nominal Scale

Sometimes numbers are used to designate category membership.

Example:
Country of Origin
1 = United States
2 = Mexico
3 = Canada
4 = Other
Other examples: occupation, political party affiliation.

However, in this case, it is important to keep in mind that the numbers do not have intrinsic meaning.
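A small sketch of why the codes carry no intrinsic meaning, using the country-of-origin coding from the slide. The ten responses are hypothetical. The mean of the codes can be computed, but it describes no category. Counts and the mode are the summaries that make sense for nominal data.

```python
from collections import Counter
from statistics import mean

# Coding scheme from the slide; the responses below are made up for illustration.
LABELS = {1: "United States", 2: "Mexico", 3: "Canada", 4: "Other"}
responses = [1, 1, 3, 2, 4, 1, 3, 2, 1, 4]

# Arithmetic runs without error, but the result describes no category.
meaningless_mean = mean(responses)

# Frequencies and the mode are the appropriate summaries for nominal data.
counts = Counter(LABELS[code] for code in responses)
most_common_label, most_common_count = counts.most_common(1)[0]

print("mean of codes:", meaningless_mean)
print("counts:", dict(counts))
print("mode:", most_common_label)
```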

Ordinal Scales

A type of categorical data in which order is important. Data values represent categories with some intrinsic order (for example, low, medium, high; strongly agree, agree, disagree, strongly disagree).
Ordinal scales provide no measure of the actual magnitude in absolute terms, only the order of values. The researcher knows the order but not the amount of difference between the values.
For example, the final position of horses in a thoroughbred race is an ordinal variable. The horses finish first, second, third, fourth, and so on. The difference between first and second is not necessarily equivalent to the difference between second and third, or between third and fourth.

Ordinal Scales

Does not assume that the intervals between numbers are equal.
Example: finishing place in a race (first place, second place).
[Figure: 1st, 2nd, 3rd, and 4th place marked along a time axis running from 1 hour to 8 hours]

Ratio Scales - capture the properties of the other types of scales, but also contain a true zero, which represents the absence of the quality being measured.
So: classification + order + distance + absolute zero.
For example, heart beats per minute has a very natural zero point. Zero means no heart beats. Weight (in grams) is also a ratio variable. Again, the zero value is meaningful: zero grams means the absence of weight.
Examples: age, income, weight, etc.

Ratio

Thus the difference between a person of 35 and a person of 38 is the same as the difference between people who are 12 and 15. A person can also have an age of zero.
Ratio data can be multiplied and divided because not only is the difference between 1 and 2 the same as between 3 and 4, but also 4 is twice as much as 2.
Interval and ratio data measure quantities and hence are quantitative. Because they can be measured on a scale, they are also called scale data.

Types of Data and Measurement Scales

Data
Nonmetric or Qualitative: Nominal Scale, Ordinal Scale
Metric or Quantitative: Interval Scale, Ratio Scale

So:
Religion: nominal
Social Class: ordinal
Temperature: interval
Weight: ratio

The impact of choice of measurement scale

Understanding the different types of measurement scales is important for TWO reasons.
1. The researcher must identify the measurement scale of each variable used, so that non-metric data are not incorrectly used as metric data and vice versa (as in the case of representing gender as 1 for male and 2 for female). If the researcher incorrectly defines this measurement as metric, then it may be used inappropriately (e.g. finding the mean value of gender).
2. The measurement scale is also critical in determining which multivariate techniques are most applicable to the data, with consideration made for both independent and dependent variables.

Parametric Vs. Non Parametric

The distinction is whether or not the mean and standard deviation of scores are used in a test.
Generally use parametric tests for measurements of interval or ratio level.
For lower measurement levels use non-parametric tests.
When sample sizes are small, or variables are not normally distributed, revert to non-parametric tests.
Nonparametric statistics or distribution-free tests are those that do not rely on parameter estimates or precise assumptions about the distributions of variables.
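The rules of thumb above can be written as a small decision function. This is only a sketch: the function name `choose_test_family` and the cutoff of 30 for a "small" sample are assumptions made for illustration, since the slide gives no number.

```python
def choose_test_family(level, n, normal, small_sample_cutoff=30):
    """Return 'parametric' or 'non-parametric' following the rules of thumb.

    level: 'nominal', 'ordinal', 'interval', or 'ratio'
    n: sample size
    normal: True if the variable looks approximately normally distributed
    small_sample_cutoff: hypothetical threshold; the slide gives no number
    """
    level = level.lower()
    if level not in {"nominal", "ordinal", "interval", "ratio"}:
        raise ValueError(f"unknown level of measurement: {level!r}")
    # Lower measurement levels: use non-parametric tests.
    if level in {"nominal", "ordinal"}:
        return "non-parametric"
    # Small samples or non-normal variables: revert to non-parametric tests.
    if n < small_sample_cutoff or not normal:
        return "non-parametric"
    return "parametric"


print(choose_test_family("ratio", n=120, normal=True))
print(choose_test_family("ordinal", n=120, normal=True))
print(choose_test_family("interval", n=12, normal=True))
```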

Parametric / Non-Parametric

Parametric
These tests are more powerful, when their assumptions hold, because they use the information in interval and ratio data.

Non-parametric
These tests are used with nominal and ordinal data.

Assumptions of
Parametric Tests

The observations must be independent (the selection of one case should not affect the chances for any other case to be included in the sample).
The observations should be drawn from normally distributed populations.
These populations should have equal variances.
The measurement scales should be at least interval so that arithmetic operations can be used with them.

Why Learn Statistics?


So you are able to make better sense of
the ubiquitous use of numbers:
Business memos
Business research
Technical reports
Technical journals
Newspaper articles
Magazine articles

What is statistics?

A branch of mathematics taking and transforming numbers into useful information for decision makers
Methods for processing & analyzing numbers
Methods for helping reduce the uncertainty inherent in decision making

Why Study Statistics?

Decision Makers Use Statistics To:
Present and describe business data and information properly
Draw conclusions about large groups of individuals or items, using information collected from subsets of the individuals or items.
Make reliable forecasts about a business activity
Improve business processes

Basic Business Statistics, 11e, 2009 Prentice-Hall, Inc.

Types of Statistics
Statistics
The branch of mathematics that transforms data into useful information for decision makers.

Descriptive Statistics
Collecting, summarizing, and describing data

Inferential Statistics
Drawing conclusions and/or making decisions concerning a population based only on sample data

Descriptive Statistics
Collect data
e.g., Survey
Present data
e.g., Tables and graphs
Characterize data
e.g., Sample mean = X̄ = (sum of Xi) / n

Inferential Statistics
Estimation
e.g., Estimate the population mean weight using the sample mean weight
Hypothesis testing
e.g., Test the claim that the population mean weight is 120 pounds
Drawing conclusions about a large group of individuals based on a subset of the large group.

Basic Vocabulary of
Statistics
VARIABLE
A variable is a characteristic of an item or individual.
DATA
Data are the different values associated with a variable.
OPERATIONAL DEFINITIONS
Data values are meaningless unless their variables have
operational definitions, universally accepted meanings that
are clear to all associated with an analysis.


Basic Vocabulary of
Statistics
POPULATION
A population consists of all the items or individuals about
which you want to draw a conclusion.

SAMPLE
A sample is the portion of a population selected for analysis.
PARAMETER
A parameter is a numerical measure that describes a
characteristic of a population.
STATISTIC
A statistic is a numerical measure that describes a
characteristic of a sample.
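The distinction between a parameter and a statistic can be illustrated with a short simulation. The population below is synthetic: 10,000 weights drawn around 120 pounds with an assumed standard deviation of 15, and the sample size of 100 is likewise an arbitrary choice.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: weights (in pounds) of 10,000 individuals.
population = [random.gauss(120, 15) for _ in range(10_000)]

# Parameter: computed from every member of the population.
population_mean = mean(population)

# Statistic: computed from a sample drawn from that population.
sample = random.sample(population, 100)
sample_mean = mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

In practice the population mean is usually unknown, and the sample mean is used to estimate it.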

Population vs. Sample

Population
Measures used to describe the population are called parameters

Sample
Measures computed from sample data are called statistics

Why Collect Data?

A marketing research analyst needs to assess the effectiveness of a new television advertisement.
A pharmaceutical manufacturer needs to determine whether a new drug is more effective than those currently in use.
An operations manager wants to monitor a manufacturing process to find out whether the quality of the product being manufactured is conforming to company standards.
An auditor wants to review the financial transactions of a company in order to determine whether the company is in compliance with generally accepted accounting principles.

Sources of Data
Primary Sources: The data collector is the one using the data for
analysis
Data from a political survey
Data collected from an experiment
Observed data
Secondary Sources: The person performing data analysis is not the
data collector
Analyzing census data
Examining data from print journals or data published on the
internet.


Sources of data fall into four categories
Data distributed by an organization or an individual
A designed experiment
A survey
An observational study

Types of Variables

Categorical (qualitative) variables have values that can only be placed into categories, such as yes and no.
Numerical (quantitative) variables have values that represent quantities.

Types of Data

Data
Categorical
Examples: Marital Status, Political Party, Eye Color (defined categories)
Numerical
Discrete
Examples: Number of Children, Defects per hour (counted items)
Continuous
Examples: Weight, Voltage (measured characteristics)

Levels of Measurement

A nominal scale classifies data into distinct categories in which no ranking is implied.

Categorical Variable: Categories
Personal Computer Ownership: Yes / No
Type of Stocks Owned: Growth / Value / Other
Internet Provider: Microsoft Network / AOL / Other

Levels of Measurement

An ordinal scale classifies data into distinct categories in which ranking is implied.

Categorical Variable: Ordered Categories
Student class designation: Freshman, Sophomore, Junior, Senior
Product satisfaction: Satisfied, Neutral, Unsatisfied
Faculty rank: Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor's bond ratings: AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student Grades: A, B, C, D, F

Levels of Measurement

An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.

A ratio scale is an ordered scale in which the difference between the measurements is a meaningful quantity and the measurements have a true zero point.
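Temperature is a convenient way to see the difference. The sketch below compares two illustrative readings (10 and 20 degrees Celsius, chosen arbitrarily). Differences survive a change of units in a consistent way, but ratios do not, because the zero of the Celsius and Fahrenheit scales is arbitrary. Kelvin has a true zero, so its ratios are meaningful.

```python
def celsius_to_kelvin(c):
    return c + 273.15


def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


a_c, b_c = 10.0, 20.0  # two illustrative temperature readings in Celsius

# Differences are meaningful on an interval scale: a 10-degree Celsius gap
# always corresponds to the same 18-degree Fahrenheit gap.
diff_c = b_c - a_c
diff_f = celsius_to_fahrenheit(b_c) - celsius_to_fahrenheit(a_c)

# Ratios are not: "twice as hot" depends on where the arbitrary zero sits.
ratio_c = b_c / a_c
ratio_f = celsius_to_fahrenheit(b_c) / celsius_to_fahrenheit(a_c)

# Kelvin has a true zero, so ratios of Kelvin values are meaningful.
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print("difference (C, F):", diff_c, diff_f)
print("ratio (C, F, K):", ratio_c, round(ratio_f, 2), round(ratio_k, 4))
```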


Interval and Ratio Scales


Two Types of
Multivariate Techniques:
1. Dependence
2. Interdependence

Classification of Multivariate Methods:

Multivariate Methods

Dependence Methods
One Dependent Variable
Metric: Multiple Regression and Conjoint
Nonmetric: Discriminant Analysis and Logit
Several Dependent Variables
Metric: MANOVA and Canonical
Nonmetric: Canonical Correlation, Dummy Variables
Multiple Relationships: Structural Equations

Interdependence Methods
Metric: Factor Analysis, Cluster Analysis, Metric MDS
Nonmetric: Nonmetric MDS and Correspondence Analysis

Variate (Y) = X1W1 + X2W2 + . . . + XnWn

Each respondent has a variate value (Y).
The Y value is a linear combination of the entire set of variables that best achieves the statistical objective.

Potential Independent Variables:
X1 = income
X2 = education
X3 = family size
X4 = occupation
X5 = ? ?

Multiple Regression
A metric dependent variable
is predicted by several
metric independent variables.

Multiple Regression
Dependent = # of credit cards
Independent Variables
X1 = income
X2 = education
X3 = family size
X4 = occupation
X5 = ? ?

For example, monthly expenditure on dining out (dependent variable) might be predicted from information regarding a family's income, its size, and the age of the head of household (independent variables).

Example of Multiple Regression Application in Financial Services Industry

Outcome Measures:
1. Customer Satisfaction
2. Likely to Recommend
3. Future Purchases

Bank Selection Factors:
1. Trust
2. Competent Employees
3. Excellent Customer Service
4. Good Financial Services
5. Friendly Employees
6. Interest Rates Paid
7. Convenient Locations
8. Interest Rates Charged
9. Care about Community
10. Open When You Want
11. Innovative Services

Discriminant Analysis
A non-metric (categorical)
dependent variable is predicted by
several metric independent variables.

Discriminant Analysis
Dependent Variable = Credit Risk
(Good Risk vs. Bad Risk)
Independent Variables:
X1 = income
X2 = education
X3 = family size
X4 = occupation
X5 = ? ?
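A minimal sketch of the credit risk example, assuming NumPy is available and using only two of the listed predictors (income and family size). The eight applicants are invented for illustration. The weights are computed with Fisher's linear discriminant for two groups, which is one common way to obtain a discriminant variate; statistical packages may scale the weights differently.

```python
import numpy as np

# Hypothetical applicants: columns are income ($1000s) and family size.
good_risk = np.array([[60.0, 2.0], [75.0, 3.0], [82.0, 2.0], [90.0, 4.0]])
bad_risk = np.array([[25.0, 4.0], [30.0, 5.0], [38.0, 3.0], [42.0, 6.0]])

mean_good = good_risk.mean(axis=0)
mean_bad = bad_risk.mean(axis=0)

# Pooled within-group scatter matrix.
scatter = ((good_risk - mean_good).T @ (good_risk - mean_good)
           + (bad_risk - mean_bad).T @ (bad_risk - mean_bad))

# Fisher's discriminant weights: the direction that best separates the groups.
weights = np.linalg.solve(scatter, mean_good - mean_bad)

# Cut-off halfway between the two group means on the discriminant variate.
cutoff = weights @ (mean_good + mean_bad) / 2


def classify(applicant):
    """Assign an applicant to a group from its discriminant score."""
    score = weights @ np.asarray(applicant, dtype=float)
    return "Good Risk" if score > cutoff else "Bad Risk"


print("weights:", np.round(weights, 3))
print(classify([70.0, 3.0]))
print(classify([28.0, 5.0]))
```

Each applicant's score is a variate value: a weighted sum of the independent variables, compared against the cut-off to predict the categorical dependent variable.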

Example of Discriminant Analysis Application in Consumer Products Industry

Bath Soap

Outcome Measure:
Will Purchase
Will Not Purchase

Product Features:
Pleasant smell.
Skin creme feel.
Lathers well.
Cleans well.
Deodorant.
Rinses off easily.
Moisturizing.
No soap residue in soap dish.
No ring around sink or tub.

Structural Equations Modeling (SEM)

Estimates multiple, interrelated dependence relationships based on two components:
1. Structural Model
2. Measurement Model

STRUCTURAL EQUATIONS MODEL

[Figure: path diagram with the constructs Perceived Social Support, Perceived Self Efficacy, Service Quality Expectations, Experience, and Information Search]

Factor Analysis
. . . . analyzes the structure of the interrelationships among a large number of variables to determine a set of common underlying dimensions (factors).

Cluster Analysis
. . . . groups objects (respondents, products, firms, variables, etc.) so that each object is similar to the other objects in the cluster and different from objects in all the other clusters.

Dependence Technique

The different dependence techniques can be categorized by two characteristics:
(1) the number of dependent variables (i.e. single or multiple), and
(2) the type of measurement scale employed by the variables (i.e. metric or non-metric).

Factor analysis Example

Assume you ask customers to rate a restaurant on the following six variables: food taste, food temperature, freshness, waiting time, cleanliness, and friendliness of employees.
The analyst would like to combine these six variables into a smaller number. By analyzing the customer responses, the analyst might find that the variables food taste, temperature, and freshness combine to form a single factor of food quality, while the variables waiting time, cleanliness, and friendliness of employees combine to form another single factor, service quality.
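The restaurant example can be imitated with simulated ratings, assuming NumPy is available. This is a sketch of the idea rather than a full factor analysis: the data are generated from two made-up latent factors with assumed loadings (0.9 and 0.7), and the number of factors is judged from the eigenvalues of the correlation matrix using the eigenvalue-greater-than-one rule, which is an assumption here and not something the slides state.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical number of customers

# Two hypothetical latent factors that drive the observed ratings.
food_quality = rng.normal(size=n)
service_quality = rng.normal(size=n)


def rating(latent, loading):
    """Observed rating = loading * latent factor + independent noise."""
    noise = rng.normal(size=n) * np.sqrt(1 - loading**2)
    return loading * latent + noise


names = ["taste", "temperature", "freshness",
         "waiting time", "cleanliness", "friendliness"]
data = np.column_stack([
    rating(food_quality, 0.9),
    rating(food_quality, 0.9),
    rating(food_quality, 0.9),
    rating(service_quality, 0.7),
    rating(service_quality, 0.7),
    rating(service_quality, 0.7),
])

# Inter-correlations among the six variables.
corr = np.corrcoef(data, rowvar=False)

# Eigenvalues of the correlation matrix, largest first.
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Eigenvalue-greater-than-one rule (an assumption, not from the slides).
n_factors = int(np.sum(eigenvalues > 1.0))

within_food = corr[0, 1]    # taste vs. temperature
across_groups = corr[0, 3]  # taste vs. waiting time

print("variables:", names)
print("eigenvalues:", np.round(eigenvalues, 2))
print("suggested number of factors:", n_factors)
```

Variables driven by the same latent factor correlate strongly with each other and weakly with the other group, which is the pattern factor analysis summarizes as food quality and service quality.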
