
BAIT 6022 - Advanced Modeling Techniques

Course Faculty: Dr. Sridhar Vaithianathan


Reach me @: Office 2
Mobile: 99899 04245
Email: sridhar.v@imthyderabad.edu.in
(or) sridhar_we@yahoo.com
Consultation: Anytime during office hours with prior appointment.
Statistical Modeling - Fundamentals

February 9, 2017 Statistical Modeling: Dr.SVN 2


Presentation Outline
Introduction to Statistical Modeling (SM)
First things first
Best Fit Model
The principle of parsimony (Occam's razor)
Types of Statistical Model
Steps involved in model simplification
Introduction to Statistical Modeling (SM)
The hardest part of any statistical work is getting started.
And one of the hardest things about getting started is choosing the right kind of statistical analysis.
The choice of statistical analysis depends on the question you are trying to answer.
The key is to identify the response variable and the explanatory variables.
The response variable is the variable whose variation we are trying to understand.



VARIABLES
Which variables to include: those that actually measure what they purport to measure.
A continuous measurement is a variable, such as height or weight, that
can take any real-numbered value.
A categorical variable is a factor with two or more levels: sex is a
factor with two levels (male and female), and colour might be a factor
with seven levels (red, orange, yellow, green, blue, indigo, violet).
It is essential, therefore, that you can answer
the following questions:
Which of your variables is the response variable?
Which are the explanatory variables?
Are the explanatory variables continuous or categorical, or a mixture
of both?
What kind of response variable do you have: is it a continuous
measurement, a count, a proportion, a time at death, or a category?
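The questions above can be checked directly in R. The following is a minimal sketch on a made-up data frame; the column names `weight`, `sex` and `count` are assumptions for illustration, not data from the course.

```r
# Hypothetical data: 'weight' is a continuous measurement, 'sex' is a
# categorical variable (a factor with two levels), 'count' is a count.
dat <- data.frame(
  weight = c(61.2, 74.5, 58.9, 80.1),
  sex    = factor(c("female", "male", "female", "male")),
  count  = c(3L, 0L, 5L, 2L)
)

str(dat)                              # storage type of every column
is_factor <- sapply(dat, is.factor)   # TRUE marks categorical variables
is_factor
nlevels(dat$sex)                      # number of levels of the factor
```

Knowing which columns are factors and which are numeric is what determines the choice among regression, ANOVA and ANCOVA.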



Selecting the appropriate statistical method:
The explanatory variables
(a) All explanatory variables continuous: Regression
(b) All explanatory variables categorical: Analysis of variance (ANOVA)
(c) Explanatory variables both continuous and categorical: Analysis of covariance (ANCOVA)
The response variable
(a) Continuous: Normal regression, ANOVA or ANCOVA
(b) Proportion: Logistic regression
(c) Count: Log-linear models
(d) Binary: Binary logistic analysis
(e) Time at death: Survival analysis
The object is to determine the values of the
parameters in a specific model that lead to the
best fit of the model to the data.
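As a rough illustration of how these choices map onto base R calls, here is a sketch on simulated data. The variable names (`x`, `g`, `y`, `count`, `binary`) and the simulated effects are assumptions made for the example. Survival analysis is left out because it needs an add-on package.

```r
set.seed(1)
n <- 60
dat <- data.frame(
  x = runif(n, 0, 10),                               # continuous explanatory
  g = factor(rep(c("a", "b", "c"), each = n / 3))    # categorical explanatory
)
dat$y      <- 2 + 0.5 * dat$x + rnorm(n)             # continuous response
dat$count  <- rpois(n, lambda = exp(0.1 * dat$x))    # count response
dat$binary <- rbinom(n, size = 1, prob = plogis(-2 + 0.4 * dat$x))

m_reg    <- lm(y ~ x, data = dat)        # all continuous: regression
m_anova  <- aov(y ~ g, data = dat)       # all categorical: ANOVA
m_ancova <- lm(y ~ x + g, data = dat)    # mixture: ANCOVA
m_count  <- glm(count ~ x, family = poisson, data = dat)    # log-linear model
m_binary <- glm(binary ~ x, family = binomial, data = dat)  # binary logistic

coef(m_ancova)   # parameter values that give the best fit
```

In each case the fitting function estimates the parameter values of the chosen model; only the model family and formula change.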



First things first

The commonest mistake is to try to do the statistical modelling straight away.

The best thing is to spend a substantial amount of time, right at the outset, getting to know your data and what they show.

This will guide your thinking as to exactly what kind of statistical modelling is most appropriate.



Initial Steps
In a spreadsheet, make sure the dataframe is correct in structure
and content:
Do all of the values of each variable appear in the same column?
Are all the zeros really 0, or should they be NA?
Does every row contain the same number of entries?
Are there any variable names with blank spaces in them?
Read the dataframe into R using read.table (or read.csv if factor
levels, such as place names, contain blank spaces) (p. 139).
Look at the head and the tail of the dataframe and check for
mistakes (p. 161).
Plot every one of the variables on its own to check for gross errors
(plot(x), plot(y) etc.; see p. 190).
Look at the relationships between variables (use tapply, plot, tree
and gam) (p. 768).
Think about model choice (p. 1)
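The first few checks in this list might look like the sketch below. The file contents, column names (`site`, `treatment`, `yield`) and values are invented so that the example is self-contained. In practice the file would be exported from your spreadsheet.

```r
# Write a small hypothetical data set to a temporary file; note the
# blank spaces in the place names, the zero, and the NA.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "site,treatment,yield",
  "North Field,control,4.2",
  "North Field,fertilized,5.9",
  "South Field,control,0",
  "South Field,fertilized,NA"
), csv_path)

dat <- read.csv(csv_path, stringsAsFactors = TRUE)

head(dat)                            # first rows
tail(dat)                            # last rows
str(dat)                             # one column per variable, correct types?
colSums(is.na(dat))                  # missing values per column
sum(dat$yield == 0, na.rm = TRUE)    # zeros: really 0, or should they be NA?

# Relationship between the response and a factor
means <- tapply(dat$yield, dat$treatment, mean, na.rm = TRUE)
means
```

Plotting each variable on its own, for example with plot(dat$yield), would follow the same pattern in an interactive session.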



Initial Steps (continued)

Which explanatory variables should be included?
What transformation of the response is most appropriate?
Which interactions should be included?
Which non-linear terms should be included?
Is there pseudo-replication, and if so, how should it be dealt with?
Should the explanatory variables be transformed?
Try to use the simplest kind of analysis that is appropriate to your data
and the question you are trying to answer (e.g. do a one-way ANOVA
rather than a mixed-effects model) (p. 344).
Fit a maximal model and simplify it by stepwise deletion (p. 391).
Check the minimal adequate model for constancy of variance and
normality of errors using plot(model) (p. 405).
Emphasize the effect sizes and standard errors (summary.lm), and play
down the analysis of deviance table (summary.aov) (p. 382).
Document carefully what you have done, and explain all the steps you
took. That way, you should be able to understand what you did and why
you did it when you return to the analysis in 6 months' time.
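A minimal sketch of fitting a maximal model and simplifying it by deletion is given below, using simulated data. The variables `x`, `g` and `y` and the simulated effects are assumptions chosen for illustration, with only one deletion step shown.

```r
set.seed(42)
n <- 80
dat <- data.frame(
  x = runif(n, 0, 10),
  g = factor(rep(c("a", "b"), each = n / 2))
)
# Simulated truth: a slope on x and a shift for group b, no interaction.
dat$y <- 1 + 0.8 * dat$x + 2 * (dat$g == "b") + rnorm(n)

maximal <- lm(y ~ x * g, data = dat)        # main effects plus x:g
reduced <- update(maximal, . ~ . - x:g)     # delete the interaction term

anova(reduced, maximal)   # F test: does deleting x:g lose explanatory power?
summary(reduced)          # effect sizes and standard errors

# In an interactive session, plot(reduced) draws the diagnostic plots used
# to check constancy of variance and normality of errors.
```

If the deletion test is not significant, the simpler model is kept and the next candidate term is tested in the same way.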



Best Fit Model
What, exactly, do we mean when we say that the parameter values
should afford the best fit of the model to the data?

The convention we adopt is that our techniques should lead to unbiased, variance-minimizing estimators.

We define best in terms of maximum likelihood.

This notion may be unfamiliar, so it is worth investing some time to get a feel for it. This is how it works:
given the data,
and given our choice of model,
what values of the parameters of that model
make the observed data most likely?
We judge the model on the basis of how likely the data would be if the
model were correct.
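To get a feel for the idea, here is a small sketch that assumes a Poisson model for a made-up vector of counts `y`. It searches numerically for the mean that makes the observed data most likely and compares it with the sample mean, which is the known analytical answer for this model.

```r
y <- c(2, 5, 3, 6, 4, 3, 7, 2)   # hypothetical counts

# Log-likelihood of the observed data for a candidate Poisson mean.
loglik <- function(lambda) sum(dpois(y, lambda, log = TRUE))

# Search for the parameter value that maximizes the log-likelihood.
fit <- optimize(loglik, interval = c(0.01, 20), maximum = TRUE)

fit$maximum   # numerical maximum likelihood estimate
mean(y)       # analytical maximum likelihood estimate for a Poisson mean
```

The two values agree up to the tolerance of the numerical search, which illustrates what "best fit" means under maximum likelihood.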



The principle of parsimony
(Occam's razor)
Given a set of equally good explanations for a given phenomenon,
the correct explanation is the simplest explanation.
- William of Occam, 14th Century.
For statistical modelling, the principle of parsimony means that:
models should have as few parameters as possible;
linear models should be preferred to non-linear models;
experiments relying on few assumptions should be preferred to those
relying on many;
models should be pared down until they are minimal adequate;
simple explanations should be preferred to complex explanations.
The process of model simplification is an integral part of hypothesis testing
in R. In general, a variable is retained in the model only if it causes a
significant increase in deviance when it is removed from the current model.

"A model should be as simple as possible." - Einstein.


However, we must be careful not to throw the baby out with the
bathwater.
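The deletion test described above, where a variable is kept only if removing it causes a significant increase in deviance, can be sketched as follows. The data are simulated, and the names `x1`, `x2` and `y` are assumptions for the example.

```r
set.seed(7)
n <- 100
dat <- data.frame(x1 = runif(n), x2 = runif(n))
# Simulated truth: the counts depend on x1 only.
dat$y <- rpois(n, lambda = exp(0.5 + 1.5 * dat$x1))

current    <- glm(y ~ x1 + x2, family = poisson, data = dat)
without_x2 <- update(current, . ~ . - x2)

# Increase in deviance caused by removing x2, with a chi-squared test.
deviance(without_x2) - deviance(current)
anova(without_x2, current, test = "Chisq")
```

A non-significant increase in deviance suggests that x2 can be dropped; a significant one means it should stay in the model.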



Types of statistical model
Fitting models to data is the central function of R.

The process is essentially one of exploration.

There are no fixed rules and no absolutes.

The object is to determine a minimal adequate model from the large set of potential models that might be used to describe the given set of data.
Figure: TAM Model, Measurement Model (CFA)

Sridhar Vaithianathan, IMT Hyderabad
Figure: TAM Model, Structural Model (Hypotheses Testing)
Types of statistical model (continued)

Parsimony says that, other things being equal, we prefer:
a model with n - 1 parameters to a model with n parameters;
a model with k - 1 explanatory variables to a model with k explanatory variables;
a linear model to a model which is curved;
a model without a hump to a model with a hump;
a model without interactions to a model containing interactions between factors.
All the above are subject, of course, to the
caveats that the simplifications make good
scientific sense and do not lead to significant
reductions in explanatory power.
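One way to check whether the simpler model loses explanatory power is to compare it directly with the more complex one. The sketch below compares a straight line with a curved (quadratic) model on simulated data; the variables and the simulated straight-line truth are assumptions for illustration.

```r
set.seed(3)
x <- seq(1, 20, length.out = 50)
y <- 3 + 0.7 * x + rnorm(50)      # simulated truth is a straight line

linear <- lm(y ~ x)               # n - 1 parameters
curved <- lm(y ~ x + I(x^2))      # n parameters: adds a quadratic term

anova(linear, curved)   # F test for the extra parameter
AIC(linear, curved)     # lower AIC is preferred
```

If the quadratic term does not significantly improve the fit, parsimony says to keep the linear model.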



Scale of Measurement
Just as there is no perfect model, so there may be no optimal scale of measurement for a model.

Suppose, for example, we had a process that had Poisson errors with multiplicative effects amongst the explanatory variables.
Then we must choose between three different scales, each of
which optimizes one of three different properties:
the scale of sqrt(y) would give constancy of variance;
the scale of y^(2/3) would give approximately normal errors;
the scale of ln(y) would give additivity.

Thus, any measurement scale is always going to be a compromise, and we should choose the scale that gives the best overall performance of the model.
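The variance-stabilizing effect of the square-root scale can be seen in a short simulation. This is a sketch under assumed values: the three Poisson means and the sample size are arbitrary choices for illustration.

```r
set.seed(11)
means <- c(5, 20, 80)   # hypothetical Poisson means
sims  <- lapply(means, function(m) rpois(10000, lambda = m))

var_raw  <- sapply(sims, var)                         # grows with the mean
var_sqrt <- sapply(sims, function(y) var(sqrt(y)))    # roughly constant

var_raw
var_sqrt
```

On the raw scale the variance rises roughly in proportion to the mean, while on the square-root scale it stays close to a quarter regardless of the mean, which is the sense in which that scale gives constancy of variance.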



Steps involved in model simplification



SUMMARY
Choose the right kind of statistical analysis.
Spend a substantial amount of time, right at the outset, getting to know your data and what they show.
Maximum likelihood: the best fit of the model to the data.
Select appropriate variables, following the principle of parsimony.
Determine a minimal adequate model.
Choose the scale that gives the best overall performance of the model.
Follow the model simplification steps.



Next Week

Model Criticism
Model Checking
Summary of Statistical Models in R
Misspecified Model
Model Checking - Hands on in R
Statistical Modeling Summary
