Data
Yue Jiao
Screening data
Issues to deal with:
(1) Accuracy
(2) Missing data
(3) Fit between data set and the assumptions
(4) Transformations of variables
(5) Outliers
(6) Perfect or near-perfect correlations
Accuracy
Proofreading: For large data sets, screening
for accuracy involves examining descriptive
statistics and graphic representations of the
variables
Honest correlation: It is important that the
correlations, whether between two continuous
variables or between a dichotomous and
continuous variable, be as accurate as possible
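As a rough illustration of screening for accuracy with descriptive statistics, the sketch below summarizes one variable and lists values outside its plausible range. The ratings, the 1-to-7 scale, and the helper names `describe` and `out_of_range` are assumptions made up for this example.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for one variable."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
    }

def out_of_range(values, low, high):
    """Return (index, value) pairs that fall outside the plausible range."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]

# Hypothetical ratings on a 1-to-7 scale; 44 is a made-up keying error.
ratings = [4, 5, 3, 7, 6, 44, 2, 5, 1, 6]

summary = describe(ratings)
suspects = out_of_range(ratings, low=1, high=7)

print(summary)   # a maximum above 7 signals a problem
print(suspects)  # the cases to check against the original records
```

A minimum or maximum outside the possible range, or a mean and standard deviation that look implausible, is usually the first sign of an entry error.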
Accuracy
Inflated Correlation: When composite variables are
constructed from several individual items by
pooling responses to individual items, correlations
are inflated if some items are reused
Deflated Correlation: A falsely small correlation
between two continuous variables is obtained if the
range of responses to one or both of the variables
is restricted in the sample
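Both effects can be seen in a small simulation. This is only a sketch: the item scores are simulated, and the sample size, the seed, and the helper `pearson_r` are assumptions chosen for the example.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Inflated correlation: six mutually independent simulated items.
items = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(6)]
comp_a = [items[0][i] + items[1][i] + items[2][i] for i in range(n)]
# Composite B reuses item 3 (index 2), which is also part of composite A.
comp_b_reused = [items[2][i] + items[3][i] + items[4][i] for i in range(n)]
# Composite B built without any reused item.
comp_b_clean = [items[3][i] + items[4][i] + items[5][i] for i in range(n)]

r_reused = pearson_r(comp_a, comp_b_reused)
r_clean = pearson_r(comp_a, comp_b_clean)

# Deflated correlation: restrict the range of x to a narrow band.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]
r_full = pearson_r(x, y)
kept = [(xi, yi) for xi, yi in zip(x, y) if -0.5 < xi < 0.5]
r_restricted = pearson_r([k[0] for k in kept], [k[1] for k in kept])

print(f"reused item: r = {r_reused:.2f}, no reuse: r = {r_clean:.2f}")
print(f"full range: r = {r_full:.2f}, restricted: r = {r_restricted:.2f}")
```

The composites correlate only because they share an item, and the restricted sample shows a smaller correlation than the full sample even though the underlying relationship is the same.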
Missing Data
Missing data are characterized as:
MCAR (missing completely at random),
MAR (missing at random, called ignorable nonresponse),
MNAR (missing not at random or nonignorable).
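The practical difference between the mechanisms can be sketched with simulated scores. Only MCAR and MNAR are shown here; MAR would need a second, observed variable that drives the missingness. The distribution, the missingness rates, and the cutoff of 60 are assumptions made up for the example.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
scores = [rng.gauss(50, 10) for _ in range(5000)]

# MCAR: every score has the same 30% chance of being missing.
mcar_observed = [s for s in scores if rng.random() >= 0.30]

# MNAR: missingness depends on the value itself;
# here scores above 60 go missing 80% of the time.
mnar_observed = [s for s in scores if not (s > 60 and rng.random() < 0.80)]

full_mean = statistics.fmean(scores)
mcar_mean = statistics.fmean(mcar_observed)
mnar_mean = statistics.fmean(mnar_observed)

print(f"complete data mean: {full_mean:.1f}")
print(f"observed mean under MCAR: {mcar_mean:.1f}")
print(f"observed mean under MNAR: {mnar_mean:.1f}")
```

Under MCAR the observed cases behave like a random subsample, so the mean stays close to the complete-data mean; under MNAR the observed mean is biased downward in this setup.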
Options for handling missing data:
(1) Delete missing data, if only a few cases have missing
data and they seem to be a random subsample of the
whole sample.
(2) Estimate missing data, using prior knowledge;
inserting mean values; using regression; expectation-maximization; or multiple imputation.
(3) Another option with randomly missing data involves
analysis of a missing data correlation matrix.
(4) Treat missing data as data.
(5) Repeat analyses with and without missing data.
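A minimal sketch of options (1), (2), (4), and (5) from the list above, using mean substitution as the estimation method. The scores are invented, and `None` is assumed to be the missing-value marker.

```python
import statistics

# Hypothetical scores; None marks a missing value.
scores = [12.0, None, 15.0, 11.0, None, 14.0, 13.0]

# Option 1: delete cases with missing data.
observed = [s for s in scores if s is not None]

# Option 2: estimate missing data by inserting the mean of the observed values.
observed_mean = statistics.fmean(observed)
imputed = [observed_mean if s is None else s for s in scores]

# Option 4: treat missingness as data with a 0/1 indicator variable.
missing_flag = [int(s is None) for s in scores]

# Option 5: repeat the analysis with and without the filled-in cases.
sd_deleted = statistics.stdev(observed)
sd_imputed = statistics.stdev(imputed)

print(f"mean (deleted) = {statistics.fmean(observed):.2f}, sd = {sd_deleted:.2f}")
print(f"mean (imputed) = {statistics.fmean(imputed):.2f}, sd = {sd_imputed:.2f}")
print(missing_flag)
```

Mean substitution leaves the mean unchanged but shrinks the standard deviation, which is one reason to compare results across approaches.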
Outlier
An outlier is a case with such an extreme value on one variable (a
univariate outlier) or such a strange combination of scores on two
or more variables (multivariate outlier) that it distorts statistics.
Reasons for outliers:
(1) incorrect data entry.
(2) failure to specify missing-value codes in computer syntax so that
missing-value indicators are read as real data.
(3) the outlier is not a member of the population from which you
intended to sample.
(4) the case is from the intended population, but the distribution for the
variable in the population has more extreme values than a normal distribution.
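One common way to look for univariate outliers is to standardize the scores and flag large absolute z values. The sketch below assumes a cutoff of 3.29 (p < .001, two-tailed), which is a convention and not something stated on the slide; the data are invented, with 999 standing in for a missing-value code that was read as real data (reason 2).

```python
import statistics

def z_scores(values):
    """Standardize values using the sample mean and standard deviation."""
    m = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - m) / sd for v in values]

def flag_univariate_outliers(values, cutoff=3.29):
    """Return (index, value) pairs whose |z| exceeds the assumed cutoff."""
    return [
        (i, v)
        for i, (v, z) in enumerate(zip(values, z_scores(values)))
        if abs(z) > cutoff
    ]

# Hypothetical scores; the last entry is an unspecified missing-value code.
scores = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
          50, 52, 48, 50, 51, 49, 50, 52, 48, 999]

flagged = flag_univariate_outliers(scores)
print(flagged)
```

Multivariate outliers need a different tool, since a case can have an unusual combination of scores without being extreme on any single variable.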
Normality
Two components of normality:
(1) Skewness has to do with the symmetry of the
distribution; a skewed variable is a variable whose
mean is not in the center of the distribution.
(2) Kurtosis has to do with the peakedness and tail weight of a
distribution; a distribution is either too peaked (with
heavy tails) or too flat (with light tails).
When a distribution is normal, the values of
skewness and excess kurtosis (kurtosis minus 3) are zero.
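Both components can be computed from central moments. The sketch below uses the simple moment formulas (third moment over the second moment to the power 1.5, fourth moment over the squared second moment minus 3); statistical packages often apply small-sample corrections, so their values may differ slightly. The function names and data are assumptions for the example.

```python
import statistics

def central_moment(values, k):
    """k-th central moment, dividing by n."""
    m = statistics.fmean(values)
    return sum((v - m) ** k for v in values) / len(values)

def skewness(values):
    """Moment-based skewness; zero for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3; zero for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

The symmetric sample has zero skewness, while the sample with one large value has positive skewness because its long tail lies to the right.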
Linearity
The assumption of linearity is that there is a
straight-line relationship between two variables
(where one or both of the variables can be
combinations of several variables).
Linearity is important in a practical sense because
Pearson's r only captures the linear relationships
among variables; if there are substantial nonlinear
relationships among variables, they are ignored.
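A small made-up example shows how a strong nonlinear relationship can be invisible to Pearson's r. Here y is completely determined by x, yet the correlation is zero. The helper `pearson_r` and the data are assumptions for the illustration.

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

x = [-3, -2, -1, 0, 1, 2, 3]
y_curved = [xi ** 2 for xi in x]      # perfect U-shaped relationship
y_linear = [2 * xi + 1 for xi in x]   # perfect straight-line relationship

r_curved = pearson_r(x, y_curved)
r_linear = pearson_r(x, y_linear)

print(f"curved: r = {r_curved:.2f}, linear: r = {r_linear:.2f}")
```

This is why a scatterplot is worth inspecting before relying on r alone.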
Homoscedasticity
The assumption of homoscedasticity is that the
variability in scores for one continuous variable is
roughly the same at all values of another
continuous variable.
Homoscedasticity is related to the assumption of
normality because when the assumption of
multivariate normality is met, the relationships
between variables are homoscedastic.
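One informal check is to compare the spread of one variable at low and high values of the other. The sketch below simulates both cases; the line y = 2x, the noise levels, the median split, and the helper `spread_ratio` are all assumptions for the illustration, and the residuals are taken from the known simulated line instead of a fitted one.

```python
import random
import statistics

def spread_ratio(x, residuals):
    """SD of residuals in the upper half of x divided by SD in the lower half."""
    cut = statistics.median(x)
    low = [e for xi, e in zip(x, residuals) if xi <= cut]
    high = [e for xi, e in zip(x, residuals) if xi > cut]
    return statistics.stdev(high) / statistics.stdev(low)

rng = random.Random(2)  # fixed seed so the run is repeatable
x = [rng.uniform(1, 10) for _ in range(4000)]

# Homoscedastic: the noise has the same spread at every value of x.
y_homo = [2 * xi + rng.gauss(0, 1.0) for xi in x]
# Heteroscedastic: the noise spread grows with x.
y_hetero = [2 * xi + rng.gauss(0, 0.3 * xi) for xi in x]

# Residuals from the known simulated line y = 2x.
ratio_homo = spread_ratio(x, [yi - 2 * xi for xi, yi in zip(x, y_homo)])
ratio_hetero = spread_ratio(x, [yi - 2 * xi for xi, yi in zip(x, y_hetero)])

print(f"homoscedastic ratio = {ratio_homo:.2f}")
print(f"heteroscedastic ratio = {ratio_hetero:.2f}")
```

A ratio near 1 is consistent with homoscedasticity; a ratio well above or below 1 suggests the variability changes across the range of the other variable.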