BS PSYCHOLOGY 1-D
Correlation is a statistical technique that can show whether, and how strongly, pairs of variables are related.
The state or relation of being correlated; specifically, a relation existing between mathematical or statistical variables which tend to vary, be associated, or occur together in a way not expected on the basis of chance alone.
In statistics, the degree of association between two random variables. The correlation between the graphs of two data sets is the degree to which they resemble each other.
Examples:
1. Researchers have found a direct correlation between smoking and lung cancer.
2. She says that there's no correlation between being thin and being happy.
Correlation Coefficient
The main result of a correlation is called the correlation coefficient (or "r"). It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more closely the two variables are related. If r is close to 0, it means there is no linear relationship between the variables. If r is positive, it means that as one variable gets larger, the other gets larger. If r is negative, it means that as one gets larger, the other gets smaller (often called an "inverse" correlation).
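To make the coefficient concrete, here is a minimal pure-Python sketch of Pearson's r; the function name and the tiny data sets are made up for illustration only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A perfect direct relationship gives r near +1,
# a perfect inverse relationship gives r near -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # ~ +1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # ~ -1.0
```

Statistical packages compute the same quantity; this sketch just makes the formula visible.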
A correlation report can also show a second result for each test: statistical significance. In this case, the significance level tells you how likely it is that the reported correlation is due to chance in the form of random sampling error. If you are working with small sample sizes, choose a report format that includes the significance level; this format also reports the sample size.
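Survey packages compute significance with standard tests, but the underlying idea can be illustrated with a permutation test: shuffle one variable many times and see how often chance alone produces a correlation as strong as the observed one. This is a sketch of the concept, not the exact method any particular report uses; the data are invented.

```python
import random
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Fraction of random shufflings whose |r| is at least as large as
    the observed |r| -- a rough estimate of the significance level."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y = list(y)                     # copy so the caller's data is untouched
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if abs(pearson_r(x, y)) >= observed:
            hits += 1
    return hits / n_perm

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]        # strongly related apart from small noise
print(permutation_p_value(x, y))    # small value -> unlikely to be chance
```

A small p-value here means random sampling error is an implausible explanation for the observed correlation.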
A key thing to remember when working with correlations is never to assume a correlation means that a change in
one variable causes a change in another. Sales of personal computers and athletic shoes have both risen strongly in
the last several years and there is a high correlation between them, but you cannot assume that buying computers
causes people to buy athletic shoes (or vice versa).
The second caveat is that the Pearson correlation technique works best with linear relationships: as one variable gets
larger, the other gets larger (or smaller) in direct proportion. It does not work well with curvilinear relationships (in
which the relationship does not follow a straight line). An example of a curvilinear relationship is age and health
care. They are related, but the relationship doesn't follow a straight line. Young children and older people both tend
to use much more health care than teenagers or young adults. Multiple regression (also included in the Statistics
Module) can be used to examine curvilinear relationships, but it is beyond the scope of this article.
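The curvilinear caveat is easy to demonstrate numerically: a U-shaped relationship like the age/health-care example can yield r near zero even when one variable is perfectly determined by the other. The data below are hypothetical, chosen only to make the U-shape obvious.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A symmetric U-shaped (curvilinear) relationship: high at both ends,
# low in the middle -- like health-care use across the age range.
age = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]   # centered ages (hypothetical)
usage = [a * a for a in age]                    # perfectly determined by age
print(pearson_r(age, usage))                    # 0.0 -- Pearson r misses it entirely
```

The variables are perfectly related, yet r = 0: exactly the situation where a linear technique gives a misleading answer.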
SOURCES:
http://www.surveysystem.com/correlation.htm
http://www.merriam-webster.com/dictionary/correlation
Regression
A functional relationship between two or more correlated variables that is often empirically determined from data and is used especially to predict values of one variable when given values of the others; also, a function that yields the mean value of a random variable under the condition that one or more independent variables have specified values.
In statistics, regression analysis is a statistical process for estimating the relationships among variables. It
includes many techniques for modeling and analyzing several variables, when the focus is on the relationship
between a dependent variable and one or more independent variables.
Regression analysis is used when you want to predict a continuous dependent variable from a number of
independent variables.
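As a sketch of that idea with a single independent variable, ordinary least squares fits a line to the data and then predicts the dependent variable for new IV values. The function name and data are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x (one IV, one DV)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])   # data lie exactly on y = 1 + 2x
print(b0, b1)         # 1.0 2.0
print(b0 + b1 * 10)   # predicted DV for a new IV value of 10: 21.0
```

Real regression software handles many IVs at once via matrix algebra, but the single-IV case shows the prediction logic.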
Number of cases
When doing regression, the cases-to-independent-variables (IVs) ratio should ideally be 20:1; that is, 20 cases for every IV in the model. The lowest your ratio should be is 5:1 (i.e., 5 cases for every IV in the model).
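These ratios translate directly into a sample-size check; a trivial sketch (function name is made up):

```python
def min_cases(n_ivs, ideal_ratio=20, floor_ratio=5):
    """Recommended and bare-minimum case counts for a regression model."""
    return n_ivs * ideal_ratio, n_ivs * floor_ratio

print(min_cases(3))   # (60, 15): aim for 60 cases, never fewer than 15
```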
Linearity
Regression analysis also has an assumption of linearity. Linearity means that there is a straight line relationship
between the IVs and the DV. This assumption is important because regression analysis only tests for a linear
relationship between the IVs and the DV. Any nonlinear relationship between the IV and DV is ignored. You can test
for linearity between an IV and the DV by looking at a bivariate scatterplot (i.e., a graph with the IV on one axis and
the DV on the other). If the two variables are linearly related, the scatterplot will be oval.
Multicollinearity and singularity
Multicollinearity occurs when the IVs are very highly correlated with one another; singularity occurs when an IV is perfectly predicted by the other IVs. Tolerance, a related concept, is calculated as 1 - SMC (the squared multiple correlation of an IV with the remaining IVs). Tolerance is the proportion of a variable's variance that is not accounted for by the other IVs in the equation. You don't need to worry too much about tolerance, in that most programs will not allow a variable to enter the regression model if its tolerance is too low.
Statistically, you do not want singularity or multicollinearity because calculation of the regression coefficients is
done through matrix inversion. Consequently, if singularity exists, the inversion is impossible, and if
multicollinearity exists the inversion is unstable. Logically, you don't want multicollinearity or singularity because if
they exist, then your IVs are redundant with one another. In such a case, one IV doesn't add any predictive value over
another IV, but you do lose a degree of freedom. As such, having multicollinearity/singularity can weaken your
analysis. In general, you probably wouldn't want to include two IVs that correlate with one another at .70 or greater.
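With only two IVs, the SMC of one IV with the other reduces to r squared, so tolerance = 1 - r^2. The sketch below uses that special case to flag redundant predictors; the function names and data are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def tolerance_two_ivs(iv1, iv2):
    """With exactly two IVs, SMC = r^2, so tolerance = 1 - r^2."""
    r = pearson_r(iv1, iv2)
    return 1 - r ** 2

iv1 = [1, 2, 3, 4, 5]
iv2 = [2, 4, 6, 8, 10]                 # perfectly redundant with iv1
print(tolerance_two_ivs(iv1, iv2))     # near 0 -> singularity: drop one IV
```

A tolerance near zero is exactly the condition under which the matrix inversion mentioned above becomes impossible or unstable.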
SOURCES:
http://www.merriam-webster.com/dictionary/regression
http://en.wikipedia.org/wiki/Regression_analysis
http://dss.princeton.edu/online_help/analysis/regression_intro.htm