BS PSYCHOLOGY 1-D
Correlation is a statistical technique that can show whether, and how strongly, pairs of variables are related.
The state or relation of being correlated; specifically, a relation existing between mathematical or statistical variables which tend to vary, be associated, or occur together in a way not expected on the basis of chance alone.
In statistics, the degree of association between two random variables. The correlation between the graphs of two data sets is the degree to which they resemble each other.
Examples:
1. Researchers have found a direct correlation between smoking and lung cancer.
2. She says that there's no correlation between being thin and being happy.
Correlation Coefficient
The main result of a correlation is called the correlation coefficient (or "r"). It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more closely the two variables are related. If r is close to 0, it means there is no linear relationship between the variables. If r is positive, it means that as one variable gets larger, the other gets larger. If r is negative, it means that as one gets larger, the other gets smaller (often called an "inverse" correlation).
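To make the coefficient concrete, here is a minimal pure-Python sketch of Pearson's r; the function name and the tiny data sets are made up for illustration only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A perfect direct relationship gives r near +1,
# a perfect inverse relationship gives r near -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # ~ +1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # ~ -1.0
```

Statistical packages compute the same quantity; this sketch just makes the formula visible.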
A correlation report can also show a second result for each test: statistical significance. In this case, the significance level tells you how likely it is that the reported correlation is due to chance in the form of random sampling error. If you are working with small sample sizes, choose a report format that includes the significance level; this format also reports the sample size.
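Survey packages compute significance with standard tests, but the underlying idea can be illustrated with a permutation test: shuffle one variable many times and see how often chance alone produces a correlation as strong as the observed one. This is a sketch of the concept, not the exact method any particular report uses; the data are invented.

```python
import random
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Fraction of random shufflings whose |r| is at least as large as
    the observed |r| -- a rough estimate of the significance level."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y = list(y)                     # copy so the caller's data is untouched
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if abs(pearson_r(x, y)) >= observed:
            hits += 1
    return hits / n_perm

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]        # strongly related apart from small noise
print(permutation_p_value(x, y))    # small value -> unlikely to be chance
```

A small p-value here means random sampling error is an implausible explanation for the observed correlation.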
A key thing to remember when working with correlations is never to assume a correlation means that a change in
one variable causes a change in another. Sales of personal computers and athletic shoes have both risen strongly in
the last several years and there is a high correlation between them, but you cannot assume that buying computers
causes people to buy athletic shoes (or vice versa).
The second caveat is that the Pearson correlation technique works best with linear relationships: as one variable gets
larger, the other gets larger (or smaller) in direct proportion. It does not work well with curvilinear relationships (in
which the relationship does not follow a straight line). An example of a curvilinear relationship is age and health
care. They are related, but the relationship doesn't follow a straight line. Young children and older people both tend
to use much more health care than teenagers or young adults. Multiple regression (also included in the Statistics
Module) can be used to examine curvilinear relationships, but it is beyond the scope of this article.
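The curvilinear caveat is easy to demonstrate numerically: a U-shaped relationship like the age/health-care example can yield r near zero even when one variable is perfectly determined by the other. The data below are hypothetical, chosen only to make the U-shape obvious.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A symmetric U-shaped (curvilinear) relationship: high at both ends,
# low in the middle -- like health-care use across the age range.
age = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]   # centered ages (hypothetical)
usage = [a * a for a in age]                    # perfectly determined by age
print(pearson_r(age, usage))                    # 0.0 -- Pearson r misses it entirely
```

The variables are perfectly related, yet r = 0: exactly the situation where a linear technique gives a misleading answer.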
SOURCES:
http://www.surveysystem.com/correlation.htm
http://www.merriam-webster.com/dictionary/correlation
Regression
A functional relationship between two or more correlated variables that is often empirically determined from data and is used especially to predict values of one variable when given values of the others; also, a function that yields the mean value of a random variable under the condition that one or more independent variables have specified values.
In statistics, regression analysis is a statistical process for estimating the relationships among variables. It
includes many techniques for modeling and analyzing several variables, when the focus is on the relationship
between a dependent variable and one or more independent variables.
Regression analysis is used when you want to predict a continuous dependent variable from a number of
independent variables.
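As a sketch of that idea with a single independent variable, ordinary least squares fits a line to the data and then predicts the dependent variable for new IV values. The function name and data are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x (one IV, one DV)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])   # data lie exactly on y = 1 + 2x
print(b0, b1)         # 1.0 2.0
print(b0 + b1 * 10)   # predicted DV for a new IV value of 10: 21.0
```

Real regression software handles many IVs at once via matrix algebra, but the single-IV case shows the prediction logic.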
Number of cases
When doing regression, the cases-to-independent-variables (IVs) ratio should ideally be 20:1; that is, 20 cases for every IV in the model. The lowest your ratio should be is 5:1 (i.e., 5 cases for every IV in the model).
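These ratios translate directly into a sample-size check; a trivial sketch (function name is made up):

```python
def min_cases(n_ivs, ideal_ratio=20, floor_ratio=5):
    """Recommended and bare-minimum case counts for a regression model."""
    return n_ivs * ideal_ratio, n_ivs * floor_ratio

print(min_cases(3))   # (60, 15): aim for 60 cases, never fewer than 15
```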
Linearity
Regression analysis also has an assumption of linearity. Linearity means that there is a straight line relationship
between the IVs and the DV. This assumption is important because regression analysis only tests for a linear
relationship between the IVs and the DV. Any nonlinear relationship between the IV and DV is ignored. You can test
for linearity between an IV and the DV by looking at a bivariate scatterplot (i.e., a graph with the IV on one axis and
the DV on the other). If the two variables are linearly related, the scatterplot will be oval.
Multicollinearity and singularity
Multicollinearity occurs when the IVs are very highly correlated with one another; singularity occurs when an IV is perfectly predicted by the other IVs. Tolerance, a related concept, is calculated as 1 - SMC (the squared multiple correlation of an IV with the remaining IVs). Tolerance is the proportion of a variable's variance that is not accounted for by the other IVs in the equation. You don't need to worry too much about tolerance, in that most programs will not allow a variable to enter the regression model if its tolerance is too low.
Statistically, you do not want singularity or multicollinearity because calculation of the regression coefficients is
done through matrix inversion. Consequently, if singularity exists, the inversion is impossible, and if
multicollinearity exists the inversion is unstable. Logically, you don't want multicollinearity or singularity because if
they exist, then your IVs are redundant with one another. In such a case, one IV doesn't add any predictive value over
another IV, but you do lose a degree of freedom. As such, having multicollinearity/singularity can weaken your
analysis. In general, you probably wouldn't want to include two IVs that correlate with one another at .70 or greater.
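With only two IVs, the SMC of one IV with the other reduces to r squared, so tolerance = 1 - r^2. The sketch below uses that special case to flag redundant predictors; the function names and data are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def tolerance_two_ivs(iv1, iv2):
    """With exactly two IVs, SMC = r^2, so tolerance = 1 - r^2."""
    r = pearson_r(iv1, iv2)
    return 1 - r ** 2

iv1 = [1, 2, 3, 4, 5]
iv2 = [2, 4, 6, 8, 10]                 # perfectly redundant with iv1
print(tolerance_two_ivs(iv1, iv2))     # near 0 -> singularity: drop one IV
```

A tolerance near zero is exactly the condition under which the matrix inversion mentioned above becomes impossible or unstable.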
SOURCES:
http://www.merriam-webster.com/dictionary/regression
http://en.wikipedia.org/wiki/Regression_analysis
http://dss.princeton.edu/online_help/analysis/regression_intro.htm