Uploaded by Zeinab Goda

class notes about correlation and regression


with data. This definition stresses the view that statistics is a tool concerned with the collection, organization and analysis of numerical facts and observations. The major concern of descriptive statistics is to present information in a convenient, usable, and understandable form.

TYPES OF STATISTICAL TECHNIQUE AND THEIR PURPOSE

Univariate
  Statistical techniques: Frequency distribution, measures of central tendency, standard deviation

Bivariate
  Statistical techniques: Correlation, percentage table, chi-square
  Purpose: Describe a relationship or the association between two variables

Multivariate
  Statistical techniques: Elaboration paradigm, linear and multiple regression
  Purpose: Describe relationships among several variables, or see how several independent variables have an effect on a dependent variable

Descriptive Research Questions are not answered with inferential statistics. They merely describe or summarize data, without trying to generalize to a larger population of individuals. Examples: mean, percentage, standard deviation (SD), mode, median, etc.

INFERENTIAL STATISTICS rely on principles from probability sampling, whereby a researcher uses a random process to select cases from the entire population. Inferential statistics are a precise way to talk about how confident a researcher can be when inferring from the results in a sample to the population.

Associational Research Questions are those in which two or more variables are associated or related. This approach usually involves an attempt to see how two or more variables covary (as one grows larger, the other grows larger or smaller) or how one or more variables enable one to predict another variable. Examples: Pearson correlation, Spearman correlation, eta correlation, etc.

Difference Research Questions: For these questions, we compare scores (on the dependent variable) of two or more different groups, each of which is composed of individuals with one of the values or levels on the independent variable. This type of question attempts to demonstrate that groups are not the same on the dependent variable. Examples: t-test, ANOVA, ANCOVA, MANOVA, MANCOVA, etc.

CORRELATION

The correlation is one of the most common and most useful statistics.

Definition - A correlation is a single number that describes the degree of relationship (dependence) between two variables. It characterizes the existence of a relationship between variables. Relationship between 2 variables can vary from strong to weak. More accurately, correlation is the co-variation of standardized variables.

However, correlation does not imply causation: a strong positive or strong negative correlation between two variables does not mean that one variable is caused by the other. Many statisticians claim that a strong correlation never implies a cause-effect relationship between two variables.

GENERALLY

Two variables may be related to each other in three possible ways:

Positive relationship: Both variables vary in the same direction; as one goes up, the other goes up. E.g. salary and years of education are positively correlated because people who get the highest salaries tend to be the ones who have gone to school the longest.

Negative relationship: The two variables vary in opposite directions; as one goes up, the other goes down. E.g. the number of problems faced and the amount of immunoglobulin A in a person's system are negatively correlated because as the number of problems goes up, the amount of immunoglobulin A tends to go down.

Zero relationship: The two variables have no relationship with each other; one changes without affecting the other. E.g. the average speed of a car being driven and the average speed of a mouse. Likewise, personality fluctuations and the movement of distant stars have zero correlation.

The degree of correlation between two variables can be established using two methods:

Scatter plot: a graph with plotted values for the two variables being compared.
Correlation coefficient methods.

SCATTER PLOTS

Example of positive correlation - Cardiovascular fitness score and months machine owned

Example of negative correlation - Hours of exercise per week and months of machine owned

Researchers laid out 10 circular plots, each 4 meters in diameter, in an area where beavers were cutting down cottonwood trees. The number of stumps and the number of clusters of beetle larvae were recorded in each plot with the following results:

Stumps          2   2   1   3   4   1   5   3   1   2
Beetle larvae  10  30  12  24  40  11  56  40   8  14

The scatter plot for the previous data:

From the scatter plot, there appears to be a fairly strong positive association between the number of cottonwood stumps and the number of clusters of beetle larvae.

CORRELATION COEFFICIENT


The correlation coefficient is used to measure the degree of correlation between variables. It is a quantitative indicator. There are several types of correlation coefficient, depending on the type of relationship.

The most common is Pearson's correlation coefficient (denoted by r), which is sensitive only to a linear relationship between two variables.

Other common correlation coefficients include Spearman's rank correlation coefficient (denoted by ρ) and Kendall's rank correlation coefficient (denoted by τ).


A correlation coefficient is a calculated number that indicates the degree of correlation between two variables. A perfect positive correlation has a value of +1 (or 100%), a perfect negative correlation has a value of -1, and a value of 0 indicates no linear relationship.


TABLE 1.0 Interpreting a Correlation Coefficient

Size of the correlation coefficient    General interpretation
0.8 to 1.0                             Very strong relationship
0.6 to 0.8                             Strong relationship
0.4 to 0.6                             Moderate relationship
0.2 to 0.4                             Weak relationship
0.0 to 0.2                             Weak or no relationship
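The lookup in Table 1.0 can be sketched as a small function. This is only an illustration: the function name `interpret_correlation` is hypothetical, and it assumes that the table is read on the magnitude of the coefficient and that a value sitting on a shared boundary (such as 0.8) takes the stronger label, since the table itself does not say.

```python
def interpret_correlation(r: float) -> str:
    """Map a correlation coefficient to the verbal labels of Table 1.0."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)  # assumption: the table is applied to the magnitude of r
    # Assumption: a value on a shared boundary takes the stronger label.
    if size >= 0.8:
        return "Very strong relationship"
    if size >= 0.6:
        return "Strong relationship"
    if size >= 0.4:
        return "Moderate relationship"
    if size >= 0.2:
        return "Weak relationship"
    return "Weak or no relationship"


print(interpret_correlation(0.7))    # Strong relationship
print(interpret_correlation(-0.45))  # Moderate relationship
```

The sign of r is ignored here because it describes the direction of the relationship, not its strength.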


A more precise way to interpret the correlation coefficient is to compute the coefficient of determination. The coefficient of determination is the percentage of variance in one variable that is accounted for by the variance in the other variable. Coefficient of determination = square of the correlation coefficient (r²).

Example: If the correlation between GPA and the number of hours of study is 0.7, then the coefficient of determination is _______. This means _______% of the variance in GPA can be explained by the variance in studying time. The stronger the correlation, the more the variance can be explained. However, this means that _______ % cannot be explained. The amount of unexplained variance is called the coefficient of alienation (or coefficient of non-determination).
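As a quick check of the arithmetic in this example, the explained and unexplained shares of variance can be computed directly from r. The helper name `variance_explained` is an illustrative assumption, not part of the notes.

```python
def variance_explained(r: float) -> tuple:
    """Return (explained, unexplained) shares of variance for a correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    explained = r * r             # coefficient of determination
    return explained, 1.0 - explained


explained, unexplained = variance_explained(0.7)
print(f"explained:   {explained:.0%}")    # explained:   49%
print(f"unexplained: {unexplained:.0%}")  # unexplained: 51%
```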

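If we have a series of n measurements of X and Y written as $x_i$ and $y_i$, where i = 1, 2, ..., n, then the sample correlation coefficient can be used to estimate the population Pearson correlation between X and Y. The sample correlation coefficient is written as:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y} $$

where $\bar{x}$ and $\bar{y}$ are the sample means of X and Y, and $s_x$ and $s_y$ are the sample standard deviations of X and Y. This can also be written as:

$$ r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\; \sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}} $$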
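A minimal sketch of the definition in terms of sample means and sample standard deviations is given here, applied to the cottonwood stump and beetle larvae counts from the scatter plot section. The function name `pearson_r` is an assumption made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation from means and sample standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations use the divisor n - 1.
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return co_deviation / ((n - 1) * s_x * s_y)


stumps = [2, 2, 1, 3, 4, 1, 5, 3, 1, 2]
larvae = [10, 30, 12, 24, 40, 11, 56, 40, 8, 14]
print(round(pearson_r(stumps, larvae), 3))  # 0.916
```

A value near 0.92 agrees with the visual impression of a fairly strong positive association in that scatter plot.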

Is there a linear relationship between the age at which a child first begins to speak and his or her mental ability later on? To answer this question, a study was conducted in which the age (in months) at which a child first spoke and the child's score on an aptitude test as a teenager were recorded:

Age     15   26   10    9   15   20   18   11    8   20
Score   95   71   83   91  102   87   93  100  104   94

Draw a scatter plot and determine whether there appears to be a linear relationship between these two variables. If so, describe the relationship, calculate r, and determine what percentage of the variability in the aptitude score can be explained by the variability in the age at which a child begins speaking.

The scatter plot for the data:

There appears to be a moderate negative association between the age at which a baby first begins to speak and mental ability later in life.

Calculation of the correlation coefficient:

The variability in the age at which a child first speaks explains only about 36% (r² = 0.36) of the variability in aptitude test scores later in life.
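The calculation behind this figure can be retraced with the sums-based form of the correlation formula. The sketch below assumes the ages and scores pair up in the order in which they are listed in the exercise; the variable names are illustrative.

```python
import math

age = [15, 26, 10, 9, 15, 20, 18, 11, 8, 20]           # months
score = [95, 71, 83, 91, 102, 87, 93, 100, 104, 94]    # aptitude test

n = len(age)
sum_x, sum_y = sum(age), sum(score)                    # 152 and 920
sum_xy = sum(x * y for x, y in zip(age, score))        # 13676
sum_x2 = sum(x * x for x in age)                       # 2616
sum_y2 = sum(y * y for y in score)                     # 85510

numerator = n * sum_xy - sum_x * sum_y                 # -3080
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)  # 3056 * 8700
)
r = numerator / denominator

print(round(r, 2))      # -0.6
print(round(r * r, 2))  # 0.36
```

The negative sign matches the moderate negative association seen in the scatter plot, and squaring r gives the 36% quoted above.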

Exercise

Compute the correlation between the men's heights (in cm) and weights (in kg) for the following data:

Man           A     B     C     D     E
Height (X)   182   167   175   182   180
Weight (Y)    86    61    70    75    70

<0.2
0.7 to 0.9: high correlation; marked relationship
>0.9: very high correlation; very dependable relationship

Words of Caution

Examine your data distribution (e.g. using a scatter plot) before you do anything with the correlation, and make sure you know the dos and don'ts of the correlation coefficient!

The correlation coefficient is just an index of relationship; it tells nothing about the cause and effect of the relationship!

Limit yourself to linear relationships if you don't have an adequate statistical background!

REGRESSION

Regression Analysis

In statistics, regression analysis is a statistical technique for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables.

More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. All regression analyses test whether a significant quantitative relationship exists.

Linear Regression: Line of Best Fit and Regression Equation

Suppose we are asked to investigate the relationship between two variables, namely variable P (the independent variable) and variable Q (the dependent variable):

Pair         Pair 1   Pair 2   Pair 3   Pair 4
Variable P     10       20       30       40
Variable Q      7       12       17       22

What would be the predicted value of Q if P = 15? If P = 25? How do you predict these?

[Figure: scatter plot of the Q variable (vertical axis) against the P variable (horizontal axis), with points labelled Pair 1 to Pair 4]

Notice that if we connect these points, we would get a straight line. This line fits ALL the observed points. This straight line is called the line of best fit or regression line.

The line of best fit defines a basis for predicting values of Q, given values of P (and vice versa).
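For the four P and Q pairs, the line through the points can be written down and used for prediction in both directions. This is a minimal sketch: the helper names `predict_q` and `predict_p` are hypothetical, and taking the slope from the first two points only works because these particular points lie exactly on one straight line.

```python
pairs = [(10, 7), (20, 12), (30, 17), (40, 22)]  # (P, Q) values from the table

# Slope and intercept taken from the first two points.
(p1, q1), (p2, q2) = pairs[0], pairs[1]
slope = (q2 - q1) / (p2 - p1)    # 0.5
intercept = q1 - slope * p1      # 2.0

# Check that the same line passes through every observed pair.
assert all(abs(intercept + slope * p - q) < 1e-12 for p, q in pairs)


def predict_q(p):
    """Predict Q from P using Q = intercept + slope * P."""
    return intercept + slope * p


def predict_p(q):
    """Predict P from Q by inverting the same line."""
    return (q - intercept) / slope


print(predict_q(15))  # 9.5
print(predict_q(25))  # 14.5
print(predict_p(17))  # 30.0
```

With real data the points rarely fall exactly on a line, which is why the least squares method discussed later is needed to choose the line.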

The concept of the line of best fit can be extended to form a basis for linear regression as well as non-linear regression.

[Figures: Linear Regression; Non-Linear Regression]

Regression Models

Regression models involve the following variables:

The unknown parameters, denoted as β, which may represent a scalar or a vector.
The independent variables, X.
The dependent variable, Y.

Regression models can predict a value of the Y variable given values of the X variables. Prediction within the range of values in the dataset used for model-fitting is known informally as interpolation. Prediction outside this range of the data is known as extrapolation.
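The distinction between interpolation and extrapolation can be sketched for a single independent variable as a simple range check. The function name `prediction_kind` is hypothetical, and with several independent variables the notion of "within the range of the data" is less clear-cut than this one-dimensional check suggests.

```python
def prediction_kind(x_new, x_observed):
    """Classify a prediction point relative to the range of the fitted data."""
    lowest, highest = min(x_observed), max(x_observed)
    if lowest <= x_new <= highest:
        return "interpolation"
    return "extrapolation"


p_values = [10, 20, 30, 40]  # P values from the line-of-best-fit example
print(prediction_kind(25, p_values))  # interpolation
print(prediction_kind(55, p_values))  # extrapolation
```

Extrapolated predictions deserve more caution, because nothing in the data confirms that the fitted relationship continues beyond the observed range.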

Linear Regression

In linear regression, data is modeled using linear predictor functions, and unknown model parameters are estimated from the data.

Such models are called linear models. Most commonly, linear regression refers to a model in which the conditional mean of Y given the value of X is an affine function of X. Less commonly, linear regression could refer to a model in which the median, or some other quantile of the conditional distribution of Y given X is expressed as a linear function of X.

Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of Y given X, rather than on the joint probability distribution of Y and X, which is the domain of multivariate analysis.

Non-Linear Regression

In non-linear regression, data are modeled by a function which is a non-linear combination of the model parameters and depends on one or more independent variables. Because linear regression is much easier, some non-linear regression problems can be transformed or segmented into linear regression problems.
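One common case of such a transformation is the exponential model y = a·exp(b·x): taking logarithms gives ln(y) = ln(a) + b·x, which is linear in x. The sketch below uses hypothetical, noise-free data generated from a = 2 and b = 0.5 (not data from these notes) and a helper `fit_line` written for illustration.

```python
import math


def fit_line(xs, ys):
    """Least squares intercept and slope for ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return mean_y - slope * mean_x, slope


# Hypothetical noise-free data generated from y = 2 * exp(0.5 * x).
xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Fit a straight line to (x, ln y), then transform the intercept back.
log_a, b = fit_line(xs, [math.log(y) for y in ys])
a = math.exp(log_a)

print(round(a, 6), round(b, 6))  # 2.0 0.5
```

With noisy data, fitting on the log scale minimises squared errors in ln(y) rather than in y, so the result can differ from a direct non-linear fit.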

The method of least squares gives a way to find the best estimate of a particular measurement or data, assuming that the errors (i.e. the differences from the true value) are random and unbiased. "Least squares" means that the overall solution minimizes the sum of the squares of the errors made in the results of every single equation. The best fit in the least-squares sense minimizes the sum of squared residuals, a residual being the difference between an observed value and the fitted value provided by a model.

The method of least squares calculates the line of best fit by minimising the sum of the squares of the vertical distances of the points to the line. Let's illustrate with a simple example.


Fit a least squares line to the following data.

X    1   2   3   4   5
Y    2   5   3   8   7

Solution:

X     Y     XY    X²
1     2      2     1
2     5     10     4
3     3      9     9
4     8     32    16
5     7     35    25

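The equation of the least squares line is $Y = a + bX$.

Normal equation for a: $\sum Y = na + b\sum X$

Normal equation for b: $\sum XY = a\sum X + b\sum X^2$

With $n = 5$, $\sum X = 15$, $\sum Y = 25$, $\sum XY = 88$ and $\sum X^2 = 55$, these become $25 = 5a + 15b$ (1) and $88 = 15a + 55b$ (2).

To eliminate a, multiply equation (1) by 3 and subtract it from equation (2). This gives $13 = 10b$, so $b = 1.3$, and substituting back into equation (1) gives $a = 1.1$. The equation of the least squares line becomes $Y = 1.1 + 1.3X$.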
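The same solution can be reproduced by solving the two normal equations in code. This is a sketch rather than the notes' own procedure: the function names are illustrative, and the closed-form expressions for a and b are simply the result of eliminating a from the two equations.

```python
def least_squares_line(xs, ys):
    """Solve the two normal equations for the line Y = a + b * X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Normal equations:
    #   sum_y  = n * a     + b * sum_x      (1)
    #   sum_xy = a * sum_x + b * sum_x2     (2)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y - b * sum_x) / n
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances from the points to Y = a + b * X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]
a, b = least_squares_line(xs, ys)
print(round(a, 2), round(b, 2))                           # 1.1 1.3
print(round(sum_squared_residuals(xs, ys, a, b), 2))      # 9.1
print(round(sum_squared_residuals(xs, ys, 1.0, 1.3), 2))  # 9.15
```

Nudging the intercept from 1.1 to 1.0 raises the sum of squared residuals from 9.10 to 9.15, which illustrates that the fitted line is the one that minimises it.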

Exercise

A researcher investigates the relationship between individuals' scores on a Reading Aptitude Test and the average number of hours they spend reading (simply called Hours). The data gathered from 10 students are as follows:

Student     S1   S2   S3   S4   S5   S6   S7   S8   S9   S10
Hours (Y)    5    1    2    7    8    9    3    2    5     8

DO NOT WORRY ABOUT APPLYING THE EQUATIONS! You will use SPSS (Statistical Package for the Social Sciences) to carry out all the analyses.

The first step in any applied research is to get a good THEORETICAL grasp of the topic to be studied. The best data analysts don't start with the data; they start with theory.

THANK YOU
