
Uploaded by Nagarjuna Appu

This document is about correlation and regression.


After studying the material in this chapter, you should be able to:

- Calculate and interpret the simple correlation between two variables.
- Determine whether the correlation is significant.
- Calculate and interpret the simple linear regression coefficients for a set of data.
- Understand the basic assumptions behind regression analysis.
- Calculate and interpret confidence intervals for the regression coefficients.
- Recognize regression analysis applications for purposes of prediction and description.
- Recognize some potential problems if regression analysis is used incorrectly.

Scatter Diagrams

A scatter plot, also referred to as a scatter diagram, is a graph that may be used to represent the relationship between two variables.

A dependent variable is the variable to be predicted or explained in a regression model. This variable is assumed to be functionally related to the independent variable.

An independent variable is the variable related to the dependent variable in a regression equation. The independent variable is used in a regression model to estimate the value of the dependent variable.

[Figure: example scatter plots showing (a) linear, (b) linear, (c) curvilinear, (d) curvilinear, and (e) no relationship.]

Correlation

The correlation coefficient is a quantitative measure of the strength of the linear relationship between two variables. The correlation ranges from -1.0 to +1.0. A correlation of ±1.0 indicates a perfect linear relationship, whereas a correlation of 0 indicates no linear relationship.

SAMPLE CORRELATION COEFFICIENT

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$$

where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable

SAMPLE CORRELATION COEFFICIENT (algebraic equivalent)

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Sales (y)   Years (x)   xy       y²          x²
487         3           1,461    237,169     9
445         5           2,225    198,025     25
272         2           544      73,984      4
641         8           5,128    410,881     64
187         2           374      34,969      4
440         6           2,640    193,600     36
346         7           2,422    119,716     49
238         1           238      56,644      1
312         4           1,248    97,344      16
269         2           538      72,361      4
655         9           5,895    429,025     81
563         6           3,378    316,969     36

Sums: Σy = 4,855, Σx = 55, Σxy = 26,091, Σy² = 2,240,687, Σx² = 329

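$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r = \frac{12(26{,}091) - 55(4{,}855)}{\sqrt{\left[12(329) - (55)^2\right]\left[12(2{,}240{,}687) - (4{,}855)^2\right]}} = 0.8325$$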
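The calculation can be checked numerically. The following is a minimal Python sketch, not part of the original slides, that applies both forms of the sample correlation formula to the sales and years data; the function names `correlation_deviation` and `correlation_algebraic` are illustrative choices.

```python
import math

# Sales (y) and years (x) from the example data table
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]


def correlation_deviation(x, y):
    """r computed from deviations about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_algebraic(x, y):
    """r computed from the algebraically equivalent sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r_dev = correlation_deviation(years, sales)
r_alg = correlation_algebraic(years, sales)
print(round(r_dev, 4), round(r_alg, 4))  # both round to 0.8325
```

Both forms agree up to floating-point error, which is why the algebraic version is convenient for hand calculation from column sums.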


TEST STATISTIC FOR CORRELATION

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}, \qquad df = n - 2$$

where:
t = Number of standard deviations r is from 0
r = Simple correlation coefficient
n = Sample size

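$H_0: \rho = 0.0$ (no correlation)
$H_A: \rho \neq 0.0$
$\alpha = 0.05$

Rejection regions: $\alpha/2 = 0.025$ in each tail, with critical values $-t_{0.025} = -2.228$ and $t_{0.025} = 2.228$ (df = 10).

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.8325}{\sqrt{\dfrac{1 - 0.6931}{10}}} = 4.752$$

Since t = 4.752 > 2.228, reject H0; there is a significant linear relationship.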
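As a rough check of this test, the sketch below recomputes the t statistic from r = 0.8325 and n = 12 and compares it with the critical value t.025 = 2.228 quoted on the slide. The variable names are illustrative, and the critical value is copied from the text rather than computed.

```python
import math

r = 0.8325          # sample correlation from the example
n = 12              # sample size
t_critical = 2.228  # t_{0.025} with df = n - 2 = 10, taken from the text

# Test statistic for H0: rho = 0
t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))

# Two-tailed decision rule
reject_h0 = abs(t_stat) > t_critical
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```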


Spurious correlation occurs when there is a correlation between two otherwise unrelated variables.

Simple linear regression analysis analyzes the linear relationship that exists between a dependent variable and a single independent variable.

SIMPLE LINEAR REGRESSION MODEL (POPULATION MODEL)

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where:
y = Value of the dependent variable
x = Value of the independent variable
$\beta_0$ = Population's y-intercept
$\beta_1$ = Slope of the population regression line
$\varepsilon$ = Error term, or residual

The simple linear regression model has four assumptions:

- Individual values of the error terms, $\varepsilon_i$, are statistically independent of one another.
- The distribution of all possible values of $\varepsilon$ is normal.
- The distributions of possible $\varepsilon_i$ values have equal variances for all values of x.
- The means of the dependent variable, y, for all specified values of the independent variable can be connected by a straight line called the population regression model.

REGRESSION COEFFICIENTS

In the simple regression model, there are two coefficients: the intercept and the slope.

The interpretation of the regression slope coefficient is that it gives the average change in the dependent variable for a unit increase in the independent variable. The slope coefficient may be positive or negative, depending on the relationship between the two variables.

The least squares criterion is used for determining a regression line that minimizes the sum of squared residuals.

A residual is the difference between the actual value of the dependent variable and the value predicted by the regression model.

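$$\text{Residual} = y - \hat{y}$$

[Figure: sales in thousands (y) with the fitted line $\hat{y} = 150 + 60x$; a residual is the vertical distance between an observed point and the line.]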

ESTIMATED REGRESSION MODEL (SAMPLE MODEL)

$$\hat{y}_i = b_0 + b_1 x$$

where:
$\hat{y}_i$ = Estimated, or predicted, y value
$b_0$ = Unbiased estimate of the regression intercept
$b_1$ = Unbiased estimate of the regression slope
x = Value of the independent variable

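LEAST SQUARES EQUATIONS

$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

algebraic equivalent:

$$b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}$$

and

$$b_0 = \bar{y} - b_1 \bar{x}$$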

SUM OF SQUARED ERRORS

$$SSE = \sum y^2 - b_0 \sum y - b_1 \sum xy$$

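(Midwest Example)

Sales (y)   Years (x)   xy       y²          x²
487         3           1,461    237,169     9
445         5           2,225    198,025     25
272         2           544      73,984      4
641         8           5,128    410,881     64
187         2           374      34,969      4
440         6           2,640    193,600     36
346         7           2,422    119,716     49
238         1           238      56,644      1
312         4           1,248    97,344      16
269         2           538      72,361      4
655         9           5,895    429,025     81
563         6           3,378    316,969     36

Sums: Σy = 4,855, Σx = 55, Σxy = 26,091, Σy² = 2,240,687, Σx² = 329

$$b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} = \frac{26{,}091 - \dfrac{55(4{,}855)}{12}}{329 - \dfrac{(55)^2}{12}} = 49.9101$$

$$b_0 = \bar{y} - b_1 \bar{x} = 175.8288$$

The least squares regression line is:

$$\hat{y} = 175.8288 + 49.9101(x)$$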
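The least squares equations can be applied directly to the Midwest data. The sketch below is one possible Python rendering of the algebraic form of the slope and intercept formulas; the function name `least_squares` is an illustrative choice, not something defined in the slides.

```python
# Sales (y) and years (x) from the Midwest example
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]


def least_squares(x, y):
    """Return (b0, b1) using the algebraic least squares equations."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # Slope: (sum xy - sum x * sum y / n) / (sum x^2 - (sum x)^2 / n)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: y-bar minus slope times x-bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(years, sales)
print(f"y-hat = {b0:.4f} + {b1:.4f} x")  # y-hat = 175.8288 + 49.9101 x
```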

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8325
R Square             0.6931
Adjusted R Square    0.6624
Standard Error       92.1055
Observations         12

ANOVA
              df    SS             MS             F              Significance F
Regression    1     191600.622     191600.622     22.58527906    0.000777416
Residual      10    84834.29469    8483.429469
Total         11    276434.9167

              Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%
Intercept     175.8288191     54.98988674       3.197475563    0.00953244     53.30369475    298.3539434
Years (x)     49.91007584     10.50208428       4.752397191    0.000777416    26.50996978    73.3101819

- The sum of the residuals from the least squares regression line is 0.
- The sum of the squared residuals is a minimum.
- The simple regression line always passes through the mean of the y variable and the mean of the x variable.
- The least squares coefficients are unbiased estimates of $\beta_0$ and $\beta_1$.

SUM OF RESIDUALS

$$\sum (y - \hat{y}) = 0$$

SUM OF SQUARED RESIDUALS

$$\sum (y - \hat{y})^2$$
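These properties can be verified numerically on the Midwest data. The sketch below, with illustrative names such as `sse`, checks that the residuals sum to zero, that the fitted line passes through the point of means, and that nudging either coefficient away from the least squares values increases the sum of squared residuals. The perturbation sizes (5 and 1) are arbitrary assumptions chosen only for demonstration.

```python
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(sales) / n

# Least squares coefficients from the deviation form of the equations
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))
s_xx = sum((x - x_bar) ** 2 for x in years)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar


def sse(intercept, slope):
    """Sum of squared residuals for an arbitrary candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, sales))


residuals = [y - (b0 + b1 * x) for x, y in zip(years, sales)]

print(f"sum of residuals: {sum(residuals):.10f}")         # zero up to rounding
print(f"line at x-bar: {b0 + b1 * x_bar:.4f}, y-bar: {y_bar:.4f}")
print(f"SSE at least squares fit: {sse(b0, b1):.2f}")
print(f"SSE with intercept + 5:   {sse(b0 + 5, b1):.2f}")  # larger
print(f"SSE with slope + 1:       {sse(b0, b1 + 1):.2f}")  # larger
```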

TOTAL SUM OF SQUARES

$$TSS = \sum (y - \bar{y})^2$$

where:
TSS = Total sum of squares
n = Sample size
y = Values of the dependent variable
$\bar{y}$ = Average value of the dependent variable

SUM OF SQUARES ERROR (RESIDUALS)

$$SSE = \sum (y - \hat{y})^2$$

where:
SSE = Sum of squares error
n = Sample size
y = Values of the dependent variable
$\hat{y}$ = Estimated value for the average of y for the given x value

SUM OF SQUARES REGRESSION

$$SSR = \sum (\hat{y} - \bar{y})^2$$

where:
SSR = Sum of squares regression
$\bar{y}$ = Average value of the dependent variable
y = Values of the dependent variable
$\hat{y}$ = Estimated value for the average of y for the given x value

SUMS OF SQUARES

$$TSS = SSE + SSR$$

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by its relationship with the independent variable. The coefficient of determination is also called R-squared and is denoted as $R^2$.

COEFFICIENT OF DETERMINATION ($R^2$)

$$R^2 = \frac{SSR}{TSS}$$

(Midwest Example)

$$R^2 = \frac{SSR}{TSS} = \frac{191{,}600.62}{276{,}434.92} = 0.6931$$

69.31% of the variation in the sales data for this sample can be explained by the linear relationship between sales and years of experience.

COEFFICIENT OF DETERMINATION, SINGLE INDEPENDENT VARIABLE CASE

$$R^2 = r^2$$
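The sums of squares and the coefficient of determination can be reproduced from the raw data. The following sketch, using illustrative variable names, computes TSS, SSE, and SSR for the Midwest example and confirms that they add up and that R-squared equals the squared correlation in the single-variable case.

```python
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(sales) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))
s_xx = sum((x - x_bar) ** 2 for x in years)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in years]

tss = sum((y - y_bar) ** 2 for y in sales)              # total variation
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))  # unexplained
ssr = sum((f - y_bar) ** 2 for f in fitted)             # explained

r_squared = ssr / tss
r = s_xy / (s_xx * tss) ** 0.5  # sample correlation coefficient

print(f"TSS = {tss:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}")
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```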

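STANDARD DEVIATION OF THE REGRESSION SLOPE COEFFICIENT (POPULATION)

$$\sigma_{b_1} = \frac{\sigma_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}}$$

where:
$\sigma_{b_1}$ = Standard deviation of the regression slope (called the standard error of the slope)
$\sigma_\varepsilon$ = Population standard error of the estimate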

ESTIMATOR FOR THE STANDARD ERROR OF THE ESTIMATE

$$s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}}$$

where:
SSE = Sum of squares error
n = Sample size
k = Number of independent variables in the model

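ESTIMATOR FOR THE STANDARD DEVIATION OF THE REGRESSION SLOPE

$$s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}} = \frac{s_\varepsilon}{\sqrt{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}}$$

where:
$s_{b_1}$ = Estimate of the standard error of the least squares slope
$s_\varepsilon$ = Sample standard error of the estimate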

TEST STATISTIC FOR TEST OF SIGNIFICANCE OF THE REGRESSION SLOPE

$$t = \frac{b_1 - \beta_1}{s_{b_1}}, \qquad df = n - 2$$

where:
$b_1$ = Sample regression slope coefficient
$\beta_1$ = Hypothesized slope
$s_{b_1}$ = Estimator of the standard error of the slope

$H_0: \beta_1 = 0.0$
$H_A: \beta_1 \neq 0.0$
$\alpha = 0.05$

Rejection regions: $\alpha/2 = 0.025$ in each tail, with critical values $-t_{0.025} = -2.228$ and $t_{0.025} = 2.228$ (df = 10).

$$t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{49.91 - 0}{10.50} = 4.753$$

Since t = 4.753 > 2.228, reject H0: conclude that the true slope is not zero.
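The slope test can be traced step by step from the data. The sketch below computes the standard error of the estimate, the standard error of the slope, and the t statistic for the Midwest example; the names are illustrative and the critical value 2.228 is taken from the slide rather than computed.

```python
import math

sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]

n = len(years)
k = 1  # one independent variable
x_bar = sum(years) / n
y_bar = sum(sales) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))
s_xx = sum((x - x_bar) ** 2 for x in years)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(years, sales))

s_e = math.sqrt(sse / (n - k - 1))  # standard error of the estimate
s_b1 = s_e / math.sqrt(s_xx)        # standard error of the slope
t_stat = (b1 - 0) / s_b1            # hypothesized slope is 0

t_critical = 2.228                  # t_{0.025}, df = 10, from the text
reject_h0 = abs(t_stat) > t_critical
print(f"s_e = {s_e:.4f}, s_b1 = {s_b1:.4f}, t = {t_stat:.3f}")
```

The unrounded statistic matches the t Stat reported for the slope in the summary output; the slide's 4.753 comes from dividing the rounded values 49.91 and 10.50.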

MEAN SQUARE REGRESSION

$$MSR = \frac{SSR}{k}$$

where:
SSR = Sum of squares regression
k = Number of independent variables in the model

MEAN SQUARE ERROR

$$MSE = \frac{SSE}{n - k - 1}$$

where:
SSE = Sum of squares error
n = Sample size
k = Number of independent variables in the model

Significance Test

$H_0: \beta_1 = 0.0$
$H_A: \beta_1 \neq 0.0$
$\alpha = 0.05$

F Ratio:

$$F = \frac{MSR}{MSE} = \frac{191{,}600.6}{8{,}483.43} = 22.59$$

Rejection region: $F > F_{0.05} = 4.96$ (df = 1 and 10)

Since F = 22.59 > 4.96, reject H0: conclude that the regression model explains a significant amount of the variation in the dependent variable.
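A small sketch of the F calculation follows, using the SSR and SSE values from the summary output and the critical value 4.96 quoted above. It also illustrates that, with a single independent variable, the F ratio equals the square of the slope's t statistic. Variable names are illustrative.

```python
# Values taken from the regression summary output
ssr = 191600.622      # sum of squares regression
sse = 84834.29469     # sum of squares error
t_slope = 4.752397191  # t Stat for the slope
n, k = 12, 1          # sample size, number of independent variables

f_critical = 4.96     # F_{0.05} with 1 and 10 degrees of freedom, from the text

msr = ssr / k
mse = sse / (n - k - 1)
f_ratio = msr / mse
reject_h0 = f_ratio > f_critical

print(f"MSR = {msr:.1f}, MSE = {mse:.2f}, F = {f_ratio:.2f}")
print(f"t^2 = {t_slope ** 2:.2f}")  # equals F in simple linear regression
```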

1. Develop a scatter plot of y and x. You are looking for a linear relationship between the two variables.
2. Calculate the least squares regression line for the sample data.
3. Calculate the correlation coefficient and the simple coefficient of determination, $R^2$.
4. Conduct one of the significance tests.

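CONFIDENCE INTERVAL ESTIMATE FOR THE REGRESSION SLOPE

$$b_1 \pm t_{\alpha/2}\, s_{b_1}$$

or equivalently:

$$b_1 \pm t_{\alpha/2}\, \frac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}}, \qquad df = n - 2$$

where:
$s_{b_1}$ = Standard error of the regression slope coefficient
$s_\varepsilon$ = Standard error of the estimate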
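Applied to the Midwest example, the interval can be computed as in the sketch below. The slope and its standard error are copied from the summary output, and the critical value 2.228 is the rounded table value used earlier in the text, so the limits agree with the reported Lower 95% and Upper 95% figures only to about two decimal places.

```python
b1 = 49.91007584    # slope coefficient from the summary output
s_b1 = 10.50208428  # standard error of the slope from the summary output
t_critical = 2.228  # t_{0.025}, df = 10, rounded table value from the text

margin = t_critical * s_b1
lower = b1 - margin
upper = b1 + margin
print(f"95% confidence interval for the slope: {lower:.2f} to {upper:.2f}")
```

Because the interval does not contain 0, it leads to the same conclusion as the t test of the slope.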

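CONFIDENCE INTERVAL FOR $E(y) \mid x_p$

$$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}$$

where:
$\hat{y}$ = Point estimate of the dependent variable
t = Critical value with n - 2 d.f.
$s_\varepsilon$ = Standard error of the estimate
n = Sample size
$x_p$ = Specific value of the independent variable
$\bar{x}$ = Mean of the independent variable observations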

PREDICTION INTERVAL FOR $y \mid x_p$

$$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}$$
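The difference between the two intervals is easiest to see side by side. The sketch below evaluates both for the Midwest data at x_p = 4 years; that value is a hypothetical choice made only for illustration, as are the function name `intervals` and the other variable names. The critical value 2.228 is taken from the text.

```python
import math

sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(sales) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))
s_xx = sum((x - x_bar) ** 2 for x in years)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(years, sales))
s_e = math.sqrt(sse / (n - 2))  # standard error of the estimate
t_critical = 2.228              # t_{0.025}, df = 10, from the text


def intervals(x_p):
    """Return (y_hat, confidence interval, prediction interval) at x_p."""
    y_hat = b0 + b1 * x_p
    distance_term = 1 / n + (x_p - x_bar) ** 2 / s_xx
    ci_half = t_critical * s_e * math.sqrt(distance_term)
    pi_half = t_critical * s_e * math.sqrt(1 + distance_term)
    ci = (y_hat - ci_half, y_hat + ci_half)
    pi = (y_hat - pi_half, y_hat + pi_half)
    return y_hat, ci, pi


# x_p = 4 is a hypothetical value chosen for illustration
y_hat, ci, pi = intervals(4)
print(f"y-hat = {y_hat:.2f}")
print(f"confidence interval for the mean: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"prediction interval for one y:    {pi[0]:.2f} to {pi[1]:.2f}")
```

The prediction interval is always wider because of the extra 1 under the square root, which accounts for the variation of an individual observation around its mean.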

Residual Analysis

Before using a regression model for description or prediction, you should check whether the assumptions concerning the normal distribution and constant variance of the error terms have been satisfied. One way to do this is through the use of residual plots.
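As a text-only stand-in for a residual plot, the sketch below lists the Midwest residuals in order of x, along with each residual divided by the standard error of the estimate. This simple scaling and the cutoff of 2 used to flag large residuals are assumptions made for illustration, not a procedure given in the slides; in practice the residuals would usually be plotted against x or against the fitted values.

```python
import math

sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(sales) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))
s_xx = sum((x - x_bar) ** 2 for x in years)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(years, sales)]
s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Rough scaling: residual divided by the standard error of the estimate
scaled = [e / s_e for e in residuals]

# List residuals in order of x to look for patterns or changing spread
for x, e, z in sorted(zip(years, residuals, scaled)):
    print(f"x = {x}: residual = {e:8.2f}, scaled = {z:5.2f}")

# Flag residuals more than 2 standard errors from zero (assumed cutoff)
flagged = [(x, round(e, 2)) for x, e, z in zip(years, residuals, scaled) if abs(z) > 2]
print("flagged points:", flagged)
```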

Key Terms

- Coefficient of Determination
- Correlation Coefficient
- Dependent Variable
- Independent Variable
- Least Squares Criterion
- Regression Coefficients
- Regression Slope Coefficient
- Residual
- Scatter Plot
- Simple Linear Regression Analysis
- Spurious Correlation
