
Simple regression requires 2 variables. The least-squares regression line is the line that minimizes the sum of squared vertical distances between the data points and the line.
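The least-squares idea can be checked numerically. The sketch below uses made-up data (the values in `xs` and `ys` are hypothetical, as are the helper names `least_squares` and `sse`) and the usual closed-form slope and intercept; it then confirms that nudging the fitted line in any direction increases the sum of squared distances.

```python
# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def least_squares(xs, ys):
    """Return (b0, b1) for the line that minimizes squared vertical distances."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

b0, b1 = least_squares(xs, ys)
best = sse(xs, ys, b0, b1)

# Every slightly shifted line should have a larger sum of squares.
nudged = [sse(xs, ys, b0 + d0, b1 + d1)
          for d0 in (-0.1, 0.1) for d1 in (-0.05, 0.05)]

print("b0 =", round(b0, 4), "b1 =", round(b1, 4))
print("fitted SSE is smallest:", all(v > best for v in nudged))
```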

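Example fit: slope 0.125, constant term 41.4 (sign lost in source); $x$ is in thousands, so plug in $x = 8$ for 8,000.
Sample statistics: $\{\bar{x}, s, \hat{p}, b_0, b_1\}$. Population parameters: $\{\mu, \sigma, p, \beta_0 \text{ (beta naught)}, \beta_1\}$. The 3 sample quantities for simple regression are $\hat{y}$, $b_0$, $b_1$; $\hat{y}$ is estimated from the regression using $b_1$ and $b_0$.
No relationship between 2 quantitative variables in the population ($\beta_1 = 0$) does not mean every sample slope $b_1$ will be 0.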
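A small simulation can illustrate why sample slopes are not 0 even when the population slope is 0. Everything here is an assumption made for illustration: the population (x uniform on 0 to 10, y normal with mean 50 and SD 1, unrelated to x), the sample size of 30, and the 200 repetitions.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def slope(xs, ys):
    """Least-squares slope b1 for one sample."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def sample_slope(n=30):
    """Draw one sample from a population where y does not depend on x."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [random.gauss(50, 1) for _ in range(n)]  # true slope is 0
    return slope(xs, ys)

slopes = [sample_slope() for _ in range(200)]
mean_slope = sum(slopes) / len(slopes)

print("smallest sample slope:", round(min(slopes), 3))
print("largest sample slope: ", round(max(slopes), 3))
print("average of the slopes:", round(mean_slope, 3))
```

The individual slopes scatter on both sides of 0 while their average stays close to 0, which is the sample-to-sample variability a significance test has to account for.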

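Sampling variation: variability from sample to sample. Example: is there a relationship in the population where more years of study at university = smarter? $H_0: \beta_1 = 0$, $H_a: \beta_1 > 0$. Reject $H_0$ (statistically significant) based on $b_1$ (slope) divided by the amount of residual scatter, i.e. $t = b_1 / SE(b_1)$. If the quotient is large enough, reject.

Residual standard error: $RSE = \sqrt{\sum_i (y_i - \hat{y}_i)^2 / (n - 2)}$, where $\hat{y}_i$ is the prediction and $y_i$ is the actual value. RSE is an absolute measure (in the units of $y$); $R^2$ is a relative measure. When RSE = SD of $y$, the regression predicts no better than the mean of $y$ ($R^2 \approx 0$). $R^2 = 80\%$: 80% of the way to perfect prediction.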
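As a rough sketch of how these quantities fit together, the code below computes the RSE, the SD of y, R-squared, and the slope t statistic for a simple regression. The data are hypothetical, and the standard error of the slope is taken as RSE divided by the square root of the sum of squared x deviations, which is the usual textbook form for one explanatory variable.

```python
import math

# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # residual SS
sst = sum((y - y_bar) ** 2 for y in ys)                      # total SS

rse = math.sqrt(sse / (n - 2))   # absolute: in the units of y
sd_y = math.sqrt(sst / (n - 1))  # spread of y around its own mean
r2 = 1 - sse / sst               # relative: share of variation explained

se_b1 = rse / math.sqrt(sxx)     # standard error of the slope
t_stat = b1 / se_b1              # slope relative to residual scatter

print("RSE =", round(rse, 4), " SD of y =", round(sd_y, 4))
print("R^2 =", round(r2, 4), " t =", round(t_stat, 2))
```

An RSE far below the SD of y goes with an R-squared near 1 and a large t statistic; an RSE close to the SD of y would go with an R-squared near 0.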

Assumptions/conditions for regression inference: 1. observations are independent; 2. the relationship is linear; 3. the SD of $y$ is the same for all $x$; 4. the response $y$ varies normally around its mean.
Homoscedastic: random scatter. Not linear: curved pattern. Heteroscedastic: the variability changes, so the SD is not equal.
Confidence interval: used to estimate the average value of the dependent variable. Prediction interval: used to predict a single value. The confidence interval will be narrower because the variability of a sampling distribution is generally smaller than that of the population distribution (part of the CLT).
Explanatory: independent variable. Response: the variable we are trying to predict (dependent).
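The difference in width can be seen from the two standard errors. This is a minimal sketch with hypothetical data and a hypothetical helper `interval_half_widths`; the critical value 2.776 is the two-sided 95% t value for 4 degrees of freedom, read from a t table rather than computed.

```python
import math

def interval_half_widths(xs, ys, x0, t_crit):
    """Return (y_hat, CI half-width, PI half-width) at x0 for simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    rse = math.sqrt(sse / (n - 2))
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = rse * math.sqrt(leverage)      # uncertainty in the average y at x0
    se_pred = rse * math.sqrt(1 + leverage)  # adds the scatter of a single y
    return b0 + b1 * x0, t_crit * se_mean, t_crit * se_pred

# Hypothetical data; n = 6, so the residual degrees of freedom are 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]

y_hat, ci_half, pi_half = interval_half_widths(xs, ys, x0=3.5, t_crit=2.776)
print("prediction at x0:", round(y_hat, 3))
print("95% CI half-width:", round(ci_half, 3))
print("95% PI half-width:", round(pi_half, 3))
```

The prediction interval carries an extra "1 +" under the square root for the scatter of an individual observation, so it is always the wider of the two.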
$n$: sample size. $p$: number of independent variables. Look at box plots individually to find outliers and strong skewness. Look at scatterplots in combination, across dimensions, to spot a lack of linear relationships. 3 adjustments when data are asymmetric or the relationship with the response is not linear: logarithm, square, exponentiate.
[Figure: box plot, left skewed, high-side outlier]
[Figure: box plot, symmetric data]
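[Figure: scatterplot, curvilinear]
P-values in the output: global test at the top, individual tests at the bottom. Global test: $H_0$: the independent variables do not fit the data well; $H_a$: the combination of independent variables does fit well. The statistical conclusion of the global test does not have to match the results of the individual tests: the combination may fit well even though each individual variable does not.
SS Total has the same numerator as the standard deviation of $y$: $\sum_i (y_i - \bar{y})^2$. The ANOVA table for the global test breaks SS Total into SS Regression and SS Residual by adding and subtracting $\hat{y}_i$ inside SS Total: $\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i (y_i - \hat{y}_i)^2$.
F-ratio: $F = \dfrac{SS_{Regression} / p}{SS_{Residual} / (n - p - 1)}$ (model/error). The denominator is small if the predictions are very close to the actual values. The numerator is large when the predictions do not have a flat slope. $F$ large: reject $H_0$ in favor of $H_a$.
R-squared: equal to 1 = perfect prediction ($\hat{y}_i = y_i$ for every $i$).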
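The decomposition and the F-ratio can be verified on a small example. The sketch below uses hypothetical data with a single explanatory variable (so p = 1); the same sums of squares apply with more variables once the predictions are available.

```python
# Hypothetical data, one explanatory variable (p = 1).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
n = len(xs)
p = 1

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

ss_total = sum((y - y_bar) ** 2 for y in ys)
ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_residual = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))

# Mean square for the model divided by mean square for the error.
f_ratio = (ss_regression / p) / (ss_residual / (n - p - 1))
r_squared = ss_regression / ss_total

print("SS Total           =", round(ss_total, 4))
print("SS Reg + SS Resid  =", round(ss_regression + ss_residual, 4))
print("F =", round(f_ratio, 1), " R^2 =", round(r_squared, 4))
```

Predictions close to the actual values shrink SS Residual, which both raises F and pushes R-squared toward 1.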

Multicollinearity: multiple explanatory variables that are correlated with the response AND with each other (e.g. a home with more bathrooms will sell for a higher price). Effect: the significance tests for individual $b_i$'s may fail to reject $H_0$ even when the global test rejects, because correlated explanatory variables inflate the standard errors.
Executing multiple regression, key phrase: "in the presence of the other explanatory variables in the equation." If the model contains multicollinear variables: remove one, or try to explain it. As the number of potential explanatory variables increases, the number of possible models increases a lot.
Best model procedure: gives all models with their variables and reports them in order from largest to smallest R-squared. Variable selection: determines the final regression model. Forward procedure: add the 1 explanatory variable with the largest R-squared, then add the 1 with the 2nd largest, etc.
VIF: indicates whether you have multicollinearity among the explanatory variables. > 10: clear signal of it. > 1.15: needs attention.
Categorical independent variable: if it has two values, build a numerical variable (0 or 1).
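A VIF can be computed by hand in the two-variable case. The sketch assumes the standard definition, VIF = 1 / (1 - R-squared) where R-squared comes from regressing one explanatory variable on the others; with only two explanatory variables that R-squared is just their squared correlation. The housing numbers and variable names are hypothetical.

```python
import math

# Hypothetical explanatory variables for a home-price model.
sqft_thousands = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
bathrooms = [1, 2, 2, 3, 3, 4]

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

r = correlation(sqft_thousands, bathrooms)
vif = 1 / (1 - r ** 2)  # identical for both variables when there are only two

print("correlation between explanatory variables:", round(r, 3))
print("VIF:", round(vif, 2))
print("clear signal of multicollinearity:", vif > 10)

# A two-valued categorical variable coded as 0/1 (hypothetical values).
has_garage = ["yes", "no", "yes", "yes", "no", "yes"]
garage_dummy = [1 if v == "yes" else 0 for v in has_garage]
print("dummy coding:", garage_dummy)
```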
Interaction in data: the influence of one variable depends on the other. Create a new variable by multiplying the two interacting variables together. Curvilinear: compare with the linear model and keep whichever has the larger adjusted R-squared. Qualitative variable (yes/no): cannot be put into geometrical space. New customers among prospects: categorical (binary) variable.
Odds from probability: $p / (1 - p)$. Log (exponential in reverse): $\log 1 = 0$, $\log 0 = -\infty$. If no base is given, assume the natural log. Lower and upper bounds: probability $(0, 1)$; odds $(0, \infty)$; log odds $(-\infty, \infty)$.
Interpreting output for log odds of default: for every additional dollar, the log odds of default go down by 0.007 ($B = -0.007$). College: log odds go up by 0.629. Odds of default for college: go up by 87% (factor 1.875, under Exp(B)).
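The link between the B column and the Exp(B) column is just exponentiation. The sketch below uses the two coefficients quoted in the notes (-0.007 per dollar and 0.629 for college); the helper names are hypothetical and no model is fitted here.

```python
import math

def odds(p):
    """Odds for a probability p strictly between 0 and 1."""
    return p / (1 - p)

def log_odds(p):
    """Natural log of the odds; ranges over (-infinity, infinity)."""
    return math.log(odds(p))

def prob_from_log_odds(z):
    """Invert the log odds back to a probability."""
    return 1 / (1 + math.exp(-z))

# Coefficients on the log-odds scale, as quoted in the notes.
b_dollar = -0.007
b_college = 0.629

# Exp(B) turns an additive change in log odds into a multiplier on the odds.
odds_ratio_dollar = math.exp(b_dollar)
odds_ratio_college = math.exp(b_college)
pct_change_college = (odds_ratio_college - 1) * 100

print("p = 0.5 -> odds", odds(0.5), "log odds", log_odds(0.5))
print("Exp(B) per dollar:", round(odds_ratio_dollar, 4))
print("Exp(B) college:   ", round(odds_ratio_college, 4))
print("college raises the odds by about", int(pct_change_college), "percent")
```

A multiplier below 1 (per dollar) means the odds of default shrink, while a multiplier above 1 (college) means they grow.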
