Data Driven Decision Regression

Yi-Ting Chang
006699414
Data Driven Decision Making
• This simple linear regression has a positive slope.

• A sample of 100 observations was used to find the sample linear fit.
• The sample linear fit is Response = 3.17 + 3.09 × Predictor.
• The slope of the sample data is 3.09.
• Response = Y, Predictor = X.
• 𝑦 = 𝑚𝑥 + 𝑏, where 𝑚 is the slope.
• 𝑦̂ = 𝑏0 + 𝑏1𝑥, where 𝑏1 is the slope.
• The method of least squares is used to find the regression line.
• My R Square is 0.92, which is very large: about 92% of the variation in Y is
explained by X.
• My RMSE, the standard deviation of the residuals (prediction errors), is 2.08,
which is large, so the sample data points are spread out rather than
concentrated around the linear fit.
• The p-values for the F test and the t test are very small (<.0001), and the
F ratio and t ratio are very large, so I reject 𝐻0.
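The quantities listed above (intercept, slope, R Square, RMSE) all follow from the least-squares formulas. The sketch below shows one way to compute them in plain Python. The data are hypothetical, generated around y = 3 + 3x only for illustration, and are not the sample used in this report. The function name `fit_line` is also my own.

```python
import math
import random


def fit_line(x, y):
    """Return (b0, b1, r_square, rmse) for the least-squares line y_hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r_square = 1 - sse / sst       # share of variation in y explained by x
    rmse = math.sqrt(sse / (n - 2))  # n - 2 residual degrees of freedom
    return b0, b1, r_square, rmse


# Hypothetical data: 100 points scattered around y = 3 + 3x (an assumption).
random.seed(0)
x = [random.uniform(0, 10) for _ in range(100)]
y = [3 + 3 * xi + random.gauss(0, 2) for xi in x]

b0, b1, r_square, rmse = fit_line(x, y)
print(f"Response = {b0:.2f} + {b1:.2f} * Predictor")
print(f"R Square = {r_square:.2f}, RMSE = {rmse:.2f}")
```

RMSE is in the units of Y, so whether a given value counts as large depends on the scale of the response.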
The distribution is very close to a normal distribution, with a bell shape. There is a
line at the center of this distribution; it is the 𝑏0 of this linear regression
equation, which is 3.19. As n gets larger, the distribution gets closer to
normal.
The sample mean is 6.6568. The 95% confidence interval for the mean is
approximately 6.3 to 7.0, and doubling its margin gives approximately 5.9 to
7.4. In this case X helps predict Y.
I calculate the distance from the mean as follows:

\[
\bar{x} \pm t_{n-1,\,\alpha/2}\,\frac{s}{\sqrt{n}}
= 6.6568 \pm t_{100-1,\,0.025}\,\frac{1.81155}{\sqrt{100}}
= 6.6568 \pm 1.984 \times 0.181155
= 6.6568 \pm 0.359411
\]

6.6568 ± 0.359411 = 6.2974 to 7.0162 (the 95% confidence interval for the mean, since 1.984 is the t value for 99 degrees of freedom at α/2 = 0.025)
6.6568 ± 0.359411 × 2 = 5.9380 to 7.3756 (twice the 95% margin of error)
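The margin in the calculation above can be checked with a few lines of Python. This is a minimal sketch using the mean, standard deviation, and sample size from the report. The t value 1.984 is hard-coded from the report rather than looked up, and the variable names are my own.

```python
import math

x_bar = 6.6568    # sample mean from the report
s = 1.81155       # sample standard deviation from the report
n = 100           # sample size
t_crit = 1.984    # t value for n - 1 = 99 degrees of freedom (taken from the report)

std_error = s / math.sqrt(n)     # standard error of the mean
margin = t_crit * std_error      # margin of error around the mean
low, high = x_bar - margin, x_bar + margin

print(f"standard error = {std_error:.6f}")
print(f"margin of error = {margin:.6f}")
print(f"interval = {low:.4f} to {high:.4f}")
```

The margin t × s/√n is the margin of error of a confidence interval for the mean. It describes uncertainty about the mean, not the spread of the individual values.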

1. Using the Countif.xls file, develop a regression model to predict Salary by using all the remaining variables. Use α =
0.05. Evaluate this model—perform all the tests. Run a stepwise model and evaluate it.
First, I use Multivariate SVD Imputation in JMP to fill in the missing values in Usefulness. Then I recode Major and Gender as
0/1 codes (Business = 0, A & S = 1; Male = 0, Female = 1), set Salary as Y and the rest of the numeric columns as X, and
predict Salary with Data Analysis in Excel using this function:
Y = 36019.53 − 5893.08 × Major Code + 89.50296 × Usefulness − 2014.2 × Gender code − 2612.12 × GPA + 6107.477 × Years
Major     Gender  Major Code  Usefulness  Gender code  GPA   Years  Salary  Pred. Salary
Business  Male    0           3           0            3.53  4.40   52125   53940.1546
Business  Female  0           1           1            2.58  4.18   52325   52884.8196
Business  Male    0           4           0            3.52  5.30   63042   59552.5079
A & S     Male    1           3           0            3     4.49   54928   49981.1702
Business  Male    0           4           0            3.22  4.06   50599   52762.8727
A & S     Female  1           2           1            3.06  3.87   42036   43934.1061
A & S     Female  1           3           1            2.35  4.64   46427   50580.9715
A & S     Male    1           3           0            3.22  5.08   51865   53009.9151
A & S     Female  1           3.022442    1            2.86  2.03   33263   33310.2843
Business  Female  0           5           1            3.6   5.76   58434   60228.2823
Business  Male    0           4           0            3     5.38   61551   61399.4086
A & S     Male    1           5           0            3.11  1.38   31235   30878.5899
Business  Male    0           2           0            3.43  4.83   58730   56738.0787
A & S     Female  1           4           1            3.31  2.91   35830   37596.9042
Business  Male    0           2.6231026   0            2.62  4.87   53267   59153.9646
Business  Male    0           5           0            3.28  5.54   65437   61734.7142
A & S     Female  1           4           1            2.68  3.76   47591   44433.8952
A & S     Female  1           4           1            3.16  3.27   42659   40187.4139
Business  Male    0           3           0            3.84  4.33   50996   52702.8739
A & S     Male    1           2           0            3.29  3.02   40185   40156.1614
A & S     Male    1           5           0            3.69  2.32   33155   35104.5884
Business  Female  0           1           1            2.54  3.38   52695   48103.323
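The Pred. Salary column can be reproduced by plugging each row into the fitted equation. The sketch below does this for the first row of the table. The coefficients are the rounded values printed in this report, so the result matches the table only to within rounding. The function name `predict_salary` is my own.

```python
# Coefficients as printed in the report (rounded).
INTERCEPT = 36019.53
B_MAJOR_CODE = -5893.08
B_USEFULNESS = 89.50296
B_GENDER_CODE = -2014.2
B_GPA = -2612.12
B_YEARS = 6107.477


def predict_salary(major_code, usefulness, gender_code, gpa, years):
    """Evaluate the fitted regression equation for one observation."""
    return (INTERCEPT
            + B_MAJOR_CODE * major_code
            + B_USEFULNESS * usefulness
            + B_GENDER_CODE * gender_code
            + B_GPA * gpa
            + B_YEARS * years)


# First row of the table: Business, Male, Usefulness 3, GPA 3.53, Years 4.40.
first_row = predict_salary(major_code=0, usefulness=3, gender_code=0, gpa=3.53, years=4.40)
residual = 52125 - first_row  # actual Salary minus predicted Salary
print(f"predicted = {first_row:.2f}, residual = {residual:.2f}")
```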
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.958187
R Square           0.918122
Adjusted R Square  0.892535
Standard Error     3257.747
Observations       22

[Chart: Salary vs. Years; horizontal axis 0.00 to 7.00, vertical axis 0 to 80000]

ANOVA
            df  SS        MS        F         Significance F
Regression  5   1.9E+09   3.81E+08  35.88248  3.81E-08
Residual    16  1.7E+08   10612918
Total       21  2.07E+09

             Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept    36019.53      7924.17         4.545528  0.000331  19221.04   52818.02   19221.04     52818.02
Major Code   -5893.08      1830.547        -3.2193   0.005356  -9773.67   -2012.5    -9773.67     -2012.5
Usefulness   89.50296      670.9087        0.133406  0.895536  -1332.76   1511.766   -1332.76     1511.766
Gender code  -2014.2       1661.857        -1.21202  0.243101  -5537.18   1508.781   -5537.18     1508.781
GPA          -2612.12      2231.072        -1.17079  0.258825  -7341.78   2117.542   -7341.78     2117.542
Years        6107.477      757.6679        8.060889  5.03E-07  4501.293   7713.661   4501.293     7713.661
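The summary output above can be read against α = 0.05 with a short script. This is a sketch that uses only numbers printed in the output. It recomputes Adjusted R Square and the t Stat for Years as a consistency check, then lists the terms whose P-value is below α. The variable names are my own.

```python
ALPHA = 0.05

# Values copied from the summary output.
r_square = 0.918122
n_obs = 22
n_predictors = 5

# Adjusted R Square penalizes R Square for the number of predictors.
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)

# t Stat is the coefficient divided by its standard error (shown here for Years).
t_years = 6107.477 / 757.6679

p_values = {
    "Intercept": 0.000331,
    "Major Code": 0.005356,
    "Usefulness": 0.895536,
    "Gender code": 0.243101,
    "GPA": 0.258825,
    "Years": 5.03e-07,
}
significant = [name for name, p in p_values.items() if p < ALPHA]
not_significant = [name for name, p in p_values.items() if p >= ALPHA]

print(f"Adjusted R Square = {adj_r_square:.6f}")
print(f"t Stat for Years = {t_years:.4f}")
print("significant at 0.05:", significant)
print("not significant at 0.05:", not_significant)
```

By this reading, Major Code and Years are significant at α = 0.05, while Usefulness, Gender code, and GPA are not. The overall model is significant because Significance F (3.81E-08) is far below 0.05.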
2. Using the hmeq.jmp file, develop the best model you can to predict loan amount. Evaluate each model and use α =
0.05.
1. Recode the headings and correct the misspellings.
2. Change the type of Default to Numeric Continuous.
3. Add a new Numeric Continuous column, Reason Code, and set HomeImp to 1 and DebtCon to 0; the rest are
missing values. I do this because I can't analyze Reason in Nominal form.
4. Add 3 new Numeric Continuous columns, JobCode1, JobCode2, and JobCode3, because there are 6 categories
under Job and I can't analyze them in Nominal form.
5. Set Sales to 001, ProfExe to 010, Other to 100, Office to 000, Mgr to 101, and Self to 111; the rest are missing values.
6. Check the missing values in every column. I notice that when the type is Numeric Continuous, every missing value
is shown as ".", so I assume that a 0 in these columns is not a missing value.
7. Use Multivariate SVD Imputation to fill in all missing values.
8. Set Loan as Y and the rest of the Numeric Continuous columns as X, run the regression in Fit Model, then use
Show Prediction Expression and save the formula.
9. The predicted loan amount is in the last column, with the default alpha of 0.05.
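The recoding in steps 3 to 5 can be sketched in Python as a pair of lookup tables. This is only an illustration of the coding scheme, not the JMP procedure itself. It assumes the first digit of each three-digit code goes to JobCode1, the second to JobCode2, and the third to JobCode3. It also assumes the category spellings DebtCon and ProfExe. The function name `encode_row` is hypothetical.

```python
# Reason Code: HomeImp = 1, DebtCon = 0; anything else is treated as missing.
REASON_CODE = {"HomeImp": 1, "DebtCon": 0}

# (JobCode1, JobCode2, JobCode3) for each Job category; digit order is an assumption.
JOB_CODE = {
    "Sales": (0, 0, 1),
    "ProfExe": (0, 1, 0),
    "Other": (1, 0, 0),
    "Office": (0, 0, 0),
    "Mgr": (1, 0, 1),
    "Self": (1, 1, 1),
}

MISSING = None  # stands in for JMP's "." missing value


def encode_row(reason, job):
    """Return (Reason Code, JobCode1, JobCode2, JobCode3) for one record."""
    reason_code = REASON_CODE.get(reason, MISSING)
    job_codes = JOB_CODE.get(job, (MISSING, MISSING, MISSING))
    return (reason_code, *job_codes)


print(encode_row("HomeImp", "Sales"))
print(encode_row("DebtCon", "Mgr"))
print(encode_row("", ""))  # blank entries become missing values
```

Note that Office is coded 000, so its zeros are real values. Only entries outside the lookup tables are treated as missing and left for imputation.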
