Note the number of delays vs. non-delays. Let's say the first 400,000
records are non-delays.
keep deleting until there are no values highlighted (as it deletes, it will
recalculate the mean + 3 sigma and it will highlight new values, that's
ok, just keep deleting)
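The delete-and-recalculate loop above can be sketched outside the spreadsheet. This is a minimal sketch, not the spreadsheet's own logic: the function name `drop_outliers_iteratively` is hypothetical, and it assumes the highlight rule is "value greater than mean + 3 sigma" using the sample standard deviation.

```python
import statistics


def drop_outliers_iteratively(values, n_sigma=3.0):
    """Drop values above mean + n_sigma * stdev, recomputing after each pass.

    Assumption: only the upper threshold (mean + 3 sigma) is checked,
    and sigma is the sample standard deviation.
    """
    kept = list(values)
    while len(kept) > 1:
        mean = statistics.mean(kept)
        sigma = statistics.stdev(kept)
        threshold = mean + n_sigma * sigma
        survivors = [v for v in kept if v <= threshold]
        if len(survivors) == len(kept):
            break  # nothing is "highlighted" any more, so stop deleting
        kept = survivors
    return kept
```

Each pass recomputes the threshold from the surviving values, which is why a value that looked acceptable at first can be flagged on a later pass.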
Frequency Charts (e.g. frequency of house size from real estate)
Multiple Regression
You are trying to find how a dependent variable is related to the independent variables.
You want to check:
To find out, you go to XLMiner - Charts - Matrix Plot - pick all the variables you
So here you will pick sq ft because
it's the continuous variable, whereas beds and baths are discrete
You also know that you have to separate them because beds and baths are
discontinuous (lower right corner)
Add Ins --> XLMiner --> Partition Data --> Standard Partition
There, you say what kind of set you want by putting in a variable:
t (test), s (training), v (validation)
This generates the partitions, and you can use the hyperlinks at the top to
switch between training and validation data
Naive Bayes
Partition Data
So let's say you can't decide which variable to use. Run a regression for each of the
three variables independently (in Excel --> Data --> Data Analysis --> Regression).
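Running one regression per variable and comparing the fits can be sketched as below. This is only an illustration: the helper `simple_regression` is hypothetical, the listings and prices are invented (not from these notes), and it assumes the three variables are sq ft, beds and baths with price as the dependent variable.

```python
def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical listings, invented purely for illustration.
homes = {
    "sq_ft": [1100, 1500, 1800, 2400, 3000],
    "beds": [2, 3, 3, 4, 4],
    "baths": [1, 2, 2, 2, 3],
}
price = [200, 260, 310, 400, 480]  # hypothetical price in $1000s

# One independent regression per candidate variable.
for name, column in homes.items():
    slope, intercept, r2 = simple_regression(column, price)
    print(f"{name}: slope={slope:.3f}, intercept={intercept:.1f}, R^2={r2:.3f}")
```

Comparing R^2 across the three runs is one simple way to see which single variable explains the most variation in the dependent variable.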
you find the "distance" between this new record and the existing records
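The "distance" idea can be sketched as follows. The notes do not say which distance measure is used, so this sketch assumes Euclidean distance. The function names and the parameter `k` are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two records with the same numeric fields."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def nearest_records(new_record, existing_records, k=3):
    """Return the k closest existing records as (distance, index) pairs.

    Assumption: all fields are numeric and already on comparable scales.
    """
    distances = [
        (euclidean_distance(new_record, record), index)
        for index, record in enumerate(existing_records)
    ]
    return sorted(distances)[:k]
```

In practice the variables are usually normalized first, so that a field with large units (such as square feet) does not dominate the distance.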
Classification Tree
Partition Data
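The Partition Data step that comes before each model in these notes can be sketched as a random split into training, validation and test sets. This does not reproduce XLMiner's own procedure: the function name, the 50/30/20 proportions and the seed are assumptions chosen for illustration.

```python
import random


def standard_partition(records, train_frac=0.5, valid_frac=0.3, seed=12345):
    """Randomly split records into training, validation and test sets.

    Assumption: whatever is left after training and validation becomes test.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    return {
        "training": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],
    }
```

The model is fitted on the training set and judged on the validation set, which is why the notes switch between the two sheets after partitioning.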
(values without recoverable labels: 10, 0, 46, 884)

Lift chart data (logit model)

No. of cases   Successes when chosen at random   Successes when logit is used   Decile lift
0              0                                 0
50             5.3                               46
100            10.6                              75                             7.075472
150            15.9                              89
200            21.2                              95                             4.481132
250            26.5                              97
300            31.8                              99                             3.113208
350            37.1                              101
400            42.4                              101                            2.382075
450            47.7                              103
500            53                                103                            1.943396
550            58.3                              104
600            63.6                              104                            1.63522
650            68.9                              104
700            74.2                              105                            1.415094
750            79.5                              105
800            84.8                              105                            1.238208
850            90.1                              106
900            95.4                              106                            1.111111
950            100.7                             106
1000           106                               106                            1

Prior Prob: Success 0.106, Failure 0.894
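The lift figures above can be reproduced from the success counts and the prior probability of success. This is a minimal sketch: the function name `decile_lift` is hypothetical, and the inputs are the cumulative logit successes at every 100 cases together with the 0.106 prior from these notes.

```python
def decile_lift(cumulative_cases, cumulative_successes, prior):
    """Lift = successes found by the model / successes expected at random.

    Successes expected at random = number of cases * prior probability.
    """
    return [
        successes / (cases * prior)
        for cases, successes in zip(cumulative_cases, cumulative_successes)
    ]


# Values taken from the lift chart data in these notes (every 100 cases).
cases = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
logit_successes = [75, 95, 99, 101, 103, 104, 105, 105, 106, 106]
prior_success = 0.106

lifts = decile_lift(cases, logit_successes, prior_success)
for n, lift in zip(cases, lifts):
    print(f"first {n} cases: lift = {lift:.6f}")
```

For the first 100 cases, 75 successes against 10.6 expected at random gives a lift of about 7.08. The lift falls to 1 once all 1000 cases are included, because the model and random selection then find the same 106 successes.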