DCET CSE-Dept

Data Set 1: weather.arff


Preprocess tab: How to use the Data Set editor?
1) Open the WEKA GUI chooser.
2) Click the Explorer button.
0300!" #$ %A&'U$$I(

3) In the Preprocess tab, click the Open file button.
4) Select the path to open the weather.arff data set.
5) Now click the Edit button to open the viewer window containing the table Relation: weather.
6) We can edit the weather.arff dataset using the viewer window.
Dataset: weather.arff Classify tab:
How to classify weather.arff with the J48 decision tree learner?
1) Open the WEKA GUI chooser. Click the Explorer button.
2) Then click Open file and select the dataset weather.arff.
3) Click the Classify tab.
4) Click the Choose button and expand classifiers. Now expand trees and click J48.
5) Right-click trees.J48 and select Visualize tree.
6) The result will be displayed as follows:
7) Right-click trees.J48 and select Save result buffer as follows:
8) Save the file in .arff format as follows:
9) Now open the file you have saved; open it with MS-Word.
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: weather
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
J48 pruned tree
------------------
outlook = sunny
| humidity <= 75: yes (2.0)
| humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
| windy = TRUE: no (2.0)
| windy = FALSE: yes (3.0)
Number of Leaves : 5
Size of the tree : 8
Time taken to build model: 0.02 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances       9     64.2857 %
Incorrectly Classified Instances     5     35.7143 %
Kappa statistic                      0.186
Mean absolute error                  0.2857
Root mean squared error              0.4818
Relative absolute error             60      %
Root relative squared error         97.6586 %
Total Number of Instances           14
=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
0.778    0.6      0.7        0.778   0.737      yes
0.4      0.222    0.5        0.4     0.444      no
=== Confusion Matrix ===
a b   <-- classified as
7 2 | a = yes
3 2 | b = no
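Training-set accuracy and cross-validated accuracy differ for this tree. As a minimal sketch, the pruned tree above can be hand-translated into Python and applied back to the 14 training instances. The data values are typed in from the numeric weather.arff that ships with WEKA (an assumption; compare them with your copy), and `j48_weather` is a hypothetical helper name.

```python
# The 14 instances of the numeric weather data:
# (outlook, temperature, humidity, windy, play).
# Assumed to match the weather.arff file distributed with WEKA.
WEATHER = [
    ("sunny", 85, 85, False, "no"),
    ("sunny", 80, 90, True, "no"),
    ("overcast", 83, 86, False, "yes"),
    ("rainy", 70, 96, False, "yes"),
    ("rainy", 68, 80, False, "yes"),
    ("rainy", 65, 70, True, "no"),
    ("overcast", 64, 65, True, "yes"),
    ("sunny", 72, 95, False, "no"),
    ("sunny", 69, 70, False, "yes"),
    ("rainy", 75, 80, False, "yes"),
    ("sunny", 75, 70, True, "yes"),
    ("overcast", 72, 90, True, "yes"),
    ("overcast", 81, 75, False, "yes"),
    ("rainy", 71, 91, True, "no"),
]


def j48_weather(outlook, humidity, windy):
    """Hand translation of the J48 pruned tree printed above."""
    if outlook == "sunny":
        return "yes" if humidity <= 75 else "no"
    if outlook == "overcast":
        return "yes"
    return "no" if windy else "yes"  # outlook == "rainy"


correct = sum(
    j48_weather(outlook, humidity, windy) == play
    for outlook, _temperature, humidity, windy, play in WEATHER
)
print(f"{correct}/{len(WEATHER)} training instances classified correctly")
```

Every training instance reaches a leaf that predicts its own class (14/14), while the 10-fold cross-validation above classifies only 9 of the 14 correctly, because each fold is tested on instances that its model did not see.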
Dataset: weather.arff Associate tab:
How to apply the Apriori learner?
Load weather.arff, go to the Associate panel and apply the Apriori learner.
1) Open the WEKA GUI chooser and select the Explorer button.
2) Click Open file and select the dataset weather.arff.
3) Click the Associate tab.
4) Now click the Start button; it will display the window below.
Dataset: weather.arff Visualize tab:
Load weather.arff and go to the Visualize tab.
1) Open the WEKA GUI chooser and select the Explorer button.
2) Click Open file and select the dataset weather.arff.
3) Click the Visualize tab.
4) A new window will appear as follows.
5) Now click on the boxes which you wish to check.
Example:
6) You will get the window below.
The end of the First Experiment.
2) Gain insight for running pre-defined decision trees and explore results using MS OLAP Analytics.
The purpose of this experiment is to generate a decision tree for a given data set. We can either write our own dataset or use a predefined data set provided to us; in this case we are using the "labor" dataset.
1) We start off by opening the WEKA Explorer window.
2) Click the Explorer button:
3) In the Preprocess tab, click the Open file button:
4) Select the path to open the labor.arff data set:
5) Once we import the data set, the software itself generates the following:
6) WEKA displays all the attributes of the imported dataset and shows some statistics-based graphs.
7) Click the Classify tab and choose the algorithm (J48) as shown below:
8) After selecting J48, we can see the defaults: the test option is Cross-validation, and in the drop-down box the nominal attribute class is selected as the default class. Click the Start button and an output screen like the one below will be seen.
9) In the result list, right-click trees.J48 and choose the option Visualize tree, as shown below:
10) The output below will be generated:
11) We can save the result buffer as follows:
The result buffer gives information about the TP RATE, FP RATE, PRECISION, RECALL, F-MEASURE and the CONFUSION MATRIX.
12) 1.arff (includes the time taken, decision tree details, confusion matrix and all other details).
Upon opening the 1.arff file we get:
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: labor-neg-data
Instances: 57
Attributes: 17
duration
wage-increase-first-year
wage-increase-second-year
wage-increase-third-year
cost-of-living-adjustment
working-hours
pension
standby-pay
shift-differential
education-allowance
statutory-holidays
vacation
longterm-disability-assistance
contribution-to-dental-plan
bereavement-assistance
contribution-to-health-plan
class
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
J48 pruned tree
------------------
wage-increase-first-year <= 2.5: bad (15.27/2.27)
wage-increase-first-year > 2.5
| statutory-holidays <= 10: bad (10.77/4.77)
| statutory-holidays > 10: good (30.96/1.0)
Number of Leaves : 3
Size of the tree : 5
Time taken to build model: 0.00 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      42     73.6842 %
Incorrectly Classified Instances    15     26.3158 %
Kappa statistic                      0.4415
Mean absolute error                  0.3192
Root mean squared error              0.4669
Relative absolute error             69.7715 %
Root relative squared error         97.8    %
Total Number of Instances           57
=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
0.7      0.243    0.609      0.7     0.651      bad
0.757    0.3      0.824      0.757   0.789      good
=== Confusion Matrix ===
 a  b   <-- classified as
14  6 | a = bad
 9 28 | b = good
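The summary statistics in these outputs can be recomputed from the confusion matrix alone. The sketch below uses only the Python standard library; `summary` is a hypothetical helper, not part of WEKA. It applies Cohen's kappa and the usual precision, recall and F-measure definitions, assuming WEKA's layout of rows as actual classes and columns as predicted classes.

```python
def summary(matrix, labels):
    """Accuracy (%), Cohen's kappa and per-class (precision, recall, F-measure).

    matrix[i][j] counts instances of actual class i predicted as class j,
    the same layout as WEKA's "=== Confusion Matrix ===" block.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(matrix[i]) for i in range(k)]                        # actual
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted
    observed = sum(matrix[i][i] for i in range(k)) / n
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1 - expected)

    per_class = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        precision = tp / col_totals[i] if col_totals[i] else 0.0
        recall = tp / row_totals[i] if row_totals[i] else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        per_class[label] = (round(precision, 3), round(recall, 3), round(f_measure, 3))
    return round(100 * observed, 4), round(kappa, 4), per_class


print(summary([[14, 6], [9, 28]], ["bad", "good"]))  # labor output above
print(summary([[7, 2], [3, 2]], ["yes", "no"]))      # weather, first experiment
```

For the labor matrix this gives 73.6842% and a kappa of 0.4415, with (precision, recall, F-measure) of (0.609, 0.7, 0.651) for bad and (0.824, 0.757, 0.789) for good; the weather matrix gives 64.2857% and 0.186. These agree with the WEKA outputs.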
The end of the Second Experiment.
3) For a given dataset, generate the association rules using WEKA and, based on these association rules, describe which rules are strong and which rules are weak.
The purpose of this experiment is to generate the association rules for a given dataset; in this case we are using the "contact lenses" dataset.
The point to remember here is that association rules can only be generated for nominal attributes (that is, only attributes which have a fixed set of choices, such as "yes, no" or "male, female", etc.). Based on the generated rules, we have to calculate the support and confidence values and then describe which rules are strong or weak.
1) We start off with the WEKA Explorer window.
2) Click the Explorer button:
3) In the Preprocess tab, click the Open file button:
4) Select the path to open the contact lenses dataset:
5) Once we import the data set, the software itself generates the following:
6) WEKA displays all the attributes of the imported dataset and shows some statistics-based graphs.
7) Click the Associate tab:
8) Now click the Start button; it will display the window below:
9) Save the result buffer as follows:
10) Save it with the .arff extension.
11) Open the saved file with MS-Word; it will display the following result:
=== Run information ===
Scheme: weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0
Relation: contact-lenses
Instances: 24
Attributes: 5
age
spectacle-prescrip
astigmatism
tear-prod-rate
contact-lenses
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.2
Minimum metric <confidence>: 0.9
Number of cycles performed: 16
Generated sets of large itemsets:
Size of set of large itemsets L(1): 11
Size of set of large itemsets L(2): 21
Size of set of large itemsets L(3): 6
Best rules found:
1. tear-prod-rate=reduced 12 ==> contact-lenses=none 12 conf:(1)
2. astigmatism=yes tear-prod-rate=reduced 6 ==> contact-lenses=none 6 conf:(1)
3. astigmatism=no tear-prod-rate=reduced 6 ==> contact-lenses=none 6 conf:(1)
4. spectacle-prescrip=hypermetrope tear-prod-rate=reduced 6 ==> contact-lenses=none 6 conf:(1)
5. spectacle-prescrip=myope tear-prod-rate=reduced 6 ==> contact-lenses=none 6 conf:(1)
6. contact-lenses=soft 5 ==> astigmatism=no tear-prod-rate=normal 5 conf:(1)
7. astigmatism=no contact-lenses=soft 5 ==> tear-prod-rate=normal 5 conf:(1)
8. tear-prod-rate=normal contact-lenses=soft 5 ==> astigmatism=no 5 conf:(1)
9. contact-lenses=soft 5 ==> tear-prod-rate=normal 5 conf:(1)
10. contact-lenses=soft 5 ==> astigmatism=no 5 conf:(1)
Confidence values are already given. Support values must be calculated by dividing the count in each rule by the total number of instances.
For example, the 1st rule says that there are 12 instances where the value of the tear-prod-rate attribute is "reduced", and for all of them the value of the contact-lenses attribute is "none", so the support is calculated as 12 divided by the total number of instances, which is 24.
Hence, 12/24 = 0.5; thus the support for rule 1 is 0.5 and the confidence is 1.
Likewise, support and confidence values should be calculated for all the rules. Based on the question given, we can decide whether each rule is strong or weak. (The question would state, for example, that rules with a support value of 0.5 or above and a confidence value of 1 are strong rules; based on such a question we must calculate the values and demonstrate which rules are strong and which rules are weak.)
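The support and confidence calculation above can be checked against the data itself. The Python sketch below rebuilds the 24 contact-lenses instances as every combination of the four attribute values; `lens_class` is a hypothetical helper assumed to reproduce the class column of the contact-lenses.arff file shipped with WEKA. The 0.5 and 1.0 thresholds in `is_strong` are only the example thresholds mentioned above, not fixed values.

```python
from itertools import product


def lens_class(age, spec, astig, tear):
    """Recommended lenses for one patient (assumed to match contact-lenses.arff)."""
    if tear == "reduced":
        return "none"
    if astig == "no":
        return "none" if (age, spec) == ("presbyopic", "myope") else "soft"
    if spec == "myope":
        return "hard"
    return "hard" if age == "young" else "none"


# The 24 instances: every combination of the four attribute values.
ROWS = [
    {"age": a, "spectacle-prescrip": s, "astigmatism": x,
     "tear-prod-rate": t, "contact-lenses": lens_class(a, s, x, t)}
    for a, s, x, t in product(["young", "pre-presbyopic", "presbyopic"],
                              ["myope", "hypermetrope"],
                              ["no", "yes"], ["reduced", "normal"])
]


def count(items, rows=ROWS):
    """Number of instances containing every attribute=value pair in items."""
    return sum(all(r[k] == v for k, v in items.items()) for r in rows)


def support_confidence(body, head, rows=ROWS):
    """support = count(body and head) / N; confidence = count(body and head) / count(body)."""
    both = count({**body, **head}, rows)
    return both / len(rows), both / count(body, rows)


def is_strong(support, confidence, min_support=0.5, min_confidence=1.0):
    return support >= min_support and confidence >= min_confidence


RULES = [
    ({"tear-prod-rate": "reduced"}, {"contact-lenses": "none"}),                       # rule 1
    ({"astigmatism": "yes", "tear-prod-rate": "reduced"}, {"contact-lenses": "none"}),  # rule 2
    ({"contact-lenses": "soft"}, {"astigmatism": "no"}),                               # rule 10
]
for body, head in RULES:
    s, c = support_confidence(body, head)
    print(body, "==>", head, f"support={s:.3f} confidence={c:.1f}",
          "strong" if is_strong(s, c) else "weak")
```

Rule 1 comes out with support 0.5 and confidence 1.0 (strong under these example thresholds); rule 2 has support 0.25 and rule 10 about 0.208, both with confidence 1.0, so they would be classed as weak.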
The end of the Third Experiment.
4) Generate a report using the Impromptu tool of Cognos using attributes from the provided tables.
Using Standard Report:
The purpose of this experiment is to show the generation of a report using the Impromptu tool of Cognos using the tables provided.
We start Cognos as follows:
1) Click Start > Programs > IBM Cognos 7 version 5 > IBM Cognos Impromptu.
You will get the page below:

2) Now select Standard report:

3) Next, select gosales as mentioned below:
4) Then click Open; you will get the page mentioned below:

5) You can select anything from the list; here we are selecting 'systems' and clicking 'ok'.

6) By default, the title of the report is 'type here your title'; rename it as you like. In this case I am giving 'dcet'.
7) Then click Next.
8) You will get the following page:
9) Select any one of the report types; in this case 'list report'.
10) Then click Next.
11) Select country, then any one attribute from the list; after you select it, the 'Add' button will be enabled. Then click 'Add'.
12) After clicking Add, you will get the following page.
13) Now select products from the list and any attribute of the product; in this case 'product name'.
14) Now select orders from the list and any attribute of the orders; in this case 'order number'.
15) Now select 'sales staff' from the list and any attribute of the 'sales staff'; in this case 'sales staff code'.
16) If you want, you can take any number of attributes; then click Next and you will get the following page:
17) The above page shows that the output of your report will be in this format.
18) Now click Finish.
The end of the Fourth Experiment.
5) Generate a report using the Impromptu tool of Cognos using attributes from the provided tables.
Using Open an Existing Report:
The purpose of this experiment is to show the generation of a report using the Impromptu tool of Cognos using the tables provided.
We start Cognos as follows:
1) Click Start > Programs > IBM Cognos 7 version 5 > IBM Cognos Impromptu.
You will get the page below:
2. Select Open an existing report as follows:
3. Select any one of the files from the list as shown below:
4. Then click Open:
5. Select systems and click OK:
6. You will get another page showing the countries; select any one:
7. After selecting a country, click OK.
8. The report will be generated with the available attributes for that country.
9. Save the above page as shown below:
10. Give any file name with the file extension '.imr'.
11. Click Save; the report will be saved with that file name as shown below:
The end of the Fifth Experiment.
6) Generate a report using the Impromptu tool of Cognos using attributes from the provided tables.
Create a Report using a Template:
The purpose of this experiment is to show the generation of a report using the Impromptu tool of Cognos using the tables provided.
We start Cognos as follows:
1) Click Start > Programs > IBM Cognos 7 version 5 > IBM Cognos Impromptu.
You will get the page below:
2. Select Create a report using a template:
3. Select any template from the list and click OK; in this case 'simple list'.
4. You will get the following page; select 'gosales' and click Open.
5. Select systems as shown below.
6. Select one attribute from the list:
7. Then select product as shown below:
8. Then select order number:
9. Then select sales staff code:
10. Click OK; you will get the report as shown below:
11. Change the title in your report to 'cse'.
12. Now save the report as 22.imr (you can give any name).
13. The final report is as follows:
The end of the Sixth Experiment.
7) Load data from heterogeneous sources, including text files, into a predefined warehouse schema. Case Study:
The purpose of this experiment is to draw a warehouse schema using a case study.
We have three different schemas, namely: star schema, snowflake schema and fact constellation schema.
(a) Star schema:
The All Electronics sales case study is about storing and maintaining the items available and keeping track of the time_key, branch_key and location records.
Figure (star schema): the sales fact table in the centre, linked to the time, item, branch and location dimension tables.

The above star schema for "All Electronics sales" specifies one fact table: "sales".
The sales table has four dimensions: time_key, item_key, branch_key, location_key,
and two measures: dollars_sold, units_sold.
This schema allows the dimension keys to be shared with the fact table.
Notice that in the star schema, each dimension is represented by only one table, and each table contains a set of attributes.
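To make the star layout concrete, here is a minimal Python sketch using the standard sqlite3 module. It is an illustration, not the lab's prescribed schema: only a few columns are kept, time.quarter and location.city are assumed columns (the attributes of those two tables are not listed above), and all rows are hypothetical.

```python
import sqlite3

# Star schema: one fact table whose foreign keys point directly at four
# dimension tables. The columns of time and location are assumptions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE time     (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT, supplier_type TEXT);
CREATE TABLE branch   (branch_key INTEGER PRIMARY KEY, branch_name TEXT,
                       branch_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales    (time_key INTEGER REFERENCES time,
                       item_key INTEGER REFERENCES item,
                       branch_key INTEGER REFERENCES branch,
                       location_key INTEGER REFERENCES location,
                       dollars_sold REAL, units_sold INTEGER);
-- Hypothetical sample rows.
INSERT INTO time     VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO item     VALUES (1, 'TV', 'Sony', 'video', 'wholesale');
INSERT INTO branch   VALUES (1, 'B1', 'retail');
INSERT INTO location VALUES (1, 'Vancouver');
INSERT INTO sales    VALUES (1, 1, 1, 1, 500.0, 2), (2, 1, 1, 1, 750.0, 3);
""")

# Dollars and units sold per quarter and branch: one join per dimension.
rows = con.execute("""
    SELECT t.quarter, b.branch_name, SUM(s.dollars_sold), SUM(s.units_sold)
    FROM sales AS s
    JOIN time   AS t ON s.time_key = t.time_key
    JOIN branch AS b ON s.branch_key = b.branch_key
    GROUP BY t.quarter, b.branch_name
    ORDER BY t.quarter
""").fetchall()
print(rows)
```

Every dimension is reached with a single join from the sales fact table, which is what keeps star-schema queries simple.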
Table columns in the star schema figure: sales (time_key, item_key, branch_key, location_key, dollars_sold, units_sold); item (item_key, item_name, brand, type, supplier_type); branch (branch_key, branch_name, branch_type).
(b) Snowflake schema:
The snowflake schema is a variant of the star schema model, where some dimension tables are normalized, thereby further splitting the data into additional tables. The resulting schema graph forms a shape similar to a snowflake.
Figure (snowflake schema): the sales fact table linked to the time, item, branch and location dimension tables; the item table is further linked to a supplier dimension table and the location table to a city dimension table.
The major difference between the snowflake and star schema models is that the dimension tables of the snowflake model may be kept in normalized form to reduce redundancies. Such a table is easy to maintain and saves storage space.
The above snowflake schema for "All Electronics sales" specifies one fact table: "sales".
The sales table has four dimensions: time_key, item_key, branch_key, location_key,
and two measures: dollars_sold, units_sold.
This schema allows the dimension keys to be shared with the fact table.
The snowflake schema is not as popular as the star schema in data warehouse design.
Table columns in the snowflake schema figure: item (item_key, item_name, brand, type, supplier_key); supplier (supplier_key, supplier_type); branch (branch_key, branch_name, branch_type); location (location_key, street, city_key); city (city_key, city, province_or_state, country).
The main difference between the two schemas is in the definition of dimension tables.
The single dimension table for item in the star schema is normalized in the snowflake schema, resulting in new item and supplier tables.
Notice that further normalization can be performed on province_or_state and country in the snowflake schema shown in the above diagram.
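In the snowflake version, the same kind of question needs extra joins, because supplier and city details have moved into their own tables. A minimal sqlite3 sketch with hypothetical rows; for brevity the fact table keeps only the item and location keys, so this is a cut-down illustration rather than the full schema in the figure.

```python
import sqlite3

# Snowflake variant: item and location are normalized, so supplier and city
# details live in separate tables. Rows are hypothetical; for brevity the
# fact table keeps only the item and location keys.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT, supplier_key INTEGER REFERENCES supplier);
CREATE TABLE city     (city_key INTEGER PRIMARY KEY, city TEXT,
                       province_or_state TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city);
CREATE TABLE sales    (item_key INTEGER REFERENCES item,
                       location_key INTEGER REFERENCES location,
                       dollars_sold REAL, units_sold INTEGER);
INSERT INTO supplier VALUES (1, 'wholesale');
INSERT INTO item     VALUES (1, 'TV', 'Sony', 'video', 1), (2, 'Radio', 'Sony', 'audio', 1);
INSERT INTO city     VALUES (1, 'Vancouver', 'British Columbia', 'Canada');
INSERT INTO location VALUES (1, 'Main St', 1), (2, 'Oak St', 1);
INSERT INTO sales    VALUES (1, 1, 500.0, 2), (2, 2, 120.0, 4);
""")

# Dollars sold per supplier type and country: each normalized dimension
# costs one extra join (item to supplier, location to city).
rows = con.execute("""
    SELECT sp.supplier_type, c.country, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN item     AS i  ON s.item_key = i.item_key
    JOIN supplier AS sp ON i.supplier_key = sp.supplier_key
    JOIN location AS l  ON s.location_key = l.location_key
    JOIN city     AS c  ON l.city_key = c.city_key
    GROUP BY sp.supplier_type, c.country
""").fetchall()
print(rows)
```

The reduction in redundancy is paid for with these additional joins at query time.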
(c) Fact constellation schema:
Sophisticated applications may require multiple fact tables to share dimension tables.
This kind of schema can be viewed as a collection of stars and hence is called a galaxy schema or a fact constellation.
Figure (fact constellation schema): the sales and shipping fact tables, linked to the time, item, branch, location and shipper dimension tables.
The above fact constellation schema for "All Electronics sales" specifies two fact tables: "sales" and "shipping".
The sales table has four dimensions: time_key, item_key, branch_key, location_key,
and two measures: dollars_sold, units_sold.
The shipping fact table has five dimensions: time_key, item_key, shipper_key, from_location, to_location,
and two measures: dollars_sold, units_sold.
Table columns in the fact constellation figure: shipping (time_key, item_key, shipper_key, from_location, to_location, dollars_sold, units_sold); shipper (shipper_key, shipper_name, location_key, shipper_type); location (location_key, street, city, province_or_state, country).
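With a fact constellation, two fact tables share a dimension table. In this minimal sqlite3 sketch (hypothetical rows; only the shared item dimension is created), sales and shipping are summarized per item. Each fact table is aggregated separately in a subquery, because joining both fact tables to item in one query would repeat each sales row once per matching shipping row and inflate the sums.

```python
import sqlite3

# Fact constellation: sales and shipping are two fact tables that share the
# item dimension. Rows are hypothetical; the other dimensions are omitted.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE item     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales    (item_key INTEGER REFERENCES item,
                       dollars_sold REAL, units_sold INTEGER);
CREATE TABLE shipping (item_key INTEGER REFERENCES item, shipper_key INTEGER,
                       from_location INTEGER, to_location INTEGER,
                       dollars_sold REAL, units_sold INTEGER);
INSERT INTO item     VALUES (1, 'TV'), (2, 'Radio');
INSERT INTO sales    VALUES (1, 500.0, 2), (1, 750.0, 3), (2, 120.0, 4);
INSERT INTO shipping VALUES (1, 1, 1, 2, 40.0, 5), (2, 1, 2, 1, 10.0, 3);
""")

# Aggregate each fact table separately, then line both up on the shared
# item dimension.
rows = con.execute("""
    SELECT i.item_name,
           (SELECT SUM(s.units_sold)  FROM sales AS s     WHERE s.item_key = i.item_key),
           (SELECT SUM(sh.units_sold) FROM shipping AS sh WHERE sh.item_key = i.item_key)
    FROM item AS i
    ORDER BY i.item_key
""").fetchall()
print(rows)
```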
The schema allows the dimension tables to be shared among the fact tables.
The end of the Seventh Experiment.
8) Generate the German credit dataset in ARFF (Attribute-Relation File Format) format:
@relation german_credit
@attribute checking_status { '<0', '0<=X<200', '>=200', 'no checking'}
@attribute duration real
@attribute credit_history { 'no credits/all paid', 'all paid', 'existing paid', 'delayed previously', 'critical/other existing credit'}
@attribute purpose { 'new car', 'used car', furniture/equipment, radio/tv, 'domestic appliance', repairs, education, vacation, retraining, business, other}
@attribute credit_amount real
@attribute savings_status { '<100', '100<=X<500', '500<=X<1000', '>=1000', 'no known savings'}
@attribute employment { unemployed, '<1', '1<=X<4', '4<=X<7', '>=7'}
@attribute installment_commitment real
@attribute personal_status { 'male div/sep', 'female div/dep/mar', 'male single', 'male mar/wid', 'female single'}
@attribute other_parties { none, 'co applicant', guarantor}
@attribute residence_since real
@attribute property_magnitude { 'real estate', 'life insurance', car, 'no known property'}
@attribute age real
@attribute other_payment_plans { bank, stores, none}
@attribute housing { rent, own, 'for free'}
@attribute existing_credits real
@attribute job { 'unemp/unskilled non res', 'unskilled resident', skilled, 'high qualif/self emp/mgmt'}
@attribute num_dependents real
@attribute own_telephone { none, yes}
@attribute foreign_worker { yes, no}
@attribute class { good, bad}
@data
'<0',6,'critical/other existing credit',radio/tv,1169,'no known savings','>=7',4,'male single',none,4,'real estate',67,none,own,2,skilled,1,yes,yes,good
'0<=X<200',48,'existing paid',radio/tv,5951,'<100','1<=X<4',2,'female div/dep/mar',none,2,'real estate',22,none,own,1,skilled,1,none,yes,bad
'no checking',12,'critical/other existing credit',education,2096,'<100','4<=X<7',2,'male single',none,3,'real estate',49,none,own,1,'unskilled resident',2,none,yes,good
'<0',42,'existing paid',furniture/equipment,7882,'<100','4<=X<7',2,'male single',guarantor,4,'life insurance',45,none,'for free',1,skilled,2,none,yes,good
'<0',24,'delayed previously','new car',4870,'<100','1<=X<4',3,'male single',none,4,'no known property',53,none,'for free',2,skilled,2,none,yes,bad
'no checking',36,'existing paid',education,9055,'no known savings','1<=X<4',2,'male single',none,4,'no known property',35,none,'for free',1,'unskilled resident',2,yes,yes,good
Subtasks (Turn in your answers to the following tasks)
1. List all the categorical (or nominal) attributes and the real-valued attributes separately.
From the German Credit Assessment Case Study given to us, the following attributes are found to be applicable for Credit-Risk Assessment:
Total Valid Attributes:
1. checking_status
2. duration
3. credit history
4. purpose
5. credit amount
6. savings_status
7. employment duration
8. installment rate
9. personal status
10. debtors
11. residence_since
12. property
13. age
14. installment plans
15. housing
16. existing credits
17. job
18. num_dependents
19. telephone
20. foreign worker
Categorical or Nominal attributes (which take True/False, Yes/No, etc. values):
1. checking_status
2. credit history
3. purpose
4. savings_status
5. employment
6. personal status
7. debtors
8. property
9. installment plans
10. housing
11. job
12. telephone
13. foreign worker
Real-valued attributes:
1. duration
2. credit amount
3. installment commitment
4. residence
5. age
6. existing credits
7. num_dependents
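Subtask 1 can also be checked mechanically: in ARFF, a nominal attribute is declared with a {...} list of values and a real-valued one with the type real. The Python sketch below parses the header listed above; `split_attributes` is a hypothetical helper, and the parsing assumes simple unquoted attribute names without spaces, as in this file.

```python
# Attribute declarations of the German credit ARFF header listed above.
HEADER = """\
@relation german_credit
@attribute checking_status { '<0', '0<=X<200', '>=200', 'no checking'}
@attribute duration real
@attribute credit_history { 'no credits/all paid', 'all paid', 'existing paid', 'delayed previously', 'critical/other existing credit'}
@attribute purpose { 'new car', 'used car', furniture/equipment, radio/tv, 'domestic appliance', repairs, education, vacation, retraining, business, other}
@attribute credit_amount real
@attribute savings_status { '<100', '100<=X<500', '500<=X<1000', '>=1000', 'no known savings'}
@attribute employment { unemployed, '<1', '1<=X<4', '4<=X<7', '>=7'}
@attribute installment_commitment real
@attribute personal_status { 'male div/sep', 'female div/dep/mar', 'male single', 'male mar/wid', 'female single'}
@attribute other_parties { none, 'co applicant', guarantor}
@attribute residence_since real
@attribute property_magnitude { 'real estate', 'life insurance', car, 'no known property'}
@attribute age real
@attribute other_payment_plans { bank, stores, none}
@attribute housing { rent, own, 'for free'}
@attribute existing_credits real
@attribute job { 'unemp/unskilled non res', 'unskilled resident', skilled, 'high qualif/self emp/mgmt'}
@attribute num_dependents real
@attribute own_telephone { none, yes}
@attribute foreign_worker { yes, no}
@attribute class { good, bad}
"""


def split_attributes(arff_header, class_name="class"):
    """Return (nominal, numeric) attribute names from an ARFF header.

    A {...} value list means nominal; any other type (real, numeric,
    integer) is treated as numeric. The class attribute is skipped.
    """
    nominal, numeric = [], []
    for line in arff_header.splitlines():
        if not line.lower().startswith("@attribute"):
            continue
        _, name, attr_type = line.split(None, 2)
        if name == class_name:
            continue
        if attr_type.lstrip().startswith("{"):
            nominal.append(name)
        else:
            numeric.append(name)
    return nominal, numeric


nominal, numeric = split_attributes(HEADER)
print(len(nominal), "nominal:", nominal)
print(len(numeric), "real-valued:", numeric)
```

It finds 13 nominal and 7 real-valued input attributes, in line with the lists above.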
2. What attributes do you think might be crucial in making the credit assessment? Come up with some simple rules in plain English using your selected attributes.
In my view, the following attributes may be crucial in making the credit risk assessment:
1. Credit_history
2. Employment
3. Property_magnitude
4. Job
5. Duration
6. Credit_amount
7. Installment
8. Existing credit
Based on the above attributes, we can make a decision whether to give credit or not.
3. One type of model that you can create is a Decision Tree: train a Decision Tree using the complete dataset as the training data. Report the model obtained after training.
A decision tree is a flow-chart-like tree structure where each internal (non-leaf) node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf (terminal) node holds a class label.
Decision trees can be easily converted into classification rules.
Well-known decision tree algorithms include ID3, C4.5 and CART.
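J48 is WEKA's implementation of C4.5, which chooses the attribute to test at each internal node with an entropy-based measure. The Python sketch below computes the information gain and C4.5's gain ratio for the outlook attribute of the weather data from the first experiment. The (outlook, play) pairs are typed in from that dataset, the helper names are illustrative rather than WEKA functions, and details of the real algorithm (numeric thresholds, missing values, pruning) are left out.

```python
from collections import Counter
from math import log2

# (outlook, play) for the 14 instances of the weather data.
OUTLOOK_PLAY = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]


def entropy(values):
    """Entropy, in bits, of the distribution of values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def info_gain(rows, attr=0, cls=-1):
    """Class entropy minus the weighted class entropy after splitting on attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[cls])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[cls] for row in rows]) - remainder


def gain_ratio(rows, attr=0, cls=-1):
    """C4.5's criterion: information gain divided by the split information."""
    return info_gain(rows, attr, cls) / entropy([row[attr] for row in rows])


print(round(info_gain(OUTLOOK_PLAY), 3), round(gain_ratio(OUTLOOK_PLAY), 3))
```

Outlook has an information gain of about 0.247 bits and a gain ratio of about 0.156.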
J48 pruned tree
1. Using the WEKA tool, we can generate a decision tree by selecting the "Classify" tab.
2. In the Classify tab, use the Choose option, where a list of different decision trees is available. From that list, select J48.
3. Now, under Test options, select the "Use training set" option.
4. The resulting window in WEKA is as follows:
5. To generate the decision tree, right-click on the result list and select the Visualize tree option; the decision tree will then be displayed.
6. The decision tree obtained for credit risk assessment is too large to fit on the screen.
7. The decision tree above is unclear due to the large number of attributes.
4. Suppose you use your above model trained on the complete dataset, and classify credit good/bad for each of the examples in the dataset. What % of examples can you classify correctly? (This is also called testing on the training set.) Why do you think you cannot get 100% training accuracy?
In the above model, we trained on the complete dataset and classified credit good/bad for each of the examples in the dataset.
For example:
IF purpose = vacation THEN
    credit = bad
ELSE IF purpose = business THEN
    credit = good
In this way we classified each of the examples in the dataset.
We classified 85.5% of the examples correctly, and the remaining 14.5% of the examples were incorrectly classified. We can't get 100% training accuracy because J48 does not grow the tree until every training instance is fitted: it prunes the tree (the scheme options -C 0.25 -M 2 set the pruning confidence factor and a minimum of two instances per leaf), so some leaves still contain a few training instances of the other class. In addition, if two instances have identical attribute values but different classes, no tree can classify both of them correctly.
5. Is testing on the training set as you did above a good idea? Why or why not?
No. When the model is tested on the same instances it was trained on, it has already seen every answer, so the training-set accuracy (85.5% here) is an optimistic estimate and says little about how well the model will classify new applicants. A common rule of thumb is to take 2/3 of the dataset as the training set and the remaining 1/3 as the test set, or to use cross-validation; with 10-fold cross-validation, the accuracy of J48 on this dataset falls to 70.5% (see problem 6).
This is why we prefer not to use the complete dataset as both the training set and the test set.
Use Training Set Result for the table GermanCreditData:
Correctly Classified Instances     855     85.5    %
Incorrectly Classified Instances   145     14.5    %
Kappa statistic                      0.6251
Mean absolute error                  0.2312
Root mean squared error              0.34
Relative absolute error             55.0377 %
Root relative squared error         74.2015 %
Total Number of Instances         1000
6. One approach for solving the problem encountered in the previous question is using cross-validation. Describe briefly what cross-validation is. Train a Decision Tree again using cross-validation and report your results. Does your accuracy increase/decrease? Why?
Cross validation:
In k-fold cross-validation, the initial data are randomly partitioned into k mutually exclusive subsets or folds, D1, D2, D3, ..., Dk, each of approximately equal size. Training and testing are performed k times. In iteration i, partition Di is reserved as the test set, and the remaining partitions are collectively used to train the model.
That is, in the first iteration, subsets D2, D3, ..., Dk collectively serve as the training set to obtain the first model, which is tested on D1. The second model is trained on subsets D1, D3, ..., Dk and tested on D2, and so on.
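The partitioning described above can be sketched in a few lines of Python. This is an illustration under simple assumptions: `k_fold_splits` is a hypothetical helper, the folds are formed by shuffling the instance indices and dealing them out round-robin, and, unlike WEKA's default stratified cross-validation, it does not keep the class proportions equal in every fold.

```python
import random


def k_fold_splits(n_instances, k, seed=1):
    """Yield (train, test) index lists for plain (unstratified) k-fold CV."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)        # random partitioning
    folds = [indices[i::k] for i in range(k)]   # k disjoint, near-equal folds D1..Dk
    for i in range(k):
        test = folds[i]                         # Di is held out in iteration i
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_splits(1000, 10))
print(len(splits), [len(test) for _, test in splits])
```

With 1000 instances and k = 10, every fold holds 100 test instances, and each instance is tested exactly once.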
1. Select the Classify tab and the J48 decision tree, and in Test options select the Cross-validation radio button with the number of folds set to 10.
2. The number of folds indicates the number of partitions into which the instances are divided.
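3. The kappa statistic measures the agreement between the predicted and actual classes after correcting for the agreement expected by chance. A value of 1 means perfect agreement (100% accuracy, with all errors zero), and a value near 0 means the classifier does no better than chance; in reality, no classifier reaches a kappa of 1 on this dataset.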
Cross Validation Result at folds: 10 for the table GermanCreditData:
Correctly Classified Instances     705     70.5    %
Incorrectly Classified Instances   295     29.5    %
Kappa statistic                      0.2467
Mean absolute error                  0.3467
Root mean squared error              0.4796
Relative absolute error             82.5233 %
Root relative squared error        104.6565 %
Total Number of Instances         1000
Here there are 1000 instances, with 100 instances per partition. Compared with the 85.5% obtained by testing on the training set, the accuracy decreases to 70.5%, because each instance is now classified by a model that did not see it during training.
Cross Validation Result at folds: 2 for the table GermanCreditData:
Correctly Classified Instances     698     69.8    %
Incorrectly Classified Instances   302     30.2    %
Kappa statistic                      0.2264
Mean absolute error                  0.3571
Root mean squared error              0.4883
Total Number of Instances         1000
Cross Validation Result at folds: 5 for the table GermanCreditData:
Correctly Classified Instances     709     70.9    %
Incorrectly Classified Instances   291     29.1    %
Total Number of Instances         1000
Cross Validation Result at folds: 100 for the table GermanCreditData:
Correctly Classified Instances     710     71      %
Incorrectly Classified Instances   290     29      %
Mean absolute error                  0.3444
Root mean squared error              0.4771
Root relative squared error        104.1164 %
Total Number of Instances         1000
Percentage split does not allow 100%; it allows only up to 99.9%.
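The test-set sizes in the two percentage-split results (500 instances at 50%, a single instance at 99.9%) follow from how the 1000 instances are divided. A minimal Python sketch, assuming the training size is the given percentage of the instances rounded to a whole number (how exactly WEKA rounds is an assumption here); `split_sizes` is a hypothetical helper.

```python
def split_sizes(n_instances, percentage):
    """(train, test) sizes for a percentage split, rounding the training size."""
    train = round(n_instances * percentage / 100)
    return train, n_instances - train


for pct in (50, 66, 99.9, 100):
    print(pct, split_sizes(1000, pct))
```

At 99.9% only one instance is left for testing, and at 100% the test set would be empty, so there would be nothing to evaluate.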
Percentage Split Result at 50%:
Correctly Classified Instances     362     72.4    %
Incorrectly Classified Instances   138     27.6    %
Kappa statistic                      0.2725
Mean absolute error                  0.3225
Root mean squared error              0.4764
Root relative squared error        106.4373 %
Total Number of Instances          500
Percentage Split Result at 99.9%:
Correctly Classified Instances       0      0      %
Incorrectly Classified Instances     1    100      %
Kappa statistic                      0
Mean absolute error                  0.6667
Root mean squared error              0.6667
Total Number of Instances            1

7. Check to see if the data shows a bias against "foreign workers" (attribute 20) or "personal-status" (attribute 9). One way to do this (perhaps rather simple-minded) is to remove these attributes from the dataset and see if the decision tree created in those cases is significantly different from the full dataset case, which you have already done. To remove an attribute, you can use the Preprocess tab in WEKA's GUI Explorer. Did removing these attributes have any significant effect? Discuss.

Removing these attributes increases the accuracy, because the two attributes "foreign workers" and "personal status" are not very important in training and analyzing. Removing them also reduces the training time to some extent.
The decision tree created from the full dataset is very large compared to the decision tree we have trained now. This is the main difference between these two decision trees.
After foreign worker is removed, the accuracy increases to 85.9%.
If we also remove the 9th attribute, the accuracy increases further to 86.6%, which shows that these two attributes are not significant for training.
Cross validation after removing the 9th attribute.
Percentage split after removing the 9th attribute.
After removing the 20th attribute, the cross validation is as above.
After removing the 20th attribute, the percentage split is as above.
8. Another question might be: do you really need to input so many attributes to get good results? Maybe only a few would do. For example, you could try just having attributes 2, 3, 5, 7, 10, 17 (and 21, the class attribute (naturally)). Try out some combinations. (You had removed two attributes in problem 7. Remember to reload the ARFF data file to get all the attributes initially before you start selecting the ones you want.)
Select attributes 2, 3, 5, 7, 10, 17 and 21, click Invert to select the remaining attributes, and remove them.
Here the accuracy decreases.
Select random attributes and then check the accuracy.
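WEKA numbers attributes from 1 in the order of the ARFF header, so the suggested subset can be translated into attribute names. A small Python sketch; the name list is copied from the German credit header above, and `pick` is a hypothetical helper.

```python
# Attribute names in header order (WEKA numbers them from 1).
GERMAN_CREDIT_ATTRIBUTES = [
    "checking_status", "duration", "credit_history", "purpose",
    "credit_amount", "savings_status", "employment", "installment_commitment",
    "personal_status", "other_parties", "residence_since", "property_magnitude",
    "age", "other_payment_plans", "housing", "existing_credits", "job",
    "num_dependents", "own_telephone", "foreign_worker", "class",
]


def pick(numbers, names=GERMAN_CREDIT_ATTRIBUTES):
    """Translate 1-based WEKA attribute numbers into attribute names."""
    return [names[n - 1] for n in numbers]


KEEP = [2, 3, 5, 7, 10, 17, 21]
removed = [n for n in range(1, len(GERMAN_CREDIT_ATTRIBUTES) + 1) if n not in KEEP]
print(pick(KEEP))
print(removed)
```

Attributes 2, 3, 5, 7, 10, 17 and 21 are duration, credit_history, credit_amount, employment, other_parties, job and class; the remaining 14 numbers are the attributes removed in the next step.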
After removing the attributes 1, 4, 6, 8, 9, 11, 12, 13, 14, 15, 16, 18, 19 and 20, we select the left-over attributes and visualize them.
After we remove 14 attributes, the accuracy decreases to 76.4%; hence we can try further random combinations of attributes to increase the accuracy.
Cross validation
Percentage split
9. Sometimes, the cost of rejecting an applicant who actually has good credit (case 1) might be higher than accepting an applicant who has bad credit (case 2). Instead of counting the misclassifications equally in both cases, give a higher cost to the first case (say cost 5) and a lower cost to the second case. You can do this by using a cost matrix in WEKA. Train your Decision Tree again and report the Decision Tree and cross-validation results. Are they significantly different from the results obtained in problem 6 (using equal cost)?
In Problem 6, we used equal costs and trained the decision tree. Here, we consider two cases with different costs: let us take cost 5 in case 1 and cost 2 in case 2. When we give these costs and train the decision tree again, we observe that the tree is almost the same as the decision tree obtained in problem 6.

                Case 1    Case 2
Total cost      320       1700
Average cost    3.2       1.700

We do not find this cost factor in problem 6, because there we used equal costs. This is the major difference between the results of problem 6 and problem 9.
The cost matrices we used here:
Case 1:   5 1
          1 5

Case 2:   2 1
          1 2
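The Total Cost and Average Cost figures come from weighting each cell of the confusion matrix by the matching cell of the cost matrix. A minimal Python sketch, assuming (as we understand Weka's convention) that rows are actual classes and columns are predicted classes. The confusion-matrix counts in the example are made up for illustration, with class order (good, bad), and the cost matrix is the asymmetric one the question describes: rejecting a good applicant costs 5, accepting a bad one costs 1.

```python
from typing import Sequence, Tuple


def total_and_average_cost(
    confusion: Sequence[Sequence[float]], cost: Sequence[Sequence[float]]
) -> Tuple[float, float]:
    """Return (total cost, average cost per instance).

    confusion[i][j]: instances of actual class i predicted as class j.
    cost[i][j]: cost of predicting class j when the actual class is i.
    """
    if len(confusion) != len(cost) or any(
        len(r) != len(c) for r, c in zip(confusion, cost)
    ):
        raise ValueError("confusion and cost matrices must have the same shape")
    total = sum(n * c for row_n, row_c in zip(confusion, cost)
                for n, c in zip(row_n, row_c))
    count = sum(sum(row) for row in confusion)
    return total, total / count


if __name__ == "__main__":
    # Hypothetical counts; class order (good, bad).
    confusion = [[600, 100],   # actual good: 600 accepted, 100 rejected
                 [150, 150]]   # actual bad: 150 accepted, 150 rejected
    cost = [[0, 5],            # rejecting a good applicant costs 5
            [1, 0]]            # accepting a bad applicant costs 1
    print(total_and_average_cost(confusion, cost))  # (650, 0.65)
```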
1. Select the Classify tab.
2. Click More options in the Test options panel.
3. Tick Cost-sensitive evaluation and click Set.
4. Set Classes to 2.
5. Click Resize to get the cost matrix.
6. Change the 2nd entry in the 1st row and the 2nd entry in the 1st column to 5.0.
7. The confusion matrix will then be generated, and you can find out the difference between the good and bad classes.
8. Check whether the accuracy changes.
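As far as we understand Weka, ticking Cost-sensitive evaluation only changes how the results are scored, not how the tree is learned, which would explain why the tree stays almost the same. One way to make the predictions themselves cost-sensitive (the idea behind meta-classifiers such as Weka's CostSensitiveClassifier when it is set to minimise expected cost) is to predict the class with the lowest expected cost. A minimal Python sketch; in practice the class probabilities would come from the tree's leaf distributions, and the numbers below are illustrative.

```python
from typing import Sequence


def min_expected_cost_class(
    probs: Sequence[float], cost: Sequence[Sequence[float]]
) -> int:
    """Index j that minimises sum_i probs[i] * cost[i][j].

    probs[i]: estimated probability that the true class is i.
    cost[i][j]: cost of predicting class j when the true class is i.
    """
    k = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(k)) for j in range(k)]
    return min(range(k), key=lambda j: expected[j])


if __name__ == "__main__":
    probs = [0.3, 0.7]            # illustrative: P(good) = 0.3, P(bad) = 0.7
    equal = [[0, 1], [1, 0]]
    skewed = [[0, 5], [1, 0]]     # rejecting a good applicant costs 5
    print(min_expected_cost_class(probs, equal))   # 1 (bad)
    print(min_expected_cost_class(probs, skewed))  # 0 (good)
```

With equal costs the more probable class wins; with the skewed matrix, predicting "bad" has expected cost 1.5 against 0.7 for "good", so the decision flips.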
10. Do you think it is a good idea to prefer simple decision trees instead of having long, complex decision trees? How does the complexity of a Decision Tree relate to the bias of the model?
When we consider long, complex decision trees, the tree contains many unnecessary attributes and tests that fit noise in the training data. Such a tree has low bias but high variance: it overfits the training set, so the accuracy of the model on new data suffers.
This problem can be reduced by preferring a simple decision tree. With fewer attributes and tests, the bias of the model increases slightly but its variance decreases, so the tree usually generalizes better and gives more accurate results on unseen data.
So it is a good idea to prefer simple decision trees instead of long, complex trees.
1. Open any existing ARFF file, e.g. labor.arff.
2. In the Preprocess tab, click All to select all the attributes.
3. Go to the Classify tab and then use the training set with the J48 algorithm.
4. To generate the decision tree, right-click on the result list and select the Visualize tree option.
5. Right-click on the J48 algorithm to get the Generic Object Editor window.
6. In this window, set the unpruned option to True.
7. Then press OK and then Start. We find that the tree becomes more complex when it is not pruned.
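J48 decides which nodes to prune by comparing pessimistic (upper-confidence-bound) error estimates, controlled by the confidence factor that appears as -C 0.25 in the run information. A minimal sketch of the textbook C4.5-style estimate, assuming a normal approximation; Weka's own implementation may compute the bound slightly differently, and the leaf error counts in the example are illustrative.

```python
import math
from statistics import NormalDist


def pessimistic_error(errors: float, n: float, confidence: float = 0.25) -> float:
    """Upper confidence limit on the true error rate at a node.

    errors / n is the observed training error f; `confidence` plays the role of
    J48's -C confidence factor (smaller values give more pessimistic estimates).
    """
    z = NormalDist().inv_cdf(1.0 - confidence)  # about 0.674 for 0.25
    f = errors / n
    spread = z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return (f + z * z / (2 * n) + spread) / (1 + z * z / n)


def subtree_estimate(leaves):
    """Instance-weighted pessimistic error of leaves given as [(errors, n), ...]."""
    total = sum(n for _, n in leaves)
    return sum(n * pessimistic_error(e, n) for e, n in leaves) / total


if __name__ == "__main__":
    leaves = [(2, 6), (1, 2), (2, 6)]     # illustrative leaf error counts
    as_subtree = subtree_estimate(leaves)
    as_leaf = pessimistic_error(5, 14)     # the same 14 instances as one leaf
    print(round(as_subtree, 2), round(as_leaf, 2))
    print("prune" if as_leaf <= as_subtree else "keep")
```

Setting unpruned to True skips this comparison entirely, which is why the unpruned tree keeps every split.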
Visualize tree

8. The tree has become more complex.
11. You can make your Decision Trees simpler by pruning the nodes. One approach is to use Reduced Error Pruning. Explain this idea briefly. Try reduced error pruning for training your Decision Trees using cross-validation (you can do this in Weka) and report the Decision Tree you obtain. Also, report your accuracy using the pruned model. Does your accuracy increase?
Reduced-error pruning:
The idea of using a separate pruning set for pruning, which is applicable to decision trees as well as rule sets, is called reduced-error pruning. A variant that prunes a rule immediately after it has been grown is called incremental reduced-error pruning. Another possibility is to build a full, unpruned rule set first and prune it afterwards by discarding individual tests; however, this method is much slower. Of course, there are many different ways to assess the worth of a rule based on the pruning set. A simple measure is to consider how well the rule would do at discriminating the predicted class from other classes if it were the only rule in the theory, operating under the closed-world assumption.
If it gets $p$ instances right out of the $t$ instances that it covers, and there are $P$ instances of this class out of a total of $T$ instances altogether, then it gets $p$ positive instances right. The instances that it does not cover include $N - n$ negative ones, where $n = t - p$ is the number of negative instances that the rule covers and $N = T - P$ is the total number of negative instances. Thus the rule has an overall success ratio of $[p + (N - n)]/T$, and this quantity, evaluated on the test set, has been used to evaluate the success of a rule when using reduced-error pruning.
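The success ratio $[p + (N - n)]/T$ can be checked with a few lines of Python. A minimal sketch; the counts in the example are hypothetical, and comparing the ratio of a rule before and after deleting one of its tests mirrors how reduced-error pruning decides whether to keep that test.

```python
def rule_success_ratio(p: int, t: int, P: int, T: int) -> float:
    """[p + (N - n)] / T for a rule evaluated on a pruning set.

    p: covered instances of the rule's class (true positives)
    t: all instances the rule covers
    P: instances of the rule's class in the set
    T: total instances in the set
    """
    n = t - p   # negatives the rule covers (false positives)
    N = T - P   # all negatives in the set
    if not (0 <= p <= min(t, P) and 0 <= n <= N and t <= T):
        raise ValueError("inconsistent counts")
    return (p + (N - n)) / T


if __name__ == "__main__":
    # Hypothetical pruning-set counts: T = 100 instances, P = 20 positives.
    full_rule = rule_success_ratio(p=8, t=10, P=20, T=100)     # (8 + 78) / 100
    pruned_rule = rule_success_ratio(p=12, t=16, P=20, T=100)  # (12 + 76) / 100
    print(full_rule, pruned_rule)  # 0.86 0.88
    # The shorter rule scores higher here, so pruning the test would be accepted.
```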
1. Right-click on the J48 algorithm to get the Generic Object Editor window.
2. In this window, set the reducedErrorPruning option to True and also set the unpruned option to True.
3. Then press OK and then Start.
4. We find that the accuracy increases when the reduced error pruning option is selected.
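The same idea applies to trees. To our understanding, J48's reducedErrorPruning option holds back part of the training data (one of numFolds folds, 3 by default) as the pruning set. The following is a simplified, self-contained Python version of bottom-up reduced-error pruning on a tiny hand-built tree; the `Node` class and the example data are hypothetical and are not Weka's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Instance = Tuple[Dict[str, str], str]  # (attribute values, class label)


@dataclass
class Node:
    label: str                      # majority training class at this node
    attr: Optional[str] = None      # attribute tested here; None for a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)

    def predict(self, x: Dict[str, str]) -> str:
        node = self
        # Descend while there is a branch for the instance's value.
        while node.attr is not None and x.get(node.attr) in node.children:
            node = node.children[x[node.attr]]
        return node.label


def errors(node: Node, data: List[Instance]) -> int:
    return sum(node.predict(x) != y for x, y in data)


def reduced_error_prune(node: Node, prune_set: List[Instance]) -> Node:
    """Bottom-up: replace a subtree by a leaf if that does not add pruning errors."""
    if node.attr is None:
        return node
    for value, child in list(node.children.items()):
        reaching = [(x, y) for x, y in prune_set if x.get(node.attr) == value]
        node.children[value] = reduced_error_prune(child, reaching)
    leaf_errors = sum(y != node.label for _, y in prune_set)
    if leaf_errors <= errors(node, prune_set):
        return Node(label=node.label)
    return node


if __name__ == "__main__":
    tree = Node("yes", "outlook", {
        "sunny": Node("no", "windy", {"true": Node("no"), "false": Node("yes")}),
        "overcast": Node("yes"),
    })
    prune_set = [
        ({"outlook": "sunny", "windy": "true"}, "no"),
        ({"outlook": "sunny", "windy": "false"}, "no"),
        ({"outlook": "overcast", "windy": "false"}, "yes"),
    ]
    pruned = reduced_error_prune(tree, prune_set)
    print(pruned.attr, pruned.children["sunny"].attr)  # outlook None
```

The windy split under sunny makes one error on the pruning set while a single "no" leaf makes none, so it is collapsed; the root split is kept because removing it would add errors.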
Dataset: car.arff
Preprocess tab:
How to use the dataset editor?
1. Open the WEKA GUI Chooser.
2. Click the Explorer button.

3. In the Preprocess tab, click the Open file button.
4. Select the path to open the car.arff dataset.
5. Now click on the Edit button to open the viewer window containing the table.
6. We can edit the car.arff dataset using the viewer window.
How many instances does the dataset contain?
The dataset car.arff contains 1728 instances.
Examples are:
How many attributes are used to represent the instances? What are they?
Seven attributes (six descriptive attributes plus the class attribute) are used to represent the instances. They are: buying, maint, doors, persons, lug_boot, safety and class.
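The count of 1728 instances equals the number of combinations of the six descriptive attributes' values, which is consistent with the car dataset being a complete enumeration of those combinations (to our knowledge, that is how the UCI car data was built). The attribute values below are taken from the J48 tree printed later in this section; the check is a small Python sketch.

```python
from itertools import product
from math import prod

CAR_VALUES = {
    "buying":   ["vhigh", "high", "med", "low"],
    "maint":    ["vhigh", "high", "med", "low"],
    "doors":    ["2", "3", "4", "5more"],
    "persons":  ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety":   ["low", "med", "high"],
}

# Every possible combination of the six descriptive attributes.
combinations = list(product(*CAR_VALUES.values()))
print(len(combinations))                          # 1728
print(prod(len(v) for v in CAR_VALUES.values()))  # 4*4*4*3*3*3 = 1728
```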
How to apply a filter in order to remove attributes and instances?
1. Open the WEKA GUI Chooser and click on the Explorer button.
2. Click on the Open file button and select the car.arff dataset.
3. Click on the Choose button.
4. To delete attributes, expand filters, then expand unsupervised. Expand attribute and click on Remove.
5. Right-click on the Remove text field and enter the attribute indices value. Now select True in the invertSelection drop-down (so that the listed attributes are kept and all others are removed) and click OK. Finally, click Apply.
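The Remove filter's attribute indices are 1-based and, as far as we know, accept Weka range syntax such as 1,3,5-7 or first-3,last; with invertSelection set to True the listed attributes are kept instead of removed. The Python sketch below only mimics that behaviour on a list of attribute names so the effect of invertSelection is easy to see; the function names are hypothetical and this is not Weka's implementation.

```python
from typing import List, Sequence


def parse_range(spec: str, num_attrs: int) -> List[int]:
    """Parse "1,3,5-7" or "first-3,last" into sorted 1-based indices."""
    def index(token: str) -> int:
        token = token.strip().lower()
        if token == "first":
            return 1
        if token == "last":
            return num_attrs
        return int(token)

    selected = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            selected.update(range(index(lo), index(hi) + 1))
        else:
            selected.add(index(part))
    if any(not 1 <= i <= num_attrs for i in selected):
        raise ValueError(f"index out of range in {spec!r}")
    return sorted(selected)


def apply_remove(names: Sequence[str], spec: str, invert: bool = False) -> List[str]:
    """Drop the attributes named by `spec`; with invert=True keep only those."""
    chosen = set(parse_range(spec, len(names)))
    return [name for i, name in enumerate(names, start=1) if (i in chosen) == invert]


if __name__ == "__main__":
    car = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]
    print(apply_remove(car, "1,2"))                  # removes buying and maint
    print(apply_remove(car, "6-last", invert=True))  # ['safety', 'class']
```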
6. To delete instances, expand filters, then expand supervised. Now expand instance and click on StratifiedRemoveFolds. Finally, click Apply to delete the instances.
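To our understanding, StratifiedRemoveFolds splits the data into folds that preserve the class proportions and then outputs one chosen fold (or removes it, when inverted). The Python sketch below only illustrates stratified fold assignment; the round-robin dealing and the seed handling are our assumptions, not Weka's exact algorithm, so the rows it selects will differ from Weka's.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def stratified_fold_indices(labels: Sequence[Hashable], num_folds: int, fold: int,
                            seed: int = 0) -> List[int]:
    """Row indices that fall in `fold` (1-based) of a stratified split.

    Rows of each class are shuffled and dealt round-robin into folds, so every
    fold keeps roughly the overall class proportions.
    """
    if not 1 <= fold <= num_folds:
        raise ValueError("fold must be in 1..num_folds")
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)     # same seed -> same shuffle for every fold
    chosen = []
    offset = 0
    for y in sorted(by_class, key=str):
        rows = by_class[y]
        rng.shuffle(rows)
        for k, i in enumerate(rows):
            if (k + offset) % num_folds == fold - 1:
                chosen.append(i)
        offset += len(rows)       # continue the round-robin across classes
    return sorted(chosen)


if __name__ == "__main__":
    labels = ["unacc"] * 12 + ["acc"] * 6
    for f in (1, 2, 3):
        rows = stratified_fold_indices(labels, 3, f)
        print(f, [labels[i] for i in rows].count("acc"), len(rows))
```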
Classify tab:
How to classify car.arff with the J48 decision tree learner?
1. Open the WEKA GUI Chooser. Click on the Explorer button.
2. Then click on Open file and select the dataset car.arff.
3. Click on the Classify tab.
4. Click on the Choose button and expand classifiers. Now expand trees and click on J48.
5. Click on the Use training set radio button. Now click on the Start button to display the result.
How to examine the tree in the classifier output panel?
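One way to examine the tree is to check the Number of Leaves and Size of the tree that Weka reports after it. In J48's text format each printed line is the branch to one child node and lines containing a colon end in a leaf, so both numbers can be recomputed from the text. A small Python sketch, assuming the tree has at least one split and no attribute value contains ": "; the example is the J48 tree for the weather data from earlier in this manual.

```python
from typing import Tuple

WEATHER_TREE = """\
outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)
"""


def tree_stats(tree_text: str) -> Tuple[int, int]:
    """Return (number of leaves, size of the tree) for J48's text output.

    Assumes the root is a split: every non-empty line is the branch to one
    child node, leaf lines contain ': ', and the root has no line of its own.
    """
    lines = [line for line in tree_text.splitlines() if line.strip()]
    leaves = sum(1 for line in lines if ": " in line)
    return leaves, len(lines) + 1


if __name__ == "__main__":
    print(tree_stats(WEATHER_TREE))  # (5, 8)
```

Pasting the car tree shown below into `tree_stats` should likewise reproduce the reported 131 leaves and size 182.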
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: car
Instances: 1728
Attributes: 7
buying
maint
doors
persons
lug_boot
safety
class
Test mode: evaluate on training data
=== Classifier model (full training set) ===
J48 pruned tree
------------------
safety = low: unacc (576.0)
safety = med
|   persons = 2: unacc (192.0)
|   persons = 4
|   |   buying = vhigh
|   |   |   maint = vhigh: unacc (12.0)
|   |   |   maint = high: unacc (12.0)
|   |   |   maint = med
|   |   |   |   lug_boot = small: unacc (4.0)
|   |   |   |   lug_boot = med: unacc (4.0/2.0)
|   |   |   |   lug_boot = big: acc (4.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: unacc (4.0)
|   |   |   |   lug_boot = med: unacc (4.0/2.0)
|   |   |   |   lug_boot = big: acc (4.0)
|   |   buying = high
|   |   |   lug_boot = small: unacc (16.0)
|   |   |   lug_boot = med
|   |   |   |   doors = 2: unacc (4.0)
|   |   |   |   doors = 3: unacc (4.0)
|   |   |   |   doors = 4: acc (4.0/1.0)
|   |   |   |   doors = 5more: acc (4.0/1.0)
|   |   |   lug_boot = big
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: acc (4.0)
|   |   |   |   maint = med: acc (4.0)
|   |   |   |   maint = low: acc (4.0)
|   |   buying = med
|   |   |   maint = vhigh
|   |   |   |   lug_boot = small: unacc (4.0)
|   |   |   |   lug_boot = med: unacc (4.0/2.0)
|   |   |   |   lug_boot = big: acc (4.0)
|   |   |   maint = high
|   |   |   |   lug_boot = small: unacc (4.0)
|   |   |   |   lug_boot = med: unacc (4.0/2.0)
|   |   |   |   lug_boot = big: acc (4.0)
|   |   |   maint = med: acc (12.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: acc (4.0)
|   |   |   |   lug_boot = med: acc (4.0/2.0)
|   |   |   |   lug_boot = big: good (4.0)
|   |   buying = low
|   |   |   maint = vhigh
|   |   |   |   lug_boot = small: unacc (4.0)
|   |   |   |   lug_boot = med: unacc (4.0/2.0)
|   |   |   |   lug_boot = big: acc (4.0)
|   |   |   maint = high: acc (12.0)
|   |   |   maint = med
|   |   |   |   lug_boot = small: acc (4.0)
|   |   |   |   lug_boot = med: acc (4.0/2.0)
|   |   |   |   lug_boot = big: good (4.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: acc (4.0)
|   |   |   |   lug_boot = med: acc (4.0/2.0)
|   |   |   |   lug_boot = big: good (4.0)
|   persons = more
|   |   lug_boot = small
|   |   |   buying = vhigh: unacc (16.0)
|   |   |   buying = high: unacc (16.0)
|   |   |   buying = med
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: unacc (4.0)
|   |   |   |   maint = med: acc (4.0/1.0)
|   |   |   |   maint = low: acc (4.0/1.0)
|   |   |   buying = low
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: acc (4.0/1.0)
|   |   |   |   maint = med: acc (4.0/1.0)
|   |   |   |   maint = low: acc (4.0/1.0)
|   |   lug_boot = med
|   |   |   buying = vhigh
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: unacc (4.0)
|   |   |   |   maint = med: acc (4.0/1.0)
|   |   |   |   maint = low: acc (4.0/1.0)
|   |   |   buying = high
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: acc (4.0/1.0)
|   |   |   |   maint = med: acc (4.0/1.0)
|   |   |   |   maint = low: acc (4.0/1.0)
|   |   |   buying = med: acc (16.0/5.0)
|   |   |   buying = low
|   |   |   |   maint = vhigh: acc (4.0/1.0)
|   |   |   |   maint = high: acc (4.0)
|   |   |   |   maint = med: good (4.0/1.0)
|   |   |   |   maint = low: good (4.0/1.0)
|   |   lug_boot = big
|   |   |   buying = vhigh
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: unacc (4.0)
|   |   |   |   maint = med: acc (4.0)
|   |   |   |   maint = low: acc (4.0)
|   |   |   buying = high
|   |   |   |   maint = vhigh: unacc (4.0)
|   |   |   |   maint = high: acc (4.0)
|   |   |   |   maint = med: acc (4.0)
|   |   |   |   maint = low: acc (4.0)
|   |   |   buying = med
|   |   |   |   maint = vhigh: acc (4.0)
|   |   |   |   maint = high: acc (4.0)
|   |   |   |   maint = med: acc (4.0)
|   |   |   |   maint = low: good (4.0)
|   |   |   buying = low
|   |   |   |   maint = vhigh: acc (4.0)
|   |   |   |   maint = high: acc (4.0)
|   |   |   |   maint = med: good (4.0)
|   |   |   |   maint = low: good (4.0)
safety = high
|   persons = 2: unacc (192.0)
|   persons = 4
|   |   buying = vhigh
|   |   |   maint = vhigh: unacc (12.0)
|   |   |   maint = high: unacc (12.0)
|   |   |   maint = med: acc (12.0)
|   |   |   maint = low: acc (12.0)
|   |   buying = high
|   |   |   maint = vhigh: unacc (12.0)
|   |   |   maint = high: acc (12.0)
|   |   |   maint = med: acc (12.0)
|   |   |   maint = low: acc (12.0)
|   |   buying = med
|   |   |   maint = vhigh: acc (12.0)
|   |   |   maint = high: acc (12.0)
|   |   |   maint = med
|   |   |   |   lug_boot = small: acc (4.0)
|   |   |   |   lug_boot = med: acc (4.0/2.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: good (4.0)
|   |   |   |   lug_boot = med: good (4.0/2.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   buying = low
|   |   |   maint = vhigh: acc (12.0)
|   |   |   maint = high
|   |   |   |   lug_boot = small: acc (4.0)
|   |   |   |   lug_boot = med: acc (4.0/2.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   |   maint = med
|   |   |   |   lug_boot = small: good (4.0)
|   |   |   |   lug_boot = med: good (4.0/2.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: good (4.0)
|   |   |   |   lug_boot = med: good (4.0/2.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   persons = more
|   |   buying = vhigh
|   |   |   maint = vhigh: unacc (12.0)
|   |   |   maint = high: unacc (12.0)
|   |   |   maint = med: acc (12.0/1.0)
|   |   |   maint = low: acc (12.0/1.0)
|   |   buying = high
|   |   |   maint = vhigh: unacc (12.0)
|   |   |   maint = high: acc (12.0/1.0)
|   |   |   maint = med: acc (12.0/1.0)
|   |   |   maint = low: acc (12.0/1.0)
|   |   buying = med
|   |   |   maint = vhigh: acc (12.0/1.0)
|   |   |   maint = high: acc (12.0/1.0)
|   |   |   maint = med
|   |   |   |   lug_boot = small: acc (4.0/1.0)
|   |   |   |   lug_boot = med: vgood (4.0/1.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: good (4.0/1.0)
|   |   |   |   lug_boot = med: vgood (4.0/1.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   buying = low
|   |   |   maint = vhigh: acc (12.0/1.0)
|   |   |   maint = high
|   |   |   |   lug_boot = small: acc (4.0/1.0)
|   |   |   |   lug_boot = med: vgood (4.0/1.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   |   maint = med
|   |   |   |   lug_boot = small: good (4.0/1.0)
|   |   |   |   lug_boot = med: vgood (4.0/1.0)
|   |   |   |   lug_boot = big: vgood (4.0)
|   |   |   maint = low
|   |   |   |   lug_boot = small: good (4.0/1.0)
|   |   |   |   lug_boot = med: vgood (4.0/1.0)
|   |   |   |   lug_boot = big: vgood (4.0)
Number of Leaves : 131
Size of the tree : 182
Ti#e taken to build #odel: 6.61 !econd!
=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances        1664               96.2963 %
Incorrectly Classified Instances        64                3.7037 %
Kappa statistic                          0.9198
Mean absolute error                      0.0248
Root mean squared error                  0.1114
Relative absolute error                 10.8411 %
Root relative squared error             32.9501 %
Total Number of Instances             1728
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
  0.977     0.019      0.992     0.977     0.984     unacc
  0.964     0.028      0.907     0.964     0.934     acc
  0.826     0.007      0.838     0.826     0.832     good
  0.846     0.003      0.917     0.846     0.88      vgood
=== Confusion Matrix ===
    a    b    c    d   <-- classified as
 1182   25    3    0 |    a = unacc
   10  370    2    2 |    b = acc
    0    9   57    3 |    c = good
    0    4    6   55 |    d = vgood
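The summary figures above can be recomputed from the confusion matrix alone, which is a useful sanity check when reading Weka's output. A minimal Python sketch (rows are actual classes, columns predicted, as in Weka's matrix); applied to the car matrix it reproduces the 1664 correctly classified instances, the kappa statistic of 0.9198 and the per-class precision, recall and F-measure values. The function name `summarize` is our own.

```python
from typing import Dict, Sequence


def summarize(cm: Sequence[Sequence[int]], classes: Sequence[str]) -> Dict[str, object]:
    """Accuracy, kappa and per-class rates from a confusion matrix.

    cm[i][j] = number of instances of actual class i predicted as class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    actual = [sum(cm[i]) for i in range(k)]                          # row totals
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column totals
    correct = sum(cm[i][i] for i in range(k))
    p_observed = correct / n
    p_chance = sum(actual[i] * predicted[i] for i in range(k)) / (n * n)

    per_class: Dict[str, Dict[str, float]] = {}
    for i, name in enumerate(classes):
        tp = cm[i][i]
        fp = predicted[i] - tp
        fn = actual[i] - tp
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        per_class[name] = {
            "tp_rate": recall,
            "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return {
        "correct": correct,
        "accuracy": p_observed,
        "kappa": (p_observed - p_chance) / (1 - p_chance),
        "per_class": per_class,
    }


if __name__ == "__main__":
    car_cm = [[1182, 25, 3, 0],
              [10, 370, 2, 2],
              [0, 9, 57, 3],
              [0, 4, 6, 55]]
    summary = summarize(car_cm, ["unacc", "acc", "good", "vgood"])
    print(summary["correct"], round(100 * summary["accuracy"], 4))  # 1664 96.2963
    print(round(summary["kappa"], 4))                                # 0.9198
```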