
Department of Electrical, Electronic and Computer Engineering


EAI 320 - Intelligent systems

Practical 5 Report
Author:
J.H Mervitz

Student number:
u12014223

April 12, 2016

DECLARATION OF ORIGINALITY
UNIVERSITY OF PRETORIA
The University of Pretoria places great emphasis upon integrity and ethical conduct in the preparation of
all written work submitted for academic evaluation.
While academic staff teach you about referencing techniques and how to avoid plagiarism, you too have a
responsibility in this regard. If you are at any stage uncertain as to what is required, you should speak to
your lecturer before any written work is submitted.
You are guilty of plagiarism if you copy something from another author's work (e.g. a book, an article
or a website) without acknowledging the source and pass it off as your own. In effect you are stealing
something that belongs to someone else. This is not only the case when you copy work word-for-word
(verbatim), but also when you submit someone else's work in a slightly altered form (paraphrase) or use
a line of argument without acknowledging it. You are not allowed to use work previously produced by
another student. You are also not allowed to let anybody copy your work with the intention of passing it
off as his/her work.
Students who commit plagiarism will not be given any credit for plagiarised work. The matter may also be
referred to the Disciplinary Committee (Students) for a ruling. Plagiarism is regarded as a serious contravention of the University's rules and can lead to expulsion from the University.
The declaration which follows must accompany all written work submitted while you are a student of the
University of Pretoria. No written work will be accepted unless the declaration has been completed and
attached.

Full names of student:

Joseph Henry Mervitz

Student number:

u12014223

Topic of work:

EAI320 Practical 5 Report

Declaration
1. I understand what plagiarism is and am aware of the University's policy in this regard.
2. I declare that this assignment report is my own original work. Where other people's work has been
used (either from a printed source, Internet or any other source), this has been properly acknowledged and referenced in accordance with departmental requirements.
3. I have not used work previously produced by another student or any other person to hand in as my
own.
4. I have not allowed, and will not allow, anyone to copy my work with the intention of passing it off
as his or her own work.

SIGNATURE:

DATE:

Contents

1 Introduction
2 Method
  2.1 Finding the most common attribute
  2.2 Calculating Entropy (H)
  2.3 Calculating Information Gain (IG)
  2.4 Constructing the Decision Tree
  2.5 Format of training data
3 Results
  3.1 Input x1 path
  3.2 Input x2 path
4 Discussion
5 Conclusion
6 Appendix
  6.1 Python source code

Introduction

Assignment five deals with machine learning in artificial intelligence and a selected
algorithm for classifying information, the ID3 algorithm. The practical explores an
implementation of the algorithm that can be used to handle any data set, provided that
the training data is correct; incorrect training data will lead to incorrect results.
The ID3 algorithm uses entropy and information gain to determine the root and sub-roots
of the decision tree that is used to classify data. The goal of the tree is to produce a
step-by-step evaluation procedure that tests the most informative attributes of the input
data in turn. Each selected attribute should split the data into subsets that are as pure
as possible, ideally with every example in a subset having the same classification, so
that there is no grey area when distinguishing between classes.
Entropy measures the uncertainty in the classification of a set of examples and is used
to judge how suitable a potential attribute is as the root of the tree or of a sub-tree.
Entropy is used when calculating the information gain: the higher the information gain
of an attribute, the more uncertainty is removed by splitting on it, and at each node the
attribute with the highest information gain is selected to split the data.

2

Method

2.1

Finding the most common attribute

A frequency table (a Python dictionary in the appendix code) is used to count how many
times each value of the goal attribute occurs in the data set. The value with the highest
count is the most common classification in the data. This is useful to know before
calculating the information gain: when no attributes are left to split on, or a subset of
examples is empty, the most common value can be returned as the classification without
having to calculate the information gain.
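A minimal sketch of this counting step, assuming each example is a list whose columns follow the order of a `features` header list. It uses `collections.Counter` from the Python standard library in place of the hand-built frequency dictionary in the appendix; the function name `most_common_value`, the goal name `WillWait` and the sample rows are hypothetical and only for illustration.

```python
from collections import Counter


def most_common_value(data_set, features, goal):
    """Return the most frequent value of the goal attribute in data_set."""
    index = features.index(goal)
    counts = Counter(row[index] for row in data_set)
    # most_common(1) returns [(value, count)] for the highest count;
    # ties keep the value that was encountered first
    return counts.most_common(1)[0][0]


# Hypothetical data: [Patrons, WillWait]
features = ["Patrons", "WillWait"]
data = [["Some", "Yes"], ["Full", "No"], ["Some", "Yes"]]
print(most_common_value(data, features, "WillWait"))  # Yes
```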

2.2

Calculating Entropy (H)

The entropy is the measure of uncertainty and is calculated as follows:


$$H(x) = -\sum_{i=1}^{n} P(x_i)\,\log_b P(x_i) \qquad (1.1)$$

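Here P(x_i) is the proportion of examples whose goal attribute takes its i-th value and b
is the base of the logarithm. With b = 2 and two classes, H(x) = 0 refers to perfectly
classified information (every example has the same class), while H(x) = 1 refers to
completely random information (both classes are equally likely). With n classes the
maximum entropy is log_2 n.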
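Equation (1.1) with b = 2 can be sketched in a few lines of Python. This is an illustrative version that takes a plain list of class labels, not the `entropy(x, data_set)` signature used in the appendix; the function name and sample labels are assumptions.

```python
import math
from collections import Counter


def entropy(labels, base=2):
    """Shannon entropy H = -sum(p_i * log_b(p_i)) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0  # treat an empty set as perfectly classified
    h = 0.0
    for count in Counter(labels).values():
        p = count / total  # P(x_i): proportion of examples with this label
        h -= p * math.log(p, base)
    return h


print(entropy(["Yes", "Yes", "No", "No"]))  # 1.0
print(entropy(["Yes", "Yes", "Yes"]))       # 0.0
```

For three equally likely classes the same function returns about 1.585 (log_2 3), which shows why the upper bound of 1 only holds for two classes.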

2.3

Calculating Information Gain (IG)

The information gain is calculated as follows:


$$IG(X, A) = H(X) - \sum_{v \in \mathrm{values}(A)} \frac{|X_v|}{|X|}\, H(X_v) \qquad (1.2)$$

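where X_v is the subset of the examples X for which attribute A takes the value v. A higher
information gain is more favourable, because splitting on that attribute removes more
uncertainty about the classification.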
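A small sketch of equation (1.2) on a hypothetical data set, assuming rows are lists and that attributes and the goal are addressed by column index (the appendix code looks them up by name instead). For the six rows below H(X) = 1, and the weighted remainder is 2/6·0 + 2/6·1 + 2/6·0 = 1/3, so IG = 2/3.

```python
import math
from collections import Counter


def entropy(labels):
    """H = -sum(p * log2(p)) over the class proportions in labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr_index, goal_index):
    """IG(X, A) = H(X) - sum over values v of |X_v|/|X| * H(X_v)."""
    h_x = entropy([r[goal_index] for r in rows])
    remainder = 0.0
    for v in set(r[attr_index] for r in rows):
        subset = [r[goal_index] for r in rows if r[attr_index] == v]  # goal labels of X_v
        remainder += len(subset) / len(rows) * entropy(subset)
    return h_x - remainder


# Hypothetical rows: [Patrons, WillWait]
rows = [["Some", "Yes"], ["Some", "Yes"], ["Full", "No"],
        ["Full", "Yes"], ["None", "No"], ["None", "No"]]
print(round(information_gain(rows, 0, 1), 3))  # 0.667
```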

2.4

Constructing the Decision Tree

Building the decision tree is a recursive procedure in which the root node is created
first. If all the examples have the same classification (polarity), a single-node tree
with a root of that classification is returned. If no attributes remain, a tree whose
root holds the most common value of the goal attribute is returned. Otherwise, the
attribute with the largest information gain is selected as the root of the tree, with one
branch for each possible value of that attribute. The subset of examples for each value is
then checked: if it is empty, a leaf node with the most common value is created; otherwise
the ID3 algorithm is called again on that subset, with the selected attribute removed.
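The recursion described above can be sketched compactly as follows. This is written independently of the appendix code and is not the author's implementation: it assumes examples are lists ordered like a `features` header list, returns leaves as plain labels and internal nodes as nested dictionaries `{attribute: {value: subtree}}`, and uses the hypothetical names `id3` and `info_gain` together with a made-up five-row data set.

```python
import math
from collections import Counter


def entropy(labels):
    """H = -sum(p * log2(p)) over the class proportions in labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def info_gain(rows, features, attr, goal):
    """IG(X, A) for attribute attr, measured against the goal attribute."""
    a, g = features.index(attr), features.index(goal)
    remainder = 0.0
    for v in dict.fromkeys(r[a] for r in rows):
        subset = [r[g] for r in rows if r[a] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[g] for r in rows]) - remainder


def id3(rows, features, goal, default=None):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    if not rows:
        return default                       # empty subset: fall back to the parent's majority
    g = features.index(goal)
    labels = [r[g] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if labels.count(labels[0]) == len(labels):
        return labels[0]                     # all examples share one class
    candidates = [f for f in features if f != goal]
    if not candidates:
        return majority                      # no attributes left to split on
    best = max(candidates, key=lambda f: info_gain(rows, features, f, goal))
    b = features.index(best)
    remaining = features[:b] + features[b + 1:]
    tree = {best: {}}
    for v in dict.fromkeys(r[b] for r in rows):  # branch values in order of first appearance
        subset = [r[:b] + r[b + 1:] for r in rows if r[b] == v]
        tree[best][v] = id3(subset, remaining, goal, majority)
    return tree


# Hypothetical training data; "Hungry" and "WillWait" are illustrative names
features = ["Patrons", "Hungry", "WillWait"]
rows = [["Some", "Yes", "Yes"], ["Some", "No", "Yes"], ["Full", "Yes", "Yes"],
        ["Full", "No", "No"], ["None", "No", "No"]]
print(id3(rows, features, "WillWait"))
# {'Patrons': {'Some': 'Yes', 'Full': {'Hungry': {'Yes': 'Yes', 'No': 'No'}}, 'None': 'No'}}
```

Because branch values are taken from the examples themselves, an empty subset only arises if branches are created for every possible attribute value; the `default` argument covers that case.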

2.5

Format of training data

Training data is provided to the program in .csv format.
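The report does not describe the layout of the .csv file, so the following is only a sketch under an assumed layout: the first row holds the feature names (including the goal attribute) and every later row is one training example. It uses the standard `csv` module; the function name `load_training_data` and the sample text are hypothetical.

```python
import csv
import io


def load_training_data(csv_text):
    """Split CSV text into a header (feature names) and a list of example rows."""
    reader = csv.reader(io.StringIO(csv_text))
    rows = [[cell.strip() for cell in row] for row in reader if row]  # skip blank lines
    return rows[0], rows[1:]


sample = """Patrons,Hungry,WillWait
Some,Yes,Yes
Full,No,No
"""
features, data = load_training_data(sample)
print(features)   # ['Patrons', 'Hungry', 'WillWait']
print(len(data))  # 2
```

For a file on disk, the same reader can be built from `open(path, newline="")` instead of `io.StringIO`.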

3

Results

3.1

Input x1 path

The figure below displays the path travelled (in red) for input x1.

Figure 1: Tree traversal for x1 input.

3.2

Input x2 path

The figure below displays the path travelled (in red) for input x2.

Figure 2: Tree traversal for x2 input.
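The red paths in the figures come from repeatedly following the branch that matches the input's value for the attribute at the current node until a leaf is reached. A minimal sketch of that traversal, assuming the nested-dictionary tree returned by `buildTree` in the appendix and an input given as a dictionary from attribute name to value (an assumption; the report does not state how x1 and x2 are represented). The tree below is hypothetical and is not the tree shown in Figures 1 and 2.

```python
def classify(tree, example):
    """Walk a nested-dict ID3 tree and return (label, path taken)."""
    node = tree
    path = []
    while isinstance(node, dict):
        attr = next(iter(node))     # each internal node holds exactly one attribute key
        value = example[attr]
        path.append((attr, value))
        node = node[attr][value]    # raises KeyError for a value unseen in training
    return node, path


# Hypothetical tree and input, for illustration only
tree = {"Patrons": {"Some": "Yes", "None": "No",
                    "Full": {"Hungry": {"Yes": "Yes", "No": "No"}}}}
label, path = classify(tree, {"Patrons": "Full", "Hungry": "Yes"})
print(label)  # Yes
print(path)   # [('Patrons', 'Full'), ('Hungry', 'Yes')]
```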

Discussion

The ID3 algorithm performed well and classified the two test inputs correctly: both input
x1 and input x2 were classified as Yes. This is evident from Figure 1 and Figure 2.
The subroutines functioned correctly, as the attributes with the highest information gain
were chosen as the root of the tree and of the corresponding sub-trees. Patrons splits the
data best and forms the root, followed by Estimate and then Price. The remaining attributes
form the lower nodes of the tree and are not shown, as the test inputs only traversed down
to Price.
The tree is shallow because the data set is small and has few attributes. Larger data
sets with more attributes can produce much deeper trees, which are more computationally
expensive to build.
Since the data set is small, the tree might overfit the training data.
Possible causes of overfitting include:
- a small data set;
- the agent modelling noise and failing to find the underlying relationships in the data;
- a growing hypothesis space and number of features.

Conclusion

The ID3 algorithm is an effective machine learning algorithm capable of classifying any
data set, provided that the data set:
- is large enough;
- contains little or no noise that could compromise the agent's learning;
- has a hypothesis space and number of features that stay constant;
- is correct and accurate.
The algorithm classified both inputs x1 and x2 as Yes.
It proved computationally efficient on the sample data because the data set was small.
Larger data sets can be computationally expensive, as more tree nodes must be stored.

6

Appendix

6.1

Python source code

import math


def find_key(key, collection):
    # True if key is present in collection (a dict or a list)
    return key in collection


def max_value(x):
    # Return the key with the highest count in the frequency dictionary x
    maximum = 0
    common = ""
    for k in x.keys():
        if x[k] > maximum:
            maximum = x[k]
            common = k
    return common


def most_common(data_set, features, goal):
    # Most frequent value of the goal attribute in data_set
    value_frequency = {}
    index = features.index(goal)
    for e in data_set:
        if find_key(e[index], value_frequency):
            value_frequency[e[index]] = value_frequency[e[index]] + 1
        else:
            value_frequency[e[index]] = 1
    return max_value(value_frequency)


def H(data_set, features, goal_attribute):
    # Entropy of the goal attribute over data_set (equation 1.1)
    values_frequency = {}
    # "++i" is a no-op in Python, so look the column index up directly
    i = features.index(goal_attribute)
    for e in data_set:
        if find_key(e[i], values_frequency):
            values_frequency[e[i]] = values_frequency[e[i]] + 1.0
        else:
            values_frequency[e[i]] = 1.0
    return entropy(values_frequency, data_set)


def entropy(x, data_set):
    # H = -sum(p * log2(p)), with p = frequency / number of examples
    e = 0.0
    for f in x.values():
        p = f / len(data_set)
        e = e - p * math.log(p, 2)
    return e


def G(data_set, features, attr, goal_attribute):
    # Information gain of splitting data_set on attr (equation 1.2)
    values_frequency = {}
    subH = 0.0
    i = features.index(attr)
    for e in data_set:
        if find_key(e[i], values_frequency):
            values_frequency[e[i]] = values_frequency[e[i]] + 1.0
        else:
            values_frequency[e[i]] = 1.0
    for v in values_frequency.keys():
        p = values_frequency[v] / sum(values_frequency.values())
        sdata = [e for e in data_set if e[i] == v]
        subH = subH + p * H(sdata, features, goal_attribute)
    calc = H(data_set, features, goal_attribute) - subH
    return calc


def max_gain(data_set, features, goal):
    # Attribute (other than the goal) with the highest information gain
    top = None
    maximum_gain = -1.0
    for a in features:
        if a == goal:
            continue  # never split on the goal attribute itself
        new = G(data_set, features, a, goal)
        if new > maximum_gain:
            maximum_gain = new
            top = a
    return top


def best_attr(data_set, features, goal):
    return max_gain(data_set, features, goal)


def extract_values(data_set, features, attr):
    # Distinct values of attr, in order of first appearance
    index = features.index(attr)
    values = []
    for e in data_set:
        if e[index] not in values:
            values.append(e[index])
    return values


def buildTree(data_set, features, goal, rec):
    rec = rec + 1  # recursion depth (not otherwise used)
    data_set = data_set[:]
    vals = [item[features.index(goal)] for item in data_set]
    df = most_common(data_set, features, goal)
    if not data_set or (len(features) - 1) <= 0:
        return df
    elif vals.count(vals[0]) == len(vals):
        return vals[0]
    else:
        b = best_attr(data_set, features, goal)
        tree = {b: {}}
        extracted_vals = extract_values(data_set, features, b)
        index = features.index(b)
        for v in extracted_vals:
            # Examples with attribute b equal to v, with column b removed
            examples = [e[:index] + e[index + 1:] for e in data_set if e[index] == v]
            new_attr = features[:]
            new_attr.remove(b)
            stree = buildTree(examples, new_attr, goal, rec)
            tree[b][v] = stree
        return tree
