
M.TECH(CSE)-II SEMESTER SOFTWARE LAB-2










OOAD LAB PROGRAMS



















INTRODUCTION TO UML
WHAT IS UML?
The Unified Modeling Language (UML) is a standard language for writing software blueprints. The
UML is a language for visualizing, specifying, constructing, and documenting the artifacts of a
software-intensive system.
The UML is appropriate for modeling systems ranging from enterprise information systems to
distributed Web-based applications and even to hard real time embedded systems.

CONCEPTUAL MODEL OF UML
To understand UML, you need to form a conceptual model of the language. This requires
learning three major elements:
I. Basic Building Blocks
II. Rules
III. Common Mechanisms

I) BASIC BUILDING BLOCKS
There are three kinds of basic building blocks. They are
1. Things
2. Relationships
3. Diagrams

1) THINGS in the UML
There are four kinds of things in the UML
i. Structural Things
ii. Behavioral Things
iii. Annotational Things
iv. Grouping Things
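i) Structural Things
Structural Things are the nouns of UML models. There are 7 kinds of structural things: class,
interface, collaboration, use case, active class, component and node. Six of them are described below.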
a) Class
A class is a set of objects that share the same attributes, operations, relationships and semantics.



b) Interface
An Interface is a collection of operations that specify a service of a class or component.

c) Collaboration
A Collaboration describes the cooperative work of a group of elements.

d) Use Case
A Use Case describes a set of sequences of actions that a system performs and that yield an
observable result of value to a particular actor.


e) Component
A Component represents the physical packaging of logical elements such as classes, interfaces and
collaborations.


f) Node
A Node is a physical element that exists at run time and has at least some memory and
processing capability.


ii) Behavioral Things

Behavioral Things are the verbs of UML representing behavior over time and space. There are
two kinds of Behavioral things

a) Interaction
An Interaction is used to show the communication between objects.

b) State Machine
A State Machine specifies the sequence of states an object goes through in response to events.

iii) Annotational Things
Annotational Things are the explanatory parts of UML models. There is only one kind.

Note
A Note is used to attach comments to an element or a collection of elements.


iv) Grouping Things
Grouping Things are the organizational parts of the model. There is only one kind.
Package
A Package is a general-purpose mechanism for organizing elements into groups.


2) RELATIONSHIPS
There are 4 kinds of Relationships in the UML:
a) Dependency: It is denoted by a dashed line with an arrow.


Dependency is a relationship between two things in which a change to one thing (Independent
thing) may affect the other thing (Dependent thing).

b) Association: It is denoted by a solid line.
____________________
Association is a structural relationship that describes a set of links, a link being a connection
among objects. Aggregation is a special kind of association, representing a structural relationship
between a whole and its parts.


c) Generalization: It is denoted by a solid line with a hollow arrowhead pointing to the parent.


Generalization is a relationship in which the child shares the structure and behavior of the parent.


d) Realization: It is denoted by a dashed line with a hollow arrowhead.

Realization is a relationship between classifiers, where one classifier specifies a contract that another
classifier guarantees to carry out.
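
As a rough illustration, the four relationships can be mirrored in code. The sketch below uses Python, and every class name in it (Drawable, Shape, Circle, Canvas, Printer) is hypothetical and not part of the lab exercises.

```python
from abc import ABC, abstractmethod


class Drawable(ABC):
    """Contract (interface) that a realizing class must carry out."""

    @abstractmethod
    def draw(self):
        ...


class Shape:
    """Parent class in a generalization."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"shape {self.name}"


class Circle(Shape, Drawable):
    """Generalization: Circle is a child of Shape.
    Realization: Circle carries out the Drawable contract."""

    def __init__(self):
        super().__init__("circle")

    def draw(self):
        return "drawing circle"


class Canvas:
    """Association (aggregation): a Canvas keeps links to its Shapes."""

    def __init__(self):
        self.shapes = []

    def add(self, shape):
        self.shapes.append(shape)


class Printer:
    """Dependency: Printer only uses Drawable as a parameter,
    so a change to Drawable may affect Printer."""

    def print_item(self, item):
        return f"printed: {item.draw()}"
```

Here Circle inherits describe() from its parent, while Printer holds no reference to any Drawable after the call returns, which is what separates a dependency from an association.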


3) DIAGRAMS
There are nine types of Diagrams in UML, which are classified into two types
i) Structural Diagrams (static diagrams)
These are of four types


a) Class Diagram
A Class diagram shows a set of classes, interfaces, and collaborations and their relationships. A
class consists of class name, attributes, operations and responsibilities.

b) Object Diagram
An Object diagram shows a set of objects and their relationships. They represent snapshots of
instances of the things found in class diagrams.

c) Component Diagram
A Component diagram shows the organization of, and dependencies among, a set of components.


d) Deployment Diagram
A Deployment diagram shows the configuration of run-time processing nodes and the
components that are present in them.

Component and Deployment diagrams are called Physical Diagrams.

ii) Behavioral Diagrams (Dynamic diagrams)
These are of five types
a) Use Case Diagram
A Use Case diagram shows a set of use cases and actors and their relationships. An Actor can be
a human or a system. The role of the actor is written below the actor symbol.

Interaction Diagrams
An Interaction diagram shows an interaction, consisting of a set of objects and their
relationships, including the messages they exchange among them.
Two types of Interaction diagrams are:
b) Sequence Diagram
A Sequence diagram is an interaction diagram that emphasizes the time-ordering of messages.
To show interaction between objects we use three types of messages.
Simple Messages:

A Simple message shows how control is passed from one object to other without describing
communication in detail i.e. without indicating whether it is synchronous or asynchronous message.


Synchronous Messages:

If the sender object waits for a reply from the receiver object, such messages are called
Synchronous messages. Here, only one object can send a message at a given instant of time.
Asynchronous Messages:


If the sender object continues executing while the target is processing the message, such messages
are said to be Asynchronous messages. Here, multiple messages can be in progress at the same time.
Object Lifeline: An object lifeline is a vertical dashed line that represents the existence of an
object over a period of time.
Focus of Control: It is represented by a rectangle that shows the period of time during which an
object performs some action.
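
The difference between synchronous and asynchronous messages can be sketched in code. The example below is only an illustration in Python; the function names (receiver, send_synchronous, send_asynchronous) are hypothetical, and a thread pool is just one of several ways to model an asynchronous send.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def receiver(message):
    """Receiver object: takes a moment to process, then replies."""
    time.sleep(0.05)
    return f"reply to {message}"


def send_synchronous(message):
    """The sender blocks until the receiver replies, then continues."""
    log = []
    reply = receiver(message)            # sender waits here
    log.append(reply)
    log.append("sender continues")
    return log


def send_asynchronous(message):
    """The sender hands the message over and keeps working meanwhile."""
    log = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(receiver, message)   # does not block
        log.append("sender continues")            # runs while receiver works
        log.append(future.result())               # reply is collected later
    return log
```

In the synchronous case the reply is logged before the sender continues; in the asynchronous case the order is reversed, which corresponds to the two arrow styles in a sequence diagram.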


c) Collaboration Diagram
A Collaboration diagram is an interaction diagram that emphasizes the structural organization of
the objects that send and receive messages.



[Figure: example interaction between S: Student and A: Admin. Messages: 1: Request form, 2: give form, 3: fill form, 4: submit, 5: check form, 6: eligible]

d) State chart Diagram
A State chart diagram shows a state machine, consisting of states, transitions, events, and
activities.
Event: It refers to a significant occurrence at a given time and place.




e) Activity Diagram
An activity diagram is a special kind of state chart diagram that shows the flow from activity to
activity within a system. The activities are connected by triggerless transitions. Conditions can be
checked using a decision box, which is denoted by a diamond.

Activity: It is a major task that must take place in order to fulfill an operation contract.

Initial Activity: This shows the starting point of the flow. It is denoted by a solid circle.


Final Activity: This shows the end of the flow in the activity diagram. It is denoted by a solid circle
nested in a circle.

Decision Box: A point in an Activity diagram where a flow splits into several mutually exclusive
guarded flows. It has one incoming transition and two or more outgoing transitions.

Forking and Joining: We use a synchronization bar to specify the forking and joining of parallel flows of
control.

A synchronization bar is a thick horizontal or vertical line.

A Fork may have one incoming transition and two or more outgoing transitions, each of which
represents an independent flow of control.
A Join may have two or more incoming transitions and one outgoing transition.
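
Forking and joining correspond closely to starting and waiting for threads. The following is a minimal sketch in Python under that assumption; fork_and_join and its activities argument are hypothetical names chosen for illustration.

```python
import threading


def fork_and_join(activities):
    """Run each activity in its own thread (fork), then wait for all (join)."""
    results = {}

    def run(name, action):
        results[name] = action()

    threads = [threading.Thread(target=run, args=(name, action))
               for name, action in activities.items()]
    for thread in threads:
        thread.start()      # fork: independent flows of control begin
    for thread in threads:
        thread.join()       # join: continue only when every flow has ended
    return results
```

For example, fork_and_join({"Listen": lambda: "listened", "Watch": lambda: "watched"}) runs both activities in parallel and returns only after both have finished.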

Swimlanes:
They are used to group related activities into one column.





[Figure: forking and joining of parallel flows at synchronization bars. Labels: Student, Listen, Watch, Success]


1. AUTOMATIC TELLER MACHINE (ATM)
Aim: To create a system to perform bank ATM transactions.

Project Statement: The given project is to model automatic teller machine in which a customer uses
his/her ATM card to draw money from ATM.
Functional requirements:
The ATM provides customers with facilities such as cash withdrawal, balance check, password change,
mini-statement, and transfer of funds.

USE CASE DIAGRAM FOR ATM

Fig 1.1 Use case diagram for ATM

Documentation:
Objective: As shown in Fig 1.1, the use case diagram has two actors, customer and system, which
perform different operations represented by the use cases of the ATM system.
The use case diagram is initiated by the customer inserting the card.
[Fig 1.1 labels. Actors: customer, system. Use cases: Insert card, Enter pin no, Select option (withdraw money/change password/etc.), Enter amount, Collect cash and card, display pin no screen, display options (menu) screen, issue cash]

Flow of messages: After the card is inserted, the system displays the PIN screen. The customer enters
the PIN. The system verifies the PIN and displays an options menu. The customer selects the required
option (withdraw money in this case) and then enters the amount to withdraw. The system issues cash.
The use case diagram terminates with the customer collecting the cash and the card.
Alternative flows: If the PIN entered by the customer is invalid, the customer is directed to re-enter
the PIN.
If the amount entered exceeds the available balance or the withdrawal limit, the customer
is directed to re-enter the amount.
CLASS DIAGRAM FOR ATM


Fig 1.2 Class diagram for ATM

Documentation: As shown in Fig 1.2, the class details are as follows.
Class name: System (ATM machine)
Attributes: system id, location
Functions: display pin no screen, display menu, issue cash and check balance
Relationships: association with customer and database.
Class name: Customer
Attributes: name, addr, acct no, pin no
Functions: insert card, enter pin num, select option, enter amt and collect cash
Class name: Database
Attributes: bank name, location
Functions: maintain acct details, add customer, del customer.
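
The class details above could be carried over into code roughly as follows. This is only a sketch in Python: the attribute and method names follow the class diagram, while the accounts dictionary, the balance_of helper and the menu entries are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    name: str
    addr: str
    acct_no: str
    pin_no: str


@dataclass
class Database:
    bank_name: str
    location: str
    # acct_no -> (Customer, balance); this storage layout is an assumption
    accounts: dict = field(default_factory=dict)

    def add_customer(self, customer, balance=0.0):
        self.accounts[customer.acct_no] = (customer, balance)

    def del_customer(self, acct_no):
        self.accounts.pop(acct_no, None)

    def balance_of(self, acct_no):
        # Hypothetical helper standing in for "maintain acct details"
        return self.accounts[acct_no][1]


@dataclass
class System:
    system_id: str
    location: str
    database: Database      # association with Database

    def display_pin_no_screen(self):
        return "Enter PIN"

    def display_menu(self):
        return ["withdraw", "check balance", "change password"]

    def check_balance(self, customer):
        return self.database.balance_of(customer.acct_no)
```

The association between System and Database appears as a field, so the ATM can reach account details without owning them.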







SEQUENCE DIAGRAM FOR ATM

Fig 1.3 Sequence diagram for ATM









[Fig 1.3 labels. Objects: u : Customer, s : System, d : database. Messages: insert card; display pin no screen; enter pin no; request pin no details; display options menu; select option; enter amount; request balance details; issue cash; sending; check; sending; check if balance is adequate; send card details; collect cash; update database; reload data; exit]

COLLABORATION DIAGRAM FOR ATM





Fig 1.4 Collaboration diagram for ATM






















[Fig 1.4 labels. Objects: u : Customer, s : System, d : database. Messages: 1: insert card; 2: send card details; 3: display pin no screen; 4: enter pin no; 5: request pin no details; 6: sending; 7: check; 8: display options menu; 9: select option; 10: enter amount; 11: request balance details; 12: sending; 13: check if balance is adequate; 14: issue cash; 15: collect cash; 16: update database; 17: reload data; 18: exit]

ACTIVITY DIAGRAM FOR ATM

Fig 1.5 Activity diagram for ATM
Documentation:
As shown in Fig 1.5, the activity diagram starts when the customer inserts the card and enters
the PIN on the screen. The system then verifies whether the PIN is valid. If it is not valid, the
customer is asked to re-enter the PIN. If it is valid, the customer selects the account type and the
option (in this case, withdrawing money). The customer then enters the amount to withdraw. The amount
is checked first against the balance in the account and then against the withdrawal limit. If both
checks pass, the cash is issued and the customer collects it. If the customer wants further
transactions, the options are offered again. If not, the customer collects the card and the activity
diagram stops.
[Fig 1.5 labels. Activities: start, insert card, enter the pin number, select the account type, select the options, enter the amount, collect the amount, collect card, stop. Guards: valid, Invalid Pin no, insufficient balance, Sufficient balance available, within limit, Limit exceeds, no more transactions, further transactions]
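
The decision points of this activity flow (PIN check, balance check, limit check) can be sketched as a single function. The example is in Python and is illustrative only; the withdraw function, its parameters and the outcome strings are hypothetical.

```python
def withdraw(entered_pin, stored_pin, balance, amount, limit):
    """Return (outcome, new_balance) for one withdrawal attempt.

    The guards are checked in the order the activity diagram describes:
    PIN first, then sufficient balance, then the withdrawal limit.
    """
    if entered_pin != stored_pin:
        return "invalid pin: re-enter pin", balance
    if amount > balance:
        return "insufficient balance: re-enter amount", balance
    if amount > limit:
        return "limit exceeded: re-enter amount", balance
    return "cash issued", balance - amount
```

Each early return corresponds to one guarded branch that loops back in the diagram; only the last line reaches the "collect the amount" activity.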
STATE DIAGRAM FOR ATM


Fig 1.6 State diagram for ATM
Documentation:
As shown in Fig 1.6, the state diagram starts when the customer inserts the card and enters the
PIN on the screen. The system then verifies whether the PIN is valid. If it is not valid, the customer
is asked to re-enter the PIN. If it is valid, the customer selects the account type and the option (in
this case, withdrawing money). The customer then enters the amount to withdraw. The amount is checked
first against the balance in the account and then against the withdrawal limit. If both checks pass,
the cash is issued and the customer collects it. If the customer wants further transactions, the
options are offered again. If not, the customer collects the card and the state diagram stops.
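
A state diagram of this kind maps naturally onto a transition table. The sketch below is in Python; the state and event names are assumptions chosen to match the description, not labels taken from Fig 1.6.

```python
# (current state, event) -> next state; all names are hypothetical
TRANSITIONS = {
    ("idle", "card_inserted"): "waiting_for_pin",
    ("waiting_for_pin", "pin_invalid"): "waiting_for_pin",
    ("waiting_for_pin", "pin_valid"): "selecting_option",
    ("selecting_option", "withdraw_selected"): "entering_amount",
    ("entering_amount", "amount_rejected"): "entering_amount",
    ("entering_amount", "amount_accepted"): "cash_issued",
    ("cash_issued", "more_transactions"): "selecting_option",
    ("cash_issued", "no_more_transactions"): "idle",
}


def run(events, state="idle"):
    """Feed a sequence of events to the state machine and return the final state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state
```

Self-transitions such as ("waiting_for_pin", "pin_invalid") model the re-entry loops, and an event that has no entry for the current state is rejected.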







COMPONENT DIAGRAM FOR ATM



Fig 1.7 Component diagram for ATM
Documentation:
The Component diagram in Fig 1.7 shows the client components in the ATM system. Each
class has its own header and body file, so each class is mapped to its own components in the diagram.
For example, the ATM Screen class is mapped to the ATM Screen component. The ATM Screen class is
also mapped to a second ATM Screen component. These two components represent the header and body
of the ATM Screen class. The shaded component is called a package body. It represents the body file
(.cpp) of the ATM Screen class in C++. The unshaded component is called a package specification. The
package specification represents the header (.h) file of the C++ class. The component called ATM.exe is
a task specification and represents a thread of processing. In this case, the thread of processing is the
executable program.

Components are connected by dashed lines showing the dependency relationships between them. For
example, the Card Reader class is dependent upon the ATM Screen class. This means that the ATM
Screen class must be available in order for the Card Reader class to compile. Once all of the classes have
been compiled, then the executable called ATMClient.exe can be created.

The ATM example has two threads of processing and therefore two executables. One executable
comprises the ATM client, including the Cash Dispenser, Card Reader, and ATM Screen. The second
executable comprises the ATM server, including the Account component.





DEPLOYMENT DIAGRAM FOR ATM


Fig 1.8 Deployment diagram for ATM

Documentation:
As shown above in figure 1.8, the Deployment diagram tells us much about the layout of the
system. The ATM client executable will run on multiple ATMs located at different sites. The ATM
client will communicate over a private network with the regional ATM server. The ATM server
executable will run on the regional ATM server. The regional ATM server will, in turn, communicate
over the local area network (LAN) with the banking database server running Oracle. Lastly, a printer is
connected to the regional ATM server.

Our ATM system will be following a three-tier architecture with one tier each for the database, regional
server, and client.

The Deployment diagram is used by the project manager, users, architect, and deployment staff to
understand the physical layout of the system and where the various subsystems will reside. This diagram
helps the project manager communicate what the system will be like to the users. It also helps the staff
responsible for deployment to plan their deployment efforts.










2. ONLINE BOOK SHOP SYSTEM

Project Statement: The given problem is to model an online book shop.

Functional Requirements:
Create an online bookshop where the user can register, order books, and
pay the bill on delivery of the books.

The book shop staff have to send the ordered items to the customer.

USE CASE DIAGRAM FOR ONLINE BOOK SHOP SYSTEM

Fig 2.1 Use case diagram for Online Book Shop System

Documentation:
Objective: As shown in Fig 2.1, the use case diagram has two actors, user and internet, which
perform different operations represented by the use cases of the online book shopping system.
The use case diagram is initiated when the user logs in to the website.
[Fig 2.1 labels. Actors: User, Internet. Use cases: Login in the website, search for availability of required book, confirm about selected book, Give your debit or credit card number, Collect book, Provide book information for user, Collect info or money from the user, Specify means for collecting book]

Flow of messages: After login, the system displays the books available. The user searches for
a book, checks its availability, and then confirms the selection of the book. The system takes the
payment information and then gives the means by which the book can be
collected.
The use case diagram terminates with the user collecting the book.
Alternative flows: If the required book is not available, the use case terminates.

CLASS DIAGRAM FOR ONLINE BOOK SHOP SYSTEM

Fig 2.2 Class Diagram for Online Book Shop System
Documentation: As shown in Fig 2.2, the class details are as follows.
Class name: User
Attributes: Card no, name
Functions: Enter URL, select book, give credit card no, collect book
Relationships: depends on BookDealer and Internet
Class name: Internet
Functions: open requested site, process user requests, gather user data and send to DB
Relationships: depends on Database
Class name: BookDealer
Functions: Book delivery
Relationships: depends on Database
Class name: Database
Attributes: Card no, name
Functions: Update book info, update credit card no, update user info


SEQUENCE DIAGRAM FOR ONLINE BOOK SHOP SYSTEM

Fig 2.3 Sequence diagram for Online Book Shop System











[Fig 2.3 labels. Objects: User, Internet, Book dealer, Database. Messages: Request URL; Display request Page; Select book; Request for credit cardno; Submit details and cardno; Update user info; view requests; Collect book from book dealer]

COLLABORATION DIAGRAM FOR ONLINE BOOK SHOP SYSTEM



Fig 2.4 Collaboration diagram for Online Book Shop System


















[Fig 2.4 labels. Objects: User, Internet, Book dealer, Database. Messages: 1: Request URL; 2: Display request Page; 3: Select book; 4: Request for credit cardno; 5: Submit details and cardno; 6: Update user info; 7: view requests; 8: Collect book from book dealer]

ACTIVITY DIAGRAM FOR ONLINE BOOK SHOP SYSTEM

Fig 2.5 Activity diagram for Online Book Shop System

Documentation:
As shown in Fig 2.5, the activity diagram begins with the user logging in to his/her
account. The system displays the list of available books, and the user searches for the required book.
If the book is not found, the activity terminates. If it is found, the user places an order and then
makes the payment using his/her credit card. The system then confirms the order and specifies to the
user the means of collecting the books.
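
The branches of this activity flow can be written down as a short function. This is a minimal sketch in Python; order_book, its parameters and the returned messages are hypothetical, and the card check is reduced to a boolean for brevity.

```python
def order_book(catalogue, title, card_is_valid):
    """Follow the activity flow: search, then order, then pay, then confirm."""
    if title not in catalogue:
        return "terminated: book not found"
    if not card_is_valid:
        return "rejected: payment failed"
    return f"confirmed: {title} will be shipped to the customer"
```

For instance, order_book({"UML Basics"}, "UML Basics", True) follows the "Found" and "accepted" branches through to the confirmation step.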







[Fig 2.5 labels. Activities: Display welcome message, Display item information, confirm book selection, create order from shipping cart, display order, ship to customer. Guards: accepted, rejected, if required book not found, Found]

STATE DIAGRAM FOR ONLINE BOOK SHOP SYSTEM



Fig 2.6 State diagram for Online Book Shop System

Documentation:
As shown in Fig 2.6, the state diagram begins with the user logging in to his/her
account. The system displays the list of available books, and the user searches for the required book.
If the book is not found, the state diagram terminates. If it is found, the user places an order and
then makes the payment using his/her credit card. The system then confirms the order and specifies to
the user the means of collecting the books.









COMPONENT DIAGRAM FOR ONLINE BOOK SHOP SYSTEM



Fig 2.7 Component diagram for Online Book Shop System

Documentation:
The Component diagram in Fig 2.7 shows the components in the Online Book Shop system.
The components are Book Shop Web site, Customer, Database, Payment and Deliver Books. The Book
Shop Web site component is mapped to Database of Books and Payment of books i.e., credit card
validation. If the credit card is validated then the books will be delivered to the customer.


















DEPLOYMENT DIAGRAM FOR ONLINE BOOK SHOP SYSTEM



Fig 2.8 Deployment diagram for Online Book Shop System

Documentation:
As shown in Fig 2.8, the Deployment diagram tells us much about the layout of the
system. The online book shop is connected to RAID terminals and a server. The server is connected to a
client terminal as well as a customer workstation.

The Deployment diagram is used by the project manager, users, architect, and deployment staff to
understand the physical layout of the system and where the various subsystems will reside. This diagram
helps the project manager communicate what the system will be like to the users. It also helps the staff
responsible for deployment to plan their deployment efforts.















3. BANKING SYSTEM

Project Statement: The given problem is to model a banking system.

Functional Requirements:
Create a Banking System where the Bank Customer can perform transactions such as
Deposit money, Withdraw money, Check Balance and Get Loan.

Bank staff such as the Cashier and the Bank Manager need to check, approve and update customers'
transactions in the Bank Account.

USE CASE DIAGRAM FOR BANKING SYSTEM



Fig 3.1 Use case diagram for Banking System


Documentation:
Objective: As shown in Fig 3.1, the use case diagram has three actors: customer, cashier and
bank manager. They perform different operations, represented by the use cases of
the banking system.
The use case diagram is initiated when the customer performs a transaction in the bank.
Flow of messages: Every customer transaction, such as update balance, deposit money, withdraw
money and get loan, passes through checkpoints handled by the cashier and the bank manager. The
customer is then able to perform the transaction as required.

The use case diagram terminates if the customer is unable to satisfy the conditions required by the
bank; in that case the customer cannot perform any transactions.
Alternative flows: There is no alternative flow for this use case diagram.

CLASS DIAGRAM FOR BANKING SYSTEM


Fig 3.2 Class diagram for Banking System

Documentation: As shown above in figure 3.2 the class details are as follows.
Class name: Customer (Account Holder in Bank)
Attributes: Customer Name, Account Number, Address, Phone Number
Functions: Create new account, Deposit, Withdraw
Relationships: association with Bank.

Class name: Bank
Attributes: Customer Details, Loan Details, Rules and Regulations, Transaction Type, Transaction
Date, Transaction Time
Functions: Provide Loan, Update Details, Collect Money, Transaction
Class name: Account
Attributes: Account Number, Balance, Customer Name
Functions: Update Account, Check Account.
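
One possible way to realise the Customer and Account classes is sketched below in Python. The method names follow the class diagram; how update_account applies a signed change, and the rule that a balance may not go negative, are assumptions made for this illustration.

```python
class Account:
    def __init__(self, account_number, customer_name, balance=0):
        self.account_number = account_number
        self.customer_name = customer_name
        self.balance = balance

    def check_account(self):
        """Return the current balance."""
        return self.balance

    def update_account(self, change):
        """Apply a signed change; reject it if the balance would go negative."""
        if self.balance + change < 0:
            raise ValueError("insufficient balance")
        self.balance += change


class Customer:
    def __init__(self, customer_name, account):
        self.customer_name = customer_name
        self.account = account      # link to the customer's Account

    def deposit(self, amount):
        self.account.update_account(amount)

    def withdraw(self, amount):
        self.account.update_account(-amount)
```

Keeping the balance rule inside Account means Deposit and Withdraw on the Customer side stay thin and cannot bypass the check.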



SEQUENCE DIAGRAM FOR BANKING SYSTEM


Fig 3.3 Sequence diagram for Banking System


COLLABORATION DIAGRAM FOR BANKING SYSTEM



Fig 3.4 Collaboration diagram for Banking System







ACTIVITY DIAGRAM FOR BANKING SYSTEM




Fig 3.5 Activity diagram for Banking System

Documentation:
As shown in Fig 3.5, the activity diagram begins when the bank staff enter customer details
such as the customer account number and customer name. If the customer's details are
valid, the account is updated according to the transactions performed, such as deposit money and
withdraw money.












STATE DIAGRAM FOR BANKING SYSTEM



Fig 3.6 State diagram for Banking System


Documentation:
As shown in Fig 3.6, the state diagram begins when the customer wants to open an
account with the bank. Once the account is opened, the customer is able to perform
transactions such as deposit, withdraw and check balance on his/her account.












COMPONENT DIAGRAM FOR BANKING SYSTEM



Fig 3.7 Component diagram for Banking System


Documentation:
The Component diagram in Fig 3.7 shows the components in the Banking system. The
components are Bank, Branch, Customer, Deposit, Withdraw and Employee. The Bank component is
mapped to Customer and Branch. Similarly Branch is mapped to Deposit, Withdraw and Employee.





















DEPLOYMENT DIAGRAM FOR BANKING SYSTEM


Fig 3.8 Deployment diagram for Banking System

Documentation:
As shown in Fig 3.8, the Deployment diagram tells us much about the layout of the
system. The Bank is connected to Customer, Branch, and Terminal. Branch is connected to Employee.

The Deployment diagram is used by the project manager, users, architect, and deployment staff to
understand the physical layout of the system and where the various subsystems will reside. This diagram
helps the project manager communicate what the system will be like to the users. It also helps the staff
responsible for deployment to plan their deployment efforts.
















4. AIRPORT SIMULATION

Aim: To create a system to perform Airport Simulation

Project Statement: Every day a number of airplanes land at and take off from an airport. It is the
responsibility of Air Traffic Control (ATC) to regulate these planes. The aim of this simulation is to
reconstruct the events occurring during landing or take-off.
Whenever a plane enters the RADAR space, the RADAR signals the ATC about the plane. The
pilot then sends the plane details. The ATC checks the runway and decides the priority. The ATC signals
the pilot whose plane has the highest priority to land or take off, and the pilot carries out the
corresponding command.
An overview of the airport simulation:

RADAR senses the plane and signals the ATC
Pilot sends plane details to the ATC
ATC checks runway
ATC decides priority and gives signal to corresponding pilot
Pilot then lands/ takes off as per signal from the ATC
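
The steps above can be sketched with a priority queue. The example is in Python and makes several assumptions that the project statement does not fix: a lower priority number means a more urgent plane, ties are broken by arrival order, and the class layout and plane numbers are hypothetical.

```python
import heapq


class ATC:
    def __init__(self):
        self._queue = []        # heap of (priority, arrival order, plane_no, request)
        self._arrivals = 0
        self.runway_free = True

    def receive_plane_details(self, plane_no, request, priority):
        """Queue a plane; a lower priority number is more urgent (assumption)."""
        heapq.heappush(self._queue, (priority, self._arrivals, plane_no, request))
        self._arrivals += 1

    def give_signal(self):
        """Signal the most urgent waiting plane if the runway is free."""
        if not self.runway_free or not self._queue:
            return None
        _, _, plane_no, request = heapq.heappop(self._queue)
        return plane_no, request
```

When the runway is occupied, give_signal returns None and the queued planes keep waiting, which corresponds to planes circling until a signal is given.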

Functional requirements:
The airport runway provides facilities such as landing and take-off. Before a landing or take-off, the
RADAR signal needs to be checked and the information passed to the pilot.
























USE CASE DIAGRAM FOR AIRPORT SIMULATION


Fig 4.1 Use case diagram for Airport Simulation

Documentation:
Objective: As shown in Fig 4.1, the use case diagram has two actors, ATC (Air
Traffic Control) and Pilot.
The use case diagram is initiated when the ATC staff look at the RADAR signals, check the
occupancy of the runway, and give a signal, according to priority, to the plane that wants to land or
take off.
Flow of messages: After detecting a RADAR signal, the ATC staff give planes permission to land or
take off, so that no two planes land or take off at the same time and accidents are
avoided.
The use case diagram terminates when the plane lands at or takes off from the airport.
Alternative flows: While one plane has permission to take off, other planes that want to
land must wait and circle outside the airport.










CLASS DIAGRAM FOR AIRPORT SIMULATION


Fig 4.2 Class diagram for Airport Simulation

Documentation: As shown in Fig 4.2, the class details are as follows.
Class name: ATC (Air Traffic Control)
Attributes: name, eid
Functions: response_on_RADAR, receive_plane_details, check_runway, determine_priority,
give_signal
Relationships: many-to-many association between ATC and PILOT.

Class name: PILOT
Attributes: name, pid, plane_no
Functions: send_plane_details, receive_signal, land_or_take_off

























SEQUENCE DIAGRAM FOR AIRPORT SIMULATION

Fig 4.3 Sequence diagram for Airport Simulation


COLLABORATION DIAGRAM FOR AIRPORT SIMULATION

Fig 4.4 Collaboration diagram for Airport Simulation








ACTIVITY DIAGRAM FOR AIRPORT SIMULATION


Fig 4.5 Activity diagram for Airport Simulation

Documentation:
As shown in Fig 4.5, the activity diagram starts when a response on the RADAR is
detected. The plane details are then received from the pilot. If the runway is vacant, the signal is
given for landing; otherwise the priority is determined and the signal is given for landing or
take-off accordingly. If the landing signal is not given, the pilot must circle outside the airport.










STATE DIAGRAM FOR AIRPORT SIMULATION



Fig 4.6 State diagram for Airport Simulation

Documentation:
As shown in Fig 4.6, the state diagram starts when a response on the RADAR is
detected. The plane details are then received from the pilot. If the runway is vacant, the signal is
given for landing; otherwise the priority is determined and the signal is given for landing or
take-off accordingly. If the landing signal is not given, the pilot must circle outside the airport.











COMPONENT DIAGRAM FOR AIRPORT SIMULATION



Fig 4.7 Component diagram for Airport Simulation

Documentation:
The Component diagram in Fig 4.7 shows the components in the Airport Simulation. Each
class is mapped to its own components in the diagram. The components are Airport Terminal, Radar,
Runway and Pilot.

Components are connected by dashed lines showing the dependency relationships between
them.























DEPLOYMENT DIAGRAM FOR AIRPORT SIMULATION



Fig 4.8 Deployment diagram for Airport Simulation



Documentation:
As shown in Fig 4.8, the Deployment diagram tells us much about the layout of the
Airport simulation. The ATC at the Airport Terminal looks at the RADAR signals, communicates with the
pilot, and then gives permission to land or take off.

The Deployment diagram is used by the project manager, users, architect, and deployment staff to
understand the physical layout of the system and where the various subsystems will reside. This diagram
helps the project manager communicate what the system will be like to the users. It also helps the staff
responsible for deployment to plan their deployment efforts.



















DWDM LAB PROGRAMS












1. Introduction of Data Warehousing and Data Mining
Data Warehouse:
A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of
data in support of management's decision-making process.
A data warehouse is a relational database that is designed for query and analysis rather than for
transaction processing. It usually contains historical data derived from transaction data, but it can
include data from other sources. It separates analysis workload from transaction workload and enables
an organization to consolidate data from several sources.
A common way of introducing data warehousing is to refer to the characteristics of a data
warehouse as set forth by William Inmon:
Subject Oriented
Integrated
Nonvolatile
Time Variant
1. Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your
company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you
can answer questions like "Who was our best customer for this item last year?" This ability to define a
data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
2. Integrated
Integration is closely related to subject orientation. Data warehouses must put data from disparate
sources into a consistent format. They must resolve such problems as naming conflicts and
inconsistencies among units of measure. When they achieve this, they are said to be integrated.
3. Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical
because the purpose of a warehouse is to enable you to analyze what has occurred.
4. Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.

Fig: 1.1 Architecture of Data Warehouse
2. Introduction of Data Mining
Simply stated, data mining refers to extracting or "mining" knowledge from large amounts of data. The term is actually a misnomer. Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, "data mining" should have been more appropriately named "knowledge mining from data", which is unfortunately somewhat long. "Knowledge mining", a shorter term, may not reflect the emphasis on mining from large amounts of data. Nevertheless, mining is a vivid term characterizing the process that finds a small set of precious nuggets from a great deal of raw material. Thus, such a misnomer, which carries both "data" and "mining", became a popular choice. There are many other terms carrying a similar or slightly different meaning to data mining, such as knowledge mining from databases, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.
Many people treat data mining as a synonym for another popularly used term, "Knowledge Discovery in Databases", or KDD. Alternatively, others view data mining as simply an essential step in the process of knowledge discovery in databases. Knowledge discovery as a process is depicted in Fig 2.1, and consists of an iterative sequence of the following steps:

data cleaning: to remove noise or irrelevant data.
data integration: where multiple data sources may be combined.
data selection: where data relevant to the analysis task are retrieved from the database.
data transformation: where data are transformed or consolidated into forms appropriate for
mining by performing summary or aggregation operations, for instance.
data mining: an essential process where intelligent methods are applied in order to extract data
patterns.
pattern evaluation: to identify the truly interesting patterns representing knowledge based on
some interestingness measures.
knowledge presentation: where visualization and knowledge representation techniques are used
to present the mined knowledge to the user.

Fig: 2.1 Data mining as a process of knowledge discovery


Fig: 2.2 Architecture of a typical data mining system.

3. How to Load Weka Software
About Weka
The purpose of this assignment is to install and run Weka, a widely used, FREE, Data Mining
Software Toolbox in Java. This homework will walk you through the basic steps of installing, running
the software, building classifiers, and labeling test cases. For this assignment, you will need to download
the TRAINING and TEST sets from the course website. Note: It is important that you properly install
and learn how to run Weka because we will use Weka for future hands on assignments as well as for the
data mining competition and course project.
Step 1: Installing Weka
Go to the Weka website, http://www.cs.waikato.ac.nz/ml/weka/, and download the software. On the left-hand side, click on the link that says download. Select the link corresponding to the version of the software for your operating system, depending on whether or not you already have a Java VM running on your machine (if you don't know what a Java VM is, then you probably don't have one). The link will forward you to a site where you can download the software from a mirror site. Save the self-extracting executable to disk and then double-click on it to install Weka. Answer yes or next to the questions during the installation. Click yes to accept the Java agreement if necessary. After you install the program, Weka should appear on your Start menu under Programs (if you are using Windows).
Step 2: Running Weka
From the start menu select Programs, then Weka, then Weka 3*.
You will see the Weka GUI Chooser. Select Explorer. The Weka Explorer will then launch.


Step 3: Load Training Set
You will find the training set, TRAIN.arff on the course website. The training set includes the
records you will use in your next homework assignment. The TRAINING set contains the following
data:


On the Weka Explorer, push the button that says open file. Open TRAIN.arff.

Step 4: Constructing the Initial Decision Tree
Select the tab that says Classify. In the box that says classifier, you can choose a classifier. Click on the Choose button and you will be presented with a hierarchy of methods. Pick weka, classifiers, trees, J48. Click on the text box in the classifier box (which now says J48 and some cryptic options instead of ZeroR, which is the default classifier). In the popup, change the following settings: set minNumObj to 1 and unpruned to True, and then click OK. (Note: The order in which the options appear might vary depending on which mirror site you choose. For example, we found minNumObj is closer to the top of the GUI in some versions.)

You will find the test set, TEST.arff on the course website. The TEST set includes the records
you will use in future homework assignments. The TEST set contains the data below. In the box that
says test options, pick Supplied test set. Click on the Set button and select your TEST.arff file.


Now press Start and watch Weka go!




Step 5: Results
You may have to scroll up and down in the classifier output box to see all the results. Cut and
paste the results in the classifier output window to a text editor and HAND IN (or email) with your
assignment.
You will compare these results with a future homework assignment. Don't worry that you don't yet know how to interpret the output. In a short time, you will. This exercise is only to get you started with Weka.
In the results box, on the bottom left, Right click on the item that says trees.J48. Select
Visualize Classification Errors from the list. Click Save. And save the results as RESULTS.arff. This
file will include your original TEST set plus an extra column for the predicted classification.
Cut and paste the text in the RESULTS.arff file to the end of your assignment and HAND IN.
So, for the first part of the assignment, you simply need to hand in (or email to me) a text document with the results output from Weka along with the prediction results found in your RESULTS.arff file.




4. Classification of Decision Tree Induction by Employee Details
4.1 Definition:
Decision tree induction is the learning of decision trees from class-labeled training tuples.
4.2 Method of Using for Classification:
Given a tuple X for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree. A path is traced from the root to a leaf node, which holds the class prediction for that tuple.
If an outcome is certain, observing it gives no information; information is gained only when there is uncertainty about the outcome.
Example:
Assume a coin has heads on both sides. If it is tossed, we do not get any information. But if we toss a coin whose two sides are different, the outcome gives us information.
The information (entropy), written IG in this section, is
$IG = -\sum_{i=1}^{m} P_i \log_2 P_i$
which is the expected information needed to classify a tuple in D, where
D = set of tuples.
$P_i$ is the probability that an arbitrary tuple in D belongs to class $C_i$, and is estimated by $|C_{i,D}| / |D|$.
Since $0 \le P_i \le 1$, $\log_2 P_i$ is never greater than zero, i.e., $\log_2 P_i \le 0$.
Hence every term $-P_i \log_2 P_i \ge 0$ [information always takes non-negative values].
Then for understanding assume the following data.


The Notations:
Let D, the data partition, be a training set of class-labelled tuples. Suppose the class label attribute has m distinct values defining m distinct classes, $C_i$ for $i = 1, 2, \ldots, m$.
Let $C_{i,D}$ be the set of tuples of class $C_i$ in D. Let $|D|$ and $|C_{i,D}|$ denote the number of tuples in D and $C_{i,D}$ respectively.
Consider an event that has one of two possible values. Let the probabilities of the two values be $P_1$ and $P_2$.
If $P_1 = 1$, $P_2 = 0$ then there is no information, i.e., IG = 0.
If $P_1 = 1/2$, $P_2 = 1/2$ then IG = 1. The information of an event is called entropy.
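The two coin cases above can be checked numerically. The following is a minimal Python sketch; the function name `entropy` is chosen here for illustration and is not part of Weka.

```python
import math

def entropy(probabilities):
    """Expected information in bits: sum of -p * log2(p) over all outcomes."""
    # Outcomes with p == 0 are skipped, i.e. 0 * log2(0) is taken as 0.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

two_headed_coin = entropy([1.0, 0.0])  # the outcome is certain
fair_coin = entropy([0.5, 0.5])        # two equally likely outcomes

print("two-headed coin:", two_headed_coin)
print("fair coin:", fair_coin)
```

A certain outcome carries no information, while two equally likely outcomes carry exactly one bit.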
4.3 Attribute Selection Measures:
An attribute selection measure is a heuristic for selecting the splitting criterion that best separates a given data partition.
The attribute selection measure provides a ranking for each attribute describing the given training tuples. One of the popular attribute selection measures is Information Gain.
4.4 Information Gain:
Let node N represent or hold the tuples of partition D. The attribute with the highest information gain is chosen as the splitting attribute for node N. This attribute minimizes the information needed to classify the tuples in the resulting partitions and reflects the least randomness or impurity in these partitions. Such an approach minimizes the expected number of tests needed to classify a given sample tuple and guarantees that a simple tree is found. How much more information would we still need, after partitioning on an attribute A, in order to arrive at an exact classification?
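This amount is measured by
$Info_A(D) = \sum_{j=1}^{n} \frac{|D_j|}{|D|} \times Info(D_j)$
where $D_1, \ldots, D_n$ are the partitions of D produced by the n distinct values of attribute A.
Information gain is defined as the difference between the original information requirement and the new requirement,
i.e., $gain(A) = Info(D) - Info_A(D)$.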



Example:
Decision tree classification model to classify bank-loan application by assigning applications to
one of the three risk classes.
Own home   Married   Gender   Employed   Credit-rates   Risk-class
Y          Y         M        Y          A              B
N          N         F        Y          A              A
Y          Y         F        Y          B              C
Y          N         M        N          B              B
N          Y         F        Y          B              C
N          N         F        Y          B              A
N          N         M        N          B              B
Y          N         F        Y          A              A
N          Y         F        Y          A              C
Y          Y         F        Y          A              C
Table: 2.1 Decision tree table
Y = Yes, N = No, M = Male, F = Female
Number of tuples = |D| = 10
Number of classes = 3 and their frequencies A = 3, B = 3, C = 4.
Information (before splitting): $IG = -\sum_{i=1}^{m} P_i \log_2 P_i = -\frac{3}{10}\log_2\frac{3}{10} - \frac{3}{10}\log_2\frac{3}{10} - \frac{4}{10}\log_2\frac{4}{10} = 1.57$


I) OWN HOME (Attribute):
Value = Yes: 5 tuples, with classes A = 1, B = 2, C = 2.
Value = No: 5 tuples, with classes A = 2, B = 1, C = 2.
$IG(Yes) = -\frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5} - \frac{2}{5}\log_2\frac{2}{5} = 1.52$
$IG(No) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5} = 1.52$
Information after splitting = 5/10 * IG(Yes) + 5/10 * IG(No) = 1.52
II) MARRIED (Attribute):
Value = Yes: 5 tuples, with classes A = 0, B = 1, C = 4.
Value = No: 5 tuples, with classes A = 3, B = 2, C = 0.
$IG(Yes) = -\frac{0}{5}\log_2\frac{0}{5} - \frac{1}{5}\log_2\frac{1}{5} - \frac{4}{5}\log_2\frac{4}{5} = 0.72$ (taking $0 \log_2 0 = 0$)
$IG(No) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} - \frac{0}{5}\log_2\frac{0}{5} = 0.971$
Information after splitting = 0.5 * 0.72 + 0.5 * 0.971 = 0.846
III) GENDER (Attribute):
Value = Male: 3 tuples, with classes A = 0, B = 3, C = 0.
Value = Female: 7 tuples, with classes A = 3, B = 0, C = 4.
$IG(M) = -\frac{3}{3}\log_2\frac{3}{3} = 0$
$IG(F) = -\frac{3}{7}\log_2\frac{3}{7} - \frac{4}{7}\log_2\frac{4}{7} = 0.985$
Information after splitting = 3/10 * 0 + 7/10 * 0.985 = 0.69
IV) EMPLOYED (Attribute):
Value = Yes: 8 tuples, with classes A = 3, B = 1, C = 4.
Value = No: 2 tuples, with classes A = 0, B = 2, C = 0.
$IG(Yes) = -\frac{3}{8}\log_2\frac{3}{8} - \frac{1}{8}\log_2\frac{1}{8} - \frac{4}{8}\log_2\frac{4}{8} = 1.41$
$IG(No) = 0$
Information after splitting = 8/10 * 1.41 + 2/10 * 0 = 1.12
V) CREDIT RATES (Attribute):
Value = A: 5 tuples, with classes A = 2, B = 1, C = 2.
Value = B: 5 tuples, with classes A = 1, B = 2, C = 2.
$IG(A) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5} = 1.52$
$IG(B) = -\frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5} - \frac{2}{5}\log_2\frac{2}{5} = 1.52$
Information after splitting = 5/10 * 1.52 + 5/10 * 1.52 = 1.52






Splitting attribute   Info. before splitting (a)   Info. after splitting (b)   Info. gain (a - b)
Own home              1.57                         1.52                        0.05
Married               1.57                         0.85                        0.72
Gender                1.57                         0.69                        0.88
Employed              1.57                         1.12                        0.45
Credit-rates          1.57                         1.52                        0.05
From the table, the largest information gain comes from the Gender attribute. So the splitting attribute is Gender.
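The hand calculation for the first split can be reproduced with a short script. This is a minimal sketch in Python, not the J48 implementation used by Weka; the names `ROWS`, `info`, `info_after_split` and `gain` are chosen here for illustration.

```python
import math
from collections import Counter

# Rows of the bank-loan table:
# (own_home, married, gender, employed, credit_rates, risk_class)
ROWS = [
    ("Y", "Y", "M", "Y", "A", "B"),
    ("N", "N", "F", "Y", "A", "A"),
    ("Y", "Y", "F", "Y", "B", "C"),
    ("Y", "N", "M", "N", "B", "B"),
    ("N", "Y", "F", "Y", "B", "C"),
    ("N", "N", "F", "Y", "B", "A"),
    ("N", "N", "M", "N", "B", "B"),
    ("Y", "N", "F", "Y", "A", "A"),
    ("N", "Y", "F", "Y", "A", "C"),
    ("Y", "Y", "F", "Y", "A", "C"),
]
ATTRIBUTES = ["Own home", "Married", "Gender", "Employed", "Credit-rates"]

def info(rows):
    """Entropy of the class label (last column) over the given rows."""
    n = len(rows)
    counts = Counter(r[-1] for r in rows)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def info_after_split(rows, col):
    """Weighted entropy of the partitions produced by splitting on column col."""
    n = len(rows)
    total = 0.0
    for value in set(r[col] for r in rows):
        part = [r for r in rows if r[col] == value]
        total += len(part) / n * info(part)
    return total

def gain(rows, col):
    return info(rows) - info_after_split(rows, col)

gains = {name: gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
for name, g in gains.items():
    print(f"{name:13s} gain = {g:.2f}")
print("Splitting attribute:", best)
```

Running the same functions on only the rows with Gender = F gives the figures for the second level of the tree.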
Since every tuple with Gender = male belongs to risk class B, and no female tuple does, the male branch becomes a leaf for class B and those tuples can be removed.

Own home Married Employed Credit-rates Risk-class
N N Y A A
Y Y Y B C
N Y Y B C
N N Y B A
Y N Y A A
N Y Y A C
Y Y Y A C
The information in this data for the two classes (i.e., A, C) is given by:
Number of tuples = 7
Number of classes = 2 and their frequencies A = 3, C = 4
Now $IG = -\frac{3}{7}\log_2\frac{3}{7} - \frac{4}{7}\log_2\frac{4}{7} = 0.99$.
I) OWN HOME (Attribute):
Value = Y: 3 tuples, with classes A = 1, C = 2.
Value = N: 4 tuples, with classes A = 2, C = 2.
$IG(Y) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = 0.92$
$IG(N) = -\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4} = 1$
Information after splitting = 3/7 * 0.92 + 4/7 * 1 = 0.96
II) MARRIED (Attribute):
Value = Y: 4 tuples, with classes A = 0, C = 4.
Value = N: 3 tuples, with classes A = 3, C = 0.
From this, IG(Y) = IG(N) = 0.
Hence information after splitting = 0.
III) CREDIT RATES (Attribute):
Value = A: 4 tuples, with classes A = 2, C = 2.
Value = B: 3 tuples, with classes A = 1, C = 2.
$IG(A) = -\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4} = 1.00$
$IG(B) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = 0.92$
Information after splitting = 4/7 * 1 + 3/7 * 0.92 = 0.96
Splitting attribute Info. Before split Info. After split Info. Gain
Own home 0.99 0.96 0.03
Married 0.99 0 0.99
Credit-rates 0.99 0.96 0.03

Clearly, from this table the Married attribute is the next splitting attribute.

From these, the resulting decision tree is:

Gender = Male: Class B
Gender = Female:
    Married = Yes: Class C
    Married = No: Class A

Fig: 4.1 Decision Tree
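Classifying a tuple means tracing a path from the root of this tree to a leaf. The sketch below, in Python, encodes the tree of Fig 4.1 as nested dictionaries; the representation and the names `TREE` and `classify` are assumptions made for illustration and are not how Weka stores a J48 model.

```python
# An inner node maps one attribute name to its branches; a leaf is a class label.
TREE = {
    "gender": {
        "m": "B",
        "f": {"married": {"y": "C", "n": "A"}},
    }
}

def classify(tree, record):
    """Follow the branches matching the record until a leaf (a string) is reached."""
    node = tree
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[record[attribute]]
    return node

# Second row of the training table: female, not married.
print(classify(TREE, {"gender": "f", "married": "n"}))
```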




Program:
@relation bank-loan-data
@attribute ownhouse {y,n}
@attribute married {y,n}
@attribute gender {m,f}
@attribute employed {y,n}
@attribute cre-rates {a,b}
@attribute risk-class {a,b,c}
@data
y,y,m,y,a,b
n,n,f,y,a,a
y,y,f,y,b,c
y,n,m,n,b,b
n,y,f,y,b,c
n,n,f,y,b,a
n,n,m,n,b,b
y,n,f,y,a,a
n,y,f,y,a,c
y,y,f,y,a,c

=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: bank
Instances: 10
Attributes: 6
ownhouse
married
gender
employed
cre-rates
risk-class
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
J48 pruned tree
------------------
gender = m: b (3.0)
gender = f
| married = y: c (4.0)
| married = n: a (3.0)
Number of Leaves : 3
Size of the tree : 5
Time taken to build model: 0.05 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 9 90 %
Incorrectly Classified Instances 1 10 %
Kappa statistic 0.8462
Mean absolute error 0.0667
Root mean squared error 0.2582
Relative absolute error 13.9535 %
Root relative squared error 50.8001 %

Total Number of Instances 10
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure Class
1 0 1 1 1 a
0.667 0 1 0.667 0.8 b
1 0.167 0.8 1 0.889 c
=== Confusion Matrix ===

a b c <-- classified as
3 0 0 | a = a
0 2 1 | b = b
0 0 4 | c = c

















5. Association Rule Mining Using Apriori Property by Taking 9 Items
Consider the following transaction database:
Customer   Items
C1         A,B,C,D
C2         B,F,D,E
C3         B,C,D
C4         A,B,C,F,D,E
C5         A,C,E
C6         B,C,E
C7         A,C,D
C8         A,B,C,D,G
C9         A,B,D,G

First we identify which items are purchased together frequently.
Then we derive strong rules from the frequent item sets.
Assume a minimum support of s = 30%, i.e., at least three transactions.
First we scan the database and identify all individual items along with their support counts. These are called candidate 1-item sets and are denoted by C1.
Item set   Support count
{A}        6
{B}        7
{C}        7
{D}        7
{E}        4
{F}        2
{G}        2

From C1 we keep the items that satisfy the minimum support. These are the frequent 1-item sets L1:
Item set   Support count
A          6
B          7
C          7
D          7
E          4

Next we do the same to identify all frequent 2-item sets.
First we generate all 2-item sets that could potentially be frequent. These are called candidate 2-item sets, or C2.
They are obtained by generating all possible 2-item sets from L1 and scanning the database to determine the support of each item set in C2.
Candidate 2-item set   Support
{A,B}                  4
{A,C}                  5
{A,D}                  5
{A,E}                  2
{B,C}                  5
{B,D}                  6
{B,E}                  3
{C,D}                  5
{C,E}                  3
{D,E}                  2

From C2 we select those that satisfy the minimum support. These constitute table L2, shown below.
Item set   Support count
{A,B}      4
{A,C}      5
{A,D}      5
{B,C}      5
{B,D}      6
{B,E}      3
{C,D}      5
{C,E}      3

We repeat the above process until there are no more candidate item sets.
So the process looks like C1 → L1 → C2 → L2 → C3 → L3 ...
Before we proceed further, let us consider an important property called the APRIORI property.
APRIORI PROPERTY: For an item set to be frequent, all its non-empty subsets must be frequent.
We can use this property in the following process.
Compute C3 from L2.
We join two frequent 2-item sets, say I1 and I2, to generate a candidate 3-item set.
We do this only if the first item in I1 is the same as the first item of I2.
Here we assume that the items in each item set are sorted in a particular order.
Candidate 3-item set   Support
{A,B,C}                3
{A,B,D}                4
{A,C,D}                4
{B,C,D}                4
{B,C,E}                2
{B,D,E}                2
{C,D,E}                1
To determine the supports we need to scan the database.
If |C3| is very large, this will take a lot of time.
So we need to remove from C3 the item sets which cannot be in L3.
This is called the prune process.
If a candidate 3-item set I has at least one 2-item subset that is not frequent, we can remove I from C3.
In the above C3, {B,D,E} is deleted because {D,E} is not frequent, and so is {C,D,E}.
After pruning we scan the database and determine the supports.
The output is as follows. L3 consists of the item sets in this table whose support is at least 3, i.e., {A,B,C}, {A,B,D}, {A,C,D} and {B,C,D}.
Item set   Support count
{A,B,C}    3
{A,B,D}    4
{A,C,D}    4
{B,C,D}    4
{B,C,E}    2
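The join, prune and count steps described so far can be sketched as a loop that also continues to larger item sets. This is a minimal Python sketch, not Weka's Apriori implementation: the names `TRANSACTIONS`, `MIN_SUPPORT`, `support` and `apriori` are chosen for illustration, and the join simply takes unions of pairs of frequent (k-1)-item sets instead of the sorted first-item join described in the text (the prune step removes the extra candidates).

```python
from itertools import combinations

# The nine customer transactions from the table at the start of this section.
TRANSACTIONS = [
    {"A", "B", "C", "D"},
    {"B", "F", "D", "E"},
    {"B", "C", "D"},
    {"A", "B", "C", "F", "D", "E"},
    {"A", "C", "E"},
    {"B", "C", "E"},
    {"A", "C", "D"},
    {"A", "B", "C", "D", "G"},
    {"A", "B", "D", "G"},
]
MIN_SUPPORT = 3  # at least three transactions (about 30% of nine)

def support(itemset):
    """Number of transactions that contain every item of the item set."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def apriori():
    """Return a dict mapping each frequent item set to its support count."""
    items = sorted(set().union(*TRANSACTIONS))
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        prev = set(level)
        # Join: unions of two frequent (k-1)-item sets that contain k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-item subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        # Count: keep the candidates that reach the minimum support.
        level = [c for c in candidates if support(c) >= MIN_SUPPORT]
    return frequent

frequent = apriori()
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])
```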

Compute C4 from L3.
Again, when we join two frequent 3-item sets we first check whether their first two items are identical; we perform the join only when that is the case.
Since all 3-item subsets of {A,B,C,D} are frequent (in L3), we keep it.
Scan the database and determine the support. The output is:
Item set     Support count
{A,B,C,D}    3
C5 = ∅, so we stop here.
Frequent item sets = L1 ∪ L2 ∪ L3 ∪ L4
Deriving strong rules
Consider a frequent 3-item set I = {B,C,D}.
Since these items are purchased together frequently, we can probably derive some rules from this 3-item set.
First we identify all non-empty proper subsets: {B}, {C}, {D}, {B,C}, {B,D}, {C,D}.
Then for each subset S we form a rule S → I - S:
R1 = {B} → {C,D}
R2 = {C} → {B,D}
R3 = {D} → {B,C}
R4 = {B,C} → {D}
R5 = {B,D} → {C}
R6 = {C,D} → {B}
The RHS of a rule is obtained as {B,C,D} - LHS.
To determine which rules are strong we compute the confidences:
Confidence(S → I - S) = support count of the item set I / support count of the subset S
R1: {B} → {C,D} = 4/7 = 57.1%
R2: {C} → {B,D} = 4/7 = 57.1%
R3: {D} → {B,C} = 4/7 = 57.1%
R4: {B,C} → {D} = 4/5 = 80.0%
R5: {B,D} → {C} = 4/6 = 66.7%
R6: {C,D} → {B} = 4/5 = 80.0%
If the minimum confidence is 80%, we select R4 and R6 as strong rules.
The procedure can be summarized as follows:
For each frequent item set I we identify all non-empty proper subsets of I.
For each subset S of I we form the rule S → I - S, with confidence support(I) / support(S).
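The rule-derivation procedure summarized above can be sketched in Python as follows. The support counts are those obtained by counting in the transaction table; the names `SUPPORT` and `rules` are chosen for illustration and this is not Weka's rule generator.

```python
from itertools import combinations

# Support counts of {B,C,D} and of its non-empty proper subsets.
SUPPORT = {
    frozenset("B"): 7, frozenset("C"): 7, frozenset("D"): 7,
    frozenset("BC"): 5, frozenset("BD"): 6, frozenset("CD"): 5,
    frozenset("BCD"): 4,
}

def rules(itemset, min_conf):
    """Return (lhs, rhs, confidence) for every rule S -> I - S that is strong."""
    strong = []
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            confidence = SUPPORT[itemset] / SUPPORT[lhs]
            if confidence >= min_conf:
                strong.append((sorted(lhs), sorted(itemset - lhs), confidence))
    return strong

strong_rules = rules(frozenset("BCD"), 0.8)
for lhs, rhs, confidence in strong_rules:
    print(lhs, "=>", rhs, f"confidence = {confidence:.1%}")
```

With a minimum confidence of 80% only the rules with left-hand sides {B,C} and {C,D} remain, matching R4 and R6.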
@relation Association_rule_mining
@attribute A {0, 1}
@attribute B {0, 1}
@attribute C {0, 1}
@attribute D {0, 1}
@attribute E {0, 1}
@attribute F {0, 1}
@attribute G {0, 1}
@data
1,1,1,1,0,0,0
0,1,0,1,1,1,0
0,1,1,1,0,0,0
1,1,1,1,1,1,0
1,0,1,0,1,0,0
0,1,0,1,1,0,0
1,0,1,1,0,0,0
1,1,1,1,0,0,1
1,1,0,1,0,0,1
========= Run information ========
Scheme: weka.associations.Apriori -N 15 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0
Relation: Association_rule_mining
Instances: 9
Attributes: 7
A
B
C
D
E
F
G
=========Associator model ( full training set )=========
Apriori
===============
Minimum support: 0.5 ( 4 instances )
Minimum metric <confidence>:0.9
Number of cycles performed:10
Generated sets of large item sets:
Size of set of large items L(1): 8
Size of set of large items L(2): 21
Size of set of large items L(3): 19
Size of set of large items L(4): 3
Best rules found :
1. B=1 7 ==> D=1 7 conf:(1)
2. E=0 5 ==> D=1 F=0 5 conf:(1)
3. D=1 E=0 5 ==> F=0 5 conf:(1)
4. E=0 F=0 5 ==> D=1 5 conf:(1)
5. B=1 G=0 5 ==> D=1 5 conf:(1)
6. B=1 F=0 5 ==> D=1 5 conf:(1)
7. E=0 5 ==> F=0 5 conf:(1)
8. E=0 5 ==> D=1 5 conf:(1)
9. C=1 E=0 4 ==> D=1 F=0 4 conf:(1)
10. C=1 D=1 E=0 4 ==> F=0 4 conf:(1)
11. C=1 D=1 F=0 4 ==> E=0 4 conf:(1)
12. C=1 E=0 F=0 4 ==> D=1 4 conf:(1)
13. B=1 E=0 4 ==> D=1 F=0 4 conf:(1)
14. B=1 D=1 E=0 4 ==> F=0 4 conf:(1)
15. B=1 E=0 F=0 4 ==> D=1 4 conf:(1)





6. Bayesian Classifications By Using Employee Details
6.1 Naive Bayesian Classification:
Notations:
p(A) refers to the probability that an event A will occur.
p(A/B) stands for the probability that A will happen given that event B has already happened.
In other words, p(A/B) is the conditional probability of A based on the condition that B has already happened.
E.g.: A and B may be the events of passing course A and passing another course B respectively; p(A/B) is then the probability of passing A when we know that B has been passed.
6.2 Bayes Theorem:
$p(A/B) = \frac{p(B/A)\, p(A)}{p(B)}$
If we consider X to be an object to be classified, then Bayes' theorem may be read as giving the probability of it belonging to one of the classes $c_1, c_2, \ldots, c_n$ by calculating $p(c_i/x)$. Once these probabilities have been computed for all the classes, we simply assign X to the class that has the highest conditional probability. Let us now consider how the probability $p(c_i/x)$ may be calculated.
We have
$p(c_i/x) = \frac{p(x/c_i)\, p(c_i)}{p(x)}$
where
$p(c_i/x)$ is the posterior probability of $c_i$ conditional on x, i.e., the probability that object x belongs to class $c_i$.
$p(x/c_i)$ is the probability of obtaining the attribute values of x if we know that it belongs to the class $c_i$.
$p(c_i)$ is the probability of any object belonging to class $c_i$ without any other information.
$p(x)$ is the probability of obtaining the attribute values x whatever class the object belongs to.

So we need to compute $p(x/c_i)$, $p(c_i)$ and $p(x)$. But $p(x)$ is independent of $c_i$ (it is a constant) and is not required to be known, since we are interested only in comparing the probabilities $p(c_i/x)$.
Therefore we need to compute only $p(x/c_i)$ and $p(c_i)$.
$p(c_i) = \frac{s_i}{s}$
where
$s_i$ is the number of samples of class $c_i$,
$s$ is the total number of samples.
To compute $p(x/c_i)$ we use a naive approach by assuming that all attributes of x are independent.
Then
$p(x/c_i) = \prod_{k=1}^{n} p(x_k/c_i)$
The probabilities $p(x_k/c_i)$ can be estimated from the training samples, where:
If $A_k$ is categorical, then
$p(x_k/c_i) = \frac{s_{ik}}{s_i}$
where $s_{ik}$ is the number of training samples of class $c_i$ having the value $x_k$ for $A_k$.
If $A_k$ is continuous-valued, then the attribute is typically assumed to have a Gaussian distribution, so that
$p(x_k/c_i) = g(x_k, \mu_{c_i}, \sigma_{c_i}) = \frac{1}{\sqrt{2\pi}\,\sigma_{c_i}}\, e^{-\frac{(x_k - \mu_{c_i})^2}{2\sigma_{c_i}^2}}$
where
$\mu_{c_i}$ is the mean of the values of attribute $A_k$ for class $c_i$,
$\sigma_{c_i}$ is the standard deviation of attribute $A_k$ for class $c_i$.
We then determine the class allocation of x by computing $p(x/c_i)\, p(c_i)$ for each of the classes and allocating x to the class with the highest value.
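For a continuous-valued attribute, the Gaussian density described above can be evaluated directly. The following Python sketch assumes the standard normal density formula; the function name `gaussian` and the sample values (a class mean of 20 and standard deviation of 2) are hypothetical and do not come from the bank data, which has only categorical attributes.

```python
import math

def gaussian(x, mean, std):
    """Normal density g(x, mean, std), used as p(x_k / c_i) for a continuous attribute."""
    coefficient = 1.0 / (math.sqrt(2.0 * math.pi) * std)
    exponent = -((x - mean) ** 2) / (2.0 * std ** 2)
    return coefficient * math.exp(exponent)

# Hypothetical class statistics, for illustration only.
at_mean = gaussian(20.0, mean=20.0, std=2.0)
off_mean = gaussian(23.0, mean=20.0, std=2.0)
print(at_mean, off_mean)
```

The density is largest at the class mean and falls off symmetrically on either side, so values far from the mean contribute a small factor to the product for that class.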
1. Relation: Bank
a. For known tuple
Own Home   Married   Gender   Employed   Credit Rating   Risk Class
Yes Yes M Yes A B
No No F Yes A A
Yes Yes F Yes B C
Yes No M No B B
No Yes F Yes B C
No No F Yes B A
No No M No B B
Yes No F Yes A A
No Yes F Yes A C
Yes Yes F Yes A C
P(A)=3/10=0.3; P(B)=3/10=0.3; P(C)=4/10=0.4;
If the given known sample is X = {Y, N, F, Y, A} for the 5 attributes, we can compute the posterior probability as follows:
$P(X/C_i) = \prod_{k=1}^{5} P(X_k/C_i)$
P(X/Ci) = P({Y,N,F,Y,A}/Ci) = P(Own home=Yes/Ci) * P(Married=No/Ci) * P(Gender=Female/Ci) * P(Employed=Yes/Ci) * P(Credit rating=A/Ci).
Using expressions like the one above we are able to compute the three posterior probabilities for the 3 classes A, B, C.
We compute P(X/Ci)*P(Ci) for each of the 3 classes, with P(A)=0.3, P(B)=0.3, P(C)=0.4, and these values are the basis for comparison. To compute P(X/Ci)=P({Y,N,F,Y,A}/Ci) for each of the classes, we need the following probabilities for each class: P(Own home=Y/Ci), P(Married=N/Ci), P(Gender=F/Ci), P(Employed=Y/Ci), P(Credit rating=A/Ci).
These probabilities are shown below after we order the data by risk class.
Risk class A:
Own Home   Married   Gender   Employed   Credit Rating   Risk Class
No         No        F        Yes        A               A
No         No        F        Yes        B               A
Yes        No        F        Yes        A               A
Y=1/3      N=1       F=1      Y=1        A=2/3
Probability of having the attribute values {Y,N,F,Y,A} given risk class A:
P(X/A) = 1/3 * 1 * 1 * 1 * 2/3 = 2/9.

Risk class B:
Own Home   Married   Gender   Employed   Credit Rating   Risk Class
Yes        Yes       M        Yes        A               B
Yes        No        M        No         B               B
No         No        M        No         B               B
Y=2/3      N=2/3     F=0      Y=1/3      A=1/3
P(X/B) = 2/3 * 2/3 * 0 * 1/3 * 1/3 = 0.

Risk class C:
Own Home   Married   Gender   Employed   Credit Rating   Risk Class
Yes        Yes       F        Yes        B               C
No         Yes       F        Yes        B               C
No         Yes       F        Yes        A               C
Yes        Yes       F        Yes        A               C
Y=1/2      N=0       F=1      Y=1        A=1/2
P(X/C) = 1/2 * 0 * 1 * 1 * 1/2 = 0.

Since P(X) is the same for every class, it is enough to compare P(X/Ci) * P(Ci):
P(A/X) = P(X/A)*P(A)/P(X), and P(X/A)*P(A) = 2/9 * 0.3 = 0.067
P(B/X) = P(X/B)*P(B)/P(X), and P(X/B)*P(B) = 0 * 0.3 = 0
P(C/X) = P(X/C)*P(C)/P(X), and P(X/C)*P(C) = 0 * 0.4 = 0
Result : The known tuple X is assigned to class A
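The calculation for the known tuple can be checked with a few lines of Python. This is a minimal sketch of the counting procedure described above, not Weka's NaiveBayes classifier; the names `ROWS` and `posterior_scores` are chosen for illustration, and the scores are P(X/Ci) * P(Ci) without the division by P(X).

```python
from collections import Counter

# Rows of the Bank relation:
# (own_home, married, gender, employed, credit_rating, risk_class)
ROWS = [
    ("Y", "Y", "M", "Y", "A", "B"),
    ("N", "N", "F", "Y", "A", "A"),
    ("Y", "Y", "F", "Y", "B", "C"),
    ("Y", "N", "M", "N", "B", "B"),
    ("N", "Y", "F", "Y", "B", "C"),
    ("N", "N", "F", "Y", "B", "A"),
    ("N", "N", "M", "N", "B", "B"),
    ("Y", "N", "F", "Y", "A", "A"),
    ("N", "Y", "F", "Y", "A", "C"),
    ("Y", "Y", "F", "Y", "A", "C"),
]

def posterior_scores(rows, sample):
    """Return P(X/Ci) * P(Ci) for every class Ci, using relative frequencies."""
    total = len(rows)
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for label, count in class_counts.items():
        members = [r for r in rows if r[-1] == label]
        likelihood = 1.0
        for k, value in enumerate(sample):
            matches = sum(1 for r in members if r[k] == value)
            likelihood *= matches / count   # P(x_k / Ci)
        scores[label] = likelihood * count / total  # times the prior P(Ci)
    return scores

scores = posterior_scores(ROWS, ("Y", "N", "F", "Y", "A"))
predicted = max(scores, key=scores.get)
for label in sorted(scores):
    print(label, round(scores[label], 4))
print("Assigned class:", predicted)
```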
b. For unknown tuple
Own Home   Married   Gender   Employed   Credit Rating   Risk Class
Yes Yes M Yes A B
No No F Yes A A
Yes Yes F Yes B C
Yes No M No B B
No Yes F Yes B C
No No F Yes B A
No No M No B B
Yes No F Yes A A
No Yes F Yes A C
Yes Yes F Yes A C
P(A) = 3/10 = 0.3, P(B) = 3/10 = 0.3, P(C) = 4/10 = 0.4
If the given unknown sample is X = {N, Y, M, Y, B} for the 5 attributes, we can compute the posterior probability as follows.
P(X/Ci) = P({N, Y, M, Y, B}/Ci) = P(Own home = No/Ci) * P(Married = Yes/Ci) * P(Gender = M/Ci) * P(Employed = Yes/Ci) * P(Credit rating = B/Ci).
Using expressions like the one above we are able to compute the 3 posterior probabilities for the 3 classes A, B, C.
We use P(A) = 0.3, P(B) = 0.3, P(C) = 0.4, and the values P(X/Ci)*P(Ci) are the basis for comparison. To compute P(X/Ci) = P({N, Y, M, Y, B}/Ci) for each of the classes, we need the following probabilities for each class: P(Own home = N/Ci), P(Married = Y/Ci), P(Gender = M/Ci), P(Employed = Y/Ci), P(Credit rating = B/Ci).
These probabilities are shown below after we order the data by risk class. The first row of each table is the unknown tuple itself, tentatively added to that class.
Risk class A:
Own Home   Married   Gender   Employed   Credit Rating   Risk Class
No         Yes       M        Yes        B               A
No         No        F        Yes        A               A
No         No        F        Yes        B               A
Yes        No        F        Yes        A               A
N=3/4      Y=1/4     M=1/4    Y=1        B=1/2
Probability of having the attribute values {N, Y, M, Y, B} given risk class A:
P(X/A) = 3/4 * 1/4 * 1/4 * 1 * 1/2 = 3/128.

Risk class B:
Own Home   Married   Gender   Employed   Credit Rating   Risk Class
No         Yes       M        Yes        B               B
Yes        Yes       M        Yes        A               B
Yes        No        M        No         B               B
No         No        M        No         B               B
N=1/2      Y=1/2     M=1      Y=1/2      B=3/4
P(X/B) = 1/2 * 1/2 * 1 * 1/2 * 3/4 = 3/32.

Risk class C:
Own Home   Married   Gender   Employed   Credit Rating   Risk Class
No         Yes       M        Yes        B               C
Yes        Yes       F        Yes        B               C
No         Yes       F        Yes        B               C
Yes        Yes       F        Yes        A               C
No         Yes       F        Yes        A               C
N=3/5      Y=1       M=1/5    Y=1        B=3/5
P(X/C) = 3/5 * 1 * 1/5 * 1 * 3/5 = 9/125.

Since P(X) is the same for every class, it is enough to compare P(X/Ci) * P(Ci):
P(A/X) = P(X/A)*P(A)/P(X), and P(X/A)*P(A) = 3/128 * 0.3 = 0.0070
P(B/X) = P(X/B)*P(B)/P(X), and P(X/B)*P(B) = 3/32 * 0.3 = 0.0281
P(C/X) = P(X/C)*P(C)/P(X), and P(X/C)*P(C) = 9/125 * 0.4 = 0.0288
Result : The unknown tuple X is assigned to class C










7. Cluster Analysis By Using Simple K-Means Method To Form 3 Clusters
7.1 K-Mean Introduction:
K-Means is the simplest and most popular classical clustering method and is easy to implement. The classical method can only be used if the data about all the objects is located in main memory. The method is called K-Means since each of the clusters is represented by the mean of the objects (called the centroid) within it. It is also called the centroid method since at each step the centroid point of each cluster is assumed to be known and each of the remaining points is allocated to the cluster whose centroid is closest to it.
Once this allocation is completed, the centroids of the clusters are recomputed using simple means, and the process of allocating points to each cluster is repeated until there is no change in the clusters (or some other stopping criterion, e.g. no significant reduction in the squared error, is met). The method may also be looked at as a search problem, where the aim is essentially to find the optimum clusters given the number of clusters and the seeds specified by the user. Obviously, we cannot use a brute-force or exhaustive search method to find the optimum, so we consider solutions that may not be optimal but may be computed efficiently.
The K-means method uses the Euclidean distance measure, which appears to work well with compact clusters. If instead of the Euclidean distance the Manhattan distance is used, the method is called the K-median method. The K-median method can be less sensitive to outliers.
7.2 Algorithm: K-Means
The K-Means algorithm for partitioning, based on the mean value of the objects in the cluster.
Input:
The number of clusters K and a data set containing n objects.
Output:
A set of K clusters that minimizes the squared-error criterion.
Method:
1) Arbitrarily choose K objects as the initial cluster centers.
2) Repeat
3) (Re)assign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster.
4) Update the cluster means, i.e., calculate the mean value of the objects for each cluster.
5) Until no change.
7.3 The K-Means Method May Be Described As Follows:
1) Select the number of clusters. Let the number be K.
2) Pick K seeds as centroids of the K clusters. The seeds may be picked randomly unless the
user has some insight into the data.
3) Compute the Euclidean distance of each object in the data set from each of the centroids.
4) Allocate each object to the cluster it is nearest to, based on the distances computed in the previous step.
5) Compute the centroids of the clusters by computing the means of the attribute values of the objects
in each cluster.
6) Check if the stopping criterion has been met (e.g. the cluster membership is unchanged). If yes, go to
step 7. If not, go to step 3.
7) [Optional] One may decide to stop at this stage, or to split a cluster or combine two clusters heuristically
until a stopping criterion is met.
The method is scalable and efficient (the time complexity is O(n)) and is guaranteed to find a
local minimum. We first discuss an example and then discuss heuristics that can generally improve the
chances of the method finding a global minimum.












Example:
Consider the data about students given in the following table:
STUDENT AGE MARK1 MARK2 MARK3
S1 18 73 75 57
S2 18 79 85 75
S3 23 70 70 52
S4 20 55 55 55
S5 22 85 86 87
S6 19 91 90 89
S7 20 70 65 60
S8 21 53 56 59
S9 19 82 82 60
S10 47 75 76 77
Steps 1 and 2: Let the three seeds be the first three students, as shown below.
STUDENT AGE MARK1 MARK2 MARK3
S1 18 73 75 57
S2 18 79 85 75
S3 23 70 70 52
Steps 3 and 4: Now compute the distances using the four attributes and using the sum of absolute
differences for simplicity. The distance values for all the objects are given in the following table, in which
columns 6, 7 and 8 give the distances from the three seeds respectively.
Based on these distances, each student is allocated to the nearest cluster. We obtain the first
iteration result as shown below.





First iteration: Allocating each object to the nearest cluster.
                                 Distance from cluster      Allocation to the
     AGE   MARK1  MARK2  MARK3   From C1  From C2  From C3  nearest cluster
C1   18.0  73.0   75.0   57.0
C2   18.0  79.0   85.0   75.0
C3   23.0  70.0   70.0   52.0
S1   18.0  73.0   75.0   57.0    0.0      34.0     18.0     C1
S2   18.0  79.0   85.0   75.0    34.0     0.0      52.0     C2
S3   23.0  70.0   70.0   52.0    18.0     52.0     0.0      C3
S4   20.0  55.0   55.0   55.0    42.0     76.0     36.0     C3
S5   22.0  85.0   86.0   87.0    57.0     23.0     67.0     C2
S6   19.0  91.0   90.0   89.0    66.0     32.0     82.0     C2
S7   20.0  70.0   65.0   60.0    18.0     46.0     16.0     C3
S8   21.0  53.0   56.0   59.0    44.0     74.0     40.0     C3
S9   19.0  82.0   82.0   60.0    20.0     22.0     36.0     C1
S10  47.0  75.0   76.0   77.0    52.0     44.0     60.0     C2
The first iteration leads to two students in the first cluster and four each in the second and third
clusters.
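The first-iteration distances and allocations can be recomputed from the student data with a short sketch. The dictionary layout and the names `students`, `seeds`, `manhattan` and `rows` are illustrative assumptions. The distance is the sum of absolute differences, as used in this example.

```python
# Student records from the example: (age, mark1, mark2, mark3).
students = {
    "S1": (18, 73, 75, 57), "S2": (18, 79, 85, 75), "S3": (23, 70, 70, 52),
    "S4": (20, 55, 55, 55), "S5": (22, 85, 86, 87), "S6": (19, 91, 90, 89),
    "S7": (20, 70, 65, 60), "S8": (21, 53, 56, 59), "S9": (19, 82, 82, 60),
    "S10": (47, 75, 76, 77),
}

# The first three students are the seeds of clusters C1, C2 and C3.
seeds = [students["S1"], students["S2"], students["S3"]]

def manhattan(p, q):
    # Sum of absolute attribute differences.
    return sum(abs(a - b) for a, b in zip(p, q))

rows = {}
for name, record in students.items():
    distances = [manhattan(record, seed) for seed in seeds]
    nearest = "C%d" % (distances.index(min(distances)) + 1)
    rows[name] = (distances, nearest)
    print(name, distances, nearest)
```

For student S9, for instance, this yields distances of 20, 22 and 36 from the three seeds, so S9 is allocated to C1.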
Step 5:
The following table compares the cluster means of the clusters found in the previous table with the original
seeds.


        Age   Mark1  Mark2  Mark3
C1      18.5  77.5   78.5   58.5
C2      26.5  82.5   84.25  82.0
C3      21.0  62.0   61.5   56.5
Seed1   18    73     75     57
Seed2   18    79     85     75
Seed3   23    70     70     52
It is interesting to note that the mean marks for C3 are significantly lower than for C1 and C2.
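The cluster means of step 5 can be checked with a small sketch. The cluster membership is the one obtained in the first iteration. The names `students`, `clusters`, `mean_record` and `means` are illustrative assumptions.

```python
# Student records from the example: (age, mark1, mark2, mark3).
students = {
    "S1": (18, 73, 75, 57), "S2": (18, 79, 85, 75), "S3": (23, 70, 70, 52),
    "S4": (20, 55, 55, 55), "S5": (22, 85, 86, 87), "S6": (19, 91, 90, 89),
    "S7": (20, 70, 65, 60), "S8": (21, 53, 56, 59), "S9": (19, 82, 82, 60),
    "S10": (47, 75, 76, 77),
}

# Membership after the first iteration.
clusters = {
    "C1": ["S1", "S9"],
    "C2": ["S2", "S5", "S6", "S10"],
    "C3": ["S3", "S4", "S7", "S8"],
}

def mean_record(names):
    # Attribute-wise mean of the records of the named students.
    records = [students[n] for n in names]
    return tuple(sum(column) / len(records) for column in zip(*records))

means = {cluster: mean_record(names) for cluster, names in clusters.items()}
for cluster, mean in means.items():
    print(cluster, mean)
```

The mean age of C2, for example, is (18 + 22 + 19 + 47) / 4 = 26.5.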


Steps 3 and 4:
Use the new cluster means to recompute the distance of each object to each of the means, again
allocating each object to the nearest cluster. The following table shows the second iteration.
Second iteration: Allocating each object to the nearest cluster.
                                 Distance from cluster      Allocation to the
     AGE   MARK1  MARK2  MARK3   From C1  From C2  From C3  nearest cluster
C1   18.5  77.5   78.5   58.5
C2   26.5  82.5   84.25  82.0
C3   21.0  62.0   61.5   56.5
S1   18.0  73.0   75.0   57.0    10.0     52.25    28.0     C1
S2   18.0  79.0   85.0   75.0    25.0     19.75    62.0     C2
S3   23.0  70.0   70.0   52.0    27.0     60.25    23.0     C3
S4   20.0  55.0   55.0   55.0    51.0     90.25    16.0     C3
S5   22.0  85.0   86.0   87.0    47.0     13.75    79.0     C2
S6   19.0  91.0   90.0   89.0    56.0     28.75    92.0     C2
S7   20.0  70.0   65.0   60.0    24.0     60.25    16.0     C3
S8   21.0  53.0   56.0   59.0    50.0     86.25    17.0     C3
S9   19.0  82.0   82.0   60.0    10.0     32.25    46.0     C1
S10  47.0  75.0   76.0   77.0    52.0     41.25    74.0     C2
The number of students in cluster 1 is again two, and the other two clusters still have four students
each. A more careful look shows that the clusters have not changed at all. Therefore the method has
converged rather quickly for this very simple data set. The cluster membership is as follows:
Cluster1 S1, S9
Cluster2 S2, S5, S6, S10
Cluster3 S3, S4, S7, S8
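The whole worked example can be replayed as a loop that alternates allocation and mean updates until the membership stops changing. This is a sketch under the same assumptions as the example: the first three students are the seeds, and the sum of absolute differences is the distance. The variable and function names are illustrative.

```python
# Student records from the example: (age, mark1, mark2, mark3).
students = {
    "S1": (18, 73, 75, 57), "S2": (18, 79, 85, 75), "S3": (23, 70, 70, 52),
    "S4": (20, 55, 55, 55), "S5": (22, 85, 86, 87), "S6": (19, 91, 90, 89),
    "S7": (20, 70, 65, 60), "S8": (21, 53, 56, 59), "S9": (19, 82, 82, 60),
    "S10": (47, 75, 76, 77),
}

def manhattan(p, q):
    # Sum of absolute attribute differences.
    return sum(abs(a - b) for a, b in zip(p, q))

def mean_record(records):
    # Attribute-wise mean of a list of records.
    return tuple(sum(column) / len(records) for column in zip(*records))

def allocate(centers):
    # Map each student to the index of the nearest center.
    return {
        name: min(range(len(centers)), key=lambda c: manhattan(record, centers[c]))
        for name, record in students.items()
    }

centers = [students["S1"], students["S2"], students["S3"]]  # seeds
labels, passes = None, 0
while True:
    new_labels = allocate(centers)
    passes += 1
    if new_labels == labels:  # membership unchanged: converged
        break
    labels = new_labels
    centers = [
        mean_record([students[n] for n in students if labels[n] == c])
        for c in range(3)
    ]

clusters = {
    "Cluster%d" % (c + 1): [n for n in students if labels[n] == c] for c in range(3)
}
print(passes)    # 2
print(clusters)
```

The loop stops on the second allocation pass, which matches the observation that the second iteration leaves the clusters unchanged.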
Another point worth noting concerns the within-cluster variance and the between-cluster
variance. In the following table we present the average Euclidean distance of the objects in each cluster to
each of the cluster centroids (each row gives the cluster the objects belong to, each column the centroid).
For example, the average distance of the objects in C1 from its own centroid is 5.9, while the average
distance between the objects in C1 and the centroid of C3 is 23.3. These numbers do show that the
clustering method has done well in minimizing the within-cluster variance, although these numbers do
not show whether there is another result that is better. We do get different results if we start with
different seeds.

Within-cluster and between-cluster distances
CLUSTER C1 C2 C3
C1 5.9 26.5 23.3
C2 29.5 14.3 42.6
C3 23.9 41.0 10.7
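The table of average distances can be recomputed with the sketch below. It assumes that each row holds the objects of one cluster and each column one centroid, which is the reading that reproduces the diagonal values. The names used are illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math

# Student records from the example: (age, mark1, mark2, mark3).
students = {
    "S1": (18, 73, 75, 57), "S2": (18, 79, 85, 75), "S3": (23, 70, 70, 52),
    "S4": (20, 55, 55, 55), "S5": (22, 85, 86, 87), "S6": (19, 91, 90, 89),
    "S7": (20, 70, 65, 60), "S8": (21, 53, 56, 59), "S9": (19, 82, 82, 60),
    "S10": (47, 75, 76, 77),
}

# Final cluster membership from the worked example.
clusters = {
    "C1": ["S1", "S9"],
    "C2": ["S2", "S5", "S6", "S10"],
    "C3": ["S3", "S4", "S7", "S8"],
}

def centroid(names):
    # Attribute-wise mean of the records of the named students.
    records = [students[n] for n in names]
    return tuple(sum(column) / len(records) for column in zip(*records))

centroids = {cluster: centroid(names) for cluster, names in clusters.items()}

# table[row][col]: average Euclidean distance of the objects in cluster `row`
# from the centroid of cluster `col`, rounded to one decimal place.
table = {}
for row, names in clusters.items():
    table[row] = {}
    for col, center in centroids.items():
        distances = [math.dist(students[n], center) for n in names]
        table[row][col] = round(sum(distances) / len(distances), 1)
    print(row, table[row])
```

Recomputed this way, the average distance of the objects in C2 from the centroid of C3 comes out as 42.6. The diagonal entries (5.9, 14.3 and 10.7) are the smallest in their rows, which is what a low within-cluster variance looks like.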
Take the following sample data.
@relation clustering-analysis
@attribute age numeric
@attribute mark1 numeric
@attribute mark2 numeric
@attribute mark3 numeric
@data
18,73,75,57
18,79,85,75
23,70,70,52
20,55,55,55
22,85,86,87
19,91,90,89
20,70,65,60
21,53,56,59
19,82,82,60
47,75,76,77
=== Run information ===
Scheme: weka.clusterers.SimpleKMeans -N 3 -S 10
Relation: cluterin_analysis1
Instances: 10
Attributes: 4
age
mark1
mark2
mark3
Test mode: evaluate on training data
=== Model and evaluation on training set ===
kMeans
======
Number of iterations: 3
Within cluster sum of squared errors: 1.335584580982163
Cluster centroids:
Cluster 0
Mean/Mode: 20.5 54 55.5 57
Std Devs: 0.7071 1.4142 0.7071 2.8284
Cluster 1
Mean/Mode: 19.6667 85 87 83.6667
Std Devs: 2.0817 6 2.6458 7.5719
Cluster 2
Mean/Mode: 25.4 74 73.6 61.2
Std Devs: 12.2188 4.9497 6.4265 9.4181
Clustered Instances
0 2 ( 20%)
1 3 ( 30%)
2 5 ( 50%)